A deadlock is observed in the qcom_geni_serial driver during runtime
resume. This occurs when the pinctrl subsystem reconfigures device pins
via msm_pinmux_set_mux() while the serial device's interrupt is an
active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
__synchronize_irq(), conflicting with the active wakeup state and
causing the IRQ thread to enter an uninterruptible (D-state) sleep,
leading to system instability.
The critical call trace leading to the deadlock is:
Call trace:
__switch_to+0xe0/0x120
__schedule+0x39c/0x978
schedule+0x5c/0xf8
__synchronize_irq+0x88/0xb4
disable_irq+0x3c/0x4c
msm_pinmux_set_mux+0x508/0x644
pinmux_enable_setting+0x190/0x2dc
pinctrl_commit_state+0x13c/0x208
pinctrl_pm_select_default_state+0x4c/0xa4
geni_se_resources_on+0xe8/0x154
qcom_geni_serial_runtime_resume+0x4c/0x88
pm_generic_runtime_resume+0x2c/0x44
__genpd_runtime_resume+0x30/0x80
genpd_runtime_resume+0x114/0x29c
__rpm_callback+0x48/0x1d8
rpm_callback+0x6c/0x78
rpm_resume+0x530/0x750
__pm_runtime_resume+0x50/0x94
handle_threaded_wake_irq+0x30/0x94
irq_thread_fn+0x2c/0xa8
irq_thread+0x160/0x248
kthread+0x110/0x114
ret_from_fork+0x10/0x20
To resolve this, explicitly manage the wakeup IRQ state within the
runtime suspend/resume callbacks. In the runtime resume callback, call
disable_irq_wake() before enabling resources. This preemptively
removes the "wakeup" capability from the IRQ, allowing subsequent
interrupt management calls to proceed without conflict. An error path
re-enables the wakeup IRQ if resource enablement fails.
Conversely, in runtime suspend, call enable_irq_wake() after resources
are disabled. This ensures the interrupt is configured as a wakeup
source only once the device has fully entered its low-power state. An
error path handles disabling the wakeup IRQ if the suspend operation
fails.
Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
---
drivers/tty/serial/qcom_geni_serial.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 0fdda3a1e70b..4f5ea28dfe8f 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -1926,8 +1926,17 @@ static int __maybe_unused qcom_geni_serial_runtime_suspend(struct device *dev)
 	struct uart_port *uport = &port->uport;
 	int ret = 0;

-	if (port->dev_data->power_state)
+	if (port->dev_data->power_state) {
 		ret = port->dev_data->power_state(uport, false);
+		if (ret) {
+			if (device_can_wakeup(dev))
+				disable_irq_wake(port->wakeup_irq);
+			return ret;
+		}
+	}
+
+	if (device_can_wakeup(dev))
+		enable_irq_wake(port->wakeup_irq);

 	return ret;
 }
@@ -1938,8 +1947,17 @@ static int __maybe_unused qcom_geni_serial_runtime_resume(struct device *dev)
 	struct uart_port *uport = &port->uport;
 	int ret = 0;

-	if (port->dev_data->power_state)
+	if (device_can_wakeup(dev))
+		disable_irq_wake(port->wakeup_irq);
+
+	if (port->dev_data->power_state) {
 		ret = port->dev_data->power_state(uport, true);
+		if (ret) {
+			if (device_can_wakeup(dev))
+				enable_irq_wake(port->wakeup_irq);
+			return ret;
+		}
+	}

 	return ret;
 }
base-commit: 3e8e5822146bc396d2a7e5fbb7be13271665522a
--
2.34.1
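The ordering described in the commit message above can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct model_port`, `model_power_state()`, `model_runtime_suspend()`, `model_runtime_resume()` and `model_demo()` are hypothetical names that do not exist in the driver, the wakeup state is reduced to a single boolean (the real `enable_irq_wake()`/`disable_irq_wake()` calls go through the IRQ core, which this model ignores), and the `power_state` callback is assumed to be always present.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the port; not the driver's struct. */
struct model_port {
	bool can_wakeup;  /* models device_can_wakeup(dev) */
	bool wake_armed;  /* true while the IRQ is configured as a wakeup source */
	bool powered;     /* true while resources are on */
	int power_err;    /* if nonzero, model_power_state() fails with this */
};

/* Models port->dev_data->power_state(uport, on). */
int model_power_state(struct model_port *p, bool on)
{
	if (p->power_err)
		return p->power_err;
	p->powered = on;
	return 0;
}

/* Suspend: power down first, arm the wakeup IRQ only on success. */
int model_runtime_suspend(struct model_port *p)
{
	int ret = model_power_state(p, false);

	if (ret) {
		if (p->can_wakeup)
			p->wake_armed = false; /* models disable_irq_wake() */
		return ret;
	}
	if (p->can_wakeup)
		p->wake_armed = true; /* models enable_irq_wake() */
	return 0;
}

/* Resume: disarm the wakeup IRQ first, re-arm it if power-up fails. */
int model_runtime_resume(struct model_port *p)
{
	int ret;

	if (p->can_wakeup)
		p->wake_armed = false; /* models disable_irq_wake() */

	ret = model_power_state(p, true);
	if (ret && p->can_wakeup)
		p->wake_armed = true; /* models enable_irq_wake() */
	return ret;
}

/* Walks one clean suspend/resume cycle, then one failed resume. */
void model_demo(void)
{
	struct model_port p = { .can_wakeup = true };

	assert(model_runtime_suspend(&p) == 0 && p.wake_armed);
	assert(model_runtime_resume(&p) == 0 && !p.wake_armed);

	assert(model_runtime_suspend(&p) == 0);
	p.power_err = -5; /* arbitrary error value for the model */
	assert(model_runtime_resume(&p) == -5 && p.wake_armed);
}
```

Calling `model_demo()` exercises the two paths the commit message describes: in the model the wakeup flag is clear whenever resources are being switched on, and it is set again if switching them on fails.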
On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
> A deadlock is observed in the qcom_geni_serial driver during runtime
> resume. This occurs when the pinctrl subsystem reconfigures device pins
> via msm_pinmux_set_mux() while the serial device's interrupt is an
> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
> __synchronize_irq(), conflicting with the active wakeup state and
> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
> leading to system instability.
>
> The critical call trace leading to the deadlock is:
>
> Call trace:
> __switch_to+0xe0/0x120
> __schedule+0x39c/0x978
> schedule+0x5c/0xf8
> __synchronize_irq+0x88/0xb4
> disable_irq+0x3c/0x4c
> msm_pinmux_set_mux+0x508/0x644
> pinmux_enable_setting+0x190/0x2dc
> pinctrl_commit_state+0x13c/0x208
> pinctrl_pm_select_default_state+0x4c/0xa4
> geni_se_resources_on+0xe8/0x154
> qcom_geni_serial_runtime_resume+0x4c/0x88
> pm_generic_runtime_resume+0x2c/0x44
> __genpd_runtime_resume+0x30/0x80
> genpd_runtime_resume+0x114/0x29c
> __rpm_callback+0x48/0x1d8
> rpm_callback+0x6c/0x78
> rpm_resume+0x530/0x750
> __pm_runtime_resume+0x50/0x94
> handle_threaded_wake_irq+0x30/0x94
> irq_thread_fn+0x2c/0xa8
> irq_thread+0x160/0x248
> kthread+0x110/0x114
> ret_from_fork+0x10/0x20
>
> To resolve this, explicitly manage the wakeup IRQ state within the
> runtime suspend/resume callbacks. In the runtime resume callback, call
> disable_irq_wake() before enabling resources. This preemptively
> removes the "wakeup" capability from the IRQ, allowing subsequent
> interrupt management calls to proceed without conflict. An error path
> re-enables the wakeup IRQ if resource enablement fails.
>
> Conversely, in runtime suspend, call enable_irq_wake() after resources
> are disabled. This ensures the interrupt is configured as a wakeup
> source only once the device has fully entered its low-power state. An
> error path handles disabling the wakeup IRQ if the suspend operation
> fails.
>
> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
You forgot:
Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
Also, I am not sure where this change will go, via Greg or Jiri, but ideally
it should be picked up for the current -rc cycle, since the regression was
introduced during the latest merge window.
I would also like to test it on the qrb2210 RB1, where this regression is
reproducible.
Thanks,
Alexey
[..]
(adding Krzysztof to c/c)
On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>> A deadlock is observed in the qcom_geni_serial driver during runtime
>> resume. This occurs when the pinctrl subsystem reconfigures device pins
>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>> __synchronize_irq(), conflicting with the active wakeup state and
>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>> leading to system instability.
>>
>> The critical call trace leading to the deadlock is:
>>
>> Call trace:
>> __switch_to+0xe0/0x120
>> __schedule+0x39c/0x978
>> schedule+0x5c/0xf8
>> __synchronize_irq+0x88/0xb4
>> disable_irq+0x3c/0x4c
>> msm_pinmux_set_mux+0x508/0x644
>> pinmux_enable_setting+0x190/0x2dc
>> pinctrl_commit_state+0x13c/0x208
>> pinctrl_pm_select_default_state+0x4c/0xa4
>> geni_se_resources_on+0xe8/0x154
>> qcom_geni_serial_runtime_resume+0x4c/0x88
>> pm_generic_runtime_resume+0x2c/0x44
>> __genpd_runtime_resume+0x30/0x80
>> genpd_runtime_resume+0x114/0x29c
>> __rpm_callback+0x48/0x1d8
>> rpm_callback+0x6c/0x78
>> rpm_resume+0x530/0x750
>> __pm_runtime_resume+0x50/0x94
>> handle_threaded_wake_irq+0x30/0x94
>> irq_thread_fn+0x2c/0xa8
>> irq_thread+0x160/0x248
>> kthread+0x110/0x114
>> ret_from_fork+0x10/0x20
>>
>> To resolve this, explicitly manage the wakeup IRQ state within the
>> runtime suspend/resume callbacks. In the runtime resume callback, call
>> disable_irq_wake() before enabling resources. This preemptively
>> removes the "wakeup" capability from the IRQ, allowing subsequent
>> interrupt management calls to proceed without conflict. An error path
>> re-enables the wakeup IRQ if resource enablement fails.
>>
>> Conversely, in runtime suspend, call enable_irq_wake() after resources
>> are disabled. This ensures the interrupt is configured as a wakeup
>> source only once the device has fully entered its low-power state. An
>> error path handles disabling the wakeup IRQ if the suspend operation
>> fails.
>>
>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>
> You forgot:
>
> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>
> Also, not sure where this change will go, via Greg or Jiri, but ideally
> this should be picked for current -rc cycle since regression is
> introduced during latest merge window.
>
> I also would like to test it on qrb2210 rb1 where this regression is
> reproducible.
It doesn't seem to fix the regression on the RB1 board:
INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u16:3 state:D stack:0 pid:50 tgid:50 ppid:2 task_flags:0x4208060 flags:0x00000010
Workqueue: async async_run_entry_fn
Call trace:
__switch_to+0xf0/0x1c0 (T)
__schedule+0x358/0x99c
schedule+0x34/0x11c
rpm_resume+0x17c/0x6a0
rpm_resume+0x2c4/0x6a0
rpm_resume+0x2c4/0x6a0
rpm_resume+0x2c4/0x6a0
__pm_runtime_resume+0x50/0x9c
__driver_probe_device+0x58/0x120
driver_probe_device+0x3c/0x154
__driver_attach_async_helper+0x4c/0xc0
async_run_entry_fn+0x34/0xe0
process_one_work+0x148/0x284
worker_thread+0x2c4/0x3e0
kthread+0x12c/0x210
ret_from_fork+0x10/0x20
INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:irq/92-4a8c000. state:D stack:0 pid:79 tgid:79 ppid:2 task_flags:0x208040 flags:0x00000010
Call trace:
__switch_to+0xf0/0x1c0 (T)
__schedule+0x358/0x99c
schedule+0x34/0x11c
__synchronize_irq+0x90/0xcc
disable_irq+0x3c/0x4c
msm_pinmux_set_mux+0x3b4/0x45c
pinmux_enable_setting+0x1fc/0x2d8
pinctrl_commit_state+0xa0/0x260
pinctrl_pm_select_default_state+0x4c/0xa0
geni_se_resources_on+0xe8/0x154
geni_serial_resource_state+0x8c/0xbc
qcom_geni_serial_runtime_resume+0x3c/0x88
pm_generic_runtime_resume+0x2c/0x44
__rpm_callback+0x48/0x1e0
rpm_callback+0x74/0x80
rpm_resume+0x3bc/0x6a0
__pm_runtime_resume+0x50/0x9c
handle_threaded_wake_irq+0x30/0x80
irq_thread_fn+0x2c/0xb0
irq_thread+0x170/0x334
kthread+0x12c/0x210
ret_from_fork+0x10/0x20
I see exactly the same behaviour with these changes applied.
root@rb1:~# uname -a
Linux rb1 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13 SMP PREEMPT Tue Sep 9 20:14:22 BST 2025 aarch64 GNU/Linux
I see the same behaviour with linux-next, but my local tree is a bit old;
maybe there are some dependencies.
Best regards,
Alexey
Hi Alexey,
Thank you for the update.
On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>
> (adding Krzysztof to c/c)
>
> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>> A deadlock is observed in the qcom_geni_serial driver during runtime
>>> resume. This occurs when the pinctrl subsystem reconfigures device pins
>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>> __synchronize_irq(), conflicting with the active wakeup state and
>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>> leading to system instability.
>>>
>>> The critical call trace leading to the deadlock is:
>>>
>>> Call trace:
>>> __switch_to+0xe0/0x120
>>> __schedule+0x39c/0x978
>>> schedule+0x5c/0xf8
>>> __synchronize_irq+0x88/0xb4
>>> disable_irq+0x3c/0x4c
>>> msm_pinmux_set_mux+0x508/0x644
>>> pinmux_enable_setting+0x190/0x2dc
>>> pinctrl_commit_state+0x13c/0x208
>>> pinctrl_pm_select_default_state+0x4c/0xa4
>>> geni_se_resources_on+0xe8/0x154
>>> qcom_geni_serial_runtime_resume+0x4c/0x88
>>> pm_generic_runtime_resume+0x2c/0x44
>>> __genpd_runtime_resume+0x30/0x80
>>> genpd_runtime_resume+0x114/0x29c
>>> __rpm_callback+0x48/0x1d8
>>> rpm_callback+0x6c/0x78
>>> rpm_resume+0x530/0x750
>>> __pm_runtime_resume+0x50/0x94
>>> handle_threaded_wake_irq+0x30/0x94
>>> irq_thread_fn+0x2c/0xa8
>>> irq_thread+0x160/0x248
>>> kthread+0x110/0x114
>>> ret_from_fork+0x10/0x20
>>>
>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>> runtime suspend/resume callbacks. In the runtime resume callback, call
>>> disable_irq_wake() before enabling resources. This preemptively
>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>> interrupt management calls to proceed without conflict. An error path
>>> re-enables the wakeup IRQ if resource enablement fails.
>>>
>>> Conversely, in runtime suspend, call enable_irq_wake() after resources
>>> are disabled. This ensures the interrupt is configured as a wakeup
>>> source only once the device has fully entered its low-power state. An
>>> error path handles disabling the wakeup IRQ if the suspend operation
>>> fails.
>>>
>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>
>> You forgot:
>>
>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>
>> Also, not sure where this change will go, via Greg or Jiri, but ideally
>> this should be picked for current -rc cycle since regression is
>> introduced during latest merge window.
>>
>> I also would like to test it on qrb2210 rb1 where this regression is
>> reproducible.
>
> It doesn't seem that it fixes the regression on RB1 board:
>
> INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
> Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u16:3 state:D stack:0 pid:50 tgid:50 ppid:2 task_flags:0x4208060 flags:0x00000010
> Workqueue: async async_run_entry_fn
> Call trace:
> __switch_to+0xf0/0x1c0 (T)
> __schedule+0x358/0x99c
> schedule+0x34/0x11c
> rpm_resume+0x17c/0x6a0
> rpm_resume+0x2c4/0x6a0
> rpm_resume+0x2c4/0x6a0
> rpm_resume+0x2c4/0x6a0
> __pm_runtime_resume+0x50/0x9c
> __driver_probe_device+0x58/0x120
> driver_probe_device+0x3c/0x154
> __driver_attach_async_helper+0x4c/0xc0
> async_run_entry_fn+0x34/0xe0
> process_one_work+0x148/0x284
> worker_thread+0x2c4/0x3e0
> kthread+0x12c/0x210
> ret_from_fork+0x10/0x20
> INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
> Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:irq/92-4a8c000. state:D stack:0 pid:79 tgid:79 ppid:2 task_flags:0x208040 flags:0x00000010
> Call trace:
> __switch_to+0xf0/0x1c0 (T)
> __schedule+0x358/0x99c
> schedule+0x34/0x11c
> __synchronize_irq+0x90/0xcc
> disable_irq+0x3c/0x4c
> msm_pinmux_set_mux+0x3b4/0x45c
> pinmux_enable_setting+0x1fc/0x2d8
> pinctrl_commit_state+0xa0/0x260
> pinctrl_pm_select_default_state+0x4c/0xa0
> geni_se_resources_on+0xe8/0x154
> geni_serial_resource_state+0x8c/0xbc
> qcom_geni_serial_runtime_resume+0x3c/0x88
> pm_generic_runtime_resume+0x2c/0x44
> __rpm_callback+0x48/0x1e0
> rpm_callback+0x74/0x80
> rpm_resume+0x3bc/0x6a0
> __pm_runtime_resume+0x50/0x9c
> handle_threaded_wake_irq+0x30/0x80
> irq_thread_fn+0x2c/0xb0
> irq_thread+0x170/0x334
> kthread+0x12c/0x210
> ret_from_fork+0x10/0x20
I can see that your call stack and mine are mostly similar, but they do
not match completely in the initial calls.
Your dump:
> qcom_geni_serial_runtime_resume+0x3c/0x88
> pm_generic_runtime_resume+0x2c/0x44
> __rpm_callback+0x48/0x1e0
> rpm_callback+0x74/0x80
> rpm_resume+0x3bc/0x6a0
> __pm_runtime_resume+0x50/0x9c
> handle_threaded_wake_irq+0x30/0x80
Mine:
>>> qcom_geni_serial_runtime_resume+0x4c/0x88
>>> pm_generic_runtime_resume+0x2c/0x44
>>> __genpd_runtime_resume+0x30/0x80
>>> genpd_runtime_resume+0x114/0x29c
>>> __rpm_callback+0x48/0x1d8
>>> rpm_callback+0x6c/0x78
>>> rpm_resume+0x530/0x750
Can you please share the DT file for this board, if possible?
Is there any use case enabled on this SE instance?
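The difference between the two dumps can be made explicit by comparing symbol names while ignoring the `+offset/size` suffix. The following is a minimal sketch, not a tool used in this thread: the helper names (`frame_symbol_len()`, `trace_only_in()` and so on) are hypothetical, and the frame lists are copied from the two call traces in this thread, from `geni_se_resources_on` down to `handle_threaded_wake_irq`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TRACE_LEN(t) (sizeof(t) / sizeof((t)[0]))

/* Frames from the RB1 report (6.17.0-rc5 based tree). */
const char *const trace_rb1[] = {
	"geni_se_resources_on+0xe8/0x154",
	"geni_serial_resource_state+0x8c/0xbc",
	"qcom_geni_serial_runtime_resume+0x3c/0x88",
	"pm_generic_runtime_resume+0x2c/0x44",
	"__rpm_callback+0x48/0x1e0",
	"rpm_callback+0x74/0x80",
	"rpm_resume+0x3bc/0x6a0",
	"__pm_runtime_resume+0x50/0x9c",
	"handle_threaded_wake_irq+0x30/0x80",
};

/* Frames from the trace in the patch description. */
const char *const trace_patch[] = {
	"geni_se_resources_on+0xe8/0x154",
	"qcom_geni_serial_runtime_resume+0x4c/0x88",
	"pm_generic_runtime_resume+0x2c/0x44",
	"__genpd_runtime_resume+0x30/0x80",
	"genpd_runtime_resume+0x114/0x29c",
	"__rpm_callback+0x48/0x1d8",
	"rpm_callback+0x6c/0x78",
	"rpm_resume+0x530/0x750",
	"__pm_runtime_resume+0x50/0x94",
	"handle_threaded_wake_irq+0x30/0x94",
};

/* Length of the symbol part of a frame such as "rpm_resume+0x530/0x750". */
size_t frame_symbol_len(const char *frame)
{
	return strcspn(frame, "+");
}

/* True if both frames name the same symbol, whatever their offsets. */
bool frame_same_symbol(const char *a, const char *b)
{
	size_t la = frame_symbol_len(a);

	return la == frame_symbol_len(b) && strncmp(a, b, la) == 0;
}

/* Stores in out[] the frames of a[] whose symbol is absent from b[]. */
size_t trace_only_in(const char *const *a, size_t na,
		     const char *const *b, size_t nb, const char **out)
{
	size_t k = 0;

	for (size_t i = 0; i < na; i++) {
		bool found = false;

		for (size_t j = 0; j < nb && !found; j++)
			found = frame_same_symbol(a[i], b[j]);
		if (!found)
			out[k++] = a[i]; /* out[] must hold at least na entries */
	}
	return k;
}

/* Prints the frames that appear in only one of the two traces. */
void trace_compare_demo(void)
{
	const char *only[TRACE_LEN(trace_patch)];
	size_t n;

	n = trace_only_in(trace_patch, TRACE_LEN(trace_patch),
			  trace_rb1, TRACE_LEN(trace_rb1), only);
	assert(n <= TRACE_LEN(trace_patch));
	for (size_t i = 0; i < n; i++)
		printf("only in patch trace: %s\n", only[i]);

	n = trace_only_in(trace_rb1, TRACE_LEN(trace_rb1),
			  trace_patch, TRACE_LEN(trace_patch), only);
	for (size_t i = 0; i < n; i++)
		printf("only in RB1 trace:   %s\n", only[i]);
}
```

Matching on symbol names only is a deliberate simplification: offsets and function sizes differ between builds, so they would make every frame look different even when the call path is the same.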
Thanks,
Praveen Talari
>
> I see exactly the same behaviour with this changes applied.
>
> root@rb1:~# uname -a
> Linux rb1 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13 SMP PREEMPT Tue Sep 9 20:14:22 BST 2025 aarch64 GNU/Linux
>
> I see the same behaviour with linux-next but my local tree is a bit old,
> maybe there are some dependencies.
>
> Best regards,
> Alexey
Hi Praveen,
On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
> Hi Alexy,
>
> Thank you for update.
>
> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>
>> (adding Krzysztof to c/c)
>>
>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>> A deadlock is observed in the qcom_geni_serial driver during runtime
>>>> resume. This occurs when the pinctrl subsystem reconfigures device pins
>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>> leading to system instability.
>>>>
>>>> The critical call trace leading to the deadlock is:
>>>>
>>>> Call trace:
>>>> __switch_to+0xe0/0x120
>>>> __schedule+0x39c/0x978
>>>> schedule+0x5c/0xf8
>>>> __synchronize_irq+0x88/0xb4
>>>> disable_irq+0x3c/0x4c
>>>> msm_pinmux_set_mux+0x508/0x644
>>>> pinmux_enable_setting+0x190/0x2dc
>>>> pinctrl_commit_state+0x13c/0x208
>>>> pinctrl_pm_select_default_state+0x4c/0xa4
>>>> geni_se_resources_on+0xe8/0x154
>>>> qcom_geni_serial_runtime_resume+0x4c/0x88
>>>> pm_generic_runtime_resume+0x2c/0x44
>>>> __genpd_runtime_resume+0x30/0x80
>>>> genpd_runtime_resume+0x114/0x29c
>>>> __rpm_callback+0x48/0x1d8
>>>> rpm_callback+0x6c/0x78
>>>> rpm_resume+0x530/0x750
>>>> __pm_runtime_resume+0x50/0x94
>>>> handle_threaded_wake_irq+0x30/0x94
>>>> irq_thread_fn+0x2c/0xa8
>>>> irq_thread+0x160/0x248
>>>> kthread+0x110/0x114
>>>> ret_from_fork+0x10/0x20
>>>>
>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>> runtime suspend/resume callbacks. In the runtime resume callback, call
>>>> disable_irq_wake() before enabling resources. This preemptively
>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>> interrupt management calls to proceed without conflict. An error path
>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>
>>>> Conversely, in runtime suspend, call enable_irq_wake() after resources
>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>> source only once the device has fully entered its low-power state. An
>>>> error path handles disabling the wakeup IRQ if the suspend operation
>>>> fails.
>>>>
>>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
>>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>>
>>> You forgot:
>>>
>>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>>
>>> Also, not sure where this change will go, via Greg or Jiri, but ideally
>>> this should be picked for current -rc cycle since regression is
>>> introduced during latest merge window.
>>>
>>> I also would like to test it on qrb2210 rb1 where this regression is
>>> reproducible.
>>
>> It doesn't seem that it fixes the regression on RB1 board:
>>
>> INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
>> Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:kworker/u16:3 state:D stack:0 pid:50 tgid:50 ppid:2 task_flags:0x4208060 flags:0x00000010
>> Workqueue: async async_run_entry_fn
>> Call trace:
>> __switch_to+0xf0/0x1c0 (T)
>> __schedule+0x358/0x99c
>> schedule+0x34/0x11c
>> rpm_resume+0x17c/0x6a0
>> rpm_resume+0x2c4/0x6a0
>> rpm_resume+0x2c4/0x6a0
>> rpm_resume+0x2c4/0x6a0
>> __pm_runtime_resume+0x50/0x9c
>> __driver_probe_device+0x58/0x120
>> driver_probe_device+0x3c/0x154
>> __driver_attach_async_helper+0x4c/0xc0
>> async_run_entry_fn+0x34/0xe0
>> process_one_work+0x148/0x284
>> worker_thread+0x2c4/0x3e0
>> kthread+0x12c/0x210
>> ret_from_fork+0x10/0x20
>> INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
>> Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:irq/92-4a8c000. state:D stack:0 pid:79 tgid:79 ppid:2 task_flags:0x208040 flags:0x00000010
>> Call trace:
>> __switch_to+0xf0/0x1c0 (T)
>> __schedule+0x358/0x99c
>> schedule+0x34/0x11c
>> __synchronize_irq+0x90/0xcc
>> disable_irq+0x3c/0x4c
>> msm_pinmux_set_mux+0x3b4/0x45c
>> pinmux_enable_setting+0x1fc/0x2d8
>> pinctrl_commit_state+0xa0/0x260
>> pinctrl_pm_select_default_state+0x4c/0xa0
>> geni_se_resources_on+0xe8/0x154
>> geni_serial_resource_state+0x8c/0xbc
>> qcom_geni_serial_runtime_resume+0x3c/0x88
>> pm_generic_runtime_resume+0x2c/0x44
>> __rpm_callback+0x48/0x1e0
>> rpm_callback+0x74/0x80
>> rpm_resume+0x3bc/0x6a0
>> __pm_runtime_resume+0x50/0x9c
>> handle_threaded_wake_irq+0x30/0x80
>> irq_thread_fn+0x2c/0xb0
>> irq_thread+0x170/0x334
>> kthread+0x12c/0x210
>> ret_from_fork+0x10/0x20
>
> I can see call stack is mostly similar for yours and mine but not
> completely at initial calls.
>
> Yours dump:
> > qcom_geni_serial_runtime_resume+0x3c/0x88
> > pm_generic_runtime_resume+0x2c/0x44
> > __rpm_callback+0x48/0x1e0
> > rpm_callback+0x74/0x80
> > rpm_resume+0x3bc/0x6a0
> > __pm_runtime_resume+0x50/0x9c
> > handle_threaded_wake_irq+0x30/0x80
>
> Mine:
> >>> qcom_geni_serial_runtime_resume+0x4c/0x88
> >>> pm_generic_runtime_resume+0x2c/0x44
> >>> __genpd_runtime_resume+0x30/0x80
> >>> genpd_runtime_resume+0x114/0x29c
> >>> __rpm_callback+0x48/0x1d8
> >>> rpm_callback+0x6c/0x78
> >>> rpm_resume+0x530/0x750
>
>
> Can you please share what is DT file for this Board if possible?
> is there any usecase enabled on this SE instance?
Well, yeah, sorry, I didn't really compare the backtraces line by line, and
the behaviour was exactly the same. I thought that the purpose was to fix
the regression reported earlier.
RB1 main dts files are qrb2210-rb1.dts and qcm2290.dtsi.
The similar board RB2 uses qrb4210-rb2.dts and sm4250.dtsi+sm6115.dtsi,
it is worth checking it as well.
For testing here I didn't use anything extra (the only change was the wifi
fix from Loic); I usually test -master and linux-next.
If you can tell me what the SE instance is, I may be able to answer. But
as far as I know it is not part of any infrastructure or CI machinery.
I just boot the board and see if it works; if it does, then I rebuild and
test my changes (audio).
Best regards,
Alexey
On 11/09/25 10:00:27, Alexey Klimov wrote:
> Hi Praveen,
>
> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
> > Hi Alexy,
> >
> > Thank you for update.
> >
> > On 9/10/2025 1:35 AM, Alexey Klimov wrote:
> >>
> >> (adding Krzysztof to c/c)
> >>
> >> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
> >>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
> >>>> A deadlock is observed in the qcom_geni_serial driver during runtime
> >>>> resume. This occurs when the pinctrl subsystem reconfigures device pins
> >>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
> >>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
> >>>> __synchronize_irq(), conflicting with the active wakeup state and
> >>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
> >>>> leading to system instability.
> >>>>
> >>>> The critical call trace leading to the deadlock is:
> >>>>
> >>>> Call trace:
> >>>> __switch_to+0xe0/0x120
> >>>> __schedule+0x39c/0x978
> >>>> schedule+0x5c/0xf8
> >>>> __synchronize_irq+0x88/0xb4
> >>>> disable_irq+0x3c/0x4c
> >>>> msm_pinmux_set_mux+0x508/0x644
> >>>> pinmux_enable_setting+0x190/0x2dc
> >>>> pinctrl_commit_state+0x13c/0x208
> >>>> pinctrl_pm_select_default_state+0x4c/0xa4
> >>>> geni_se_resources_on+0xe8/0x154
> >>>> qcom_geni_serial_runtime_resume+0x4c/0x88
> >>>> pm_generic_runtime_resume+0x2c/0x44
> >>>> __genpd_runtime_resume+0x30/0x80
> >>>> genpd_runtime_resume+0x114/0x29c
> >>>> __rpm_callback+0x48/0x1d8
> >>>> rpm_callback+0x6c/0x78
> >>>> rpm_resume+0x530/0x750
> >>>> __pm_runtime_resume+0x50/0x94
> >>>> handle_threaded_wake_irq+0x30/0x94
> >>>> irq_thread_fn+0x2c/0xa8
> >>>> irq_thread+0x160/0x248
> >>>> kthread+0x110/0x114
> >>>> ret_from_fork+0x10/0x20
> >>>>
> >>>> To resolve this, explicitly manage the wakeup IRQ state within the
> >>>> runtime suspend/resume callbacks. In the runtime resume callback, call
> >>>> disable_irq_wake() before enabling resources. This preemptively
> >>>> removes the "wakeup" capability from the IRQ, allowing subsequent
> >>>> interrupt management calls to proceed without conflict. An error path
> >>>> re-enables the wakeup IRQ if resource enablement fails.
> >>>>
> >>>> Conversely, in runtime suspend, call enable_irq_wake() after resources
> >>>> are disabled. This ensures the interrupt is configured as a wakeup
> >>>> source only once the device has fully entered its low-power state. An
> >>>> error path handles disabling the wakeup IRQ if the suspend operation
> >>>> fails.
> >>>>
> >>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
> >>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
> >>>
> >>> You forgot:
> >>>
> >>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
> >>>
> >>> Also, not sure where this change will go, via Greg or Jiri, but ideally
> >>> this should be picked for current -rc cycle since regression is
> >>> introduced during latest merge window.
> >>>
> >>> I also would like to test it on qrb2210 rb1 where this regression is
> >>> reproducible.
> >>
> >> It doesn't seem that it fixes the regression on RB1 board:
> >>
> >> INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
> >> Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
> >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> task:kworker/u16:3 state:D stack:0 pid:50 tgid:50 ppid:2 task_flags:0x4208060 flags:0x00000010
> >> Workqueue: async async_run_entry_fn
> >> Call trace:
> >> __switch_to+0xf0/0x1c0 (T)
> >> __schedule+0x358/0x99c
> >> schedule+0x34/0x11c
> >> rpm_resume+0x17c/0x6a0
> >> rpm_resume+0x2c4/0x6a0
> >> rpm_resume+0x2c4/0x6a0
> >> rpm_resume+0x2c4/0x6a0
> >> __pm_runtime_resume+0x50/0x9c
> >> __driver_probe_device+0x58/0x120
> >> driver_probe_device+0x3c/0x154
> >> __driver_attach_async_helper+0x4c/0xc0
> >> async_run_entry_fn+0x34/0xe0
> >> process_one_work+0x148/0x284
> >> worker_thread+0x2c4/0x3e0
> >> kthread+0x12c/0x210
> >> ret_from_fork+0x10/0x20
> >> INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
> >> Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
> >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> task:irq/92-4a8c000. state:D stack:0 pid:79 tgid:79 ppid:2 task_flags:0x208040 flags:0x00000010
> >> Call trace:
> >> __switch_to+0xf0/0x1c0 (T)
> >> __schedule+0x358/0x99c
> >> schedule+0x34/0x11c
> >> __synchronize_irq+0x90/0xcc
> >> disable_irq+0x3c/0x4c
> >> msm_pinmux_set_mux+0x3b4/0x45c
> >> pinmux_enable_setting+0x1fc/0x2d8
> >> pinctrl_commit_state+0xa0/0x260
> >> pinctrl_pm_select_default_state+0x4c/0xa0
> >> geni_se_resources_on+0xe8/0x154
> >> geni_serial_resource_state+0x8c/0xbc
> >> qcom_geni_serial_runtime_resume+0x3c/0x88
> >> pm_generic_runtime_resume+0x2c/0x44
> >> __rpm_callback+0x48/0x1e0
> >> rpm_callback+0x74/0x80
> >> rpm_resume+0x3bc/0x6a0
> >> __pm_runtime_resume+0x50/0x9c
> >> handle_threaded_wake_irq+0x30/0x80
> >> irq_thread_fn+0x2c/0xb0
> >> irq_thread+0x170/0x334
> >> kthread+0x12c/0x210
> >> ret_from_fork+0x10/0x20
> >
> > I can see call stack is mostly similar for yours and mine but not
> > completely at initial calls.
> >
> > Yours dump:
> > > qcom_geni_serial_runtime_resume+0x3c/0x88
> > > pm_generic_runtime_resume+0x2c/0x44
> > > __rpm_callback+0x48/0x1e0
> > > rpm_callback+0x74/0x80
> > > rpm_resume+0x3bc/0x6a0
> > > __pm_runtime_resume+0x50/0x9c
> > > handle_threaded_wake_irq+0x30/0x80
> >
> > Mine:
> > >>> qcom_geni_serial_runtime_resume+0x4c/0x88
> > >>> pm_generic_runtime_resume+0x2c/0x44
> > >>> __genpd_runtime_resume+0x30/0x80
> > >>> genpd_runtime_resume+0x114/0x29c
> > >>> __rpm_callback+0x48/0x1d8
> > >>> rpm_callback+0x6c/0x78
> > >>> rpm_resume+0x530/0x750
> >
> >
> > Can you please share what is DT file for this Board if possible?
> > is there any usecase enabled on this SE instance?
>
> Well, yeah, sorry, I didn't really compared backtraces line to line and
> behaviour was exactly the same. I thought that the purpose was to fix
> the regression reported earlier.
>
> RB1 main dts files are qrb2210-rb1.dts and qcm2290.dtsi.
>
> The similar board RB2 uses qrb4210-rb2.dts and sm4250.dtsi+sm6115.dtsi,
> it is worth checking it as well.
> For testing here I didn't use anything extra (the only change was wifi fix
> from Loic); I tested -master and linux-next usually.
>
> If you can tell me what is SE instance I may be able to answer. But
> as far as I know it is not a part of any infrastructure or CI machinery.
> I just boot the board and see if it works, if it does then I rebuild and
> test my changes (audio).
>
> Best regards,
> Alexey
>
Will there be a fix any time soon, Praveen? Reverting "serial: qcom-geni:
Enable PM runtime for serial driver" does fix the problem on RB1.
Otherwise I suggest that we revert this commit on linux-next.
Hi Alexey,
Really appreciate you waiting!
On 9/11/2025 2:30 PM, Alexey Klimov wrote:
> Hi Praveen,
>
> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>> Hi Alexy,
>>
>> Thank you for update.
>>
>> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>>
>>> (adding Krzysztof to c/c)
>>>
>>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>>> A deadlock is observed in the qcom_geni_serial driver during runtime
>>>>> resume. This occurs when the pinctrl subsystem reconfigures device pins
>>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>>> leading to system instability.
>>>>>
>>>>> The critical call trace leading to the deadlock is:
>>>>>
>>>>> Call trace:
>>>>> __switch_to+0xe0/0x120
>>>>> __schedule+0x39c/0x978
>>>>> schedule+0x5c/0xf8
>>>>> __synchronize_irq+0x88/0xb4
>>>>> disable_irq+0x3c/0x4c
>>>>> msm_pinmux_set_mux+0x508/0x644
>>>>> pinmux_enable_setting+0x190/0x2dc
>>>>> pinctrl_commit_state+0x13c/0x208
>>>>> pinctrl_pm_select_default_state+0x4c/0xa4
>>>>> geni_se_resources_on+0xe8/0x154
>>>>> qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>> pm_generic_runtime_resume+0x2c/0x44
>>>>> __genpd_runtime_resume+0x30/0x80
>>>>> genpd_runtime_resume+0x114/0x29c
>>>>> __rpm_callback+0x48/0x1d8
>>>>> rpm_callback+0x6c/0x78
>>>>> rpm_resume+0x530/0x750
>>>>> __pm_runtime_resume+0x50/0x94
>>>>> handle_threaded_wake_irq+0x30/0x94
>>>>> irq_thread_fn+0x2c/0xa8
>>>>> irq_thread+0x160/0x248
>>>>> kthread+0x110/0x114
>>>>> ret_from_fork+0x10/0x20
>>>>>
>>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>>> runtime suspend/resume callbacks. In the runtime resume callback, call
>>>>> disable_irq_wake() before enabling resources. This preemptively
>>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>>> interrupt management calls to proceed without conflict. An error path
>>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>>
>>>>> Conversely, in runtime suspend, call enable_irq_wake() after resources
>>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>>> source only once the device has fully entered its low-power state. An
>>>>> error path handles disabling the wakeup IRQ if the suspend operation
>>>>> fails.
>>>>>
>>>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
>>>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>>>
>>>> You forgot:
>>>>
>>>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>>>
>>>> Also, not sure where this change will go, via Greg or Jiri, but ideally
>>>> this should be picked for current -rc cycle since regression is
>>>> introduced during latest merge window.
>>>>
>>>> I also would like to test it on qrb2210 rb1 where this regression is
>>>> reproduciable.
>>>
>>> It doesn't seem that it fixes the regression on RB1 board:
>>>
>>> INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
>>> Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> task:kworker/u16:3 state:D stack:0 pid:50 tgid:50 ppid:2 task_flags:0x4208060 flags:0x00000010
>>> Workqueue: async async_run_entry_fn
>>> Call trace:
>>> __switch_to+0xf0/0x1c0 (T)
>>> __schedule+0x358/0x99c
>>> schedule+0x34/0x11c
>>> rpm_resume+0x17c/0x6a0
>>> rpm_resume+0x2c4/0x6a0
>>> rpm_resume+0x2c4/0x6a0
>>> rpm_resume+0x2c4/0x6a0
>>> __pm_runtime_resume+0x50/0x9c
>>> __driver_probe_device+0x58/0x120
>>> driver_probe_device+0x3c/0x154
>>> __driver_attach_async_helper+0x4c/0xc0
>>> async_run_entry_fn+0x34/0xe0
>>> process_one_work+0x148/0x284
>>> worker_thread+0x2c4/0x3e0
>>> kthread+0x12c/0x210
>>> ret_from_fork+0x10/0x20
>>> INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
>>> Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> task:irq/92-4a8c000. state:D stack:0 pid:79 tgid:79 ppid:2 task_flags:0x208040 flags:0x00000010
>>> Call trace:
>>> __switch_to+0xf0/0x1c0 (T)
>>> __schedule+0x358/0x99c
>>> schedule+0x34/0x11c
>>> __synchronize_irq+0x90/0xcc
>>> disable_irq+0x3c/0x4c
>>> msm_pinmux_set_mux+0x3b4/0x45c
>>> pinmux_enable_setting+0x1fc/0x2d8
>>> pinctrl_commit_state+0xa0/0x260
>>> pinctrl_pm_select_default_state+0x4c/0xa0
>>> geni_se_resources_on+0xe8/0x154
>>> geni_serial_resource_state+0x8c/0xbc
>>> qcom_geni_serial_runtime_resume+0x3c/0x88
>>> pm_generic_runtime_resume+0x2c/0x44
>>> __rpm_callback+0x48/0x1e0
>>> rpm_callback+0x74/0x80
>>> rpm_resume+0x3bc/0x6a0
>>> __pm_runtime_resume+0x50/0x9c
>>> handle_threaded_wake_irq+0x30/0x80
>>> irq_thread_fn+0x2c/0xb0
>>> irq_thread+0x170/0x334
>>> kthread+0x12c/0x210
>>> ret_from_fork+0x10/0x20
>>
>> I can see call stack is mostly similar for yours and mine but not
>> completely at initial calls.
>>
>> Yours dump:
>> > qcom_geni_serial_runtime_resume+0x3c/0x88
>> > pm_generic_runtime_resume+0x2c/0x44
>> > __rpm_callback+0x48/0x1e0
>> > rpm_callback+0x74/0x80
>> > rpm_resume+0x3bc/0x6a0
>> > __pm_runtime_resume+0x50/0x9c
>> > handle_threaded_wake_irq+0x30/0x80
>>
>> Mine:
>> >>> qcom_geni_serial_runtime_resume+0x4c/0x88
>> >>> pm_generic_runtime_resume+0x2c/0x44
>> >>> __genpd_runtime_resume+0x30/0x80
>> >>> genpd_runtime_resume+0x114/0x29c
>> >>> __rpm_callback+0x48/0x1d8
>> >>> rpm_callback+0x6c/0x78
>> >>> rpm_resume+0x530/0x750
>>
>>
>> Can you please share what is DT file for this Board if possible?
>> is there any usecase enabled on this SE instance?
>
> Well, yeah, sorry, I didn't really compared backtraces line to line and
> behaviour was exactly the same. I thought that the purpose was to fix
> the regression reported earlier.
>
> RB1 main dts files are qrb2210-rb1.dts and qcm2290.dtsi.
>
> The similar board RB2 uses qrb4210-rb2.dts and sm4250.dtsi+sm6115.dtsi,
> it is worth checking it as well.
> For testing here I didn't use anything extra (the only change was wifi fix
> from Loic); I tested -master and linux-next usually.
>
> If you can tell me what is SE instance I may be able to answer. But
> as far as I know it is not a part of any infrastructure or CI machinery.
> I just boot the board and see if it works, if it does then I rebuild and
> test my changes (audio).

I'm actively working on this and experimenting with various wakeup
scenarios. I'll share the updated patch as soon as possible.

Should we include the fix in V2, or post it as a new patch (V1), if the
fix originates from a different subsystem (pinctrl)?

Thanks,
Praveen Talari
>
> Best regards,
> Alexey
>
(removing <quic_mnaresh@quicinc.com> from c/c -- too many mails not delivered)

Hi Praveen,

On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
> Hi Alexey,
>
> Really appreciate you waiting!
>
> On 9/11/2025 2:30 PM, Alexey Klimov wrote:
>> Hi Praveen,
>>
>> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>>> Hi Alexy,
>>>
>>> Thank you for update.
>>>
>>> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>>>
>>>> (adding Krzysztof to c/c)
>>>>
>>>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>
> I'm actively working on this and experimenting various scenarios with
> wakeup. I’ll share the updated patch as soon as possible.
>
> Should we include fix in V2 or new version(V1) if the fix originates
> from a different subsystem(pinctrol)?

Wait, I am a bit lost. Are there two regressions? And does this patch
target only one of them?

Are there two fixes now for different problems?

If they are not related (independent) then I'd split them, but it is not
something exceptional -- just standard rules should apply.

Thanks,
Alexey

Hi Alexey,
On 9/15/2025 3:09 PM, Alexey Klimov wrote:
> (removing <quic_mnaresh@quicinc.com> from c/c -- too many mail not delivered)
>
> Hi Praveen,
>
> On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
>> Hi Alexey,
>>
>> Really appreciate you waiting!
>>
>> I'm actively working on this and experimenting various scenarios with
>> wakeup. I’ll share the updated patch as soon as possible.
>>
>> Should we include fix in V2 or new version(V1) if the fix originates
>> from a different subsystem(pinctrol)?
>
> Wait, I am a bit lost. Are there two regresssions? And is this patch only
> targets one of the them?
I simulated it on a different target (SC7280), and it is the same issue.
> Are there two fixes now for different problems?
The problem is the same.
> If they are not related (independent) then I'd split it but it not something
> exceptional -- just standard rules should apply.
I am fixing this issue in the pinctrl subsystem.

Please guide me on this.
Should we include the fix in V2, or post it as a new patch (V1), if the
fix originates from a different subsystem (pinctrl)?

Thanks,
Praveen
>
> Thanks,
> Alexey
Hi Alexey
Thank you for your support.
On 9/15/2025 7:55 PM, Praveen Talari wrote:
> Hi Alexey,
>
> On 9/15/2025 3:09 PM, Alexey Klimov wrote:
>> (removing <quic_mnaresh@quicinc.com> from c/c -- too many mail not
>> delivered)
>>
>> Hi Praveen,
>>
>> On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
>>> Hi Alexey,
>>>
>>> Really appreciate you waiting!
>>>
>>> On 9/11/2025 2:30 PM, Alexey Klimov wrote:
>>>> Hi Praveen,
>>>>
>>>> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>>>>> Hi Alexy,
>>>>>
>>>>> Thank you for update.
>>>>>
>>>>> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>>>>>
>>>>>> (adding Krzysztof to c/c)
>>>>>>
>>>>>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>>>>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>>>>>
>>>>>>> You forgot:
>>>>>>>
>>>>>>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>>>>>>
>>>>>>> Also, not sure where this change will go, via Greg or Jiri, but
>>>>>>> ideally
>>>>>>> this should be picked for current -rc cycle since regression is
>>>>>>> introduced during latest merge window.
>>>>>>>
>>>>>>> I also would like to test it on qrb2210 rb1 where this regression is
>>>>>>> reproduciable.
Since I don't have this board, could you kindly validate the new change
and run a quick test on your end?
diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c b/drivers/pinctrl/qcom/pinctrl-msm.c
index 83eb075b6bfa..3d6601dc6fcc 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.c
+++ b/drivers/pinctrl/qcom/pinctrl-msm.c
@@ -215,7 +215,7 @@ static int msm_pinmux_set_mux(struct pinctrl_dev *pctldev,
 	 */
 	if (d && i != gpio_func &&
 	    !test_and_set_bit(d->hwirq, pctrl->disabled_for_mux))
-		disable_irq(irq);
+		disable_irq_nosync(irq);
 
 	raw_spin_lock_irqsave(&pctrl->lock, flags);
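
For context on why this one-line change can matter, here is a minimal
user-space sketch (a toy model, not kernel code) of the difference between
the two calls. It assumes, going by the reported trace, that the IRQ thread
ends up calling disable_irq() on the line it is itself servicing.
disable_irq() masks the line and then waits for in-flight handlers, which in
that case means waiting for its own caller; disable_irq_nosync() only masks
the line and returns. All toy_* names and the struct layout are hypothetical
and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one interrupt line; NOT the kernel's struct irq_desc. */
struct toy_irq {
	int depth;          /* nested disable count */
	int threads_active; /* threaded handlers currently running */
};

/* Models disable_irq_nosync(): mask the line and return immediately. */
static void toy_disable_irq_nosync(struct toy_irq *irq)
{
	irq->depth++;
}

/*
 * Models disable_irq(): mask the line, then wait for in-flight handlers.
 * Instead of really blocking, return false in the case where the wait
 * could never finish: the caller is itself the in-flight handler.
 */
static bool toy_disable_irq(struct toy_irq *irq, bool called_from_handler)
{
	toy_disable_irq_nosync(irq);
	if (called_from_handler && irq->threads_active > 0)
		return false; /* would wait for itself forever */
	return true;
}

/*
 * Models the wake IRQ thread running a resume path that remuxes the pin
 * backing its own interrupt. Returns true if the handler can complete.
 */
static bool toy_resume_from_wake_handler(struct toy_irq *irq, bool use_nosync)
{
	bool completed = true;

	irq->threads_active++; /* handler thread starts running */
	if (use_nosync)
		toy_disable_irq_nosync(irq);
	else
		completed = toy_disable_irq(irq, true);
	if (completed)
		irq->threads_active--; /* handler returns normally */
	return completed;
}
```

In this model, toy_resume_from_wake_handler() with use_nosync set to false
reports the stuck case and leaves the handler counted as active, while the
nosync variant completes with the line masked once. Whether skipping the
synchronization is safe for every caller of msm_pinmux_set_mux() is a
separate question that this sketch does not answer.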
Thanks,
Praveen Talari
On 16/09/25 12:20:25, Praveen Talari wrote:
> Hi Alexey
>
> Thank you for your support.
>
> On 9/15/2025 7:55 PM, Praveen Talari wrote:
> > Hi Alexey,
> >
> > On 9/15/2025 3:09 PM, Alexey Klimov wrote:
> > > (removing <quic_mnaresh@quicinc.com> from c/c -- too many mail not
> > > delivered)
> > >
> > > Hi Praveen,
> > >
> > > On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
> > > > Hi Alexey,
> > > >
> > > > Really appreciate you waiting!
> > > >
> > > > On 9/11/2025 2:30 PM, Alexey Klimov wrote:
> > > > > Hi Praveen,
> > > > >
> > > > > On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
> > > > > > Hi Alexy,
> > > > > >
> > > > > > Thank you for update.
> > > > > >
> > > > > > On 9/10/2025 1:35 AM, Alexey Klimov wrote:
> > > > > > >
> > > > > > > (adding Krzysztof to c/c)
> > > > > > >
> > > > > > > On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
> > > > > > > > On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
> > > > > > > > > A deadlock is observed in the
> > > > > > > > > qcom_geni_serial driver during runtime
> > > > > > > > > resume. This occurs when the pinctrl
> > > > > > > > > subsystem reconfigures device pins
> > > > > > > > > via msm_pinmux_set_mux() while the serial device's interrupt is an
> > > > > > > > > active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
> > > > > > > > > __synchronize_irq(), conflicting with the active wakeup state and
> > > > > > > > > causing the IRQ thread to enter an uninterruptible (D-state) sleep,
> > > > > > > > > leading to system instability.
> > > > > > > > >
> > > > > > > > > The critical call trace leading to the deadlock is:
> > > > > > > > >
> > > > > > > > > Call trace:
> > > > > > > > > __switch_to+0xe0/0x120
> > > > > > > > > __schedule+0x39c/0x978
> > > > > > > > > schedule+0x5c/0xf8
> > > > > > > > > __synchronize_irq+0x88/0xb4
> > > > > > > > > disable_irq+0x3c/0x4c
> > > > > > > > > msm_pinmux_set_mux+0x508/0x644
> > > > > > > > > pinmux_enable_setting+0x190/0x2dc
> > > > > > > > > pinctrl_commit_state+0x13c/0x208
> > > > > > > > > pinctrl_pm_select_default_state+0x4c/0xa4
> > > > > > > > > geni_se_resources_on+0xe8/0x154
> > > > > > > > > qcom_geni_serial_runtime_resume+0x4c/0x88
> > > > > > > > > pm_generic_runtime_resume+0x2c/0x44
> > > > > > > > > __genpd_runtime_resume+0x30/0x80
> > > > > > > > > genpd_runtime_resume+0x114/0x29c
> > > > > > > > > __rpm_callback+0x48/0x1d8
> > > > > > > > > rpm_callback+0x6c/0x78
> > > > > > > > > rpm_resume+0x530/0x750
> > > > > > > > > __pm_runtime_resume+0x50/0x94
> > > > > > > > > handle_threaded_wake_irq+0x30/0x94
> > > > > > > > > irq_thread_fn+0x2c/xa8
> > > > > > > > > irq_thread+0x160/x248
> > > > > > > > > kthread+0x110/x114
> > > > > > > > > ret_from_fork+0x10/x20
> > > > > > > > >
> > > > > > > > > To resolve this, explicitly manage the wakeup IRQ state within the
> > > > > > > > > runtime suspend/resume callbacks. In the
> > > > > > > > > runtime resume callback, call
> > > > > > > > > disable_irq_wake() before enabling resources. This preemptively
> > > > > > > > > removes the "wakeup" capability from the IRQ, allowing subsequent
> > > > > > > > > interrupt management calls to proceed
> > > > > > > > > without conflict. An error path
> > > > > > > > > re-enables the wakeup IRQ if resource enablement fails.
> > > > > > > > >
> > > > > > > > > Conversely, in runtime suspend, call
> > > > > > > > > enable_irq_wake() after resources
> > > > > > > > > are disabled. This ensures the interrupt is configured as a wakeup
> > > > > > > > > source only once the device has fully
> > > > > > > > > entered its low-power state. An
> > > > > > > > > error path handles disabling the wakeup IRQ
> > > > > > > > > if the suspend operation
> > > > > > > > > fails.
> > > > > > > > >
> > > > > > > > > Fixes: 1afa70632c39 ("serial: qcom-geni:
> > > > > > > > > Enable PM runtime for serial driver")
> > > > > > > > > Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
> > > > > > > >
> > > > > > > > You forgot:
> > > > > > > >
> > > > > > > > Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
> > > > > > > >
> > > > > > > > Also, not sure where this change will go, via
> > > > > > > > Greg or Jiri, but ideally
> > > > > > > > this should be picked for current -rc cycle since regression is
> > > > > > > > introduced during latest merge window.
> > > > > > > >
> > > > > > > > I also would like to test it on qrb2210 rb1 where this regression is
> > > > > > > > reproduciable.
>
> Since I don't have this board, could you kindly validate the new change and
> run a quick test on your end?
>
> diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c
> b/drivers/pinctrl/qcom/pinctrl-msm.c
> index 83eb075b6bfa..3d6601dc6fcc 100644
> --- a/drivers/pinctrl/qcom/pinctrl-msm.c
> +++ b/drivers/pinctrl/qcom/pinctrl-msm.c
> @@ -215,7 +215,7 @@ static int msm_pinmux_set_mux(struct pinctrl_dev
> *pctldev,
> */
> if (d && i != gpio_func &&
> !test_and_set_bit(d->hwirq, pctrl->disabled_for_mux))
> - disable_irq(irq);
> + disable_irq_nosync(irq);
>
> raw_spin_lock_irqsave(&pctrl->lock, flags);
Sorry Praveen, I didn't see this proposal. Testing on my end as well.
On 16/09/25 16:39:00, Jorge Ramirez wrote:
> sorry Praveen, didnt see this proposal. testing on my end as well.
>
Just tested on my end and all modules load. It deadlocked before this
update, so there is progress (now we can load the network driver).

However, I can still see irq/92 (threaded) stuck in D-state inside runtime PM:

root@qrb2210-rb1-core-kit:~# echo w > /proc/sysrq-trigger
[ 498.247349] sysrq: Show Blocked State
[ 498.251190] task:irq/92-4a8c000. state:D stack:0 pid:80 tgid:80 ppid:2 task_flags:0x208040 flags:0x00000010
[ 498.262334] Call trace:
[ 498.264812] __switch_to+0xf0/0x1c0 (T)
[ 498.268777] __schedule+0x110/0x9bc

with IRQ 92 being:

92: 199870 0 0 0 msmgpio 11 Level 4a8c000.serial:wakeup

This log changes over time, but it is always irq/92:

root@qrb2210-rb1-core-kit:~# echo w > /proc/sysrq-trigger
[ 613.019101] sysrq: Show Blocked State
[ 613.023055] task:irq/92-4a8c000. state:D stack:0 pid:80 tgid:80 ppid:2 task_flags:0x208040 flags:0x00000010
[ 613.034189] Call trace:
[ 613.036770] __switch_to+0xf0/0x1c0 (T)
[ 613.040779] __schedule+0x35c/0x9bc
[ 613.044412] schedule+0x34/0x110
[ 613.047782] rpm_resume+0x17c/0x690
[ 613.051359] __pm_runtime_resume+0x4c/0x98
[ 613.055556] handle_threaded_wake_irq+0x30/0x80
[ 613.060168] irq_thread_fn+0x28/0xa8
[ 613.063864] irq_thread+0x178/0x338
[ 613.067434] kthread+0x12c/0x210
[ 613.070735] ret_from_fork+0x10/0x20
root@qrb2210-rb1-core-kit:~#
root@qrb2210-rb1-core-kit:~# echo w > /proc/sysrq-trigger
[ 617.586960] sysrq: Show Blocked State
[ 617.590771] task:irq/92-4a8c000. state:D stack:0 pid:80 tgid:80 ppid:2 task_flags:0x208040 flags:0x00000010
[ 617.601906] Call trace:
[ 617.604442] __switch_to+0xf0/0x1c0 (T)
[ 617.608408] __schedule+0x35c/0x9bc
[ 617.612074] 0x766c7362
root@qrb2210-rb1-core-kit:~#
root@qrb2210-rb1-core-kit:~#
root@qrb2210-rb1-core-kit:~# echo w > /proc/sysrq-trigger
[ 619.656937] sysrq: Show Blocked State
[ 619.660847] task:irq/92-4a8c000. state:D stack:0 pid:80 tgid:80 ppid:2 task_flags:0x208040 flags:0x00000010
[ 619.672009] Call trace:
[ 619.674531] __switch_to+0xf0/0x1c0 (T)
[ 619.678508] __schedule+0x35c/0x9bc
[ 619.682102] schedule+0x34/0x110
[ 619.685488] schedule_timeout+0x80/0x104
root@qrb2210-rb1-core-kit:~#
root@qrb2210-rb1-core-kit:~#
root@qrb2210-rb1-core-kit:~# echo w > /proc/sysrq-trigger
[ 624.786811] sysrq: Show Blocked State
root@qrb2210-rb1-core-kit:~#
root@qrb2210-rb1-core-kit:~#
root@qrb2210-rb1-core-kit:~# echo w > /proc/sysrq-trigger
[ 630.546744] sysrq: Show Blocked State
[ 630.550593] task:irq/92-4a8c000. state:D stack:0 pid:80 tgid:80 ppid:2 task_flags:0x208040 flags:0x00000010
[ 630.561724] Call trace:
[ 630.564219] __switch_to+0xf0/0x1c0 (T)
[ 630.568138] __schedule+0x35c/0x9bc
[ 630.571729] 0x766c7362
root@qrb2210-rb1-core-kit:~#
Hi Praveen,
On Tue Sep 16, 2025 at 4:07 PM BST, Jorge Ramirez wrote:
> just tested on my end and all modules load - deadlocked before this
> update so there is progress (now we can load the network driver)
Is it supposed to be the original patch here plus disable_irq_nosync()?
Meaning the changes for qcom_geni_serial_runtime_{suspend,resume}
+ disable_irq_nosync() in msm_pinmux_set_mux()?

It seems to work here, but let me do a few more runs.

Best regards,
Alexey
Hi Alexey,
On 9/16/2025 10:42 PM, Alexey Klimov wrote:
> Is it supposed to be orginal patch here plus disable_irq_nosync()?
Only this disable_irq_nosync() change from the pinctrl subsystem.
Thanks,
Praveen Talari
> Meaning changes for qcom_geni_serial_runtime_{suspend,resume}
> + disable_irq_nosync() in msm_pinmux_set_mux()?
No, only disable_irq_nosync() in msm_pinmux_set_mux().
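
For reference, the difference between the two calls can be sketched with a
small userspace model. This is not kernel code: struct irq_model and the
model_* helpers are made-up names for illustration only. The assumption,
based on the call trace in the original report, is that disable_irq() is
reached from the threaded handler of the very interrupt it is disabling.
disable_irq() waits for running handlers of that interrupt to finish, so in
that situation it ends up waiting on its own caller, while
disable_irq_nosync() only marks the interrupt disabled and returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-IRQ state; not the kernel's struct irq_desc. */
struct irq_model {
	bool disabled;       /* interrupt is marked disabled */
	int threads_active;  /* threaded handlers currently running for this IRQ */
};

/* Models disable_irq_nosync(): mark the IRQ disabled and return at once. */
static void model_disable_irq_nosync(struct irq_model *irq)
{
	assert(irq != NULL);
	irq->disabled = true;
}

/*
 * Models disable_irq(): disable, then wait until no handler is running.
 * Instead of sleeping, this reports whether the wait could ever finish.
 * If the caller is itself one of the running handler threads, the count
 * cannot drop to zero while it waits, so the call would block forever.
 */
static bool model_disable_irq_would_block_forever(struct irq_model *irq,
						  bool caller_is_handler_thread)
{
	model_disable_irq_nosync(irq);
	return caller_is_handler_thread && irq->threads_active > 0;
}
```

In this model the reported hang corresponds to calling the second helper with
caller_is_handler_thread set to true while threads_active is non-zero; the
nosync helper has no waiting step at all.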
On 17/09/2025 19:12, Alexey Klimov wrote:
> Is it supposed to be orginal patch here plus disable_irq_nosync()?
> Meaning changes for qcom_geni_serial_runtime_{suspend,resume}
> + disable_irq_nosync() in msm_pinmux_set_mux()?
>
> It seems to work here but let me know few more runs.
So this bug, after 5 weeks, is still not fixed?!?
This should have been reverted a long time ago.
Best regards,
Krzysztof
On 17/09/2025 02:05, Krzysztof Kozlowski wrote:
> On 17/09/2025 19:12, Alexey Klimov wrote:
>> Hi Praveen,
>>
>> On Tue Sep 16, 2025 at 4:07 PM BST, Jorge Ramirez wrote:
>>> On 16/09/25 16:39:00, Jorge Ramirez wrote:
>>>> On 16/09/25 12:20:25, Praveen Talari wrote:
>>>>> Hi Alexey
>>>>>
>>>>> Thank you for your support.
>>>>>
>>>>> On 9/15/2025 7:55 PM, Praveen Talari wrote:
>>>>>> Hi Alexey,
>>>>>>
>>>>>> On 9/15/2025 3:09 PM, Alexey Klimov wrote:
>>>>>>> (removing <quic_mnaresh@quicinc.com> from c/c -- too many mail not
>>>>>>> delivered)
>>>>>>>
>>>>>>> Hi Praveen,
>>>>>>>
>>>>>>> On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
>>>>>>>> Hi Alexey,
>>>>>>>>
>>>>>>>> Really appreciate you waiting!
>>>>>>>>
>>>>>>>> On 9/11/2025 2:30 PM, Alexey Klimov wrote:
>>>>>>>>> Hi Praveen,
>>>>>>>>>
>>>>>>>>> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>>>>>>>>>> Hi Alexy,
>>>>>>>>>>
>>>>>>>>>> Thank you for update.
>>>>>>>>>>
>>>>>>>>>> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>>>>>>>>>>
>>>>>>>>>>> (adding Krzysztof to c/c)
>>>>>>>>>>>
>>>>>>>>>>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>>>>>>>>>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>>>>>>>>>>> A deadlock is observed in the
>>>>>>>>>>>>> qcom_geni_serial driver during runtime
>>>>>>>>>>>>> resume. This occurs when the pinctrl
>>>>>>>>>>>>> subsystem reconfigures device pins
>>>>>>>>>>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>>>>>>>>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>>>>>>>>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>>>>>>>>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>>>>>>>>>>> leading to system instability.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The critical call trace leading to the deadlock is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Call trace:
>>>>>>>>>>>>> __switch_to+0xe0/0x120
>>>>>>>>>>>>> __schedule+0x39c/0x978
>>>>>>>>>>>>> schedule+0x5c/0xf8
>>>>>>>>>>>>> __synchronize_irq+0x88/0xb4
>>>>>>>>>>>>> disable_irq+0x3c/0x4c
>>>>>>>>>>>>> msm_pinmux_set_mux+0x508/0x644
>>>>>>>>>>>>> pinmux_enable_setting+0x190/0x2dc
>>>>>>>>>>>>> pinctrl_commit_state+0x13c/0x208
>>>>>>>>>>>>> pinctrl_pm_select_default_state+0x4c/0xa4
>>>>>>>>>>>>> geni_se_resources_on+0xe8/0x154
>>>>>>>>>>>>> qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>>>>>>>>>> pm_generic_runtime_resume+0x2c/0x44
>>>>>>>>>>>>> __genpd_runtime_resume+0x30/0x80
>>>>>>>>>>>>> genpd_runtime_resume+0x114/0x29c
>>>>>>>>>>>>> __rpm_callback+0x48/0x1d8
>>>>>>>>>>>>> rpm_callback+0x6c/0x78
>>>>>>>>>>>>> rpm_resume+0x530/0x750
>>>>>>>>>>>>> __pm_runtime_resume+0x50/0x94
>>>>>>>>>>>>> handle_threaded_wake_irq+0x30/0x94
>>>>>>>>>>>>> irq_thread_fn+0x2c/xa8
>>>>>>>>>>>>> irq_thread+0x160/x248
>>>>>>>>>>>>> kthread+0x110/x114
>>>>>>>>>>>>> ret_from_fork+0x10/x20
>>>>>>>>>>>>>
>>>>>>>>>>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>>>>>>>>>>> runtime suspend/resume callbacks. In the
>>>>>>>>>>>>> runtime resume callback, call
>>>>>>>>>>>>> disable_irq_wake() before enabling resources. This preemptively
>>>>>>>>>>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>>>>>>>>>>> interrupt management calls to proceed
>>>>>>>>>>>>> without conflict. An error path
>>>>>>>>>>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Conversely, in runtime suspend, call
>>>>>>>>>>>>> enable_irq_wake() after resources
>>>>>>>>>>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>>>>>>>>>>> source only once the device has fully
>>>>>>>>>>>>> entered its low-power state. An
>>>>>>>>>>>>> error path handles disabling the wakeup IRQ
>>>>>>>>>>>>> if the suspend operation
>>>>>>>>>>>>> fails.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fixes: 1afa70632c39 ("serial: qcom-geni:
>>>>>>>>>>>>> Enable PM runtime for serial driver")
>>>>>>>>>>>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>>>>>>>>>>>
>>>>>>>>>>>> You forgot:
>>>>>>>>>>>>
>>>>>>>>>>>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>>>>>>>>>>>
>>>>>>>>>>>> Also, not sure where this change will go, via
>>>>>>>>>>>> Greg or Jiri, but ideally
>>>>>>>>>>>> this should be picked for current -rc cycle since regression is
>>>>>>>>>>>> introduced during latest merge window.
>>>>>>>>>>>>
>>>>>>>>>>>> I also would like to test it on qrb2210 rb1 where this regression is
>>>>>>>>>>>> reproduciable.
>>>>>
>>>>> Since I don't have this board, could you kindly validate the new change and
>>>>> run a quick test on your end?
>>>>>
>>>>> diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>> b/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>> index 83eb075b6bfa..3d6601dc6fcc 100644
>>>>> --- a/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>> +++ b/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>> @@ -215,7 +215,7 @@ static int msm_pinmux_set_mux(struct pinctrl_dev
>>>>> *pctldev,
>>>>> */
>>>>> if (d && i != gpio_func &&
>>>>> !test_and_set_bit(d->hwirq, pctrl->disabled_for_mux))
>>>>> - disable_irq(irq);
>>>>> + disable_irq_nosync(irq);
>>>>>
>>>>> raw_spin_lock_irqsave(&pctrl->lock, flags);
>>>>
>>>>
>>>> sorry Praveen, didn't see this proposal. testing on my end as well.
>>>>
>>>
>>> just tested on my end and all modules load - deadlocked before this
>>> update so there is progress (now we can load the network driver)
>>
>> Is it supposed to be the original patch here plus disable_irq_nosync()?
>> Meaning changes for qcom_geni_serial_runtime_{suspend,resume}
>> + disable_irq_nosync() in msm_pinmux_set_mux()?
>>
>> It seems to work here but let me do a few more runs.
>
>
> So this bug, after 5 weeks is still not fixed?!?
>
> This is just and should be reverted long time ago.
I will send the revert, because this is just mocking the kernel process.
Best regards,
Krzysztof
On 9/17/2025 5:43 AM, Krzysztof Kozlowski wrote:
> On 17/09/2025 02:05, Krzysztof Kozlowski wrote:
>> On 17/09/2025 19:12, Alexey Klimov wrote:
>>> Hi Praveen,
>>>
>>> On Tue Sep 16, 2025 at 4:07 PM BST, Jorge Ramirez wrote:
>>>> On 16/09/25 16:39:00, Jorge Ramirez wrote:
>>>>> On 16/09/25 12:20:25, Praveen Talari wrote:
>>>>>> Hi Alexey
>>>>>>
>>>>>> Thank you for your support.
>>>>>>
>>>>>> On 9/15/2025 7:55 PM, Praveen Talari wrote:
>>>>>>> Hi Alexey,
>>>>>>>
>>>>>>> On 9/15/2025 3:09 PM, Alexey Klimov wrote:
>>>>>>>> (removing <quic_mnaresh@quicinc.com> from c/c -- too many mail not
>>>>>>>> delivered)
>>>>>>>>
>>>>>>>> Hi Praveen,
>>>>>>>>
>>>>>>>> On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
>>>>>>>>> Hi Alexey,
>>>>>>>>>
>>>>>>>>> Really appreciate you waiting!
>>>>>>>>>
>>>>>>>>> On 9/11/2025 2:30 PM, Alexey Klimov wrote:
>>>>>>>>>> Hi Praveen,
>>>>>>>>>>
>>>>>>>>>> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>>>>>>>>>>> Hi Alexey,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for update.
>>>>>>>>>>>
>>>>>>>>>>> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> (adding Krzysztof to c/c)
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>>>>>>>>>>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>>>>>>>>>>>> A deadlock is observed in the
>>>>>>>>>>>>>> qcom_geni_serial driver during runtime
>>>>>>>>>>>>>> resume. This occurs when the pinctrl
>>>>>>>>>>>>>> subsystem reconfigures device pins
>>>>>>>>>>>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>>>>>>>>>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>>>>>>>>>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>>>>>>>>>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>>>>>>>>>>>> leading to system instability.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The critical call trace leading to the deadlock is:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Call trace:
>>>>>>>>>>>>>> __switch_to+0xe0/0x120
>>>>>>>>>>>>>> __schedule+0x39c/0x978
>>>>>>>>>>>>>> schedule+0x5c/0xf8
>>>>>>>>>>>>>> __synchronize_irq+0x88/0xb4
>>>>>>>>>>>>>> disable_irq+0x3c/0x4c
>>>>>>>>>>>>>> msm_pinmux_set_mux+0x508/0x644
>>>>>>>>>>>>>> pinmux_enable_setting+0x190/0x2dc
>>>>>>>>>>>>>> pinctrl_commit_state+0x13c/0x208
>>>>>>>>>>>>>> pinctrl_pm_select_default_state+0x4c/0xa4
>>>>>>>>>>>>>> geni_se_resources_on+0xe8/0x154
>>>>>>>>>>>>>> qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>>>>>>>>>>> pm_generic_runtime_resume+0x2c/0x44
>>>>>>>>>>>>>> __genpd_runtime_resume+0x30/0x80
>>>>>>>>>>>>>> genpd_runtime_resume+0x114/0x29c
>>>>>>>>>>>>>> __rpm_callback+0x48/0x1d8
>>>>>>>>>>>>>> rpm_callback+0x6c/0x78
>>>>>>>>>>>>>> rpm_resume+0x530/0x750
>>>>>>>>>>>>>> __pm_runtime_resume+0x50/0x94
>>>>>>>>>>>>>> handle_threaded_wake_irq+0x30/0x94
>>>>>>>>>>>>>> irq_thread_fn+0x2c/xa8
>>>>>>>>>>>>>> irq_thread+0x160/x248
>>>>>>>>>>>>>> kthread+0x110/x114
>>>>>>>>>>>>>> ret_from_fork+0x10/x20
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>>>>>>>>>>>> runtime suspend/resume callbacks. In the
>>>>>>>>>>>>>> runtime resume callback, call
>>>>>>>>>>>>>> disable_irq_wake() before enabling resources. This preemptively
>>>>>>>>>>>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>>>>>>>>>>>> interrupt management calls to proceed
>>>>>>>>>>>>>> without conflict. An error path
>>>>>>>>>>>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Conversely, in runtime suspend, call
>>>>>>>>>>>>>> enable_irq_wake() after resources
>>>>>>>>>>>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>>>>>>>>>>>> source only once the device has fully
>>>>>>>>>>>>>> entered its low-power state. An
>>>>>>>>>>>>>> error path handles disabling the wakeup IRQ
>>>>>>>>>>>>>> if the suspend operation
>>>>>>>>>>>>>> fails.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fixes: 1afa70632c39 ("serial: qcom-geni:
>>>>>>>>>>>>>> Enable PM runtime for serial driver")
>>>>>>>>>>>>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> You forgot:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, not sure where this change will go, via
>>>>>>>>>>>>> Greg or Jiri, but ideally
>>>>>>>>>>>>> this should be picked for current -rc cycle since regression is
>>>>>>>>>>>>> introduced during latest merge window.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I also would like to test it on qrb2210 rb1 where this regression is
>>>>>>>>>>>>> reproducible.
>>>>>>
>>>>>> Since I don't have this board, could you kindly validate the new change and
>>>>>> run a quick test on your end?
>>>>>>
>>>>>> diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>>> b/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>>> index 83eb075b6bfa..3d6601dc6fcc 100644
>>>>>> --- a/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>>> +++ b/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>>> @@ -215,7 +215,7 @@ static int msm_pinmux_set_mux(struct pinctrl_dev
>>>>>> *pctldev,
>>>>>> */
>>>>>> if (d && i != gpio_func &&
>>>>>> !test_and_set_bit(d->hwirq, pctrl->disabled_for_mux))
>>>>>> - disable_irq(irq);
>>>>>> + disable_irq_nosync(irq);
>>>>>>
>>>>>> raw_spin_lock_irqsave(&pctrl->lock, flags);
>>>>>
>>>>>
>>>>> sorry Praveen, didn't see this proposal. testing on my end as well.
>>>>>
>>>>
>>>> just tested on my end and all modules load - deadlocked before this
>>>> update so there is progress (now we can load the network driver)
>>>
>>> Is it supposed to be the original patch here plus disable_irq_nosync()?
>>> Meaning changes for qcom_geni_serial_runtime_{suspend,resume}
>>> + disable_irq_nosync() in msm_pinmux_set_mux()?
>>>
>>> It seems to work here but let me do a few more runs.
>>
>>
>> So this bug, after 5 weeks is still not fixed?!?

I understand the concern. We didn't have access to the same board where
Alexey is seeing the issue, so we tried to reproduce it on a different
target by simulating wake-up IRQ scenarios.

From our analysis, the issue seems to be triggered by commit
1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
and shows up in the pinctrl subsystem.

A fix has already been submitted, and we're currently waiting for
Alexey's feedback to proceed.

>>
>> This is just and should be reverted long time ago.
>
> I will send the revert, because this is just mocking the kernel process.
>
> Best regards,
> Krzysztof