[PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume

Praveen Talari posted 1 patch 5 months ago
Posted by Praveen Talari 5 months ago
A deadlock is observed in the qcom_geni_serial driver during runtime
resume. This occurs when the pinctrl subsystem reconfigures device pins
via msm_pinmux_set_mux() while the serial device's interrupt is an
active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
__synchronize_irq(), conflicting with the active wakeup state and
causing the IRQ thread to enter an uninterruptible (D-state) sleep,
leading to system instability.

The critical call trace leading to the deadlock is:

    Call trace:
    __switch_to+0xe0/0x120
    __schedule+0x39c/0x978
    schedule+0x5c/0xf8
    __synchronize_irq+0x88/0xb4
    disable_irq+0x3c/0x4c
    msm_pinmux_set_mux+0x508/0x644
    pinmux_enable_setting+0x190/0x2dc
    pinctrl_commit_state+0x13c/0x208
    pinctrl_pm_select_default_state+0x4c/0xa4
    geni_se_resources_on+0xe8/0x154
    qcom_geni_serial_runtime_resume+0x4c/0x88
    pm_generic_runtime_resume+0x2c/0x44
    __genpd_runtime_resume+0x30/0x80
    genpd_runtime_resume+0x114/0x29c
    __rpm_callback+0x48/0x1d8
    rpm_callback+0x6c/0x78
    rpm_resume+0x530/0x750
    __pm_runtime_resume+0x50/0x94
    handle_threaded_wake_irq+0x30/0x94
    irq_thread_fn+0x2c/0xa8
    irq_thread+0x160/0x248
    kthread+0x110/0x114
    ret_from_fork+0x10/0x20

To resolve this, explicitly manage the wakeup IRQ state within the
runtime suspend/resume callbacks. In the runtime resume callback, call
disable_irq_wake() before enabling resources. This preemptively
removes the "wakeup" capability from the IRQ, allowing subsequent
interrupt management calls to proceed without conflict. An error path
re-enables the wakeup IRQ if resource enablement fails.

Conversely, in runtime suspend, call enable_irq_wake() after resources
are disabled. This ensures the interrupt is configured as a wakeup
source only once the device has fully entered its low-power state. An
error path handles disabling the wakeup IRQ if the suspend operation
fails.

Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
---
 drivers/tty/serial/qcom_geni_serial.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 0fdda3a1e70b..4f5ea28dfe8f 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -1926,8 +1926,17 @@ static int __maybe_unused qcom_geni_serial_runtime_suspend(struct device *dev)
 	struct uart_port *uport = &port->uport;
 	int ret = 0;
 
-	if (port->dev_data->power_state)
+	if (port->dev_data->power_state) {
 		ret = port->dev_data->power_state(uport, false);
+		if (ret) {
+			if (device_can_wakeup(dev))
+				disable_irq_wake(port->wakeup_irq);
+			return ret;
+		}
+	}
+
+	if (device_can_wakeup(dev))
+		enable_irq_wake(port->wakeup_irq);
 
 	return ret;
 }
@@ -1938,8 +1947,17 @@ static int __maybe_unused qcom_geni_serial_runtime_resume(struct device *dev)
 	struct uart_port *uport = &port->uport;
 	int ret = 0;
 
-	if (port->dev_data->power_state)
+	if (device_can_wakeup(dev))
+		disable_irq_wake(port->wakeup_irq);
+
+	if (port->dev_data->power_state) {
 		ret = port->dev_data->power_state(uport, true);
+		if (ret) {
+			if (device_can_wakeup(dev))
+				enable_irq_wake(port->wakeup_irq);
+			return ret;
+		}
+	}
 
 	return ret;
 }

base-commit: 3e8e5822146bc396d2a7e5fbb7be13271665522a
-- 
2.34.1
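
The ordering described in the commit message can be sketched in user space. This is only a rough model under stated assumptions: a plain counter stands in for the IRQ's wake-enable state, enable/disable calls are assumed to need balancing, and every `fake_` and `model_` name is a hypothetical stub, not a kernel API.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for driver state; none of these are kernel APIs. */
static int wake_depth;            /* models how often wake was enabled */
static bool resources_on = true;  /* models clocks/pinctrl being active */
static bool can_wakeup = true;    /* models device_can_wakeup(dev) */
static bool fail_next_power;      /* test hook: fail the next power change */

static void fake_enable_irq_wake(void)  { wake_depth++; }
static void fake_disable_irq_wake(void) { wake_depth--; }

/* Stub for the power_state() callback: 0 on success, -1 on failure. */
static int fake_power_state(bool on)
{
	if (fail_next_power) {
		fail_next_power = false;
		return -1;
	}
	resources_on = on;
	return 0;
}

/* Mirrors the patched runtime suspend: power off first, then arm wake. */
static int model_runtime_suspend(void)
{
	int ret = fake_power_state(false);

	if (ret) {
		if (can_wakeup)
			fake_disable_irq_wake();
		return ret;
	}
	if (can_wakeup)
		fake_enable_irq_wake();
	return 0;
}

/* Mirrors the patched runtime resume: disarm wake first, then power on. */
static int model_runtime_resume(void)
{
	int ret;

	if (can_wakeup)
		fake_disable_irq_wake();
	ret = fake_power_state(true);
	if (ret && can_wakeup)
		fake_enable_irq_wake();   /* roll back on failure */
	return ret;
}
```

Calling model_runtime_suspend() and then model_runtime_resume() returns the counter to its starting value, and a failed resume leaves wake armed with resources still off. The suspend failure branch is copied from the patch as posted; whether wake is already armed when it runs depends on driver state that this thread does not show.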
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Alexey Klimov 5 months ago
On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
> A deadlock is observed in the qcom_geni_serial driver during runtime
> resume. This occurs when the pinctrl subsystem reconfigures device pins
> via msm_pinmux_set_mux() while the serial device's interrupt is an
> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
> __synchronize_irq(), conflicting with the active wakeup state and
> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
> leading to system instability.
>
> The critical call trace leading to the deadlock is:
>
>     Call trace:
>     __switch_to+0xe0/0x120
>     __schedule+0x39c/0x978
>     schedule+0x5c/0xf8
>     __synchronize_irq+0x88/0xb4
>     disable_irq+0x3c/0x4c
>     msm_pinmux_set_mux+0x508/0x644
>     pinmux_enable_setting+0x190/0x2dc
>     pinctrl_commit_state+0x13c/0x208
>     pinctrl_pm_select_default_state+0x4c/0xa4
>     geni_se_resources_on+0xe8/0x154
>     qcom_geni_serial_runtime_resume+0x4c/0x88
>     pm_generic_runtime_resume+0x2c/0x44
>     __genpd_runtime_resume+0x30/0x80
>     genpd_runtime_resume+0x114/0x29c
>     __rpm_callback+0x48/0x1d8
>     rpm_callback+0x6c/0x78
>     rpm_resume+0x530/0x750
>     __pm_runtime_resume+0x50/0x94
>     handle_threaded_wake_irq+0x30/0x94
>     irq_thread_fn+0x2c/0xa8
>     irq_thread+0x160/0x248
>     kthread+0x110/0x114
>     ret_from_fork+0x10/0x20
>
> To resolve this, explicitly manage the wakeup IRQ state within the
> runtime suspend/resume callbacks. In the runtime resume callback, call
> disable_irq_wake() before enabling resources. This preemptively
> removes the "wakeup" capability from the IRQ, allowing subsequent
> interrupt management calls to proceed without conflict. An error path
> re-enables the wakeup IRQ if resource enablement fails.
>
> Conversely, in runtime suspend, call enable_irq_wake() after resources
> are disabled. This ensures the interrupt is configured as a wakeup
> source only once the device has fully entered its low-power state. An
> error path handles disabling the wakeup IRQ if the suspend operation
> fails.
>
> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>

You forgot:

Reported-by: Alexey Klimov <alexey.klimov@linaro.org>

Also, I'm not sure whether this change will go via Greg or Jiri, but ideally
it should be picked up for the current -rc cycle, since the regression was
introduced during the latest merge window.

I would also like to test it on the qrb2210 RB1, where this regression is
reproducible.

Thanks,
Alexey

[..]
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Alexey Klimov 5 months ago
(adding Krzysztof to c/c)

On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>> A deadlock is observed in the qcom_geni_serial driver during runtime
>> resume. This occurs when the pinctrl subsystem reconfigures device pins
>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>> __synchronize_irq(), conflicting with the active wakeup state and
>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>> leading to system instability.
>>
>> The critical call trace leading to the deadlock is:
>>
>>     Call trace:
>>     __switch_to+0xe0/0x120
>>     __schedule+0x39c/0x978
>>     schedule+0x5c/0xf8
>>     __synchronize_irq+0x88/0xb4
>>     disable_irq+0x3c/0x4c
>>     msm_pinmux_set_mux+0x508/0x644
>>     pinmux_enable_setting+0x190/0x2dc
>>     pinctrl_commit_state+0x13c/0x208
>>     pinctrl_pm_select_default_state+0x4c/0xa4
>>     geni_se_resources_on+0xe8/0x154
>>     qcom_geni_serial_runtime_resume+0x4c/0x88
>>     pm_generic_runtime_resume+0x2c/0x44
>>     __genpd_runtime_resume+0x30/0x80
>>     genpd_runtime_resume+0x114/0x29c
>>     __rpm_callback+0x48/0x1d8
>>     rpm_callback+0x6c/0x78
>>     rpm_resume+0x530/0x750
>>     __pm_runtime_resume+0x50/0x94
>>     handle_threaded_wake_irq+0x30/0x94
>>     irq_thread_fn+0x2c/0xa8
>>     irq_thread+0x160/0x248
>>     kthread+0x110/0x114
>>     ret_from_fork+0x10/0x20
>>
>> To resolve this, explicitly manage the wakeup IRQ state within the
>> runtime suspend/resume callbacks. In the runtime resume callback, call
>> disable_irq_wake() before enabling resources. This preemptively
>> removes the "wakeup" capability from the IRQ, allowing subsequent
>> interrupt management calls to proceed without conflict. An error path
>> re-enables the wakeup IRQ if resource enablement fails.
>>
>> Conversely, in runtime suspend, call enable_irq_wake() after resources
>> are disabled. This ensures the interrupt is configured as a wakeup
>> source only once the device has fully entered its low-power state. An
>> error path handles disabling the wakeup IRQ if the suspend operation
>> fails.
>>
>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>
> You forgot:
>
> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>
> Also, not sure where this change will go, via Greg or Jiri, but ideally
> this should be picked for current -rc cycle since regression is
> introduced during latest merge window.
>
> I also would like to test it on qrb2210 rb1 where this regression is
> reproduciable.

It doesn't seem to fix the regression on the RB1 board:

 INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
       Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/u16:3   state:D stack:0     pid:50    tgid:50    ppid:2      task_flags:0x4208060 flags:0x00000010
 Workqueue: async async_run_entry_fn
 Call trace:
  __switch_to+0xf0/0x1c0 (T)
  __schedule+0x358/0x99c
  schedule+0x34/0x11c
  rpm_resume+0x17c/0x6a0
  rpm_resume+0x2c4/0x6a0
  rpm_resume+0x2c4/0x6a0
  rpm_resume+0x2c4/0x6a0
  __pm_runtime_resume+0x50/0x9c
  __driver_probe_device+0x58/0x120
  driver_probe_device+0x3c/0x154
  __driver_attach_async_helper+0x4c/0xc0
  async_run_entry_fn+0x34/0xe0
  process_one_work+0x148/0x284
  worker_thread+0x2c4/0x3e0
  kthread+0x12c/0x210
  ret_from_fork+0x10/0x20
 INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
       Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:irq/92-4a8c000. state:D stack:0     pid:79    tgid:79    ppid:2      task_flags:0x208040 flags:0x00000010
 Call trace:
  __switch_to+0xf0/0x1c0 (T)
  __schedule+0x358/0x99c
  schedule+0x34/0x11c
  __synchronize_irq+0x90/0xcc
  disable_irq+0x3c/0x4c
  msm_pinmux_set_mux+0x3b4/0x45c
  pinmux_enable_setting+0x1fc/0x2d8
  pinctrl_commit_state+0xa0/0x260
  pinctrl_pm_select_default_state+0x4c/0xa0
  geni_se_resources_on+0xe8/0x154
  geni_serial_resource_state+0x8c/0xbc
  qcom_geni_serial_runtime_resume+0x3c/0x88
  pm_generic_runtime_resume+0x2c/0x44
  __rpm_callback+0x48/0x1e0
  rpm_callback+0x74/0x80
  rpm_resume+0x3bc/0x6a0
  __pm_runtime_resume+0x50/0x9c
  handle_threaded_wake_irq+0x30/0x80
  irq_thread_fn+0x2c/0xb0
  irq_thread+0x170/0x334
  kthread+0x12c/0x210
  ret_from_fork+0x10/0x20

I see exactly the same behaviour with these changes applied.

root@rb1:~# uname -a
Linux rb1 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13 SMP PREEMPT Tue Sep  9 20:14:22 BST 2025 aarch64 GNU/Linux

I see the same behaviour with linux-next, but my local tree is a bit old,
so maybe there are some dependencies.

Best regards,
Alexey
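
One possible reading of the second trace above is that the IRQ thread ends up waiting on itself: it calls disable_irq(), which blocks in __synchronize_irq() until in-flight handlers finish. That reading assumes the interrupt being disabled is the same one whose thread is running, which this thread does not confirm. The sketch below models only that wait condition in user space; `struct fake_irq` and all `fake_` helpers are hypothetical, not kernel code.

```c
#include <stdbool.h>

/* Hypothetical model of one interrupt line; not a kernel structure. */
struct fake_irq {
	int threads_active;   /* handler threads currently running */
};

/* Bookkeeping done around a threaded handler invocation. */
static void fake_thread_enter(struct fake_irq *irq) { irq->threads_active++; }
static void fake_thread_exit(struct fake_irq *irq)  { irq->threads_active--; }

/*
 * Models the wait condition: a synchronize call can return only when
 * no handler thread is active. Returns true if it could return now.
 */
static bool fake_synchronize_would_return(const struct fake_irq *irq)
{
	return irq->threads_active == 0;
}

/* Synchronize issued from inside the handler: the caller counts itself. */
static bool sync_from_own_handler(void)
{
	struct fake_irq irq = { 0 };
	bool ok;

	fake_thread_enter(&irq);
	ok = fake_synchronize_would_return(&irq);
	fake_thread_exit(&irq);
	return ok;
}

/* Synchronize issued after the handler has finished. */
static bool sync_from_other_context(void)
{
	struct fake_irq irq = { 0 };

	fake_thread_enter(&irq);
	fake_thread_exit(&irq);
	return fake_synchronize_would_return(&irq);
}
```

In this model sync_from_own_handler() reports that the wait could never complete, while sync_from_other_context() reports that it could. If the assumption holds, changing the wake state alone would not remove the wait, which would be consistent with the unchanged behaviour reported on RB1.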
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Praveen Talari 5 months ago
Hi Alexey,

Thank you for the update.

On 9/10/2025 1:35 AM, Alexey Klimov wrote:
> 
> (adding Krzysztof to c/c)
> 
> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>> A deadlock is observed in the qcom_geni_serial driver during runtime
>>> resume. This occurs when the pinctrl subsystem reconfigures device pins
>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>> __synchronize_irq(), conflicting with the active wakeup state and
>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>> leading to system instability.
>>>
>>> The critical call trace leading to the deadlock is:
>>>
>>>      Call trace:
>>>      __switch_to+0xe0/0x120
>>>      __schedule+0x39c/0x978
>>>      schedule+0x5c/0xf8
>>>      __synchronize_irq+0x88/0xb4
>>>      disable_irq+0x3c/0x4c
>>>      msm_pinmux_set_mux+0x508/0x644
>>>      pinmux_enable_setting+0x190/0x2dc
>>>      pinctrl_commit_state+0x13c/0x208
>>>      pinctrl_pm_select_default_state+0x4c/0xa4
>>>      geni_se_resources_on+0xe8/0x154
>>>      qcom_geni_serial_runtime_resume+0x4c/0x88
>>>      pm_generic_runtime_resume+0x2c/0x44
>>>      __genpd_runtime_resume+0x30/0x80
>>>      genpd_runtime_resume+0x114/0x29c
>>>      __rpm_callback+0x48/0x1d8
>>>      rpm_callback+0x6c/0x78
>>>      rpm_resume+0x530/0x750
>>>      __pm_runtime_resume+0x50/0x94
>>>      handle_threaded_wake_irq+0x30/0x94
>>>      irq_thread_fn+0x2c/0xa8
>>>      irq_thread+0x160/0x248
>>>      kthread+0x110/0x114
>>>      ret_from_fork+0x10/0x20
>>>
>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>> runtime suspend/resume callbacks. In the runtime resume callback, call
>>> disable_irq_wake() before enabling resources. This preemptively
>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>> interrupt management calls to proceed without conflict. An error path
>>> re-enables the wakeup IRQ if resource enablement fails.
>>>
>>> Conversely, in runtime suspend, call enable_irq_wake() after resources
>>> are disabled. This ensures the interrupt is configured as a wakeup
>>> source only once the device has fully entered its low-power state. An
>>> error path handles disabling the wakeup IRQ if the suspend operation
>>> fails.
>>>
>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>
>> You forgot:
>>
>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>
>> Also, not sure where this change will go, via Greg or Jiri, but ideally
>> this should be picked for current -rc cycle since regression is
>> introduced during latest merge window.
>>
>> I also would like to test it on qrb2210 rb1 where this regression is
>> reproduciable.
> 
> It doesn't seem that it fixes the regression on RB1 board:
> 
>   INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
>         Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   task:kworker/u16:3   state:D stack:0     pid:50    tgid:50    ppid:2      task_flags:0x4208060 flags:0x00000010
>   Workqueue: async async_run_entry_fn
>   Call trace:
>    __switch_to+0xf0/0x1c0 (T)
>    __schedule+0x358/0x99c
>    schedule+0x34/0x11c
>    rpm_resume+0x17c/0x6a0
>    rpm_resume+0x2c4/0x6a0
>    rpm_resume+0x2c4/0x6a0
>    rpm_resume+0x2c4/0x6a0
>    __pm_runtime_resume+0x50/0x9c
>    __driver_probe_device+0x58/0x120
>    driver_probe_device+0x3c/0x154
>    __driver_attach_async_helper+0x4c/0xc0
>    async_run_entry_fn+0x34/0xe0
>    process_one_work+0x148/0x284
>    worker_thread+0x2c4/0x3e0
>    kthread+0x12c/0x210
>    ret_from_fork+0x10/0x20
>   INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
>         Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   task:irq/92-4a8c000. state:D stack:0     pid:79    tgid:79    ppid:2      task_flags:0x208040 flags:0x00000010
>   Call trace:
>    __switch_to+0xf0/0x1c0 (T)
>    __schedule+0x358/0x99c
>    schedule+0x34/0x11c
>    __synchronize_irq+0x90/0xcc
>    disable_irq+0x3c/0x4c
>    msm_pinmux_set_mux+0x3b4/0x45c
>    pinmux_enable_setting+0x1fc/0x2d8
>    pinctrl_commit_state+0xa0/0x260
>    pinctrl_pm_select_default_state+0x4c/0xa0
>    geni_se_resources_on+0xe8/0x154
>    geni_serial_resource_state+0x8c/0xbc
>    qcom_geni_serial_runtime_resume+0x3c/0x88
>    pm_generic_runtime_resume+0x2c/0x44
>    __rpm_callback+0x48/0x1e0
>    rpm_callback+0x74/0x80
>    rpm_resume+0x3bc/0x6a0
>    __pm_runtime_resume+0x50/0x9c
>    handle_threaded_wake_irq+0x30/0x80
>    irq_thread_fn+0x2c/0xb0
>    irq_thread+0x170/0x334
>    kthread+0x12c/0x210
>    ret_from_fork+0x10/0x20

I can see that your call stack and mine are mostly similar, but they
differ in the calls leading into runtime resume.

Your dump:
 >    qcom_geni_serial_runtime_resume+0x3c/0x88
 >    pm_generic_runtime_resume+0x2c/0x44
 >    __rpm_callback+0x48/0x1e0
 >    rpm_callback+0x74/0x80
 >    rpm_resume+0x3bc/0x6a0
 >    __pm_runtime_resume+0x50/0x9c
 >    handle_threaded_wake_irq+0x30/0x80

Mine:
 >>>      qcom_geni_serial_runtime_resume+0x4c/0x88
 >>>      pm_generic_runtime_resume+0x2c/0x44
 >>>      __genpd_runtime_resume+0x30/0x80
 >>>      genpd_runtime_resume+0x114/0x29c
 >>>      __rpm_callback+0x48/0x1d8
 >>>      rpm_callback+0x6c/0x78
 >>>      rpm_resume+0x530/0x750


Could you please share the DT file for this board, if possible?
Is there any use case enabled on this SE instance?

Thanks,
Praveen Talari
> 
> I see exactly the same behaviour with this changes applied.
> 
> root@rb1:~# uname -a
> Linux rb1 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13 SMP PREEMPT Tue Sep  9 20:14:22 BST 2025 aarch64 GNU/Linux
> 
> I see the same behaviour with linux-next but my local tree is a bit old,
> maybe there are some dependencies.
> 
> Best regards,
> Alexey
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Alexey Klimov 5 months ago
Hi Praveen,

On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
> Hi Alexy,
>
> Thank you for update.
>
> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>> 
>> (adding Krzysztof to c/c)
>> 
>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>> A deadlock is observed in the qcom_geni_serial driver during runtime
>>>> resume. This occurs when the pinctrl subsystem reconfigures device pins
>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>> leading to system instability.
>>>>
>>>> The critical call trace leading to the deadlock is:
>>>>
>>>>      Call trace:
>>>>      __switch_to+0xe0/0x120
>>>>      __schedule+0x39c/0x978
>>>>      schedule+0x5c/0xf8
>>>>      __synchronize_irq+0x88/0xb4
>>>>      disable_irq+0x3c/0x4c
>>>>      msm_pinmux_set_mux+0x508/0x644
>>>>      pinmux_enable_setting+0x190/0x2dc
>>>>      pinctrl_commit_state+0x13c/0x208
>>>>      pinctrl_pm_select_default_state+0x4c/0xa4
>>>>      geni_se_resources_on+0xe8/0x154
>>>>      qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>      pm_generic_runtime_resume+0x2c/0x44
>>>>      __genpd_runtime_resume+0x30/0x80
>>>>      genpd_runtime_resume+0x114/0x29c
>>>>      __rpm_callback+0x48/0x1d8
>>>>      rpm_callback+0x6c/0x78
>>>>      rpm_resume+0x530/0x750
>>>>      __pm_runtime_resume+0x50/0x94
>>>>      handle_threaded_wake_irq+0x30/0x94
>>>>      irq_thread_fn+0x2c/0xa8
>>>>      irq_thread+0x160/0x248
>>>>      kthread+0x110/0x114
>>>>      ret_from_fork+0x10/0x20
>>>>
>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>> runtime suspend/resume callbacks. In the runtime resume callback, call
>>>> disable_irq_wake() before enabling resources. This preemptively
>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>> interrupt management calls to proceed without conflict. An error path
>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>
>>>> Conversely, in runtime suspend, call enable_irq_wake() after resources
>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>> source only once the device has fully entered its low-power state. An
>>>> error path handles disabling the wakeup IRQ if the suspend operation
>>>> fails.
>>>>
>>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
>>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>>
>>> You forgot:
>>>
>>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>>
>>> Also, not sure where this change will go, via Greg or Jiri, but ideally
>>> this should be picked for current -rc cycle since regression is
>>> introduced during latest merge window.
>>>
>>> I also would like to test it on qrb2210 rb1 where this regression is
>>> reproduciable.
>> 
>> It doesn't seem that it fixes the regression on RB1 board:
>> 
>>   INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
>>         Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>   task:kworker/u16:3   state:D stack:0     pid:50    tgid:50    ppid:2      task_flags:0x4208060 flags:0x00000010
>>   Workqueue: async async_run_entry_fn
>>   Call trace:
>>    __switch_to+0xf0/0x1c0 (T)
>>    __schedule+0x358/0x99c
>>    schedule+0x34/0x11c
>>    rpm_resume+0x17c/0x6a0
>>    rpm_resume+0x2c4/0x6a0
>>    rpm_resume+0x2c4/0x6a0
>>    rpm_resume+0x2c4/0x6a0
>>    __pm_runtime_resume+0x50/0x9c
>>    __driver_probe_device+0x58/0x120
>>    driver_probe_device+0x3c/0x154
>>    __driver_attach_async_helper+0x4c/0xc0
>>    async_run_entry_fn+0x34/0xe0
>>    process_one_work+0x148/0x284
>>    worker_thread+0x2c4/0x3e0
>>    kthread+0x12c/0x210
>>    ret_from_fork+0x10/0x20
>>   INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
>>         Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>   task:irq/92-4a8c000. state:D stack:0     pid:79    tgid:79    ppid:2      task_flags:0x208040 flags:0x00000010
>>   Call trace:
>>    __switch_to+0xf0/0x1c0 (T)
>>    __schedule+0x358/0x99c
>>    schedule+0x34/0x11c
>>    __synchronize_irq+0x90/0xcc
>>    disable_irq+0x3c/0x4c
>>    msm_pinmux_set_mux+0x3b4/0x45c
>>    pinmux_enable_setting+0x1fc/0x2d8
>>    pinctrl_commit_state+0xa0/0x260
>>    pinctrl_pm_select_default_state+0x4c/0xa0
>>    geni_se_resources_on+0xe8/0x154
>>    geni_serial_resource_state+0x8c/0xbc
>>    qcom_geni_serial_runtime_resume+0x3c/0x88
>>    pm_generic_runtime_resume+0x2c/0x44
>>    __rpm_callback+0x48/0x1e0
>>    rpm_callback+0x74/0x80
>>    rpm_resume+0x3bc/0x6a0
>>    __pm_runtime_resume+0x50/0x9c
>>    handle_threaded_wake_irq+0x30/0x80
>>    irq_thread_fn+0x2c/0xb0
>>    irq_thread+0x170/0x334
>>    kthread+0x12c/0x210
>>    ret_from_fork+0x10/0x20
>
> I can see call stack is mostly similar for yours and mine but not 
> completely at initial calls.
>
> Yours dump:
>  >    qcom_geni_serial_runtime_resume+0x3c/0x88
>  >    pm_generic_runtime_resume+0x2c/0x44
>  >    __rpm_callback+0x48/0x1e0
>  >    rpm_callback+0x74/0x80
>  >    rpm_resume+0x3bc/0x6a0
>  >    __pm_runtime_resume+0x50/0x9c
>  >    handle_threaded_wake_irq+0x30/0x80
>
> Mine:
>  >>>      qcom_geni_serial_runtime_resume+0x4c/0x88
>  >>>      pm_generic_runtime_resume+0x2c/0x44
>  >>>      __genpd_runtime_resume+0x30/0x80
>  >>>      genpd_runtime_resume+0x114/0x29c
>  >>>      __rpm_callback+0x48/0x1d8
>  >>>      rpm_callback+0x6c/0x78
>  >>>      rpm_resume+0x530/0x750
>
>
> Can you please share what is DT file for this Board if possible?
> is there any usecase enabled on this SE instance?

Well, yeah, sorry, I didn't really compare the backtraces line by line;
the behaviour was exactly the same. I thought that the purpose was to fix
the regression reported earlier.

The main dts files for RB1 are qrb2210-rb1.dts and qcm2290.dtsi.

The similar RB2 board uses qrb4210-rb2.dts and sm4250.dtsi+sm6115.dtsi;
it is worth checking as well.
For testing here I didn't use anything extra (the only change was the wifi
fix from Loic); I usually test -master and linux-next.

If you can tell me what an SE instance is, I may be able to answer. But
as far as I know it is not part of any infrastructure or CI machinery.
I just boot the board and see if it works; if it does, I rebuild and
test my changes (audio).

Best regards,
Alexey
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Jorge Ramirez 4 months, 3 weeks ago
On 11/09/25 10:00:27, Alexey Klimov wrote:
> Hi Praveen,
> 
> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
> > Hi Alexy,
> >
> > Thank you for update.
> >
> > On 9/10/2025 1:35 AM, Alexey Klimov wrote:
> >> 
> >> (adding Krzysztof to c/c)
> >> 
> >> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
> >>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
> >>>> A deadlock is observed in the qcom_geni_serial driver during runtime
> >>>> resume. This occurs when the pinctrl subsystem reconfigures device pins
> >>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
> >>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
> >>>> __synchronize_irq(), conflicting with the active wakeup state and
> >>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
> >>>> leading to system instability.
> >>>>
> >>>> The critical call trace leading to the deadlock is:
> >>>>
> >>>>      Call trace:
> >>>>      __switch_to+0xe0/0x120
> >>>>      __schedule+0x39c/0x978
> >>>>      schedule+0x5c/0xf8
> >>>>      __synchronize_irq+0x88/0xb4
> >>>>      disable_irq+0x3c/0x4c
> >>>>      msm_pinmux_set_mux+0x508/0x644
> >>>>      pinmux_enable_setting+0x190/0x2dc
> >>>>      pinctrl_commit_state+0x13c/0x208
> >>>>      pinctrl_pm_select_default_state+0x4c/0xa4
> >>>>      geni_se_resources_on+0xe8/0x154
> >>>>      qcom_geni_serial_runtime_resume+0x4c/0x88
> >>>>      pm_generic_runtime_resume+0x2c/0x44
> >>>>      __genpd_runtime_resume+0x30/0x80
> >>>>      genpd_runtime_resume+0x114/0x29c
> >>>>      __rpm_callback+0x48/0x1d8
> >>>>      rpm_callback+0x6c/0x78
> >>>>      rpm_resume+0x530/0x750
> >>>>      __pm_runtime_resume+0x50/0x94
> >>>>      handle_threaded_wake_irq+0x30/0x94
> >>>>      irq_thread_fn+0x2c/0xa8
> >>>>      irq_thread+0x160/0x248
> >>>>      kthread+0x110/0x114
> >>>>      ret_from_fork+0x10/0x20
> >>>>
> >>>> To resolve this, explicitly manage the wakeup IRQ state within the
> >>>> runtime suspend/resume callbacks. In the runtime resume callback, call
> >>>> disable_irq_wake() before enabling resources. This preemptively
> >>>> removes the "wakeup" capability from the IRQ, allowing subsequent
> >>>> interrupt management calls to proceed without conflict. An error path
> >>>> re-enables the wakeup IRQ if resource enablement fails.
> >>>>
> >>>> Conversely, in runtime suspend, call enable_irq_wake() after resources
> >>>> are disabled. This ensures the interrupt is configured as a wakeup
> >>>> source only once the device has fully entered its low-power state. An
> >>>> error path handles disabling the wakeup IRQ if the suspend operation
> >>>> fails.
> >>>>
> >>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
> >>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
> >>>
> >>> You forgot:
> >>>
> >>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
> >>>
> >>> Also, not sure where this change will go, via Greg or Jiri, but ideally
> >>> this should be picked for current -rc cycle since regression is
> >>> introduced during latest merge window.
> >>>
> >>> I also would like to test it on qrb2210 rb1 where this regression is
> >>> reproduciable.
> >> 
> >> It doesn't seem that it fixes the regression on RB1 board:
> >> 
> >>   INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
> >>         Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
> >>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>   task:kworker/u16:3   state:D stack:0     pid:50    tgid:50    ppid:2      task_flags:0x4208060 flags:0x00000010
> >>   Workqueue: async async_run_entry_fn
> >>   Call trace:
> >>    __switch_to+0xf0/0x1c0 (T)
> >>    __schedule+0x358/0x99c
> >>    schedule+0x34/0x11c
> >>    rpm_resume+0x17c/0x6a0
> >>    rpm_resume+0x2c4/0x6a0
> >>    rpm_resume+0x2c4/0x6a0
> >>    rpm_resume+0x2c4/0x6a0
> >>    __pm_runtime_resume+0x50/0x9c
> >>    __driver_probe_device+0x58/0x120
> >>    driver_probe_device+0x3c/0x154
> >>    __driver_attach_async_helper+0x4c/0xc0
> >>    async_run_entry_fn+0x34/0xe0
> >>    process_one_work+0x148/0x284
> >>    worker_thread+0x2c4/0x3e0
> >>    kthread+0x12c/0x210
> >>    ret_from_fork+0x10/0x20
> >>   INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
> >>         Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
> >>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>   task:irq/92-4a8c000. state:D stack:0     pid:79    tgid:79    ppid:2      task_flags:0x208040 flags:0x00000010
> >>   Call trace:
> >>    __switch_to+0xf0/0x1c0 (T)
> >>    __schedule+0x358/0x99c
> >>    schedule+0x34/0x11c
> >>    __synchronize_irq+0x90/0xcc
> >>    disable_irq+0x3c/0x4c
> >>    msm_pinmux_set_mux+0x3b4/0x45c
> >>    pinmux_enable_setting+0x1fc/0x2d8
> >>    pinctrl_commit_state+0xa0/0x260
> >>    pinctrl_pm_select_default_state+0x4c/0xa0
> >>    geni_se_resources_on+0xe8/0x154
> >>    geni_serial_resource_state+0x8c/0xbc
> >>    qcom_geni_serial_runtime_resume+0x3c/0x88
> >>    pm_generic_runtime_resume+0x2c/0x44
> >>    __rpm_callback+0x48/0x1e0
> >>    rpm_callback+0x74/0x80
> >>    rpm_resume+0x3bc/0x6a0
> >>    __pm_runtime_resume+0x50/0x9c
> >>    handle_threaded_wake_irq+0x30/0x80
> >>    irq_thread_fn+0x2c/0xb0
> >>    irq_thread+0x170/0x334
> >>    kthread+0x12c/0x210
> >>    ret_from_fork+0x10/0x20
> >
> > I can see call stack is mostly similar for yours and mine but not 
> > completely at initial calls.
> >
> > Yours dump:
> >  >    qcom_geni_serial_runtime_resume+0x3c/0x88
> >  >    pm_generic_runtime_resume+0x2c/0x44
> >  >    __rpm_callback+0x48/0x1e0
> >  >    rpm_callback+0x74/0x80
> >  >    rpm_resume+0x3bc/0x6a0
> >  >    __pm_runtime_resume+0x50/0x9c
> >  >    handle_threaded_wake_irq+0x30/0x80
> >
> > Mine:
> >  >>>      qcom_geni_serial_runtime_resume+0x4c/0x88
> >  >>>      pm_generic_runtime_resume+0x2c/0x44
> >  >>>      __genpd_runtime_resume+0x30/0x80
> >  >>>      genpd_runtime_resume+0x114/0x29c
> >  >>>      __rpm_callback+0x48/0x1d8
> >  >>>      rpm_callback+0x6c/0x78
> >  >>>      rpm_resume+0x530/0x750
> >
> >
> > Can you please share what is DT file for this Board if possible?
> > is there any usecase enabled on this SE instance?
> 
> Well, yeah, sorry, I didn't really compared backtraces line to line and
> behaviour was exactly the same. I thought that the purpose was to fix
> the regression reported earlier.
> 
> RB1 main dts files are qrb2210-rb1.dts and qcm2290.dtsi.
> 
> The similar board RB2 uses qrb4210-rb2.dts and sm4250.dtsi+sm6115.dtsi,
> it is worth checking it as well.
> For testing here I didn't use anything extra (the only change was wifi fix
> from Loic); I tested -master and linux-next usually.
> 
> If you can tell me what is SE instance I may be able to answer. But
> as far as I know it is not a part of any infrastructure or CI machinery.
> I just boot the board and see if it works, if it does then I rebuild and
> test my changes (audio).
> 
> Best regards,
> Alexey
> 

Will there be a fix any time soon, Praveen? Reverting "serial: qcom-geni:
Enable PM runtime for serial driver" does fix the problem on RB1.

Otherwise, I suggest that we revert this commit in linux-next.
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Praveen Talari 4 months, 3 weeks ago
Hi Alexey,

Really appreciate you waiting!

On 9/11/2025 2:30 PM, Alexey Klimov wrote:
> Hi Praveen,
> 
> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>> Hi Alexy,
>>
>> Thank you for update.
>>
>> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>>
>>> (adding Krzysztof to c/c)
>>>
>>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>>> A deadlock is observed in the qcom_geni_serial driver during runtime
>>>>> resume. This occurs when the pinctrl subsystem reconfigures device pins
>>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>>> leading to system instability.
>>>>>
>>>>> The critical call trace leading to the deadlock is:
>>>>>
>>>>>       Call trace:
>>>>>       __switch_to+0xe0/0x120
>>>>>       __schedule+0x39c/0x978
>>>>>       schedule+0x5c/0xf8
>>>>>       __synchronize_irq+0x88/0xb4
>>>>>       disable_irq+0x3c/0x4c
>>>>>       msm_pinmux_set_mux+0x508/0x644
>>>>>       pinmux_enable_setting+0x190/0x2dc
>>>>>       pinctrl_commit_state+0x13c/0x208
>>>>>       pinctrl_pm_select_default_state+0x4c/0xa4
>>>>>       geni_se_resources_on+0xe8/0x154
>>>>>       qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>>       pm_generic_runtime_resume+0x2c/0x44
>>>>>       __genpd_runtime_resume+0x30/0x80
>>>>>       genpd_runtime_resume+0x114/0x29c
>>>>>       __rpm_callback+0x48/0x1d8
>>>>>       rpm_callback+0x6c/0x78
>>>>>       rpm_resume+0x530/0x750
>>>>>       __pm_runtime_resume+0x50/0x94
>>>>>       handle_threaded_wake_irq+0x30/0x94
>>>>>       irq_thread_fn+0x2c/xa8
>>>>>       irq_thread+0x160/x248
>>>>>       kthread+0x110/x114
>>>>>       ret_from_fork+0x10/x20
>>>>>
>>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>>> runtime suspend/resume callbacks. In the runtime resume callback, call
>>>>> disable_irq_wake() before enabling resources. This preemptively
>>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>>> interrupt management calls to proceed without conflict. An error path
>>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>>
>>>>> Conversely, in runtime suspend, call enable_irq_wake() after resources
>>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>>> source only once the device has fully entered its low-power state. An
>>>>> error path handles disabling the wakeup IRQ if the suspend operation
>>>>> fails.
>>>>>
>>>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
>>>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>>>
>>>> You forgot:
>>>>
>>>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>>>
>>>> Also, not sure where this change will go, via Greg or Jiri, but ideally
>>>> this should be picked for current -rc cycle since regression is
>>>> introduced during latest merge window.
>>>>
>>>> I also would like to test it on qrb2210 rb1 where this regression is
>>>> reproduciable.
>>>
>>> It doesn't seem that it fixes the regression on RB1 board:
>>>
>>>    INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
>>>          Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>>    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>    task:kworker/u16:3   state:D stack:0     pid:50    tgid:50    ppid:2      task_flags:0x4208060 flags:0x00000010
>>>    Workqueue: async async_run_entry_fn
>>>    Call trace:
>>>     __switch_to+0xf0/0x1c0 (T)
>>>     __schedule+0x358/0x99c
>>>     schedule+0x34/0x11c
>>>     rpm_resume+0x17c/0x6a0
>>>     rpm_resume+0x2c4/0x6a0
>>>     rpm_resume+0x2c4/0x6a0
>>>     rpm_resume+0x2c4/0x6a0
>>>     __pm_runtime_resume+0x50/0x9c
>>>     __driver_probe_device+0x58/0x120
>>>     driver_probe_device+0x3c/0x154
>>>     __driver_attach_async_helper+0x4c/0xc0
>>>     async_run_entry_fn+0x34/0xe0
>>>     process_one_work+0x148/0x284
>>>     worker_thread+0x2c4/0x3e0
>>>     kthread+0x12c/0x210
>>>     ret_from_fork+0x10/0x20
>>>    INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
>>>          Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>>    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>    task:irq/92-4a8c000. state:D stack:0     pid:79    tgid:79    ppid:2      task_flags:0x208040 flags:0x00000010
>>>    Call trace:
>>>     __switch_to+0xf0/0x1c0 (T)
>>>     __schedule+0x358/0x99c
>>>     schedule+0x34/0x11c
>>>     __synchronize_irq+0x90/0xcc
>>>     disable_irq+0x3c/0x4c
>>>     msm_pinmux_set_mux+0x3b4/0x45c
>>>     pinmux_enable_setting+0x1fc/0x2d8
>>>     pinctrl_commit_state+0xa0/0x260
>>>     pinctrl_pm_select_default_state+0x4c/0xa0
>>>     geni_se_resources_on+0xe8/0x154
>>>     geni_serial_resource_state+0x8c/0xbc
>>>     qcom_geni_serial_runtime_resume+0x3c/0x88
>>>     pm_generic_runtime_resume+0x2c/0x44
>>>     __rpm_callback+0x48/0x1e0
>>>     rpm_callback+0x74/0x80
>>>     rpm_resume+0x3bc/0x6a0
>>>     __pm_runtime_resume+0x50/0x9c
>>>     handle_threaded_wake_irq+0x30/0x80
>>>     irq_thread_fn+0x2c/0xb0
>>>     irq_thread+0x170/0x334
>>>     kthread+0x12c/0x210
>>>     ret_from_fork+0x10/0x20
>>
>> I can see call stack is mostly similar for yours and mine but not
>> completely at initial calls.
>>
>> Yours dump:
>>   >    qcom_geni_serial_runtime_resume+0x3c/0x88
>>   >    pm_generic_runtime_resume+0x2c/0x44
>>   >    __rpm_callback+0x48/0x1e0
>>   >    rpm_callback+0x74/0x80
>>   >    rpm_resume+0x3bc/0x6a0
>>   >    __pm_runtime_resume+0x50/0x9c
>>   >    handle_threaded_wake_irq+0x30/0x80
>>
>> Mine:
>>   >>>      qcom_geni_serial_runtime_resume+0x4c/0x88
>>   >>>      pm_generic_runtime_resume+0x2c/0x44
>>   >>>      __genpd_runtime_resume+0x30/0x80
>>   >>>      genpd_runtime_resume+0x114/0x29c
>>   >>>      __rpm_callback+0x48/0x1d8
>>   >>>      rpm_callback+0x6c/0x78
>>   >>>      rpm_resume+0x530/0x750
>>
>>
>> Can you please share what is DT file for this Board if possible?
>> is there any usecase enabled on this SE instance?
> 
> Well, yeah, sorry, I didn't really compared backtraces line to line and
> behaviour was exactly the same. I thought that the purpose was to fix
> the regression reported earlier.
> 
> RB1 main dts files are qrb2210-rb1.dts and qcm2290.dtsi.
> 
> The similar board RB2 uses qrb4210-rb2.dts and sm4250.dtsi+sm6115.dtsi,
> it is worth checking it as well.
> For testing here I didn't use anything extra (the only change was wifi fix
> from Loic); I tested -master and linux-next usually.
> 
> If you can tell me what is SE instance I may be able to answer. But
> as far as I know it is not a part of any infrastructure or CI machinery.
> I just boot the board and see if it works, if it does then I rebuild and
> test my changes (audio).

I'm actively working on this and experimenting with various wakeup
scenarios. I'll share the updated patch as soon as possible.

Should the fix be sent as V2 of this patch or as a new patch (V1) if it
ends up in a different subsystem (pinctrl)?

Thanks,
Praveen Talari
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Alexey Klimov 4 months, 3 weeks ago
(removing <quic_mnaresh@quicinc.com> from c/c -- too many mails not delivered)

Hi Praveen,

On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
> Hi Alexey,
>
> Really appreciate you waiting!
>
> On 9/11/2025 2:30 PM, Alexey Klimov wrote:
>> Hi Praveen,
>> 
>> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>>> Hi Alexy,
>>>
>>> Thank you for update.
>>>
>>> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>>>
>>>> (adding Krzysztof to c/c)
>>>>
>>>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>>>> A deadlock is observed in the qcom_geni_serial driver during runtime
>>>>>> resume. This occurs when the pinctrl subsystem reconfigures device pins
>>>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>>>> leading to system instability.
>>>>>>
>>>>>> The critical call trace leading to the deadlock is:
>>>>>>
>>>>>>       Call trace:
>>>>>>       __switch_to+0xe0/0x120
>>>>>>       __schedule+0x39c/0x978
>>>>>>       schedule+0x5c/0xf8
>>>>>>       __synchronize_irq+0x88/0xb4
>>>>>>       disable_irq+0x3c/0x4c
>>>>>>       msm_pinmux_set_mux+0x508/0x644
>>>>>>       pinmux_enable_setting+0x190/0x2dc
>>>>>>       pinctrl_commit_state+0x13c/0x208
>>>>>>       pinctrl_pm_select_default_state+0x4c/0xa4
>>>>>>       geni_se_resources_on+0xe8/0x154
>>>>>>       qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>>>       pm_generic_runtime_resume+0x2c/0x44
>>>>>>       __genpd_runtime_resume+0x30/0x80
>>>>>>       genpd_runtime_resume+0x114/0x29c
>>>>>>       __rpm_callback+0x48/0x1d8
>>>>>>       rpm_callback+0x6c/0x78
>>>>>>       rpm_resume+0x530/0x750
>>>>>>       __pm_runtime_resume+0x50/0x94
>>>>>>       handle_threaded_wake_irq+0x30/0x94
>>>>>>       irq_thread_fn+0x2c/xa8
>>>>>>       irq_thread+0x160/x248
>>>>>>       kthread+0x110/x114
>>>>>>       ret_from_fork+0x10/x20
>>>>>>
>>>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>>>> runtime suspend/resume callbacks. In the runtime resume callback, call
>>>>>> disable_irq_wake() before enabling resources. This preemptively
>>>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>>>> interrupt management calls to proceed without conflict. An error path
>>>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>>>
>>>>>> Conversely, in runtime suspend, call enable_irq_wake() after resources
>>>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>>>> source only once the device has fully entered its low-power state. An
>>>>>> error path handles disabling the wakeup IRQ if the suspend operation
>>>>>> fails.
>>>>>>
>>>>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
>>>>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>>>>
>>>>> You forgot:
>>>>>
>>>>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>>>>
>>>>> Also, not sure where this change will go, via Greg or Jiri, but ideally
>>>>> this should be picked for current -rc cycle since regression is
>>>>> introduced during latest merge window.
>>>>>
>>>>> I also would like to test it on qrb2210 rb1 where this regression is
>>>>> reproduciable.
>>>>
>>>> It doesn't seem that it fixes the regression on RB1 board:
>>>>
>>>>    INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
>>>>          Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>>>    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>    task:kworker/u16:3   state:D stack:0     pid:50    tgid:50    ppid:2      task_flags:0x4208060 flags:0x00000010
>>>>    Workqueue: async async_run_entry_fn
>>>>    Call trace:
>>>>     __switch_to+0xf0/0x1c0 (T)
>>>>     __schedule+0x358/0x99c
>>>>     schedule+0x34/0x11c
>>>>     rpm_resume+0x17c/0x6a0
>>>>     rpm_resume+0x2c4/0x6a0
>>>>     rpm_resume+0x2c4/0x6a0
>>>>     rpm_resume+0x2c4/0x6a0
>>>>     __pm_runtime_resume+0x50/0x9c
>>>>     __driver_probe_device+0x58/0x120
>>>>     driver_probe_device+0x3c/0x154
>>>>     __driver_attach_async_helper+0x4c/0xc0
>>>>     async_run_entry_fn+0x34/0xe0
>>>>     process_one_work+0x148/0x284
>>>>     worker_thread+0x2c4/0x3e0
>>>>     kthread+0x12c/0x210
>>>>     ret_from_fork+0x10/0x20
>>>>    INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
>>>>          Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>>>    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>    task:irq/92-4a8c000. state:D stack:0     pid:79    tgid:79    ppid:2      task_flags:0x208040 flags:0x00000010
>>>>    Call trace:
>>>>     __switch_to+0xf0/0x1c0 (T)
>>>>     __schedule+0x358/0x99c
>>>>     schedule+0x34/0x11c
>>>>     __synchronize_irq+0x90/0xcc
>>>>     disable_irq+0x3c/0x4c
>>>>     msm_pinmux_set_mux+0x3b4/0x45c
>>>>     pinmux_enable_setting+0x1fc/0x2d8
>>>>     pinctrl_commit_state+0xa0/0x260
>>>>     pinctrl_pm_select_default_state+0x4c/0xa0
>>>>     geni_se_resources_on+0xe8/0x154
>>>>     geni_serial_resource_state+0x8c/0xbc
>>>>     qcom_geni_serial_runtime_resume+0x3c/0x88
>>>>     pm_generic_runtime_resume+0x2c/0x44
>>>>     __rpm_callback+0x48/0x1e0
>>>>     rpm_callback+0x74/0x80
>>>>     rpm_resume+0x3bc/0x6a0
>>>>     __pm_runtime_resume+0x50/0x9c
>>>>     handle_threaded_wake_irq+0x30/0x80
>>>>     irq_thread_fn+0x2c/0xb0
>>>>     irq_thread+0x170/0x334
>>>>     kthread+0x12c/0x210
>>>>     ret_from_fork+0x10/0x20
>>>
>>> I can see call stack is mostly similar for yours and mine but not
>>> completely at initial calls.
>>>
>>> Yours dump:
>>>   >    qcom_geni_serial_runtime_resume+0x3c/0x88
>>>   >    pm_generic_runtime_resume+0x2c/0x44
>>>   >    __rpm_callback+0x48/0x1e0
>>>   >    rpm_callback+0x74/0x80
>>>   >    rpm_resume+0x3bc/0x6a0
>>>   >    __pm_runtime_resume+0x50/0x9c
>>>   >    handle_threaded_wake_irq+0x30/0x80
>>>
>>> Mine:
>>>   >>>      qcom_geni_serial_runtime_resume+0x4c/0x88
>>>   >>>      pm_generic_runtime_resume+0x2c/0x44
>>>   >>>      __genpd_runtime_resume+0x30/0x80
>>>   >>>      genpd_runtime_resume+0x114/0x29c
>>>   >>>      __rpm_callback+0x48/0x1d8
>>>   >>>      rpm_callback+0x6c/0x78
>>>   >>>      rpm_resume+0x530/0x750
>>>
>>>
>>> Can you please share what is DT file for this Board if possible?
>>> is there any usecase enabled on this SE instance?
>> 
>> Well, yeah, sorry, I didn't really compared backtraces line to line and
>> behaviour was exactly the same. I thought that the purpose was to fix
>> the regression reported earlier.
>> 
>> RB1 main dts files are qrb2210-rb1.dts and qcm2290.dtsi.
>> 
>> The similar board RB2 uses qrb4210-rb2.dts and sm4250.dtsi+sm6115.dtsi,
>> it is worth checking it as well.
>> For testing here I didn't use anything extra (the only change was wifi fix
>> from Loic); I tested -master and linux-next usually.
>> 
>> If you can tell me what is SE instance I may be able to answer. But
>> as far as I know it is not a part of any infrastructure or CI machinery.
>> I just boot the board and see if it works, if it does then I rebuild and
>> test my changes (audio).
>
> I'm actively working on this and experimenting with various wakeup
> scenarios. I'll share the updated patch as soon as possible.
>
> Should the fix be sent as V2 of this patch or as a new patch (V1) if it
> ends up in a different subsystem (pinctrl)?

Wait, I am a bit lost. Are there two regressions? And does this patch
target only one of them?
Are there two fixes now for different problems?
If they are not related (independent) then I'd split them, but that is
nothing exceptional -- just the standard rules should apply.

Thanks,
Alexey
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Praveen Talari 4 months, 3 weeks ago
Hi Alexey,

On 9/15/2025 3:09 PM, Alexey Klimov wrote:
> (removing <quic_mnaresh@quicinc.com> from c/c -- too many mail not delivered)
> 
> Hi Praveen,
> 
> On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
>> Hi Alexey,
>>
>> Really appreciate you waiting!
>>
>> On 9/11/2025 2:30 PM, Alexey Klimov wrote:
>>> Hi Praveen,
>>>
>>> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>>>> Hi Alexy,
>>>>
>>>> Thank you for update.
>>>>
>>>> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>>>>
>>>>> (adding Krzysztof to c/c)
>>>>>
>>>>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>>>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>>>>> A deadlock is observed in the qcom_geni_serial driver during runtime
>>>>>>> resume. This occurs when the pinctrl subsystem reconfigures device pins
>>>>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>>>>> leading to system instability.
>>>>>>>
>>>>>>> The critical call trace leading to the deadlock is:
>>>>>>>
>>>>>>>        Call trace:
>>>>>>>        __switch_to+0xe0/0x120
>>>>>>>        __schedule+0x39c/0x978
>>>>>>>        schedule+0x5c/0xf8
>>>>>>>        __synchronize_irq+0x88/0xb4
>>>>>>>        disable_irq+0x3c/0x4c
>>>>>>>        msm_pinmux_set_mux+0x508/0x644
>>>>>>>        pinmux_enable_setting+0x190/0x2dc
>>>>>>>        pinctrl_commit_state+0x13c/0x208
>>>>>>>        pinctrl_pm_select_default_state+0x4c/0xa4
>>>>>>>        geni_se_resources_on+0xe8/0x154
>>>>>>>        qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>>>>        pm_generic_runtime_resume+0x2c/0x44
>>>>>>>        __genpd_runtime_resume+0x30/0x80
>>>>>>>        genpd_runtime_resume+0x114/0x29c
>>>>>>>        __rpm_callback+0x48/0x1d8
>>>>>>>        rpm_callback+0x6c/0x78
>>>>>>>        rpm_resume+0x530/0x750
>>>>>>>        __pm_runtime_resume+0x50/0x94
>>>>>>>        handle_threaded_wake_irq+0x30/0x94
>>>>>>>        irq_thread_fn+0x2c/xa8
>>>>>>>        irq_thread+0x160/x248
>>>>>>>        kthread+0x110/x114
>>>>>>>        ret_from_fork+0x10/x20
>>>>>>>
>>>>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>>>>> runtime suspend/resume callbacks. In the runtime resume callback, call
>>>>>>> disable_irq_wake() before enabling resources. This preemptively
>>>>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>>>>> interrupt management calls to proceed without conflict. An error path
>>>>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>>>>
>>>>>>> Conversely, in runtime suspend, call enable_irq_wake() after resources
>>>>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>>>>> source only once the device has fully entered its low-power state. An
>>>>>>> error path handles disabling the wakeup IRQ if the suspend operation
>>>>>>> fails.
>>>>>>>
>>>>>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
>>>>>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>>>>>
>>>>>> You forgot:
>>>>>>
>>>>>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>>>>>
>>>>>> Also, not sure where this change will go, via Greg or Jiri, but ideally
>>>>>> this should be picked for current -rc cycle since regression is
>>>>>> introduced during latest merge window.
>>>>>>
>>>>>> I also would like to test it on qrb2210 rb1 where this regression is
>>>>>> reproduciable.
>>>>>
>>>>> It doesn't seem that it fixes the regression on RB1 board:
>>>>>
>>>>>     INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
>>>>>           Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>>>>     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>     task:kworker/u16:3   state:D stack:0     pid:50    tgid:50    ppid:2      task_flags:0x4208060 flags:0x00000010
>>>>>     Workqueue: async async_run_entry_fn
>>>>>     Call trace:
>>>>>      __switch_to+0xf0/0x1c0 (T)
>>>>>      __schedule+0x358/0x99c
>>>>>      schedule+0x34/0x11c
>>>>>      rpm_resume+0x17c/0x6a0
>>>>>      rpm_resume+0x2c4/0x6a0
>>>>>      rpm_resume+0x2c4/0x6a0
>>>>>      rpm_resume+0x2c4/0x6a0
>>>>>      __pm_runtime_resume+0x50/0x9c
>>>>>      __driver_probe_device+0x58/0x120
>>>>>      driver_probe_device+0x3c/0x154
>>>>>      __driver_attach_async_helper+0x4c/0xc0
>>>>>      async_run_entry_fn+0x34/0xe0
>>>>>      process_one_work+0x148/0x284
>>>>>      worker_thread+0x2c4/0x3e0
>>>>>      kthread+0x12c/0x210
>>>>>      ret_from_fork+0x10/0x20
>>>>>     INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
>>>>>           Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>>>>     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>     task:irq/92-4a8c000. state:D stack:0     pid:79    tgid:79    ppid:2      task_flags:0x208040 flags:0x00000010
>>>>>     Call trace:
>>>>>      __switch_to+0xf0/0x1c0 (T)
>>>>>      __schedule+0x358/0x99c
>>>>>      schedule+0x34/0x11c
>>>>>      __synchronize_irq+0x90/0xcc
>>>>>      disable_irq+0x3c/0x4c
>>>>>      msm_pinmux_set_mux+0x3b4/0x45c
>>>>>      pinmux_enable_setting+0x1fc/0x2d8
>>>>>      pinctrl_commit_state+0xa0/0x260
>>>>>      pinctrl_pm_select_default_state+0x4c/0xa0
>>>>>      geni_se_resources_on+0xe8/0x154
>>>>>      geni_serial_resource_state+0x8c/0xbc
>>>>>      qcom_geni_serial_runtime_resume+0x3c/0x88
>>>>>      pm_generic_runtime_resume+0x2c/0x44
>>>>>      __rpm_callback+0x48/0x1e0
>>>>>      rpm_callback+0x74/0x80
>>>>>      rpm_resume+0x3bc/0x6a0
>>>>>      __pm_runtime_resume+0x50/0x9c
>>>>>      handle_threaded_wake_irq+0x30/0x80
>>>>>      irq_thread_fn+0x2c/0xb0
>>>>>      irq_thread+0x170/0x334
>>>>>      kthread+0x12c/0x210
>>>>>      ret_from_fork+0x10/0x20
>>>>
>>>> I can see call stack is mostly similar for yours and mine but not
>>>> completely at initial calls.
>>>>
>>>> Yours dump:
>>>>    >    qcom_geni_serial_runtime_resume+0x3c/0x88
>>>>    >    pm_generic_runtime_resume+0x2c/0x44
>>>>    >    __rpm_callback+0x48/0x1e0
>>>>    >    rpm_callback+0x74/0x80
>>>>    >    rpm_resume+0x3bc/0x6a0
>>>>    >    __pm_runtime_resume+0x50/0x9c
>>>>    >    handle_threaded_wake_irq+0x30/0x80
>>>>
>>>> Mine:
>>>>    >>>      qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>    >>>      pm_generic_runtime_resume+0x2c/0x44
>>>>    >>>      __genpd_runtime_resume+0x30/0x80
>>>>    >>>      genpd_runtime_resume+0x114/0x29c
>>>>    >>>      __rpm_callback+0x48/0x1d8
>>>>    >>>      rpm_callback+0x6c/0x78
>>>>    >>>      rpm_resume+0x530/0x750
>>>>
>>>>
>>>> Can you please share what is DT file for this Board if possible?
>>>> is there any usecase enabled on this SE instance?
>>>
>>> Well, yeah, sorry, I didn't really compared backtraces line to line and
>>> behaviour was exactly the same. I thought that the purpose was to fix
>>> the regression reported earlier.
>>>
>>> RB1 main dts files are qrb2210-rb1.dts and qcm2290.dtsi.
>>>
>>> The similar board RB2 uses qrb4210-rb2.dts and sm4250.dtsi+sm6115.dtsi,
>>> it is worth checking it as well.
>>> For testing here I didn't use anything extra (the only change was wifi fix
>>> from Loic); I tested -master and linux-next usually.
>>>
>>> If you can tell me what is SE instance I may be able to answer. But
>>> as far as I know it is not a part of any infrastructure or CI machinery.
>>> I just boot the board and see if it works, if it does then I rebuild and
>>> test my changes (audio).
>>
>> I'm actively working on this and experimenting with various wakeup
>> scenarios. I'll share the updated patch as soon as possible.
>>
>> Should the fix be sent as V2 of this patch or as a new patch (V1) if it
>> ends up in a different subsystem (pinctrl)?
> 
> Wait, I am a bit lost. Are there two regressions? And does this patch
> target only one of them?
I simulated it on a different target (SC7280) and it is the same issue.
> Are there two fixes now for different problems?
The problem is the same.
> If they are not related (independent) then I'd split them, but that is
> nothing exceptional -- just the standard rules should apply.
I am fixing this issue in the pinctrl subsystem.

Please guide me on this.
Should the fix be sent as V2 of this patch or as a new patch (V1) if it
ends up in a different subsystem (pinctrl)?

Thanks,
Praveen
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Praveen Talari 4 months, 3 weeks ago
Hi Alexey

Thank you for your support.

On 9/15/2025 7:55 PM, Praveen Talari wrote:
> Hi Alexey,
> 
> On 9/15/2025 3:09 PM, Alexey Klimov wrote:
>> (removing <quic_mnaresh@quicinc.com> from c/c -- too many mail not 
>> delivered)
>>
>> Hi Praveen,
>>
>> On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
>>> Hi Alexey,
>>>
>>> Really appreciate you waiting!
>>>
>>> On 9/11/2025 2:30 PM, Alexey Klimov wrote:
>>>> Hi Praveen,
>>>>
>>>> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>>>>> Hi Alexy,
>>>>>
>>>>> Thank you for update.
>>>>>
>>>>> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>>>>>
>>>>>> (adding Krzysztof to c/c)
>>>>>>
>>>>>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>>>>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>>>>>> A deadlock is observed in the qcom_geni_serial driver during 
>>>>>>>> runtime
>>>>>>>> resume. This occurs when the pinctrl subsystem reconfigures 
>>>>>>>> device pins
>>>>>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>>>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>>>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>>>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>>>>>> leading to system instability.
>>>>>>>>
>>>>>>>> The critical call trace leading to the deadlock is:
>>>>>>>>
>>>>>>>>        Call trace:
>>>>>>>>        __switch_to+0xe0/0x120
>>>>>>>>        __schedule+0x39c/0x978
>>>>>>>>        schedule+0x5c/0xf8
>>>>>>>>        __synchronize_irq+0x88/0xb4
>>>>>>>>        disable_irq+0x3c/0x4c
>>>>>>>>        msm_pinmux_set_mux+0x508/0x644
>>>>>>>>        pinmux_enable_setting+0x190/0x2dc
>>>>>>>>        pinctrl_commit_state+0x13c/0x208
>>>>>>>>        pinctrl_pm_select_default_state+0x4c/0xa4
>>>>>>>>        geni_se_resources_on+0xe8/0x154
>>>>>>>>        qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>>>>>        pm_generic_runtime_resume+0x2c/0x44
>>>>>>>>        __genpd_runtime_resume+0x30/0x80
>>>>>>>>        genpd_runtime_resume+0x114/0x29c
>>>>>>>>        __rpm_callback+0x48/0x1d8
>>>>>>>>        rpm_callback+0x6c/0x78
>>>>>>>>        rpm_resume+0x530/0x750
>>>>>>>>        __pm_runtime_resume+0x50/0x94
>>>>>>>>        handle_threaded_wake_irq+0x30/0x94
>>>>>>>>        irq_thread_fn+0x2c/xa8
>>>>>>>>        irq_thread+0x160/x248
>>>>>>>>        kthread+0x110/x114
>>>>>>>>        ret_from_fork+0x10/x20
>>>>>>>>
>>>>>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>>>>>> runtime suspend/resume callbacks. In the runtime resume 
>>>>>>>> callback, call
>>>>>>>> disable_irq_wake() before enabling resources. This preemptively
>>>>>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>>>>>> interrupt management calls to proceed without conflict. An error 
>>>>>>>> path
>>>>>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>>>>>
>>>>>>>> Conversely, in runtime suspend, call enable_irq_wake() after 
>>>>>>>> resources
>>>>>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>>>>>> source only once the device has fully entered its low-power 
>>>>>>>> state. An
>>>>>>>> error path handles disabling the wakeup IRQ if the suspend 
>>>>>>>> operation
>>>>>>>> fails.
>>>>>>>>
>>>>>>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for 
>>>>>>>> serial driver")
>>>>>>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>>>>>>
>>>>>>> You forgot:
>>>>>>>
>>>>>>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>>>>>>
>>>>>>> Also, not sure where this change will go, via Greg or Jiri, but 
>>>>>>> ideally
>>>>>>> this should be picked for current -rc cycle since regression is
>>>>>>> introduced during latest merge window.
>>>>>>>
>>>>>>> I also would like to test it on qrb2210 rb1 where this regression is
>>>>>>> reproduciable.

Since I don't have this board, could you kindly validate the new change 
and run a quick test on your end?

diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c b/drivers/pinctrl/qcom/pinctrl-msm.c
index 83eb075b6bfa..3d6601dc6fcc 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.c
+++ b/drivers/pinctrl/qcom/pinctrl-msm.c
@@ -215,7 +215,7 @@ static int msm_pinmux_set_mux(struct pinctrl_dev *pctldev,
          */
         if (d && i != gpio_func &&
             !test_and_set_bit(d->hwirq, pctrl->disabled_for_mux))
-               disable_irq(irq);
+               disable_irq_nosync(irq);

         raw_spin_lock_irqsave(&pctrl->lock, flags);
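
For anyone following the thread, here is a rough userspace sketch of the
reasoning behind the one-line change above. It is only a toy model, and it
rests on an assumption about the reported trace: the hung irq/92-4a8c000.
task appears to be the threaded handler of the same interrupt that
msm_pinmux_set_mux() passes to disable_irq(), so the synchronous variant
would end up waiting for its own caller. struct toy_irq, toy_disable_irq()
and toy_disable_irq_nosync() are made-up names for illustration, not kernel
APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one interrupt line; NOT the kernel's struct irq_desc. */
struct toy_irq {
	int depth;           /* disable nesting count, 0 means enabled */
	int threads_active;  /* threaded handlers currently running */
};

/* Outcome of a disable request in this model. */
enum toy_result {
	TOY_DISABLED,        /* line disabled, caller continues */
	TOY_WOULD_BLOCK,     /* caller waits for another context to finish */
	TOY_SELF_DEADLOCK,   /* caller waits for itself and never returns */
};

/* Models disable_irq_nosync(): bump the depth and return immediately. */
static enum toy_result toy_disable_irq_nosync(struct toy_irq *irq)
{
	irq->depth++;
	return TOY_DISABLED;
}

/*
 * Models disable_irq(): disable the line, then wait until no handler for
 * it is still running. in_own_thread tells the model whether the caller
 * is this line's own threaded handler (hypothetical parameter).
 */
static enum toy_result toy_disable_irq(struct toy_irq *irq, bool in_own_thread)
{
	/* A caller running in the handler thread implies an active thread. */
	assert(!in_own_thread || irq->threads_active > 0);

	irq->depth++;
	if (irq->threads_active == 0)
		return TOY_DISABLED;

	/* Waiting on a handler that is the caller can never finish. */
	return in_own_thread ? TOY_SELF_DEADLOCK : TOY_WOULD_BLOCK;
}
```

In this model, toy_disable_irq() called from the line's own handler thread
reports TOY_SELF_DEADLOCK, while toy_disable_irq_nosync() returns at once.
That mirrors the documented difference between the two kernel calls:
disable_irq() waits for running handlers of that interrupt to complete,
disable_irq_nosync() does not.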

Thanks,
Praveen Talari

>>>>>>
>>>>>> It doesn't seem that it fixes the regression on RB1 board:
>>>>>>
>>>>>>     INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
>>>>>>           Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>>>>>     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
>>>>>> this message.
>>>>>>     task:kworker/u16:3   state:D stack:0     pid:50    tgid:50    
>>>>>> ppid:2      task_flags:0x4208060 flags:0x00000010
>>>>>>     Workqueue: async async_run_entry_fn
>>>>>>     Call trace:
>>>>>>      __switch_to+0xf0/0x1c0 (T)
>>>>>>      __schedule+0x358/0x99c
>>>>>>      schedule+0x34/0x11c
>>>>>>      rpm_resume+0x17c/0x6a0
>>>>>>      rpm_resume+0x2c4/0x6a0
>>>>>>      rpm_resume+0x2c4/0x6a0
>>>>>>      rpm_resume+0x2c4/0x6a0
>>>>>>      __pm_runtime_resume+0x50/0x9c
>>>>>>      __driver_probe_device+0x58/0x120
>>>>>>      driver_probe_device+0x3c/0x154
>>>>>>      __driver_attach_async_helper+0x4c/0xc0
>>>>>>      async_run_entry_fn+0x34/0xe0
>>>>>>      process_one_work+0x148/0x284
>>>>>>      worker_thread+0x2c4/0x3e0
>>>>>>      kthread+0x12c/0x210
>>>>>>      ret_from_fork+0x10/0x20
>>>>>>     INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
>>>>>>           Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>>>>>     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
>>>>>> this message.
>>>>>>     task:irq/92-4a8c000. state:D stack:0     pid:79    tgid:79    
>>>>>> ppid:2      task_flags:0x208040 flags:0x00000010
>>>>>>     Call trace:
>>>>>>      __switch_to+0xf0/0x1c0 (T)
>>>>>>      __schedule+0x358/0x99c
>>>>>>      schedule+0x34/0x11c
>>>>>>      __synchronize_irq+0x90/0xcc
>>>>>>      disable_irq+0x3c/0x4c
>>>>>>      msm_pinmux_set_mux+0x3b4/0x45c
>>>>>>      pinmux_enable_setting+0x1fc/0x2d8
>>>>>>      pinctrl_commit_state+0xa0/0x260
>>>>>>      pinctrl_pm_select_default_state+0x4c/0xa0
>>>>>>      geni_se_resources_on+0xe8/0x154
>>>>>>      geni_serial_resource_state+0x8c/0xbc
>>>>>>      qcom_geni_serial_runtime_resume+0x3c/0x88
>>>>>>      pm_generic_runtime_resume+0x2c/0x44
>>>>>>      __rpm_callback+0x48/0x1e0
>>>>>>      rpm_callback+0x74/0x80
>>>>>>      rpm_resume+0x3bc/0x6a0
>>>>>>      __pm_runtime_resume+0x50/0x9c
>>>>>>      handle_threaded_wake_irq+0x30/0x80
>>>>>>      irq_thread_fn+0x2c/0xb0
>>>>>>      irq_thread+0x170/0x334
>>>>>>      kthread+0x12c/0x210
>>>>>>      ret_from_fork+0x10/0x20
>>>>>
>>>>> I can see call stack is mostly similar for yours and mine but not
>>>>> completely at initial calls.
>>>>>
>>>>> Yours dump:
>>>>>    >    qcom_geni_serial_runtime_resume+0x3c/0x88
>>>>>    >    pm_generic_runtime_resume+0x2c/0x44
>>>>>    >    __rpm_callback+0x48/0x1e0
>>>>>    >    rpm_callback+0x74/0x80
>>>>>    >    rpm_resume+0x3bc/0x6a0
>>>>>    >    __pm_runtime_resume+0x50/0x9c
>>>>>    >    handle_threaded_wake_irq+0x30/0x80
>>>>>
>>>>> Mine:
>>>>>    >>>      qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>>    >>>      pm_generic_runtime_resume+0x2c/0x44
>>>>>    >>>      __genpd_runtime_resume+0x30/0x80
>>>>>    >>>      genpd_runtime_resume+0x114/0x29c
>>>>>    >>>      __rpm_callback+0x48/0x1d8
>>>>>    >>>      rpm_callback+0x6c/0x78
>>>>>    >>>      rpm_resume+0x530/0x750
>>>>>
>>>>>
>>>>> Can you please share the DT file for this board, if possible?
>>>>> Is there any use case enabled on this SE instance?
>>>>
>>>> Well, yeah, sorry, I didn't really compare the backtraces line by line, and
>>>> the behaviour was exactly the same. I thought that the purpose was to fix
>>>> the regression reported earlier.
>>>>
>>>> RB1 main dts files are qrb2210-rb1.dts and qcm2290.dtsi.
>>>>
>>>> The similar board RB2 uses qrb4210-rb2.dts and sm4250.dtsi+sm6115.dtsi,
>>>> it is worth checking it as well.
>>>> For testing here I didn't use anything extra (the only change was the
>>>> wifi fix from Loic); I usually test -master and linux-next.
>>>>
>>>> If you can tell me what an SE instance is, I may be able to answer. But
>>>> as far as I know it is not a part of any infrastructure or CI machinery.
>>>> I just boot the board and see if it works; if it does, then I rebuild and
>>>> test my changes (audio).
>>>
>>> I'm actively working on this and experimenting with various wakeup
>>> scenarios. I’ll share the updated patch as soon as possible.
>>>
>>> Should we include the fix in v2, or post it as a new patch (v1), if the fix
>>> originates from a different subsystem (pinctrl)?
>>
>> Wait, I am a bit lost. Are there two regressions? And does this patch only
>> target one of them?
> I simulated it on a different target (SC7280), and it is the same issue.
>> Are there two fixes now for different problems?
> The problem is the same.
>> If they are not related (independent) then I'd split it, but it is not
>> something exceptional -- just standard rules should apply.
> I am fixing this issue from the pinctrl subsystem.
> 
> Please guide me on this.
> Should we include the fix in v2, or post it as a new patch (v1), if the fix
> originates from a different subsystem (pinctrl)?
> 
> Thanks,
> Praveen
>>
>> Thanks,
>> Alexey
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Jorge Ramirez 4 months, 3 weeks ago
On 16/09/25 12:20:25, Praveen Talari wrote:
> Hi Alexey
> 
> Thank you for your support.
> 
> On 9/15/2025 7:55 PM, Praveen Talari wrote:
> > Hi Alexey,
> > 
> > On 9/15/2025 3:09 PM, Alexey Klimov wrote:
> > > (removing <quic_mnaresh@quicinc.com> from c/c -- too many mail not
> > > delivered)
> > > 
> > > Hi Praveen,
> > > 
> > > On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
> > > > Hi Alexey,
> > > > 
> > > > Really appreciate you waiting!
> > > > 
> > > > On 9/11/2025 2:30 PM, Alexey Klimov wrote:
> > > > > Hi Praveen,
> > > > > 
> > > > > On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
> > > > > > > Hi Alexey,
> > > > > > 
> > > > > > Thank you for update.
> > > > > > 
> > > > > > On 9/10/2025 1:35 AM, Alexey Klimov wrote:
> > > > > > > 
> > > > > > > (adding Krzysztof to c/c)
> > > > > > > 
> > > > > > > On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
> > > > > > > > On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
> > > > > > > > > A deadlock is observed in the
> > > > > > > > > qcom_geni_serial driver during runtime
> > > > > > > > > resume. This occurs when the pinctrl
> > > > > > > > > subsystem reconfigures device pins
> > > > > > > > > via msm_pinmux_set_mux() while the serial device's interrupt is an
> > > > > > > > > active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
> > > > > > > > > __synchronize_irq(), conflicting with the active wakeup state and
> > > > > > > > > causing the IRQ thread to enter an uninterruptible (D-state) sleep,
> > > > > > > > > leading to system instability.
> > > > > > > > > 
> > > > > > > > > The critical call trace leading to the deadlock is:
> > > > > > > > > 
> > > > > > > > >        Call trace:
> > > > > > > > >        __switch_to+0xe0/0x120
> > > > > > > > >        __schedule+0x39c/0x978
> > > > > > > > >        schedule+0x5c/0xf8
> > > > > > > > >        __synchronize_irq+0x88/0xb4
> > > > > > > > >        disable_irq+0x3c/0x4c
> > > > > > > > >        msm_pinmux_set_mux+0x508/0x644
> > > > > > > > >        pinmux_enable_setting+0x190/0x2dc
> > > > > > > > >        pinctrl_commit_state+0x13c/0x208
> > > > > > > > >        pinctrl_pm_select_default_state+0x4c/0xa4
> > > > > > > > >        geni_se_resources_on+0xe8/0x154
> > > > > > > > >        qcom_geni_serial_runtime_resume+0x4c/0x88
> > > > > > > > >        pm_generic_runtime_resume+0x2c/0x44
> > > > > > > > >        __genpd_runtime_resume+0x30/0x80
> > > > > > > > >        genpd_runtime_resume+0x114/0x29c
> > > > > > > > >        __rpm_callback+0x48/0x1d8
> > > > > > > > >        rpm_callback+0x6c/0x78
> > > > > > > > >        rpm_resume+0x530/0x750
> > > > > > > > >        __pm_runtime_resume+0x50/0x94
> > > > > > > > >        handle_threaded_wake_irq+0x30/0x94
> > > > > > > > >        irq_thread_fn+0x2c/xa8
> > > > > > > > >        irq_thread+0x160/x248
> > > > > > > > >        kthread+0x110/x114
> > > > > > > > >        ret_from_fork+0x10/x20
> > > > > > > > > 
> > > > > > > > > To resolve this, explicitly manage the wakeup IRQ state within the
> > > > > > > > > runtime suspend/resume callbacks. In the
> > > > > > > > > runtime resume callback, call
> > > > > > > > > disable_irq_wake() before enabling resources. This preemptively
> > > > > > > > > removes the "wakeup" capability from the IRQ, allowing subsequent
> > > > > > > > > interrupt management calls to proceed
> > > > > > > > > without conflict. An error path
> > > > > > > > > re-enables the wakeup IRQ if resource enablement fails.
> > > > > > > > > 
> > > > > > > > > Conversely, in runtime suspend, call
> > > > > > > > > enable_irq_wake() after resources
> > > > > > > > > are disabled. This ensures the interrupt is configured as a wakeup
> > > > > > > > > source only once the device has fully
> > > > > > > > > entered its low-power state. An
> > > > > > > > > error path handles disabling the wakeup IRQ
> > > > > > > > > if the suspend operation
> > > > > > > > > fails.
> > > > > > > > > 
> > > > > > > > > Fixes: 1afa70632c39 ("serial: qcom-geni:
> > > > > > > > > Enable PM runtime for serial driver")
> > > > > > > > > Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
> > > > > > > > 
> > > > > > > > You forgot:
> > > > > > > > 
> > > > > > > > Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
> > > > > > > > 
> > > > > > > > Also, not sure where this change will go, via
> > > > > > > > Greg or Jiri, but ideally
> > > > > > > > this should be picked for current -rc cycle since regression is
> > > > > > > > introduced during latest merge window.
> > > > > > > > 
> > > > > > > > I also would like to test it on qrb2210 rb1 where this regression is
> > > > > > > > > reproducible.
> 
> Since I don't have this board, could you kindly validate the new change and
> run a quick test on your end?
> 
> diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c
> b/drivers/pinctrl/qcom/pinctrl-msm.c
> index 83eb075b6bfa..3d6601dc6fcc 100644
> --- a/drivers/pinctrl/qcom/pinctrl-msm.c
> +++ b/drivers/pinctrl/qcom/pinctrl-msm.c
> @@ -215,7 +215,7 @@ static int msm_pinmux_set_mux(struct pinctrl_dev
> *pctldev,
>          */
>         if (d && i != gpio_func &&
>             !test_and_set_bit(d->hwirq, pctrl->disabled_for_mux))
> -               disable_irq(irq);
> +               disable_irq_nosync(irq);
> 
>         raw_spin_lock_irqsave(&pctrl->lock, flags);


Sorry Praveen, I didn't see this proposal. Testing on my end as well.
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Jorge Ramirez 4 months, 3 weeks ago
On 16/09/25 16:39:00, Jorge Ramirez wrote:
> On 16/09/25 12:20:25, Praveen Talari wrote:
> > Hi Alexey
> > 
> > Thank you for your support.
> > 
> > On 9/15/2025 7:55 PM, Praveen Talari wrote:
> > > Hi Alexey,
> > > 
> > > On 9/15/2025 3:09 PM, Alexey Klimov wrote:
> > > > (removing <quic_mnaresh@quicinc.com> from c/c -- too many mail not
> > > > delivered)
> > > > 
> > > > Hi Praveen,
> > > > 
> > > > On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
> > > > > Hi Alexey,
> > > > > 
> > > > > Really appreciate you waiting!
> > > > > 
> > > > > On 9/11/2025 2:30 PM, Alexey Klimov wrote:
> > > > > > Hi Praveen,
> > > > > > 
> > > > > > On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
> > > > > > > > Hi Alexey,
> > > > > > > 
> > > > > > > Thank you for update.
> > > > > > > 
> > > > > > > On 9/10/2025 1:35 AM, Alexey Klimov wrote:
> > > > > > > > 
> > > > > > > > (adding Krzysztof to c/c)
> > > > > > > > 
> > > > > > > > On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
> > > > > > > > > On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
> > > > > > > > > > A deadlock is observed in the
> > > > > > > > > > qcom_geni_serial driver during runtime
> > > > > > > > > > resume. This occurs when the pinctrl
> > > > > > > > > > subsystem reconfigures device pins
> > > > > > > > > > via msm_pinmux_set_mux() while the serial device's interrupt is an
> > > > > > > > > > active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
> > > > > > > > > > __synchronize_irq(), conflicting with the active wakeup state and
> > > > > > > > > > causing the IRQ thread to enter an uninterruptible (D-state) sleep,
> > > > > > > > > > leading to system instability.
> > > > > > > > > > 
> > > > > > > > > > The critical call trace leading to the deadlock is:
> > > > > > > > > > 
> > > > > > > > > >        Call trace:
> > > > > > > > > >        __switch_to+0xe0/0x120
> > > > > > > > > >        __schedule+0x39c/0x978
> > > > > > > > > >        schedule+0x5c/0xf8
> > > > > > > > > >        __synchronize_irq+0x88/0xb4
> > > > > > > > > >        disable_irq+0x3c/0x4c
> > > > > > > > > >        msm_pinmux_set_mux+0x508/0x644
> > > > > > > > > >        pinmux_enable_setting+0x190/0x2dc
> > > > > > > > > >        pinctrl_commit_state+0x13c/0x208
> > > > > > > > > >        pinctrl_pm_select_default_state+0x4c/0xa4
> > > > > > > > > >        geni_se_resources_on+0xe8/0x154
> > > > > > > > > >        qcom_geni_serial_runtime_resume+0x4c/0x88
> > > > > > > > > >        pm_generic_runtime_resume+0x2c/0x44
> > > > > > > > > >        __genpd_runtime_resume+0x30/0x80
> > > > > > > > > >        genpd_runtime_resume+0x114/0x29c
> > > > > > > > > >        __rpm_callback+0x48/0x1d8
> > > > > > > > > >        rpm_callback+0x6c/0x78
> > > > > > > > > >        rpm_resume+0x530/0x750
> > > > > > > > > >        __pm_runtime_resume+0x50/0x94
> > > > > > > > > >        handle_threaded_wake_irq+0x30/0x94
> > > > > > > > > >        irq_thread_fn+0x2c/xa8
> > > > > > > > > >        irq_thread+0x160/x248
> > > > > > > > > >        kthread+0x110/x114
> > > > > > > > > >        ret_from_fork+0x10/x20
> > > > > > > > > > 
> > > > > > > > > > To resolve this, explicitly manage the wakeup IRQ state within the
> > > > > > > > > > runtime suspend/resume callbacks. In the
> > > > > > > > > > runtime resume callback, call
> > > > > > > > > > disable_irq_wake() before enabling resources. This preemptively
> > > > > > > > > > removes the "wakeup" capability from the IRQ, allowing subsequent
> > > > > > > > > > interrupt management calls to proceed
> > > > > > > > > > without conflict. An error path
> > > > > > > > > > re-enables the wakeup IRQ if resource enablement fails.
> > > > > > > > > > 
> > > > > > > > > > Conversely, in runtime suspend, call
> > > > > > > > > > enable_irq_wake() after resources
> > > > > > > > > > are disabled. This ensures the interrupt is configured as a wakeup
> > > > > > > > > > source only once the device has fully
> > > > > > > > > > entered its low-power state. An
> > > > > > > > > > error path handles disabling the wakeup IRQ
> > > > > > > > > > if the suspend operation
> > > > > > > > > > fails.
> > > > > > > > > > 
> > > > > > > > > > Fixes: 1afa70632c39 ("serial: qcom-geni:
> > > > > > > > > > Enable PM runtime for serial driver")
> > > > > > > > > > Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
> > > > > > > > > 
> > > > > > > > > You forgot:
> > > > > > > > > 
> > > > > > > > > Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
> > > > > > > > > 
> > > > > > > > > Also, not sure where this change will go, via
> > > > > > > > > Greg or Jiri, but ideally
> > > > > > > > > this should be picked for current -rc cycle since regression is
> > > > > > > > > introduced during latest merge window.
> > > > > > > > > 
> > > > > > > > > I also would like to test it on qrb2210 rb1 where this regression is
> > > > > > > > > > reproducible.
> > 
> > Since I don't have this board, could you kindly validate the new change and
> > run a quick test on your end?
> > 
> > diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c
> > b/drivers/pinctrl/qcom/pinctrl-msm.c
> > index 83eb075b6bfa..3d6601dc6fcc 100644
> > --- a/drivers/pinctrl/qcom/pinctrl-msm.c
> > +++ b/drivers/pinctrl/qcom/pinctrl-msm.c
> > @@ -215,7 +215,7 @@ static int msm_pinmux_set_mux(struct pinctrl_dev
> > *pctldev,
> >          */
> >         if (d && i != gpio_func &&
> >             !test_and_set_bit(d->hwirq, pctrl->disabled_for_mux))
> > -               disable_irq(irq);
> > +               disable_irq_nosync(irq);
> > 
> >         raw_spin_lock_irqsave(&pctrl->lock, flags);
> 
> 
> sorry Praveen, didnt see this proposal. testing on my end as well.
> 

just tested on my end and all modules load - deadlocked before this
update so there is progress (now we can load the network driver)

However, I can still see irq/92 (threaded) stuck in D-state inside runtime PM:

root@qrb2210-rb1-core-kit:~# echo w > /proc/sysrq-trigger
[  498.247349] sysrq: Show Blocked State
[  498.251190] task:irq/92-4a8c000. state:D stack:0     pid:80    tgid:80    ppid:2      task_flags:0x208040 flags:0x00000010
[  498.262334] Call trace:
[  498.264812]  __switch_to+0xf0/0x1c0 (T)
[  498.268777]  __schedule+0x110/0x9bc


with irq 92 being:
92: 199870  0  0   0  msmgpio  11 Level     4a8c000.serial:wakeup

The trace changes over time, but it is always irq/92:

root@qrb2210-rb1-core-kit:~# echo w > /proc/sysrq-trigger
[  613.019101] sysrq: Show Blocked State                                                                                                                              
[  613.023055] task:irq/92-4a8c000. state:D stack:0     pid:80    tgid:80    ppid:2      task_flags:0x208040 flags:0x00000010                                         
[  613.034189] Call trace:                                                                                                                                            
[  613.036770]  __switch_to+0xf0/0x1c0 (T)                                                                                                                            
[  613.040779]  __schedule+0x35c/0x9bc                                                                                                                                
[  613.044412]  schedule+0x34/0x110                                                                                                                                   
[  613.047782]  rpm_resume+0x17c/0x690                                                                                                                                
[  613.051359]  __pm_runtime_resume+0x4c/0x98                                                                                                                         
[  613.055556]  handle_threaded_wake_irq+0x30/0x80                                                                                                                    
[  613.060168]  irq_thread_fn+0x28/0xa8                                                                                                                               
[  613.063864]  irq_thread+0x178/0x338                                                                                                                                
[  613.067434]  kthread+0x12c/0x210                                                                                                                                   
[  613.070735]  ret_from_fork+0x10/0x20                                                                                                                               
root@qrb2210-rb1-core-kit:~#                                                                                                                                          
root@qrb2210-rb1-core-kit:~# echo w > /proc/sysrq-trigger                                                                                                             
[  617.586960] sysrq: Show Blocked State                                                              
[  617.590771] task:irq/92-4a8c000. state:D stack:0     pid:80    tgid:80    ppid:2      task_flags:0x208040 flags:0x00000010                                        
[  617.601906] Call trace:                                                                            
[  617.604442]  __switch_to+0xf0/0x1c0 (T)                                                            
[  617.608408]  __schedule+0x35c/0x9bc                                                                
[  617.612074]  0x766c7362                                                                            
root@qrb2210-rb1-core-kit:~#                                                                          
root@qrb2210-rb1-core-kit:~#                                                                                                                                          
root@qrb2210-rb1-core-kit:~# echo w > /proc/sysrq-trigger                           
[  619.656937] sysrq: Show Blocked State                                                                                                                              
[  619.660847] task:irq/92-4a8c000. state:D stack:0     pid:80    tgid:80    ppid:2      task_flags:0x208040 flags:0x00000010
[  619.672009] Call trace:                                                                                                                                            
[  619.674531]  __switch_to+0xf0/0x1c0 (T)                                                                                                                            
[  619.678508]  __schedule+0x35c/0x9bc                                                                                                                                
[  619.682102]  schedule+0x34/0x110                                                                                                                                   
[  619.685488]  schedule_timeout+0x80/0x104                                                                                                                           
root@qrb2210-rb1-core-kit:~#                                                                                                                                          
root@qrb2210-rb1-core-kit:~#                                                                                                                                          
root@qrb2210-rb1-core-kit:~# echo w > /proc/sysrq-trigger                           
[  624.786811] sysrq: Show Blocked State                                                                                                                              
root@qrb2210-rb1-core-kit:~#                                                                                                                                          
root@qrb2210-rb1-core-kit:~#                                                                                                                                          
root@qrb2210-rb1-core-kit:~# echo w > /proc/sysrq-trigger                           
[  630.546744] sysrq: Show Blocked State                                                                                                                              
[  630.550593] task:irq/92-4a8c000. state:D stack:0     pid:80    tgid:80    ppid:2      task_flags:0x208040 flags:0x00000010
[  630.561724] Call trace:                                                                                                                                            
[  630.564219]  __switch_to+0xf0/0x1c0 (T)                                                                                                                            
[  630.568138]  __schedule+0x35c/0x9bc                                                                                                                                
[  630.571729]  0x766c7362                                                                                                                                            
root@qrb2210-rb1-core-kit:~#
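
For anyone who wants to script this check instead of reading sysrq output by hand: the same state letter is exposed as the third field of /proc/<pid>/stat. Below is a minimal sketch, assuming the proc(5) layout "pid (comm) state ...". The helper name stat_line_state() is hypothetical and is not part of any kernel or libc API.

```c
#include <string.h>

/*
 * Return the task state letter from one /proc/<pid>/stat line,
 * or '\0' if the line does not look like "pid (comm) state ...".
 * The search starts from the last ')' because comm itself may
 * contain spaces or parentheses.
 */
static char stat_line_state(const char *line)
{
	const char *close = strrchr(line, ')');

	if (!close || close[1] != ' ' || close[2] == '\0')
		return '\0';
	return close[2];
}
```

A return value of 'D' for pid 80 would correspond to the "state:D" shown in the dumps above.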
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Alexey Klimov 4 months, 3 weeks ago
Hi Praveen,

On Tue Sep 16, 2025 at 4:07 PM BST, Jorge Ramirez wrote:
> On 16/09/25 16:39:00, Jorge Ramirez wrote:
>> On 16/09/25 12:20:25, Praveen Talari wrote:
>> > Hi Alexey
>> > 
>> > Thank you for your support.
>> > 
>> > On 9/15/2025 7:55 PM, Praveen Talari wrote:
>> > > Hi Alexey,
>> > > 
>> > > On 9/15/2025 3:09 PM, Alexey Klimov wrote:
>> > > > (removing <quic_mnaresh@quicinc.com> from c/c -- too many mail not
>> > > > delivered)
>> > > > 
>> > > > Hi Praveen,
>> > > > 
>> > > > On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
>> > > > > Hi Alexey,
>> > > > > 
>> > > > > Really appreciate you waiting!
>> > > > > 
>> > > > > On 9/11/2025 2:30 PM, Alexey Klimov wrote:
>> > > > > > Hi Praveen,
>> > > > > > 
>> > > > > > On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>> > > > > > > Hi Alexey,
>> > > > > > > 
>> > > > > > > Thank you for update.
>> > > > > > > 
>> > > > > > > On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>> > > > > > > > 
>> > > > > > > > (adding Krzysztof to c/c)
>> > > > > > > > 
>> > > > > > > > On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>> > > > > > > > > On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>> > > > > > > > > > A deadlock is observed in the
>> > > > > > > > > > qcom_geni_serial driver during runtime
>> > > > > > > > > > resume. This occurs when the pinctrl
>> > > > > > > > > > subsystem reconfigures device pins
>> > > > > > > > > > via msm_pinmux_set_mux() while the serial device's interrupt is an
>> > > > > > > > > > active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>> > > > > > > > > > __synchronize_irq(), conflicting with the active wakeup state and
>> > > > > > > > > > causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>> > > > > > > > > > leading to system instability.
>> > > > > > > > > > 
>> > > > > > > > > > The critical call trace leading to the deadlock is:
>> > > > > > > > > > 
>> > > > > > > > > >        Call trace:
>> > > > > > > > > >        __switch_to+0xe0/0x120
>> > > > > > > > > >        __schedule+0x39c/0x978
>> > > > > > > > > >        schedule+0x5c/0xf8
>> > > > > > > > > >        __synchronize_irq+0x88/0xb4
>> > > > > > > > > >        disable_irq+0x3c/0x4c
>> > > > > > > > > >        msm_pinmux_set_mux+0x508/0x644
>> > > > > > > > > >        pinmux_enable_setting+0x190/0x2dc
>> > > > > > > > > >        pinctrl_commit_state+0x13c/0x208
>> > > > > > > > > >        pinctrl_pm_select_default_state+0x4c/0xa4
>> > > > > > > > > >        geni_se_resources_on+0xe8/0x154
>> > > > > > > > > >        qcom_geni_serial_runtime_resume+0x4c/0x88
>> > > > > > > > > >        pm_generic_runtime_resume+0x2c/0x44
>> > > > > > > > > >        __genpd_runtime_resume+0x30/0x80
>> > > > > > > > > >        genpd_runtime_resume+0x114/0x29c
>> > > > > > > > > >        __rpm_callback+0x48/0x1d8
>> > > > > > > > > >        rpm_callback+0x6c/0x78
>> > > > > > > > > >        rpm_resume+0x530/0x750
>> > > > > > > > > >        __pm_runtime_resume+0x50/0x94
>> > > > > > > > > >        handle_threaded_wake_irq+0x30/0x94
>> > > > > > > > > >        irq_thread_fn+0x2c/xa8
>> > > > > > > > > >        irq_thread+0x160/x248
>> > > > > > > > > >        kthread+0x110/x114
>> > > > > > > > > >        ret_from_fork+0x10/x20
>> > > > > > > > > > 
>> > > > > > > > > > To resolve this, explicitly manage the wakeup IRQ state within the
>> > > > > > > > > > runtime suspend/resume callbacks. In the
>> > > > > > > > > > runtime resume callback, call
>> > > > > > > > > > disable_irq_wake() before enabling resources. This preemptively
>> > > > > > > > > > removes the "wakeup" capability from the IRQ, allowing subsequent
>> > > > > > > > > > interrupt management calls to proceed
>> > > > > > > > > > without conflict. An error path
>> > > > > > > > > > re-enables the wakeup IRQ if resource enablement fails.
>> > > > > > > > > > 
>> > > > > > > > > > Conversely, in runtime suspend, call
>> > > > > > > > > > enable_irq_wake() after resources
>> > > > > > > > > > are disabled. This ensures the interrupt is configured as a wakeup
>> > > > > > > > > > source only once the device has fully
>> > > > > > > > > > entered its low-power state. An
>> > > > > > > > > > error path handles disabling the wakeup IRQ
>> > > > > > > > > > if the suspend operation
>> > > > > > > > > > fails.
>> > > > > > > > > > 
>> > > > > > > > > > Fixes: 1afa70632c39 ("serial: qcom-geni:
>> > > > > > > > > > Enable PM runtime for serial driver")
>> > > > > > > > > > Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>> > > > > > > > > 
>> > > > > > > > > You forgot:
>> > > > > > > > > 
>> > > > > > > > > Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>> > > > > > > > > 
>> > > > > > > > > Also, not sure where this change will go, via
>> > > > > > > > > Greg or Jiri, but ideally
>> > > > > > > > > this should be picked for current -rc cycle since regression is
>> > > > > > > > > introduced during latest merge window.
>> > > > > > > > > 
>> > > > > > > > > I also would like to test it on qrb2210 rb1 where this regression is
>> > > > > > > > > reproducible.
>> > 
>> > Since I don't have this board, could you kindly validate the new change and
>> > run a quick test on your end?
>> > 
>> > diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c
>> > b/drivers/pinctrl/qcom/pinctrl-msm.c
>> > index 83eb075b6bfa..3d6601dc6fcc 100644
>> > --- a/drivers/pinctrl/qcom/pinctrl-msm.c
>> > +++ b/drivers/pinctrl/qcom/pinctrl-msm.c
>> > @@ -215,7 +215,7 @@ static int msm_pinmux_set_mux(struct pinctrl_dev
>> > *pctldev,
>> >          */
>> >         if (d && i != gpio_func &&
>> >             !test_and_set_bit(d->hwirq, pctrl->disabled_for_mux))
>> > -               disable_irq(irq);
>> > +               disable_irq_nosync(irq);
>> > 
>> >         raw_spin_lock_irqsave(&pctrl->lock, flags);
>> 
>> 
>> sorry Praveen, didnt see this proposal. testing on my end as well.
>> 
>
> just tested on my end and all modules load - deadlocked before this
> update so there is progress (now we can load the network driver)

Is it supposed to be the original patch here plus disable_irq_nosync()?
Meaning the changes for qcom_geni_serial_runtime_{suspend,resume}
+ disable_irq_nosync() in msm_pinmux_set_mux()?

It seems to work here, but let me do a few more runs.

Best regards,
Alexey
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Praveen Talari 4 months, 3 weeks ago
Hi Alexey,

On 9/16/2025 10:42 PM, Alexey Klimov wrote:
> Hi Praveen,
> 
> On Tue Sep 16, 2025 at 4:07 PM BST, Jorge Ramirez wrote:
>> On 16/09/25 16:39:00, Jorge Ramirez wrote:
>>> On 16/09/25 12:20:25, Praveen Talari wrote:
>>>> Hi Alexey
>>>>
>>>> Thank you for your support.
>>>>
>>>> On 9/15/2025 7:55 PM, Praveen Talari wrote:
>>>>> Hi Alexey,
>>>>>
>>>>> On 9/15/2025 3:09 PM, Alexey Klimov wrote:
>>>>>> (removing <quic_mnaresh@quicinc.com> from c/c -- too many mail not
>>>>>> delivered)
>>>>>>
>>>>>> Hi Praveen,
>>>>>>
>>>>>> On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
>>>>>>> Hi Alexey,
>>>>>>>
>>>>>>> Really appreciate you waiting!
>>>>>>>
>>>>>>> On 9/11/2025 2:30 PM, Alexey Klimov wrote:
>>>>>>>> Hi Praveen,
>>>>>>>>
>>>>>>>> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>>>>>>>>> Hi Alexey,
>>>>>>>>>
>>>>>>>>> Thank you for update.
>>>>>>>>>
>>>>>>>>> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>>>>>>>>>
>>>>>>>>>> (adding Krzysztof to c/c)
>>>>>>>>>>
>>>>>>>>>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>>>>>>>>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>>>>>>>>>> A deadlock is observed in the
>>>>>>>>>>>> qcom_geni_serial driver during runtime
>>>>>>>>>>>> resume. This occurs when the pinctrl
>>>>>>>>>>>> subsystem reconfigures device pins
>>>>>>>>>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>>>>>>>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>>>>>>>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>>>>>>>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>>>>>>>>>> leading to system instability.
>>>>>>>>>>>>
>>>>>>>>>>>> The critical call trace leading to the deadlock is:
>>>>>>>>>>>>
>>>>>>>>>>>>         Call trace:
>>>>>>>>>>>>         __switch_to+0xe0/0x120
>>>>>>>>>>>>         __schedule+0x39c/0x978
>>>>>>>>>>>>         schedule+0x5c/0xf8
>>>>>>>>>>>>         __synchronize_irq+0x88/0xb4
>>>>>>>>>>>>         disable_irq+0x3c/0x4c
>>>>>>>>>>>>         msm_pinmux_set_mux+0x508/0x644
>>>>>>>>>>>>         pinmux_enable_setting+0x190/0x2dc
>>>>>>>>>>>>         pinctrl_commit_state+0x13c/0x208
>>>>>>>>>>>>         pinctrl_pm_select_default_state+0x4c/0xa4
>>>>>>>>>>>>         geni_se_resources_on+0xe8/0x154
>>>>>>>>>>>>         qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>>>>>>>>>         pm_generic_runtime_resume+0x2c/0x44
>>>>>>>>>>>>         __genpd_runtime_resume+0x30/0x80
>>>>>>>>>>>>         genpd_runtime_resume+0x114/0x29c
>>>>>>>>>>>>         __rpm_callback+0x48/0x1d8
>>>>>>>>>>>>         rpm_callback+0x6c/0x78
>>>>>>>>>>>>         rpm_resume+0x530/0x750
>>>>>>>>>>>>         __pm_runtime_resume+0x50/0x94
>>>>>>>>>>>>         handle_threaded_wake_irq+0x30/0x94
>>>>>>>>>>>>         irq_thread_fn+0x2c/xa8
>>>>>>>>>>>>         irq_thread+0x160/x248
>>>>>>>>>>>>         kthread+0x110/x114
>>>>>>>>>>>>         ret_from_fork+0x10/x20
>>>>>>>>>>>>
>>>>>>>>>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>>>>>>>>>> runtime suspend/resume callbacks. In the
>>>>>>>>>>>> runtime resume callback, call
>>>>>>>>>>>> disable_irq_wake() before enabling resources. This preemptively
>>>>>>>>>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>>>>>>>>>> interrupt management calls to proceed
>>>>>>>>>>>> without conflict. An error path
>>>>>>>>>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>>>>>>>>>
>>>>>>>>>>>> Conversely, in runtime suspend, call
>>>>>>>>>>>> enable_irq_wake() after resources
>>>>>>>>>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>>>>>>>>>> source only once the device has fully
>>>>>>>>>>>> entered its low-power state. An
>>>>>>>>>>>> error path handles disabling the wakeup IRQ
>>>>>>>>>>>> if the suspend operation
>>>>>>>>>>>> fails.
>>>>>>>>>>>>
>>>>>>>>>>>> Fixes: 1afa70632c39 ("serial: qcom-geni:
>>>>>>>>>>>> Enable PM runtime for serial driver")
>>>>>>>>>>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>>>>>>>>>>
>>>>>>>>>>> You forgot:
>>>>>>>>>>>
>>>>>>>>>>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>>>>>>>>>>
>>>>>>>>>>> Also, not sure where this change will go, via
>>>>>>>>>>> Greg or Jiri, but ideally
>>>>>>>>>>> this should be picked for current -rc cycle since regression is
>>>>>>>>>>> introduced during latest merge window.
>>>>>>>>>>>
>>>>>>>>>>> I also would like to test it on qrb2210 rb1 where this regression is
>>>>>>>>>>> reproducible.
>>>>
>>>> Since I don't have this board, could you kindly validate the new change and
>>>> run a quick test on your end?
>>>>
>>>> diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c
>>>> b/drivers/pinctrl/qcom/pinctrl-msm.c
>>>> index 83eb075b6bfa..3d6601dc6fcc 100644
>>>> --- a/drivers/pinctrl/qcom/pinctrl-msm.c
>>>> +++ b/drivers/pinctrl/qcom/pinctrl-msm.c
>>>> @@ -215,7 +215,7 @@ static int msm_pinmux_set_mux(struct pinctrl_dev
>>>> *pctldev,
>>>>           */
>>>>          if (d && i != gpio_func &&
>>>>              !test_and_set_bit(d->hwirq, pctrl->disabled_for_mux))
>>>> -               disable_irq(irq);
>>>> +               disable_irq_nosync(irq);
>>>>
>>>>          raw_spin_lock_irqsave(&pctrl->lock, flags);
>>>
>>>
>>> sorry Praveen, didn't see this proposal. testing on my end as well.
>>>
>>
>> just tested on my end and all modules load - deadlocked before this
>> update so there is progress (now we can load the network driver)
> 
> Is it supposed to be the original patch here plus disable_irq_nosync()?

Only this disable_irq_nosync() change from the pinctrl subsystem.

Thanks,
Praveen Talari
> Meaning changes for qcom_geni_serial_runtime_{suspend,resume}
> + disable_irq_nosync() in msm_pinmux_set_mux()?

No, only disable_irq_nosync() in msm_pinmux_set_mux().
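
For readers following the thread, the difference between the two calls can
be sketched with a small userspace model. This is an illustration only,
assuming the wait in the call trace is the IRQ thread waiting for its own
handler to finish. `struct irq_model` and the `model_*` functions are
hypothetical names made up for this sketch, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one interrupt line; NOT the kernel's struct irq_desc. */
struct irq_model {
	int depth;          /* nested disable count; 0 means the line is enabled */
	int threads_active; /* threaded handlers currently running for this line */
};

/* Models disable_irq_nosync(): mask the line and return without waiting. */
void model_disable_irq_nosync(struct irq_model *irq)
{
	assert(irq != NULL);
	irq->depth++;
}

/*
 * Models disable_irq(): mask the line, then wait until no handler runs.
 * Returns true if that wait can finish, false if the caller would be
 * waiting for itself (the caller is the running handler thread).
 */
bool model_disable_irq(struct irq_model *irq, bool caller_is_handler_thread)
{
	assert(irq != NULL);
	irq->depth++;
	if (irq->threads_active == 0)
		return true; /* nothing is running, so there is nothing to wait for */
	return !caller_is_handler_thread;
}
```

In this model, calling model_disable_irq() from the handler thread while
threads_active is nonzero reports a wait that can never finish, which
mirrors the D-state sleep in the report. model_disable_irq_nosync() only
increments the depth counter and returns.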
> 
> It seems to work here, but let me do a few more runs.
> 
> Best regards,
> Alexey
> 
> 
> 
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Krzysztof Kozlowski 4 months, 3 weeks ago
On 17/09/2025 19:12, Alexey Klimov wrote:
> Hi Praveen,
> 
> On Tue Sep 16, 2025 at 4:07 PM BST, Jorge Ramirez wrote:
>> On 16/09/25 16:39:00, Jorge Ramirez wrote:
>>> On 16/09/25 12:20:25, Praveen Talari wrote:
>>>> Hi Alexey
>>>>
>>>> Thank you for your support.
>>>>
>>>> On 9/15/2025 7:55 PM, Praveen Talari wrote:
>>>>> Hi Alexey,
>>>>>
>>>>> On 9/15/2025 3:09 PM, Alexey Klimov wrote:
>>>>>> (removing <quic_mnaresh@quicinc.com> from c/c -- too many mail not
>>>>>> delivered)
>>>>>>
>>>>>> Hi Praveen,
>>>>>>
>>>>>> On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
>>>>>>> Hi Alexey,
>>>>>>>
>>>>>>> Really appreciate you waiting!
>>>>>>>
>>>>>>> On 9/11/2025 2:30 PM, Alexey Klimov wrote:
>>>>>>>> Hi Praveen,
>>>>>>>>
>>>>>>>> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>>>>>>>>> Hi Alexey,
>>>>>>>>>
>>>>>>>>> Thank you for the update.
>>>>>>>>>
>>>>>>>>> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>>>>>>>>>
>>>>>>>>>> (adding Krzysztof to c/c)
>>>>>>>>>>
>>>>>>>>>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>>>>>>>>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>>>>>>>>>> A deadlock is observed in the
>>>>>>>>>>>> qcom_geni_serial driver during runtime
>>>>>>>>>>>> resume. This occurs when the pinctrl
>>>>>>>>>>>> subsystem reconfigures device pins
>>>>>>>>>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>>>>>>>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>>>>>>>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>>>>>>>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>>>>>>>>>> leading to system instability.
>>>>>>>>>>>>
>>>>>>>>>>>> The critical call trace leading to the deadlock is:
>>>>>>>>>>>>
>>>>>>>>>>>>        Call trace:
>>>>>>>>>>>>        __switch_to+0xe0/0x120
>>>>>>>>>>>>        __schedule+0x39c/0x978
>>>>>>>>>>>>        schedule+0x5c/0xf8
>>>>>>>>>>>>        __synchronize_irq+0x88/0xb4
>>>>>>>>>>>>        disable_irq+0x3c/0x4c
>>>>>>>>>>>>        msm_pinmux_set_mux+0x508/0x644
>>>>>>>>>>>>        pinmux_enable_setting+0x190/0x2dc
>>>>>>>>>>>>        pinctrl_commit_state+0x13c/0x208
>>>>>>>>>>>>        pinctrl_pm_select_default_state+0x4c/0xa4
>>>>>>>>>>>>        geni_se_resources_on+0xe8/0x154
>>>>>>>>>>>>        qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>>>>>>>>>        pm_generic_runtime_resume+0x2c/0x44
>>>>>>>>>>>>        __genpd_runtime_resume+0x30/0x80
>>>>>>>>>>>>        genpd_runtime_resume+0x114/0x29c
>>>>>>>>>>>>        __rpm_callback+0x48/0x1d8
>>>>>>>>>>>>        rpm_callback+0x6c/0x78
>>>>>>>>>>>>        rpm_resume+0x530/0x750
>>>>>>>>>>>>        __pm_runtime_resume+0x50/0x94
>>>>>>>>>>>>        handle_threaded_wake_irq+0x30/0x94
>>>>>>>>>>>>        irq_thread_fn+0x2c/0xa8
>>>>>>>>>>>>        irq_thread+0x160/0x248
>>>>>>>>>>>>        kthread+0x110/0x114
>>>>>>>>>>>>        ret_from_fork+0x10/0x20
>>>>>>>>>>>>
>>>>>>>>>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>>>>>>>>>> runtime suspend/resume callbacks. In the
>>>>>>>>>>>> runtime resume callback, call
>>>>>>>>>>>> disable_irq_wake() before enabling resources. This preemptively
>>>>>>>>>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>>>>>>>>>> interrupt management calls to proceed
>>>>>>>>>>>> without conflict. An error path
>>>>>>>>>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>>>>>>>>>
>>>>>>>>>>>> Conversely, in runtime suspend, call
>>>>>>>>>>>> enable_irq_wake() after resources
>>>>>>>>>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>>>>>>>>>> source only once the device has fully
>>>>>>>>>>>> entered its low-power state. An
>>>>>>>>>>>> error path handles disabling the wakeup IRQ
>>>>>>>>>>>> if the suspend operation
>>>>>>>>>>>> fails.
>>>>>>>>>>>>
>>>>>>>>>>>> Fixes: 1afa70632c39 ("serial: qcom-geni:
>>>>>>>>>>>> Enable PM runtime for serial driver")
>>>>>>>>>>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>>>>>>>>>>
>>>>>>>>>>> You forgot:
>>>>>>>>>>>
>>>>>>>>>>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>>>>>>>>>>
>>>>>>>>>>> Also, not sure where this change will go, via
>>>>>>>>>>> Greg or Jiri, but ideally
>>>>>>>>>>> this should be picked for current -rc cycle since regression is
>>>>>>>>>>> introduced during latest merge window.
>>>>>>>>>>>
>>>>>>>>>>> I also would like to test it on qrb2210 rb1 where this regression is
>>>>>>>>>>> reproducible.
>>>>
>>>> Since I don't have this board, could you kindly validate the new change and
>>>> run a quick test on your end?
>>>>
>>>> diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c
>>>> b/drivers/pinctrl/qcom/pinctrl-msm.c
>>>> index 83eb075b6bfa..3d6601dc6fcc 100644
>>>> --- a/drivers/pinctrl/qcom/pinctrl-msm.c
>>>> +++ b/drivers/pinctrl/qcom/pinctrl-msm.c
>>>> @@ -215,7 +215,7 @@ static int msm_pinmux_set_mux(struct pinctrl_dev
>>>> *pctldev,
>>>>          */
>>>>         if (d && i != gpio_func &&
>>>>             !test_and_set_bit(d->hwirq, pctrl->disabled_for_mux))
>>>> -               disable_irq(irq);
>>>> +               disable_irq_nosync(irq);
>>>>
>>>>         raw_spin_lock_irqsave(&pctrl->lock, flags);
>>>
>>>
>>> sorry Praveen, didn't see this proposal. testing on my end as well.
>>>
>>
>> just tested on my end and all modules load - deadlocked before this
>> update so there is progress (now we can load the network driver)
> 
> Is it supposed to be the original patch here plus disable_irq_nosync()?
> Meaning changes for qcom_geni_serial_runtime_{suspend,resume}
> + disable_irq_nosync() in msm_pinmux_set_mux()?
> 
> It seems to work here, but let me do a few more runs.


So this bug, after 5 weeks is still not fixed?!?

This is just and should be reverted long time ago.

Best regards,
Krzysztof
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Krzysztof Kozlowski 4 months, 3 weeks ago
On 17/09/2025 02:05, Krzysztof Kozlowski wrote:
> On 17/09/2025 19:12, Alexey Klimov wrote:
>> Hi Praveen,
>>
>> On Tue Sep 16, 2025 at 4:07 PM BST, Jorge Ramirez wrote:
>>> On 16/09/25 16:39:00, Jorge Ramirez wrote:
>>>> On 16/09/25 12:20:25, Praveen Talari wrote:
>>>>> Hi Alexey
>>>>>
>>>>> Thank you for your support.
>>>>>
>>>>> On 9/15/2025 7:55 PM, Praveen Talari wrote:
>>>>>> Hi Alexey,
>>>>>>
>>>>>> On 9/15/2025 3:09 PM, Alexey Klimov wrote:
>>>>>>> (removing <quic_mnaresh@quicinc.com> from c/c -- too many mail not
>>>>>>> delivered)
>>>>>>>
>>>>>>> Hi Praveen,
>>>>>>>
>>>>>>> On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
>>>>>>>> Hi Alexey,
>>>>>>>>
>>>>>>>> Really appreciate you waiting!
>>>>>>>>
>>>>>>>> On 9/11/2025 2:30 PM, Alexey Klimov wrote:
>>>>>>>>> Hi Praveen,
>>>>>>>>>
>>>>>>>>> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>>>>>>>>>> Hi Alexey,
>>>>>>>>>>
>>>>>>>>>> Thank you for the update.
>>>>>>>>>>
>>>>>>>>>> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>>>>>>>>>>
>>>>>>>>>>> (adding Krzysztof to c/c)
>>>>>>>>>>>
>>>>>>>>>>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>>>>>>>>>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>>>>>>>>>>> A deadlock is observed in the
>>>>>>>>>>>>> qcom_geni_serial driver during runtime
>>>>>>>>>>>>> resume. This occurs when the pinctrl
>>>>>>>>>>>>> subsystem reconfigures device pins
>>>>>>>>>>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>>>>>>>>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>>>>>>>>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>>>>>>>>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>>>>>>>>>>> leading to system instability.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The critical call trace leading to the deadlock is:
>>>>>>>>>>>>>
>>>>>>>>>>>>>        Call trace:
>>>>>>>>>>>>>        __switch_to+0xe0/0x120
>>>>>>>>>>>>>        __schedule+0x39c/0x978
>>>>>>>>>>>>>        schedule+0x5c/0xf8
>>>>>>>>>>>>>        __synchronize_irq+0x88/0xb4
>>>>>>>>>>>>>        disable_irq+0x3c/0x4c
>>>>>>>>>>>>>        msm_pinmux_set_mux+0x508/0x644
>>>>>>>>>>>>>        pinmux_enable_setting+0x190/0x2dc
>>>>>>>>>>>>>        pinctrl_commit_state+0x13c/0x208
>>>>>>>>>>>>>        pinctrl_pm_select_default_state+0x4c/0xa4
>>>>>>>>>>>>>        geni_se_resources_on+0xe8/0x154
>>>>>>>>>>>>>        qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>>>>>>>>>>        pm_generic_runtime_resume+0x2c/0x44
>>>>>>>>>>>>>        __genpd_runtime_resume+0x30/0x80
>>>>>>>>>>>>>        genpd_runtime_resume+0x114/0x29c
>>>>>>>>>>>>>        __rpm_callback+0x48/0x1d8
>>>>>>>>>>>>>        rpm_callback+0x6c/0x78
>>>>>>>>>>>>>        rpm_resume+0x530/0x750
>>>>>>>>>>>>>        __pm_runtime_resume+0x50/0x94
>>>>>>>>>>>>>        handle_threaded_wake_irq+0x30/0x94
>>>>>>>>>>>>>        irq_thread_fn+0x2c/0xa8
>>>>>>>>>>>>>        irq_thread+0x160/0x248
>>>>>>>>>>>>>        kthread+0x110/0x114
>>>>>>>>>>>>>        ret_from_fork+0x10/0x20
>>>>>>>>>>>>>
>>>>>>>>>>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>>>>>>>>>>> runtime suspend/resume callbacks. In the
>>>>>>>>>>>>> runtime resume callback, call
>>>>>>>>>>>>> disable_irq_wake() before enabling resources. This preemptively
>>>>>>>>>>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>>>>>>>>>>> interrupt management calls to proceed
>>>>>>>>>>>>> without conflict. An error path
>>>>>>>>>>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Conversely, in runtime suspend, call
>>>>>>>>>>>>> enable_irq_wake() after resources
>>>>>>>>>>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>>>>>>>>>>> source only once the device has fully
>>>>>>>>>>>>> entered its low-power state. An
>>>>>>>>>>>>> error path handles disabling the wakeup IRQ
>>>>>>>>>>>>> if the suspend operation
>>>>>>>>>>>>> fails.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fixes: 1afa70632c39 ("serial: qcom-geni:
>>>>>>>>>>>>> Enable PM runtime for serial driver")
>>>>>>>>>>>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>>>>>>>>>>>
>>>>>>>>>>>> You forgot:
>>>>>>>>>>>>
>>>>>>>>>>>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>>>>>>>>>>>
>>>>>>>>>>>> Also, not sure where this change will go, via
>>>>>>>>>>>> Greg or Jiri, but ideally
>>>>>>>>>>>> this should be picked for current -rc cycle since regression is
>>>>>>>>>>>> introduced during latest merge window.
>>>>>>>>>>>>
>>>>>>>>>>>> I also would like to test it on qrb2210 rb1 where this regression is
>>>>>>>>>>>> reproducible.
>>>>>
>>>>> Since I don't have this board, could you kindly validate the new change and
>>>>> run a quick test on your end?
>>>>>
>>>>> diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>> b/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>> index 83eb075b6bfa..3d6601dc6fcc 100644
>>>>> --- a/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>> +++ b/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>> @@ -215,7 +215,7 @@ static int msm_pinmux_set_mux(struct pinctrl_dev
>>>>> *pctldev,
>>>>>          */
>>>>>         if (d && i != gpio_func &&
>>>>>             !test_and_set_bit(d->hwirq, pctrl->disabled_for_mux))
>>>>> -               disable_irq(irq);
>>>>> +               disable_irq_nosync(irq);
>>>>>
>>>>>         raw_spin_lock_irqsave(&pctrl->lock, flags);
>>>>
>>>>
>>>> sorry Praveen, didn't see this proposal. testing on my end as well.
>>>>
>>>
>>> just tested on my end and all modules load - deadlocked before this
>>> update so there is progress (now we can load the network driver)
>>
>> Is it supposed to be the original patch here plus disable_irq_nosync()?
>> Meaning changes for qcom_geni_serial_runtime_{suspend,resume}
>> + disable_irq_nosync() in msm_pinmux_set_mux()?
>>
>> It seems to work here, but let me do a few more runs.
> 
> 
> So this bug, after 5 weeks is still not fixed?!?
> 
> This is just and should be reverted long time ago.

I will send the revert, because this is just mocking the kernel process.

Best regards,
Krzysztof
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Praveen Talari 4 months, 3 weeks ago

On 9/17/2025 5:43 AM, Krzysztof Kozlowski wrote:
> On 17/09/2025 02:05, Krzysztof Kozlowski wrote:
>> On 17/09/2025 19:12, Alexey Klimov wrote:
>>> Hi Praveen,
>>>
>>> On Tue Sep 16, 2025 at 4:07 PM BST, Jorge Ramirez wrote:
>>>> On 16/09/25 16:39:00, Jorge Ramirez wrote:
>>>>> On 16/09/25 12:20:25, Praveen Talari wrote:
>>>>>> Hi Alexey
>>>>>>
>>>>>> Thank you for your support.
>>>>>>
>>>>>> On 9/15/2025 7:55 PM, Praveen Talari wrote:
>>>>>>> Hi Alexey,
>>>>>>>
>>>>>>> On 9/15/2025 3:09 PM, Alexey Klimov wrote:
>>>>>>>> (removing <quic_mnaresh@quicinc.com> from c/c -- too many mail not
>>>>>>>> delivered)
>>>>>>>>
>>>>>>>> Hi Praveen,
>>>>>>>>
>>>>>>>> On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
>>>>>>>>> Hi Alexey,
>>>>>>>>>
>>>>>>>>> Really appreciate you waiting!
>>>>>>>>>
>>>>>>>>> On 9/11/2025 2:30 PM, Alexey Klimov wrote:
>>>>>>>>>> Hi Praveen,
>>>>>>>>>>
>>>>>>>>>> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>>>>>>>>>>> Hi Alexey,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for the update.
>>>>>>>>>>>
>>>>>>>>>>> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> (adding Krzysztof to c/c)
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>>>>>>>>>>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>>>>>>>>>>>> A deadlock is observed in the
>>>>>>>>>>>>>> qcom_geni_serial driver during runtime
>>>>>>>>>>>>>> resume. This occurs when the pinctrl
>>>>>>>>>>>>>> subsystem reconfigures device pins
>>>>>>>>>>>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>>>>>>>>>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>>>>>>>>>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>>>>>>>>>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>>>>>>>>>>>> leading to system instability.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The critical call trace leading to the deadlock is:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         Call trace:
>>>>>>>>>>>>>>         __switch_to+0xe0/0x120
>>>>>>>>>>>>>>         __schedule+0x39c/0x978
>>>>>>>>>>>>>>         schedule+0x5c/0xf8
>>>>>>>>>>>>>>         __synchronize_irq+0x88/0xb4
>>>>>>>>>>>>>>         disable_irq+0x3c/0x4c
>>>>>>>>>>>>>>         msm_pinmux_set_mux+0x508/0x644
>>>>>>>>>>>>>>         pinmux_enable_setting+0x190/0x2dc
>>>>>>>>>>>>>>         pinctrl_commit_state+0x13c/0x208
>>>>>>>>>>>>>>         pinctrl_pm_select_default_state+0x4c/0xa4
>>>>>>>>>>>>>>         geni_se_resources_on+0xe8/0x154
>>>>>>>>>>>>>>         qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>>>>>>>>>>>         pm_generic_runtime_resume+0x2c/0x44
>>>>>>>>>>>>>>         __genpd_runtime_resume+0x30/0x80
>>>>>>>>>>>>>>         genpd_runtime_resume+0x114/0x29c
>>>>>>>>>>>>>>         __rpm_callback+0x48/0x1d8
>>>>>>>>>>>>>>         rpm_callback+0x6c/0x78
>>>>>>>>>>>>>>         rpm_resume+0x530/0x750
>>>>>>>>>>>>>>         __pm_runtime_resume+0x50/0x94
>>>>>>>>>>>>>>         handle_threaded_wake_irq+0x30/0x94
>>>>>>>>>>>>>>         irq_thread_fn+0x2c/0xa8
>>>>>>>>>>>>>>         irq_thread+0x160/0x248
>>>>>>>>>>>>>>         kthread+0x110/0x114
>>>>>>>>>>>>>>         ret_from_fork+0x10/0x20
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>>>>>>>>>>>> runtime suspend/resume callbacks. In the
>>>>>>>>>>>>>> runtime resume callback, call
>>>>>>>>>>>>>> disable_irq_wake() before enabling resources. This preemptively
>>>>>>>>>>>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>>>>>>>>>>>> interrupt management calls to proceed
>>>>>>>>>>>>>> without conflict. An error path
>>>>>>>>>>>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Conversely, in runtime suspend, call
>>>>>>>>>>>>>> enable_irq_wake() after resources
>>>>>>>>>>>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>>>>>>>>>>>> source only once the device has fully
>>>>>>>>>>>>>> entered its low-power state. An
>>>>>>>>>>>>>> error path handles disabling the wakeup IRQ
>>>>>>>>>>>>>> if the suspend operation
>>>>>>>>>>>>>> fails.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fixes: 1afa70632c39 ("serial: qcom-geni:
>>>>>>>>>>>>>> Enable PM runtime for serial driver")
>>>>>>>>>>>>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> You forgot:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, not sure where this change will go, via
>>>>>>>>>>>>> Greg or Jiri, but ideally
>>>>>>>>>>>>> this should be picked for current -rc cycle since regression is
>>>>>>>>>>>>> introduced during latest merge window.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I also would like to test it on qrb2210 rb1 where this regression is
>>>>>>>>>>>>> reproducible.
>>>>>>
>>>>>> Since I don't have this board, could you kindly validate the new change and
>>>>>> run a quick test on your end?
>>>>>>
>>>>>> diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>>> b/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>>> index 83eb075b6bfa..3d6601dc6fcc 100644
>>>>>> --- a/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>>> +++ b/drivers/pinctrl/qcom/pinctrl-msm.c
>>>>>> @@ -215,7 +215,7 @@ static int msm_pinmux_set_mux(struct pinctrl_dev
>>>>>> *pctldev,
>>>>>>           */
>>>>>>          if (d && i != gpio_func &&
>>>>>>              !test_and_set_bit(d->hwirq, pctrl->disabled_for_mux))
>>>>>> -               disable_irq(irq);
>>>>>> +               disable_irq_nosync(irq);
>>>>>>
>>>>>>          raw_spin_lock_irqsave(&pctrl->lock, flags);
>>>>>
>>>>>
>>>>> sorry Praveen, didn't see this proposal. testing on my end as well.
>>>>>
>>>>
>>>> just tested on my end and all modules load - deadlocked before this
>>>> update so there is progress (now we can load the network driver)
>>>
>>> Is it supposed to be the original patch here plus disable_irq_nosync()?
>>> Meaning changes for qcom_geni_serial_runtime_{suspend,resume}
>>> + disable_irq_nosync() in msm_pinmux_set_mux()?
>>>
>>> It seems to work here, but let me do a few more runs.
>>
>>
>> So this bug, after 5 weeks is still not fixed?!?

I understand the concern. We didn’t have access to the same board where 
Alexey is seeing the issue, so we tried to reproduce it on a different 
target by simulating with wake-up IRQ scenarios.

From our analysis, the issue seems to be triggered by commit
1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
and shows up in the pinctrl subsystem.

A fix has already been submitted, and we’re currently waiting for 
Alexey’s feedback to proceed.

>>
>> This is just and should be reverted long time ago.
> 
> I will send the revert, because this is just mocking the kernel process.
> 
> Best regards,
> Krzysztof