[PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume

Posted by Praveen Talari 1 day, 4 hours ago
A deadlock is observed in the qcom_geni_serial driver during runtime
resume. This occurs when the pinctrl subsystem reconfigures device pins
via msm_pinmux_set_mux() while the serial device's interrupt is an
active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
__synchronize_irq(), conflicting with the active wakeup state and
causing the IRQ thread to enter an uninterruptible (D-state) sleep,
leading to system instability.

The critical call trace leading to the deadlock is:

    Call trace:
    __switch_to+0xe0/0x120
    __schedule+0x39c/0x978
    schedule+0x5c/0xf8
    __synchronize_irq+0x88/0xb4
    disable_irq+0x3c/0x4c
    msm_pinmux_set_mux+0x508/0x644
    pinmux_enable_setting+0x190/0x2dc
    pinctrl_commit_state+0x13c/0x208
    pinctrl_pm_select_default_state+0x4c/0xa4
    geni_se_resources_on+0xe8/0x154
    qcom_geni_serial_runtime_resume+0x4c/0x88
    pm_generic_runtime_resume+0x2c/0x44
    __genpd_runtime_resume+0x30/0x80
    genpd_runtime_resume+0x114/0x29c
    __rpm_callback+0x48/0x1d8
    rpm_callback+0x6c/0x78
    rpm_resume+0x530/0x750
    __pm_runtime_resume+0x50/0x94
    handle_threaded_wake_irq+0x30/0x94
    irq_thread_fn+0x2c/0xa8
    irq_thread+0x160/0x248
    kthread+0x110/0x114
    ret_from_fork+0x10/0x20

To resolve this, explicitly manage the wakeup IRQ state within the
runtime suspend/resume callbacks. In the runtime resume callback, call
disable_irq_wake() before enabling resources. This preemptively
removes the "wakeup" capability from the IRQ, allowing subsequent
interrupt management calls to proceed without conflict. An error path
re-enables the wakeup IRQ if resource enablement fails.

Conversely, in runtime suspend, call enable_irq_wake() after resources
are disabled. This ensures the interrupt is configured as a wakeup
source only once the device has fully entered its low-power state. An
error path handles disabling the wakeup IRQ if the suspend operation
fails.

Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
---
 drivers/tty/serial/qcom_geni_serial.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 0fdda3a1e70b..4f5ea28dfe8f 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -1926,8 +1926,17 @@ static int __maybe_unused qcom_geni_serial_runtime_suspend(struct device *dev)
 	struct uart_port *uport = &port->uport;
 	int ret = 0;
 
-	if (port->dev_data->power_state)
+	if (port->dev_data->power_state) {
 		ret = port->dev_data->power_state(uport, false);
+		if (ret) {
+			if (device_can_wakeup(dev))
+				disable_irq_wake(port->wakeup_irq);
+			return ret;
+		}
+	}
+
+	if (device_can_wakeup(dev))
+		enable_irq_wake(port->wakeup_irq);
 
 	return ret;
 }
@@ -1938,8 +1947,17 @@ static int __maybe_unused qcom_geni_serial_runtime_resume(struct device *dev)
 	struct uart_port *uport = &port->uport;
 	int ret = 0;
 
-	if (port->dev_data->power_state)
+	if (device_can_wakeup(dev))
+		disable_irq_wake(port->wakeup_irq);
+
+	if (port->dev_data->power_state) {
 		ret = port->dev_data->power_state(uport, true);
+		if (ret) {
+			if (device_can_wakeup(dev))
+				enable_irq_wake(port->wakeup_irq);
+			return ret;
+		}
+	}
 
 	return ret;
 }

base-commit: 3e8e5822146bc396d2a7e5fbb7be13271665522a
-- 
2.34.1
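
The ordering described in the commit message can be modeled in plain userspace C. This is only a sketch, not driver code: struct model_port, the model_* helpers, and the fail_power hook are hypothetical names invented here, and wake_depth merely stands in for the IRQ's wake enable count. The model starts from an active device with wake disarmed, and it treats a failed suspend as leaving the wake state untouched, which is an assumption of the model rather than a statement about the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the real driver struct. */
struct model_port {
	bool can_wakeup;  /* models device_can_wakeup(dev) */
	bool powered;     /* models resources being on or off */
	int wake_depth;   /* models the wake enable count of the IRQ */
	bool fail_power;  /* test hook: force the power transition to fail */
};

/* Models port->dev_data->power_state(uport, on). */
static int model_power_state(struct model_port *p, bool on)
{
	if (p->fail_power)
		return -1;
	p->powered = on;
	return 0;
}

static void model_enable_irq_wake(struct model_port *p)
{
	p->wake_depth++;
}

static void model_disable_irq_wake(struct model_port *p)
{
	/* An enable must precede every disable, otherwise the count is unbalanced. */
	assert(p->wake_depth > 0);
	p->wake_depth--;
}

/* Suspend: power down first, arm the wake IRQ only on success. */
static int model_runtime_suspend(struct model_port *p)
{
	int ret = model_power_state(p, false);

	if (ret)
		return ret; /* wake was not armed yet in this model, nothing to undo */
	if (p->can_wakeup)
		model_enable_irq_wake(p);
	return 0;
}

/* Resume: disarm the wake IRQ first, re-arm it if powering up fails. */
static int model_runtime_resume(struct model_port *p)
{
	int ret;

	if (p->can_wakeup)
		model_disable_irq_wake(p);
	ret = model_power_state(p, true);
	if (ret) {
		if (p->can_wakeup)
			model_enable_irq_wake(p);
		return ret;
	}
	return 0;
}
```

Driving the model through suspend, resume, and a failed resume shows the count returning to the value it had on entry after every path, which is the property the error paths are meant to preserve.
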
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Alexey Klimov 1 day, 3 hours ago
On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
> A deadlock is observed in the qcom_geni_serial driver during runtime
> resume. This occurs when the pinctrl subsystem reconfigures device pins
> via msm_pinmux_set_mux() while the serial device's interrupt is an
> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
> __synchronize_irq(), conflicting with the active wakeup state and
> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
> leading to system instability.
>
> The critical call trace leading to the deadlock is:
>
>     Call trace:
>     __switch_to+0xe0/0x120
>     __schedule+0x39c/0x978
>     schedule+0x5c/0xf8
>     __synchronize_irq+0x88/0xb4
>     disable_irq+0x3c/0x4c
>     msm_pinmux_set_mux+0x508/0x644
>     pinmux_enable_setting+0x190/0x2dc
>     pinctrl_commit_state+0x13c/0x208
>     pinctrl_pm_select_default_state+0x4c/0xa4
>     geni_se_resources_on+0xe8/0x154
>     qcom_geni_serial_runtime_resume+0x4c/0x88
>     pm_generic_runtime_resume+0x2c/0x44
>     __genpd_runtime_resume+0x30/0x80
>     genpd_runtime_resume+0x114/0x29c
>     __rpm_callback+0x48/0x1d8
>     rpm_callback+0x6c/0x78
>     rpm_resume+0x530/0x750
>     __pm_runtime_resume+0x50/0x94
>     handle_threaded_wake_irq+0x30/0x94
>     irq_thread_fn+0x2c/0xa8
>     irq_thread+0x160/0x248
>     kthread+0x110/0x114
>     ret_from_fork+0x10/0x20
>
> To resolve this, explicitly manage the wakeup IRQ state within the
> runtime suspend/resume callbacks. In the runtime resume callback, call
> disable_irq_wake() before enabling resources. This preemptively
> removes the "wakeup" capability from the IRQ, allowing subsequent
> interrupt management calls to proceed without conflict. An error path
> re-enables the wakeup IRQ if resource enablement fails.
>
> Conversely, in runtime suspend, call enable_irq_wake() after resources
> are disabled. This ensures the interrupt is configured as a wakeup
> source only once the device has fully entered its low-power state. An
> error path handles disabling the wakeup IRQ if the suspend operation
> fails.
>
> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>

You forgot:

Reported-by: Alexey Klimov <alexey.klimov@linaro.org>

Also, I'm not sure which tree this change will go through, Greg's or Jiri's,
but ideally it should be picked up for the current -rc cycle, since the
regression was introduced during the latest merge window.

I would also like to test it on the qrb2210 RB1, where this regression is
reproducible.

Thanks,
Alexey

[..]
Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Posted by Alexey Klimov an hour ago
(adding Krzysztof to Cc)

On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>> A deadlock is observed in the qcom_geni_serial driver during runtime
>> resume. This occurs when the pinctrl subsystem reconfigures device pins
>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>> __synchronize_irq(), conflicting with the active wakeup state and
>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>> leading to system instability.
>>
>> The critical call trace leading to the deadlock is:
>>
>>     Call trace:
>>     __switch_to+0xe0/0x120
>>     __schedule+0x39c/0x978
>>     schedule+0x5c/0xf8
>>     __synchronize_irq+0x88/0xb4
>>     disable_irq+0x3c/0x4c
>>     msm_pinmux_set_mux+0x508/0x644
>>     pinmux_enable_setting+0x190/0x2dc
>>     pinctrl_commit_state+0x13c/0x208
>>     pinctrl_pm_select_default_state+0x4c/0xa4
>>     geni_se_resources_on+0xe8/0x154
>>     qcom_geni_serial_runtime_resume+0x4c/0x88
>>     pm_generic_runtime_resume+0x2c/0x44
>>     __genpd_runtime_resume+0x30/0x80
>>     genpd_runtime_resume+0x114/0x29c
>>     __rpm_callback+0x48/0x1d8
>>     rpm_callback+0x6c/0x78
>>     rpm_resume+0x530/0x750
>>     __pm_runtime_resume+0x50/0x94
>>     handle_threaded_wake_irq+0x30/0x94
>>     irq_thread_fn+0x2c/0xa8
>>     irq_thread+0x160/0x248
>>     kthread+0x110/0x114
>>     ret_from_fork+0x10/0x20
>>
>> To resolve this, explicitly manage the wakeup IRQ state within the
>> runtime suspend/resume callbacks. In the runtime resume callback, call
>> disable_irq_wake() before enabling resources. This preemptively
>> removes the "wakeup" capability from the IRQ, allowing subsequent
>> interrupt management calls to proceed without conflict. An error path
>> re-enables the wakeup IRQ if resource enablement fails.
>>
>> Conversely, in runtime suspend, call enable_irq_wake() after resources
>> are disabled. This ensures the interrupt is configured as a wakeup
>> source only once the device has fully entered its low-power state. An
>> error path handles disabling the wakeup IRQ if the suspend operation
>> fails.
>>
>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>
> You forgot:
>
> Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
>
> Also, I'm not sure which tree this change will go through, Greg's or Jiri's,
> but ideally it should be picked up for the current -rc cycle, since the
> regression was introduced during the latest merge window.
>
> I would also like to test it on the qrb2210 RB1, where this regression is
> reproducible.

It doesn't seem to fix the regression on the RB1 board:

 INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
       Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/u16:3   state:D stack:0     pid:50    tgid:50    ppid:2      task_flags:0x4208060 flags:0x00000010
 Workqueue: async async_run_entry_fn
 Call trace:
  __switch_to+0xf0/0x1c0 (T)
  __schedule+0x358/0x99c
  schedule+0x34/0x11c
  rpm_resume+0x17c/0x6a0
  rpm_resume+0x2c4/0x6a0
  rpm_resume+0x2c4/0x6a0
  rpm_resume+0x2c4/0x6a0
  __pm_runtime_resume+0x50/0x9c
  __driver_probe_device+0x58/0x120
  driver_probe_device+0x3c/0x154
  __driver_attach_async_helper+0x4c/0xc0
  async_run_entry_fn+0x34/0xe0
  process_one_work+0x148/0x284
  worker_thread+0x2c4/0x3e0
  kthread+0x12c/0x210
  ret_from_fork+0x10/0x20
 INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
       Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:irq/92-4a8c000. state:D stack:0     pid:79    tgid:79    ppid:2      task_flags:0x208040 flags:0x00000010
 Call trace:
  __switch_to+0xf0/0x1c0 (T)
  __schedule+0x358/0x99c
  schedule+0x34/0x11c
  __synchronize_irq+0x90/0xcc
  disable_irq+0x3c/0x4c
  msm_pinmux_set_mux+0x3b4/0x45c
  pinmux_enable_setting+0x1fc/0x2d8
  pinctrl_commit_state+0xa0/0x260
  pinctrl_pm_select_default_state+0x4c/0xa0
  geni_se_resources_on+0xe8/0x154
  geni_serial_resource_state+0x8c/0xbc
  qcom_geni_serial_runtime_resume+0x3c/0x88
  pm_generic_runtime_resume+0x2c/0x44
  __rpm_callback+0x48/0x1e0
  rpm_callback+0x74/0x80
  rpm_resume+0x3bc/0x6a0
  __pm_runtime_resume+0x50/0x9c
  handle_threaded_wake_irq+0x30/0x80
  irq_thread_fn+0x2c/0xb0
  irq_thread+0x170/0x334
  kthread+0x12c/0x210
  ret_from_fork+0x10/0x20

I see exactly the same behaviour with these changes applied.

root@rb1:~# uname -a
Linux rb1 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13 SMP PREEMPT Tue Sep  9 20:14:22 BST 2025 aarch64 GNU/Linux

I see the same behaviour with linux-next, but my local tree is a bit old,
so maybe there are some dependencies I am missing.
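
One possible reading of the second trace, offered as a sketch and not a confirmed diagnosis: the task that ends up in __synchronize_irq() is itself an IRQ thread (irq/92-4a8c000.) running handle_threaded_wake_irq(). If the IRQ that msm_pinmux_set_mux() disables is the same one whose threaded handler is running (an assumption here), then the wait can never finish, whatever the wake setting is. The userspace model below uses hypothetical names (struct model_irq, model_*) and only mirrors the idea that the synchronize step waits until no threaded handler is in flight.

```c
#include <stdbool.h>

/* Hypothetical model of per-IRQ state; not the kernel's struct irq_desc. */
struct model_irq {
	int threads_active; /* threaded handlers currently running */
	bool wake_enabled;  /* toggled by enable/disable_irq_wake() in the real API */
};

/*
 * Models the condition a synchronize step waits on: it can return
 * only once no threaded handler for this IRQ is still running.
 * Note that wake_enabled plays no part in the condition.
 */
static bool model_sync_would_block(const struct model_irq *irq)
{
	return irq->threads_active > 0;
}

/*
 * Models calling disable_irq() from inside the IRQ's own threaded
 * handler. Returns true if the caller would wait on itself.
 */
static bool model_disable_from_own_handler(struct model_irq *irq)
{
	bool blocked;

	irq->threads_active++; /* the handler thread is running */
	blocked = model_sync_would_block(irq);
	irq->threads_active--; /* model only: a blocked handler never gets here */
	return blocked;
}
```

In this model the result is the same with wake_enabled set or cleared, which would be consistent with the hang persisting after the patch. A call from any other context, with no handler in flight, does not block.
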

Best regards,
Alexey