[PATCH 0/7] slimbus: qcom-ngd-ctrl: Fix some race conditions and deadlocks

Bjorn Andersson posted 7 patches 1 month ago
There is a newer version of this series
[PATCH 0/7] slimbus: qcom-ngd-ctrl: Fix some race conditions and deadlocks
Posted by Bjorn Andersson 1 month ago
When the qcom-ngd-ctrl driver is probed after the ADSP remoteproc, the
SSR notifier will fire immediately, which results in
qcom_slim_ngd_ssr_pdr_notify() attempting to schedule_work() on an
uninitialized work_struct.
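
To make the ordering problem concrete, here is a minimal userspace model of it, not the driver code. Every name in it (model_work, model_ctrl, model_register_notifier() and so on) is a hypothetical stand-in, and the "uninitialized" case is modelled as a NULL function pointer, whereas the kernel would be operating on an uninitialized work_struct.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins only; these are not the kernel's types. */
struct model_work {
	void (*fn)(struct model_work *w);
	bool pending;
};

struct model_ctrl {
	struct model_work up_work;
};

static void model_up_worker(struct model_work *w)
{
	(void)w; /* the real worker would bring the controller up */
}

static void model_init_work(struct model_work *w,
			    void (*fn)(struct model_work *w))
{
	w->fn = fn;
	w->pending = false;
}

/* Returns false if the work item was never initialized. */
static bool model_schedule_work(struct model_work *w)
{
	if (w->fn == NULL)
		return false;
	w->pending = true;
	return true;
}

/* Notifier callback: all it does is schedule the "up" work. */
static bool model_notify(struct model_ctrl *ctrl)
{
	return model_schedule_work(&ctrl->up_work);
}

/*
 * Models a notifier that fires during registration when the remote
 * side is already up, as described for the SSR notifier above.
 */
static bool model_register_notifier(struct model_ctrl *ctrl,
				    bool (*cb)(struct model_ctrl *ctrl),
				    bool remote_already_up)
{
	if (remote_already_up)
		return cb(ctrl);
	return true;
}

/* Problematic order: register first, initialize the work afterwards. */
bool probe_register_first(bool remote_already_up)
{
	struct model_ctrl ctrl = { 0 };
	bool ok = model_register_notifier(&ctrl, model_notify,
					  remote_already_up);

	model_init_work(&ctrl.up_work, model_up_worker);
	return ok;
}

/* Safe order: everything the callback touches is ready before it can run. */
bool probe_init_first(bool remote_already_up)
{
	struct model_ctrl ctrl = { 0 };

	model_init_work(&ctrl.up_work, model_up_worker);
	return model_register_notifier(&ctrl, model_notify,
				       remote_already_up);
}
```

In this model probe_register_first() only fails when the remote side is already up at registration time, which matches the probe-order dependency described above.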

The concrete result of this is that my db845c/RB3 now fails to boot 100%
of the time.

In reviewing the problematic code, a few other problems were
discovered, such as platform_driver_unregister() being used to unregister
the child device.

Lastly, with the db845c booting, it was determined that attempting to
stop the ADSP remoteproc causes the slimbus driver to deadlock.
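
The last patch in the series names an ABBA inversion on tx_lock/ctrl->lock. As a generic illustration of that class of deadlock, the sketch below is a tiny lock-order tracker, loosely in the spirit of what lockdep checks. It is not driver code: the enum, the struct and record_nesting() are hypothetical, and the nesting order used in the usage note is an arbitrary assumption, since which path takes which lock first is not described here.

```c
#include <stdbool.h>

/* Hypothetical identifiers for the two locks named in the patch title. */
enum model_lock { MODEL_TX_LOCK, MODEL_CTRL_LOCK, MODEL_NUM_LOCKS };

/* seen[a][b] is true once some path has taken lock b while holding a. */
struct order_tracker {
	bool seen[MODEL_NUM_LOCKS][MODEL_NUM_LOCKS];
};

/*
 * Record that 'inner' is acquired while 'outer' is held.
 * Returns false if another path already nested the same two locks the
 * other way around, i.e. the two paths can deadlock against each other.
 */
bool record_nesting(struct order_tracker *t, enum model_lock outer,
		    enum model_lock inner)
{
	if (t->seen[inner][outer])
		return false;
	t->seen[outer][inner] = true;
	return true;
}
```

If one path records (MODEL_TX_LOCK, MODEL_CTRL_LOCK) and a second path then records (MODEL_CTRL_LOCK, MODEL_TX_LOCK), the second call returns false. The usual fixes are to make both paths agree on a single order, or to drop the outer lock before taking the inner one.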

Note that while this solves the problems described above, and unblocks
boot as well as restart of the remoteproc, this stack needs more love.

Upon tearing down the slimbus controller (when the ADSP goes down), the
slimbus device drivers attempt to access their slimbus devices, which is
prevented by the controller being runtime suspended. This results in a
wall of errors in the log about failing transactions.

Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
---
Bjorn Andersson (7):
      slimbus: qcom-ngd-ctrl: Fix up platform_driver registration
      slimbus: qcom-ngd-ctrl: Fix probe error path ordering
      slimbus: qcom-ngd-ctrl: Correct PDR and SSR cleanup ownership
      slimbus: qcom-ngd-ctrl: Register callbacks after creating the ngd
      slimbus: qcom-ngd-ctrl: Initialize controller resources in controller
      slimbus: qcom-ngd-ctrl: Balance pm_runtime enablement for NGD
      slimbus: qcom-ngd-ctrl: Avoid ABBA on tx_lock/ctrl->lock

 drivers/slimbus/qcom-ngd-ctrl.c | 127 +++++++++++++++++++++++++---------------
 1 file changed, 80 insertions(+), 47 deletions(-)
---
base-commit: a0ae2a256046c0c5d3778d1a194ff2e171f16e5f
change-id: 20260211-slim-ngd-dev-74166f29f035

Best regards,
-- 
Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Re: [PATCH 0/7] slimbus: qcom-ngd-ctrl: Fix some race conditions and deadlocks
Posted by Dmitry Baryshkov 4 weeks, 1 day ago
On Mon, Mar 09, 2026 at 11:09:01PM -0500, Bjorn Andersson wrote:
> When the qcom-ngd-ctrl driver is probed after the ADSP remoteproc, the
> SSR notifier will fire immediately, which results in
> qcom_slim_ngd_ssr_pdr_notify() attempting to schedule_work() on an
> uninitialized work_struct.

Bjorn,

While you are at it, it looks like there is another possible issue:
ngd->base is set after platform_device_add(), possibly letting the NGD
driver use an uninitialized base.

-- 
With best wishes
Dmitry
Re: [PATCH 0/7] slimbus: qcom-ngd-ctrl: Fix some race conditions and deadlocks
Posted by Bjorn Andersson 1 week, 1 day ago
On Wed, Mar 11, 2026 at 03:40:46AM +0200, Dmitry Baryshkov wrote:
> 
> Bjorn,
> 
> While you are at it, it looks like there is another possible issue:
> ngd->base is set after platform_device_add(), possibly letting the NGD
> driver use an uninitialized base.
> 

ngd->base is only dereferenced from qcom_slim_ngd_up_worker() and
qcom_slim_ngd_runtime_resume(), so at this time there's no concrete
problem here.

I'll keep it in mind as I continue to poke at the driver.

Regards,
Bjorn

> -- 
> With best wishes
> Dmitry