[PATCH v2] soc: qcom: pdr: Fix the potential deadlock

Mukesh Ojha posted 1 patch 1 year ago
There is a newer version of this series
drivers/soc/qcom/pdr_interface.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
[PATCH v2] soc: qcom: pdr: Fix the potential deadlock
Posted by Mukesh Ojha 1 year ago
When some client process A call pdr_add_lookup() to add the look up for
the service and does schedule locator work, later a process B got a new
server packet indicating locator is up and call pdr_locator_new_server()
which eventually sets pdr->locator_init_complete to true which process A
sees and takes list lock and queries domain list but it will timeout due
to deadlock as the response will queued to the same qmi->wq and it is
ordered workqueue and process B is not able to complete new server
request work due to deadlock on list lock.

       Process A                        Process B

                                     process_scheduled_works()
pdr_add_lookup()                      qmi_data_ready_work()
 process_scheduled_works()             pdr_locator_new_server()
                                         pdr->locator_init_complete=true;
   pdr_locator_work()
    mutex_lock(&pdr->list_lock);

     pdr_locate_service()                  mutex_lock(&pdr->list_lock);

      pdr_get_domain_list()
       pr_err("PDR: %s get domain list
               txn wait failed: %d\n",
               req->service_name,
               ret);

Fix it by removing the unnecessary list iteration as the list iteration
is already being done inside locator work, so avoid it here and just
call schedule_work() here.

Fixes: fbe639b44a82 ("soc: qcom: Introduce Protection Domain Restart helpers")
CC: stable@vger.kernel.org
Signed-off-by: Saranya R <quic_sarar@quicinc.com>
Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
---
Changes in v2:
 - Added Fixes tag,

 drivers/soc/qcom/pdr_interface.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/soc/qcom/pdr_interface.c b/drivers/soc/qcom/pdr_interface.c
index 328b6153b2be..71be378d2e43 100644
--- a/drivers/soc/qcom/pdr_interface.c
+++ b/drivers/soc/qcom/pdr_interface.c
@@ -75,7 +75,6 @@ static int pdr_locator_new_server(struct qmi_handle *qmi,
 {
 	struct pdr_handle *pdr = container_of(qmi, struct pdr_handle,
 					      locator_hdl);
-	struct pdr_service *pds;
 
 	mutex_lock(&pdr->lock);
 	/* Create a local client port for QMI communication */
@@ -87,12 +86,7 @@ static int pdr_locator_new_server(struct qmi_handle *qmi,
 	mutex_unlock(&pdr->lock);
 
 	/* Service pending lookup requests */
-	mutex_lock(&pdr->list_lock);
-	list_for_each_entry(pds, &pdr->lookups, node) {
-		if (pds->need_locator_lookup)
-			schedule_work(&pdr->locator_work);
-	}
-	mutex_unlock(&pdr->list_lock);
+	schedule_work(&pdr->locator_work);
 
 	return 0;
 }
-- 
2.34.1
Re: [PATCH v2] soc: qcom: pdr: Fix the potential deadlock
Posted by Bjorn Andersson 1 year ago
On Wed, Jan 29, 2025 at 09:25:44PM +0530, Mukesh Ojha wrote:
> When some client process A call pdr_add_lookup() to add the look up for
> the service and does schedule locator work, later a process B got a new
> server packet indicating locator is up and call pdr_locator_new_server()
> which eventually sets pdr->locator_init_complete to true which process A
> sees and takes list lock and queries domain list but it will timeout due
> to deadlock as the response will queued to the same qmi->wq and it is
> ordered workqueue and process B is not able to complete new server
> request work due to deadlock on list lock.
> 
>        Process A                        Process B
> 
>                                      process_scheduled_works()
> pdr_add_lookup()                      qmi_data_ready_work()
>  process_scheduled_works()             pdr_locator_new_server()
>                                          pdr->locator_init_complete=true;
>    pdr_locator_work()
>     mutex_lock(&pdr->list_lock);
> 
>      pdr_locate_service()                  mutex_lock(&pdr->list_lock);
> 
>       pdr_get_domain_list()
>        pr_err("PDR: %s get domain list
>                txn wait failed: %d\n",
>                req->service_name,
>                ret);
> 
> Fix it by removing the unnecessary list iteration as the list iteration
> is already being done inside locator work, so avoid it here and just
> call schedule_work() here.
> 

I came to the same patch while looking into the issue related to
in-kernel pd-mapper reported here:
https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/

So:
Reviewed-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
Tested-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>

> Fixes: fbe639b44a82 ("soc: qcom: Introduce Protection Domain Restart helpers")
> CC: stable@vger.kernel.org
> Signed-off-by: Saranya R <quic_sarar@quicinc.com>

Can we please use full names?

> Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>

Unfortunately I can't merge this; Saranya's S-o-b comes first which
implies that she authored the patch, but you're listed as author.

Regards,
Bjorn

> ---
> Changes in v2:
>  - Added Fixes tag,
> 
>  drivers/soc/qcom/pdr_interface.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/drivers/soc/qcom/pdr_interface.c b/drivers/soc/qcom/pdr_interface.c
> index 328b6153b2be..71be378d2e43 100644
> --- a/drivers/soc/qcom/pdr_interface.c
> +++ b/drivers/soc/qcom/pdr_interface.c
> @@ -75,7 +75,6 @@ static int pdr_locator_new_server(struct qmi_handle *qmi,
>  {
>  	struct pdr_handle *pdr = container_of(qmi, struct pdr_handle,
>  					      locator_hdl);
> -	struct pdr_service *pds;
>  
>  	mutex_lock(&pdr->lock);
>  	/* Create a local client port for QMI communication */
> @@ -87,12 +86,7 @@ static int pdr_locator_new_server(struct qmi_handle *qmi,
>  	mutex_unlock(&pdr->lock);
>  
>  	/* Service pending lookup requests */
> -	mutex_lock(&pdr->list_lock);
> -	list_for_each_entry(pds, &pdr->lookups, node) {
> -		if (pds->need_locator_lookup)
> -			schedule_work(&pdr->locator_work);
> -	}
> -	mutex_unlock(&pdr->list_lock);
> +	schedule_work(&pdr->locator_work);
>  
>  	return 0;
>  }
> -- 
> 2.34.1
>
Re: [PATCH v2] soc: qcom: pdr: Fix the potential deadlock
Posted by Mukesh Ojha 12 months ago
On Thu, Feb 06, 2025 at 04:13:25PM -0600, Bjorn Andersson wrote:

> On Wed, Jan 29, 2025 at 09:25:44PM +0530, Mukesh Ojha wrote:
> > When some client process A call pdr_add_lookup() to add the look up for
> > the service and does schedule locator work, later a process B got a new
> > server packet indicating locator is up and call pdr_locator_new_server()
> > which eventually sets pdr->locator_init_complete to true which process A
> > sees and takes list lock and queries domain list but it will timeout due
> > to deadlock as the response will queued to the same qmi->wq and it is
> > ordered workqueue and process B is not able to complete new server
> > request work due to deadlock on list lock.
> > 
> >        Process A                        Process B
> > 
> >                                      process_scheduled_works()
> > pdr_add_lookup()                      qmi_data_ready_work()
> >  process_scheduled_works()             pdr_locator_new_server()
> >                                          pdr->locator_init_complete=true;
> >    pdr_locator_work()
> >     mutex_lock(&pdr->list_lock);
> > 
> >      pdr_locate_service()                  mutex_lock(&pdr->list_lock);
> > 
> >       pdr_get_domain_list()
> >        pr_err("PDR: %s get domain list
> >                txn wait failed: %d\n",
> >                req->service_name,
> >                ret);
> > 
> > Fix it by removing the unnecessary list iteration as the list iteration
> > is already being done inside locator work, so avoid it here and just
> > call schedule_work() here.
> > 
> 
> I came to the same patch while looking into the issue related to
> in-kernel pd-mapper reported here:
> https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/
> 
> So:
> Reviewed-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
> Tested-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>

Thanks.,

> 
> > Fixes: fbe639b44a82 ("soc: qcom: Introduce Protection Domain Restart helpers")
> > CC: stable@vger.kernel.org
> > Signed-off-by: Saranya R <quic_sarar@quicinc.com>
> 
> Can we please use full names?

I have informed her about this and she wants the same name to be mentioned here as well.
> 
> > Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
> 
> Unfortunately I can't merge this; Saranya's S-o-b comes first which
> implies that she authored the patch, but you're listed as author.

I will correct it.

-Mukesh

> 
> Regards,
> Bjorn
> 
> > ---
> > Changes in v2:
> >  - Added Fixes tag,
> > 
> >  drivers/soc/qcom/pdr_interface.c | 8 +-------
> >  1 file changed, 1 insertion(+), 7 deletions(-)
> > 
> > diff --git a/drivers/soc/qcom/pdr_interface.c b/drivers/soc/qcom/pdr_interface.c
> > index 328b6153b2be..71be378d2e43 100644
> > --- a/drivers/soc/qcom/pdr_interface.c
> > +++ b/drivers/soc/qcom/pdr_interface.c
> > @@ -75,7 +75,6 @@ static int pdr_locator_new_server(struct qmi_handle *qmi,
> >  {
> >  	struct pdr_handle *pdr = container_of(qmi, struct pdr_handle,
> >  					      locator_hdl);
> > -	struct pdr_service *pds;
> >  
> >  	mutex_lock(&pdr->lock);
> >  	/* Create a local client port for QMI communication */
> > @@ -87,12 +86,7 @@ static int pdr_locator_new_server(struct qmi_handle *qmi,
> >  	mutex_unlock(&pdr->lock);
> >  
> >  	/* Service pending lookup requests */
> > -	mutex_lock(&pdr->list_lock);
> > -	list_for_each_entry(pds, &pdr->lookups, node) {
> > -		if (pds->need_locator_lookup)
> > -			schedule_work(&pdr->locator_work);
> > -	}
> > -	mutex_unlock(&pdr->list_lock);
> > +	schedule_work(&pdr->locator_work);
> >  
> >  	return 0;
> >  }
> > -- 
> > 2.34.1
> >
Re: [PATCH v2] soc: qcom: pdr: Fix the potential deadlock
Posted by Johan Hovold 12 months ago
On Mon, Feb 10, 2025 at 02:50:18PM +0530, Mukesh Ojha wrote:
> On Thu, Feb 06, 2025 at 04:13:25PM -0600, Bjorn Andersson wrote:
> 
> > On Wed, Jan 29, 2025 at 09:25:44PM +0530, Mukesh Ojha wrote:
> > > When some client process A call pdr_add_lookup() to add the look up for
> > > the service and does schedule locator work, later a process B got a new
> > > server packet indicating locator is up and call pdr_locator_new_server()
> > > which eventually sets pdr->locator_init_complete to true which process A
> > > sees and takes list lock and queries domain list but it will timeout due
> > > to deadlock as the response will queued to the same qmi->wq and it is
> > > ordered workqueue and process B is not able to complete new server
> > > request work due to deadlock on list lock.
> > > 
> > >        Process A                        Process B
> > > 
> > >                                      process_scheduled_works()
> > > pdr_add_lookup()                      qmi_data_ready_work()
> > >  process_scheduled_works()             pdr_locator_new_server()
> > >                                          pdr->locator_init_complete=true;
> > >    pdr_locator_work()
> > >     mutex_lock(&pdr->list_lock);
> > > 
> > >      pdr_locate_service()                  mutex_lock(&pdr->list_lock);
> > > 
> > >       pdr_get_domain_list()
> > >        pr_err("PDR: %s get domain list
> > >                txn wait failed: %d\n",
> > >                req->service_name,
> > >                ret);
> > > 
> > > Fix it by removing the unnecessary list iteration as the list iteration
> > > is already being done inside locator work, so avoid it here and just
> > > call schedule_work() here.
> > > 
> > 
> > I came to the same patch while looking into the issue related to
> > in-kernel pd-mapper reported here:
> > https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/
> > 
> > So:
> > Reviewed-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
> > Tested-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>

I was gonna ask if you have confirmed that this indeed fixes the audio
regression with the in-kernel pd-mapper?

Is this how you discovered the issue as well, Mukesh and Saranya?

If so, please mention that in the commit message, but in any case also
include the corresponding error messages directly so that people running
into this can find the fix more easily. (I see the pr_err now, but it's
not as greppable).

A Link tag to my report would be good to have as well if this fixes the
audio regression.

Johan
Re: [PATCH v2] soc: qcom: pdr: Fix the potential deadlock
Posted by Mukesh Ojha 12 months ago
On Mon, Feb 10, 2025 at 10:43:23AM +0100, Johan Hovold wrote:
> On Mon, Feb 10, 2025 at 02:50:18PM +0530, Mukesh Ojha wrote:
> > On Thu, Feb 06, 2025 at 04:13:25PM -0600, Bjorn Andersson wrote:
> > 
> > > On Wed, Jan 29, 2025 at 09:25:44PM +0530, Mukesh Ojha wrote:
> > > > When some client process A call pdr_add_lookup() to add the look up for
> > > > the service and does schedule locator work, later a process B got a new
> > > > server packet indicating locator is up and call pdr_locator_new_server()
> > > > which eventually sets pdr->locator_init_complete to true which process A
> > > > sees and takes list lock and queries domain list but it will timeout due
> > > > to deadlock as the response will queued to the same qmi->wq and it is
> > > > ordered workqueue and process B is not able to complete new server
> > > > request work due to deadlock on list lock.
> > > > 
> > > >        Process A                        Process B
> > > > 
> > > >                                      process_scheduled_works()
> > > > pdr_add_lookup()                      qmi_data_ready_work()
> > > >  process_scheduled_works()             pdr_locator_new_server()
> > > >                                          pdr->locator_init_complete=true;
> > > >    pdr_locator_work()
> > > >     mutex_lock(&pdr->list_lock);
> > > > 
> > > >      pdr_locate_service()                  mutex_lock(&pdr->list_lock);
> > > > 
> > > >       pdr_get_domain_list()
> > > >        pr_err("PDR: %s get domain list
> > > >                txn wait failed: %d\n",
> > > >                req->service_name,
> > > >                ret);
> > > > 
> > > > Fix it by removing the unnecessary list iteration as the list iteration
> > > > is already being done inside locator work, so avoid it here and just
> > > > call schedule_work() here.
> > > > 
> > > 
> > > I came to the same patch while looking into the issue related to
> > > in-kernel pd-mapper reported here:
> > > https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/
> > > 
> > > So:
> > > Reviewed-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
> > > Tested-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>

Should i add this in next version ?

> 
> I was gonna ask if you have confirmed that this indeed fixes the audio
> regression with the in-kernel pd-mapper?
> 
> Is this how you discovered the issue as well, Mukesh and Saranya?

No, we are not using in kernel pd-mapper yet in downstream..

> 
> If so, please mention that in the commit message, but in any case also
> include the corresponding error messages directly so that people running
> into this can find the fix more easily. (I see the pr_err now, but it's
> not as greppable).

Below is the sample log which got in downstream when we hit this issue

13.799119:   PDR: tms/servreg get domain list txn wait failed: -110
13.799146:   PDR: service lookup for msm/adsp/sensor_pd:tms/servreg failed: -110

> 
> A Link tag to my report would be good to have as well if this fixes the
> audio regression.

I see this is somehow matching the logs you have reported, but this deadlock
is there from the very first day of pdr_interface driver.

[   14.565059] PDR: avs/audio get domain list txn wait failed: -110
[   14.571943] PDR: service lookup for avs/audio failed: -110

-Mukesh
> 
> Johan
Re: [PATCH v2] soc: qcom: pdr: Fix the potential deadlock
Posted by Johan Hovold 12 months ago
On Tue, Feb 11, 2025 at 10:37:11PM +0530, Mukesh Ojha wrote:
> On Mon, Feb 10, 2025 at 10:43:23AM +0100, Johan Hovold wrote:
> > On Mon, Feb 10, 2025 at 02:50:18PM +0530, Mukesh Ojha wrote:
> > > On Thu, Feb 06, 2025 at 04:13:25PM -0600, Bjorn Andersson wrote:

> > > > I came to the same patch while looking into the issue related to
> > > > in-kernel pd-mapper reported here:
> > > > https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/
> > > > 
> > > > So:
> > > > Reviewed-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
> > > > Tested-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
> 
> Should i add this in next version ?

Yes, if there is another revision.

> > I was gonna ask if you have confirmed that this indeed fixes the audio
> > regression with the in-kernel pd-mapper?
> > 
> > Is this how you discovered the issue as well, Mukesh and Saranya?
> 
> No, we are not using in kernel pd-mapper yet in downstream..

Ok, thanks for confirming.

> > If so, please mention that in the commit message, but in any case also
> > include the corresponding error messages directly so that people running
> > into this can find the fix more easily. (I see the pr_err now, but it's
> > not as greppable).
> 
> Below is the sample log which got in downstream when we hit this issue
> 
> 13.799119:   PDR: tms/servreg get domain list txn wait failed: -110
> 13.799146:   PDR: service lookup for msm/adsp/sensor_pd:tms/servreg failed: -110

I think it would be good to include this (without the time stamp) as an
example as it would make it easier to find this fix even if the failure
happens for another service.

> > A Link tag to my report would be good to have as well if this fixes the
> > audio regression.
> 
> I see this is somehow matching the logs you have reported, but this deadlock
> is there from the very first day of pdr_interface driver.
> 
> [   14.565059] PDR: avs/audio get domain list txn wait failed: -110
> [   14.571943] PDR: service lookup for avs/audio failed: -110

Yes, but using the in-kernel pd-mapper has exposed a number of existing
bugs since it changes the timing of events enough to make it easier to
hit them.

The audio regression is a very real regression for users of Snapdragon
based laptops like, for example, the Lenovo Yoga Slim 7x.

If Bjorn has confirmed that this is the same issue (I can try to
instrument the code based on your analysis to confirm this too), then I
think it would be good to mention this in the commit message and link to
the report, for example:

	This specifically also fixes an audio regression when using the
	in-kernel pd-mapper as that makes it easier to hit this race. [1]

	Link: https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/ # [1]

or similar.

Johan
Re: [PATCH v2] soc: qcom: pdr: Fix the potential deadlock
Posted by Johan Hovold 12 months ago
On Wed, Feb 12, 2025 at 10:37:35AM +0100, Johan Hovold wrote:
> On Tue, Feb 11, 2025 at 10:37:11PM +0530, Mukesh Ojha wrote:
> > On Mon, Feb 10, 2025 at 10:43:23AM +0100, Johan Hovold wrote:

> > > A Link tag to my report would be good to have as well if this fixes the
> > > audio regression.
> > 
> > I see this is somehow matching the logs you have reported, but this deadlock
> > is there from the very first day of pdr_interface driver.
> > 
> > [   14.565059] PDR: avs/audio get domain list txn wait failed: -110
> > [   14.571943] PDR: service lookup for avs/audio failed: -110
> 
> Yes, but using the in-kernel pd-mapper has exposed a number of existing
> bugs since it changes the timing of events enough to make it easier to
> hit them.
> 
> The audio regression is a very real regression for users of Snapdragon
> based laptops like, for example, the Lenovo Yoga Slim 7x.
> 
> If Bjorn has confirmed that this is the same issue (I can try to
> instrument the code based on your analysis to confirm this too), then I
> think it would be good to mention this in the commit message and link to
> the report, for example:
> 
> 	This specifically also fixes an audio regression when using the
> 	in-kernel pd-mapper as that makes it easier to hit this race. [1]
> 
> 	Link: https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/ # [1]
> 
> or similar.

I can confirm that audio regression with the in-kernel pd-mapper appears
to be caused by the race that this patch fixes.

If I insert a short (100-200 ms) sleep before taking the list lock in
pdr_locator_new_server() to increase the race window, I see the audio
service failing to register on both the X1E CRD and Lenovo ThinkPad
X13s:

[   11.118557] pdr_add_lookup - tms/servreg / msm/adsp/charger_pd
[   11.443632] pdr_locator_new_server
[   11.558122] pdr_locator_new_server - taking list lock
[   11.563939] pdr_locator_new_server - releasing list lock
[   11.582178] pdr_locator_work - taking list lock
[   11.594468] pdr_locator_work - releasing list lock
[   11.992018] pdr_add_lookup - avs/audio / msm/adsp/audio_pd
[   11.992034] pdr_add_lookup - avs/audio / msm/adsp/audio_pd
[   11.992224] pdr_locator_new_server
    < 100 ms sleep inserted before taking lock in pdr_locator_new_server() >
[   11.997937] pdr_locator_work - taking list lock
[   12.093984] pdr_locator_new_server - taking list lock
[   17.120169] PDR: avs/audio get domain list txn wait failed: -110
[   17.127066] PDR: service lookup for avs/audio failed: -110
[   17.132893] pdr_locator_work - releasing list lock
[   17.139885] pdr_locator_new_server - releasing list lock

[ On the X13s, where I have not hit this issue with the in-kernel
  pd-mapper, I had to make sure to insert the sleep only on the second
  call, possibly because of interaction with the charger_pd registration
  which happened closer to the audio registration. ]

Please add a comment and link to the audio regression report as I
suggested above, and feel free to include my:

	Tested-by: Johan Hovold <johan+linaro@kernel.org>

Thanks for fixing this!

Johan
Re: [PATCH v2] soc: qcom: pdr: Fix the potential deadlock
Posted by Mukesh Ojha 12 months ago
On Wed, Feb 12, 2025 at 11:59:36AM +0100, Johan Hovold wrote:
> On Wed, Feb 12, 2025 at 10:37:35AM +0100, Johan Hovold wrote:
> > On Tue, Feb 11, 2025 at 10:37:11PM +0530, Mukesh Ojha wrote:
> > > On Mon, Feb 10, 2025 at 10:43:23AM +0100, Johan Hovold wrote:
> 
> > > > A Link tag to my report would be good to have as well if this fixes the
> > > > audio regression.
> > > 
> > > I see this is somehow matching the logs you have reported, but this deadlock
> > > is there from the very first day of pdr_interface driver.
> > > 
> > > [   14.565059] PDR: avs/audio get domain list txn wait failed: -110
> > > [   14.571943] PDR: service lookup for avs/audio failed: -110
> > 
> > Yes, but using the in-kernel pd-mapper has exposed a number of existing
> > bugs since it changes the timing of events enough to make it easier to
> > hit them.
> > 
> > The audio regression is a very real regression for users of Snapdragon
> > based laptops like, for example, the Lenovo Yoga Slim 7x.
> > 
> > If Bjorn has confirmed that this is the same issue (I can try to
> > instrument the code based on your analysis to confirm this too), then I
> > think it would be good to mention this in the commit message and link to
> > the report, for example:
> > 
> > 	This specifically also fixes an audio regression when using the
> > 	in-kernel pd-mapper as that makes it easier to hit this race. [1]
> > 
> > 	Link: https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/ # [1]
> > 
> > or similar.
> 
> I can confirm that audio regression with the in-kernel pd-mapper appears
> to be caused by the race that this patch fixes.
> 
> If I insert a short (100-200 ms) sleep before taking the list lock in
> pdr_locator_new_server() to increase the race window, I see the audio
> service failing to register on both the X1E CRD and Lenovo ThinkPad
> X13s:
> 
> [   11.118557] pdr_add_lookup - tms/servreg / msm/adsp/charger_pd
> [   11.443632] pdr_locator_new_server
> [   11.558122] pdr_locator_new_server - taking list lock
> [   11.563939] pdr_locator_new_server - releasing list lock
> [   11.582178] pdr_locator_work - taking list lock
> [   11.594468] pdr_locator_work - releasing list lock
> [   11.992018] pdr_add_lookup - avs/audio / msm/adsp/audio_pd
> [   11.992034] pdr_add_lookup - avs/audio / msm/adsp/audio_pd
> [   11.992224] pdr_locator_new_server
>     < 100 ms sleep inserted before taking lock in pdr_locator_new_server() >
> [   11.997937] pdr_locator_work - taking list lock
> [   12.093984] pdr_locator_new_server - taking list lock
> [   17.120169] PDR: avs/audio get domain list txn wait failed: -110
> [   17.127066] PDR: service lookup for avs/audio failed: -110
> [   17.132893] pdr_locator_work - releasing list lock
> [   17.139885] pdr_locator_new_server - releasing list lock
> 
> [ On the X13s, where I have not hit this issue with the in-kernel
>   pd-mapper, I had to make sure to insert the sleep only on the second
>   call, possibly because of interaction with the charger_pd registration
>   which happened closer to the audio registration. ]
> 
> Please add a comment and link to the audio regression report as I
> suggested above, and feel free to include my:
> 
> 	Tested-by: Johan Hovold <johan+linaro@kernel.org>
> 
> Thanks for fixing this!

Thanks for putting extra effort in reproducing the issue and testing.

-Mukesh
> 
> Johan