[PATCH 1/3] driver core: Fix concurrent problem of deferred_probe_extend_timeout()

Wang Wensheng posted 3 patches 1 month, 3 weeks ago
[PATCH 1/3] driver core: Fix concurrent problem of deferred_probe_extend_timeout()
Posted by Wang Wensheng 1 month, 3 weeks ago
The deferred_probe_timeout_work may unexpectedly be canceled forever when
deferred_probe_extend_timeout() executes concurrently. Starting with
deferred_probe_timeout_work pending, the problem occurs after the
following sequence.

         CPU0                                 CPU1
deferred_probe_extend_timeout
  -> cancel_delayed_work => true
                                     deferred_probe_extend_timeout
                                       -> cancel_delayed_work
                                         -> __cancel_work
                                           -> try_grab_pending
  -> schedule_delayed_work
   -> queue_delayed_work_on
since pending bit is grabbed,
just return without doing anything
                                        -> set_work_pool_and_clear_pending
                                     this __cancel_work returns false and
                                     the work would never be queued again

The root cause is that the PENDING bit of the work_struct is set
temporarily in __cancel_work, and while it is set the work_struct
cannot be queued from another CPU.

Use deferred_probe_mutex to protect the cancel and queue operations for
the deferred_probe_timeout_work to fix this problem.

Fixes: 2b28a1a84a0e ("driver core: Extend deferred probe timeout on driver registration")
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
---
 drivers/base/dd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 13ab98e033ea..1983919917e0 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -323,6 +323,7 @@ static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_
 
 void deferred_probe_extend_timeout(void)
 {
+	mutex_lock(&deferred_probe_mutex);
 	/*
 	 * If the work hasn't been queued yet or if the work expired, don't
 	 * start a new one.
@@ -333,6 +334,7 @@ void deferred_probe_extend_timeout(void)
 		pr_debug("Extended deferred probe timeout by %d secs\n",
 					driver_deferred_probe_timeout);
 	}
+	mutex_unlock(&deferred_probe_mutex);
 }
 
 /**
-- 
2.22.0
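
The interleaving described in the commit message can be replayed in
userspace with a toy model. The sketch below is not kernel code: the
toy_* names are hypothetical, the two booleans only approximate the
PENDING bit and the delayed-work timer, and the "CPUs" are replayed
sequentially in the order given in the diagram rather than run as real
threads. It only illustrates why the second caller's temporary grab of
PENDING makes the first caller's requeue a silent no-op.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a delayed work item (hypothetical, not the kernel struct). */
struct toy_work {
	bool pending;		/* approximates the PENDING bit */
	bool timer_armed;	/* approximates the delayed-work timer being queued */
};

/* Approximates try_grab_pending(): the caller owns PENDING afterwards. */
static bool toy_grab_pending(struct toy_work *w)
{
	if (w->timer_armed) {
		w->timer_armed = false;
		return true;	/* work was pending; PENDING stays set */
	}
	w->pending = true;	/* work was idle; PENDING is taken temporarily */
	return false;
}

/* Approximates set_work_pool_and_clear_pending(). */
static void toy_clear_pending(struct toy_work *w)
{
	w->pending = false;
}

/* Approximates cancel_delayed_work(): true if the work was pending. */
static bool toy_cancel(struct toy_work *w)
{
	bool was_pending = toy_grab_pending(w);

	toy_clear_pending(w);
	return was_pending;
}

/* Approximates queue_delayed_work_on(): does nothing if PENDING is held. */
static bool toy_queue(struct toy_work *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	w->timer_armed = true;
	return true;
}

/* Shape of deferred_probe_extend_timeout(): requeue only if cancel hit. */
static void toy_extend(struct toy_work *w)
{
	if (toy_cancel(w))
		toy_queue(w);
}

/* Replays the racy interleaving; returns whether the work is still queued. */
static bool toy_replay_race(void)
{
	struct toy_work w = { .pending = true, .timer_armed = true };

	bool cpu0_was_pending = toy_cancel(&w);		/* CPU0: cancel => true */
	assert(cpu0_was_pending);
	bool cpu1_was_pending = toy_grab_pending(&w);	/* CPU1: mid-cancel, holds PENDING */
	bool cpu0_queued = toy_queue(&w);		/* CPU0: requeue bounces off PENDING */
	toy_clear_pending(&w);				/* CPU1: cancel finishes => false */
	if (cpu1_was_pending)
		toy_queue(&w);
	assert(!cpu0_queued && !cpu1_was_pending);
	return w.timer_armed;
}

/* Replays the same two calls with cancel+queue serialized, as under a mutex. */
static bool toy_replay_serialized(void)
{
	struct toy_work w = { .pending = true, .timer_armed = true };

	toy_extend(&w);		/* CPU0 runs to completion */
	toy_extend(&w);		/* then CPU1 */
	return w.timer_armed;
}
```

In this model toy_replay_race() ends with the work not queued, while
toy_replay_serialized() leaves it queued, which is the effect the patch
aims for by holding deferred_probe_mutex across the cancel and the
requeue.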
Re: [PATCH 1/3] driver core: Fix concurrent problem of deferred_probe_extend_timeout()
Posted by Greg KH 1 month, 3 weeks ago
On Thu, Aug 14, 2025 at 07:10:21PM +0800, Wang Wensheng wrote:
> The deferred_probe_timeout_work may unexpectedly be canceled forever when
> deferred_probe_extend_timeout() executes concurrently. Starting with
> deferred_probe_timeout_work pending, the problem occurs after the
> following sequence.
> 
>          CPU0                                 CPU1
> deferred_probe_extend_timeout
>   -> cancel_delayed_work => true
>                                      deferred_probe_extend_timeout
>                                        -> cancel_delayed_work
>                                          -> __cancel_work
>                                            -> try_grab_pending
>   -> schedule_delayed_work
>    -> queue_delayed_work_on
> since pending bit is grabbed,
> just return without doing anything
>                                         -> set_work_pool_and_clear_pending
>                                      this __cancel_work returns false and
>                                      the work would never be queued again
> 
> The root cause is that the PENDING bit of the work_struct is set
> temporarily in __cancel_work, and while it is set the work_struct
> cannot be queued from another CPU.
> 
> Use deferred_probe_mutex to protect the cancel and queue operations for
> the deferred_probe_timeout_work to fix this problem.
> 
> Fixes: 2b28a1a84a0e ("driver core: Extend deferred probe timeout on driver registration")
> Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
> ---
>  drivers/base/dd.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 13ab98e033ea..1983919917e0 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -323,6 +323,7 @@ static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_
>  
>  void deferred_probe_extend_timeout(void)
>  {
> +	mutex_lock(&deferred_probe_mutex);

Perhaps use a guard() instead?

thanks,

greg k-h
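
For readers unfamiliar with the suggestion: in the kernel, guard()
comes from <linux/cleanup.h> and relies on the compiler's cleanup
attribute to release a lock when the enclosing scope is left. The
userspace sketch below shows the same idea with a pthread mutex. It is
only an analogy: TOY_GUARD and the other toy_* names are hypothetical,
and the v2 patch mentioned later in this thread is not shown here, so
this should not be read as its actual content.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for deferred_probe_mutex. */
static pthread_mutex_t toy_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup handler: receives a pointer to the guard variable. */
static void toy_unlock_cleanup(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}

/*
 * Lock now, unlock automatically when the enclosing scope ends.
 * Uses the GCC/Clang cleanup attribute, the same mechanism the
 * kernel's guard() is built on.
 */
#define TOY_GUARD(m)							\
	pthread_mutex_t *toy_guard_					\
		__attribute__((cleanup(toy_unlock_cleanup), unused)) =	\
		(pthread_mutex_lock(m), (m))

/* Shape of the patched function: every return path drops the lock. */
static bool toy_extend_timeout(bool work_was_pending)
{
	TOY_GUARD(&toy_mutex);

	if (!work_was_pending)
		return false;	/* early return, mutex released by cleanup */
	return true;		/* normal return, mutex released by cleanup */
}

/* Helper for checking that the guard really released the mutex. */
static bool toy_mutex_is_free(void)
{
	if (pthread_mutex_trylock(&toy_mutex) != 0)
		return false;
	pthread_mutex_unlock(&toy_mutex);
	return true;
}
```

The appeal of this style is that no explicit unlock call is needed, so
a later early return cannot leave the mutex held.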
Re: [PATCH 1/3] driver core: Fix concurrent problem of deferred_probe_extend_timeout()
Posted by wangwensheng (C) 1 month, 3 weeks ago

On 2025/8/14 19:37, Greg KH wrote:
> On Thu, Aug 14, 2025 at 07:10:21PM +0800, Wang Wensheng wrote:
>> The deferred_probe_timeout_work may unexpectedly be canceled forever when
>> deferred_probe_extend_timeout() executes concurrently. Starting with
>> deferred_probe_timeout_work pending, the problem occurs after the
>> following sequence.
>>
>>           CPU0                                 CPU1
>> deferred_probe_extend_timeout
>>    -> cancel_delayed_work => true
>>                                       deferred_probe_extend_timeout
>>                                         -> cancel_delayed_work
>>                                           -> __cancel_work
>>                                             -> try_grab_pending
>>    -> schedule_delayed_work
>>     -> queue_delayed_work_on
>> since pending bit is grabbed,
>> just return without doing anything
>>                                          -> set_work_pool_and_clear_pending
>>                                       this __cancel_work returns false and
>>                                       the work would never be queued again
>>
>> The root cause is that the PENDING bit of the work_struct is set
>> temporarily in __cancel_work, and while it is set the work_struct
>> cannot be queued from another CPU.
>>
>> Use deferred_probe_mutex to protect the cancel and queue operations for
>> the deferred_probe_timeout_work to fix this problem.
>>
>> Fixes: 2b28a1a84a0e ("driver core: Extend deferred probe timeout on driver registration")
>> Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
>> ---
>>   drivers/base/dd.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>> index 13ab98e033ea..1983919917e0 100644
>> --- a/drivers/base/dd.c
>> +++ b/drivers/base/dd.c
>> @@ -323,6 +323,7 @@ static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_
>>   
>>   void deferred_probe_extend_timeout(void)
>>   {
>> +	mutex_lock(&deferred_probe_mutex);
> 
> Perhaps use a guard() instead?
> 
> thanks,
> 
> greg k-h
> 

Thanks for your suggestion. I have sent a v2 of this single patch,
because the other issue is not strongly related to it and needs
more discussion.
Re: [PATCH 1/3] driver core: Fix concurrent problem of deferred_probe_extend_timeout()
Posted by Greg KH 1 month, 3 weeks ago
On Thu, Aug 14, 2025 at 07:10:21PM +0800, Wang Wensheng wrote:
> The deferred_probe_timeout_work may unexpectedly be canceled forever when
> deferred_probe_extend_timeout() executes concurrently. Starting with
> deferred_probe_timeout_work pending, the problem occurs after the
> following sequence.
> 
>          CPU0                                 CPU1
> deferred_probe_extend_timeout
>   -> cancel_delayed_work => true
>                                      deferred_probe_extend_timeout
>                                        -> cancel_delayed_work
>                                          -> __cancel_work
>                                            -> try_grab_pending
>   -> schedule_delayed_work
>    -> queue_delayed_work_on
> since pending bit is grabbed,
> just return without doing anything
>                                         -> set_work_pool_and_clear_pending
>                                      this __cancel_work returns false and
>                                      the work would never be queued again
> 
> The root cause is that the PENDING bit of the work_struct is set
> temporarily in __cancel_work, and while it is set the work_struct
> cannot be queued from another CPU.
> 
> Use deferred_probe_mutex to protect the cancel and queue operations for
> the deferred_probe_timeout_work to fix this problem.
> 
> Fixes: 2b28a1a84a0e ("driver core: Extend deferred probe timeout on driver registration")
> Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
> ---
>  drivers/base/dd.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 13ab98e033ea..1983919917e0 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -323,6 +323,7 @@ static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_
>  
>  void deferred_probe_extend_timeout(void)
>  {
> +	mutex_lock(&deferred_probe_mutex);
>  	/*
>  	 * If the work hasn't been queued yet or if the work expired, don't
>  	 * start a new one.
> @@ -333,6 +334,7 @@ void deferred_probe_extend_timeout(void)
>  		pr_debug("Extended deferred probe timeout by %d secs\n",
>  					driver_deferred_probe_timeout);
>  	}
> +	mutex_unlock(&deferred_probe_mutex);
>  }
>  
>  /**
> -- 
> 2.22.0
> 

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You have marked a patch with a "Fixes:" tag for a commit that is in an
  older released kernel, yet you do not have a cc: stable line in the
  signed-off-by area at all, which means that the patch will not be
  applied to any older kernel releases.  To properly fix this, please
  follow the documented rules in the
  Documentation/process/stable-kernel-rules.rst file for how to resolve
  this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot