locking/rwsem: Fix rwsem waiter optimistic spinning problem with RT tasks

[PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

Posted by Waiman Long 3 years, 5 months ago

A non-first waiter can potentially spin in the for loop of
rwsem_down_write_slowpath() without sleeping but fail to acquire the
lock even if the rwsem is free if the following sequence happens:

  Non-first waiter       First waiter      Lock holder
  ----------------       ------------      -----------
  Acquire wait_lock
  rwsem_try_write_lock():
    Set handoff bit if RT or
      wait too long
    Set waiter->handoff_set
  Release wait_lock
                         Acquire wait_lock
                         Inherit waiter->handoff_set
                         Release wait_lock
					   Clear owner
                                           Release lock
  if (waiter.handoff_set) {
    rwsem_spin_on_owner(();
    if (OWNER_NULL)
      goto trylock_again;
  }
  trylock_again:
  Acquire wait_lock
  rwsem_try_write_lock():
     if (first->handoff_set && (waiter != first))
     	return false;
  Release wait_lock

It is especially problematic if the non-first waiter is an RT task and
it is running on the same CPU as the first waiter as this can lead to
live lock.

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 44873594de03..3839b38608da 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -636,6 +636,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 		new = count;
 
 		if (count & RWSEM_LOCK_MASK) {
+			/*
+			 * A waiter (first or not) can set the handoff bit
+			 * if it is an RT task or wait in the wait queue
+			 * for too long.
+			 */
 			if (has_handoff || (!rt_task(waiter->task) &&
 					    !time_after(jiffies, waiter->timeout)))
 				return false;
@@ -651,11 +656,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
 
 	/*
-	 * We have either acquired the lock with handoff bit cleared or
-	 * set the handoff bit.
+	 * We have either acquired the lock with handoff bit cleared or set
+	 * the handoff bit. Only the first waiter can have its handoff_set
+	 * set here to enable optimistic spinning in slowpath loop.
 	 */
 	if (new & RWSEM_FLAG_HANDOFF) {
-		waiter->handoff_set = true;
+		if (waiter == first)
+			waiter->handoff_set = true;
 		lockevent_inc(rwsem_wlock_handoff);
 		return false;
 	}
-- 
2.31.1

Re: [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

Posted by Peter Zijlstra 3 years, 5 months ago

On Wed, Oct 12, 2022 at 09:33:32AM -0400, Waiman Long wrote:
> A non-first waiter can potentially spin in the for loop of
> rwsem_down_write_slowpath() without sleeping but fail to acquire the
> lock even if the rwsem is free if the following sequence happens:
> 
>   Non-first waiter       First waiter      Lock holder
>   ----------------       ------------      -----------
>   Acquire wait_lock
>   rwsem_try_write_lock():
>     Set handoff bit if RT or
>       wait too long
>     Set waiter->handoff_set
>   Release wait_lock
>                          Acquire wait_lock
>                          Inherit waiter->handoff_set
>                          Release wait_lock
> 					   Clear owner
>                                            Release lock
>   if (waiter.handoff_set) {
>     rwsem_spin_on_owner(();
>     if (OWNER_NULL)
>       goto trylock_again;
>   }
>   trylock_again:
>   Acquire wait_lock
>   rwsem_try_write_lock():
>      if (first->handoff_set && (waiter != first))
>      	return false;
>   Release wait_lock
> 
> It is especially problematic if the non-first waiter is an RT task and
> it is running on the same CPU as the first waiter as this can lead to
> live lock.
> 

So why not do a better handoff? Specifically, have the owner set owner
to first-waiter instead of NULL ? (like the normal mutex code)

Re: [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

Posted by Waiman Long 3 years, 5 months ago

On 10/13/22 06:02, Peter Zijlstra wrote:
> On Wed, Oct 12, 2022 at 09:33:32AM -0400, Waiman Long wrote:
>> A non-first waiter can potentially spin in the for loop of
>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>> lock even if the rwsem is free if the following sequence happens:
>>
>>    Non-first waiter       First waiter      Lock holder
>>    ----------------       ------------      -----------
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>      Set handoff bit if RT or
>>        wait too long
>>      Set waiter->handoff_set
>>    Release wait_lock
>>                           Acquire wait_lock
>>                           Inherit waiter->handoff_set
>>                           Release wait_lock
>> 					   Clear owner
>>                                             Release lock
>>    if (waiter.handoff_set) {
>>      rwsem_spin_on_owner(();
>>      if (OWNER_NULL)
>>        goto trylock_again;
>>    }
>>    trylock_again:
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>       if (first->handoff_set && (waiter != first))
>>       	return false;
>>    Release wait_lock
>>
>> It is especially problematic if the non-first waiter is an RT task and
>> it is running on the same CPU as the first waiter as this can lead to
>> live lock.
>>
> So why not do a better handoff? Specifically, have the owner set owner
> to first-waiter instead of NULL ? (like the normal mutex code)

I understand your desire to make the rwsem handoff process more like 
what mutex is currently doing. I certainly think it is doable and will 
put this in my todo list. However, that needs to be done at unlock and 
wakeup time. I expect that will require moderate amount of code changes 
which will make it not that suitable for backporting to the stable releases.

I would like to see these simple fixes get merged first and then we can 
work on a major revamp of the handoff code. What do you think?

Cheers,
Longman

Re: [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

Posted by Waiman Long 3 years, 5 months ago

On 10/13/22 09:33, Waiman Long wrote:
> On 10/13/22 06:02, Peter Zijlstra wrote:
>> On Wed, Oct 12, 2022 at 09:33:32AM -0400, Waiman Long wrote:
>>> A non-first waiter can potentially spin in the for loop of
>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>> lock even if the rwsem is free if the following sequence happens:
>>>
>>>    Non-first waiter       First waiter      Lock holder
>>>    ----------------       ------------      -----------
>>>    Acquire wait_lock
>>>    rwsem_try_write_lock():
>>>      Set handoff bit if RT or
>>>        wait too long
>>>      Set waiter->handoff_set
>>>    Release wait_lock
>>>                           Acquire wait_lock
>>>                           Inherit waiter->handoff_set
>>>                           Release wait_lock
>>>                        Clear owner
>>>                                             Release lock
>>>    if (waiter.handoff_set) {
>>>      rwsem_spin_on_owner(();
>>>      if (OWNER_NULL)
>>>        goto trylock_again;
>>>    }
>>>    trylock_again:
>>>    Acquire wait_lock
>>>    rwsem_try_write_lock():
>>>       if (first->handoff_set && (waiter != first))
>>>           return false;
>>>    Release wait_lock
>>>
>>> It is especially problematic if the non-first waiter is an RT task and
>>> it is running on the same CPU as the first waiter as this can lead to
>>> live lock.
>>>
>> So why not do a better handoff? Specifically, have the owner set owner
>> to first-waiter instead of NULL ? (like the normal mutex code)
>
> I understand your desire to make the rwsem handoff process more like 
> what mutex is currently doing. I certainly think it is doable and will 
> put this in my todo list. However, that needs to be done at unlock and 
> wakeup time. I expect that will require moderate amount of code 
> changes which will make it not that suitable for backporting to the 
> stable releases.
>
> I would like to see these simple fixes get merged first and then we 
> can work on a major revamp of the handoff code. What do you think?
>
I am planning to post additional patches on top to rework the handoff 
code sometimes next week, but I will keep these fix patches for the 
stable releases.

Cheers,
Longman

Re: [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

Posted by Mukesh Ojha 3 years, 5 months ago

Hi,


On 10/12/2022 7:03 PM, Waiman Long wrote:
> A non-first waiter can potentially spin in the for loop of
> rwsem_down_write_slowpath() without sleeping but fail to acquire the
> lock even if the rwsem is free if the following sequence happens:
> 
>    Non-first waiter       First waiter      Lock holder
>    ----------------       ------------      -----------
>    Acquire wait_lock
>    rwsem_try_write_lock():
>      Set handoff bit if RT or
>        wait too long
>      Set waiter->handoff_set
>    Release wait_lock
>                           Acquire wait_lock
>                           Inherit waiter->handoff_set
>                           Release wait_lock
> 					   Clear owner
>                                             Release lock
>    if (waiter.handoff_set) {
>      rwsem_spin_on_owner(();
>      if (OWNER_NULL)
>        goto trylock_again;
>    }
>    trylock_again:
>    Acquire wait_lock
>    rwsem_try_write_lock():
>       if (first->handoff_set && (waiter != first))
>       	return false;
>    Release wait_lock
> 
> It is especially problematic if the non-first waiter is an RT task and
> it is running on the same CPU as the first waiter as this can lead to
> live lock.
> 
> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
> Signed-off-by: Waiman Long <longman@redhat.com>

Since this patch is tested and it covers the mentioned scenario.

Reviewed-and-Tested-by: Mukesh Ojha <quic_mojha@quicinc.com>

-Mukesh
> ---
>   kernel/locking/rwsem.c | 13 ++++++++++---
>   1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 44873594de03..3839b38608da 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -636,6 +636,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>   		new = count;
>   
>   		if (count & RWSEM_LOCK_MASK) {
> +			/*
> +			 * A waiter (first or not) can set the handoff bit
> +			 * if it is an RT task or wait in the wait queue
> +			 * for too long.
> +			 */
>   			if (has_handoff || (!rt_task(waiter->task) &&
>   					    !time_after(jiffies, waiter->timeout)))
>   				return false;
> @@ -651,11 +656,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>   	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
>   
>   	/*
> -	 * We have either acquired the lock with handoff bit cleared or
> -	 * set the handoff bit.
> +	 * We have either acquired the lock with handoff bit cleared or set
> +	 * the handoff bit. Only the first waiter can have its handoff_set
> +	 * set here to enable optimistic spinning in slowpath loop.
>   	 */
>   	if (new & RWSEM_FLAG_HANDOFF) {
> -		waiter->handoff_set = true;
> +		if (waiter == first)
> +			waiter->handoff_set = true;
>   		lockevent_inc(rwsem_wlock_handoff);
>   		return false
>   	}