[PATCH] locking/osq_lock: Use atomic_try_cmpxchg_release() in osq_unlock()

Posted by Uros Bizjak 1 month, 3 weeks ago
Replace this pattern in osq_unlock():

    atomic_cmpxchg(*ptr, old, new) == old

... with the simpler and faster:

    atomic_try_cmpxchg(*ptr, &old, new)

The x86 CMPXCHG instruction returns success in the ZF flag,
so this change saves a compare after the CMPXCHG.  The code
in the fast path of osq_unlock() improves from:

 11b:	31 c9                	xor    %ecx,%ecx
 11d:	8d 50 01             	lea    0x1(%rax),%edx
 120:	89 d0                	mov    %edx,%eax
 122:	f0 0f b1 0f          	lock cmpxchg %ecx,(%rdi)
 126:	39 c2                	cmp    %eax,%edx
 128:	75 05                	jne    12f <...>

to:

 12b:	31 d2                	xor    %edx,%edx
 12d:	83 c0 01             	add    $0x1,%eax
 130:	f0 0f b1 17          	lock cmpxchg %edx,(%rdi)
 134:	75 05                	jne    13b <...>

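For reference, a minimal sketch (not part of this patch, names are illustrative) of how the
try_cmpxchg() family can be expressed in terms of plain cmpxchg(): the helper returns the
success result directly and, on failure, writes the value observed in memory back into *old,
so a caller looping on the result does not need an extra load before retrying:

    /*
     * Illustrative sketch only: roughly how a generic fallback could build
     * try_cmpxchg() on top of cmpxchg().  On success it returns true; on
     * failure it updates *old with the value actually found in memory.
     */
    static inline bool sketch_atomic_try_cmpxchg(atomic_t *v, int *old, int new)
    {
            int o = *old;
            int r = atomic_cmpxchg(v, o, new);

            if (r != o)
                    *old = r;
            return r == o;
    }

On x86 an arch-specific implementation can instead hand the compiler the ZF result of the
LOCK CMPXCHG itself, which is what makes the extra CMP disappear in the fast path above.
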
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/locking/osq_lock.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 75a6f6133866..b4233dc2c2b0 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -215,8 +215,7 @@ void osq_unlock(struct optimistic_spin_queue *lock)
 	/*
 	 * Fast path for the uncontended case.
 	 */
-	if (likely(atomic_cmpxchg_release(&lock->tail, curr,
-					  OSQ_UNLOCKED_VAL) == curr))
+	if (atomic_try_cmpxchg_release(&lock->tail, &curr, OSQ_UNLOCKED_VAL))
 		return;
 
 	/*
-- 
2.46.2
Re: [PATCH] locking/osq_lock: Use atomic_try_cmpxchg_release() in osq_unlock()
Posted by Peter Zijlstra 1 month ago
On Tue, Oct 01, 2024 at 01:45:57PM +0200, Uros Bizjak wrote:
> Replace this pattern in osq_unlock():
> 
>     atomic_cmpxchg(*ptr, old, new) == old
> 
> ... with the simpler and faster:
> 
>     atomic_try_cmpxchg(*ptr, &old, new)
> 
> The x86 CMPXCHG instruction returns success in the ZF flag,
> so this change saves a compare after the CMPXCHG.  The code
> in the fast path of osq_unlock() improves from:
> 
>  11b:	31 c9                	xor    %ecx,%ecx
>  11d:	8d 50 01             	lea    0x1(%rax),%edx
>  120:	89 d0                	mov    %edx,%eax
>  122:	f0 0f b1 0f          	lock cmpxchg %ecx,(%rdi)
>  126:	39 c2                	cmp    %eax,%edx
>  128:	75 05                	jne    12f <...>
> 
> to:
> 
>  12b:	31 d2                	xor    %edx,%edx
>  12d:	83 c0 01             	add    $0x1,%eax
>  130:	f0 0f b1 17          	lock cmpxchg %edx,(%rdi)
>  134:	75 05                	jne    13b <...>
> 
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>

Thanks!
Re: [PATCH] locking/osq_lock: Use atomic_try_cmpxchg_release() in osq_unlock()
Posted by Waiman Long 1 month, 3 weeks ago
On 10/1/24 07:45, Uros Bizjak wrote:
> Replace this pattern in osq_unlock():
>
>      atomic_cmpxchg(*ptr, old, new) == old
>
> ... with the simpler and faster:
>
>      atomic_try_cmpxchg(*ptr, &old, new)
>
> The x86 CMPXCHG instruction returns success in the ZF flag,
> so this change saves a compare after the CMPXCHG.  The code
> in the fast path of osq_unlock() improves from:
>
>   11b:	31 c9                	xor    %ecx,%ecx
>   11d:	8d 50 01             	lea    0x1(%rax),%edx
>   120:	89 d0                	mov    %edx,%eax
>   122:	f0 0f b1 0f          	lock cmpxchg %ecx,(%rdi)
>   126:	39 c2                	cmp    %eax,%edx
>   128:	75 05                	jne    12f <...>
>
> to:
>
>   12b:	31 d2                	xor    %edx,%edx
>   12d:	83 c0 01             	add    $0x1,%eax
>   130:	f0 0f b1 17          	lock cmpxchg %edx,(%rdi)
>   134:	75 05                	jne    13b <...>
>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Waiman Long <longman@redhat.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> ---
>   kernel/locking/osq_lock.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 75a6f6133866..b4233dc2c2b0 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -215,8 +215,7 @@ void osq_unlock(struct optimistic_spin_queue *lock)
>   	/*
>   	 * Fast path for the uncontended case.
>   	 */
> -	if (likely(atomic_cmpxchg_release(&lock->tail, curr,
> -					  OSQ_UNLOCKED_VAL) == curr))
> +	if (atomic_try_cmpxchg_release(&lock->tail, &curr, OSQ_UNLOCKED_VAL))
>   		return;
>   
>   	/*

LGTM

Acked-by: Waiman Long <longman@redhat.com>
[tip: locking/core] locking/osq_lock: Use atomic_try_cmpxchg_release() in osq_unlock()
Posted by tip-bot2 for Uros Bizjak 1 month ago
The following commit has been merged into the locking/core branch of tip:

Commit-ID:     0d75e0c420e52b4057a2de274054a5274209a2ae
Gitweb:        https://git.kernel.org/tip/0d75e0c420e52b4057a2de274054a5274209a2ae
Author:        Uros Bizjak <ubizjak@gmail.com>
AuthorDate:    Tue, 01 Oct 2024 13:45:57 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 25 Oct 2024 10:01:50 +02:00

locking/osq_lock: Use atomic_try_cmpxchg_release() in osq_unlock()

Replace this pattern in osq_unlock():

    atomic_cmpxchg(*ptr, old, new) == old

... with the simpler and faster:

    atomic_try_cmpxchg(*ptr, &old, new)

The x86 CMPXCHG instruction returns success in the ZF flag,
so this change saves a compare after the CMPXCHG.  The code
in the fast path of osq_unlock() improves from:

 11b:	31 c9                	xor    %ecx,%ecx
 11d:	8d 50 01             	lea    0x1(%rax),%edx
 120:	89 d0                	mov    %edx,%eax
 122:	f0 0f b1 0f          	lock cmpxchg %ecx,(%rdi)
 126:	39 c2                	cmp    %eax,%edx
 128:	75 05                	jne    12f <...>

to:

 12b:	31 d2                	xor    %edx,%edx
 12d:	83 c0 01             	add    $0x1,%eax
 130:	f0 0f b1 17          	lock cmpxchg %edx,(%rdi)
 134:	75 05                	jne    13b <...>

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/20241001114606.820277-1-ubizjak@gmail.com
---
 kernel/locking/osq_lock.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 75a6f61..b4233dc 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -215,8 +215,7 @@ void osq_unlock(struct optimistic_spin_queue *lock)
 	/*
 	 * Fast path for the uncontended case.
 	 */
-	if (likely(atomic_cmpxchg_release(&lock->tail, curr,
-					  OSQ_UNLOCKED_VAL) == curr))
+	if (atomic_try_cmpxchg_release(&lock->tail, &curr, OSQ_UNLOCKED_VAL))
 		return;
 
 	/*