locking/osq_lock: Optimisations to osq_lock code.

[PATCH next v2 4/5] locking/osq_lock: Avoid writing to node->next in the osq_lock() fast path.

Posted by David Laight 2 years, 1 month ago

When osq_lock() returns false, or osq_unlock() returns, static
analysis shows that node->next should always be NULL.
This means that it isn't necessary to explicitly set it to NULL
prior to atomic_xchg(&lock->tail, curr) on entry to osq_lock().

Just in case there is a non-obvious race condition that can leave it
non-NULL, check it with WARN_ON_ONCE() and set it to NULL if it is set.
Note that without this check the fast path (adding at the list head)
doesn't need to access the per-cpu osq_node at all.
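To make the fast-path argument concrete, here is a minimal userspace sketch in C11. It is not kernel code, and every `model_` name is hypothetical: a static `model_nodes[]` array stands in for the per-cpu `osq_node`, CPU n is encoded as n + 1 so that 0 can mean "unlocked" (mirroring the role of `encode_cpu()` and `OSQ_UNLOCKED_VAL`), and C11 `atomic_exchange()` stands in for `atomic_xchg()`. A counter records every fetch of a node, which shows that the uncontended path returns before the node is touched. Unlike the patch, the sketch checks the NULL invariant only on the slow path, after the node has been fetched anyway; that placement is a modelling choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_UNLOCKED 0 /* assumed: tail value meaning "queue empty" */
#define MODEL_NR_CPUS  4

struct model_node {
	struct model_node *next;
	int prev_cpu;
	int locked;
};

/* Stand-in for the per-cpu osq_node; static storage starts zeroed, so next == NULL. */
static struct model_node model_nodes[MODEL_NR_CPUS];
static int model_node_accesses; /* how many times a node was fetched */

static struct model_node *model_node_of(int cpu)
{
	model_node_accesses++;
	return &model_nodes[cpu];
}

/* Returns true if the lock was taken on the fast path (queue was empty). */
static bool model_enqueue(atomic_int *tail, int cpu)
{
	int curr = cpu + 1; /* encode CPU n as n + 1 so 0 can mean unlocked */
	int prev_cpu = atomic_exchange(tail, curr); /* seq_cst covers acquire + release */

	if (prev_cpu == MODEL_UNLOCKED)
		return true; /* fast path: the node was never touched */

	/* Slow path: only now is the node needed. */
	struct model_node *node = model_node_of(cpu);

	assert(node->next == NULL); /* invariant claimed by the commit message */
	node->prev_cpu = prev_cpu;
	node->locked = 0;
	/* The real code would now link itself behind prev and spin; omitted here. */
	return false;
}
```

Enqueuing CPU 0 on an empty queue returns `true` with the access counter still at 0; enqueuing CPU 1 afterwards takes the slow path, fetches its node once and records the encoded previous tail (1).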

Signed-off-by: David Laight <david.laight@aculab.com>
---
 kernel/locking/osq_lock.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 27324b509f68..35bb99e96697 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -87,12 +87,17 @@ osq_wait_next(struct optimistic_spin_queue *lock,
 
 bool osq_lock(struct optimistic_spin_queue *lock)
 {
-	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
-	struct optimistic_spin_node *prev, *next;
+	struct optimistic_spin_node *node, *prev, *next;
 	int curr = encode_cpu(smp_processor_id());
 	int prev_cpu;
 
-	node->next = NULL;
+	/*
+	 * node->next should be NULL on entry.
+	 * Check just in case there is a race somewhere.
+	 * Note that this is probably an unnecessary cache miss in the fast path.
+	 */
+	if (WARN_ON_ONCE(raw_cpu_read(osq_node.next) != NULL))
+		raw_cpu_write(osq_node.next, NULL);
 
 	/*
 	 * We need both ACQUIRE (pairs with corresponding RELEASE in
@@ -104,8 +109,9 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	if (prev_cpu == OSQ_UNLOCKED_VAL)
 		return true;
 
-	node->prev_cpu = prev_cpu;
+	node = this_cpu_ptr(&osq_node);
 	prev = decode_cpu(prev_cpu);
+	node->prev_cpu = prev_cpu;
 	node->locked = 0;
 
 	/*

Re: [PATCH next v2 4/5] locking/osq_lock: Avoid writing to node->next in the osq_lock() fast path.

Posted by Ingo Molnar 2 years, 1 month ago

* David Laight <David.Laight@ACULAB.COM> wrote:

> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 27324b509f68..35bb99e96697 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -87,12 +87,17 @@ osq_wait_next(struct optimistic_spin_queue *lock,
>  
>  bool osq_lock(struct optimistic_spin_queue *lock)
>  {
> -	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
> -	struct optimistic_spin_node *prev, *next;
> +	struct optimistic_spin_node *node, *prev, *next;
>  	int curr = encode_cpu(smp_processor_id());
>  	int prev_cpu;
>  
> -	node->next = NULL;
> +	/*
> +	 * node->next should be NULL on entry.
> +	 * Check just in case there is a race somewhere.
> +	 * Note that this is probably an unnecessary cache miss in the fast path.
> +	 */
> +	if (WARN_ON_ONCE(raw_cpu_read(osq_node.next) != NULL))
> +		raw_cpu_write(osq_node.next, NULL);

The fix-uppery and explanation about something that shouldn't happen is 
excessive: please just put a plain WARN_ON_ONCE() here - which we can 
remove in a release or so.
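The difference between the posted check and the plain check requested here can be illustrated with a small userspace sketch. `model_warn_on_once()` is a hypothetical stand-in for the kernel's `WARN_ON_ONCE()`: like the real macro it evaluates to the condition and reports only the first time a given call site fires, but it is a plain function with an explicit per-site flag, not the kernel implementation. The entry check only warns; it does not write NULL back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct model_node {
	struct model_node *next; /* only the field relevant to the check */
};

static int model_warnings; /* number of warnings actually reported */

/* Evaluates to cond; reports only the first time *warned is still false. */
static bool model_warn_on_once(bool cond, bool *warned, const char *what)
{
	assert(warned != NULL); /* each call site must supply its own flag */
	if (cond && !*warned) {
		*warned = true;
		model_warnings++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Plain check: warn once if the invariant is broken, but do not repair it. */
static void model_osq_entry_check(const struct model_node *node)
{
	static bool warned; /* per-call-site "once" flag */

	(void)model_warn_on_once(node->next != NULL, &warned,
				 "node->next != NULL on entry");
}
```

Calling the check on a node whose `next` is NULL reports nothing; a non-NULL `next` is reported once, and repeated hits at the same call site stay silent.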

Thanks,

	Ingo

Re: [PATCH next v2 4/5] locking/osq_lock: Avoid writing to node->next in the osq_lock() fast path.

Posted by Waiman Long 2 years, 1 month ago

On 12/31/23 16:54, David Laight wrote:
Reviewed-by: Waiman Long <longman@redhat.com>

[PATCH next v2 1/5] locking/osq_lock: Defer clearing node->locked until the slow osq_lock() path.
[PATCH next v2 2/5] locking/osq_lock: Optimise the vcpu_is_preempted() check.
[PATCH next v2 3/5] locking/osq_lock: Use node->prev_cpu instead of saving node->prev.
[PATCH next v2 4/5] locking/osq_lock: Avoid writing to node->next in the osq_lock() fast path.
[PATCH next v2 5/5] locking/osq_lock: Optimise decode_cpu() and per_cpu_ptr().