osq_lock() starts by setting node->next to NULL and node->locked to 0.
Careful analysis shows that node->next is always NULL on entry.
node->locked is set non-zero by another cpu to force a wakeup.
This can only happen after the 'prev->next = node' assignment,
so locked can be set to zero just before that (along with the assignment
to node->prev).
Only initialise node->cpu once; after that use its value instead
of smp_processor_id() - which is probably a real function call.
Should reduce cache-line bouncing a little.
Signed-off-by: David Laight <david.laight@aculab.com>
---
Re-send without the 'RE:' on the subject line.
kernel/locking/osq_lock.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index d414eef4bec6..55f5db896c02 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -51,7 +51,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
struct optimistic_spin_node *prev)
{
struct optimistic_spin_node *next = NULL;
- int curr = encode_cpu(smp_processor_id());
+ int curr = node->cpu;
int old;
/*
@@ -98,12 +98,10 @@ bool osq_lock(struct optimistic_spin_queue *lock)
{
struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
struct optimistic_spin_node *prev, *next;
- int curr = encode_cpu(smp_processor_id());
int old;
- node->locked = 0;
- node->next = NULL;
- node->cpu = curr;
+ if (unlikely(node->cpu == OSQ_UNLOCKED_VAL))
+ node->cpu = encode_cpu(smp_processor_id());
/*
* We need both ACQUIRE (pairs with corresponding RELEASE in
@@ -111,12 +109,13 @@ bool osq_lock(struct optimistic_spin_queue *lock)
* the node fields we just initialised) semantics when updating
* the lock tail.
*/
- old = atomic_xchg(&lock->tail, curr);
+ old = atomic_xchg(&lock->tail, node->cpu);
if (old == OSQ_UNLOCKED_VAL)
return true;
prev = decode_cpu(old);
node->prev = prev;
+ node->locked = 0;
/*
* osq_lock() unqueue
@@ -214,7 +213,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
void osq_unlock(struct optimistic_spin_queue *lock)
{
struct optimistic_spin_node *node, *next;
- int curr = encode_cpu(smp_processor_id());
+ int curr = raw_cpu_read(osq_node.cpu);
/*
* Fast path for the uncontended case.
--
2.17.1
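For reference, with the patch applied the start of osq_lock() would read
roughly as follows (a sketch assembled from the hunks above, not a
substitute for the diff; note that node->next is deliberately left
untouched on entry):

	bool osq_lock(struct optimistic_spin_queue *lock)
	{
		struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
		struct optimistic_spin_node *prev, *next;
		int old;

		/* node->cpu is encoded once and is then stable for this cpu. */
		if (unlikely(node->cpu == OSQ_UNLOCKED_VAL))
			node->cpu = encode_cpu(smp_processor_id());

		old = atomic_xchg(&lock->tail, node->cpu);
		if (old == OSQ_UNLOCKED_VAL)
			return true;

		prev = decode_cpu(old);
		node->prev = prev;
		/* Moved from function entry: only the slow path needs it. */
		node->locked = 0;

		/* ... remainder of the slow path is unchanged ... */
	}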
On 12/29/23 17:11, David Laight wrote:
> osq_lock() starts by setting node->next to NULL and node->locked to 0.
> Careful analysis shows that node->next is always NULL on entry.
[...]
> @@ -98,12 +98,10 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> {
> struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
> struct optimistic_spin_node *prev, *next;
> - int curr = encode_cpu(smp_processor_id());
> int old;
>
> - node->locked = 0;
> - node->next = NULL;
I have some concern about not clearing node->next at the beginning. I
know that next is supposed to be cleared at the end. I do have some
worry that there may exist a race condition that leaves next not cleared
before it is used again. So you either have to prove that such a race
does not exist, or keep initializing it to NULL at the beginning as is
done today.
Cheers,
Longman
From: Waiman Long
> Sent: 30 December 2023 03:20
>
> On 12/29/23 17:11, David Laight wrote:
> > [...]
> > - node->locked = 0;
> > - node->next = NULL;
>
> I have some concern about not clearing node->next at the beginning. I
> know that next is supposed to be cleared at the end. I do have some
> worry that there may exist a race condition that leaves next not cleared
> before it is used again. So you either have to prove that such a race
> does not exist, or keep initializing it to NULL at the beginning as is
> done today.
I've stared at the code some more :-)
There are two places where node->next is written non-NULL, both in osq_lock().
The first is at the top of the slow-path, where prev->next = node
(this is the store that overwrites the NULL set (or not) on entry).
The second is at the bottom of osq_lock(), the unqueue path's
prev->next = next (Step C), where the NULL written in Step A is
overwritten with the correct 'next' node.
After either of those the 'node' cpu must later either take the unqueue
exit from osq_lock() or call osq_unlock().
Both require it to wait for node->next to become non-NULL and then NULL it.
If code calls osq_lock() twice all bets are off!
Even if (somehow) node->next were non-NULL on entry it would be
overwritten by an osq_lock() call from another cpu.
The only problem would be if osq_unlock() were called before the queueing
cpu set prev->next = node.
That in itself is unlikely - but would go wrong if node->next had been
left set.
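For concreteness, the two places where node->next is written non-NULL
look roughly like this in the current kernel/locking/osq_lock.c
(paraphrased excerpt, with comments added here):

	/*
	 * Site 1: queueing in osq_lock().  The new waiter publishes
	 * itself by storing into its predecessor's ->next, overwriting
	 * the NULL the predecessor set (or, with this patch, did not
	 * need to set) on entry.
	 */
	WRITE_ONCE(prev->next, node);

	/*
	 * Site 2: unqueue Step C in osq_lock().  The NULL written into
	 * prev->next in Step A is replaced with our successor, splicing
	 * this node out of the queue.
	 */
	WRITE_ONCE(prev->next, next);
	WRITE_ONCE(next->prev, prev);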
I don't completely understand the 'acquire'/'release' semantics (they didn't
exist when I started doing SMP kernel code in the late 1980s).
But it looks odd that osq_unlock()'s fast path uses _release but the very
similar code in osq_wait_next() uses _acquire.
Indeed, apart from some (assumed) optimisations, I think osq_unlock()
could just be:
	next = osq_wait_next(lock, this_cpu_ptr(&osq_node), 0);
	if (next)
		next->locked = 1;
I don't think the order of the tests for lock->tail and node->next
matters in osq_wait_next().
If they were swapped the 'Second most likely case' code from osq_unlock()
could be removed.
(The 'uncontended case' doesn't need to load the address of 'node'.)
David
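To make that reordering concrete, here is a speculative sketch of the
osq_wait_next() wait loop with the two tests swapped (untested, based
on the mainline loop structure; not part of the patch under review):

	for (;;) {
		/* Look for a queued successor first ... */
		if (READ_ONCE(node->next)) {
			next = xchg(&node->next, NULL);
			if (next)
				break;
		}

		/* ... then try to pull the tail back to our predecessor. */
		if (atomic_read(&lock->tail) == curr &&
		    atomic_cmpxchg_acquire(&lock->tail, curr, old) == curr)
			break;

		cpu_relax();
	}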
On Sat, Dec 30, 2023 at 03:49:52PM +0000, David Laight wrote:
[...]
> I don't completely understand the 'acquire'/'release' semantics (they didn't
> exist when I started doing SMP kernel code in the late 1980s).
> But it looks odd that osq_unlock()'s fast path uses _release but the very
> similar code in osq_wait_next() uses _acquire.

The _release in osq_unlock() is needed since unlocks need to be RELEASE
so that lock+unlock can form a critical section (i.e. no memory accesses
can escape). When osq_wait_next() is used in non-unlock cases, the
RELEASE is not required. As for the case where osq_wait_next() is used
in osq_unlock(), there is an xchg() preceding it, which provides a full
barrier, so things are fine.

/me wonders whether we can relax the _acquire in osq_wait_next() into
a _relaxed.

> Indeed, apart from some (assumed) optimisations, I think osq_unlock()
> could just be:
> 	next = osq_wait_next(lock, this_cpu_ptr(&osq_node), 0);
> 	if (next)
> 		next->locked = 1;

If so we need to provide some sort of RELEASE semantics for
osq_unlock() in all cases.

Regards,
Boqun

> I don't think the order of the tests for lock->tail and node->next
> matters in osq_wait_next().
> If they were swapped the 'Second most likely case' code from osq_unlock()
> could be removed.
> (The 'uncontended case' doesn't need to load the address of 'node'.)
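For orientation, the primitives Boqun is referring to appear roughly as
follows in mainline osq_lock.c (paraphrased excerpt, comments added):

	/*
	 * osq_unlock() fast path: RELEASE, so accesses inside the
	 * critical section cannot escape past the unlock.
	 */
	if (likely(atomic_cmpxchg_release(&lock->tail, curr,
					  OSQ_UNLOCKED_VAL) == curr))
		return;

	/* osq_wait_next(): ACQUIRE when pulling the tail back. */
	if (atomic_read(&lock->tail) == curr &&
	    atomic_cmpxchg_acquire(&lock->tail, curr, old) == curr)
		break;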
From: Boqun Feng
> Sent: 02 January 2024 18:54
>
> On Sat, Dec 30, 2023 at 03:49:52PM +0000, David Laight wrote:
> [...]
> > But it looks odd that osq_unlock()'s fast path uses _release but the very
> > similar code in osq_wait_next() uses _acquire.
>
> The _release in osq_unlock() is needed since unlocks need to be RELEASE
> so that lock+unlock can form a critical section (i.e. no memory accesses
> can escape). When osq_wait_next() is used in non-unlock cases, the
> RELEASE is not required. As for the case where osq_wait_next() is used
> in osq_unlock(), there is an xchg() preceding it, which provides a full
> barrier, so things are fine.

I know there have been issues with ACQUIRE/RELEASE/FULL xchg in this
code, but is a FULL xchg always needed on node->next?

> /me wonders whether we can relax the _acquire in osq_wait_next() into
> a _relaxed.

I wouldn't have worried about relaxed v release.

> > Indeed, apart from some (assumed) optimisations, I think osq_unlock()
> > could just be:
> > 	next = osq_wait_next(lock, this_cpu_ptr(&osq_node), 0);
> > 	if (next)
> > 		next->locked = 1;
>
> If so we need to provide some sort of RELEASE semantics for
> osq_unlock() in all cases.

I wonder how often the unqueue code happens, and especially for the
last cpu in the list?
I'd only expect need_resched() to return true after spinning for a
while - in which case perhaps it is more likely that there are a lot
of cpus in the queue and the cpu being removed won't be the last.
In any case, osq_wait_next() exits on xchg(&node->next, NULL) != NULL,
which is a full barrier.

On a slightly different note, I've also wondered whether 'osq_node'
actually needs to be cache-line aligned.
You definitely don't want it spanning two cache lines, but I'd expect
that per-cpu data is mostly accessed by its own cpu, so you really
aren't going to get false sharing with some other per-cpu data since
the cpu is busy in this code.
So __aligned(16) would do?

	David
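For concreteness, the declaration in question, together with a sketch of
the alternative David floats (hypothetical and untested):

	/* Current mainline: each cpu's node gets its own cache line. */
	static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node);

	/* Floated alternative: only keep the node from straddling two lines. */
	struct optimistic_spin_node {
		struct optimistic_spin_node *next, *prev;
		int locked;	/* 1 if lock acquired */
		int cpu;	/* encoded CPU # + 1 value */
	} __aligned(16);

	static DEFINE_PER_CPU(struct optimistic_spin_node, osq_node);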