[PATCH] rwlock: remove unneeded subtraction

Roger Pau Monne posted 1 patch 2 years, 2 months ago
Test gitlab-ci failed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20220215093951.97830-1-roger.pau@citrix.com
xen/common/rwlock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] rwlock: remove unneeded subtraction
Posted by Roger Pau Monne 2 years, 2 months ago
There's no need to subtract _QR_BIAS from the lock value for storing
in the local cnts variable in the read lock slow path: the users of
the value in cnts only care about the writer-related bits and use a
mask to get the value.

Note that further setting of cnts in rspin_until_writer_unlock already
do not subtract _QR_BIAS.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/common/rwlock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
index dadab372b5..aa15529bbe 100644
--- a/xen/common/rwlock.c
+++ b/xen/common/rwlock.c
@@ -47,7 +47,7 @@ void queue_read_lock_slowpath(rwlock_t *lock)
     while ( atomic_read(&lock->cnts) & _QW_WMASK )
         cpu_relax();
 
-    cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
+    cnts = atomic_add_return(_QR_BIAS, &lock->cnts);
     rspin_until_writer_unlock(lock, cnts);
 
     /*
-- 
2.34.1


Re: [PATCH] rwlock: remove unneeded subtraction
Posted by Julien Grall 2 years, 2 months ago
Hi,

On 15/02/2022 09:39, Roger Pau Monne wrote:
> There's no need to subtract _QR_BIAS from the lock value for storing
> in the local cnts variable in the read lock slow path: the users of
> the value in cnts only care about the writer-related bits and use a
> mask to get the value.
> 
> Note that further setting of cnts in rspin_until_writer_unlock already
> do not subtract _QR_BIAS.

The rwlock is a copy of the Linux implementation. So I looked at the 
history to find out why _QR_BIAS was substracted.

It looks like this was done to get better assembly on x86:

commit f9852b74bec0117b888da39d070c323ea1cb7f4c
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Mon Apr 18 01:27:03 2016 +0200

     locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()

     The only reason for the current code is to make GCC emit only the
     "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
     the result), do so nicer.

     Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
     Acked-by: Waiman Long <waiman.long@hpe.com>
     Cc: Andrew Morton <akpm@linux-foundation.org>
     Cc: Linus Torvalds <torvalds@linux-foundation.org>
     Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
     Cc: Peter Zijlstra <peterz@infradead.org>
     Cc: Thomas Gleixner <tglx@linutronix.de>
     Cc: linux-arch@vger.kernel.org
     Cc: linux-kernel@vger.kernel.org
     Signed-off-by: Ingo Molnar <mingo@kernel.org>

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index fec082338668..19248ddf37ce 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock, 
u32 cnts)
          * that accesses can't leak upwards out of our subsequent critical
          * section in the case that the lock is currently held for write.
          */
-       cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
+       cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
         rspin_until_writer_unlock(lock, cnts);

         /*

This is a slowpath, so probably not a concern. But I thought I would 
double check whether the x86 folks are still happy to proceed with that 
in mind.

Cheers,

-- 
Julien Grall

Re: [PATCH] rwlock: remove unneeded subtraction
Posted by Jan Beulich 2 years, 2 months ago
On 15.02.2022 14:13, Julien Grall wrote:
> On 15/02/2022 09:39, Roger Pau Monne wrote:
>> There's no need to subtract _QR_BIAS from the lock value for storing
>> in the local cnts variable in the read lock slow path: the users of
>> the value in cnts only care about the writer-related bits and use a
>> mask to get the value.
>>
>> Note that further setting of cnts in rspin_until_writer_unlock already
>> do not subtract _QR_BIAS.
> 
> The rwlock is a copy of the Linux implementation. So I looked at the 
> history to find out why _QR_BIAS was substracted.
> 
> It looks like this was done to get better assembly on x86:
> 
> commit f9852b74bec0117b888da39d070c323ea1cb7f4c
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Mon Apr 18 01:27:03 2016 +0200
> 
>      locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
> 
>      The only reason for the current code is to make GCC emit only the
>      "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
>      the result), do so nicer.
> 
>      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>      Acked-by: Waiman Long <waiman.long@hpe.com>
>      Cc: Andrew Morton <akpm@linux-foundation.org>
>      Cc: Linus Torvalds <torvalds@linux-foundation.org>
>      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>      Cc: Peter Zijlstra <peterz@infradead.org>
>      Cc: Thomas Gleixner <tglx@linutronix.de>
>      Cc: linux-arch@vger.kernel.org
>      Cc: linux-kernel@vger.kernel.org
>      Signed-off-by: Ingo Molnar <mingo@kernel.org>
> 
> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> index fec082338668..19248ddf37ce 100644
> --- a/kernel/locking/qrwlock.c
> +++ b/kernel/locking/qrwlock.c
> @@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock, 
> u32 cnts)
>           * that accesses can't leak upwards out of our subsequent critical
>           * section in the case that the lock is currently held for write.
>           */
> -       cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
> +       cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
>          rspin_until_writer_unlock(lock, cnts);
> 
>          /*
> 
> This is a slowpath, so probably not a concern. But I thought I would 
> double check whether the x86 folks are still happy to proceed with that 
> in mind.

Hmm, that's an interesting observation. Roger - did you inspect the
generated code? At the very least the description may want amending.

Jan


Re: [PATCH] rwlock: remove unneeded subtraction
Posted by Roger Pau Monné 2 years, 2 months ago
On Tue, Feb 15, 2022 at 02:22:02PM +0100, Jan Beulich wrote:
> On 15.02.2022 14:13, Julien Grall wrote:
> > On 15/02/2022 09:39, Roger Pau Monne wrote:
> >> There's no need to subtract _QR_BIAS from the lock value for storing
> >> in the local cnts variable in the read lock slow path: the users of
> >> the value in cnts only care about the writer-related bits and use a
> >> mask to get the value.
> >>
> >> Note that further setting of cnts in rspin_until_writer_unlock already
> >> do not subtract _QR_BIAS.
> > 
> > The rwlock is a copy of the Linux implementation. So I looked at the 
> > history to find out why _QR_BIAS was substracted.
> > 
> > It looks like this was done to get better assembly on x86:
> > 
> > commit f9852b74bec0117b888da39d070c323ea1cb7f4c
> > Author: Peter Zijlstra <peterz@infradead.org>
> > Date:   Mon Apr 18 01:27:03 2016 +0200
> > 
> >      locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
> > 
> >      The only reason for the current code is to make GCC emit only the
> >      "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
> >      the result), do so nicer.
> > 
> >      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> >      Acked-by: Waiman Long <waiman.long@hpe.com>
> >      Cc: Andrew Morton <akpm@linux-foundation.org>
> >      Cc: Linus Torvalds <torvalds@linux-foundation.org>
> >      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >      Cc: Peter Zijlstra <peterz@infradead.org>
> >      Cc: Thomas Gleixner <tglx@linutronix.de>
> >      Cc: linux-arch@vger.kernel.org
> >      Cc: linux-kernel@vger.kernel.org
> >      Signed-off-by: Ingo Molnar <mingo@kernel.org>
> > 
> > diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> > index fec082338668..19248ddf37ce 100644
> > --- a/kernel/locking/qrwlock.c
> > +++ b/kernel/locking/qrwlock.c
> > @@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock, 
> > u32 cnts)
> >           * that accesses can't leak upwards out of our subsequent critical
> >           * section in the case that the lock is currently held for write.
> >           */
> > -       cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
> > +       cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
> >          rspin_until_writer_unlock(lock, cnts);
> > 
> >          /*
> > 
> > This is a slowpath, so probably not a concern. But I thought I would 
> > double check whether the x86 folks are still happy to proceed with that 
> > in mind.
> 
> Hmm, that's an interesting observation. Roger - did you inspect the
> generated code? At the very least the description may want amending.

It seems to always generate the same code for me when using gcc 8.3,
I even tried using arch_fetch_and_add directly, it always results
in:

ffff82d04022d983:       f0 0f c1 03             lock xadd %eax,(%rbx)
ffff82d04022d987:       25 00 30 00 00          and    $0x3000,%eax

Similarly clang 13.0.0 seem to always generate:

ffff82d0402085de:       f0 0f c1 03             lock xadd %eax,(%rbx)
ffff82d0402085e2:       89 c1                   mov    %eax,%ecx
ffff82d0402085e4:       81 e1 00 30 00 00       and    $0x3000,%ecx

Maybe I'm missing something, but I don't see a difference in the
generated code.

I could add to the commit message:

"Originally _QR_BIAS was subtracted from the result of
atomic_add_return_acquire in order to prevent GCC from emitting an
unneeded ADD instruction. This being in the lock slow path such
optimizations don't seem likely to make any relevant performance
difference. Also modern GCC and CLANG versions will already avoid
emitting the ADD instruction."

Thanks, Roger.

Re: [PATCH] rwlock: remove unneeded subtraction
Posted by Jan Beulich 2 years, 2 months ago
On 15.02.2022 15:16, Roger Pau Monné wrote:
> On Tue, Feb 15, 2022 at 02:22:02PM +0100, Jan Beulich wrote:
>> On 15.02.2022 14:13, Julien Grall wrote:
>>> On 15/02/2022 09:39, Roger Pau Monne wrote:
>>>> There's no need to subtract _QR_BIAS from the lock value for storing
>>>> in the local cnts variable in the read lock slow path: the users of
>>>> the value in cnts only care about the writer-related bits and use a
>>>> mask to get the value.
>>>>
>>>> Note that further setting of cnts in rspin_until_writer_unlock already
>>>> do not subtract _QR_BIAS.
>>>
>>> The rwlock is a copy of the Linux implementation. So I looked at the 
>>> history to find out why _QR_BIAS was substracted.
>>>
>>> It looks like this was done to get better assembly on x86:
>>>
>>> commit f9852b74bec0117b888da39d070c323ea1cb7f4c
>>> Author: Peter Zijlstra <peterz@infradead.org>
>>> Date:   Mon Apr 18 01:27:03 2016 +0200
>>>
>>>      locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
>>>
>>>      The only reason for the current code is to make GCC emit only the
>>>      "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
>>>      the result), do so nicer.
>>>
>>>      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>>      Acked-by: Waiman Long <waiman.long@hpe.com>
>>>      Cc: Andrew Morton <akpm@linux-foundation.org>
>>>      Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>>      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>>      Cc: Peter Zijlstra <peterz@infradead.org>
>>>      Cc: Thomas Gleixner <tglx@linutronix.de>
>>>      Cc: linux-arch@vger.kernel.org
>>>      Cc: linux-kernel@vger.kernel.org
>>>      Signed-off-by: Ingo Molnar <mingo@kernel.org>
>>>
>>> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
>>> index fec082338668..19248ddf37ce 100644
>>> --- a/kernel/locking/qrwlock.c
>>> +++ b/kernel/locking/qrwlock.c
>>> @@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock, 
>>> u32 cnts)
>>>           * that accesses can't leak upwards out of our subsequent critical
>>>           * section in the case that the lock is currently held for write.
>>>           */
>>> -       cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
>>> +       cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
>>>          rspin_until_writer_unlock(lock, cnts);
>>>
>>>          /*
>>>
>>> This is a slowpath, so probably not a concern. But I thought I would 
>>> double check whether the x86 folks are still happy to proceed with that 
>>> in mind.
>>
>> Hmm, that's an interesting observation. Roger - did you inspect the
>> generated code? At the very least the description may want amending.
> 
> It seems to always generate the same code for me when using gcc 8.3,
> I even tried using arch_fetch_and_add directly, it always results
> in:
> 
> ffff82d04022d983:       f0 0f c1 03             lock xadd %eax,(%rbx)
> ffff82d04022d987:       25 00 30 00 00          and    $0x3000,%eax
> 
> Similarly clang 13.0.0 seem to always generate:
> 
> ffff82d0402085de:       f0 0f c1 03             lock xadd %eax,(%rbx)
> ffff82d0402085e2:       89 c1                   mov    %eax,%ecx
> ffff82d0402085e4:       81 e1 00 30 00 00       and    $0x3000,%ecx
> 
> Maybe I'm missing something, but I don't see a difference in the
> generated code.

I've looked myself in the meantime, and I can largely confirm this.
Clang 5 doesn't eliminate the "add" (or really "lea") though. But
nevertheless ...

> I could add to the commit message:
> 
> "Originally _QR_BIAS was subtracted from the result of
> atomic_add_return_acquire in order to prevent GCC from emitting an
> unneeded ADD instruction. This being in the lock slow path such
> optimizations don't seem likely to make any relevant performance
> difference. Also modern GCC and CLANG versions will already avoid
> emitting the ADD instruction."

... I'm fine with this as explanation; I'd also be fine adding
this to the description while committing.

Jan


Re: [PATCH] rwlock: remove unneeded subtraction
Posted by Roger Pau Monné 2 years, 2 months ago
On Tue, Feb 15, 2022 at 03:45:00PM +0100, Jan Beulich wrote:
> On 15.02.2022 15:16, Roger Pau Monné wrote:
> > I could add to the commit message:
> > 
> > "Originally _QR_BIAS was subtracted from the result of
> > atomic_add_return_acquire in order to prevent GCC from emitting an
> > unneeded ADD instruction. This being in the lock slow path such
> > optimizations don't seem likely to make any relevant performance
> > difference. Also modern GCC and CLANG versions will already avoid
> > emitting the ADD instruction."
> 
> ... I'm fine with this as explanation; I'd also be fine adding
> this to the description while committing.

Sure, thanks.

Roger.

Re: [PATCH] rwlock: remove unneeded subtraction
Posted by Luca Fancellu 2 years, 2 months ago

> On 15 Feb 2022, at 09:39, Roger Pau Monne <roger.pau@citrix.com> wrote:
> 
> There's no need to subtract _QR_BIAS from the lock value for storing
> in the local cnts variable in the read lock slow path: the users of
> the value in cnts only care about the writer-related bits and use a
> mask to get the value.
> 
> Note that further setting of cnts in rspin_until_writer_unlock already
> do not subtract _QR_BIAS.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

> ---
> xen/common/rwlock.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
> index dadab372b5..aa15529bbe 100644
> --- a/xen/common/rwlock.c
> +++ b/xen/common/rwlock.c
> @@ -47,7 +47,7 @@ void queue_read_lock_slowpath(rwlock_t *lock)
>     while ( atomic_read(&lock->cnts) & _QW_WMASK )
>         cpu_relax();
> 
> -    cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
> +    cnts = atomic_add_return(_QR_BIAS, &lock->cnts);
>     rspin_until_writer_unlock(lock, cnts);
> 
>     /*
> -- 
> 2.34.1
> 
> 


Re: [PATCH] rwlock: remove unneeded subtraction
Posted by Jan Beulich 2 years, 2 months ago
On 15.02.2022 10:39, Roger Pau Monne wrote:
> There's no need to subtract _QR_BIAS from the lock value for storing
> in the local cnts variable in the read lock slow path: the users of
> the value in cnts only care about the writer-related bits and use a
> mask to get the value.
> 
> Note that further setting of cnts in rspin_until_writer_unlock already
> do not subtract _QR_BIAS.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>