[PATCH] LoongArch: Fixup cmpxchg semantics for memory barrier

guoren@kernel.org posted 1 patch 2 years, 4 months ago
arch/loongarch/include/asm/cmpxchg.h | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
[PATCH] LoongArch: Fixup cmpxchg semantics for memory barrier
Posted by guoren@kernel.org 2 years, 4 months ago
From: Guo Ren <guoren@linux.alibaba.com>

When cmpxchg fails, no memory barrier is needed. Only when cmpxchg
succeeds and the new value has been written is a memory barrier
needed.

 - cmpxchg_asm:   The unnecessary __WEAK_LLSC_MB on the fail path would
		  reduce the performance of the cmpxchg retry loop.
 - cmpxchg_small: Misses a necessary __WEAK_LLSC_MB when sc succeeds.
		  It's a bug because there is no memory synchronization
		  when sc succeeds.
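
In pseudo-C, the intended ordering looks like this (a sketch of the
intent only, not actual kernel code; ll() and sc() are stand-ins for
the LL/SC instructions):

	static inline unsigned long cmpxchg_sketch(volatile unsigned long *ptr,
						   unsigned long expected,
						   unsigned long new)
	{
		unsigned long old;

		do {
			old = ll(ptr);		/* load-exclusive */
			if (old != expected)
				goto out;	/* fail path: skip the barrier */
		} while (!sc(ptr, new));	/* store-conditional, retry on failure */
		smp_mb();			/* barrier only on the success path */
	out:
		return old;
	}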

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/loongarch/include/asm/cmpxchg.h | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index 979fde61bba8..6a05b92814b6 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
 	"	move	$t0, %z4			\n"		\
 	"	" st "	$t0, %1				\n"		\
 	"	beqz	$t0, 1b				\n"		\
-	"2:						\n"		\
 	__WEAK_LLSC_MB							\
+	"2:						\n"		\
 	: "=&r" (__ret), "=ZB"(*m)					\
 	: "ZB"(*m), "Jr" (old), "Jr" (new)				\
 	: "t0", "memory");						\
@@ -148,10 +148,8 @@ static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
 	"	or		%1, %1, %z6	\n"
 	"	sc.w		%1, %2		\n"
 	"	beqz		%1, 1b		\n"
-	"	b		3f		\n"
-	"2:					\n"
 	__WEAK_LLSC_MB
-	"3:					\n"
+	"2:					\n"
 	: "=&r" (old32), "=&r" (temp), "=ZC" (*ptr32)
 	: "ZC" (*ptr32), "Jr" (mask), "Jr" (old), "Jr" (new)
 	: "memory");
-- 
2.36.1
Re: [PATCH] LoongArch: Fixup cmpxchg semantics for memory barrier
Posted by Xi Ruoyao 2 years, 4 months ago
On Mon, 2023-07-31 at 21:15 -0400, guoren@kernel.org wrote:
> From: Guo Ren <guoren@linux.alibaba.com>
> 
> When cmpxchg fails, no memory barrier is needed. Only when cmpxchg
> succeeds and the new value has been written is a memory barrier
> needed.
> 
>  - cmpxchg_asm:   The unnecessary __WEAK_LLSC_MB on the fail path would
>                   reduce the performance of the cmpxchg retry loop.

I'm not an expert in memory models, but in practice this barrier is
really needed, or cmpxchg will be "not atomic".  See
https://lore.kernel.org/linux-mips/1d49da11-51d5-e148-cb02-9bd0ee57fae6@flygoat.com/.

>  - cmpxchg_small: Misses a necessary __WEAK_LLSC_MB when sc succeeds.
>                   It's a bug because there is no memory synchronization
>                   when sc succeeds.
> 
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
>  arch/loongarch/include/asm/cmpxchg.h | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> index 979fde61bba8..6a05b92814b6 100644
> --- a/arch/loongarch/include/asm/cmpxchg.h
> +++ b/arch/loongarch/include/asm/cmpxchg.h
> @@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
>         "       move    $t0, %z4                        \n"             \
>         "       " st "  $t0, %1                         \n"             \
>         "       beqz    $t0, 1b                         \n"             \
> -       "2:                                             \n"             \
>         __WEAK_LLSC_MB                                                  \
> +       "2:                                             \n"             \
>         : "=&r" (__ret), "=ZB"(*m)                                      \
>         : "ZB"(*m), "Jr" (old), "Jr" (new)                              \
>         : "t0", "memory");                                              \
> @@ -148,10 +148,8 @@ static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
>         "       or              %1, %1, %z6     \n"
>         "       sc.w            %1, %2          \n"
>         "       beqz            %1, 1b          \n"
> -       "       b               3f              \n"
> -       "2:                                     \n"
>         __WEAK_LLSC_MB
> -       "3:                                     \n"
> +       "2:                                     \n"
>         : "=&r" (old32), "=&r" (temp), "=ZC" (*ptr32)
>         : "ZC" (*ptr32), "Jr" (mask), "Jr" (old), "Jr" (new)
>         : "memory");

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: Fixup cmpxchg semantics for memory barrier
Posted by WANG Rui 2 years, 4 months ago
Hello,

On Tue, Aug 1, 2023 at 9:16 AM <guoren@kernel.org> wrote:
> diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> index 979fde61bba8..6a05b92814b6 100644
> --- a/arch/loongarch/include/asm/cmpxchg.h
> +++ b/arch/loongarch/include/asm/cmpxchg.h
> @@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
>         "       move    $t0, %z4                        \n"             \
>         "       " st "  $t0, %1                         \n"             \
>         "       beqz    $t0, 1b                         \n"             \
> -       "2:                                             \n"             \
>         __WEAK_LLSC_MB                                                  \
> +       "2:                                             \n"             \

Thanks for the patch.

This would look pretty good if it weren't for the special memory
barrier semantics of LoongArch's LL and SC instructions.

The LL/SC memory barrier behavior of LoongArch:

* LL: <memory-barrier> + <load-exclusive>
* SC: <store-conditional> + <memory-barrier>

and LoongArch's weak memory model allows load/load reordering of the
same address.

So, __WEAK_LLSC_MB[1] is used to prevent that load/load reordering, and
no explicit barrier instruction is required after SC.

[1] https://lore.kernel.org/loongarch/20230516124536.535343-1-chenhuacai@loongson.cn/
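
As an illustration, a minimal litmus-style sketch of that same-address
load/load reorder in plain kernel C (assume *p is initially 0 and the
two functions run on different CPUs):

	static void writer(int *p)
	{
		WRITE_ONCE(*p, 1);
	}

	static int reader(int *p)
	{
		int r0 = READ_ONCE(*p);	/* may observe 1 ... */
		int r1 = READ_ONCE(*p);	/* ... while this later load observes 0 */

		/*
		 * If same-address load/load reordering is allowed, the
		 * outcome r0 == 1 && r1 == 0 is permitted; a __WEAK_LLSC_MB
		 * between the two loads forbids it.
		 */
		return (r0 << 1) | r1;
	}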

Regards,
--
WANG Rui

Re: [PATCH] LoongArch: Fixup cmpxchg semantics for memory barrier
Posted by Guo Ren 2 years, 4 months ago
On Tue, Aug 1, 2023 at 10:29 AM WANG Rui <wangrui@loongson.cn> wrote:
>
> Hello,
>
> On Tue, Aug 1, 2023 at 9:16 AM <guoren@kernel.org> wrote:
> > diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> > index 979fde61bba8..6a05b92814b6 100644
> > --- a/arch/loongarch/include/asm/cmpxchg.h
> > +++ b/arch/loongarch/include/asm/cmpxchg.h
> > @@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> >         "       move    $t0, %z4                        \n"             \
> >         "       " st "  $t0, %1                         \n"             \
> >         "       beqz    $t0, 1b                         \n"             \
> > -       "2:                                             \n"             \
> >         __WEAK_LLSC_MB                                                  \
> > +       "2:                                             \n"             \
>
> Thanks for the patch.
>
> This would look pretty good if it weren't for the special memory
> barrier semantics of LoongArch's LL and SC instructions.
>
> The LL/SC memory barrier behavior of LoongArch:
>
> * LL: <memory-barrier> + <load-exclusive>
> * SC: <store-conditional> + <memory-barrier>
>
> and LoongArch's weak memory model allows load/load reordering of the
> same address.
The CoRR problem would cause wider problems than this. For this case,
do you mean your LL -> LL would be reordered?

CPU 0
          CPU1
LL(2) (set ex-monitor)

                store (break the ex-monitor)
LL(1) (reordered instruction set ex-monitor
SC(3) (successed ?)

>
> So, __WEAK_LLSC_MB[1] is used to prevent that load/load reordering, and
> no explicit barrier instruction is required after SC.
>
> [1] https://lore.kernel.org/loongarch/20230516124536.535343-1-chenhuacai@loongson.cn/
>
> Regards,
> --
> WANG Rui
>


-- 
Best Regards
 Guo Ren
Re: [PATCH] LoongArch: Fixup cmpxchg semantics for memory barrier
Posted by Guo Ren 2 years, 4 months ago
On Tue, Aug 1, 2023 at 5:02 PM Guo Ren <guoren@kernel.org> wrote:
>
> On Tue, Aug 1, 2023 at 10:29 AM WANG Rui <wangrui@loongson.cn> wrote:
> >
> > Hello,
> >
> > On Tue, Aug 1, 2023 at 9:16 AM <guoren@kernel.org> wrote:
> > > diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> > > index 979fde61bba8..6a05b92814b6 100644
> > > --- a/arch/loongarch/include/asm/cmpxchg.h
> > > +++ b/arch/loongarch/include/asm/cmpxchg.h
> > > @@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> > >         "       move    $t0, %z4                        \n"             \
> > >         "       " st "  $t0, %1                         \n"             \
> > >         "       beqz    $t0, 1b                         \n"             \
> > > -       "2:                                             \n"             \
> > >         __WEAK_LLSC_MB                                                  \
> > > +       "2:                                             \n"             \
> >
> > Thanks for the patch.
> >
> > This would look pretty good if it weren't for the special memory
> > barrier semantics of LoongArch's LL and SC instructions.
> >
> > The LL/SC memory barrier behavior of LoongArch:
> >
> > * LL: <memory-barrier> + <load-exclusive>
> > * SC: <store-conditional> + <memory-barrier>
> >
> > and LoongArch's weak memory model allows load/load reordering of the
> > same address.
> The CoRR problem would cause wider problems than this. For this case,
> do you mean your LL -> LL would be reordered?
>
> CPU 0
>           CPU1
> LL(2) (set ex-monitor)
>
>                 store (break the ex-monitor)
> LL(1) (reordered instruction set ex-monitor
> SC(3) (successes ?)
Sorry for the mail client reformat, I mean:

CPU0  LL(2) (set ex-monitor)
CPU1  STORE (break the ex-monitor)
CPU0  LL(1) (reordered instruction sets ex-monitor)
CPU0  SC(3) (success?)

>
> >
> > So, __WEAK_LLSC_MB[1] is used to prevent that load/load reordering, and
> > no explicit barrier instruction is required after SC.
> >
> > [1] https://lore.kernel.org/loongarch/20230516124536.535343-1-chenhuacai@loongson.cn/
> >
> > Regards,
> > --
> > WANG Rui
> >
>
>
> --
> Best Regards
>  Guo Ren



-- 
Best Regards
 Guo Ren
Re: [PATCH] LoongArch: Fixup cmpxchg semantics for memory barrier
Posted by WANG Rui 2 years, 4 months ago
Hello,

On Tue, Aug 1, 2023 at 5:05 PM Guo Ren <guoren@kernel.org> wrote:
>
> > The CoRR problem would cause wider problems than this. For this case,
> > do you mean your LL -> LL would be reordered?
> >
> > CPU 0
> >           CPU1
> > LL(2) (set ex-monitor)
> >
> >                 store (break the ex-monitor)
> > LL(1) (reordered instruction set ex-monitor
> > SC(3) (successes ?)
> Sorry for the mail client reformat, I mean:
>
> CPU0  LL(2) (set ex-monitor)
> CPU1  STORE (break the ex-monitor)
> > CPU0  LL(1) (reordered instruction sets ex-monitor)
> CPU0  SC(3) (success?)

No. LL and LL won't reorder, because LL implies a memory barrier
(though not acquire semantics).

Regards,
-- 
WANG Rui

Re: [PATCH] LoongArch: Fixup cmpxchg semantics for memory barrier
Posted by Guo Ren 2 years, 4 months ago
On Tue, Aug 1, 2023 at 5:32 PM WANG Rui <wangrui@loongson.cn> wrote:
>
> Hello,
>
> On Tue, Aug 1, 2023 at 5:05 PM Guo Ren <guoren@kernel.org> wrote:
> >
> > > The CoRR problem would cause wider problems than this. For this case,
> > > do you mean your LL -> LL would be reordered?
> > >
> > > CPU 0
> > >           CPU1
> > > LL(2) (set ex-monitor)
> > >
> > >                 store (break the ex-monitor)
> > > LL(1) (reordered instruction set ex-monitor
> > > SC(3) (successes ?)
> > Sorry for the mail client reformat, I mean:
> >
> > CPU0  LL(2) (set ex-monitor)
> > CPU1  STORE (break the ex-monitor)
> > CPU0  LL(1) (reordered instruction sets ex-monitor)
> > CPU0  SC(3) (success?)
>
> No. LL and LL won't reorder, because LL implies a memory barrier
> (though not acquire semantics).
That means we could remove __WEAK_LLSC_MB totally, right?

>
> Regards,
> --
> WANG Rui
>


-- 
Best Regards
 Guo Ren
Re: [PATCH] LoongArch: Fixup cmpxchg semantics for memory barrier
Posted by WANG Rui 2 years, 4 months ago
On Tue, Aug 1, 2023 at 6:50 PM Guo Ren <guoren@kernel.org> wrote:
>
> On Tue, Aug 1, 2023 at 5:32 PM WANG Rui <wangrui@loongson.cn> wrote:
> > No. LL and LL won't reorder, because LL implies a memory barrier
> > (though not acquire semantics).
> That means we could remove __WEAK_LLSC_MB totally, right?

More precisely, __WEAK_LLSC_MB is intended to prevent reordering between
the LL and the normal LD used to fetch the expected value for cmpxchg.
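
Concretely, in the usual caller-side retry loop (a sketch only;
cmpxchg() stands for the arch implementation being discussed):

	static int atomic_inc_sketch(int *p)
	{
		int old, new;

		do {
			old = READ_ONCE(*p);	/* normal LD of the expected value */
			new = old + 1;
		} while (cmpxchg(p, old, new) != old);

		/*
		 * After a failed cmpxchg(), the next READ_ONCE() above loads
		 * the same address as the LL inside cmpxchg(); without
		 * __WEAK_LLSC_MB on the fail path it could be satisfied
		 * with a stale value.
		 */
		return new;
	}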

Regards,
-- 
WANG Rui

Re: [PATCH] LoongArch: Fixup cmpxchg semantics for memory barrier
Posted by Guo Ren 2 years, 4 months ago
On Tue, Aug 1, 2023 at 12:37 PM WANG Rui <wangrui@loongson.cn> wrote:
>
> On Tue, Aug 1, 2023 at 6:50 PM Guo Ren <guoren@kernel.org> wrote:
> >
> > On Tue, Aug 1, 2023 at 5:32 PM WANG Rui <wangrui@loongson.cn> wrote:
> > > No. LL and LL won't reorder, because LL implies a memory barrier
> > > (though not acquire semantics).
> > That means we could remove __WEAK_LLSC_MB totally, right?
>
> More precisely, __WEAK_LLSC_MB is intended to prevent reordering between
> the LL and the normal LD used to fetch the expected value for cmpxchg.
Oh, that's unnecessary when cmpxchg fails.

Maybe you are treating cmpxchg as a CoRR antidote by coincidence.
Please solve the CoRR problem with READ_ONCE instead.

See the alpha architecture.
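
(For reference, Alpha pairs the volatile load in READ_ONCE() with a
full barrier; roughly, from arch/alpha/include/asm/rwonce.h, quoted
from memory:)

	#define __READ_ONCE(x)						\
	({								\
		__unqual_scalar_typeof(x) __x =				\
			(*(volatile typeof(__x) *)&(x));		\
		mb();	/* also orders dependent loads */		\
		(typeof(x))__x;						\
	})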

>
> Regards,
> --
> WANG Rui
>


-- 
Best Regards
 Guo Ren
Re: [PATCH] LoongArch: Fixup cmpxchg semantics for memory barrier
Posted by WANG Rui 2 years, 4 months ago
On Wed, Aug 2, 2023 at 7:17 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Tue, Aug 1, 2023 at 12:37 PM WANG Rui <wangrui@loongson.cn> wrote:
> >
> > On Tue, Aug 1, 2023 at 6:50 PM Guo Ren <guoren@kernel.org> wrote:
> > >
> > > On Tue, Aug 1, 2023 at 5:32 PM WANG Rui <wangrui@loongson.cn> wrote:
> > > > No. LL and LL won't reorder, because LL implies a memory barrier
> > > > (though not acquire semantics).
> > > That means we could remove __WEAK_LLSC_MB totally, right?
> >
> > More precisely, __WEAK_LLSC_MB is intended to prevent reordering between
> > the LL and the normal LD used to fetch the expected value for cmpxchg.
> Oh, that's unnecessary when cmpxchg fails.
>
> Maybe you are treating cmpxchg as a CoRR antidote by coincidence.
> Please solve the CoRR problem with READ_ONCE instead.
>
> See the alpha architecture.

Unfortunately, the LL instruction has no acquire semantics. Even if
our kernel team improves READ_ONCE, it cannot prevent reordering
between LL and READ_ONCE after cmpxchg fails.

LL (<memory-barrier> + <load-exclusive>); __WEAK_LLSC_MB; READ_ONCE (<normal-load>); ...

vs

LL (<memory-barrier> + <load-exclusive>); READ_ONCE (<normal-load> + <memory-barrier>); ...

Improving READ_ONCE is really important.
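
A barrier-augmented READ_ONCE of the kind compared above would have the
same shape as the Alpha macro quoted earlier, with a dbar after the
volatile load (a hypothetical sketch only, not code from any tree):

	#define __READ_ONCE(x)						\
	({								\
		__unqual_scalar_typeof(x) __x =				\
			(*(volatile typeof(__x) *)&(x));		\
		__asm__ __volatile__("dbar 0" : : : "memory");		\
		(typeof(x))__x;						\
	})

Note that the barrier comes after the normal load, so it still cannot
order that load against an earlier LL from a failed cmpxchg.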

Regards,
-- 
WANG Rui

Re: [PATCH] LoongArch: Fixup cmpxchg semantics for memory barrier
Posted by Xi Ruoyao 2 years, 4 months ago
On Tue, 2023-08-01 at 18:49 +0800, Guo Ren wrote:
> > On Tue, Aug 1, 2023 at 5:05 PM Guo Ren <guoren@kernel.org> wrote:
> > > 
> > > > The CoRR problem would cause wider problems than this. For this case,
> > > > do you mean your LL -> LL would be reordered?
> > > > 
> > > > CPU 0
> > > >            CPU1
> > > > LL(2) (set ex-monitor)
> > > > 
> > > >                  store (break the ex-monitor)
> > > > LL(1) (reordered instruction set ex-monitor
> > > > SC(3) (successes ?)
> > > Sorry for the mail client reformat, I mean:
> > > 
> > > CPU0  LL(2) (set ex-monitor)
> > > CPU1  STORE (break the ex-monitor)
> > > CPU0  LL(1) (reordered instruction sets ex-monitor)
> > > CPU0  SC(3) (success?)
> > 
> > No. LL and LL won't reorder, because LL implies a memory barrier
> > (though not acquire semantics).
> That means we could remove __WEAK_LLSC_MB totally, right?

As I've said, to implement CAS on LA464 this barrier is *really* needed.
I initially didn't believe it, then I spent one night (from 11 PM to
4 AM) debugging GCC libgomp test failures.

On LA664 (3A6000) things may change though.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: Fixup cmpxchg semantics for memory barrier
Posted by Xi Ruoyao 2 years, 4 months ago
On Tue, 2023-08-01 at 19:23 +0800, Xi Ruoyao wrote:
> On Tue, 2023-08-01 at 18:49 +0800, Guo Ren wrote:
> > > On Tue, Aug 1, 2023 at 5:05 PM Guo Ren <guoren@kernel.org> wrote:
> > > > 
> > > > > The CoRR problem would cause wider problems than this. For this case,
> > > > > do you mean your LL -> LL would be reordered?
> > > > > 
> > > > > CPU 0
> > > > >            CPU1
> > > > > LL(2) (set ex-monitor)
> > > > > 
> > > > >                  store (break the ex-monitor)
> > > > > LL(1) (reordered instruction set ex-monitor
> > > > > SC(3) (successes ?)
> > > > Sorry for the mail client reformat, I mean:
> > > > 
> > > > CPU0  LL(2) (set ex-monitor)
> > > > CPU1  STORE (break the ex-monitor)
> > > > CPU0  LL(1) (reordered instruction sets ex-monitor)
> > > > CPU0  SC(3) (success?)
> > > 
> > > No. LL and LL won't reorder, because LL implies a memory barrier
> > > (though not acquire semantics).
> > That means we could remove __WEAK_LLSC_MB totally, right?
> 
> As I've said, to implement CAS on LA464 this barrier is *really* needed.
> I initially didn't believe it, then I spent one night (from 11 PM to
> 4 AM) debugging GCC libgomp test failures.
> 
> On LA664 (3A6000) things may change though.

And I consider this a hardware bug, so it's outside the general concept
of memory ordering.  We are basically just applying a workaround
provided by the uarch designers.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: Fixup cmpxchg semantics for memory barrier
Posted by Will Deacon 2 years, 4 months ago
On Tue, Aug 01, 2023 at 10:29:31AM +0800, WANG Rui wrote:
> On Tue, Aug 1, 2023 at 9:16 AM <guoren@kernel.org> wrote:
> > diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> > index 979fde61bba8..6a05b92814b6 100644
> > --- a/arch/loongarch/include/asm/cmpxchg.h
> > +++ b/arch/loongarch/include/asm/cmpxchg.h
> > @@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> >         "       move    $t0, %z4                        \n"             \
> >         "       " st "  $t0, %1                         \n"             \
> >         "       beqz    $t0, 1b                         \n"             \
> > -       "2:                                             \n"             \
> >         __WEAK_LLSC_MB                                                  \
> > +       "2:                                             \n"             \
> 
> Thanks for the patch.
> 
> This would look pretty good if it weren't for the special memory
> barrier semantics of LoongArch's LL and SC instructions.
>
> The LL/SC memory barrier behavior of LoongArch:
>
> * LL: <memory-barrier> + <load-exclusive>
> * SC: <store-conditional> + <memory-barrier>
>
> and LoongArch's weak memory model allows load/load reordering of the
> same address.

Hmm, somehow this one passed me by, but I think that puts you in the naughty
corner with Itanium. It probably also means your READ_ONCE() is broken,
unless the compiler emits barriers for volatile reads (like ia64)?

Will
Re: [PATCH] LoongArch: Fixup cmpxchg semantics for memory barrier
Posted by WANG Rui 2 years, 4 months ago
Hello,

On Tue, Aug 1, 2023 at 4:32 PM Will Deacon <will@kernel.org> wrote:
>
> Hmm, somehow this one passed me by, but I think that puts you in the naughty
> corner with Itanium. It probably also means your READ_ONCE() is broken,
> unless the compiler emits barriers for volatile reads (like ia64)?

Hmm, I agree with your perspective. Allowing out-of-order loads of
the same address in the memory model provides certain performance
benefits, but it also poses challenges to software. Fortunately, the
hardware allows software to disable this feature when needed.

Regards,
-- 
WANG Rui