From: Guo Ren <guoren@linux.alibaba.com>
When cmpxchg fails, no memory barrier needs to be taken. Only when
cmpxchg succeeds and the new value has been written is the memory
barrier needed.

- cmpxchg_asm: The unnecessary __WEAK_LLSC_MB on the failure path
               reduces the performance of the cmpxchg retry loop.
- cmpxchg_small: A necessary __WEAK_LLSC_MB is missing when sc
                 succeeds. It's a bug because there is no memory
                 synchronization when sc succeeds.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
arch/loongarch/include/asm/cmpxchg.h | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index 979fde61bba8..6a05b92814b6 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
" move $t0, %z4 \n" \
" " st " $t0, %1 \n" \
" beqz $t0, 1b \n" \
- "2: \n" \
__WEAK_LLSC_MB \
+ "2: \n" \
: "=&r" (__ret), "=ZB"(*m) \
: "ZB"(*m), "Jr" (old), "Jr" (new) \
: "t0", "memory"); \
@@ -148,10 +148,8 @@ static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
" or %1, %1, %z6 \n"
" sc.w %1, %2 \n"
" beqz %1, 1b \n"
- " b 3f \n"
- "2: \n"
__WEAK_LLSC_MB
- "3: \n"
+ "2: \n"
: "=&r" (old32), "=&r" (temp), "=ZC" (*ptr32)
: "ZC" (*ptr32), "Jr" (mask), "Jr" (old), "Jr" (new)
: "memory");
--
2.36.1
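
A minimal sketch of the usage pattern the commit message above reasons about,
assuming a hypothetical locking caller (the function name and the use of the
generic atomic_cmpxchg_acquire()/cpu_relax() helpers are illustrative, not
part of the patch). The idea is that such a caller only relies on ordering
when the cmpxchg succeeds; the replies below dispute whether the failure-path
barrier can actually be dropped on LA464.

#include <linux/atomic.h>
#include <linux/processor.h>

/* Hypothetical caller: ordering matters only on the successful 0 -> 1 transition. */
static inline void lock_with_cmpxchg(atomic_t *lock)
{
        while (atomic_cmpxchg_acquire(lock, 0, 1) != 0)
                cpu_relax();    /* failed attempt: no ordering relied upon, just retry */
}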
On Mon, 2023-07-31 at 21:15 -0400, guoren@kernel.org wrote:
> From: Guo Ren <guoren@linux.alibaba.com>
>
> When cmpxchg fails, no memory barrier needs to be taken. Only when
> cmpxchg succeeds and the new value has been written is the memory
> barrier needed.
>
> - cmpxchg_asm: The unnecessary __WEAK_LLSC_MB on the failure path
>                reduces the performance of the cmpxchg retry loop.

I'm not an expert in memory models, but in practice this barrier is
really needed or cmpxchg will be "not atomic".  See
https://lore.kernel.org/linux-mips/1d49da11-51d5-e148-cb02-9bd0ee57fae6@flygoat.com/.

> - cmpxchg_small: A necessary __WEAK_LLSC_MB is missing when sc
>                  succeeds. It's a bug because there is no memory
>                  synchronization when sc succeeds.
>
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
>  arch/loongarch/include/asm/cmpxchg.h | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> index 979fde61bba8..6a05b92814b6 100644
> --- a/arch/loongarch/include/asm/cmpxchg.h
> +++ b/arch/loongarch/include/asm/cmpxchg.h
> @@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> " move $t0, %z4 \n" \
> " " st " $t0, %1 \n" \
> " beqz $t0, 1b \n" \
> - "2: \n" \
> __WEAK_LLSC_MB \
> + "2: \n" \
> : "=&r" (__ret), "=ZB"(*m) \
> : "ZB"(*m), "Jr" (old), "Jr" (new) \
> : "t0", "memory"); \
> @@ -148,10 +148,8 @@ static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
> " or %1, %1, %z6 \n"
> " sc.w %1, %2 \n"
> " beqz %1, 1b \n"
> - " b 3f \n"
> - "2: \n"
> __WEAK_LLSC_MB
> - "3: \n"
> + "2: \n"
> : "=&r" (old32), "=&r" (temp), "=ZC" (*ptr32)
> : "ZC" (*ptr32), "Jr" (mask), "Jr" (old), "Jr" (new)
> : "memory");

--
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
Hello,

On Tue, Aug 1, 2023 at 9:16 AM <guoren@kernel.org> wrote:
> diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> index 979fde61bba8..6a05b92814b6 100644
> --- a/arch/loongarch/include/asm/cmpxchg.h
> +++ b/arch/loongarch/include/asm/cmpxchg.h
> @@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> " move $t0, %z4 \n" \
> " " st " $t0, %1 \n" \
> " beqz $t0, 1b \n" \
> - "2: \n" \
> __WEAK_LLSC_MB \
> + "2: \n" \

Thanks for the patch.

This would look pretty good if it weren't for the special memory
barrier semantics of the LoongArch's LL and SC instructions.

The LL/SC memory barrier behavior of LoongArch:

* LL: <memory-barrier> + <load-exclusive>
* SC: <store-conditional> + <memory-barrier>

and the LoongArch's weak memory model allows load/load reorder for the
same address.

So, the __WEAK_LLSC_MB[1] is used to prevent load/load reorder and no
explicit barrier instruction is required after SC.

[1] https://lore.kernel.org/loongarch/20230516124536.535343-1-chenhuacai@loongson.cn/

Regards,
--
WANG Rui
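
A litmus-style sketch, in C with invented names, of the same-address
load/load (CoRR) reordering WANG Rui describes; it illustrates the
memory-model point only and is not code from the thread or the kernel.

int x;  /* shared location, initially 0 */

/* CPU 0: two plain loads of the same address */
void cpu0_reader(int *r1, int *r2)
{
        *r1 = *(volatile int *)&x;      /* first load of x  */
        *r2 = *(volatile int *)&x;      /* second load of x */
        /*
         * On a model that permits same-address load/load reordering,
         * *r1 == 1 && *r2 == 0 is a legal outcome: the later load
         * returns an older value.  A barrier such as __WEAK_LLSC_MB
         * between the two loads forbids that outcome.
         */
}

/* CPU 1: a concurrent store to the same address */
void cpu1_writer(void)
{
        *(volatile int *)&x = 1;
}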
On Tue, Aug 1, 2023 at 10:29 AM WANG Rui <wangrui@loongson.cn> wrote:
>
> Hello,
>
> On Tue, Aug 1, 2023 at 9:16 AM <guoren@kernel.org> wrote:
> > diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> > index 979fde61bba8..6a05b92814b6 100644
> > --- a/arch/loongarch/include/asm/cmpxchg.h
> > +++ b/arch/loongarch/include/asm/cmpxchg.h
> > @@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> > " move $t0, %z4 \n" \
> > " " st " $t0, %1 \n" \
> > " beqz $t0, 1b \n" \
> > - "2: \n" \
> > __WEAK_LLSC_MB \
> > + "2: \n" \
>
> Thanks for the patch.
>
> This would look pretty good if it weren't for the special memory
> barrier semantics of the LoongArch's LL and SC instructions.
>
> The LL/SC memory barrier behavior of LoongArch:
>
> * LL: <memory-barrier> + <load-exclusive>
> * SC: <store-conditional> + <memory-barrier>
>
> and the LoongArch's weak memory model allows load/load reorder for the
> same address.
The CoRR problem would cause wider problems than this. For this case,
do you mean your LL -> LL would be reordered?
CPU 0
CPU1
LL(2) (set ex-monitor)
store (break the ex-monitor)
LL(1) (reordered instruction set ex-monitor
SC(3) (successed ?)
>
> So, the __WEAK_LLSC_MB[1] is used to prevent load/load reorder and no
> explicit barrier instruction is required after SC.
>
> [1] https://lore.kernel.org/loongarch/20230516124536.535343-1-chenhuacai@loongson.cn/
>
> Regards,
> --
> WANG Rui
>
--
Best Regards
Guo Ren
On Tue, Aug 1, 2023 at 5:02 PM Guo Ren <guoren@kernel.org> wrote:
>
> On Tue, Aug 1, 2023 at 10:29 AM WANG Rui <wangrui@loongson.cn> wrote:
> >
> > Hello,
> >
> > On Tue, Aug 1, 2023 at 9:16 AM <guoren@kernel.org> wrote:
> > > diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> > > index 979fde61bba8..6a05b92814b6 100644
> > > --- a/arch/loongarch/include/asm/cmpxchg.h
> > > +++ b/arch/loongarch/include/asm/cmpxchg.h
> > > @@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> > > " move $t0, %z4 \n" \
> > > " " st " $t0, %1 \n" \
> > > " beqz $t0, 1b \n" \
> > > - "2: \n" \
> > > __WEAK_LLSC_MB \
> > > + "2: \n" \
> >
> > Thanks for the patch.
> >
> > This would look pretty good if it weren't for the special memory
> > barrier semantics of the LoongArch's LL and SC instructions.
> >
> > The LL/SC memory barrier behavior of LoongArch:
> >
> > * LL: <memory-barrier> + <load-exclusive>
> > * SC: <store-conditional> + <memory-barrier>
> >
> > and the LoongArch's weak memory model allows load/load reorder for the
> > same address.
> The CoRR problem would cause wider problems than this.For this case,
> do you mean your LL -> LL would be reordered?
>
> CPU 0
> CPU1
> LL(2) (set ex-monitor)
>
> store (break the ex-monitor)
> LL(1) (reordered instruction set ex-monitor
> SC(3) (successes ?)
Sorry for the mail client reformat, I mean:

CPU0 LL(2) (set ex-monitor)
CPU1 STORE (break the ex-monitor)
CPU0 LL(1) (reordered instruction set ex-monitor
CPU0 SC(3) (success?)

>
> >
> > So, the __WEAK_LLSC_MB[1] is used to prevent load/load reorder and no
> > explicit barrier instruction is required after SC.
> >
> > [1] https://lore.kernel.org/loongarch/20230516124536.535343-1-chenhuacai@loongson.cn/
> >
> > Regards,
> > --
> > WANG Rui
> >
> --
> Best Regards
> Guo Ren

--
Best Regards
Guo Ren
Hello,

On Tue, Aug 1, 2023 at 5:05 PM Guo Ren <guoren@kernel.org> wrote:
>
> > The CoRR problem would cause wider problems than this.For this case,
> > do you mean your LL -> LL would be reordered?
> >
> > CPU 0
> > CPU1
> > LL(2) (set ex-monitor)
> >
> > store (break the ex-monitor)
> > LL(1) (reordered instruction set ex-monitor
> > SC(3) (successes ?)
> Sorry for the mail client reformat, I mean:
>
> CPU0 LL(2) (set ex-monitor)
> CPU1 STORE (break the ex-monitor)
> CPU0 LL(1) (reordered instruction set ex-monitor
> CPU0 SC(3) (success?)

No. LL and LL won't reorder because LL implies a memory barrier(though
not acquire semantics).

Regards,
--
WANG Rui
On Tue, Aug 1, 2023 at 5:32 PM WANG Rui <wangrui@loongson.cn> wrote:
>
> Hello,
>
> On Tue, Aug 1, 2023 at 5:05 PM Guo Ren <guoren@kernel.org> wrote:
> >
> > > The CoRR problem would cause wider problems than this.For this case,
> > > do you mean your LL -> LL would be reordered?
> > >
> > > CPU 0
> > > CPU1
> > > LL(2) (set ex-monitor)
> > >
> > > store (break the ex-monitor)
> > > LL(1) (reordered instruction set ex-monitor
> > > SC(3) (successes ?)
> > Sorry for the mail client reformat, I mean:
> >
> > CPU0 LL(2) (set ex-monitor)
> > CPU1 STORE (break the ex-monitor)
> > CPU0 LL(1) (reordered instruction set ex-monitor
> > CPU0 SC(3) (success?)
>
> No. LL and LL won't reorder because LL implies a memory barrier(though
> not acquire semantics).
That means we could remove __WEAK_LLSC_MB totally, right?

>
> Regards,
> --
> WANG Rui
>
--
Best Regards
Guo Ren
On Tue, Aug 1, 2023 at 6:50 PM Guo Ren <guoren@kernel.org> wrote:
>
> On Tue, Aug 1, 2023 at 5:32 PM WANG Rui <wangrui@loongson.cn> wrote:
> > No. LL and LL won't reorder because LL implies a memory barrier(though
> > not acquire semantics).
> That means we could remove __WEAK_LLSC_MB totally, right?

More precisely, __WEAK_LLSC_MB is intended to prevent reordering
between LL and normal LD used to fetch the expected value for cmpxchg.

Regards,
--
WANG Rui
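
A hedged sketch of the generic retry-loop pattern this refers to (the helper
name is invented; the comments restate WANG Rui's explanation rather than
anything in the patch): the expected value for each attempt comes from a
normal load, and on the failure path __WEAK_LLSC_MB is what keeps that load
from being satisfied before the LL of the failed attempt.

#include <linux/atomic.h>

static inline void add_with_retry(atomic_t *v, int i)
{
        int old, new;

        /*
         * If the atomic_read() below could be satisfied before the ll.w
         * of a just-failed cmpxchg attempt, the loop would keep
         * computing 'new' from a stale 'old'.
         */
        do {
                old = atomic_read(v);   /* the "normal LD": fetches the expected value */
                new = old + i;
        } while (atomic_cmpxchg(v, old, new) != old);
}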
On Tue, Aug 1, 2023 at 12:37 PM WANG Rui <wangrui@loongson.cn> wrote:
>
> On Tue, Aug 1, 2023 at 6:50 PM Guo Ren <guoren@kernel.org> wrote:
> >
> > On Tue, Aug 1, 2023 at 5:32 PM WANG Rui <wangrui@loongson.cn> wrote:
> > > No. LL and LL won't reorder because LL implies a memory barrier(though
> > > not acquire semantics).
> > That means we could remove __WEAK_LLSC_MB totally, right?
>
> More precisely, __WEAK_LLSC_MB is intended to prevent reordering
> between LL and normal LD used to fetch the expected value for cmpxchg.
Oh, that's unnecessary when cmpxchg fails.

Maybe you treat cmpxchg as a CoRR antidote in coincidence. Please
solve the CoRR problem by READ_ONCE.

See alpha architecture.

>
> Regards,
> --
> WANG Rui
>
--
Best Regards
Guo Ren
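
For reference, the Alpha approach mentioned here is, roughly, an architecture
override of __READ_ONCE() that appends a full barrier to the volatile load
(see arch/alpha/include/asm/rwonce.h); the snippet below is a simplified
rendering of that idea for illustration, not a proposed LoongArch change.
Note that the reply below argues such an override still would not order an
earlier LL against the READ_ONCE() load.

#include <asm/barrier.h>

/* Simplified, Alpha-style override: volatile load followed by a full barrier. */
#define __READ_ONCE(x)                                                  \
({                                                                      \
        __unqual_scalar_typeof(x) __x =                                 \
                (*(volatile typeof(__x) *)&(x));                        \
        mb();   /* order this load against later memory accesses */    \
        (typeof(x))__x;                                                 \
})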
On Wed, Aug 2, 2023 at 7:17 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Tue, Aug 1, 2023 at 12:37 PM WANG Rui <wangrui@loongson.cn> wrote:
> >
> > On Tue, Aug 1, 2023 at 6:50 PM Guo Ren <guoren@kernel.org> wrote:
> > >
> > > On Tue, Aug 1, 2023 at 5:32 PM WANG Rui <wangrui@loongson.cn> wrote:
> > > > No. LL and LL won't reorder because LL implies a memory barrier(though
> > > > not acquire semantics).
> > > That means we could remove __WEAK_LLSC_MB totally, right?
> >
> > More precisely, __WEAK_LLSC_MB is intended to prevent reordering
> > between LL and normal LD used to fetch the expected value for cmpxchg.
> Oh, that's unnecessary when cmpxchg fails.
>
> Maybe you treat cmpxchg as a CoRR antidote in coincidence. Please
> solve the CoRR problem by READ_ONCE.
>
> See alpha architecture.

Unfortunately, the LL instruction has no acquire semantics. Even if
our kernel team improves READ_ONCE, it cannot prevent reordering
between LL and READ_ONCE after cmpxchg fails.

LL (<memory-barrier> + <load-exclusive>);
WEAK_LLSC_MB;
READ_ONCE (<normal-load>);
...

vs

LL (<memory-barrier> + <load-exclusive>);
READ_ONCE (<normal-load> + <memory-barrier>);
...

Improving READ_ONCE is really important.

Regards,
--
WANG Rui
On Tue, 2023-08-01 at 18:49 +0800, Guo Ren wrote:
> > On Tue, Aug 1, 2023 at 5:05 PM Guo Ren <guoren@kernel.org> wrote:
> > >
> > > > The CoRR problem would cause wider problems than this.For this case,
> > > > do you mean your LL -> LL would be reordered?
> > > >
> > > > CPU 0
> > > > CPU1
> > > > LL(2) (set ex-monitor)
> > > >
> > > > store (break the ex-monitor)
> > > > LL(1) (reordered instruction set ex-monitor
> > > > SC(3) (successes ?)
> > > Sorry for the mail client reformat, I mean:
> > >
> > > CPU0 LL(2) (set ex-monitor)
> > > CPU1 STORE (break the ex-monitor)
> > > CPU0 LL(1) (reordered instruction set ex-monitor
> > > CPU0 SC(3) (success?)
> >
> > No. LL and LL won't reorder because LL implies a memory barrier(though
> > not acquire semantics).
> That means we could remove __WEAK_LLSC_MB totally, right?

As I've said, to implement CAS on LA464 this barrier is *really* needed.
I initially didn't believe it then I spent one night (from 11 PM to 04
AM) debugging GCC libgomp test failures.

On LA664 (3A6000) things may change though.

--
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
On Tue, 2023-08-01 at 19:23 +0800, Xi Ruoyao wrote:
> On Tue, 2023-08-01 at 18:49 +0800, Guo Ren wrote:
> > > On Tue, Aug 1, 2023 at 5:05 PM Guo Ren <guoren@kernel.org> wrote:
> > > >
> > > > > The CoRR problem would cause wider problems than this.For this case,
> > > > > do you mean your LL -> LL would be reordered?
> > > > >
> > > > > CPU 0
> > > > > CPU1
> > > > > LL(2) (set ex-monitor)
> > > > >
> > > > > store (break the ex-monitor)
> > > > > LL(1) (reordered instruction set ex-monitor
> > > > > SC(3) (successes ?)
> > > > Sorry for the mail client reformat, I mean:
> > > >
> > > > CPU0 LL(2) (set ex-monitor)
> > > > CPU1 STORE (break the ex-monitor)
> > > > CPU0 LL(1) (reordered instruction set ex-monitor
> > > > CPU0 SC(3) (success?)
> > >
> > > No. LL and LL won't reorder because LL implies a memory barrier(though
> > > not acquire semantics).
> > That means we could remove __WEAK_LLSC_MB totally, right?
>
> As I've said, to implement CAS on LA464 this barrier is *really* needed.
> I initially didn't believe it then I spent one night (from 11 PM to 04
> AM) debugging GCC libgomp test failures.
>
> On LA664 (3A6000) things may change though.

And I consider this a hardware bug, so it's out of the general concept
of general memory orders.  We are basically just applying a workaround
provided by uarch designers.

--
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
On Tue, Aug 01, 2023 at 10:29:31AM +0800, WANG Rui wrote:
> On Tue, Aug 1, 2023 at 9:16 AM <guoren@kernel.org> wrote:
> > diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> > index 979fde61bba8..6a05b92814b6 100644
> > --- a/arch/loongarch/include/asm/cmpxchg.h
> > +++ b/arch/loongarch/include/asm/cmpxchg.h
> > @@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> > " move $t0, %z4 \n" \
> > " " st " $t0, %1 \n" \
> > " beqz $t0, 1b \n" \
> > - "2: \n" \
> > __WEAK_LLSC_MB \
> > + "2: \n" \
>
> Thanks for the patch.
>
> This would look pretty good if it weren't for the special memory
> barrier semantics of the LoongArch's LL and SC instructions.
>
> The LL/SC memory barrier behavior of LoongArch:
>
> * LL: <memory-barrier> + <load-exclusive>
> * SC: <store-conditional> + <memory-barrier>
>
> and the LoongArch's weak memory model allows load/load reorder for the
> same address.

Hmm, somehow this one passed me by, but I think that puts you in the
naughty corner with Itanium. It probably also means your READ_ONCE() is
broken, unless the compiler emits barriers for volatile reads (like
ia64)?

Will
Hello,

On Tue, Aug 1, 2023 at 4:32 PM Will Deacon <will@kernel.org> wrote:
>
> Hmm, somehow this one passed me by, but I think that puts you in the naughty
> corner with Itanium. It probably also means your READ_ONCE() is broken,
> unless the compiler emits barriers for volatile reads (like ia64)?

Hmm, I agree with your perspective. Allowing out-of-order loads for
the same address in the memory model provides certain performance
benefits, but it also poses challenges to software. Fortunately,
hardware supports software to disable this feature when needed.

Regards,
--
WANG Rui