This patch adds support for the Zalasr ISA extension, which supplies the
real load-acquire/store-release instructions. The specification can be
found here:

https://github.com/riscv/riscv-zalasr/blob/main/chapter2.adoc

This patch series has been tested with LTP on QEMU with Brendan's Zalasr
support patch [1].

Some false-positive spacing errors were reported during patch checking,
so I have CCed the maintainers of checkpatch.pl as well.

[1] https://lore.kernel.org/all/CAGPSXwJEdtqW=nx71oufZp64nK6tK=0rytVEcz4F-gfvCOXk2w@mail.gmail.com/

v2:
- Adjust the order of Zalasr and Zalrsc in the dt-bindings. Thanks to Conor.

Xu Lu (4):
  riscv: add ISA extension parsing for Zalasr
  dt-bindings: riscv: Add Zalasr ISA extension description
  riscv: Introduce Zalasr instructions
  riscv: Use Zalasr for smp_load_acquire/smp_store_release

 .../devicetree/bindings/riscv/extensions.yaml |  5 ++
 arch/riscv/include/asm/barrier.h              | 79 ++++++++++++++++---
 arch/riscv/include/asm/hwcap.h                |  1 +
 arch/riscv/include/asm/insn-def.h             | 79 +++++++++++++++++++
 arch/riscv/kernel/cpufeature.c                |  1 +
 5 files changed, 154 insertions(+), 11 deletions(-)

-- 
2.20.1
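For readers new to the extension, the instructions involved are, in
mnemonic summary (per the linked chapter; the operand syntax here is an
assumption for illustration, see the spec for the exact forms and
encodings):

	l{b|h|w|d}.aq   rd, (rs1)     # load-acquire  (RCsc)
	s{b|h|w|d}.rl   rs2, (rs1)    # store-release (RCsc)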
> Xu Lu (4):
>   riscv: add ISA extension parsing for Zalasr
>   dt-bindings: riscv: Add Zalasr ISA extension description
>   riscv: Introduce Zalasr instructions
>   riscv: Use Zalasr for smp_load_acquire/smp_store_release

Informally put, our (Linux) memory consistency model specifies that any
sequence

	spin_unlock(s);
	spin_lock(t);

behaves "as if it provides at least FENCE.TSO ordering between operations
which precede the UNLOCK+LOCK sequence and operations which follow the
sequence".  Unless I am missing something, the patch set in question
breaks this ordering property (on RISC-V): for example, a "release"
annotation, .RL (as in spin_unlock() -> smp_store_release(), after patch
#4), paired with an "acquire" fence, FENCE R,RW (as can be found in
spin_lock() -> atomic_try_cmpxchg_acquire()), does not provide the
specified property.

I _think_ some solutions to the issue above include:

  a) make sure an .RL annotation is always paired with an .AQ annotation,
     and vice versa an .AQ annotation is paired with an .RL annotation
     (this approach matches the current arm64 approach/implementation);

  b) in the opposite direction, always pair FENCE R,RW (or occasionally
     FENCE R,R) with FENCE RW,W (this matches the current approach/the
     current implementation within riscv);

  c) mix the previous two solutions (resp., annotations and fences), but
     make sure to "upgrade" any release to provide (insert) a FENCE.TSO.

(a) would align RISC-V and arm64 (which is a good thing IMO), though it
is probably the most invasive of the three approaches above (requiring
certain changes to arch/riscv/include/asm/{cmpxchg,atomic}.h, which are
already relatively messy due to the various ZABHA and ZACAS switches).
Overall, I'm not too excited at the idea of reviewing any of those
changes, but if the community opts for it, I'll almost definitely take
a closer look with due calm.  ;-)

  Andrea
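For concreteness, the guarantee in question is the one checked by LKMM
litmus tests of the following shape (a sketch in tools/memory-model
litmus style, not taken from this thread; herd7 with the kernel memory
model forbids the final state):

	C unlock-lock-write-ordering

	{}

	P0(int *x, int *y, spinlock_t *s, spinlock_t *t)
	{
		spin_lock(s);
		WRITE_ONCE(*x, 1);
		spin_unlock(s);
		spin_lock(t);
		WRITE_ONCE(*y, 1);
		spin_unlock(t);
	}

	P1(int *x, int *y)
	{
		int r0;
		int r1;

		r0 = READ_ONCE(*y);
		smp_rmb();
		r1 = READ_ONCE(*x);
	}

	exists (1:r0=1 /\ 1:r1=0)

With the patch set applied, P0's unlock+lock compiles to roughly
sd.rl; lr/sc; fence r,rw: the .RL orders the write to x only before the
unlock store itself, and FENCE R,RW does not order prior writes at all,
so nothing orders the two WRITE_ONCE()s and the forbidden outcome becomes
architecturally observable.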
On Tue, Sep 02, 2025 at 06:59:15PM +0200, Andrea Parri wrote:
> > Xu Lu (4):
> >   riscv: add ISA extension parsing for Zalasr
> >   dt-bindings: riscv: Add Zalasr ISA extension description
> >   riscv: Introduce Zalasr instructions
> >   riscv: Use Zalasr for smp_load_acquire/smp_store_release
>
> Informally put, our (Linux) memory consistency model specifies that any
> sequence
>
> 	spin_unlock(s);
> 	spin_lock(t);
>
> behaves "as if it provides at least FENCE.TSO ordering between operations
> which precede the UNLOCK+LOCK sequence and operations which follow the
> sequence".  Unless I am missing something, the patch set in question
> breaks this ordering property (on RISC-V): for example, a "release"
> annotation, .RL (as in spin_unlock() -> smp_store_release(), after patch
> #4), paired with an "acquire" fence, FENCE R,RW (as can be found in
> spin_lock() -> atomic_try_cmpxchg_acquire()), does not provide the
> specified property.
>
> I _think_ some solutions to the issue above include:
>
>   a) make sure an .RL annotation is always paired with an .AQ annotation,
>      and vice versa an .AQ annotation is paired with an .RL annotation
>      (this approach matches the current arm64 approach/implementation);
>
>   b) in the opposite direction, always pair FENCE R,RW (or occasionally
>      FENCE R,R) with FENCE RW,W (this matches the current approach/the
>      current implementation within riscv);
>
>   c) mix the previous two solutions (resp., annotations and fences), but
>      make sure to "upgrade" any release to provide (insert) a FENCE.TSO.

I prefer option c) at first; it requires fewer modifications and has less
impact.

	asm volatile(ALTERNATIVE("fence rw, w;\t\nsb %0, 0(%1)\t\n",	\
-		     SB_RL(%0, %1) "\t\nnop\t\n",			\
+		     SB_RL(%0, %1) "\t\nfence.tso;\t\n",		\
		     0, RISCV_ISA_EXT_ZALASR, 1)			\
		     : : "r" (v), "r" (p) : "memory");			\

I don't object to option a), and I think it could be done in the future;
we can adopt the Zalasr extension step by step.

> (a) would align RISC-V and arm64 (which is a good thing IMO), though it
> is probably the most invasive of the three approaches above (requiring
> certain changes to arch/riscv/include/asm/{cmpxchg,atomic}.h, which are
> already relatively messy due to the various ZABHA and ZACAS switches).
> Overall, I'm not too excited at the idea of reviewing any of those
> changes, but if the community opts for it, I'll almost definitely take
> a closer look with due calm.  ;-)
>
>   Andrea
On Wed, Sep 17, 2025 at 12:01:34AM -0400, Guo Ren wrote:
> On Tue, Sep 02, 2025 at 06:59:15PM +0200, Andrea Parri wrote:
> > > Xu Lu (4):
> > >   riscv: add ISA extension parsing for Zalasr
> > >   dt-bindings: riscv: Add Zalasr ISA extension description
> > >   riscv: Introduce Zalasr instructions
> > >   riscv: Use Zalasr for smp_load_acquire/smp_store_release
> >
> > Informally put, our (Linux) memory consistency model specifies that any
> > sequence
> >
> > 	spin_unlock(s);
> > 	spin_lock(t);
> >
> > behaves "as if it provides at least FENCE.TSO ordering between operations
> > which precede the UNLOCK+LOCK sequence and operations which follow the
> > sequence".  Unless I am missing something, the patch set in question
> > breaks this ordering property (on RISC-V): for example, a "release"
> > annotation, .RL (as in spin_unlock() -> smp_store_release(), after patch
> > #4), paired with an "acquire" fence, FENCE R,RW (as can be found in
> > spin_lock() -> atomic_try_cmpxchg_acquire()), does not provide the
> > specified property.
> >
> > I _think_ some solutions to the issue above include:
> >
> >   a) make sure an .RL annotation is always paired with an .AQ annotation,
> >      and vice versa an .AQ annotation is paired with an .RL annotation
> >      (this approach matches the current arm64 approach/implementation);
> >
> >   b) in the opposite direction, always pair FENCE R,RW (or occasionally
> >      FENCE R,R) with FENCE RW,W (this matches the current approach/the
> >      current implementation within riscv);
> >
> >   c) mix the previous two solutions (resp., annotations and fences), but
> >      make sure to "upgrade" any release to provide (insert) a FENCE.TSO.
> I prefer option c) at first; it requires fewer modifications and has less
> impact.

Another reason is that a store-release followed by a load-acquire is
ordered like a FENCE rw, rw according to RVWMO's PPO rule 7 (both
instructions carry RCsc annotations), which is even stricter than the
FENCE.TSO required by the Linux guarantee you've mentioned.

> 	asm volatile(ALTERNATIVE("fence rw, w;\t\nsb %0, 0(%1)\t\n",	\
> -		     SB_RL(%0, %1) "\t\nnop\t\n",			\
> +		     SB_RL(%0, %1) "\t\nfence.tso;\t\n",		\
> 		     0, RISCV_ISA_EXT_ZALASR, 1)			\
> 		     : : "r" (v), "r" (p) : "memory");			\
>
> I don't object to option a), and I think it could be done in the future;
> we can adopt the Zalasr extension step by step.
>
> > (a) would align RISC-V and arm64 (which is a good thing IMO), though it
> > is probably the most invasive of the three approaches above (requiring
> > certain changes to arch/riscv/include/asm/{cmpxchg,atomic}.h, which are
> > already relatively messy due to the various ZABHA and ZACAS switches).
> > Overall, I'm not too excited at the idea of reviewing any of those
> > changes, but if the community opts for it, I'll almost definitely take
> > a closer look with due calm.  ;-)
> >
> >   Andrea
[merging replies]

> > I prefer option c) at first; it requires fewer modifications and has
> > less impact.
>
> Another reason is that a store-release followed by a load-acquire is
> ordered like a FENCE rw, rw according to RVWMO's PPO rule 7, which is
> even stricter than the FENCE.TSO required by the Linux guarantee you've
> mentioned.

I mean, if "fewer modifications" and "not-a-full-fence" were the only
arguments, we would probably just stick with the current scheme (b),
right?  What other arguments are available?

Don't get me wrong: no a priori objection from my end; I was really just
wondering about the various interests/rationales in the RISC-V kernel
community.  (It may surprise you, but some communities did consider
"UNLOCK+LOCK is not a full memory barrier" to be a disadvantage, because,
briefly, "locking should provide strong ordering guarantees and be easy
to reason about"; in fact, not just "locking" if we consider x86 or
arm64...)

> > 	asm volatile(ALTERNATIVE("fence rw, w;\t\nsb %0, 0(%1)\t\n",	\
> > -		     SB_RL(%0, %1) "\t\nnop\t\n",			\
> > +		     SB_RL(%0, %1) "\t\nfence.tso;\t\n",		\
> > 		     0, RISCV_ISA_EXT_ZALASR, 1)			\
> > 		     : : "r" (v), "r" (p) : "memory");			\

nit: Why place the fence after the store?  I imagine that FENCE.TSO could
precede the store; this way, the store would actually not need to have
that .RL annotation.

More importantly, that was for (part of) smp_store_release().  Let me
stress that my option (c) was meant to include similar changes for
_every_ release (that is, cmpxchg_release(), atomic_inc_return_release(),
and many others), even if most of such releases do not currently create
"problematic pairs" with a corresponding acquire: the concern is that
future changes in the RISC-V atomics implementation or in generic locking
code will introduce pairs of the form FENCE RW,W + .AQ or .RL + FENCE R,RW
without anyone noticing...

In other words, I was suggesting that RISC-V _continues_ to meet the
ordering property under discussion "by design" rather than "by Andrea or
whoever's code auditing/review" (assuming that's feasible, i.e. that it
doesn't clash with other people's arguments?); options (a) and (b) were
also "designed" following this same criterion.

  Andrea
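To make the placement point concrete, here is a sketch of the two
variants for the byte-sized store (illustrative only: it is not the
posted code, and it elides the ALTERNATIVE plumbing):

	# Option (c) as diffed above: keep the .rl annotation and
	# restore R->RW and W->W ordering toward later instructions
	# with a trailing fence.
	sb.rl	a0, (a1)
	fence.tso

	# Fence-before-store variant: FENCE.TSO already orders prior
	# loads (R->W) and prior stores (W->W) before the sb, so the
	# plain store has release semantics and needs no annotation.
	fence.tso
	sb	a0, (a1)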
Hi Guo Ren,

Thanks for your advice.

On Fri, Sep 19, 2025 at 12:48 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Wed, Sep 17, 2025 at 12:01:34AM -0400, Guo Ren wrote:
> > On Tue, Sep 02, 2025 at 06:59:15PM +0200, Andrea Parri wrote:
> > > > Xu Lu (4):
> > > >   riscv: add ISA extension parsing for Zalasr
> > > >   dt-bindings: riscv: Add Zalasr ISA extension description
> > > >   riscv: Introduce Zalasr instructions
> > > >   riscv: Use Zalasr for smp_load_acquire/smp_store_release
> > >
> > > Informally put, our (Linux) memory consistency model specifies that any
> > > sequence
> > >
> > > 	spin_unlock(s);
> > > 	spin_lock(t);
> > >
> > > behaves "as if it provides at least FENCE.TSO ordering between operations
> > > which precede the UNLOCK+LOCK sequence and operations which follow the
> > > sequence".  Unless I am missing something, the patch set in question
> > > breaks this ordering property (on RISC-V): for example, a "release"
> > > annotation, .RL (as in spin_unlock() -> smp_store_release(), after patch
> > > #4), paired with an "acquire" fence, FENCE R,RW (as can be found in
> > > spin_lock() -> atomic_try_cmpxchg_acquire()), does not provide the
> > > specified property.
> > >
> > > I _think_ some solutions to the issue above include:
> > >
> > >   a) make sure an .RL annotation is always paired with an .AQ annotation,
> > >      and vice versa an .AQ annotation is paired with an .RL annotation
> > >      (this approach matches the current arm64 approach/implementation);
> > >
> > >   b) in the opposite direction, always pair FENCE R,RW (or occasionally
> > >      FENCE R,R) with FENCE RW,W (this matches the current approach/the
> > >      current implementation within riscv);
> > >
> > >   c) mix the previous two solutions (resp., annotations and fences), but
> > >      make sure to "upgrade" any release to provide (insert) a FENCE.TSO.
> > I prefer option c) at first; it requires fewer modifications and has
> > less impact.
> Another reason is that a store-release followed by a load-acquire is
> ordered like a FENCE rw, rw according to RVWMO's PPO rule 7, which is
> even stricter than the FENCE.TSO required by the Linux guarantee you've
> mentioned.

The existing implementation of spin_unlock, when followed by spin_lock,
is equivalent to a 'FENCE rw, rw' between operations before the
spin_unlock and operations after the spin_lock:

spin_unlock:
	fence rw, w
	sd

spin_lock:
	amocas
	fence r, rw

The store-release semantics in spin_unlock are used to ensure that when
other cores observe the store, they must also observe the operations
before the store, which is a more common case than calling spin_unlock
immediately followed by spin_lock on the same core. And the existing
implementation, 'fence rw, w' / 'fence r, rw', is stricter than
'.rl' / '.aq'. That is why we want to modify it.

I have reimplemented the code and it now looks like the attached text
file. I will send the patch out later.

Best Regards,

Xu Lu

> >
> > 	asm volatile(ALTERNATIVE("fence rw, w;\t\nsb %0, 0(%1)\t\n",	\
> > -		     SB_RL(%0, %1) "\t\nnop\t\n",			\
> > +		     SB_RL(%0, %1) "\t\nfence.tso;\t\n",		\
> > 		     0, RISCV_ISA_EXT_ZALASR, 1)			\
> > 		     : : "r" (v), "r" (p) : "memory");			\
> >
> > I don't object to option a), and I think it could be done in the future;
> > we can adopt the Zalasr extension step by step.
> >
> > > (a) would align RISC-V and arm64 (which is a good thing IMO), though it
> > > is probably the most invasive of the three approaches above (requiring
> > > certain changes to arch/riscv/include/asm/{cmpxchg,atomic}.h, which are
> > > already relatively messy due to the various ZABHA and ZACAS switches).
> > > Overall, I'm not too excited at the idea of reviewing any of those
> > > changes, but if the community opts for it, I'll almost definitely take
> > > a closer look with due calm.  ;-)
> > >
> > >   Andrea

|----------------------------------------------------------------------------------------------|
|    | arch_xchg_release          | arch_cmpxchg_release         | __smp_store_release         |
|    |------------------------------------------------------------------------------------------|
|    | zabha      | !zabha        | zabha+zacas | !(zabha+zacas) | zalasr        | !zalasr     |
| rl |------------------------------------------------------------------------------------------|
|    |            | (fence rw, w) |             | (fence rw, w)  |               | fence rw, w |
|    | amoswap.rl | lr.w          | amocas.rl   | lr.w           | s{b|h|w|d}.rl | s{b|h|w|d}  |
|    |            | sc.w.rl       |             | sc.w.rl        |               |             |
|----------------------------------------------------------------------------------------------|
|    | arch_xchg_acquire          | arch_cmpxchg_acquire         | __smp_load_acquire          |
|    |------------------------------------------------------------------------------------------|
|    | zabha      | !zabha        | zabha+zacas | !(zabha+zacas) | zalasr        | !zalasr     |
| aq |------------------------------------------------------------------------------------------|
|    |            | lr.w.aq       |             | lr.w.aq        |               | l{b|h|w|d}  |
|    | amoswap.aq | sc.w          | amocas.aq   | sc.w           | l{b|h|w|d}.aq | fence r, rw |
|    |            | (fence r, rw) |             | (fence r, rw)  |               |             |
|----------------------------------------------------------------------------------------------|

(fence rw, w) and (fence r, rw) here mean that these fence instructions
are only inserted when Zalasr is not implemented.
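Read together with the table, the zalasr/!zalasr column for loads would
presumably take an ALTERNATIVE of the following shape (a sketch only,
patterned after the SB_RL snippet quoted earlier; it assumes an LD_AQ
insn-def macro and handles only the doubleword case, neither of which is
taken from the posted series):

	/* Sketch: acquire load via ALTERNATIVE. The !zalasr path is a
	 * plain load followed by an acquire fence; the zalasr path
	 * patches in ld.aq plus a nop to keep both sides equal-sized. */
	#define __smp_load_acquire(p)					\
	({								\
		typeof(*(p)) ___p1;					\
		asm volatile(ALTERNATIVE(				\
			"ld %0, 0(%1)\t\nfence r, rw\t\n",		\
			LD_AQ(%0, %1) "\t\nnop\t\n",			\
			0, RISCV_ISA_EXT_ZALASR, 1)			\
			: "=r" (___p1) : "r" (p) : "memory");		\
		___p1;							\
	})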
> The existing implementation of spin_unlock, when followed by
> spin_lock, is equivalent to a 'FENCE rw, rw' between operations before

This is not true without Zacas, that is, when using LR/SC: write-to-read
remains unordered in that case.

  Andrea
Hi Andrea,

On Fri, Sep 19, 2025 at 3:45 PM Andrea Parri <parri.andrea@gmail.com> wrote:
>
> > The existing implementation of spin_unlock, when followed by
> > spin_lock, is equivalent to a 'FENCE rw, rw' between operations before
>
> This is not true without Zacas, that is, when using LR/SC: write-to-read
> remains unordered in that case.

Yes. Thanks for your corrections. The LR/SC here, when Zacas or Zabha is
not implemented, behaves like:

	fence rw, w
	sd

	lr.w
	sc.w
	fence r, rw

The 'fence rw, w' ensures the ordering between operations before itself
and the 'sc.w'. The 'fence r, rw' ensures the ordering between the 'lr.w'
and operations after itself. The operations between 'lr.w' and 'sc.w' are
as few as possible and cannot contain loads or stores, as is said in
section 13.3 of the RISC-V unprivileged spec:

"The dynamic code executed between the LR and SC instructions can only
contain instructions from the base ''I'' instruction set, excluding
loads, stores, backward jumps, taken backward branches, JALR, FENCE, and
SYSTEM instructions."

So in summary, though it does not provide 'fence rw, rw' semantics, there
is limited space available for the CPU to reorder. And, still, I think
calling spin_unlock immediately followed by spin_lock on the same core is
much rarer than calling spin_unlock on one core and spin_lock on another
core.

Best regards,

Xu Lu

>
>   Andrea
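Spelling out Andrea's point against that sequence (register names and
addresses here are purely illustrative):

	sw    t0, 0(a0)         # W0: store before the unlock
	fence rw, w             # orders prior R/W before later *stores*
	sd    zero, 0(a1)       # unlock store
	lr.w  t1, 0(a2)         # lock: load part
	sc.w  t2, t3, 0(a2)     # lock: store part (after lr.w, same address)
	fence r, rw             # orders prior *reads* before later R/W
	lw    t4, 0(a3)         # R1: load after the lock

The first fence orders W0 before the sd and the sc.w; the second orders
the lr.w before R1; but no chain connects W0 to R1, so write-to-read
stays unordered. With Zacas, the lock's amocas is a single atomic
read+write operation, so W0 ordered before it (as a write) and it ordered
before R1 (as a read) close exactly this gap.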
On Wed, Sep 17, 2025 at 12:01:34AM -0400, Guo Ren wrote:
> On Tue, Sep 02, 2025 at 06:59:15PM +0200, Andrea Parri wrote:
> > > Xu Lu (4):
> > >   riscv: add ISA extension parsing for Zalasr
> > >   dt-bindings: riscv: Add Zalasr ISA extension description
> > >   riscv: Introduce Zalasr instructions
> > >   riscv: Use Zalasr for smp_load_acquire/smp_store_release
> >
> > Informally put, our (Linux) memory consistency model specifies that any
> > sequence
> >
> > 	spin_unlock(s);
> > 	spin_lock(t);
> >
> > behaves "as if it provides at least FENCE.TSO ordering between operations
> > which precede the UNLOCK+LOCK sequence and operations which follow the
> > sequence".  Unless I am missing something, the patch set in question
> > breaks this ordering property (on RISC-V): for example, a "release"
> > annotation, .RL (as in spin_unlock() -> smp_store_release(), after patch
> > #4), paired with an "acquire" fence, FENCE R,RW (as can be found in
> > spin_lock() -> atomic_try_cmpxchg_acquire()), does not provide the
> > specified property.
> >
> > I _think_ some solutions to the issue above include:
> >
> >   a) make sure an .RL annotation is always paired with an .AQ annotation,
> >      and vice versa an .AQ annotation is paired with an .RL annotation
> >      (this approach matches the current arm64 approach/implementation);
> >
> >   b) in the opposite direction, always pair FENCE R,RW (or occasionally
> >      FENCE R,R) with FENCE RW,W (this matches the current approach/the
> >      current implementation within riscv);
> >
> >   c) mix the previous two solutions (resp., annotations and fences), but
> >      make sure to "upgrade" any release to provide (insert) a FENCE.TSO.
> I prefer option c) at first; it requires fewer modifications and has less
> impact.
>
> 	asm volatile(ALTERNATIVE("fence rw, w;\t\nsb %0, 0(%1)\t\n",	\
> -		     SB_RL(%0, %1) "\t\nnop\t\n",			\
> +		     SB_RL(%0, %1) "\t\nfence.tso;\t\n",		\

		     "fence rw, rw;\t\nsb %0, 0(%1)\t\n",		\

How about strengthening it to fence rw, rw? It's a bit stronger than
.tso.

> 		     0, RISCV_ISA_EXT_ZALASR, 1)			\
> 		     : : "r" (v), "r" (p) : "memory");			\
>
> I don't object to option a), and I think it could be done in the future;
> we can adopt the Zalasr extension step by step.
>
> > (a) would align RISC-V and arm64 (which is a good thing IMO), though it
> > is probably the most invasive of the three approaches above (requiring
> > certain changes to arch/riscv/include/asm/{cmpxchg,atomic}.h, which are
> > already relatively messy due to the various ZABHA and ZACAS switches).
> > Overall, I'm not too excited at the idea of reviewing any of those
> > changes, but if the community opts for it, I'll almost definitely take
> > a closer look with due calm.  ;-)
> >
> >   Andrea
Hi Andrea,

Great catch! Thanks a lot for your review.

The problem comes from the mixed use of acquire/release semantics via
fences and via real ld.aq/sd.rl. I would prefer your method (a). The
existing atomic acquire/release functions' implementations can be further
modified to use amocas.aq/amocas.rl/lr.aq/sc.rl. I will send the next
version after I finish it and hope you can help with review then.

Best regards,

Xu Lu

On Wed, Sep 3, 2025 at 12:59 AM Andrea Parri <parri.andrea@gmail.com> wrote:
>
> > Xu Lu (4):
> >   riscv: add ISA extension parsing for Zalasr
> >   dt-bindings: riscv: Add Zalasr ISA extension description
> >   riscv: Introduce Zalasr instructions
> >   riscv: Use Zalasr for smp_load_acquire/smp_store_release
>
> Informally put, our (Linux) memory consistency model specifies that any
> sequence
>
> 	spin_unlock(s);
> 	spin_lock(t);
>
> behaves "as if it provides at least FENCE.TSO ordering between operations
> which precede the UNLOCK+LOCK sequence and operations which follow the
> sequence".  Unless I am missing something, the patch set in question
> breaks this ordering property (on RISC-V): for example, a "release"
> annotation, .RL (as in spin_unlock() -> smp_store_release(), after patch
> #4), paired with an "acquire" fence, FENCE R,RW (as can be found in
> spin_lock() -> atomic_try_cmpxchg_acquire()), does not provide the
> specified property.
>
> I _think_ some solutions to the issue above include:
>
>   a) make sure an .RL annotation is always paired with an .AQ annotation,
>      and vice versa an .AQ annotation is paired with an .RL annotation
>      (this approach matches the current arm64 approach/implementation);
>
>   b) in the opposite direction, always pair FENCE R,RW (or occasionally
>      FENCE R,R) with FENCE RW,W (this matches the current approach/the
>      current implementation within riscv);
>
>   c) mix the previous two solutions (resp., annotations and fences), but
>      make sure to "upgrade" any release to provide (insert) a FENCE.TSO.
>
> (a) would align RISC-V and arm64 (which is a good thing IMO), though it
> is probably the most invasive of the three approaches above (requiring
> certain changes to arch/riscv/include/asm/{cmpxchg,atomic}.h, which are
> already relatively messy due to the various ZABHA and ZACAS switches).
> Overall, I'm not too excited at the idea of reviewing any of those
> changes, but if the community opts for it, I'll almost definitely take
> a closer look with due calm.  ;-)
>
>   Andrea
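For reference, the annotated LR/SC shape that option (a) implies for the
acquire case might look like the following sketch (operand names are
illustrative, not the code being prepared): the .aq on the lr.w replaces
the trailing 'fence r, rw' of the current fence-based variant, and a .rl
on the sc.w would similarly replace the leading 'fence rw, w' in the
release case.

	/* Sketch: acquire cmpxchg via annotated LR/SC (option (a)). */
	__asm__ __volatile__ (
		"0:	lr.w.aq %0, %2\n"
		"	bne	%0, %z3, 1f\n"
		"	sc.w	%1, %z4, %2\n"
		"	bnez	%1, 0b\n"
		"1:\n"
		: "=&r" (ret), "=&r" (rc), "+A" (*ptr)
		: "rJ" (old), "rJ" (newv)
		: "memory");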