This patch adds support for the Zalasr ISA extension, which supplies the
real load-acquire/store-release instructions. The specification can be
found here:

https://github.com/riscv/riscv-zalasr/blob/main/chapter2.adoc

This patch series has been tested with LTP on QEMU with Brendan's Zalasr
support patch [1].

Some false-positive spacing errors were reported during patch checking,
so I have CCed the maintainers of checkpatch.pl as well.

[1] https://lore.kernel.org/all/CAGPSXwJEdtqW=nx71oufZp64nK6tK=0rytVEcz4F-gfvCOXk2w@mail.gmail.com/

v2:
- Adjust the order of Zalasr and Zalrsc in the dt-bindings. Thanks to Conor.

Xu Lu (4):
  riscv: add ISA extension parsing for Zalasr
  dt-bindings: riscv: Add Zalasr ISA extension description
  riscv: Introduce Zalasr instructions
  riscv: Use Zalasr for smp_load_acquire/smp_store_release

 .../devicetree/bindings/riscv/extensions.yaml |  5 ++
 arch/riscv/include/asm/barrier.h              | 79 ++++++++++++++++---
 arch/riscv/include/asm/hwcap.h                |  1 +
 arch/riscv/include/asm/insn-def.h             | 79 +++++++++++++++++++
 arch/riscv/kernel/cpufeature.c                |  1 +
 5 files changed, 154 insertions(+), 11 deletions(-)

-- 
2.20.1
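For readers new to the extension, the instructions involved are, in
mnemonic summary (per the linked chapter; the operand syntax here is an
assumption for illustration, see the spec for the exact forms and
encodings):

	l{b|h|w|d}.aq   rd, (rs1)     # load-acquire  (RCsc)
	s{b|h|w|d}.rl   rs2, (rs1)    # store-release (RCsc)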
> Xu Lu (4):
>   riscv: add ISA extension parsing for Zalasr
>   dt-bindings: riscv: Add Zalasr ISA extension description
>   riscv: Introduce Zalasr instructions
>   riscv: Use Zalasr for smp_load_acquire/smp_store_release

Informally put, our (Linux) memory consistency model specifies that any
sequence

	spin_unlock(s);
	spin_lock(t);

behaves "as if it provides at least FENCE.TSO ordering between operations
which precede the UNLOCK+LOCK sequence and operations which follow the
sequence".  Unless I am missing something, the patch set in question
breaks this ordering property (on RISC-V): for example, a "release"
annotation, .RL (as in spin_unlock() -> smp_store_release(), after patch
#4), paired with an "acquire" fence, FENCE R,RW (as can be found in
spin_lock() -> atomic_try_cmpxchg_acquire()), does not provide the
specified property.

I _think_ some solutions to the issue above include:

  a) make sure an .RL annotation is always paired with an .AQ annotation,
     and vice versa an .AQ annotation is paired with an .RL annotation
     (this approach matches the current arm64 approach/implementation);

  b) in the opposite direction, always pair FENCE R,RW (or occasionally
     FENCE R,R) with FENCE RW,W (this matches the current approach/the
     current implementation within riscv);

  c) mix the previous two solutions (resp., annotations and fences), but
     make sure to "upgrade" any release to provide (insert) a FENCE.TSO.

(a) would align RISC-V and arm64 (which is a good thing IMO), though it
is probably the most invasive of the three approaches above (requiring
certain changes to arch/riscv/include/asm/{cmpxchg,atomic}.h, which are
already relatively messy due to the various ZABHA and ZACAS switches).
Overall, I'm not too excited at the idea of reviewing any of those
changes, but if the community opts for it, I'll almost definitely take
a closer look with due calm.  ;-)

  Andrea
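For concreteness, the guarantee in question is the one checked by LKMM
litmus tests of the following shape (a sketch in tools/memory-model
litmus style, not taken from this thread; herd7 with the kernel memory
model forbids the final state):

	C unlock-lock-write-ordering

	{}

	P0(int *x, int *y, spinlock_t *s, spinlock_t *t)
	{
		spin_lock(s);
		WRITE_ONCE(*x, 1);
		spin_unlock(s);
		spin_lock(t);
		WRITE_ONCE(*y, 1);
		spin_unlock(t);
	}

	P1(int *x, int *y)
	{
		int r0;
		int r1;

		r0 = READ_ONCE(*y);
		smp_rmb();
		r1 = READ_ONCE(*x);
	}

	exists (1:r0=1 /\ 1:r1=0)

With the patch set applied, P0's unlock+lock compiles to roughly
sd.rl; lr/sc; fence r,rw: the .RL orders the write to x only before the
unlock store itself, and FENCE R,RW does not order prior writes at all,
so nothing orders the two WRITE_ONCE()s and the forbidden outcome becomes
architecturally observable.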
On Tue, Sep 02, 2025 at 06:59:15PM +0200, Andrea Parri wrote:
> > Xu Lu (4):
> >   riscv: add ISA extension parsing for Zalasr
> >   dt-bindings: riscv: Add Zalasr ISA extension description
> >   riscv: Introduce Zalasr instructions
> >   riscv: Use Zalasr for smp_load_acquire/smp_store_release
>
> Informally put, our (Linux) memory consistency model specifies that any
> sequence
>
> 	spin_unlock(s);
> 	spin_lock(t);
>
> behaves "as if it provides at least FENCE.TSO ordering between operations
> which precede the UNLOCK+LOCK sequence and operations which follow the
> sequence".  Unless I am missing something, the patch set in question
> breaks this ordering property (on RISC-V): for example, a "release"
> annotation, .RL (as in spin_unlock() -> smp_store_release(), after patch
> #4), paired with an "acquire" fence, FENCE R,RW (as can be found in
> spin_lock() -> atomic_try_cmpxchg_acquire()), does not provide the
> specified property.
>
> I _think_ some solutions to the issue above include:
>
>   a) make sure an .RL annotation is always paired with an .AQ annotation,
>      and vice versa an .AQ annotation is paired with an .RL annotation
>      (this approach matches the current arm64 approach/implementation);
>
>   b) in the opposite direction, always pair FENCE R,RW (or occasionally
>      FENCE R,R) with FENCE RW,W (this matches the current approach/the
>      current implementation within riscv);
>
>   c) mix the previous two solutions (resp., annotations and fences), but
>      make sure to "upgrade" any release to provide (insert) a FENCE.TSO.

I prefer option c) at first; it requires fewer modifications and has less
impact.

	asm volatile(ALTERNATIVE("fence rw, w;\t\nsb %0, 0(%1)\t\n",	\
-		     SB_RL(%0, %1) "\t\nnop\t\n",			\
+		     SB_RL(%0, %1) "\t\nfence.tso;\t\n",		\
		     0, RISCV_ISA_EXT_ZALASR, 1)			\
		     : : "r" (v), "r" (p) : "memory");			\

I don't object to option a), and I think it could be done in the future;
we can adopt the Zalasr extension step by step.

> (a) would align RISC-V and arm64 (which is a good thing IMO), though it
> is probably the most invasive of the three approaches above (requiring
> certain changes to arch/riscv/include/asm/{cmpxchg,atomic}.h, which are
> already relatively messy due to the various ZABHA and ZACAS switches).
> Overall, I'm not too excited at the idea of reviewing any of those
> changes, but if the community opts for it, I'll almost definitely take
> a closer look with due calm.  ;-)
>
>   Andrea
On Wed, Sep 17, 2025 at 12:01:34AM -0400, Guo Ren wrote:
> On Tue, Sep 02, 2025 at 06:59:15PM +0200, Andrea Parri wrote:
> > > Xu Lu (4):
> > >   riscv: add ISA extension parsing for Zalasr
> > >   dt-bindings: riscv: Add Zalasr ISA extension description
> > >   riscv: Introduce Zalasr instructions
> > >   riscv: Use Zalasr for smp_load_acquire/smp_store_release
> >
> > Informally put, our (Linux) memory consistency model specifies that any
> > sequence
> >
> > 	spin_unlock(s);
> > 	spin_lock(t);
> >
> > behaves "as if it provides at least FENCE.TSO ordering between operations
> > which precede the UNLOCK+LOCK sequence and operations which follow the
> > sequence".  Unless I am missing something, the patch set in question
> > breaks this ordering property (on RISC-V): for example, a "release"
> > annotation, .RL (as in spin_unlock() -> smp_store_release(), after patch
> > #4), paired with an "acquire" fence, FENCE R,RW (as can be found in
> > spin_lock() -> atomic_try_cmpxchg_acquire()), does not provide the
> > specified property.
> >
> > I _think_ some solutions to the issue above include:
> >
> >   a) make sure an .RL annotation is always paired with an .AQ annotation,
> >      and vice versa an .AQ annotation is paired with an .RL annotation
> >      (this approach matches the current arm64 approach/implementation);
> >
> >   b) in the opposite direction, always pair FENCE R,RW (or occasionally
> >      FENCE R,R) with FENCE RW,W (this matches the current approach/the
> >      current implementation within riscv);
> >
> >   c) mix the previous two solutions (resp., annotations and fences), but
> >      make sure to "upgrade" any release to provide (insert) a FENCE.TSO.
> I prefer option c) at first; it requires fewer modifications and has less
> impact.

Another reason is that a store-release followed by a load-acquire is
ordered like a FENCE rw, rw according to RVWMO's PPO rule 7 (both
instructions carry RCsc annotations), which is even stricter than the
FENCE.TSO required by the Linux guarantee you've mentioned.

> 	asm volatile(ALTERNATIVE("fence rw, w;\t\nsb %0, 0(%1)\t\n",	\
> -		     SB_RL(%0, %1) "\t\nnop\t\n",			\
> +		     SB_RL(%0, %1) "\t\nfence.tso;\t\n",		\
> 		     0, RISCV_ISA_EXT_ZALASR, 1)			\
> 		     : : "r" (v), "r" (p) : "memory");			\
>
> I don't object to option a), and I think it could be done in the future;
> we can adopt the Zalasr extension step by step.
>
> > (a) would align RISC-V and arm64 (which is a good thing IMO), though it
> > is probably the most invasive of the three approaches above (requiring
> > certain changes to arch/riscv/include/asm/{cmpxchg,atomic}.h, which are
> > already relatively messy due to the various ZABHA and ZACAS switches).
> > Overall, I'm not too excited at the idea of reviewing any of those
> > changes, but if the community opts for it, I'll almost definitely take
> > a closer look with due calm.  ;-)
> >
> >   Andrea
[merging replies]

> > I prefer option c) at first; it requires fewer modifications and has
> > less impact.
>
> Another reason is that a store-release followed by a load-acquire is
> ordered like a FENCE rw, rw according to RVWMO's PPO rule 7, which is
> even stricter than the FENCE.TSO required by the Linux guarantee you've
> mentioned.

I mean, if "fewer modifications" and "not-a-full-fence" were the only
arguments, we would probably just stick with the current scheme (b),
right?  What other arguments are available?

Don't get me wrong: no a priori objection from my end; I was really just
wondering about the various interests/rationales in the RISC-V kernel
community.  (It may surprise you, but some communities did consider
"UNLOCK+LOCK is not a full memory barrier" to be a disadvantage, because,
briefly, "locking should provide strong ordering guarantees and be easy
to reason about"; in fact, not just "locking" if we consider x86 or
arm64...)

> > 	asm volatile(ALTERNATIVE("fence rw, w;\t\nsb %0, 0(%1)\t\n",	\
> > -		     SB_RL(%0, %1) "\t\nnop\t\n",			\
> > +		     SB_RL(%0, %1) "\t\nfence.tso;\t\n",		\
> > 		     0, RISCV_ISA_EXT_ZALASR, 1)			\
> > 		     : : "r" (v), "r" (p) : "memory");			\

nit: Why place the fence after the store?  I imagine that FENCE.TSO could
precede the store; this way, the store would actually not need to have
that .RL annotation.

More importantly, that was for (part of) smp_store_release().  Let me
stress that my option (c) was meant to include similar changes for
_every_ release (that is, cmpxchg_release(), atomic_inc_return_release(),
and many others), even if most of such releases do not currently create
"problematic pairs" with a corresponding acquire: the concern is that
future changes in the RISC-V atomics implementation or in generic locking
code will introduce pairs of the form FENCE RW,W + .AQ or .RL + FENCE R,RW
without anyone noticing...

In other words, I was suggesting that RISC-V _continues_ to meet the
ordering property under discussion "by design" rather than "by Andrea or
whoever's code auditing/review" (assuming that's feasible, i.e. that it
doesn't clash with other people's arguments?); options (a) and (b) were
also "designed" following this same criterion.

  Andrea
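To make the placement point concrete, here is a sketch of the two
variants for the byte-sized store (illustrative only: it is not the
posted code, and it elides the ALTERNATIVE plumbing):

	# Option (c) as diffed above: keep the .rl annotation and
	# restore R->RW and W->W ordering toward later instructions
	# with a trailing fence.
	sb.rl	a0, (a1)
	fence.tso

	# Fence-before-store variant: FENCE.TSO already orders prior
	# loads (R->W) and prior stores (W->W) before the sb, so the
	# plain store has release semantics and needs no annotation.
	fence.tso
	sb	a0, (a1)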
Hi Guo Ren,

Thanks for your advice.

On Fri, Sep 19, 2025 at 12:48 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Wed, Sep 17, 2025 at 12:01:34AM -0400, Guo Ren wrote:
> > On Tue, Sep 02, 2025 at 06:59:15PM +0200, Andrea Parri wrote:
> > > > Xu Lu (4):
> > > >   riscv: add ISA extension parsing for Zalasr
> > > >   dt-bindings: riscv: Add Zalasr ISA extension description
> > > >   riscv: Introduce Zalasr instructions
> > > >   riscv: Use Zalasr for smp_load_acquire/smp_store_release
> > >
> > > Informally put, our (Linux) memory consistency model specifies that any
> > > sequence
> > >
> > > 	spin_unlock(s);
> > > 	spin_lock(t);
> > >
> > > behaves "as if it provides at least FENCE.TSO ordering between operations
> > > which precede the UNLOCK+LOCK sequence and operations which follow the
> > > sequence".  Unless I am missing something, the patch set in question
> > > breaks this ordering property (on RISC-V): for example, a "release"
> > > annotation, .RL (as in spin_unlock() -> smp_store_release(), after patch
> > > #4), paired with an "acquire" fence, FENCE R,RW (as can be found in
> > > spin_lock() -> atomic_try_cmpxchg_acquire()), does not provide the
> > > specified property.
> > >
> > > I _think_ some solutions to the issue above include:
> > >
> > >   a) make sure an .RL annotation is always paired with an .AQ annotation,
> > >      and vice versa an .AQ annotation is paired with an .RL annotation
> > >      (this approach matches the current arm64 approach/implementation);
> > >
> > >   b) in the opposite direction, always pair FENCE R,RW (or occasionally
> > >      FENCE R,R) with FENCE RW,W (this matches the current approach/the
> > >      current implementation within riscv);
> > >
> > >   c) mix the previous two solutions (resp., annotations and fences), but
> > >      make sure to "upgrade" any release to provide (insert) a FENCE.TSO.
> > I prefer option c) at first; it requires fewer modifications and has
> > less impact.
> Another reason is that a store-release followed by a load-acquire is
> ordered like a FENCE rw, rw according to RVWMO's PPO rule 7, which is
> even stricter than the FENCE.TSO required by the Linux guarantee you've
> mentioned.

The existing implementation of spin_unlock, when followed by spin_lock,
is equivalent to a 'FENCE rw, rw' between operations before the
spin_unlock and operations after the spin_lock:

spin_unlock:
	fence rw, w
	sd

spin_lock:
	amocas
	fence r, rw

The store-release semantics in spin_unlock are used to ensure that when
other cores observe the store, they must also observe the operations
before the store, which is a more common case than calling spin_unlock
immediately followed by spin_lock on the same core. And the existing
implementation, 'fence rw, w' / 'fence r, rw', is stricter than
'.rl' / '.aq'. That is why we want to modify it.

I have reimplemented the code and it now looks like the attached text
file. I will send the patch out later.

Best Regards,

Xu Lu

> >
> > 	asm volatile(ALTERNATIVE("fence rw, w;\t\nsb %0, 0(%1)\t\n",	\
> > -		     SB_RL(%0, %1) "\t\nnop\t\n",			\
> > +		     SB_RL(%0, %1) "\t\nfence.tso;\t\n",		\
> > 		     0, RISCV_ISA_EXT_ZALASR, 1)			\
> > 		     : : "r" (v), "r" (p) : "memory");			\
> >
> > I don't object to option a), and I think it could be done in the future;
> > we can adopt the Zalasr extension step by step.
> >
> > > (a) would align RISC-V and arm64 (which is a good thing IMO), though it
> > > is probably the most invasive of the three approaches above (requiring
> > > certain changes to arch/riscv/include/asm/{cmpxchg,atomic}.h, which are
> > > already relatively messy due to the various ZABHA and ZACAS switches).
> > > Overall, I'm not too excited at the idea of reviewing any of those
> > > changes, but if the community opts for it, I'll almost definitely take
> > > a closer look with due calm.  ;-)
> > >
> > >   Andrea

|----------------------------------------------------------------------------------------------|
|    | arch_xchg_release          | arch_cmpxchg_release         | __smp_store_release         |
|    |------------------------------------------------------------------------------------------|
|    | zabha      | !zabha        | zabha+zacas | !(zabha+zacas) | zalasr        | !zalasr     |
| rl |------------------------------------------------------------------------------------------|
|    |            | (fence rw, w) |             | (fence rw, w)  |               | fence rw, w |
|    | amoswap.rl | lr.w          | amocas.rl   | lr.w           | s{b|h|w|d}.rl | s{b|h|w|d}  |
|    |            | sc.w.rl       |             | sc.w.rl        |               |             |
|----------------------------------------------------------------------------------------------|
|    | arch_xchg_acquire          | arch_cmpxchg_acquire         | __smp_load_acquire          |
|    |------------------------------------------------------------------------------------------|
|    | zabha      | !zabha        | zabha+zacas | !(zabha+zacas) | zalasr        | !zalasr     |
| aq |------------------------------------------------------------------------------------------|
|    |            | lr.w.aq       |             | lr.w.aq        |               | l{b|h|w|d}  |
|    | amoswap.aq | sc.w          | amocas.aq   | sc.w           | l{b|h|w|d}.aq | fence r, rw |
|    |            | (fence r, rw) |             | (fence r, rw)  |               |             |
|----------------------------------------------------------------------------------------------|

(fence rw, w) and (fence r, rw) here mean that these fence instructions
are only inserted when Zalasr is not implemented.
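Read together with the table, the zalasr/!zalasr column for loads would
presumably take an ALTERNATIVE of the following shape (a sketch only,
patterned after the SB_RL snippet quoted earlier; it assumes an LD_AQ
insn-def macro and handles only the doubleword case, neither of which is
taken from the posted series):

	/* Sketch: acquire load via ALTERNATIVE. The !zalasr path is a
	 * plain load followed by an acquire fence; the zalasr path
	 * patches in ld.aq plus a nop to keep both sides equal-sized. */
	#define __smp_load_acquire(p)					\
	({								\
		typeof(*(p)) ___p1;					\
		asm volatile(ALTERNATIVE(				\
			"ld %0, 0(%1)\t\nfence r, rw\t\n",		\
			LD_AQ(%0, %1) "\t\nnop\t\n",			\
			0, RISCV_ISA_EXT_ZALASR, 1)			\
			: "=r" (___p1) : "r" (p) : "memory");		\
		___p1;							\
	})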
> The existing implementation of spin_unlock, when followed by
> spin_lock, is equivalent to a 'FENCE rw, rw' between operations before

This is not true without Zacas, that is, when using LR/SC: write-to-read
remains unordered in that case.

  Andrea
Hi Andrea,

On Fri, Sep 19, 2025 at 3:45 PM Andrea Parri <parri.andrea@gmail.com> wrote:
>
> > The existing implementation of spin_unlock, when followed by
> > spin_lock, is equivalent to a 'FENCE rw, rw' between operations before
>
> This is not true without Zacas, that is, when using LR/SC: write-to-read
> remains unordered in that case.

Yes. Thanks for your corrections. The LR/SC here, when Zacas or Zabha is
not implemented, behaves like:

	fence rw, w
	sd

	lr.w
	sc.w
	fence r, rw

The 'fence rw, w' ensures the ordering between operations before itself
and the 'sc.w'. The 'fence r, rw' ensures the ordering between the 'lr.w'
and operations after itself. The operations between 'lr.w' and 'sc.w' are
as few as possible and cannot contain loads or stores, as is said in
section 13.3 of the RISC-V unprivileged spec:

"The dynamic code executed between the LR and SC instructions can only
contain instructions from the base ''I'' instruction set, excluding
loads, stores, backward jumps, taken backward branches, JALR, FENCE, and
SYSTEM instructions."

So in summary, though it does not provide 'fence rw, rw' semantics, there
is limited space available for the CPU to reorder. And, still, I think
calling spin_unlock immediately followed by spin_lock on the same core is
much rarer than calling spin_unlock on one core and spin_lock on another
core.

Best regards,

Xu Lu

>
>   Andrea
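Spelling out Andrea's point against that sequence (register names and
addresses here are purely illustrative):

	sw    t0, 0(a0)         # W0: store before the unlock
	fence rw, w             # orders prior R/W before later *stores*
	sd    zero, 0(a1)       # unlock store
	lr.w  t1, 0(a2)         # lock: load part
	sc.w  t2, t3, 0(a2)     # lock: store part (after lr.w, same address)
	fence r, rw             # orders prior *reads* before later R/W
	lw    t4, 0(a3)         # R1: load after the lock

The first fence orders W0 before the sd and the sc.w; the second orders
the lr.w before R1; but no chain connects W0 to R1, so write-to-read
stays unordered. With Zacas, the lock's amocas is a single atomic
read+write operation, so W0 ordered before it (as a write) and it ordered
before R1 (as a read) close exactly this gap.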
On Wed, Sep 17, 2025 at 12:01:34AM -0400, Guo Ren wrote:
> On Tue, Sep 02, 2025 at 06:59:15PM +0200, Andrea Parri wrote:
> > > Xu Lu (4):
> > >   riscv: add ISA extension parsing for Zalasr
> > >   dt-bindings: riscv: Add Zalasr ISA extension description
> > >   riscv: Introduce Zalasr instructions
> > >   riscv: Use Zalasr for smp_load_acquire/smp_store_release
> >
> > Informally put, our (Linux) memory consistency model specifies that any
> > sequence
> >
> > 	spin_unlock(s);
> > 	spin_lock(t);
> >
> > behaves "as if it provides at least FENCE.TSO ordering between operations
> > which precede the UNLOCK+LOCK sequence and operations which follow the
> > sequence".  Unless I am missing something, the patch set in question
> > breaks this ordering property (on RISC-V): for example, a "release"
> > annotation, .RL (as in spin_unlock() -> smp_store_release(), after patch
> > #4), paired with an "acquire" fence, FENCE R,RW (as can be found in
> > spin_lock() -> atomic_try_cmpxchg_acquire()), does not provide the
> > specified property.
> >
> > I _think_ some solutions to the issue above include:
> >
> >   a) make sure an .RL annotation is always paired with an .AQ annotation,
> >      and vice versa an .AQ annotation is paired with an .RL annotation
> >      (this approach matches the current arm64 approach/implementation);
> >
> >   b) in the opposite direction, always pair FENCE R,RW (or occasionally
> >      FENCE R,R) with FENCE RW,W (this matches the current approach/the
> >      current implementation within riscv);
> >
> >   c) mix the previous two solutions (resp., annotations and fences), but
> >      make sure to "upgrade" any release to provide (insert) a FENCE.TSO.
> I prefer option c) at first; it requires fewer modifications and has less
> impact.
>
> 	asm volatile(ALTERNATIVE("fence rw, w;\t\nsb %0, 0(%1)\t\n",	\
> -		     SB_RL(%0, %1) "\t\nnop\t\n",			\
> +		     SB_RL(%0, %1) "\t\nfence.tso;\t\n",		\

		     "fence rw, rw;\t\nsb %0, 0(%1)\t\n",		\

How about strengthening it to fence rw, rw? It's a bit stronger than
.tso.

> 		     0, RISCV_ISA_EXT_ZALASR, 1)			\
> 		     : : "r" (v), "r" (p) : "memory");			\
>
> I don't object to option a), and I think it could be done in the future;
> we can adopt the Zalasr extension step by step.
>
> > (a) would align RISC-V and arm64 (which is a good thing IMO), though it
> > is probably the most invasive of the three approaches above (requiring
> > certain changes to arch/riscv/include/asm/{cmpxchg,atomic}.h, which are
> > already relatively messy due to the various ZABHA and ZACAS switches).
> > Overall, I'm not too excited at the idea of reviewing any of those
> > changes, but if the community opts for it, I'll almost definitely take
> > a closer look with due calm.  ;-)
> >
> >   Andrea
Hi Andrea,

Great catch! Thanks a lot for your review.

The problem comes from the mixed use of acquire/release semantics via
fences and via real ld.aq/sd.rl. I would prefer your method (a). The
existing atomic acquire/release functions' implementations can be further
modified to use amocas.aq/amocas.rl/lr.aq/sc.rl. I will send the next
version after I finish it and hope you can help with review then.

Best regards,

Xu Lu

On Wed, Sep 3, 2025 at 12:59 AM Andrea Parri <parri.andrea@gmail.com> wrote:
>
> > Xu Lu (4):
> >   riscv: add ISA extension parsing for Zalasr
> >   dt-bindings: riscv: Add Zalasr ISA extension description
> >   riscv: Introduce Zalasr instructions
> >   riscv: Use Zalasr for smp_load_acquire/smp_store_release
>
> Informally put, our (Linux) memory consistency model specifies that any
> sequence
>
> 	spin_unlock(s);
> 	spin_lock(t);
>
> behaves "as if it provides at least FENCE.TSO ordering between operations
> which precede the UNLOCK+LOCK sequence and operations which follow the
> sequence".  Unless I am missing something, the patch set in question
> breaks this ordering property (on RISC-V): for example, a "release"
> annotation, .RL (as in spin_unlock() -> smp_store_release(), after patch
> #4), paired with an "acquire" fence, FENCE R,RW (as can be found in
> spin_lock() -> atomic_try_cmpxchg_acquire()), does not provide the
> specified property.
>
> I _think_ some solutions to the issue above include:
>
>   a) make sure an .RL annotation is always paired with an .AQ annotation,
>      and vice versa an .AQ annotation is paired with an .RL annotation
>      (this approach matches the current arm64 approach/implementation);
>
>   b) in the opposite direction, always pair FENCE R,RW (or occasionally
>      FENCE R,R) with FENCE RW,W (this matches the current approach/the
>      current implementation within riscv);
>
>   c) mix the previous two solutions (resp., annotations and fences), but
>      make sure to "upgrade" any release to provide (insert) a FENCE.TSO.
>
> (a) would align RISC-V and arm64 (which is a good thing IMO), though it
> is probably the most invasive of the three approaches above (requiring
> certain changes to arch/riscv/include/asm/{cmpxchg,atomic}.h, which are
> already relatively messy due to the various ZABHA and ZACAS switches).
> Overall, I'm not too excited at the idea of reviewing any of those
> changes, but if the community opts for it, I'll almost definitely take
> a closer look with due calm.  ;-)
>
>   Andrea
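For reference, the annotated LR/SC shape that option (a) implies for the
acquire case might look like the following sketch (operand names are
illustrative, not the code being prepared): the .aq on the lr.w replaces
the trailing 'fence r, rw' of the current fence-based variant, and a .rl
on the sc.w would similarly replace the leading 'fence rw, w' in the
release case.

	/* Sketch: acquire cmpxchg via annotated LR/SC (option (a)). */
	__asm__ __volatile__ (
		"0:	lr.w.aq %0, %2\n"
		"	bne	%0, %z3, 1f\n"
		"	sc.w	%1, %z4, %2\n"
		"	bnez	%1, 0b\n"
		"1:\n"
		: "=&r" (ret), "=&r" (rc), "+A" (*ptr)
		: "rJ" (old), "rJ" (newv)
		: "memory");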