From: Ash Wilding <ash.j.wilding@gmail.com>
Hey,
I've found some time to improve this series a bit:
Changes since RFC v1
====================
- Broken up patches into smaller chunks to aid in readability.
- As per Julien's feedback I've also introduced intermediary patches
that first remove Xen's existing headers, then pull in the current
Linux versions as-is. This means we only need to review the changes
made while porting to Xen, rather than reviewing the existing Linux
code.
- Pull in Linux's <asm-generic/rwonce.h> as <xen/rwonce.h> for
__READ_ONCE()/__WRITE_ONCE() instead of putting these in Xen's
<xen/compiler.h>. While doing this, provide justification for
dropping Linux's <linux/compiler_types.h> and relaxing the
__READ_ONCE() usage of __unqual_scalar_typeof() to just typeof()
(see patches #5 and #6).
Keeping the rest of the cover-letter unchanged as it would still be
good to discuss these things:
Arguments in favour of doing this
=================================
- Lets SMMUv3 driver switch to using <asm/atomic.h> rather than
maintaining its own implementation of the helpers.
- Provides mitigation against XSA-295 [2], which affects both arm32
and arm64, across all versions of Xen, and may allow a domU to
maliciously or erroneously DoS the hypervisor.
- All Armv8-A core implementations since ~2017 implement LSE, so
  there is an argument to be made that support in Xen is long
  overdue. This is compounded by LSE atomics being more performant
  than LL/SC atomics in most real-world applications, especially at
  high core counts.
- We may be able to get improved performance when using LL/SC too
as Linux provides helpers with relaxed ordering requirements which
are currently not available in Xen, though for this we would need
to go back through existing code to see where the more relaxed
versions can be safely used.
- Anything else?
Arguments against doing this
============================
- Limited testing infrastructure in place to ensure use of new
atomics helpers does not introduce new bugs and regressions. This
is a particularly strong argument given how difficult it can be to
identify and debug malfunctioning atomics. The old adage applies,
"If it ain't broke, don't fix it."
- Anything else?
Disclaimers
===========
- This is a very rough first-pass effort intended to spur the
discussions along.
- Only build-tested on arm64 and arm32, *not* run-tested. I did also
  build for x86_64 just to make sure I didn't inadvertently break that.
- This version only tackles atomics and cmpxchg; I've not yet had a
chance to look at locks so those are still using LL/SC.
- The timeout variants of cmpxchg() (used by guest atomics) are still
using LL/SC regardless of LSE being available, as these helpers are
not provided by Linux so I just copied over the existing Xen ones.
Any further comments, guidance, and discussion on how to improve this
approach and get LSE atomics support merged into Xen would be greatly
appreciated.
Thanks!
Ash.
[1] https://lore.kernel.org/xen-devel/13baac40-8b10-0def-4e44-0d8f655fcaf1@xen.org/
[2] https://xenbits.xen.org/xsa/advisory-295.txt
Ash Wilding (15):
xen/arm: Support detection of CPU features in other ID registers
xen/arm: Add detection of Armv8.1-LSE atomic instructions
xen/arm: Add ARM64_HAS_LSE_ATOMICS hwcap
xen/arm: Delete Xen atomics helpers
xen/arm: pull in Linux atomics helpers and dependencies
xen: port Linux <asm-generic/rwonce.h> to Xen
xen/arm: prepare existing Xen headers for Linux atomics
xen/arm64: port Linux LL/SC atomics helpers to Xen
xen/arm64: port Linux LSE atomics helpers to Xen
xen/arm64: port Linux's arm64 cmpxchg.h to Xen
xen/arm64: port Linux's arm64 atomic.h to Xen
xen/arm64: port Linux's arm64 lse.h to Xen
xen/arm32: port Linux's arm32 atomic.h to Xen
xen/arm32: port Linux's arm32 cmpxchg.h to Xen
xen/arm: remove dependency on gcc built-in __sync_fetch_and_add()
xen/arch/arm/Kconfig | 11 +
xen/arch/arm/Makefile | 1 +
xen/arch/arm/lse.c | 13 +
xen/arch/arm/setup.c | 13 +-
xen/include/asm-arm/arm32/atomic.h | 253 +++++++-----
xen/include/asm-arm/arm32/cmpxchg.h | 388 +++++++++++-------
xen/include/asm-arm/arm32/system.h | 2 +-
xen/include/asm-arm/arm64/atomic.h | 236 +++++------
xen/include/asm-arm/arm64/atomic_ll_sc.h | 231 +++++++++++
xen/include/asm-arm/arm64/atomic_lse.h | 246 +++++++++++
xen/include/asm-arm/arm64/cmpxchg.h | 497 ++++++++++++++++-------
xen/include/asm-arm/arm64/lse.h | 48 +++
xen/include/asm-arm/arm64/system.h | 2 +-
xen/include/asm-arm/atomic.h | 15 +-
xen/include/asm-arm/cpufeature.h | 57 +--
xen/include/asm-arm/system.h | 9 +-
xen/include/xen/rwonce.h | 21 +
17 files changed, 1469 insertions(+), 574 deletions(-)
create mode 100644 xen/arch/arm/lse.c
create mode 100644 xen/include/asm-arm/arm64/atomic_ll_sc.h
create mode 100644 xen/include/asm-arm/arm64/atomic_lse.h
create mode 100644 xen/include/asm-arm/arm64/lse.h
create mode 100644 xen/include/xen/rwonce.h
--
2.24.3 (Apple Git-128)
On 11/11/2020 21:51, Ash Wilding wrote:
> Hey,

Hi Ash,

[...]

> Arguments against doing this
> ============================
>
> - Limited testing infrastructure in place to ensure use of new
>   atomics helpers does not introduce new bugs and regressions. This
>   is a particularly strong argument given how difficult it can be to
>   identify and debug malfunctioning atomics. The old adage applies,
>   "If it ain't broke, don't fix it."

I am not too concerned about the Linux atomics implementation. They are
well tested and your changes don't seem to touch the implementation
itself.

However, I vaguely remember that some of the atomics helpers in Xen are
somewhat stronger than their Linux counterparts. Would you be able to
look at all the existing helpers and see whether the memory ordering
diverges? Once we have a list we could decide whether we want to make
them stronger again or check the callers.

Cheers,

--
Julien Grall
Hi Julien,
Thanks for taking a look at the patches and providing feedback. I've seen your
other comments and will reply to those separately when I get a chance (maybe at
the weekend or over the Christmas break).
RE the differences in ordering semantics between Xen's and Linux's atomics
helpers, please find my notes below.
Thoughts?
Cheers,
Ash.
The tables below use format AAA/BBB/CCC/DDD/EEE, where:
- AAA is the memory barrier before the operation
- BBB is the acquire semantics of the atomic operation
- CCC is the release semantics of the atomic operation
- DDD is whether the asm() block clobbers memory
- EEE is the memory barrier after the operation
For example, ---/---/rel/mem/dmb would mean:
- No memory barrier before the operation
- The atomic does *not* have acquire semantics
- The atomic *does* have release semantics
- The asm() block clobbers memory
- There is a DMB memory barrier after the atomic operation
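To make the notation concrete, here is a sketch of what a
---/---/rel/mem/dmb operation such as atomic_add_return() looks like. This
is my own illustration in portable C11 rather than the actual arm64 asm
from either codebase, and the function name is made up for the example:

```c
#include <stdatomic.h>

/* Illustrative only: a ---/---/rel/mem/dmb style add-return.  No
 * leading barrier, no acquire on the load, release ordering on the
 * RMW (the "rel" column), the atomic acts as a compiler memory
 * clobber (the "mem" column), and a trailing full fence stands in
 * for the DMB (the final column). */
static int atomic_add_return_sketch(atomic_int *v, int i)
{
    int old = atomic_fetch_add_explicit(v, i, memory_order_release);
    atomic_thread_fence(memory_order_seq_cst);  /* trailing "dmb" */
    return old + i;
}
```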
arm64 LL/SC
===========
Xen Function Xen Linux Inconsistent
============ === ===== ============
atomic_add ---/---/---/---/--- ---/---/---/---/--- ---
atomic_add_return ---/---/rel/mem/dmb ---/---/rel/mem/dmb --- (1)
atomic_sub ---/---/---/---/--- ---/---/---/---/--- ---
atomic_sub_return ---/---/rel/mem/dmb ---/---/rel/mem/dmb --- (1)
atomic_and ---/---/---/---/--- ---/---/---/---/--- ---
atomic_cmpxchg dmb/---/---/---/dmb ---/---/rel/mem/--- YES (2)
atomic_xchg ---/---/rel/mem/dmb ---/acq/rel/mem/dmb YES (3)
(1) It's actually interesting to me that Linux does it this way. As with the
LSE atomics below, I'd have expected acq/rel semantics and ditch the DMB.
Unless I'm missing something where there is a concern around taking an IRQ
between the LDAXR and the STLXR, which can't happen in the LSE atomic case
since it's a single instruction. But the exclusive monitor is cleared on
exception return in AArch64 so I'm struggling to see what that potential
issue may be. Regardless, Linux and Xen are consistent so we're OK ;-)
(2) The Linux version uses either STLXR with rel semantics if the comparison
passes, or DMB if the comparison fails. This is weaker than Xen's version,
which is quite blunt in always wrapping the operation between two DMBs. This
may be a holdover from Xen's arm32 versions being ported to arm64, as we
didn't support acq/rel semantics on LDREX and STREX in Armv7-A? Regardless,
this is quite a big discrepancy and I've not yet given it enough thought to
determine whether it would actually cause an issue. My feeling is that the
Linux LL/SC atomic_cmpxchg() should have acq semantics on the LL, but
like you said these helpers are well tested so I'd be surprised if there
is a bug. See (5) below though, where the Linux LSE atomic_cmpxchg() *does*
have acq semantics.
(3) The Linux version just adds acq semantics to the LL, so we're OK here.
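The success/failure split described in (2) can be sketched in portable C11,
which lets the two paths carry different ordering; this is my own
illustration (the function name is invented), not code from either project:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative only: a cmpxchg whose store has release semantics on
 * the success path while the failure path is weaker, mirroring the
 * shape of the ordering split discussed in (2). */
static bool cmpxchg_sketch(atomic_int *v, int expected, int desired)
{
    return atomic_compare_exchange_strong_explicit(
        v, &expected, desired,
        memory_order_release,    /* success path: "rel" on the store */
        memory_order_relaxed);   /* failure path: no barrier at all  */
}
```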
arm64 LSE (comparison to Xen's LL/SC)
=====================================
Xen Function Xen Linux Inconsistent
============ === ===== ============
atomic_add ---/---/---/---/--- ---/---/---/---/--- ---
atomic_add_return ---/---/rel/mem/dmb ---/acq/rel/mem/--- YES (4)
atomic_sub ---/---/---/---/--- ---/---/---/---/--- ---
atomic_sub_return ---/---/rel/mem/dmb ---/acq/rel/mem/--- YES (4)
atomic_and ---/---/---/---/--- ---/---/---/---/--- ---
atomic_cmpxchg dmb/---/---/---/dmb ---/acq/rel/mem/--- YES (5)
atomic_xchg ---/---/rel/mem/dmb ---/acq/rel/mem/--- YES (4)
(4) As noted in (1), this is how I would have expected Linux's LL/SC atomics to
work too. I don't think this discrepancy will cause any issues.
(5) As with (2) above, this is quite a big discrepancy to Xen. However at least
this version has acq semantics unlike the LL/SC version in (2), so I'm more
confident that there won't be regressions going from Xen LL/SC to Linux LSE
version of atomic_cmpxchg().
arm32 LL/SC
===========
Xen Function Xen Linux Inconsistent
============ === ===== ============
atomic_add ---/---/---/---/--- ---/---/---/---/--- ---
atomic_add_return dmb/---/---/---/dmb XXX/XXX/XXX/XXX/XXX YES (6)
atomic_sub ---/---/---/---/--- ---/---/---/---/--- ---
atomic_sub_return dmb/---/---/---/dmb XXX/XXX/XXX/XXX/XXX YES (6)
atomic_and ---/---/---/---/--- ---/---/---/---/--- ---
atomic_cmpxchg dmb/---/---/---/dmb XXX/XXX/XXX/XXX/XXX YES (6)
atomic_xchg dmb/---/---/---/dmb XXX/XXX/XXX/XXX/XXX YES (6)
(6) Linux only provides relaxed variants of these functions, such as
atomic_add_return_relaxed() and atomic_xchg_relaxed(). Patches #13 and #14
in the series add the stricter versions expected by Xen, wrapping calls to
Linux's relaxed variants in between two calls to smp_mb(). This makes them
consistent with Xen's existing helpers, though is quite blunt. It is worth
noting that Armv8-A AArch32 does support acq/rel semantics on exclusive
accesses, with LDAEX and STLEX, so I could imagine us introducing a new
arm32 hwcap to detect whether we're on actual Armv7-A hardware or Armv8-A
AArch32, then swap to lighter-weight STLEX versions of these helpers rather
than the heavyweight double DMB versions. Whether that would actually give
measurable performance improvements is another story!
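The "double DMB" wrapping approach in (6) can be sketched as follows; this
is my own portable C11 illustration (smp_mb() modelled with a seq_cst
fence), not the actual patch code:

```c
#include <stdatomic.h>

/* Illustrative only: build a fully ordered xchg by bracketing a
 * relaxed exchange between two full barriers, as described in (6).
 * The C11 seq_cst fences stand in for smp_mb(). */
static int xchg_mb_sketch(atomic_int *v, int new)
{
    atomic_thread_fence(memory_order_seq_cst);  /* leading smp_mb()  */
    int old = atomic_exchange_explicit(v, new, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* trailing smp_mb() */
    return old;
}
```

This is blunt but correct: the fences restore full ordering around the
relaxed operation at the cost of two barriers per call.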
Having pondered note (1) in my previous email a bit more, I imagine the reason
for using a DMB instead of acq/rel semantics is to prevent accesses following
the STLXR from being reordered between it and the LDAXR.
I won't be winning any awards for this ASCII art but hopefully it helps convey
the point.
Using just an LDAXR/STLXR pair and ditching the DMB, accesses to [D] and [E]
can be reordered between the LDAXR and STLXR:
...
+---------- LDR [A]
| ...
| ...
| +----- STR [B]
| | ...
====|====|======LDAXR [C]================
| | ... X
| +----> ... |
| ... |
| ... <----------+ |
X ... | |
================STLXR [C]========|===|===
... | |
... | |
LDR [D]--------+ |
... |
STR [E]------------+
...
Dropping the acq semantics from the LDAXR and using a DMB instead prevents
accesses to [D] and [E] from being reordered between the LDXR/STLXR pair,
while keeping the rel semantics on the STLXR prevents accesses to [A] and
[B] from being reordered after the STLXR:
...
+---------- LDR [A]
| ...
| ...
| +----- STR [B]
| | ...
| | LDXR [C]
| | ...
| +----> ...
| ...
X ...
================STLXR [C]================
================DMB======================
... X X
... | |
LDR [D]----+ |
... |
STR [E]---------+
...
As mentioned in my original email, the LSE atomic is a single instruction so
we can give it acq/rel semantics and not worry about any accesses to [A], [B],
[D], or [E] being reordered relative to that atomic:
...
+---------- LDR [A]
| ...
| ...
| +----- STR [B]
| | ...
X X ...
================LDADDAL [C]================
... X X
... | |
LDR [D]----+ |
... |
STR [E]---------+
...
So it makes sense that Linux uses acq/rel and no DMB for LSE, while both
Linux and Xen are forced to use rel semantics plus a DMB in the LL/SC case.
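Transliterating the first diagram's acquire+release pattern into portable
C11 shows the same two half barriers; this is my own sketch (invented
function name), not code from either project:

```c
#include <stdatomic.h>

/* Illustrative only: an add-return built from a load-acquire plus a
 * store-release, as in the first diagram.  These are two half
 * barriers: earlier accesses may sink below the acquire and later
 * accesses may hoist above the release, so the pair is NOT a full
 * barrier, even though it still performs an atomic add correctly. */
static int add_return_half_barriers(atomic_int *v, int i)
{
    int old = atomic_load_explicit(v, memory_order_acquire); /* LDAXR */
    while (!atomic_compare_exchange_weak_explicit(
               v, &old, old + i,
               memory_order_release,    /* STLXR with release       */
               memory_order_relaxed))   /* retry path reloads old   */
        ;                               /* CAS loop stands in for the
                                           exclusive retry loop     */
    return old + i;
}
```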
Anyway, point (2) from my earlier email is the one that's potentially more
concerning as we only have rel semantics and no DMB on the Linux LL/SC version
of atomic_cmpxchg(), in contrast to the existing Xen LL/SC implementation being
sandwiched between two DMBs and the Linux LSE version having acq/rel semantics.
Cheers,
Ash.
On 17/12/2020 15:37, Ash Wilding wrote:
> Hi Julien,
Hi,
First of all, apologies for the very late reply.
> Thanks for taking a look at the patches and providing feedback. I've seen your
> other comments and will reply to those separately when I get a chance (maybe at
> the weekend or over the Christmas break).
>
> RE the differences in ordering semantics between Xen's and Linux's atomics
> helpers, please find my notes below.
>
> Thoughts?
Thank you for the very detailed answer, it made it a lot easier to
understand the differences!
I think it would be good to have some (if not all) of this content in the
documentation for future reference.
[...]
> The tables below use format AAA/BBB/CCC/DDD/EEE, where:
>
> - AAA is the memory barrier before the operation
> - BBB is the acquire semantics of the atomic operation
> - CCC is the release semantics of the atomic operation
> - DDD is whether the asm() block clobbers memory
> - EEE is the memory barrier after the operation
>
> For example, ---/---/rel/mem/dmb would mean:
>
> - No memory barrier before the operation
> - The atomic does *not* have acquire semantics
> - The atomic *does* have release semantics
> - The asm() block clobbers memory
> - There is a DMB memory barrier after the atomic operation
>
>
> arm64 LL/SC
> ===========
>
> Xen Function Xen Linux Inconsistent
> ============ === ===== ============
>
> atomic_add ---/---/---/---/--- ---/---/---/---/--- ---
> atomic_add_return ---/---/rel/mem/dmb ---/---/rel/mem/dmb --- (1)
> atomic_sub ---/---/---/---/--- ---/---/---/---/--- ---
> atomic_sub_return ---/---/rel/mem/dmb ---/---/rel/mem/dmb --- (1)
> atomic_and ---/---/---/---/--- ---/---/---/---/--- ---
> atomic_cmpxchg dmb/---/---/---/dmb ---/---/rel/mem/--- YES (2)
If I am not mistaken, Linux is implementing atomic_cmpxchg() with the
*_mb() version. So the semantic would be:
---/---/rel/mem/dmb
> atomic_xchg ---/---/rel/mem/dmb ---/acq/rel/mem/dmb YES (3)
From Linux:
#define __XCHG_CASE(w, sfx, name, sz, mb, nop_lse, acq, acq_lse, rel, cl) \
[...]
        /* LL/SC */                                         \
        "       prfm    pstl1strm, %2\n"                    \
        "1:     ld" #acq "xr" #sfx "\t%" #w "0, %2\n"       \
        "       st" #rel "xr" #sfx "\t%w1, %" #w "3, %2\n"  \
        "       cbnz    %w1, 1b\n"                          \
        "       " #mb,                                      \
[...]

__XCHG_CASE(w,  ,  mb_, 32, dmb ish, nop,  , a, l, "memory")
So I think the Linux semantic would be:
---/---/rel/mem/dmb
Therefore there would be no inconsistency between Xen and Linux.
>
> (1) It's actually interesting to me that Linux does it this way. As with the
> LSE atomics below, I'd have expected acq/rel semantics and ditch the DMB.
I have done some digging. The original implementation of atomic_{sub,
add}_return was actually using acq/rel semantics up until Linux 3.14.
But this was reworked by 8e86f0b409a4 "arm64: atomics: fix use of
acquire + release for full barrier semantics".
> Unless I'm missing something where there is a concern around taking an IRQ
> between the LDAXR and the STLXR, which can't happen in the LSE atomic case
> since it's a single instruction. But the exclusive monitor is cleared on
> exception return in AArch64 so I'm struggling to see what that potential
> issue may be. Regardless, Linux and Xen are consistent so we're OK ;-)
The commit I pointed above contains a lot of details on the issue. For
convenience, I copied the most relevant bits below:
"
On arm64, these operations have been incorrectly implemented as follows:
// A, B, C are independent memory locations
<Access [A]>
// atomic_op (B)
1: ldaxr x0, [B] // Exclusive load with acquire
<op(B)>
stlxr w1, x0, [B] // Exclusive store with release
cbnz w1, 1b
<Access [C]>
The assumption here being that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).
Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.
"
> (2) The Linux version uses either STLXR with rel semantics if the comparison
> passes, or DMB if the comparison fails.
I think the DMB is only on the success path and there is no barrier on
the failure path. The commit 4e39715f4b5c "arm64: cmpxchg: avoid memory
barrier on comparison failure" seems to corroborate that.
> This is weaker than Xen's version,
> which is quite blunt in always wrapping the operation between two DMBs. This
> may be a holdover from Xen's arm32 versions being ported to arm64, as we
> didn't support acq/rel semantics on LDREX and STREX in Armv7-A? Regardless,
The atomic helpers used in Xen were originally taken from Linux 3.14.
Back then, atomic_cmpxchg() were using the two full barriers. This was
introduced by the commit I pointed out in (1).
This was then optimized with commit 4e39715f4b5c "arm64: cmpxchg: avoid
memory barrier on comparison failure".
> this is quite a big discrepancy and I've not yet given it enough thought to
> determine whether it would actually cause an issue. My feeling is that the
> Linux LL/SC atomic_cmpxchg() should have acq semantics on the LL, but
> like you said these helpers are well tested so I'd be surprised if there
> is a bug. See (5) below though, where the Linux LSE atomic_cmpxchg() *does*
> have acq semantics.
If my understanding is correct the semantics would be (xen vs Linux):
- failure: dmb/---/---/---/dmb ---/---/rel/mem/---
- success: dmb/---/---/---/dmb ---/---/rel/mem/dmb
I think the success path is not going to be a problem. But we would need
to check if all the callers expect a full barrier for the failure path
(I would hope not).
>
> (3) The Linux version just adds acq semantics to the LL, so we're OK here.
>
>
> arm64 LSE (comparison to Xen's LL/SC)
> =====================================
>
> Xen Function Xen Linux Inconsistent
> ============ === ===== ============
>
> atomic_add ---/---/---/---/--- ---/---/---/---/--- ---
> atomic_add_return ---/---/rel/mem/dmb ---/acq/rel/mem/--- YES (4)
> atomic_sub ---/---/---/---/--- ---/---/---/---/--- ---
> atomic_sub_return ---/---/rel/mem/dmb ---/acq/rel/mem/--- YES (4)
> atomic_and ---/---/---/---/--- ---/---/---/---/--- ---
> atomic_cmpxchg dmb/---/---/---/dmb ---/acq/rel/mem/--- YES (5)
> atomic_xchg ---/---/rel/mem/dmb ---/acq/rel/mem/--- YES (4)
>
> (4) As noted in (1), this is how I would have expected Linux's LL/SC atomics to
> work too. I don't think this discrepancy will cause any issues.
IIUC, the LSE implementation would be using a single instruction that
has both the acquire and release semantics. Therefore, it would act as a
full barrier.
On the other hand, in the LL/SC implementation, the acquire and release
semantics is happening with two different instruction. Therefore, they
don't act as a full barrier.
So I think the memory ordering is going to be equivalent between Xen and
the Linux LSE implementation.
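In portable C11 terms, the single-instruction LSE form corresponds to one
fully ordered RMW; a minimal sketch of my own (invented function name), not
code from either project:

```c
#include <stdatomic.h>

/* Illustrative only: the LDADDAL-style form, where acquire and
 * release semantics live on a single instruction and the operation
 * is fully ordered.  In C11 that is a single seq_cst RMW, with no
 * separate half barriers for other accesses to slip between. */
static int add_return_full_barrier(atomic_int *v, int i)
{
    return atomic_fetch_add_explicit(v, i, memory_order_seq_cst) + i;
}
```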
>
> (5) As with (2) above, this is quite a big discrepancy to Xen. However at least
> this version has acq semantics unlike the LL/SC version in (2), so I'm more
> confident that there won't be regressions going from Xen LL/SC to Linux LSE
> version of atomic_cmpxchg().
Are they really different? In both cases, the helper will act as a full
barrier. The main difference is how this ordering is achieved.
>
>
> arm32 LL/SC
> ===========
>
> Xen Function Xen Linux Inconsistent
> ============ === ===== ============
>
> atomic_add ---/---/---/---/--- ---/---/---/---/--- ---
> atomic_add_return dmb/---/---/---/dmb XXX/XXX/XXX/XXX/XXX YES (6)
> atomic_sub ---/---/---/---/--- ---/---/---/---/--- ---
> atomic_sub_return dmb/---/---/---/dmb XXX/XXX/XXX/XXX/XXX YES (6)
> atomic_and ---/---/---/---/--- ---/---/---/---/--- ---
> atomic_cmpxchg dmb/---/---/---/dmb XXX/XXX/XXX/XXX/XXX YES (6)
> atomic_xchg dmb/---/---/---/dmb XXX/XXX/XXX/XXX/XXX YES (6)
>
> (6) Linux only provides relaxed variants of these functions, such as
> atomic_add_return_relaxed() and atomic_xchg_relaxed(). Patches #13 and #14
> in the series add the stricter versions expected by Xen, wrapping calls to
> Linux's relaxed variants in between two calls to smp_mb(). This makes them
> consistent with Xen's existing helpers, though is quite blunt.
Linux will do the same when the fully ordered version is not implemented
(see include/linux/atomic-fallback.h).
> It is worth
> noting that Armv8-A AArch32 does support acq/rel semantics on exclusive
> accesses, with LDAEX and STLEX, so I could imagine us introducing a new
> arm32 hwcap to detect whether we're on actual Armv7-A hardware or Armv8-A
> AArch32, then swap to lighter-weight STLEX versions of these helpers rather
> than the heavyweight double DMB versions. Whether that would actually give
> measurable performance improvements is another story!
That's good to know! So far, I haven't heard anyone using Xen 32-bit on
Armv8. I would wait until there is a request before introducing a 3rd
(4th if counting the LL/SC as 2) implementation for the atomics helpers.
That said, the 32-bit port is meant to only be supported on Armv7. It
should boot on Armv8, but there is no promise it will be fully
functional nor stable.
Overall, it looks like to me that re-syncing the atomic implementation
with Linux should not be a major problem.
We are in the middle of 4.15 freeze, I will try to go through the series
ASAP.
Cheers,
--
Julien Grall