From: Ash Wilding <ash.j.wilding@gmail.com>

Hey,

I've found some time to improve this series a bit:


Changes since RFC v1
====================

- Broken up patches into smaller chunks to aid readability.

- As per Julien's feedback I've also introduced intermediary patches
  that first remove Xen's existing headers, then pull in the current
  Linux versions as-is. This means we only need to review the changes
  made while porting to Xen, rather than reviewing the existing Linux
  code.

- Pull in Linux's <asm-generic/rwonce.h> as <xen/rwonce.h> for
  __READ_ONCE()/__WRITE_ONCE() instead of putting these in Xen's
  <xen/compiler.h>. While doing this, provide justification for
  dropping Linux's <linux/compiler_types.h> and relaxing the
  __READ_ONCE() usage of __unqual_scalar_typeof() to just typeof()
  (see patches #5 and #6).


Keeping the rest of the cover letter unchanged as it would still be
good to discuss these things:


Arguments in favour of doing this
=================================

- Lets the SMMUv3 driver switch to using <asm/atomic.h> rather than
  maintaining its own implementation of the helpers.

- Provides mitigation against XSA-295 [2], which affects both arm32
  and arm64 across all versions of Xen, and may allow a domU to
  maliciously or erroneously DoS the hypervisor.

- All Armv8-A core implementations since ~2017 implement LSE, so there
  is an argument to be made that support in Xen is long overdue. This
  is compounded by LSE atomics being more performant than LL/SC
  atomics in most real-world applications, especially at high core
  counts.

- We may be able to get improved performance when using LL/SC too, as
  Linux provides helpers with relaxed ordering requirements which are
  currently not available in Xen, though for this we would need to go
  back through existing code to see where the more relaxed versions
  can be safely used.

- Anything else?
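As an aside for readers unfamiliar with the rwonce helpers mentioned in the changelog above: the effect of relaxing `__unqual_scalar_typeof()` to plain `typeof()` can be sketched in portable C. The `xen_read_once()`/`xen_write_once()` names below are illustrative placeholders, not the actual Xen macros; the core mechanism either way is forcing a single access through a volatile-qualified lvalue so the compiler cannot tear, fuse, or re-load it.

```c
#include <assert.h>

/*
 * Sketch only. Linux's __READ_ONCE() uses __unqual_scalar_typeof()
 * (from <linux/compiler_types.h>) to strip qualifiers from the result
 * type; with plain typeof(), reading a volatile-qualified lvalue just
 * yields a volatile-qualified result type, which is harmless for the
 * common scalar uses.
 */
#define xen_read_once(x)      (*(const volatile __typeof__(x) *)&(x))
#define xen_write_once(x, v)                          \
    do {                                              \
        *(volatile __typeof__(x) *)&(x) = (v);        \
    } while (0)

static int demo(void)
{
    int counter = 41;

    /* One forced load, one forced store; no compiler reordering/fusing. */
    xen_write_once(counter, xen_read_once(counter) + 1);
    return xen_read_once(counter);
}
```

Note these macros only constrain the compiler; they provide no inter-CPU ordering, which is exactly why the atomics discussion below is about barriers and acquire/release semantics.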
Arguments against doing this
============================

- Limited testing infrastructure in place to ensure use of the new
  atomics helpers does not introduce new bugs and regressions. This is
  a particularly strong argument given how difficult it can be to
  identify and debug malfunctioning atomics. The old adage applies,
  "If it ain't broke, don't fix it."

- Anything else?


Disclaimers
===========

- This is a very rough first-pass effort intended to spur the
  discussions along.

- Only build-tested on arm64 and arm32, *not* run-tested. I did also
  build for x86_64 just to make sure I didn't inadvertently break
  that.

- This version only tackles atomics and cmpxchg; I've not yet had a
  chance to look at locks, so those are still using LL/SC.

- The timeout variants of cmpxchg() (used by guest atomics) are still
  using LL/SC regardless of LSE being available, as these helpers are
  not provided by Linux so I just copied over the existing Xen ones.

Any further comments, guidance, and discussion on how to improve this
approach and get LSE atomics support merged into Xen would be greatly
appreciated.

Thanks!
Ash.
[1] https://lore.kernel.org/xen-devel/13baac40-8b10-0def-4e44-0d8f655fcaf1@xen.org/
[2] https://xenbits.xen.org/xsa/advisory-295.txt

Ash Wilding (15):
  xen/arm: Support detection of CPU features in other ID registers
  xen/arm: Add detection of Armv8.1-LSE atomic instructions
  xen/arm: Add ARM64_HAS_LSE_ATOMICS hwcap
  xen/arm: Delete Xen atomics helpers
  xen/arm: pull in Linux atomics helpers and dependencies
  xen: port Linux <asm-generic/rwonce.h> to Xen
  xen/arm: prepare existing Xen headers for Linux atomics
  xen/arm64: port Linux LL/SC atomics helpers to Xen
  xen/arm64: port Linux LSE atomics helpers to Xen
  xen/arm64: port Linux's arm64 cmpxchg.h to Xen
  xen/arm64: port Linux's arm64 atomic.h to Xen
  xen/arm64: port Linux's arm64 lse.h to Xen
  xen/arm32: port Linux's arm32 atomic.h to Xen
  xen/arm32: port Linux's arm32 cmpxchg.h to Xen
  xen/arm: remove dependency on gcc built-in __sync_fetch_and_add()

 xen/arch/arm/Kconfig                     |  11 +
 xen/arch/arm/Makefile                    |   1 +
 xen/arch/arm/lse.c                       |  13 +
 xen/arch/arm/setup.c                     |  13 +-
 xen/include/asm-arm/arm32/atomic.h       | 253 +++++++-----
 xen/include/asm-arm/arm32/cmpxchg.h      | 388 +++++++++++-------
 xen/include/asm-arm/arm32/system.h       |   2 +-
 xen/include/asm-arm/arm64/atomic.h       | 236 +++++------
 xen/include/asm-arm/arm64/atomic_ll_sc.h | 231 +++++++++++
 xen/include/asm-arm/arm64/atomic_lse.h   | 246 +++++++++++
 xen/include/asm-arm/arm64/cmpxchg.h      | 497 ++++++++++++++++-------
 xen/include/asm-arm/arm64/lse.h          |  48 +++
 xen/include/asm-arm/arm64/system.h       |   2 +-
 xen/include/asm-arm/atomic.h             |  15 +-
 xen/include/asm-arm/cpufeature.h         |  57 +--
 xen/include/asm-arm/system.h             |   9 +-
 xen/include/xen/rwonce.h                 |  21 +
 17 files changed, 1469 insertions(+), 574 deletions(-)
 create mode 100644 xen/arch/arm/lse.c
 create mode 100644 xen/include/asm-arm/arm64/atomic_ll_sc.h
 create mode 100644 xen/include/asm-arm/arm64/atomic_lse.h
 create mode 100644 xen/include/asm-arm/arm64/lse.h
 create mode 100644 xen/include/xen/rwonce.h

--
2.24.3 (Apple Git-128)
On 11/11/2020 21:51, Ash Wilding wrote:
> From: Ash Wilding <ash.j.wilding@gmail.com>
>
>
> Hey,

Hi Ash,

> I've found some time to improve this series a bit:
>
>
> Changes since RFC v1
> ====================
>
> - Broken up patches into smaller chunks to aid in readability.
>
> - As per Julien's feedback I've also introduced intermediary patches
>   that first remove Xen's existing headers, then pull in the current
>   Linux versions as-is. This means we only need to review the changes
>   made while porting to Xen, rather than reviewing the existing Linux
>   code.
>
> - Pull in Linux's <asm-generic/rwonce.h> as <xen/rwonce.h> for
>   __READ_ONCE()/__WRITE_ONCE() instead of putting these in Xen's
>   <xen/compiler.h>. While doing this, provide justification for
>   dropping Linux's <linux/compiler_types.h> and relaxing the
>   __READ_ONCE() usage of __unqual_scalar_typeof() to just typeof()
>   (see patches #5 and #6).
>
>
> Keeping the rest of the cover-letter unchanged as it would still be
> good to discuss these things:
>
>
> Arguments in favour of doing this
> =================================
>
> - Lets SMMUv3 driver switch to using <asm/atomic.h> rather than
>   maintaining its own implementation of the helpers.
>
> - Provides mitigation against XSA-295 [2], which affects both arm32
>   and arm64, across all versions of Xen, and may allow a domU to
>   maliciously or erroneously DoS the hypervisor.
>
> - All Armv8-A core implementations since ~2017 implement LSE so
>   there is an argument to be made we are long overdue support in
>   Xen. This is compounded by LSE atomics being more performant than
>   LL/SC atomics in most real-world applications, especially at high
>   core counts.
>
> - We may be able to get improved performance when using LL/SC too
>   as Linux provides helpers with relaxed ordering requirements which
>   are currently not available in Xen, though for this we would need
>   to go back through existing code to see where the more relaxed
>   versions can be safely used.
>
> - Anything else?
>
>
> Arguments against doing this
> ============================
>
> - Limited testing infrastructure in place to ensure use of new
>   atomics helpers does not introduce new bugs and regressions. This
>   is a particularly strong argument given how difficult it can be to
>   identify and debug malfunctioning atomics. The old adage applies,
>   "If it ain't broke, don't fix it."

I am not too concerned about the Linux atomics implementation. It is
well tested, and your changes don't seem to touch the implementation
itself.

However, I vaguely remember that some of the atomic helpers in Xen are
somewhat stronger than their Linux counterparts. Would you be able to
look at all the existing helpers and see whether the memory ordering
diverges? Once we have a list, we could decide whether we want to make
them stronger again, or check the callers.

Cheers,

--
Julien Grall
Hi Julien,

Thanks for taking a look at the patches and providing feedback. I've
seen your other comments and will reply to those separately when I get
a chance (maybe at the weekend or over the Christmas break).

RE the differences in ordering semantics between Xen's and Linux's
atomics helpers, please find my notes below.

Thoughts?

Cheers,
Ash.


The tables below use format AAA/BBB/CCC/DDD/EEE, where:

- AAA is the memory barrier before the operation
- BBB is the acquire semantics of the atomic operation
- CCC is the release semantics of the atomic operation
- DDD is whether the asm() block clobbers memory
- EEE is the memory barrier after the operation

For example, ---/---/rel/mem/dmb would mean:

- No memory barrier before the operation
- The atomic does *not* have acquire semantics
- The atomic *does* have release semantics
- The asm() block clobbers memory
- There is a DMB memory barrier after the atomic operation


arm64 LL/SC
===========

Xen Function       Xen                  Linux                Inconsistent
============       ===                  =====                ============
atomic_add         ---/---/---/---/---  ---/---/---/---/---  ---
atomic_add_return  ---/---/rel/mem/dmb  ---/---/rel/mem/dmb  --- (1)
atomic_sub         ---/---/---/---/---  ---/---/---/---/---  ---
atomic_sub_return  ---/---/rel/mem/dmb  ---/---/rel/mem/dmb  --- (1)
atomic_and         ---/---/---/---/---  ---/---/---/---/---  ---
atomic_cmpxchg     dmb/---/---/---/dmb  ---/---/rel/mem/---  YES (2)
atomic_xchg        ---/---/rel/mem/dmb  ---/acq/rel/mem/dmb  YES (3)

(1) It's actually interesting to me that Linux does it this way. As
    with the LSE atomics below, I'd have expected acq/rel semantics
    and ditch the DMB. Unless I'm missing something where there is a
    concern around taking an IRQ between the LDAXR and the STLXR,
    which can't happen in the LSE atomic case since it's a single
    instruction. But the exclusive monitor is cleared on exception
    return in AArch64 so I'm struggling to see what that potential
    issue may be.
Regardless, Linux and Xen are consistent so we're OK ;-)

(2) The Linux version uses either STLXR with rel semantics if the
    comparison passes, or DMB if the comparison fails. This is weaker
    than Xen's version, which is quite blunt in always wrapping the
    operation between two DMBs. This may be a holdover from Xen's
    arm32 versions being ported to arm64, as we didn't support acq/rel
    semantics on LDREX and STREX in Armv7-A? Regardless, this is quite
    a big discrepancy and I've not yet given it enough thought to
    determine whether it would actually cause an issue. My feeling is
    that the Linux LL/SC atomic_cmpxchg() should have acq semantics on
    the LL, but like you said these helpers are well tested so I'd be
    surprised if there is a bug. See (5) below though, where the Linux
    LSE atomic_cmpxchg() *does* have acq semantics.

(3) The Linux version just adds acq semantics to the LL, so we're OK
    here.


arm64 LSE (comparison to Xen's LL/SC)
=====================================

Xen Function       Xen                  Linux                Inconsistent
============       ===                  =====                ============
atomic_add         ---/---/---/---/---  ---/---/---/---/---  ---
atomic_add_return  ---/---/rel/mem/dmb  ---/acq/rel/mem/---  YES (4)
atomic_sub         ---/---/---/---/---  ---/---/---/---/---  ---
atomic_sub_return  ---/---/rel/mem/dmb  ---/acq/rel/mem/---  YES (4)
atomic_and         ---/---/---/---/---  ---/---/---/---/---  ---
atomic_cmpxchg     dmb/---/---/---/dmb  ---/acq/rel/mem/---  YES (5)
atomic_xchg        ---/---/rel/mem/dmb  ---/acq/rel/mem/---  YES (4)

(4) As noted in (1), this is how I would have expected Linux's LL/SC
    atomics to work too. I don't think this discrepancy will cause any
    issues.

(5) As with (2) above, this is quite a big discrepancy to Xen. However
    at least this version has acq semantics unlike the LL/SC version
    in (2), so I'm more confident that there won't be regressions
    going from the Xen LL/SC to the Linux LSE version of
    atomic_cmpxchg().
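For readers more comfortable with the C11/GCC atomics model than with raw barriers, the distinction running through the tables above can be sketched with compiler builtins. This is purely illustrative, not Xen or Linux code: a fully ordered read-modify-write (`memory_order_seq_cst`) is roughly what Xen's dmb/.../dmb helpers guarantee, while an acquire+release RMW (`memory_order_acq_rel`) is what a single LSE LDADDAL provides. On AArch64, a compiler targeting LSE will typically emit LDADDAL for both; without LSE, the stronger form needs the trailing DMB that note (1) discusses.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Illustrative mapping of the table notation onto C11 memory orders.
 * atomic_fetch_add_explicit() returns the *old* value, so we add the
 * increment back to mimic an add_return-style helper.
 */
static int add_return_full(atomic_int *v, int i)
{
    /* Full ordering: comparable to Xen's barrier-wrapped helpers. */
    return atomic_fetch_add_explicit(v, i, memory_order_seq_cst) + i;
}

static int add_return_acq_rel(atomic_int *v, int i)
{
    /* Acquire+release RMW: comparable to the Linux LSE LDADDAL form. */
    return atomic_fetch_add_explicit(v, i, memory_order_acq_rel) + i;
}
```

Both functions are functionally identical on a single thread; the difference only shows up in what ordering other CPUs may observe around the operation.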
arm32 LL/SC
===========

Xen Function       Xen                  Linux                Inconsistent
============       ===                  =====                ============
atomic_add         ---/---/---/---/---  ---/---/---/---/---  ---
atomic_add_return  dmb/---/---/---/dmb  XXX/XXX/XXX/XXX/XXX  YES (6)
atomic_sub         ---/---/---/---/---  ---/---/---/---/---  ---
atomic_sub_return  dmb/---/---/---/dmb  XXX/XXX/XXX/XXX/XXX  YES (6)
atomic_and         ---/---/---/---/---  ---/---/---/---/---  ---
atomic_cmpxchg     dmb/---/---/---/dmb  XXX/XXX/XXX/XXX/XXX  YES (6)
atomic_xchg        dmb/---/---/---/dmb  XXX/XXX/XXX/XXX/XXX  YES (6)

(6) Linux only provides relaxed variants of these functions, such as
    atomic_add_return_relaxed() and atomic_xchg_relaxed(). Patches #13
    and #14 in the series add the stricter versions expected by Xen,
    wrapping calls to Linux's relaxed variants in between two calls to
    smp_mb(). This makes them consistent with Xen's existing helpers,
    though it is quite blunt. It is worth noting that Armv8-A AArch32
    does support acq/rel semantics on exclusive accesses, with LDAEX
    and STLEX, so I could imagine us introducing a new arm32 hwcap to
    detect whether we're on actual Armv7-A hardware or Armv8-A
    AArch32, then swapping to lighter-weight STLEX versions of these
    helpers rather than the heavyweight double-DMB versions. Whether
    that would actually give measurable performance improvements is
    another story!
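The wrapper pattern note (6) describes — a relaxed operation sandwiched between two full barriers — can be sketched in portable C11 rather than arm32 asm. The function names below are illustrative, not Xen's, and `smp_mb()` is modelled with a seq_cst thread fence, which on Arm compiles down to the DMB discussed above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for Xen's smp_mb(): a full memory barrier (DMB on Arm). */
static inline void smp_mb_sketch(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* The only variant Linux's arm32 code provides: no ordering at all. */
static int xchg_relaxed(atomic_int *v, int newval)
{
    return atomic_exchange_explicit(v, newval, memory_order_relaxed);
}

/*
 * Fully ordered wrapper in the style of the series' patches #13/#14:
 * barrier, relaxed op, barrier. Blunt, but matches Xen's existing
 * dmb/.../dmb semantics.
 */
static int xchg_full(atomic_int *v, int newval)
{
    int old;

    smp_mb_sketch();            /* order all prior accesses first   */
    old = xchg_relaxed(v, newval);
    smp_mb_sketch();            /* order all subsequent accesses    */
    return old;
}
```

This also illustrates why the STLEX idea at the end of note (6) is attractive: acquire/release on the exclusives themselves could replace two full DMBs with ordering attached to the accesses.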
Having pondered note (1) in my previous email a bit more, I imagine the
reason for using a DMB instead of acq/rel semantics is to prevent
accesses following the STLXR from being reordered between it and the
LDAXR.

I won't be winning any awards for this ASCII art but hopefully it helps
convey the point. Using just an LDAXR/STLXR pair and ditching the DMB,
accesses to [D] and [E] can be reordered between the LDAXR and STLXR
(and likewise [A] and [B] can sink past the LDAXR):

                  ...
   +---------- LDR [A]
   |              ...
   |              ...
   |    +----- STR [B]
   |    |         ...
====|====|======LDAXR [C]========X===X===
    |    |        ...            |   |
    |    +---->   ...            |   |
    +-------->    ...   <--------+   |
    X    X        ...   <------------+
================STLXR [C]========|===|===
                  ...            |   |
                LDR [D]----------+   |
                  ...                |
                STR [E]--------------+
                  ...

Dropping the acq semantics from the LDAXR and using a DMB instead
prevents accesses to [D] and [E] from being reordered between the
LDXR/STLXR pair, while keeping the rel semantics on the STLXR prevents
accesses to [A] and [B] from being reordered after the STLXR:

                  ...
   +---------- LDR [A]
   |              ...
   |              ...
   |    +----- STR [B]
   |    |         ...
   |    |       LDXR [C]
   |    |         ...
   |    +---->    ...
   +-------->     ...
    X    X        ...
================STLXR [C]================
================DMB======================
    X    X        ...
    |    |        ...
    |    +----- LDR [D]
    |             ...
    +---------- STR [E]
                  ...

As mentioned in my original email, the LSE atomic is a single
instruction so we can give it acq/rel semantics and not worry about any
accesses to [A], [B], [D], or [E] being reordered relative to that
atomic:

                  ...
   +---------- LDR [A]
   |              ...
   |              ...
   |    +----- STR [B]
   |    |         ...
   X    X         ...
================LDADDAL [C]==============
    X    X        ...
    |    |        ...
    |    +----- LDR [D]
    |             ...
    +---------- STR [E]
                  ...

So it makes sense that Linux uses acq/rel and no DMB for LSE, but Linux
(and Xen) are forced to use rel semantics and a DMB for the LL/SC case.
Anyway, point (2) from my earlier email is the one that's potentially
more concerning, as we only have rel semantics and no DMB on the Linux
LL/SC version of atomic_cmpxchg(), in contrast to the existing Xen
LL/SC implementation being sandwiched between two DMBs and the Linux
LSE version having acq/rel semantics.

Cheers,
Ash.
On 17/12/2020 15:37, Ash Wilding wrote:
> Hi Julien,

Hi,

First of all, apologies for the very late reply.

> Thanks for taking a look at the patches and providing feedback. I've
> seen your other comments and will reply to those separately when I
> get a chance (maybe at the weekend or over the Christmas break).
>
> RE the differences in ordering semantics between Xen's and Linux's
> atomics helpers, please find my notes below.
>
> Thoughts?

Thank you for the very detailed answer, it made it a lot easier to
understand the differences! I think it would be good to have some (if
not all) of this content in the documentation for future reference.

[...]

> The tables below use format AAA/BBB/CCC/DDD/EEE, where:
>
> - AAA is the memory barrier before the operation
> - BBB is the acquire semantics of the atomic operation
> - CCC is the release semantics of the atomic operation
> - DDD is whether the asm() block clobbers memory
> - EEE is the memory barrier after the operation
>
> For example, ---/---/rel/mem/dmb would mean:
>
> - No memory barrier before the operation
> - The atomic does *not* have acquire semantics
> - The atomic *does* have release semantics
> - The asm() block clobbers memory
> - There is a DMB memory barrier after the atomic operation
>
>
> arm64 LL/SC
> ===========
>
> Xen Function       Xen                  Linux                Inconsistent
> ============       ===                  =====                ============
> atomic_add         ---/---/---/---/---  ---/---/---/---/---  ---
> atomic_add_return  ---/---/rel/mem/dmb  ---/---/rel/mem/dmb  --- (1)
> atomic_sub         ---/---/---/---/---  ---/---/---/---/---  ---
> atomic_sub_return  ---/---/rel/mem/dmb  ---/---/rel/mem/dmb  --- (1)
> atomic_and         ---/---/---/---/---  ---/---/---/---/---  ---
> atomic_cmpxchg     dmb/---/---/---/dmb  ---/---/rel/mem/---  YES (2)

If I am not mistaken, Linux is implementing atomic_cmpxchg() with the
*_mb() version.
So the semantics would be: ---/---/rel/mem/dmb

> atomic_xchg        ---/---/rel/mem/dmb  ---/acq/rel/mem/dmb  YES (3)

From Linux:

#define __XCHG_CASE(w, sfx, name, sz, mb, nop_lse, acq, acq_lse, rel, cl) \
[...]
        /* LL/SC */                                                     \
        "       prfm    pstl1strm, %2\n"                                \
        "1:     ld" #acq "xr" #sfx "\t%" #w "0, %2\n"                   \
        "       st" #rel "xr" #sfx "\t%w1, %" #w "3, %2\n"              \
        "       cbnz    %w1, 1b\n"                                      \
        "       " #mb,                                                  \
[...]
__XCHG_CASE(w,     ,  mb_, 32, dmb ish, nop,  , a, l, "memory")

So I think the Linux semantics would be: ---/---/rel/mem/dmb

Therefore there would be no inconsistency between Xen and Linux.

> (1) It's actually interesting to me that Linux does it this way. As
>     with the LSE atomics below, I'd have expected acq/rel semantics
>     and ditch the DMB.

I have done some digging. The original implementation of
atomic_{sub, add}_return was actually using acq/rel semantics up until
Linux 3.14. But this was reworked by commit 8e86f0b409a4 "arm64:
atomics: fix use of acquire + release for full barrier semantics".

> Unless I'm missing something where there is a concern around taking
> an IRQ between the LDAXR and the STLXR, which can't happen in the
> LSE atomic case since it's a single instruction. But the exclusive
> monitor is cleared on exception return in AArch64 so I'm struggling
> to see what that potential issue may be. Regardless, Linux and Xen
> are consistent so we're OK ;-)

The commit I pointed out above contains a lot of details on the issue.
For convenience, I copied the most relevant bits below:

"
On arm64, these operations have been incorrectly implemented as follows:

        // A, B, C are independent memory locations

        <Access [A]>

        // atomic_op (B)
1:      ldaxr   x0, [B]         // Exclusive load with acquire
        <op(B)>
        stlxr   w1, x0, [B]     // Exclusive store with release
        cbnz    w1, 1b

        <Access [C]>

The assumption here being that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).
Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.
"

> (2) The Linux version uses either STLXR with rel semantics if the
>     comparison passes, or DMB if the comparison fails.

I think the DMB is only on the success path and there is no barrier on
the failure path. The commit 4e39715f4b5c "arm64: cmpxchg: avoid
memory barrier on comparison failure" seems to corroborate that.

> This is weaker than Xen's version, which is quite blunt in always
> wrapping the operation between two DMBs. This may be a holdover from
> Xen's arm32 versions being ported to arm64, as we didn't support
> acq/rel semantics on LDREX and STREX in Armv7-A? Regardless,

The atomic helpers used in Xen were originally taken from Linux 3.14.
Back then, atomic_cmpxchg() was using the two full barriers. This was
introduced by the commit I pointed out in (1). This was then optimized
with commit 4e39715f4b5c "arm64: cmpxchg: avoid memory barrier on
comparison failure".

> this is quite a big discrepancy and I've not yet given it enough
> thought to determine whether it would actually cause an issue. My
> feeling is that the Linux LL/SC atomic_cmpxchg() should have acq
> semantics on the LL, but like you said these helpers are well tested
> so I'd be surprised if there is a bug. See (5) below though, where
> the Linux LSE atomic_cmpxchg() *does* have acq semantics.

If my understanding is correct, the semantics would be (Xen vs Linux):

- failure: dmb/---/---/---/dmb    ---/---/rel/mem/---
- success: dmb/---/---/---/dmb    ---/---/rel/mem/dmb

I think the success path is not going to be a problem. But we would
need to check whether all the callers expect a full barrier on the
failure path (I would hope not).
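The success/failure asymmetry discussed here has a direct analogue in the C11 API, where compare-exchange takes separate memory orders for the two paths. The sketch below is not the Xen or Linux implementation; it is a portable illustration of the same "strong on success, no barrier on failure" shape, with seq_cst standing in for the fully ordered success path and relaxed for the unordered failure path.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Illustrative cmpxchg-style helper. Following Xen's cmpxchg
 * convention, it returns the value found in memory: equal to 'old' on
 * success, the conflicting value on failure.
 */
static int cmpxchg_sketch(atomic_int *v, int old, int newval)
{
    int expected = old;

    atomic_compare_exchange_strong_explicit(
        v, &expected, newval,
        memory_order_seq_cst,   /* success: fully ordered RMW   */
        memory_order_relaxed);  /* failure: plain load, no barrier */

    /* On failure, 'expected' was updated to the current value. */
    return expected;
}
```

A caller that relies on ordering even when the comparison fails would need its own barrier after a failed call, which is exactly the caller audit suggested above.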
>
> (3) The Linux version just adds acq semantics to the LL, so we're OK
>     here.
>
>
> arm64 LSE (comparison to Xen's LL/SC)
> =====================================
>
> Xen Function       Xen                  Linux                Inconsistent
> ============       ===                  =====                ============
> atomic_add         ---/---/---/---/---  ---/---/---/---/---  ---
> atomic_add_return  ---/---/rel/mem/dmb  ---/acq/rel/mem/---  YES (4)
> atomic_sub         ---/---/---/---/---  ---/---/---/---/---  ---
> atomic_sub_return  ---/---/rel/mem/dmb  ---/acq/rel/mem/---  YES (4)
> atomic_and         ---/---/---/---/---  ---/---/---/---/---  ---
> atomic_cmpxchg     dmb/---/---/---/dmb  ---/acq/rel/mem/---  YES (5)
> atomic_xchg        ---/---/rel/mem/dmb  ---/acq/rel/mem/---  YES (4)
>
> (4) As noted in (1), this is how I would have expected Linux's LL/SC
>     atomics to work too. I don't think this discrepancy will cause
>     any issues.

IIUC, the LSE implementation would be using a single instruction that
has both acquire and release semantics. Therefore, it would act as a
full barrier.

On the other hand, in the LL/SC implementation the acquire and release
semantics come from two different instructions. Therefore, they don't
act as a full barrier.

So I think the memory ordering is going to be equivalent between Xen
and the Linux LSE implementation.

> (5) As with (2) above, this is quite a big discrepancy to Xen.
>     However at least this version has acq semantics unlike the LL/SC
>     version in (2), so I'm more confident that there won't be
>     regressions going from the Xen LL/SC to the Linux LSE version of
>     atomic_cmpxchg().

Are they really different? In both cases, the helper will act as a
full barrier. The main difference is how this ordering is achieved.
>
>
> arm32 LL/SC
> ===========
>
> Xen Function       Xen                  Linux                Inconsistent
> ============       ===                  =====                ============
> atomic_add         ---/---/---/---/---  ---/---/---/---/---  ---
> atomic_add_return  dmb/---/---/---/dmb  XXX/XXX/XXX/XXX/XXX  YES (6)
> atomic_sub         ---/---/---/---/---  ---/---/---/---/---  ---
> atomic_sub_return  dmb/---/---/---/dmb  XXX/XXX/XXX/XXX/XXX  YES (6)
> atomic_and         ---/---/---/---/---  ---/---/---/---/---  ---
> atomic_cmpxchg     dmb/---/---/---/dmb  XXX/XXX/XXX/XXX/XXX  YES (6)
> atomic_xchg        dmb/---/---/---/dmb  XXX/XXX/XXX/XXX/XXX  YES (6)
>
> (6) Linux only provides relaxed variants of these functions, such as
>     atomic_add_return_relaxed() and atomic_xchg_relaxed(). Patches
>     #13 and #14 in the series add the stricter versions expected by
>     Xen, wrapping calls to Linux's relaxed variants in between two
>     calls to smp_mb(). This makes them consistent with Xen's
>     existing helpers, though is quite blunt.

Linux will do the same when the fully ordered version is not
implemented (see include/linux/atomic-fallback.h).

> It is worth noting that Armv8-A AArch32 does support acq/rel
> semantics on exclusive accesses, with LDAEX and STLEX, so I could
> imagine us introducing a new arm32 hwcap to detect whether we're on
> actual Armv7-A hardware or Armv8-A AArch32, then swap to
> lighter-weight STLEX versions of these helpers rather than the
> heavyweight double-DMB versions. Whether that would actually give
> measurable performance improvements is another story!

That's good to know! So far, I haven't heard of anyone using Xen
32-bit on Armv8. I would wait until there is a request before
introducing a 3rd (4th, if counting the LL/SC as 2) implementation of
the atomics helpers.

That said, the 32-bit port is meant to only be supported on Armv7. It
should boot on Armv8, but there is no promise it will be fully
functional or stable.

Overall, it looks to me like re-syncing the atomics implementation
with Linux should not be a major problem.
We are in the middle of the 4.15 freeze, so I will try to go through
the series ASAP.

Cheers,

--
Julien Grall