This patch series adds 128-bit atomic compare-and-exchange support for
LoongArch architecture, which fixes BPF scheduler test failures caused
by missing 128-bit atomics support.
The series consists of three patches:
1. "LoongArch: Add SCQ support detection"
- Check the CPUCFG2_SCQ bit to determine whether the CPU supports
the SC.Q instruction.
2. "LoongArch: Add 128-bit atomic cmpxchg support"
- Implements 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions
- For LoongArch CPUs lacking a 128-bit atomic instruction (e.g.,
the 3A5000, which has no SC.Q instruction), use a spinlock to emulate
the atomic operation.
- Fixes BPF scheduler test failures (scx_central scx_qmap) where
kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
leading to -ENOMEM errors during scheduler initialization
3. "LoongArch: Enable 128-bit atomics cmpxchg support"
- Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
in Kconfig to enable 128-bit atomic cmpxchg support
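To illustrate the spinlock fallback described in patch 2, here is a minimal userspace sketch, assuming C11 atomics. The names `u128_pair`, `emu_lock` and `cmpxchg128_emulated` are hypothetical and are not taken from the patch; the patch itself operates on kernel types and kernel spinlocks.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Two 64-bit halves standing in for one 128-bit value (hypothetical type). */
struct u128_pair {
	uint64_t lo;
	uint64_t hi;
};

/* Hypothetical single global lock guarding every emulated operation. */
static atomic_flag emu_lock = ATOMIC_FLAG_INIT;

/*
 * Emulated 128-bit compare-and-exchange.
 * Returns the value found at *ptr; the store of new_val happens only
 * if that value equals old.
 */
struct u128_pair cmpxchg128_emulated(struct u128_pair *ptr,
				     struct u128_pair old,
				     struct u128_pair new_val)
{
	struct u128_pair cur;

	/* Spin until the lock is acquired. */
	while (atomic_flag_test_and_set_explicit(&emu_lock,
						 memory_order_acquire))
		;

	cur = *ptr;
	if (cur.lo == old.lo && cur.hi == old.hi)
		*ptr = new_val;

	atomic_flag_clear_explicit(&emu_lock, memory_order_release);
	return cur;
}
```

Note that an emulation of this kind is atomic only with respect to other accesses that take the same lock; a plain store to the same memory from elsewhere is not excluded.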
The issue was identified through BPF scheduler test failures where
scx_central and scx_qmap schedulers would fail to initialize. Testing
was performed using the scx_qmap scheduler from tools/sched_ext/,
confirming that the patches resolve the initialization failures.
---
Changes in v8:
- Merge patch 2 and patch 3 into one patch
- Put HAVE_CMPXCHG_DOUBLE in order
- Link to v7: https://lore.kernel.org/all/20251230013417.37393-1-dongtai.guo@linux.dev/
---
Changes in v7:
- Create patches based on loongarch-next branch (previously used master)
- Link to v6: https://lore.kernel.org/r/20251215-2-v6-0-09a486e8df99@linux.dev
Changes in v6:
- Put SCQ information in hwcap
- Link to v5: https://lore.kernel.org/r/20251212-2-v5-0-704b3af55f7d@linux.dev
Changes in v5:
- Reordered the patches
- Link to v4: https://lore.kernel.org/r/20251205-2-v4-0-e5ab932cf219@linux.dev
Changes in v4:
- Add SCQ support detection
- Add spinlock to emulate 128-bit cmpxchg
- Link to v3: https://lore.kernel.org/r/20251126-2-v3-0-851b5a516801@linux.dev
Changes in v3:
- dbar 0 -> __WEAK_LLSC_MB
- "=ZB" (__ptr[0]) -> "r" (__ptr)
- Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
Changes in v2:
- Use a normal ld.d for the high word instead of ll.d to avoid race
condition
- Insert a dbar between ll.d and ld.d to prevent reordering
- Simplify __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
- Fix address operand constraints after testing different approaches:
* ld.d with "m"
* ll.d with "ZC",
* sc.q with "ZB" (alternative constraints caused issues:
- "r" caused system hang
- "ZC" caused compiler error:
{standard input}: Assembler messages:
{standard input}:10037: Fatal error: Immediate overflow.
format: u0:0 )
- Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
George Guo (3):
LoongArch: Add SCQ support detection
LoongArch: Add 128-bit atomic cmpxchg support
LoongArch: Enable 128-bit atomics cmpxchg support
arch/loongarch/Kconfig | 2 +
arch/loongarch/include/asm/cmpxchg.h | 66 +++++++++++++++++++++++
arch/loongarch/include/asm/cpu-features.h | 1 +
arch/loongarch/include/asm/cpu.h | 2 +
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kernel/cpu-probe.c | 2 +
arch/loongarch/kernel/proc.c | 1 +
7 files changed, 75 insertions(+)
--
2.49.0
Hi, George,
On Wed, Dec 31, 2025 at 11:45 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> This patch series adds 128-bit atomic compare-and-exchange support for
> LoongArch architecture, which fixes BPF scheduler test failures caused
> by missing 128-bit atomics support.
>
> The series consists of three patches:
>
> 1. "LoongArch: Add SCQ support detection"
> - Check the CPUCFG2_SCQ bit to determine whether the CPU supports
> the SC.Q instruction.
>
> 2. "LoongArch: Add 128-bit atomic cmpxchg support"
> - Implements 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions
> - For LoongArch CPUs lacking a 128-bit atomic instruction (e.g.,
> the 3A5000, which has no SC.Q instruction), use a spinlock to emulate
> the atomic operation.
> - Fixes BPF scheduler test failures (scx_central scx_qmap) where
> kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
> leading to -ENOMEM errors during scheduler initialization
>
> 3. "LoongArch: Enable 128-bit atomics cmpxchg support"
> - Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
> in Kconfig to enable 128-bit atomic cmpxchg support
>
> The issue was identified through BPF scheduler test failures where
> scx_central and scx_qmap schedulers would fail to initialize. Testing
> was performed using the scx_qmap scheduler from tools/sched_ext/,
> confirming that the patches resolve the initialization failures.
>
> ---
> Changes in v8:
> - Merge patch 2 and patch 3 into one patch
> - Put HAVE_CMPXCHG_DOUBLE in order
> - Link to v7: https://lore.kernel.org/all/20251230013417.37393-1-dongtai.guo@linux.dev/
I don't know why you make all versions in a single thread, and the
version numbers of cover letters are always wrong.
For the code itself:
1. You said you have set hwcaps, but you completely ignore
arch/loongarch/include/uapi/asm/hwcap.h, I don't know why.
2. You can simply do
#define system_has_cmpxchg128() (cpu_has_scq)
and don't need to define __cmpxchg128_locked(), which is the same as
X86 and RISC-V.
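A rough userspace sketch of that idea, where the architecture only reports the capability and the caller picks its own fallback. All `demo_*` names are hypothetical, and the bit position used for `DEMO_CPUCFG2_SCQ` is an assumption for illustration; the real definition lives in arch/loongarch/include/asm/loongarch.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define DEMO_CPUCFG2_SCQ (UINT32_C(1) << 30)

/* Hypothetical stand-in for the probed CPUCFG word 2. */
struct demo_cpu {
	uint32_t cpucfg2;
};

/* Models cpu_has_scq: true when the SCQ capability bit is set. */
bool demo_cpu_has_scq(const struct demo_cpu *cpu)
{
	return (cpu->cpucfg2 & DEMO_CPUCFG2_SCQ) != 0;
}

/* Models system_has_cmpxchg128() being a plain alias of cpu_has_scq. */
#define demo_system_has_cmpxchg128(cpu) (demo_cpu_has_scq(cpu))

enum demo_path {
	DEMO_PATH_CMPXCHG128,	/* lockless 128-bit cmpxchg */
	DEMO_PATH_LOCKED,	/* caller's own locked slow path */
};

/* The caller, not the arch code, decides what to do without SC.Q. */
enum demo_path demo_pick_update_path(const struct demo_cpu *cpu)
{
	return demo_system_has_cmpxchg128(cpu) ? DEMO_PATH_CMPXCHG128
					       : DEMO_PATH_LOCKED;
}
```

With this shape, no arch-level __cmpxchg128_locked() is needed, because code that wants a 128-bit cmpxchg checks the capability first.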
Huacai
>
> ---
> Changes in v7:
> - Create patches based on loongarch-next branch (previously used master)
> - Link to v6: https://lore.kernel.org/r/20251215-2-v6-0-09a486e8df99@linux.dev
>
> Changes in v6:
> - Put SCQ information in hwcap
> - Link to v5: https://lore.kernel.org/r/20251212-2-v5-0-704b3af55f7d@linux.dev
>
> Changes in v5:
> - Reordered the patches
> - Link to v4: https://lore.kernel.org/r/20251205-2-v4-0-e5ab932cf219@linux.dev
>
> Changes in v4:
> - Add SCQ support detection
> - Add spinlock to emulate 128-bit cmpxchg
> - Link to v3: https://lore.kernel.org/r/20251126-2-v3-0-851b5a516801@linux.dev
>
> Changes in v3:
> - dbar 0 -> __WEAK_LLSC_MB
> - "=ZB" (__ptr[0]) -> "r" (__ptr)
> - Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
>
> Changes in v2:
> - Use a normal ld.d for the high word instead of ll.d to avoid race
> condition
> - Insert a dbar between ll.d and ld.d to prevent reordering
> - Simplify __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
> - Fix address operand constraints after testing different approaches:
> * ld.d with "m"
> * ll.d with "ZC",
> * sc.q with "ZB" (alternative constraints caused issues:
> - "r" caused system hang
> - "ZC" caused compiler error:
> {standard input}: Assembler messages:
> {standard input}:10037: Fatal error: Immediate overflow.
> format: u0:0 )
> - Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
>
> George Guo (3):
> LoongArch: Add SCQ support detection
> LoongArch: Add 128-bit atomic cmpxchg support
> LoongArch: Enable 128-bit atomics cmpxchg support
>
> arch/loongarch/Kconfig | 2 +
> arch/loongarch/include/asm/cmpxchg.h | 66 +++++++++++++++++++++++
> arch/loongarch/include/asm/cpu-features.h | 1 +
> arch/loongarch/include/asm/cpu.h | 2 +
> arch/loongarch/include/asm/loongarch.h | 1 +
> arch/loongarch/kernel/cpu-probe.c | 2 +
> arch/loongarch/kernel/proc.c | 1 +
> 7 files changed, 75 insertions(+)
>
> --
> 2.49.0
>