This patch series adds 128-bit atomic compare-and-exchange support for
the LoongArch architecture, which fixes BPF scheduler test failures caused
by missing 128-bit atomics support.

The series consists of four patches:

1. "LoongArch: Add SCQ support detection"
- Check the CPUCFG2_SCQ bit to determine whether the CPU supports
the SC.Q instruction.

2. "LoongArch: Add 128-bit atomic cmpxchg support"
- Implements 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions
- Fixes BPF scheduler test failures (scx_central, scx_qmap) where
kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
leading to -ENOMEM errors during scheduler initialization

3. "LoongArch: Use spinlock to emulate 128-bit cmpxchg"
- For LoongArch CPUs that lack a 128-bit atomic instruction (e.g., the
3A5000, which has no SC.Q), provide a fallback implementation
of __cmpxchg128 that uses a spinlock to emulate the atomic operation.

4. "LoongArch: Enable 128-bit atomics cmpxchg support"
- Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
in Kconfig to enable 128-bit atomic cmpxchg support

The issue was identified through BPF scheduler test failures where
scx_central and scx_qmap schedulers would fail to initialize. Testing
was performed using the scx_qmap scheduler from tools/sched_ext/,
confirming that the patches resolve the initialization failures.
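
As background for the kmalloc_nolock failure described above: a lockless allocator typically needs a 128-bit compare-and-exchange to replace a free-list pointer and a generation counter as one unit, so that a pointer popped and pushed back in between (the ABA problem) is still detected. The following is a minimal, hypothetical sketch of that pattern in plain C, loosely in the spirit of SLUB's freelist/counter pair but not taken from mm/ or from this series. `cmpxchg128_model()` only models the semantics and is deliberately not atomic; all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical {free-list head, generation counter} pair (16 bytes on 64-bit). */
struct freelist_head {
	void *first;       /* head of a singly linked free list */
	uintptr_t counter; /* bumped on every successful update */
};

struct node {
	struct node *next;
};

/*
 * Semantics of a 128-bit compare-and-exchange, written NON-atomically
 * for illustration: both halves must match for the store to happen,
 * and the previous contents are returned either way.
 */
static struct freelist_head cmpxchg128_model(struct freelist_head *p,
					     struct freelist_head old,
					     struct freelist_head new)
{
	struct freelist_head cur = *p;

	if (cur.first == old.first && cur.counter == old.counter)
		*p = new;
	return cur;
}

/* Pop one node with the usual read / compute / cmpxchg / retry loop. */
static struct node *freelist_pop(struct freelist_head *head)
{
	struct freelist_head old, new, seen;

	assert(head != NULL);
	old = *head;
	for (;;) {
		if (old.first == NULL)
			return NULL;
		new.first = ((struct node *)old.first)->next;
		new.counter = old.counter + 1;
		seen = cmpxchg128_model(head, old, new);
		if (seen.first == old.first && seen.counter == old.counter)
			return old.first;
		old = seen; /* lost a race: retry with the value actually seen */
	}
}
```

With a real atomic in place of the model, a concurrent pop-then-push of the same node changes `counter`, so the stale update fails and is retried instead of corrupting the list.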
---
Changes in v7:
- Create patches based on the loongarch-next branch (previously based on master)
- Link to v6: https://lore.kernel.org/r/20251215-2-v6-0-09a486e8df99@linux.dev
Changes in v6:
- Put SCQ information in hwcap
- Link to v5: https://lore.kernel.org/r/20251212-2-v5-0-704b3af55f7d@linux.dev
Changes in v5:
- Reordered the patches
- Link to v4: https://lore.kernel.org/r/20251205-2-v4-0-e5ab932cf219@linux.dev
Changes in v4:
- Add SCQ support detection
- Add spinlock to emulate 128-bit cmpxchg
- Link to v3: https://lore.kernel.org/r/20251126-2-v3-0-851b5a516801@linux.dev
Changes in v3:
- dbar 0 -> __WEAK_LLSC_MB
- "=ZB" (__ptr[0]) -> "r" (__ptr)
- Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
Changes in v2:
- Use a normal ld.d for the high word instead of ll.d to avoid a race
condition
- Insert a dbar between ll.d and ld.d to prevent reordering
- Simplify __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
- Fix address operand constraints after testing different approaches:
* ld.d with "m"
* ll.d with "ZC",
* sc.q with "ZB" (alternative constraints caused issues:
- "r" caused system hang
- "ZC" caused compiler error:
{standard input}: Assembler messages:
{standard input}:10037: Fatal error: Immediate overflow.
format: u0:0 )
- Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
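
The v2 notes above split the 128-bit load into the low word (read with ll.d) and the high word (read with a plain ld.d). As a host-side illustration only, assuming a little-endian target (as LoongArch is) and a GCC/Clang-style `unsigned __int128`, a union shaped like the series' `__u128_halves` maps `low` to bits 0..63 at the lower address and `high` to bits 64..127. `make_u128()` and `split_u128()` are hypothetical helpers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Same shape as the series' union __u128_halves. */
union u128_halves {
	u128 full;
	struct {
		uint64_t low;  /* bits 0..63: the word ll.d reads (ptr[0]) */
		uint64_t high; /* bits 64..127: the word ld.d reads (ptr[1]) */
	};
};

/* Combine two 64-bit words into one 128-bit value. */
static u128 make_u128(uint64_t high, uint64_t low)
{
	return ((u128)high << 64) | low;
}

/* Split v through the union; this mapping holds on little-endian hosts. */
static void split_u128(u128 v, uint64_t *high, uint64_t *low)
{
	union u128_halves h = { .full = v };

	assert(high != NULL && low != NULL);
	*high = h.high;
	*low = h.low;
}
```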
George Guo (4):
LoongArch: Add SCQ support detection
LoongArch: Add 128-bit atomic cmpxchg support
LoongArch: Use spinlock to emulate 128-bit cmpxchg
LoongArch: Enable 128-bit atomics cmpxchg support
arch/loongarch/Kconfig | 2 +
arch/loongarch/include/asm/cmpxchg.h | 66 +++++++++++++++++++++++
arch/loongarch/include/asm/cpu-features.h | 1 +
arch/loongarch/include/asm/cpu.h | 2 +
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kernel/cpu-probe.c | 2 +
arch/loongarch/kernel/proc.c | 1 +
7 files changed, 75 insertions(+)
--
2.49.0
On Tue, Dec 30, 2025 at 9:34 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> This patch series adds 128-bit atomic compare-and-exchange support for
> the LoongArch architecture, which fixes BPF scheduler test failures caused
> by missing 128-bit atomics support.
>
> The series consists of four patches:
>
> 1. "LoongArch: Add SCQ support detection"
> - Check the CPUCFG2_SCQ bit to determine whether the CPU supports
> the SC.Q instruction.
>
> 2. "LoongArch: Add 128-bit atomic cmpxchg support"
> - Implements 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions
> - Fixes BPF scheduler test failures (scx_central, scx_qmap) where
> kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
> leading to -ENOMEM errors during scheduler initialization
>
> 3. "LoongArch: Use spinlock to emulate 128-bit cmpxchg"
> - For LoongArch CPUs that lack a 128-bit atomic instruction (e.g., the
> 3A5000, which has no SC.Q), provide a fallback implementation
> of __cmpxchg128 that uses a spinlock to emulate the atomic operation.
>
> 4. "LoongArch: Enable 128-bit atomics cmpxchg support"
> - Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
> in Kconfig to enable 128-bit atomic cmpxchg support
>
> The issue was identified through BPF scheduler test failures where
> scx_central and scx_qmap schedulers would fail to initialize. Testing
> was performed using the scx_qmap scheduler from tools/sched_ext/,
> confirming that the patches resolve the initialization failures.
>
Testing looks good; this series fixes the BPF timer issues.
For the series:
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
This patch series adds 128-bit atomic compare-and-exchange support for
the LoongArch architecture, which fixes BPF scheduler test failures caused
by missing 128-bit atomics support.

The series consists of three patches:

1. "LoongArch: Add SCQ support detection"
- Check the CPUCFG2_SCQ bit to determine whether the CPU supports
the SC.Q instruction.

2. "LoongArch: Add 128-bit atomic cmpxchg support"
- Implements 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions
- For LoongArch CPUs that lack a 128-bit atomic instruction (e.g., the
3A5000, which has no SC.Q), use a spinlock to emulate
the atomic operation.
- Fixes BPF scheduler test failures (scx_central, scx_qmap) where
kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
leading to -ENOMEM errors during scheduler initialization

3. "LoongArch: Enable 128-bit atomics cmpxchg support"
- Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
in Kconfig to enable 128-bit atomic cmpxchg support

The issue was identified through BPF scheduler test failures where
scx_central and scx_qmap schedulers would fail to initialize. Testing
was performed using the scx_qmap scheduler from tools/sched_ext/,
confirming that the patches resolve the initialization failures.
---
Changes in v8:
- Merge patch 2 and patch 3 into one patch
- Put HAVE_CMPXCHG_DOUBLE in order
- Link to v7: https://lore.kernel.org/all/20251230013417.37393-1-dongtai.guo@linux.dev/
---
Changes in v7:
- Create patches based on the loongarch-next branch (previously based on master)
- Link to v6: https://lore.kernel.org/r/20251215-2-v6-0-09a486e8df99@linux.dev
Changes in v6:
- Put SCQ information in hwcap
- Link to v5: https://lore.kernel.org/r/20251212-2-v5-0-704b3af55f7d@linux.dev
Changes in v5:
- Reordered the patches
- Link to v4: https://lore.kernel.org/r/20251205-2-v4-0-e5ab932cf219@linux.dev
Changes in v4:
- Add SCQ support detection
- Add spinlock to emulate 128-bit cmpxchg
- Link to v3: https://lore.kernel.org/r/20251126-2-v3-0-851b5a516801@linux.dev
Changes in v3:
- dbar 0 -> __WEAK_LLSC_MB
- "=ZB" (__ptr[0]) -> "r" (__ptr)
- Link to v2: https://lore.kernel.org/r/20251124-2-v2-0-b38216e25fd9@linux.dev
Changes in v2:
- Use a normal ld.d for the high word instead of ll.d to avoid a race
condition
- Insert a dbar between ll.d and ld.d to prevent reordering
- Simplify __cmpxchg128_asm("ll.d", "sc.q", ptr, o, n) to __cmpxchg128_asm(ptr, o, n)
- Fix address operand constraints after testing different approaches:
* ld.d with "m"
* ll.d with "ZC",
* sc.q with "ZB" (alternative constraints caused issues:
- "r" caused system hang
- "ZC" caused compiler error:
{standard input}: Assembler messages:
{standard input}:10037: Fatal error: Immediate overflow.
format: u0:0 )
- Link to v1: https://lore.kernel.org/r/20251120-2-v1-0-705bdc440550@linux.dev
George Guo (3):
LoongArch: Add SCQ support detection
LoongArch: Add 128-bit atomic cmpxchg support
LoongArch: Enable 128-bit atomics cmpxchg support
arch/loongarch/Kconfig | 2 +
arch/loongarch/include/asm/cmpxchg.h | 66 +++++++++++++++++++++++
arch/loongarch/include/asm/cpu-features.h | 1 +
arch/loongarch/include/asm/cpu.h | 2 +
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kernel/cpu-probe.c | 2 +
arch/loongarch/kernel/proc.c | 1 +
7 files changed, 75 insertions(+)
--
2.49.0
Hi, George,
On Wed, Dec 31, 2025 at 11:45 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> This patch series adds 128-bit atomic compare-and-exchange support for
> the LoongArch architecture, which fixes BPF scheduler test failures caused
> by missing 128-bit atomics support.
>
> The series consists of three patches:
>
> 1. "LoongArch: Add SCQ support detection"
> - Check the CPUCFG2_SCQ bit to determine whether the CPU supports
> the SC.Q instruction.
>
> 2. "LoongArch: Add 128-bit atomic cmpxchg support"
> - Implements 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions
> - For LoongArch CPUs that lack a 128-bit atomic instruction (e.g., the
> 3A5000, which has no SC.Q), use a spinlock to emulate
> the atomic operation.
> - Fixes BPF scheduler test failures (scx_central, scx_qmap) where
> kmalloc_nolock_noprof returns NULL due to missing 128-bit atomics,
> leading to -ENOMEM errors during scheduler initialization
>
> 3. "LoongArch: Enable 128-bit atomics cmpxchg support"
> - Adds select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE
> in Kconfig to enable 128-bit atomic cmpxchg support
>
> The issue was identified through BPF scheduler test failures where
> scx_central and scx_qmap schedulers would fail to initialize. Testing
> was performed using the scx_qmap scheduler from tools/sched_ext/,
> confirming that the patches resolve the initialization failures.
>
> ---
> Changes in v8:
> - Merge patch 2 and patch 3 into one patch
> - Put HAVE_CMPXCHG_DOUBLE in order
> - Link to v7: https://lore.kernel.org/all/20251230013417.37393-1-dongtai.guo@linux.dev/
I don't know why you put all versions in a single thread, and the
version numbers in the cover letters are always wrong.

For the code itself:
1. You said you have set hwcaps, but you completely ignore
arch/loongarch/include/uapi/asm/hwcap.h; I don't know why.
2. You can simply do
#define system_has_cmpxchg128() (cpu_has_scq)
and then you don't need to define __cmpxchg128_locked(); this is the
same as what x86 and RISC-V do.
Huacai
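
The suggestion above means callers are expected to test `system_has_cmpxchg128()` at run time and take another path when it is false, instead of the architecture emulating the operation. A minimal, hypothetical userspace sketch of that caller-side contract follows; `cpu_has_scq` is modeled as a plain variable, and `alloc_nolock_sketch()` and its fast-path callback are invented names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the probed feature; the kernel derives it from CPUCFG2. */
static bool cpu_has_scq;

/* The definition proposed in review. */
#define system_has_cmpxchg128() (cpu_has_scq)

/*
 * Hypothetical lockless path: it may use cmpxchg128 only when the
 * hardware provides it, and otherwise reports failure so the caller
 * can fall back (or fail) instead of relying on an emulated atomic.
 */
static void *alloc_nolock_sketch(void *(*fast_path)(void))
{
	assert(fast_path != NULL);
	if (!system_has_cmpxchg128())
		return NULL;
	return fast_path();
}

/* Example fast path used to exercise the sketch. */
static int example_object;

static void *example_fast_path(void)
{
	return &example_object;
}
```

Under this scheme, a CPU without SC.Q (such as the 3A5000 mentioned in the cover letter) would simply keep whatever non-128-bit behavior the caller already has.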
From: George Guo <guodongtai@kylinos.cn>
Check the CPUCFG2_SCQ bit to determine whether the CPU supports
the SC.Q instruction.
Co-developed-by: Yangyang Lian <lianyangyang@kylinos.cn>
Signed-off-by: Yangyang Lian <lianyangyang@kylinos.cn>
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/include/asm/cpu-features.h | 1 +
arch/loongarch/include/asm/cpu.h | 2 ++
arch/loongarch/include/asm/loongarch.h | 1 +
arch/loongarch/kernel/cpu-probe.c | 2 ++
arch/loongarch/kernel/proc.c | 1 +
5 files changed, 7 insertions(+)
diff --git a/arch/loongarch/include/asm/cpu-features.h b/arch/loongarch/include/asm/cpu-features.h
index 3745d991a99a..39c7fe64c3ef 100644
--- a/arch/loongarch/include/asm/cpu-features.h
+++ b/arch/loongarch/include/asm/cpu-features.h
@@ -67,5 +67,6 @@
#define cpu_has_msgint cpu_opt(LOONGARCH_CPU_MSGINT)
#define cpu_has_avecint cpu_opt(LOONGARCH_CPU_AVECINT)
#define cpu_has_redirectint cpu_opt(LOONGARCH_CPU_REDIRECTINT)
+#define cpu_has_scq cpu_opt(LOONGARCH_CPU_SCQ)

#endif /* __ASM_CPU_FEATURES_H */
diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h
index f3efb00b6141..5531039027ec 100644
--- a/arch/loongarch/include/asm/cpu.h
+++ b/arch/loongarch/include/asm/cpu.h
@@ -125,6 +125,7 @@ static inline char *id_to_core_name(unsigned int id)
#define CPU_FEATURE_MSGINT 29 /* CPU has MSG interrupt */
#define CPU_FEATURE_AVECINT 30 /* CPU has AVEC interrupt */
#define CPU_FEATURE_REDIRECTINT 31 /* CPU has interrupt remapping */
+#define CPU_FEATURE_SCQ 32 /* CPU has SC.Q instruction */

#define LOONGARCH_CPU_CPUCFG BIT_ULL(CPU_FEATURE_CPUCFG)
#define LOONGARCH_CPU_LAM BIT_ULL(CPU_FEATURE_LAM)
@@ -158,5 +159,6 @@ static inline char *id_to_core_name(unsigned int id)
#define LOONGARCH_CPU_MSGINT BIT_ULL(CPU_FEATURE_MSGINT)
#define LOONGARCH_CPU_AVECINT BIT_ULL(CPU_FEATURE_AVECINT)
#define LOONGARCH_CPU_REDIRECTINT BIT_ULL(CPU_FEATURE_REDIRECTINT)
+#define LOONGARCH_CPU_SCQ BIT_ULL(CPU_FEATURE_SCQ)

#endif /* _ASM_CPU_H */
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index e6b8ff61c8cc..817cd90941d9 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -94,6 +94,7 @@
#define CPUCFG2_LSPW BIT(21)
#define CPUCFG2_LAM BIT(22)
#define CPUCFG2_PTW BIT(24)
+#define CPUCFG2_SCQ BIT(30)

#define LOONGARCH_CPUCFG3 0x3
#define CPUCFG3_CCDMA BIT(0)
diff --git a/arch/loongarch/kernel/cpu-probe.c b/arch/loongarch/kernel/cpu-probe.c
index 08a227034042..382c472c6bfe 100644
--- a/arch/loongarch/kernel/cpu-probe.c
+++ b/arch/loongarch/kernel/cpu-probe.c
@@ -205,6 +205,8 @@ static void cpu_probe_common(struct cpuinfo_loongarch *c)
c->options |= LOONGARCH_CPU_PTW;
elf_hwcap |= HWCAP_LOONGARCH_PTW;
}
+ if (config & CPUCFG2_SCQ)
+ c->options |= LOONGARCH_CPU_SCQ;
if (config & CPUCFG2_LSPW) {
c->options |= LOONGARCH_CPU_LSPW;
elf_hwcap |= HWCAP_LOONGARCH_LSPW;
diff --git a/arch/loongarch/kernel/proc.c b/arch/loongarch/kernel/proc.c
index a8800d20e11b..252fa1d03b85 100644
--- a/arch/loongarch/kernel/proc.c
+++ b/arch/loongarch/kernel/proc.c
@@ -75,6 +75,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
if (cpu_has_lbt_x86) seq_printf(m, " lbt_x86");
if (cpu_has_lbt_arm) seq_printf(m, " lbt_arm");
if (cpu_has_lbt_mips) seq_printf(m, " lbt_mips");
+ if (cpu_has_scq) seq_printf(m, " scq");
seq_printf(m, "\n");

seq_printf(m, "Hardware Watchpoint\t: %s", str_yes_no(cpu_has_watch));
--
2.49.0
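
The probe logic in the patch above reduces to testing bit 30 of CPUCFG word 2 and recording feature number 32, which only fits in a 64-bit options mask (hence BIT_ULL). Below is a small host-side sketch with the macro values copied from the patch; the CPUCFG2 word is passed in as a plain argument (on hardware it comes from the cpucfg instruction), and `probe_scq()` is a hypothetical name. Note that, unlike the neighboring PTW and LSPW cases, the hunk does not set any elf_hwcap bit, which appears related to the review comment about uapi/asm/hwcap.h.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copies of the kernel helpers and of the values in the patch. */
#define BIT(n)            (1UL << (n))
#define BIT_ULL(n)        (1ULL << (n))
#define CPUCFG2_SCQ       BIT(30)
#define CPU_FEATURE_SCQ   32
#define LOONGARCH_CPU_SCQ BIT_ULL(CPU_FEATURE_SCQ)

/* Bit 32 does not fit in a 32-bit mask, so options must be 64-bit. */
static_assert(LOONGARCH_CPU_SCQ > UINT32_MAX, "needs a 64-bit options mask");

/* Hypothetical helper mirroring the cpu_probe_common() hunk. */
static uint64_t probe_scq(uint32_t cpucfg2, uint64_t options)
{
	if (cpucfg2 & CPUCFG2_SCQ)
		options |= LOONGARCH_CPU_SCQ;
	return options;
}
```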
On Wed, Dec 31, 2025 at 11:45 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> From: George Guo <guodongtai@kylinos.cn>
>
> Check the CPUCFG2_SCQ bit to determine whether the CPU supports
> the SC.Q instruction.
>
> Co-developed-by: Yangyang Lian <lianyangyang@kylinos.cn>
> Signed-off-by: Yangyang Lian <lianyangyang@kylinos.cn>
> Signed-off-by: George Guo <guodongtai@kylinos.cn>
> ---
There is a conflict with the latest loongarch-next branch. Other than that:
Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com>
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
From: George Guo <guodongtai@kylinos.cn>
Implement 128-bit atomic compare-and-exchange using LoongArch's
LL.D/SC.Q instructions.

For LoongArch CPUs that lack a 128-bit atomic instruction (e.g., the
3A5000, which has no SC.Q), use a spinlock to emulate
the atomic operation.

This also fixes BPF scheduler test failures (scx_central, scx_qmap)
caused by kmalloc_nolock_noprof returning NULL due to missing
128-bit atomics. The NULL returns led to -ENOMEM errors during
scheduler initialization, causing the test cases to fail.

Verified by testing with the scx_qmap scheduler (in tools/sched_ext/):
build with `make` and run ./tools/sched_ext/build/bin/scx_qmap.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=5fb750e8a9ae
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/include/asm/cmpxchg.h | 66 ++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index 0494c2ab553e..ef793bcb7b25 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -8,6 +8,7 @@
#include <linux/bits.h>
#include <linux/build_bug.h>
#include <asm/barrier.h>
+#include <asm/cpu-features.h>

#define __xchg_amo_asm(amswap_db, m, val) \
({ \
@@ -137,6 +138,61 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
__ret; \
})

+union __u128_halves {
+ u128 full;
+ struct {
+ u64 low;
+ u64 high;
+ };
+};
+
+#define __cmpxchg128_asm(ptr, old, new) \
+({ \
+ union __u128_halves __old, __new, __ret; \
+ volatile u64 *__ptr = (volatile u64 *)(ptr); \
+ \
+ __old.full = (old); \
+ __new.full = (new); \
+ \
+ __asm__ __volatile__( \
+ "1: ll.d %0, %3 # 128-bit cmpxchg low \n" \
+ __WEAK_LLSC_MB \
+ " ld.d %1, %4 # 128-bit cmpxchg high \n" \
+ " bne %0, %z5, 2f \n" \
+ " bne %1, %z6, 2f \n" \
+ " move $t0, %z7 \n" \
+ " move $t1, %z8 \n" \
+ " sc.q $t0, $t1, %2 \n" \
+ " beqz $t0, 1b \n" \
+ "2: \n" \
+ __WEAK_LLSC_MB \
+ : "=&r" (__ret.low), "=&r" (__ret.high) \
+ : "r" (__ptr), \
+ "ZC" (__ptr[0]), "m" (__ptr[1]), \
+ "Jr" (__old.low), "Jr" (__old.high), \
+ "Jr" (__new.low), "Jr" (__new.high) \
+ : "t0", "t1", "memory"); \
+ \
+ __ret.full; \
+})
+
+#define __cmpxchg128_locked(ptr, old, new) \
+({ \
+ u128 __ret; \
+ static DEFINE_SPINLOCK(lock); \
+ unsigned long flags; \
+ \
+ spin_lock_irqsave(&lock, flags); \
+ \
+ __ret = *(volatile u128 *)(ptr); \
+ if (__ret == (old)) \
+ *(volatile u128 *)(ptr) = (new); \
+ \
+ spin_unlock_irqrestore(&lock, flags); \
+ \
+ __ret; \
+})
+
static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
unsigned int new, unsigned int size)
{
@@ -224,6 +280,16 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, unsigned int
__res; \
})

+/* cmpxchg128 */
+#define system_has_cmpxchg128() 1
+
+#define arch_cmpxchg128(ptr, o, n) \
+({ \
+ BUILD_BUG_ON(sizeof(*(ptr)) != 16); \
+ cpu_has_scq ? __cmpxchg128_asm(ptr, o, n) : \
+ __cmpxchg128_locked(ptr, o, n); \
+})
+
#ifdef CONFIG_64BIT
#define arch_cmpxchg64_local(ptr, o, n) \
({ \
--
2.49.0
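
One point that may be worth double-checking in `__cmpxchg128_locked()` above: a `static DEFINE_SPINLOCK(lock)` declared inside a statement-expression macro gives every expansion site its own lock object, so two different call sites operating on the same 128-bit word would not exclude each other. A common alternative is a shared set of locks selected by address, similar in spirit to the hashed locks in lib/atomic64.c. The following is a userspace sketch of that idea only: C11 atomics stand in for kernel spinlocks, interrupts are not handled, and `NR_CMPXCHG128_LOCKS` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef unsigned __int128 u128;

#define NR_CMPXCHG128_LOCKS 16 /* hypothetical; must be a power of two */

/* Static atomics are zero-initialized, i.e. every lock starts unlocked. */
static atomic_bool cmpxchg128_locks[NR_CMPXCHG128_LOCKS];

/* Pick a lock from the object's address (16-byte granules). */
static atomic_bool *cmpxchg128_lock_for(const volatile void *ptr)
{
	uintptr_t idx = ((uintptr_t)ptr >> 4) & (NR_CMPXCHG128_LOCKS - 1);

	return &cmpxchg128_locks[idx];
}

/*
 * Emulated 128-bit cmpxchg: every caller touching the same address
 * takes the same lock, whichever call site it comes from. A kernel
 * version would also need to disable interrupts while holding it.
 */
static u128 cmpxchg128_hashed(volatile u128 *ptr, u128 old, u128 new)
{
	atomic_bool *lock = cmpxchg128_lock_for(ptr);
	u128 ret;

	assert(((uintptr_t)ptr & 15) == 0); /* expect natural alignment */

	while (atomic_exchange_explicit(lock, true, memory_order_acquire))
		; /* spin until the current holder releases the lock */
	ret = *ptr;
	if (ret == old)
		*ptr = new;
	atomic_store_explicit(lock, false, memory_order_release);

	return ret;
}
```

Even with a shared lock, emulation only serializes callers that go through it; any plain 128-bit access elsewhere bypasses the lock, which is one argument for the review earlier in this thread suggesting that no locked fallback be provided at all.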
On Wed, Dec 31, 2025 at 11:45 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> From: George Guo <guodongtai@kylinos.cn>
>
> Implement 128-bit atomic compare-and-exchange using LoongArch's
> LL.D/SC.Q instructions.
>
> For LoongArch CPUs that lack a 128-bit atomic instruction (e.g., the
> 3A5000, which has no SC.Q), use a spinlock to emulate
> the atomic operation.
>
> This also fixes BPF scheduler test failures (scx_central, scx_qmap)
> caused by kmalloc_nolock_noprof returning NULL due to missing
> 128-bit atomics. The NULL returns led to -ENOMEM errors during
> scheduler initialization, causing the test cases to fail.
>
> Verified by testing with the scx_qmap scheduler (in tools/sched_ext/):
> build with `make` and run ./tools/sched_ext/build/bin/scx_qmap.
>
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=5fb750e8a9ae
> Signed-off-by: George Guo <guodongtai@kylinos.cn>
> ---
Acked-by: Hengqi Chen <hengqi.chen@gmail.com>
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
From: George Guo <guodongtai@kylinos.cn>
Add select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE in Kconfig
to enable 128-bit atomic cmpxchg support on LoongArch.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 730f34214519..f9845ebec1a4 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -114,6 +114,7 @@ config LOONGARCH
select GENERIC_TIME_VSYSCALL
select GPIOLIB
select HAS_IOPORT
+ select HAVE_ALIGNED_STRUCT_PAGE
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_BITREVERSE
select HAVE_ARCH_JUMP_LABEL
@@ -130,6 +131,7 @@ config LOONGARCH
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
select HAVE_ASM_MODVERSIONS
+ select HAVE_CMPXCHG_DOUBLE
select HAVE_CONTEXT_TRACKING_USER
select HAVE_C_RECORDMCOUNT
select HAVE_DEBUG_KMEMLEAK
--
2.49.0
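
Roughly speaking, selecting HAVE_ALIGNED_STRUCT_PAGE makes struct page aligned to 2 * sizeof(unsigned long), so that a double-word pair inside it can be updated with a single 128-bit cmpxchg; double-word atomics generally require natural (here 16-byte) alignment. The hypothetical struct below only illustrates that layout constraint in userspace C. Its field names are invented and it is not the real struct page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DWORD_BYTES (2 * sizeof(unsigned long)) /* 16 on 64-bit */

/* Hypothetical page-like object containing a {freelist, counters} pair. */
struct page_like {
	unsigned long flags;
	unsigned long other;    /* keeps the pair on a double-word boundary */
	void *freelist;         /* updated together with counters ... */
	unsigned long counters; /* ... by one double-word cmpxchg */
} __attribute__((aligned(DWORD_BYTES)));

static_assert(_Alignof(struct page_like) == DWORD_BYTES,
	      "object must be double-word aligned");
static_assert(offsetof(struct page_like, freelist) % DWORD_BYTES == 0,
	      "pair must start on a double-word boundary");

/* True if p is suitably aligned for a double-word atomic. */
static int dword_aligned(const void *p)
{
	return ((uintptr_t)p % DWORD_BYTES) == 0;
}
```

Because the whole struct is double-word aligned and its size is a multiple of that alignment, the pair stays aligned in every element of an array, not just the first.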
On Wed, Dec 31, 2025 at 11:45 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> From: George Guo <guodongtai@kylinos.cn>
>
> Add select HAVE_CMPXCHG_DOUBLE and select HAVE_ALIGNED_STRUCT_PAGE in Kconfig
> to enable 128-bit atomic cmpxchg support on LoongArch.
>
Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com>
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>