When software issues a Cache Maintenance Operation (CMO) targeting a
dirty cache line, the CPU and DSU cluster may optimize the operation by
combining the CopyBack Write and CMO into a single combined CopyBack
Write plus CMO transaction presented to the interconnect (MCN).
For these combined transactions, the MCN splits the operation into two
separate transactions, one Write and one CMO, and then propagates the
write and optionally the CMO to the downstream memory system or external
Point of Serialization (PoS).
However, the MCN may return an early CompCMO response to the DSU cluster
before the corresponding Write and CMO transactions have completed at
the external PoS or downstream memory. As a result, stale data may be
observed by external observers that are directly connected to the
external PoS or downstream memory.
This erratum affects any system topology in which the following
conditions apply:
- The Point of Serialization (PoS) is located downstream of the
interconnect.
- A downstream observer accesses memory directly, bypassing the
interconnect.
Conditions:
This erratum occurs only when all of the following conditions are met:
1. Software executes a data cache maintenance operation, specifically,
a clean or clean&invalidate by virtual address (DC CVAC or DC
CIVAC), that hits on unique dirty data in the CPU or DSU cache.
This results in a combined CopyBack and CMO being issued to the
interconnect.
2. The interconnect splits the combined transaction into separate Write
and CMO transactions and returns an early completion response to the
CPU or DSU before the write has completed at the downstream memory
or PoS.
3. A downstream observer accesses the affected memory address after the
early completion response is issued but before the actual memory
write has completed. This allows the observer to read stale data
that has not yet been updated at the PoS or downstream memory.
The workaround adds a second loop of CMOs over the same virtual
addresses whose operations meet the erratum conditions, waiting until
the cache data has been cleaned to the PoC. This approach incurs a
smaller performance penalty than simply duplicating each original CMO.
Cc: stable@vger.kernel.org # 6.12.x
Signed-off-by: Lucas Wei <lucaswei@google.com>
---
Changes in v3:
1. Fix typos
2. Remove 'lkp@intel.com' from commit message
3. Keep ARM within a single section
4. Remove workaround of #4311569 from `cache_inval_poc()`
Changes in v2:
1. Fixed warning from kernel test robot by changing
arm_si_l1_workaround_4311569 to static
[Reported-by: kernel test robot <lkp@intel.com>]
---
Documentation/arch/arm64/silicon-errata.rst | 1 +
arch/arm64/Kconfig | 19 +++++++++++++
arch/arm64/include/asm/assembler.h | 10 +++++++
arch/arm64/kernel/cpu_errata.c | 31 +++++++++++++++++++++
arch/arm64/tools/cpucaps | 1 +
5 files changed, 62 insertions(+)
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index a7ec57060f64..4c300caad901 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -212,6 +212,7 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | GIC-700 | #2941627 | ARM64_ERRATUM_2941627 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | SI L1 | #4311569 | ARM64_ERRATUM_4311569 |
+----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 |
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 93173f0a09c7..89326bb26f48 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1155,6 +1155,25 @@ config ARM64_ERRATUM_3194386
If unsure, say Y.
+config ARM64_ERRATUM_4311569
+ bool "SI L1: 4311569: workaround for premature CMO completion erratum"
+ default y
+ help
+ This option adds the workaround for ARM SI L1 erratum 4311569.
+
+ Due to this erratum, SI L1 can return an early response to a combined
+ write and cache maintenance operation (WR+CMO) before the operation has
+ fully completed at the Point of Serialization (PoS).
+ This can result in a non-I/O-coherent agent observing stale data,
+ potentially leading to system instability or incorrect behavior.
+
+ Enabling this option implements a software workaround by running a
+ second loop of Cache Maintenance Operations (CMOs) immediately after the
+ loop that performs the original CMOs. This ensures that the data is
+ correctly serialized before the buffer is handed off to a non-coherent agent.
+
+ If unsure, say Y.
+
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index f0ca7196f6fa..d3d46e5f7188 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -381,6 +381,9 @@ alternative_endif
.macro dcache_by_myline_op op, domain, start, end, linesz, tmp, fixup
sub \tmp, \linesz, #1
bic \start, \start, \tmp
+alternative_if ARM64_WORKAROUND_4311569
+ mov \tmp, \start
+alternative_else_nop_endif
.Ldcache_op\@:
.ifc \op, cvau
__dcache_op_workaround_clean_cache \op, \start
@@ -402,6 +405,13 @@ alternative_endif
add \start, \start, \linesz
cmp \start, \end
b.lo .Ldcache_op\@
+alternative_if ARM64_WORKAROUND_4311569
+ .ifnc \op, cvau
+ mov \start, \tmp
+ mov \tmp, xzr
+ cbnz \start, .Ldcache_op\@
+ .endif
+alternative_else_nop_endif
dsb \domain
_cond_uaccess_extable .Ldcache_op\@, \fixup
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 8cb3b575a031..5c0ab6bfd44a 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -141,6 +141,30 @@ has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
return (ctr_real != sys) && (ctr_raw != sys);
}
+#ifdef CONFIG_ARM64_ERRATUM_4311569
+static DEFINE_STATIC_KEY_FALSE(arm_si_l1_workaround_4311569);
+static int __init early_arm_si_l1_workaround_4311569_cfg(char *arg)
+{
+ static_branch_enable(&arm_si_l1_workaround_4311569);
+ pr_info("Enabling cache maintenance workaround for ARM SI-L1 erratum 4311569\n");
+
+ return 0;
+}
+early_param("arm_si_l1_workaround_4311569", early_arm_si_l1_workaround_4311569_cfg);
+
+/*
+ * Some cache maintenance functions, for example dcache_inval_poc() and
+ * dcache_clean_poc() in head.S, are called before the decision to turn on this
+ * workaround is made. Since the scope of this workaround is limited to
+ * non-coherent DMA agents, it's safe to have the workaround off by default.
+ */
+static bool
+need_arm_si_l1_workaround_4311569(const struct arm64_cpu_capabilities *entry, int scope)
+{
+ return static_branch_unlikely(&arm_si_l1_workaround_4311569);
+}
+#endif
+
static void
cpu_enable_trap_ctr_access(const struct arm64_cpu_capabilities *cap)
{
@@ -870,6 +894,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
ERRATA_MIDR_RANGE_LIST(erratum_spec_ssbs_list),
},
#endif
+#ifdef CONFIG_ARM64_ERRATUM_4311569
+ {
+ .capability = ARM64_WORKAROUND_4311569,
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .matches = need_arm_si_l1_workaround_4311569,
+ },
+#endif
#ifdef CONFIG_ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD
{
.desc = "ARM errata 2966298, 3117295",
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 0fac75f01534..856b6cf6e71e 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -103,6 +103,7 @@ WORKAROUND_2077057
WORKAROUND_2457168
WORKAROUND_2645198
WORKAROUND_2658417
+WORKAROUND_4311569
WORKAROUND_AMPERE_AC03_CPU_38
WORKAROUND_AMPERE_AC04_CPU_23
WORKAROUND_TRBE_OVERWRITE_FILL_MODE
base-commit: 0f61b1860cc3f52aef9036d7235ed1f017632193
--
2.52.0.457.g6b5491de43-goog
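The control flow that the assembler.h hunk adds to dcache_by_myline_op can be hard to follow in macro form, so here is a rough C model of it. This is only a sketch, not kernel code: dcache_by_line_model(), cache_op_line() and cmo_count are hypothetical names invented for illustration, and the per-line operation is replaced by a counter so the number of CMOs issued can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one "dc <op>, Xt" instruction: it only counts calls. */
static unsigned long cmo_count;

static void cache_op_line(uintptr_t addr)
{
	(void)addr;
	cmo_count++;
}

/*
 * Illustrative model of the patched loop. "workaround" models the
 * ARM64_WORKAROUND_4311569 alternative being patched in, and "is_cvau"
 * models the ".ifnc \op, cvau" test that skips the second pass.
 */
static void dcache_by_line_model(uintptr_t start, uintptr_t end,
				 uintptr_t linesz, bool workaround,
				 bool is_cvau)
{
	uintptr_t saved;

	/* The line size must be a non-zero power of two. */
	assert(linesz != 0 && (linesz & (linesz - 1)) == 0);

	start &= ~(linesz - 1);		/* bic \start, \start, \tmp */
	saved = workaround ? start : 0;	/* mov \tmp, \start         */

again:
	do {
		cache_op_line(start);	/* the CMO itself           */
		start += linesz;	/* add \start, \start, \linesz */
	} while (start < end);		/* cmp + b.lo               */

	if (workaround && !is_cvau) {
		start = saved;		/* mov \start, \tmp         */
		saved = 0;		/* mov \tmp, xzr            */
		if (start != 0)		/* cbnz \start, loop        */
			goto again;
	}
	/* The real macro issues "dsb \domain" at this point. */
}
```

In this model the saved start address is cleared before the second pass, so the range is walked exactly twice and the loop then falls through to the barrier. A range whose aligned start is address 0 would get only one pass, which follows from the cbnz test in the hunk.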
On Wed, 14 Jan 2026 14:52:41 +0000, Lucas Wei wrote:
> When software issues a Cache Maintenance Operation (CMO) targeting a
> dirty cache line, the CPU and DSU cluster may optimize the operation by
> combining the CopyBack Write and CMO into a single combined CopyBack
> Write plus CMO transaction presented to the interconnect (MCN).
> For these combined transactions, the MCN splits the operation into two
> separate transactions, one Write and one CMO, and then propagates the
> write and optionally the CMO to the downstream memory system or external
> Point of Serialization (PoS).
> However, the MCN may return an early CompCMO response to the DSU cluster
> before the corresponding Write and CMO transactions have completed at
> the external PoS or downstream memory. As a result, stale data may be
> observed by external observers that are directly connected to the
> external PoS or downstream memory.
>
> [...]
Applied to arm64 (for-next/errata), thanks!
[1/1] arm64: errata: Workaround for SI L1 downstream coherency issue
https://git.kernel.org/arm64/c/3fed7e0059f0
Cheers,
--
Will
https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
Hi,
I have a few comments/questions, please.
On 1/14/26 6:52 AM, Lucas Wei wrote:
> When software issues a Cache Maintenance Operation (CMO) targeting a
> dirty cache line, the CPU and DSU cluster may optimize the operation by
> combining the CopyBack Write and CMO into a single combined CopyBack
> Write plus CMO transaction presented to the interconnect (MCN).
> For these combined transactions, the MCN splits the operation into two
> separate transactions, one Write and one CMO, and then propagates the
> write and optionally the CMO to the downstream memory system or external
> Point of Serialization (PoS).
> However, the MCN may return an early CompCMO response to the DSU cluster
> before the corresponding Write and CMO transactions have completed at
> the external PoS or downstream memory. As a result, stale data may be
> observed by external observers that are directly connected to the
> external PoS or downstream memory.
>
> This erratum affects any system topology in which the following
> conditions apply:
> - The Point of Serialization (PoS) is located downstream of the
> interconnect.
> - A downstream observer accesses memory directly, bypassing the
> interconnect.
>
> Conditions:
> This erratum occurs only when all of the following conditions are met:
> 1. Software executes a data cache maintenance operation, specifically,
> a clean or clean&invalidate by virtual address (DC CVAC or DC
> CIVAC), that hits on unique dirty data in the CPU or DSU cache.
> This results in a combined CopyBack and CMO being issued to the
> interconnect.
> 2. The interconnect splits the combined transaction into separate Write
> and CMO transactions and returns an early completion response to the
> CPU or DSU before the write has completed at the downstream memory
> or PoS.
> 3. A downstream observer accesses the affected memory address after the
> early completion response is issued but before the actual memory
> write has completed. This allows the observer to read stale data
> that has not yet been updated at the PoS or downstream memory.
>
> The implementation of workaround put a second loop of CMOs at the same
> virtual address whose operation meet erratum conditions to wait until
> cache data be cleaned to PoC. This way of implementation mitigates
> performance penalty compared to purely duplicate original CMO.
>
> Cc: stable@vger.kernel.org # 6.12.x
> Signed-off-by: Lucas Wei <lucaswei@google.com>
> ---
>
> Changes in v3:
>
> 1. Fix typos
> 2. Remove 'lkp@intel.com' from commit message
> 3. Keep ARM within a single section
> 4. Remove workaround of #4311569 from `cache_inval_poc()`
>
> Changes in v2:
>
> 1. Fixed warning from kernel test robot by changing
> arm_si_l1_workaround_4311569 to static
> [Reported-by: kernel test robot <lkp@intel.com>]
>
> ---
> Documentation/arch/arm64/silicon-errata.rst | 1 +
> arch/arm64/Kconfig | 19 +++++++++++++
> arch/arm64/include/asm/assembler.h | 10 +++++++
> arch/arm64/kernel/cpu_errata.c | 31 +++++++++++++++++++++
> arch/arm64/tools/cpucaps | 1 +
> 5 files changed, 62 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 93173f0a09c7..89326bb26f48 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1155,6 +1155,25 @@ config ARM64_ERRATUM_3194386
>
> If unsure, say Y.
>
> +config ARM64_ERRATUM_4311569
> + bool "SI L1: 4311569: workaround for premature CMO completion erratum"
> + default y
> + help
> + This option adds the workaround for ARM SI L1 erratum 4311569.
> +
> + The erratum of SI L1 can cause an early response to a combined write
> + and cache maintenance operation (WR+CMO) before the operation is fully
> + completed to the Point of Serialization (POS).
> + This can result in a non-I/O coherent agent observing stale data,
> + potentially leading to system instability or incorrect behavior.
> +
> + Enabling this option implements a software workaround by inserting a
> + second loop of Cache Maintenance Operation (CMO) immediately following the
> + end of function to do CMOs. This ensures that the data is correctly serialized
> + before the buffer is handed off to a non-coherent agent.
> +
> + If unsure, say Y.
> +
> config CAVIUM_ERRATUM_22375
> bool "Cavium erratum 22375, 24313"
> default y
[snip]
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index 8cb3b575a031..5c0ab6bfd44a 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -141,6 +141,30 @@ has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
> return (ctr_real != sys) && (ctr_raw != sys);
> }
>
> +#ifdef CONFIG_ARM64_ERRATUM_4311569
> +static DEFINE_STATIC_KEY_FALSE(arm_si_l1_workaround_4311569);
> +static int __init early_arm_si_l1_workaround_4311569_cfg(char *arg)
> +{
> + static_branch_enable(&arm_si_l1_workaround_4311569);
> + pr_info("Enabling cache maintenance workaround for ARM SI-L1 erratum 4311569\n");
> +
> + return 0;
> +}
> +early_param("arm_si_l1_workaround_4311569", early_arm_si_l1_workaround_4311569_cfg);
> +
It looks like all other errata don't use early_param() -- are they auto-detected?
Could this one be auto-detected?
> +/*
> + * We have some earlier use cases to call cache maintenance operation functions, for example,
> + * dcache_inval_poc() and dcache_clean_poc() in head.S, before making decision to turn on this
> + * workaround. Since the scope of this workaround is limited to non-coherent DMA agents, its
> + * safe to have the workaround off by default.
But it's not off by default...
[snip]
thanks.
--
~Randy
Hi Randy,
> > diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> > index 8cb3b575a031..5c0ab6bfd44a 100644
> > --- a/arch/arm64/kernel/cpu_errata.c
> > +++ b/arch/arm64/kernel/cpu_errata.c
> > @@ -141,6 +141,30 @@ has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
> > return (ctr_real != sys) && (ctr_raw != sys);
> > }
> >
> > +#ifdef CONFIG_ARM64_ERRATUM_4311569
> > +static DEFINE_STATIC_KEY_FALSE(arm_si_l1_workaround_4311569);
> > +static int __init early_arm_si_l1_workaround_4311569_cfg(char *arg)
> > +{
> > + static_branch_enable(&arm_si_l1_workaround_4311569);
> > + pr_info("Enabling cache maintenance workaround for ARM SI-L1 erratum 4311569\n");
> > +
> > + return 0;
> > +}
> > +early_param("arm_si_l1_workaround_4311569", early_arm_si_l1_workaround_4311569_cfg);
> > +
>
> It looks like all other errata don't use early_param() -- are they auto-detected?
> Could this one be auto-detected?
Sadly, this can't be auto-detected.
Thanks to Marc and Will for raising this question on my v2 patch. We don't
have a reliable way to detect this erratum at runtime, because Linux
generally doesn't need to worry about the SLC.
Robin also proposed a few feasible ways (SMCCC, a top-level SoC/platform
compatible, or the kernel cmdline) to enable this workaround, but I think
it is more straightforward to let the admin enable it via the cmdline.
> > +/*
> > + * We have some earlier use cases to call cache maintenance operation functions, for example,
> > + * dcache_inval_poc() and dcache_clean_poc() in head.S, before making decision to turn on this
> > + * workaround. Since the scope of this workaround is limited to non-coherent DMA agents, its
> > + * safe to have the workaround off by default.
>
> But it's not off by default...
I think it's off by default.
Would you point me to where the workaround was enabled without cmdline?
Thanks.
- Lucas
On 1/14/26 6:11 PM, Lucas Wei wrote:
> Hi Randy,
>
>>> +/*
>>> + * We have some earlier use cases to call cache maintenance operation functions, for example,
>>> + * dcache_inval_poc() and dcache_clean_poc() in head.S, before making decision to turn on this
>>> + * workaround. Since the scope of this workaround is limited to non-coherent DMA agents, its
>>> + * safe to have the workaround off by default.
>>
>> But it's not off by default...
>
> I think it's off by default.
> Would you point me to where the workaround was enabled without cmdline?
I'm probably confused by the Kconfig option defaulting to 'y' but
the run-time option itself is still off by default.
Sorry for the noise.
+config ARM64_ERRATUM_4311569
+ bool "SI L1: 4311569: workaround for premature CMO completion erratum"
+ default y
+ help
+ This option adds the workaround for ARM SI L1 erratum 4311569.
--
~Randy
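The distinction that caused the confusion above, a Kconfig option that defaults to 'y' versus a run-time switch that stays off until the boot parameter is given, can be sketched in plain C. This is a user-space illustration only: MODEL_CONFIG_ERRATUM_4311569, workaround_requested, parse_cmdline_model() and workaround_active_model() are hypothetical names, and the string scan merely stands in for the kernel's early_param() handling rather than reproducing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CONFIG_ARM64_ERRATUM_4311569 ("default y"). */
#define MODEL_CONFIG_ERRATUM_4311569 1

/* Models the static key from the patch: false until the parameter is seen. */
static bool workaround_requested;

/* Scan a space-separated command line for the exact parameter name. */
static void parse_cmdline_model(const char *cmdline)
{
	static const char param[] = "arm_si_l1_workaround_4311569";
	const size_t n = sizeof(param) - 1;
	const char *p = cmdline;

	assert(cmdline != NULL);

	while ((p = strstr(p, param)) != NULL) {
		/* Accept only a whole token, optionally followed by "=value". */
		bool starts_token = (p == cmdline) || (p[-1] == ' ');
		bool ends_token = (p[n] == '\0') || (p[n] == ' ') || (p[n] == '=');

		if (starts_token && ends_token) {
			workaround_requested = true;
			return;
		}
		p += n;
	}
}

/* The workaround is active only if it is both built in and requested. */
static bool workaround_active_model(void)
{
#if MODEL_CONFIG_ERRATUM_4311569
	return workaround_requested;
#else
	return false;
#endif
}
```

With the build-time switch on, workaround_active_model() still returns false for a command line such as "console=ttyS0", and returns true only once "arm_si_l1_workaround_4311569" appears on it. This matches the behaviour Lucas describes: built in by default, but off until requested.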