[PATCH] perf: arm_spe: Add barrier before enabling profiling buffer

James Clark posted 1 patch 2 weeks, 1 day ago
drivers/perf/arm_spe_pmu.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
[PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
Posted by James Clark 2 weeks, 1 day ago
The Arm ARM known issues document [1] states that the architecture will
be relaxed so that the profiling buffer must be correctly configured
when ProfilingBufferEnabled() && !SPEProfilingStopped() &&
PMBLIMITR_EL1.FM != DISCARD:

  R24557

  While the Profiling Buffer is enabled, profiling is not stopped, and
  Discard mode is not enabled, all of the following must be true:

  * The current write pointer must be at least one sample record below
    the write limit pointer.

The same relaxation also says that writes may be completely ignored:

  When the Profiling Buffer is enabled, profiling is not stopped, and
  Discard mode is not enabled, the PE might ignore a direct write to any
  of the following Profiling Buffer registers, other than a direct write
  to PMBLIMITR_EL1 that clears PMBLIMITR_EL1.E from 1 to 0:

  * The current write pointer, PMBPTR_EL1.
  * The Limit pointer, PMBLIMITR_EL1.
  * PMBSR_EL1.

When arm_spe_pmu_start() occurs, SPEProfilingStopped() is false
(PMBSR_EL1.S == 0) meaning that the write to PMBLIMITR_EL1 now becomes
the point where the buffer configuration must be correct by, rather than
the "When profiling becomes enabled" (StatisticalProfilingEnabled())
from the old rule which is much later when PMSCR_EL1 is written.

If the writes to PMBLIMITR_EL1 and PMBPTR_EL1 are re-ordered then a
misconfigured state could be observed, resulting in a buffer management
event. Or the write to PMBPTR_EL1 could be ignored.

Fix it by adding an isb() after the write to PMBPTR_EL1 to ensure that
this completes before enabling the buffer.

To avoid redundant isb()s in the IRQ handler, remove the isb() between
the PMBLIMITR_EL1 write and SYS_PMBSR_EL1 as it doesn't matter which
order these happen in now that all the previous configuration is covered
by the new isb().

This issue is only present in arm_spe_pmu_start() and not in the IRQ
handler because SPEProfilingStopped() is true in the IRQ handler. Jumps
to the out_write_limit label will skip the isb() but this is ok as they
only happen if discard mode is set or the buffer isn't enabled so
correct configuration is not required.

[1]: https://developer.arm.com/documentation/102105/latest/

Fixes: d5d9696b0380 ("drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension")
Signed-off-by: James Clark <james.clark@linaro.org>
---
A previous version of this was posted here [1] bundled with other
changes to support running in a guest. Since then the known issues doc
linked in the commit message has been released so this is a resend of
only the critical part that also needs to be fixed for hosts.

A redundant isb() has also been removed in this version which is not
present in the previous version.

[1]: https://lore.kernel.org/linux-arm-kernel/20250701-james-spe-vm-interface-v1-0-52a2cd223d00@linaro.org/
---
 drivers/perf/arm_spe_pmu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 4801115f2b54..62ae409fd5b4 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -639,6 +639,7 @@ static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
 	limit += (u64)buf->base;
 	base = (u64)buf->base + PERF_IDX2OFF(handle->head, buf);
 	write_sysreg_s(base, SYS_PMBPTR_EL1);
+	isb();
 
 out_write_limit:
 	write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
@@ -780,10 +781,8 @@ static irqreturn_t arm_spe_pmu_irq_handler(int irq, void *dev)
 		 * PMBPTR might be misaligned, but we'll burn that bridge
 		 * when we get to it.
 		 */
-		if (!(handle->aux_flags & PERF_AUX_FLAG_TRUNCATED)) {
+		if (!(handle->aux_flags & PERF_AUX_FLAG_TRUNCATED))
 			arm_spe_perf_aux_output_begin(handle, event);
-			isb();
-		}
 		break;
 	case SPE_PMU_BUF_FAULT_ACT_SPURIOUS:
 		/* We've seen you before, but GCC has the memory of a sieve. */

---
base-commit: c072629f05d7bca1148ab17690d7922a31423984
change-id: 20260123-james-spe-relaxation-d6621c7a68ff

Best regards,
-- 
James Clark <james.clark@linaro.org>
Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
Posted by Will Deacon 5 days, 18 hours ago
On Fri, Jan 23, 2026 at 04:03:53PM +0000, James Clark wrote:
> The Arm ARM known issues document [1] states that the architecture will
> be relaxed so that the profiling buffer must be correctly configured
> when ProfilingBufferEnabled() && !SPEProfilingStopped() &&
> PMBLIMITR_EL1.FM != DISCARD:
> 
>   R24557
> 
>   While the Profiling Buffer is enabled, profiling is not stopped, and
>   Discard mode is not enabled, all of the following must be true:
> 
>   * The current write pointer must be at least one sample record below
>     the write limit pointer.
> 
> The same relaxation also says that writes may be completely ignored:
> 
>   When the Profiling Buffer is enabled, profiling is not stopped, and
>   Discard mode is not enabled, the PE might ignore a direct write to any
>   of the following Profiling Buffer registers, other than a direct write
>   to PMBLIMITR_EL1 that clears PMBLIMITR_EL1.E from 1 to 0:
> 
>   * The current write pointer, PMBPTR_EL1.
>   * The Limit pointer, PMBLIMITR_EL1.
>   * PMBSR_EL1.

Thinking about this some more, does that mean that the direct write to
PMBPTR_EL1 performs an indirect read of PMBLIMITR_EL1 so that it can
determine the write-ignore semantics? If so, doesn't that mean that
we'll get order against a subsequent direct write of PMBLIMITR_EL1
without an ISB thanks to table "D24-1 Synchronization requirements"
which says that an indirect read followed by a direct write doesn't
require synchronisation?

There's also a sentence above the table stating:

"Direct writes to System registers are not allowed to affect any
 instructions appearing in program order before the direct write."

so after all that, I'm not really sure why the ISB is required.

Will
Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
Posted by James Clark 5 days, 2 hours ago

On 02/02/2026 7:03 pm, Will Deacon wrote:
> On Fri, Jan 23, 2026 at 04:03:53PM +0000, James Clark wrote:
>> The Arm ARM known issues document [1] states that the architecture will
>> be relaxed so that the profiling buffer must be correctly configured
>> when ProfilingBufferEnabled() && !SPEProfilingStopped() &&
>> PMBLIMITR_EL1.FM != DISCARD:
>>
>>    R24557
>>
>>    While the Profiling Buffer is enabled, profiling is not stopped, and
>>    Discard mode is not enabled, all of the following must be true:
>>
>>    * The current write pointer must be at least one sample record below
>>      the write limit pointer.
>>
>> The same relaxation also says that writes may be completely ignored:
>>
>>    When the Profiling Buffer is enabled, profiling is not stopped, and
>>    Discard mode is not enabled, the PE might ignore a direct write to any
>>    of the following Profiling Buffer registers, other than a direct write
>>    to PMBLIMITR_EL1 that clears PMBLIMITR_EL1.E from 1 to 0:
>>
>>    * The current write pointer, PMBPTR_EL1.
>>    * The Limit pointer, PMBLIMITR_EL1.
>>    * PMBSR_EL1.
> 
> Thinking about this some more, does that mean that the direct write to
> PMBPTR_EL1 performs an indirect read of PMBLIMITR_EL1 so that it can
> determine the write-ignore semantics? If so, doesn't that mean that
> we'll get order against a subsequent direct write of PMBLIMITR_EL1
> without an ISB thanks to table "D24-1 Synchronization requirements"
> which says that an indirect read followed by a direct write doesn't
> require synchronisation?
> 
> There's also a sentence above the table stating:
> 
> "Direct writes to System registers are not allowed to affect any
>   instructions appearing in program order before the direct write."
> 
> so after all that, I'm not really sure why the ISB is required.
> 
> Will

We were under the impression that this was required for the SPU as it is 
treated as a separate entity than the PE.

In "D17.9 Synchronization and Statistical Profiling" there is:

   INDWCG

   A Context Synchronization event guarantees that a direct write to a
   System register made by the PE in program order before the Context
   synchronization event are observable by indirect reads and indirect
   writes of the same System register made by a profiling operation
   relating to a sampled operation in program order after the Context
   synchronization event.

That specifically mentions an indirect read following a direct write, 
which seems to contradict D24-1. Although I thought this is a special 
case for SPE.
Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
Posted by Will Deacon 5 days, 2 hours ago
On Tue, Feb 03, 2026 at 10:46:37AM +0000, James Clark wrote:
> 
> 
> On 02/02/2026 7:03 pm, Will Deacon wrote:
> > On Fri, Jan 23, 2026 at 04:03:53PM +0000, James Clark wrote:
> > > The Arm ARM known issues document [1] states that the architecture will
> > > be relaxed so that the profiling buffer must be correctly configured
> > > when ProfilingBufferEnabled() && !SPEProfilingStopped() &&
> > > PMBLIMITR_EL1.FM != DISCARD:
> > > 
> > >    R24557
> > > 
> > >    While the Profiling Buffer is enabled, profiling is not stopped, and
> > >    Discard mode is not enabled, all of the following must be true:
> > > 
> > >    * The current write pointer must be at least one sample record below
> > >      the write limit pointer.
> > > 
> > > The same relaxation also says that writes may be completely ignored:
> > > 
> > >    When the Profiling Buffer is enabled, profiling is not stopped, and
> > >    Discard mode is not enabled, the PE might ignore a direct write to any
> > >    of the following Profiling Buffer registers, other than a direct write
> > >    to PMBLIMITR_EL1 that clears PMBLIMITR_EL1.E from 1 to 0:
> > > 
> > >    * The current write pointer, PMBPTR_EL1.
> > >    * The Limit pointer, PMBLIMITR_EL1.
> > >    * PMBSR_EL1.
> > 
> > Thinking about this some more, does that mean that the direct write to
> > PMBPTR_EL1 performs an indirect read of PMBLIMITR_EL1 so that it can
> > determine the write-ignore semantics? If so, doesn't that mean that
> > we'll get order against a subsequent direct write of PMBLIMITR_EL1
> > without an ISB thanks to table "D24-1 Synchronization requirements"
> > which says that an indirect read followed by a direct write doesn't
> > require synchronisation?
> > 
> > There's also a sentence above the table stating:
> > 
> > "Direct writes to System registers are not allowed to affect any
> >   instructions appearing in program order before the direct write."
> > 
> > so after all that, I'm not really sure why the ISB is required.
> > 
> > Will
> 
> We were under the impression that this was required for the SPU as it is
> treated as a separate entity than the PE.
> 
> In "D17.9 Synchronization and Statistical Profiling" there is:
> 
>   INDWCG
> 
>   A Context Synchronization event guarantees that a direct write to a
>   System register made by the PE in program order before the Context
>   synchronization event are observable by indirect reads and indirect
>   writes of the same System register made by a profiling operation
>   relating to a sampled operation in program order after the Context
>   synchronization event.
> 
> That specifically mentions an indirect read following a direct write, which
> seems to contradict D24-1. Although I thought this is a special case for
> SPE.

My reading of the the text above is that it is covering the direct write
-> indirect read case, whereas I think the case in the SPE driver that
we're considering for your patch is when we have an indirect read
followed by a direct write.

Will
Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
Posted by James Clark 2 days, 3 hours ago

On 03/02/2026 11:07 am, Will Deacon wrote:
> On Tue, Feb 03, 2026 at 10:46:37AM +0000, James Clark wrote:
>>
>>
>> On 02/02/2026 7:03 pm, Will Deacon wrote:
>>> On Fri, Jan 23, 2026 at 04:03:53PM +0000, James Clark wrote:
>>>> The Arm ARM known issues document [1] states that the architecture will
>>>> be relaxed so that the profiling buffer must be correctly configured
>>>> when ProfilingBufferEnabled() && !SPEProfilingStopped() &&
>>>> PMBLIMITR_EL1.FM != DISCARD:
>>>>
>>>>     R24557
>>>>
>>>>     While the Profiling Buffer is enabled, profiling is not stopped, and
>>>>     Discard mode is not enabled, all of the following must be true:
>>>>
>>>>     * The current write pointer must be at least one sample record below
>>>>       the write limit pointer.
>>>>
>>>> The same relaxation also says that writes may be completely ignored:
>>>>
>>>>     When the Profiling Buffer is enabled, profiling is not stopped, and
>>>>     Discard mode is not enabled, the PE might ignore a direct write to any
>>>>     of the following Profiling Buffer registers, other than a direct write
>>>>     to PMBLIMITR_EL1 that clears PMBLIMITR_EL1.E from 1 to 0:
>>>>
>>>>     * The current write pointer, PMBPTR_EL1.
>>>>     * The Limit pointer, PMBLIMITR_EL1.
>>>>     * PMBSR_EL1.
>>>
>>> Thinking about this some more, does that mean that the direct write to
>>> PMBPTR_EL1 performs an indirect read of PMBLIMITR_EL1 so that it can
>>> determine the write-ignore semantics? If so, doesn't that mean that
>>> we'll get order against a subsequent direct write of PMBLIMITR_EL1
>>> without an ISB thanks to table "D24-1 Synchronization requirements"
>>> which says that an indirect read followed by a direct write doesn't
>>> require synchronisation?
>>>
>>> There's also a sentence above the table stating:
>>>
>>> "Direct writes to System registers are not allowed to affect any
>>>    instructions appearing in program order before the direct write."
>>>
>>> so after all that, I'm not really sure why the ISB is required.
>>>
>>> Will
>>
>> We were under the impression that this was required for the SPU as it is
>> treated as a separate entity than the PE.
>>
>> In "D17.9 Synchronization and Statistical Profiling" there is:
>>
>>    INDWCG
>>
>>    A Context Synchronization event guarantees that a direct write to a
>>    System register made by the PE in program order before the Context
>>    synchronization event are observable by indirect reads and indirect
>>    writes of the same System register made by a profiling operation
>>    relating to a sampled operation in program order after the Context
>>    synchronization event.
>>
>> That specifically mentions an indirect read following a direct write, which
>> seems to contradict D24-1. Although I thought this is a special case for
>> SPE.
> 
> My reading of the the text above is that it is covering the direct write
> -> indirect read case, whereas I think the case in the SPE driver that
> we're considering for your patch is when we have an indirect read
> followed by a direct write.
> 
> Will

Yeah, and that text also only applies to "profiling operations", not 
writes to PMBPTR and PMBLIMITR.

Upon further investigation you are correct about the isb() not being 
required, even with the new relaxation. Seems like we just accepted that 
the relaxation required some change to the driver without really 
thinking about it. But yeah thanks for looking in detail and catching it.

So we can drop this now. Sorry for the noise.

James
Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
Posted by Leo Yan 1 week, 1 day ago
On Fri, Jan 23, 2026 at 04:03:53PM +0000, James Clark wrote:
> The Arm ARM known issues document [1] states that the architecture will
> be relaxed so that the profiling buffer must be correctly configured
> when ProfilingBufferEnabled() && !SPEProfilingStopped() &&
> PMBLIMITR_EL1.FM != DISCARD:
> 
>   R24557
> 
>   While the Profiling Buffer is enabled, profiling is not stopped, and
>   Discard mode is not enabled, all of the following must be true:
> 
>   * The current write pointer must be at least one sample record below
>     the write limit pointer.
> 
> The same relaxation also says that writes may be completely ignored:
> 
>   When the Profiling Buffer is enabled, profiling is not stopped, and
>   Discard mode is not enabled, the PE might ignore a direct write to any
>   of the following Profiling Buffer registers, other than a direct write
>   to PMBLIMITR_EL1 that clears PMBLIMITR_EL1.E from 1 to 0:
> 
>   * The current write pointer, PMBPTR_EL1.
>   * The Limit pointer, PMBLIMITR_EL1.
>   * PMBSR_EL1.
> 
> When arm_spe_pmu_start() occurs, SPEProfilingStopped() is false
> (PMBSR_EL1.S == 0) meaning that the write to PMBLIMITR_EL1 now becomes
> the point where the buffer configuration must be correct by, rather than
> the "When profiling becomes enabled" (StatisticalProfilingEnabled())
> from the old rule which is much later when PMSCR_EL1 is written.
> 
> If the writes to PMBLIMITR_EL1 and PMBPTR_EL1 are re-ordered then a
> misconfigured state could be observed, resulting in a buffer management
> event. Or the write to PMBPTR_EL1 could be ignored.
> 
> Fix it by adding an isb() after the write to PMBPTR_EL1 to ensure that
> this completes before enabling the buffer.

Makes sense.

> To avoid redundant isb()s in the IRQ handler, remove the isb() between
> the PMBLIMITR_EL1 write and SYS_PMBSR_EL1 as it doesn't matter which
> order these happen in now that all the previous configuration is covered
> by the new isb().

The isb() in the interrupt handler is useful and should not be removed.

See the sequence in the interrupt handler:

    arm_spe_perf_aux_output_begin() {
        write_sysreg_s(base, SYS_PMBPTR_EL1);

        // Ensure the write pointer is ordered
        isb();

        write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
    }

    // Ensure the limit pointer is ordered
    isb();

    // Profiling is enabled:
    //   (PMBLIMITR_EL1.E==1) && (PMBSR_EL1.S==0) && (not discard mode)
    write_sysreg_s(0, SYS_PMBSR_EL1);

The first isb() ensures that the write pointer update is ordered.  The
second isb() ensures that the limit pointer is visible before profiling
is enabled by clearing the PMBSR_EL1.S bit.  When handling a normal
maintenance interrupt, PMBSR_EL1.S is set by the SPE to stop tracing,
while PMBLIMITR_EL1.E remains set.  Clearing PMBSR_EL1.S therefore
effectively re-enables profiling.

Since the second isb() is a synchronization for both the write pointer
and the limit pointer before profiling is enabled, it could be argued
that the first isb() is redundant in the interrupt handling.  However,
the first isb() is crucial for the arm_spe_pmu_start() case, and keeping
it in the interrupt handler does no harm and simplifies code maintenance.

In short, if preserves the second isb() instead of removing it, the
change looks good to me.

Thanks,
Leo

> This issue is only present in arm_spe_pmu_start() and not in the IRQ
> handler because SPEProfilingStopped() is true in the IRQ handler. Jumps
> to the out_write_limit label will skip the isb() but this is ok as they
> only happen if discard mode is set or the buffer isn't enabled so
> correct configuration is not required.
> 
> [1]: https://developer.arm.com/documentation/102105/latest/
> 
> Fixes: d5d9696b0380 ("drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension")
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
> A previous version of this was posted here [1] bundled with other
> changes to support running in a guest. Since then the known issues doc
> linked in the commit message has been released so this is a resend of
> only the critical part that also needs to be fixed for hosts.
> 
> A redundant isb() has also been removed in this version which is not
> present in the previous version.
> 
> [1]: https://lore.kernel.org/linux-arm-kernel/20250701-james-spe-vm-interface-v1-0-52a2cd223d00@linaro.org/
> ---
>  drivers/perf/arm_spe_pmu.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index 4801115f2b54..62ae409fd5b4 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -639,6 +639,7 @@ static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
>  	limit += (u64)buf->base;
>  	base = (u64)buf->base + PERF_IDX2OFF(handle->head, buf);
>  	write_sysreg_s(base, SYS_PMBPTR_EL1);
> +	isb();
>  
>  out_write_limit:
>  	write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
> @@ -780,10 +781,8 @@ static irqreturn_t arm_spe_pmu_irq_handler(int irq, void *dev)
>  		 * PMBPTR might be misaligned, but we'll burn that bridge
>  		 * when we get to it.
>  		 */
> -		if (!(handle->aux_flags & PERF_AUX_FLAG_TRUNCATED)) {
> +		if (!(handle->aux_flags & PERF_AUX_FLAG_TRUNCATED))
>  			arm_spe_perf_aux_output_begin(handle, event);
> -			isb();
> -		}
>  		break;
>  	case SPE_PMU_BUF_FAULT_ACT_SPURIOUS:
>  		/* We've seen you before, but GCC has the memory of a sieve. */
> 
> ---
> base-commit: c072629f05d7bca1148ab17690d7922a31423984
> change-id: 20260123-james-spe-relaxation-d6621c7a68ff
> 
> Best regards,
> -- 
> James Clark <james.clark@linaro.org>
>
Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
Posted by Will Deacon 5 days, 20 hours ago
On Fri, Jan 30, 2026 at 08:24:37PM +0000, Leo Yan wrote:
> On Fri, Jan 23, 2026 at 04:03:53PM +0000, James Clark wrote:
> > The Arm ARM known issues document [1] states that the architecture will
> > be relaxed so that the profiling buffer must be correctly configured
> > when ProfilingBufferEnabled() && !SPEProfilingStopped() &&
> > PMBLIMITR_EL1.FM != DISCARD:
> > 
> >   R24557
> > 
> >   While the Profiling Buffer is enabled, profiling is not stopped, and
> >   Discard mode is not enabled, all of the following must be true:
> > 
> >   * The current write pointer must be at least one sample record below
> >     the write limit pointer.
> > 
> > The same relaxation also says that writes may be completely ignored:
> > 
> >   When the Profiling Buffer is enabled, profiling is not stopped, and
> >   Discard mode is not enabled, the PE might ignore a direct write to any
> >   of the following Profiling Buffer registers, other than a direct write
> >   to PMBLIMITR_EL1 that clears PMBLIMITR_EL1.E from 1 to 0:
> > 
> >   * The current write pointer, PMBPTR_EL1.
> >   * The Limit pointer, PMBLIMITR_EL1.
> >   * PMBSR_EL1.

Oh nice, since when was it ok to relax the architecture and break
existing drivers that were perfectly fine before? The SPE spec's not
worth the paper it's written on...

Anyway, we're not changing the driver without a comment next to the new
isb() explaining the backwards incompatible change.

> > When arm_spe_pmu_start() occurs, SPEProfilingStopped() is false
> > (PMBSR_EL1.S == 0) meaning that the write to PMBLIMITR_EL1 now becomes
> > the point where the buffer configuration must be correct by, rather than
> > the "When profiling becomes enabled" (StatisticalProfilingEnabled())
> > from the old rule which is much later when PMSCR_EL1 is written.
> > 
> > If the writes to PMBLIMITR_EL1 and PMBPTR_EL1 are re-ordered then a
> > misconfigured state could be observed, resulting in a buffer management
> > event. Or the write to PMBPTR_EL1 could be ignored.
> > 
> > Fix it by adding an isb() after the write to PMBPTR_EL1 to ensure that
> > this completes before enabling the buffer.
> 
> Makes sense.
> 
> > To avoid redundant isb()s in the IRQ handler, remove the isb() between
> > the PMBLIMITR_EL1 write and SYS_PMBSR_EL1 as it doesn't matter which
> > order these happen in now that all the previous configuration is covered
> > by the new isb().
> 
> The isb() in the interrupt handler is useful and should not be removed.
> 
> See the sequence in the interrupt handler:
> 
>     arm_spe_perf_aux_output_begin() {
>         write_sysreg_s(base, SYS_PMBPTR_EL1);
> 
>         // Ensure the write pointer is ordered
>         isb();
> 
>         write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
>     }
> 
>     // Ensure the limit pointer is ordered
>     isb();
> 
>     // Profiling is enabled:
>     //   (PMBLIMITR_EL1.E==1) && (PMBSR_EL1.S==0) && (not discard mode)
>     write_sysreg_s(0, SYS_PMBSR_EL1);
> 
> The first isb() ensures that the write pointer update is ordered.  The
> second isb() ensures that the limit pointer is visible before profiling
> is enabled by clearing the PMBSR_EL1.S bit.  When handling a normal
> maintenance interrupt, PMBSR_EL1.S is set by the SPE to stop tracing,
> while PMBLIMITR_EL1.E remains set.  Clearing PMBSR_EL1.S therefore
> effectively re-enables profiling.
> 
> Since the second isb() is a synchronization for both the write pointer
> and the limit pointer before profiling is enabled, it could be argued
> that the first isb() is redundant in the interrupt handling.  However,
> the first isb() is crucial for the arm_spe_pmu_start() case, and keeping
> it in the interrupt handler does no harm and simplifies code maintenance.
> 
> In short, if preserves the second isb() instead of removing it, the
> change looks good to me.

I'm not sure I follow your logic as to why both ISBs are required, but
I'd have thought that if perf_aux_output_begin() fails when called from
arm_spe_perf_aux_output_begin() in the irqhandler, we need the ISB
because we're going to clear pmblimitr_el1 to 0 and that surely has
to be ordered before clearing pmbsr?

Will
Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
Posted by Leo Yan 5 days, 19 hours ago
On Mon, Feb 02, 2026 at 04:53:57PM +0000, Will Deacon wrote:

[...]

> > > To avoid redundant isb()s in the IRQ handler, remove the isb() between
> > > the PMBLIMITR_EL1 write and SYS_PMBSR_EL1 as it doesn't matter which
> > > order these happen in now that all the previous configuration is covered
> > > by the new isb().
> > 
> > The isb() in the interrupt handler is useful and should not be removed.
> > 
> > See the sequence in the interrupt handler:
> > 
> >     arm_spe_perf_aux_output_begin() {
> >         write_sysreg_s(base, SYS_PMBPTR_EL1);
> > 
> >         // Ensure the write pointer is ordered
> >         isb();
> > 
> >         write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
> >     }
> > 
> >     // Ensure the limit pointer is ordered
> >     isb();
> > 
> >     // Profiling is enabled:
> >     //   (PMBLIMITR_EL1.E==1) && (PMBSR_EL1.S==0) && (not discard mode)
> >     write_sysreg_s(0, SYS_PMBSR_EL1);
> > 
> > The first isb() ensures that the write pointer update is ordered.  The
> > second isb() ensures that the limit pointer is visible before profiling
> > is enabled by clearing the PMBSR_EL1.S bit.  When handling a normal
> > maintenance interrupt, PMBSR_EL1.S is set by the SPE to stop tracing,
> > while PMBLIMITR_EL1.E remains set.  Clearing PMBSR_EL1.S therefore
> > effectively re-enables profiling.
> > 
> > Since the second isb() is a synchronization for both the write pointer
> > and the limit pointer before profiling is enabled, it could be argued
> > that the first isb() is redundant in the interrupt handling.  However,
> > the first isb() is crucial for the arm_spe_pmu_start() case, and keeping
> > it in the interrupt handler does no harm and simplifies code maintenance.
> > 
> > In short, if preserves the second isb() instead of removing it, the
> > change looks good to me.
> 
> I'm not sure I follow your logic as to why both ISBs are required, but
> I'd have thought that if perf_aux_output_begin() fails when called from
> arm_spe_perf_aux_output_begin() in the irqhandler, we need the ISB
> because we're going to clear pmblimitr_el1 to 0 and that surely has
> to be ordered before clearing pmbsr?

I think the ISB after arm_spe_perf_aux_output_begin() in the irq
handler is required for both the failure and success cases.

For a normal maintenance interrupt, an ISB is inserted between writing
PMBLIMITR_EL1 and PMBSR_EL1 to ensure that a valid limit write is
visible before tracing restarts.  This ensures that the following
conditions are safely met:

  "While the Profiling Buffer is enabled, profiling is not stopped, and
   Discard mode is not enabled, all of the following must be true:

   The current write pointer must be at least one sample record below
   the write limit pointer.

   PMBPTR_EL1.PTR[63:56] must equal PMBLIMITR_EL1.LIMIT[63:56],
   regardless of the value of the applicable TBI bit."

Thanks,
Leo
Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
Posted by Will Deacon 5 days, 18 hours ago
On Mon, Feb 02, 2026 at 06:42:34PM +0000, Leo Yan wrote:
> On Mon, Feb 02, 2026 at 04:53:57PM +0000, Will Deacon wrote:
> 
> [...]
> 
> > > > To avoid redundant isb()s in the IRQ handler, remove the isb() between
> > > > the PMBLIMITR_EL1 write and SYS_PMBSR_EL1 as it doesn't matter which
> > > > order these happen in now that all the previous configuration is covered
> > > > by the new isb().
> > > 
> > > The isb() in the interrupt handler is useful and should not be removed.
> > > 
> > > See the sequence in the interrupt handler:
> > > 
> > >     arm_spe_perf_aux_output_begin() {
> > >         write_sysreg_s(base, SYS_PMBPTR_EL1);
> > > 
> > >         // Ensure the write pointer is ordered
> > >         isb();
> > > 
> > >         write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
> > >     }
> > > 
> > >     // Ensure the limit pointer is ordered
> > >     isb();
> > > 
> > >     // Profiling is enabled:
> > >     //   (PMBLIMITR_EL1.E==1) && (PMBSR_EL1.S==0) && (not discard mode)
> > >     write_sysreg_s(0, SYS_PMBSR_EL1);
> > > 
> > > The first isb() ensures that the write pointer update is ordered.  The
> > > second isb() ensures that the limit pointer is visible before profiling
> > > is enabled by clearing the PMBSR_EL1.S bit.  When handling a normal
> > > maintenance interrupt, PMBSR_EL1.S is set by the SPE to stop tracing,
> > > while PMBLIMITR_EL1.E remains set.  Clearing PMBSR_EL1.S therefore
> > > effectively re-enables profiling.
> > > 
> > > Since the second isb() is a synchronization for both the write pointer
> > > and the limit pointer before profiling is enabled, it could be argued
> > > that the first isb() is redundant in the interrupt handling.  However,
> > > the first isb() is crucial for the arm_spe_pmu_start() case, and keeping
> > > it in the interrupt handler does no harm and simplifies code maintenance.
> > > 
> > > In short, if preserves the second isb() instead of removing it, the
> > > change looks good to me.
> > 
> > I'm not sure I follow your logic as to why both ISBs are required, but
> > I'd have thought that if perf_aux_output_begin() fails when called from
> > arm_spe_perf_aux_output_begin() in the irqhandler, we need the ISB
> > because we're going to clear pmblimitr_el1 to 0 and that surely has
> > to be ordered before clearing pmbsr?
> 
> I think the ISB after arm_spe_perf_aux_output_begin() in the irq
> handler is required for both the failure and success cases.
> 
> For a normal maintenance interrupt, an ISB is inserted between writing
> PMBLIMITR_EL1 and PMBSR_EL1 to ensure that a valid limit write is
> visible before tracing restarts.  This ensures that the following
> conditions are safely met:
> 
>   "While the Profiling Buffer is enabled, profiling is not stopped, and
>    Discard mode is not enabled, all of the following must be true:
> 
>    The current write pointer must be at least one sample record below
>    the write limit pointer.
> 
>    PMBPTR_EL1.PTR[63:56] must equal PMBLIMITR_EL1.LIMIT[63:56],
>    regardless of the value of the applicable TBI bit."

Hmm, so let's say we've executed the first ISB. At that point, the
Profiling Buffer is disabled (PMBLIMITR_EL1.E = 0) and profiling is
stopped (PMBSR_EL1.S = 1). If we *don't* have the second ISB then either
PMBLIMITR_EL1 is written first or PMBSR_EL1 is written first. But the
text you quoted will only come into effect once they've both happened,
right? In which case, why does the order matter for the success case?

Will
Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
Posted by Leo Yan 5 days, 18 hours ago
On Mon, Feb 02, 2026 at 06:57:11PM +0000, Will Deacon wrote:

[...]


> > > I'm not sure I follow your logic as to why both ISBs are required, but
> > > I'd have thought that if perf_aux_output_begin() fails when called from
> > > arm_spe_perf_aux_output_begin() in the irqhandler, we need the ISB
> > > because we're going to clear pmblimitr_el1 to 0 and that surely has
> > > to be ordered before clearing pmbsr?
> > 
> > I think the ISB after arm_spe_perf_aux_output_begin() in the irq
> > handler is required for both the failure and success cases.
> > 
> > For a normal maintenance interrupt, an ISB is inserted between writing
> > PMBLIMITR_EL1 and PMBSR_EL1 to ensure that a valid limit write is
> > visible before tracing restarts.  This ensures that the following
> > conditions are safely met:
> > 
> >   "While the Profiling Buffer is enabled, profiling is not stopped, and
> >    Discard mode is not enabled, all of the following must be true:
> > 
> >    The current write pointer must be at least one sample record below
> >    the write limit pointer.
> > 
> >    PMBPTR_EL1.PTR[63:56] must equal PMBLIMITR_EL1.LIMIT[63:56],
> >    regardless of the value of the applicable TBI bit."
> 
> Hmm, so let's say we've executed the first ISB. At that point, the
> Profiling Buffer is disabled (PMBLIMITR_EL1.E = 0) and profiling is
> stopped (PMBSR_EL1.S = 1).

This is not true.  PMBLIMITR_EL1.E is always 1 during interrupt
handling.

> If we *don't* have the second ISB then either
> PMBLIMITR_EL1 is written first or PMBSR_EL1 is written first. But the
> text you quoted will only come into effect once they've both happened,
> right? In which case, why does the order matter for the success case?

Yes, both PMBLIMITR_EL1.E == 1 and PMBSR_EL1.S == 0 must be true to
enable tracing.

However, the tricky part is that PMBLIMITR_EL1.E remains 1 throughout
the sequence.  Writing PMBLIMITR_EL1 effectively only sets the limit,
while clearing PMBSR_EL1 is the distinct step that enables tracing.

Thanks,
Leo
Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
Posted by James Clark 5 days, 4 hours ago

On 02/02/2026 7:14 pm, Leo Yan wrote:
> On Mon, Feb 02, 2026 at 06:57:11PM +0000, Will Deacon wrote:
> 
> [...]
> 
> 
>>>> I'm not sure I follow your logic as to why both ISBs are required, but
>>>> I'd have thought that if perf_aux_output_begin() fails when called from
>>>> arm_spe_perf_aux_output_begin() in the irqhandler, we need the ISB
>>>> because we're going to clear pmblimitr_el1 to 0 and that surely has
>>>> to be ordered before clearing pmbsr?
>>>
>>> I think the ISB after arm_spe_perf_aux_output_begin() in the irq
>>> handler is required for both the failure and success cases.
>>>
>>> For a normal maintenance interrupt, an ISB is inserted between writing
>>> PMBLIMITR_EL1 and PMBSR_EL1 to ensure that a valid limit write is
>>> visible before tracing restarts.  This ensures that the following
>>> conditions are safely met:
>>>
>>>    "While the Profiling Buffer is enabled, profiling is not stopped, and
>>>     Discard mode is not enabled, all of the following must be true:
>>>
>>>     The current write pointer must be at least one sample record below
>>>     the write limit pointer.
>>>
>>>     PMBPTR_EL1.PTR[63:56] must equal PMBLIMITR_EL1.LIMIT[63:56],
>>>     regardless of the value of the applicable TBI bit."
>>
>> Hmm, so let's say we've executed the first ISB. At that point, the
>> Profiling Buffer is disabled (PMBLIMITR_EL1.E = 0) and profiling is
>> stopped (PMBSR_EL1.S = 1).
> 
> This is not true.  PMBLIMITR_EL1.E is always 1 during interrupt
> handling.
> 
>> If we *don't* have the second ISB then either
>> PMBLIMITR_EL1 is written first or PMBSR_EL1 is written first. But the
>> text you quoted will only come into effect once they've both happened,
>> right? In which case, why does the order matter for the success case?
> 
> Yes, both PMBLIMITR_EL1.E == 1 and PMBSR_EL1.S == 0 must be true to
> enable tracing.
> 
> However, the tricky part is that PMBLIMITR_EL1.E remains 1 throughout
> the sequence.  Writing PMBLIMITR_EL1 effectively only sets the limit,
> while clearing PMBSR_EL1 is the distinct step that enables tracing.
> 
> Thanks,
> Leo

I think Leo is correct that the old isb() is still needed. I removed it 
under the assumption that PMBLIMITR_EL1.E was unset in the interrupt 
handler. Possibly because the previous version re-arranged the handler 
to do that.

If PMBLIMITR_EL1.E is set, we have to make sure clearing PMBSR_EL1 comes 
last as it's the thing that defines the point where both pointers must 
be correct by.

Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
Posted by Will Deacon 5 days, 4 hours ago
On Tue, Feb 03, 2026 at 09:29:56AM +0000, James Clark wrote:
> 
> 
> On 02/02/2026 7:14 pm, Leo Yan wrote:
> > On Mon, Feb 02, 2026 at 06:57:11PM +0000, Will Deacon wrote:
> > 
> > [...]
> > 
> > 
> > > > > I'm not sure I follow your logic as to why both ISBs are required, but
> > > > > I'd have thought that if perf_aux_output_begin() fails when called from
> > > > > arm_spe_perf_aux_output_begin() in the irqhandler, we need the ISB
> > > > > because we're going to clear pmblimitr_el1 to 0 and that surely has
> > > > > to be ordered before clearing pmbsr?
> > > > 
> > > > I think the ISB after arm_spe_perf_aux_output_begin() in the irq
> > > > handler is required for both the failure and success cases.
> > > > 
> > > > For a normal maintenance interrupt, an ISB is inserted between writing
> > > > PMBLIMITR_EL1 and PMBSR_EL1 to ensure that a valid limit write is
> > > > visible before tracing restarts.  This ensures that the following
> > > > conditions are safely met:
> > > > 
> > > >    "While the Profiling Buffer is enabled, profiling is not stopped, and
> > > >     Discard mode is not enabled, all of the following must be true:
> > > > 
> > > >     The current write pointer must be at least one sample record below
> > > >     the write limit pointer.
> > > > 
> > > >     PMBPTR_EL1.PTR[63:56] must equal PMBLIMITR_EL1.LIMIT[63:56],
> > > >     regardless of the value of the applicable TBI bit."
> > > 
> > > Hmm, so let's say we've executed the first ISB. At that point, the
> > > Profiling Buffer is disabled (PMBLIMITR_EL1.E = 0) and profiling is
> > > stopped (PMBSR_EL1.S = 1).
> > 
> > This is not true.  PMBLIMITR_EL1.E is always 1 during interrupt
> > handling.

Ah, yes, thank you for correcting me here.

> > > If we *don't* have the second ISB then either
> > > PMBLIMITR_EL1 is written first or PMBSR_EL1 is written first. But the
> > > text you quoted will only come into effect once they've both happened,
> > > right? In which case, why does the order matter for the success case?
> > 
> > Yes, both PMBLIMITR_EL1.E == 1 and PMBSR_EL1.S == 0 must be true to
> > enable tracing.
> > 
> > However, the tricky part is that PMBLIMITR_EL1.E remains 1 throughout
> > the sequence.  Writing PMBLIMITR_EL1 effectively only sets the limit,
> > while clearing PMBSR_EL1 is the distinct step that enables tracing.
> > 
> > Thanks,
> > Leo
> 
> I think Leo is correct that the old isb() is still needed. I removed it
> under the assumption that PMBLIMITR_EL1.E was unset in the interrupt
> handler. Possibly because the previous version re-arranged the handler to do
> that.
> 
> If PMBLIMITR_EL1.E is set, we have to make sure clearing PMBSR_EL1 comes
> last as it's the thing that defines the point where both pointers must be
> correct by.

Agreed that we can't remove the existing isb, but please see my other
reply as I'm not entirely sure we need to add an extra isb to handle the
arch relaxation.

Will