[PATCH v3] perf: arm_spe: Properly set hw.state on failures

Leo Yan posted 1 patch 2 weeks, 4 days ago
There is a newer version of this series
drivers/perf/arm_spe_pmu.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
[PATCH v3] perf: arm_spe: Properly set hw.state on failures
Posted by Leo Yan 2 weeks, 4 days ago
When arm_spe_pmu_next_off() fails to calculate a valid limit, it returns
zero to indicate that tracing should not start.  However, the caller
arm_spe_perf_aux_output_begin() does not propagate this failure by
updating hwc->state, cause the error to be silently ignored by upper
layers.

Because hwc->state remains zero after a failure, arm_spe_pmu_start()
continues to programs filter registers unnecessarily.  The driver
still reports success to the perf core, so the core assumes the SPE
event was enabled and proceeds to enable other events.  This breaks
event group semantics: SPE is already stopped while other events in the
same group are enabled.

Fix this by updating arm_spe_perf_aux_output_begin() to return a status
code indicating success (0) or failure (-EAGAIN).  Both the interrupt
handler and arm_spe_pmu_start() check the return value and call
arm_spe_pmu_stop() to set PERF_HES_STOPPED in hwc->state.

In the interrupt handler, the period (e.g., period_left) needs to be
updated, so PERF_EF_UPDATE is passed to arm_spe_pmu_stop().  When the
error occurs during event start, the trace unit is not yet enabled, so
a flag '0' is used to drain buffer and update state only.

Fixes: d5d9696b0380 ("drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension")
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
This change has been verified on Orion6 board.
---
Changes in v3:
- Updated arm_spe_perf_aux_output_begin() for returning error and used
  arm_spe_pmu_stop() for setting hw_state properly (Will).
- Link to v2: https://lore.kernel.org/r/20251110-arm_spe_fix_truncated_flag-v2-0-a629740985cc@arm.com
---
 drivers/perf/arm_spe_pmu.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 4801115f2b54052d584b59881f458c2640e974ff..b9a6748827cc5557c42023f323a90b5826eadb54 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -106,6 +106,8 @@ struct arm_spe_pmu {
 /* Keep track of our dynamic hotplug state */
 static enum cpuhp_state arm_spe_pmu_online;
 
+static void arm_spe_pmu_stop(struct perf_event *event, int flags);
+
 enum arm_spe_pmu_buf_fault_action {
 	SPE_PMU_BUF_FAULT_ACT_SPURIOUS,
 	SPE_PMU_BUF_FAULT_ACT_FATAL,
@@ -607,8 +609,8 @@ static u64 arm_spe_pmu_next_off(struct perf_output_handle *handle)
 	return limit;
 }
 
-static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
-					  struct perf_event *event)
+static int arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
+					 struct perf_event *event)
 {
 	u64 base, limit;
 	struct arm_spe_pmu_buf *buf;
@@ -622,7 +624,6 @@ static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
 	/* Start a new aux session */
 	buf = perf_aux_output_begin(handle, event);
 	if (!buf) {
-		event->hw.state |= PERF_HES_STOPPED;
 		/*
 		 * We still need to clear the limit pointer, since the
 		 * profiler might only be disabled by virtue of a fault.
@@ -642,6 +643,7 @@ static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
 
 out_write_limit:
 	write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
+	return (limit & PMBLIMITR_EL1_E) ? 0 : -EAGAIN;
 }
 
 static void arm_spe_perf_aux_output_end(struct perf_output_handle *handle)
@@ -781,7 +783,10 @@ static irqreturn_t arm_spe_pmu_irq_handler(int irq, void *dev)
 		 * when we get to it.
 		 */
 		if (!(handle->aux_flags & PERF_AUX_FLAG_TRUNCATED)) {
-			arm_spe_perf_aux_output_begin(handle, event);
+			if (arm_spe_perf_aux_output_begin(handle, event)) {
+				arm_spe_pmu_stop(event, PERF_EF_UPDATE);
+				break;
+			}
 			isb();
 		}
 		break;
@@ -880,9 +885,10 @@ static void arm_spe_pmu_start(struct perf_event *event, int flags)
 	struct perf_output_handle *handle = this_cpu_ptr(spe_pmu->handle);
 
 	hwc->state = 0;
-	arm_spe_perf_aux_output_begin(handle, event);
-	if (hwc->state)
+	if (arm_spe_perf_aux_output_begin(handle, event)) {
+		arm_spe_pmu_stop(event, 0);
 		return;
+	}
 
 	reg = arm_spe_event_to_pmsfcr(event);
 	write_sysreg_s(reg, SYS_PMSFCR_EL1);

---
base-commit: c03e9c42ae8f9be76a0cf55ef3f88663f0f6a63a
change-id: 20251015-arm_spe_fix_truncated_flag-1a5b3620a3ab

Best regards,
-- 
Leo Yan <leo.yan@arm.com>
Re: [PATCH v3] perf: arm_spe: Properly set hw.state on failures
Posted by Will Deacon 2 weeks, 3 days ago
On Wed, Jan 21, 2026 at 11:33:21AM +0000, Leo Yan wrote:
> When arm_spe_pmu_next_off() fails to calculate a valid limit, it returns
> zero to indicate that tracing should not start.  However, the caller
> arm_spe_perf_aux_output_begin() does not propagate this failure by
> updating hwc->state, cause the error to be silently ignored by upper
> layers.
> 
> Because hwc->state remains zero after a failure, arm_spe_pmu_start()
> continues to programs filter registers unnecessarily.  The driver
> still reports success to the perf core, so the core assumes the SPE
> event was enabled and proceeds to enable other events.  This breaks
> event group semantics: SPE is already stopped while other events in the
> same group are enabled.
> 
> Fix this by updating arm_spe_perf_aux_output_begin() to return a status
> code indicating success (0) or failure (-EAGAIN).  Both the interrupt
> handler and arm_spe_pmu_start() check the return value and call
> arm_spe_pmu_stop() to set PERF_HES_STOPPED in hwc->state.
> 
> In the interrupt handler, the period (e.g., period_left) needs to be
> updated, so PERF_EF_UPDATE is passed to arm_spe_pmu_stop().  When the
> error occurs during event start, the trace unit is not yet enabled, so
> a flag '0' is used to drain buffer and update state only.
> 
> Fixes: d5d9696b0380 ("drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension")
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
> This change has been verified on Orion6 board.
> ---
> Changes in v3:
> - Updated arm_spe_perf_aux_output_begin() for returning error and used
>   arm_spe_pmu_stop() for setting hw_state properly (Will).
> - Link to v2: https://lore.kernel.org/r/20251110-arm_spe_fix_truncated_flag-v2-0-a629740985cc@arm.com
> ---
>  drivers/perf/arm_spe_pmu.c | 18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index 4801115f2b54052d584b59881f458c2640e974ff..b9a6748827cc5557c42023f323a90b5826eadb54 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -106,6 +106,8 @@ struct arm_spe_pmu {
>  /* Keep track of our dynamic hotplug state */
>  static enum cpuhp_state arm_spe_pmu_online;
>  
> +static void arm_spe_pmu_stop(struct perf_event *event, int flags);

This is fine, but I'm also happy if you want to move the functions around
to avoid the forward declaration.

>  enum arm_spe_pmu_buf_fault_action {
>  	SPE_PMU_BUF_FAULT_ACT_SPURIOUS,
>  	SPE_PMU_BUF_FAULT_ACT_FATAL,
> @@ -607,8 +609,8 @@ static u64 arm_spe_pmu_next_off(struct perf_output_handle *handle)
>  	return limit;
>  }
>  
> -static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
> -					  struct perf_event *event)
> +static int arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
> +					 struct perf_event *event)
>  {
>  	u64 base, limit;
>  	struct arm_spe_pmu_buf *buf;
> @@ -622,7 +624,6 @@ static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
>  	/* Start a new aux session */
>  	buf = perf_aux_output_begin(handle, event);
>  	if (!buf) {
> -		event->hw.state |= PERF_HES_STOPPED;
>  		/*
>  		 * We still need to clear the limit pointer, since the
>  		 * profiler might only be disabled by virtue of a fault.
> @@ -642,6 +643,7 @@ static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
>  
>  out_write_limit:
>  	write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
> +	return (limit & PMBLIMITR_EL1_E) ? 0 : -EAGAIN;

I'd probably go with -EIO here. -EAGAIN implies that if the caller
retries the operation then it might succeed, which probably isn't the
case for these failures.

>  static void arm_spe_perf_aux_output_end(struct perf_output_handle *handle)
> @@ -781,7 +783,10 @@ static irqreturn_t arm_spe_pmu_irq_handler(int irq, void *dev)
>  		 * when we get to it.
>  		 */
>  		if (!(handle->aux_flags & PERF_AUX_FLAG_TRUNCATED)) {
> -			arm_spe_perf_aux_output_begin(handle, event);
> +			if (arm_spe_perf_aux_output_begin(handle, event)) {
> +				arm_spe_pmu_stop(event, PERF_EF_UPDATE);

Why do you need to pass PERF_EF_UPDATE in this case? It looks to me
like we're going to get into a mess with PMBSR_EL1, as that will get
re-read by arm_spe_pmu_buf_get_fault_act() in arm_spe_pmu_stop()
before we've cleared it here in the irq handler.

I was expecting that we would always pass 0 for the flags when handling
the case where we get an error back from arm_spe_perf_aux_output_begin().

Will
Re: [PATCH v3] perf: arm_spe: Properly set hw.state on failures
Posted by Leo Yan 2 weeks, 3 days ago
On Thu, Jan 22, 2026 at 01:38:24PM +0000, Will Deacon wrote:
> On Wed, Jan 21, 2026 at 11:33:21AM +0000, Leo Yan wrote:

[...]

> > +static void arm_spe_pmu_stop(struct perf_event *event, int flags);
> 
> This is fine, but I'm also happy if you want to move the functions around
> to avoid the forward declaration.

I also dislike the forward declaration, but the current code is well
organized (the PMU start/stop/add/del helpers are grouped together).

Moving arm_spe_pmu_stop() elsewhere might hurt the readability.

> > @@ -642,6 +643,7 @@ static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
> >  
> >  out_write_limit:
> >  	write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
> > +	return (limit & PMBLIMITR_EL1_E) ? 0 : -EAGAIN;
> 
> I'd probably go with -EIO here. -EAGAIN implies that if the caller
> retries the operation then it might succeed, which probably isn't the
> case for these failures.

Will do.

> >  static void arm_spe_perf_aux_output_end(struct perf_output_handle *handle)
> > @@ -781,7 +783,10 @@ static irqreturn_t arm_spe_pmu_irq_handler(int irq, void *dev)
> >  		 * when we get to it.
> >  		 */
> >  		if (!(handle->aux_flags & PERF_AUX_FLAG_TRUNCATED)) {
> > -			arm_spe_perf_aux_output_begin(handle, event);
> > +			if (arm_spe_perf_aux_output_begin(handle, event)) {
> > +				arm_spe_pmu_stop(event, PERF_EF_UPDATE);
> 
> Why do you need to pass PERF_EF_UPDATE in this case?

The main purpose is to read PMSICR_EL1 and save the left count to
"hwc->period_left".  This might be used in next enable.

> It looks to me
> like we're going to get into a mess with PMBSR_EL1, as that will get
> re-read by arm_spe_pmu_buf_get_fault_act() in arm_spe_pmu_stop()
> before we've cleared it here in the irq handler.

When arm_spe_perf_aux_output_begin() fails, either because
perf_aux_output_begin() fails to start a new session or because an
invalid limit is detected and perf_aux_output_end(handle, 0) is
called.

In either case, perf_get_aux(handle) returns NULL after the failure,
and arm_spe_pmu_stop() has no chance to run into the path that
re-reads PMBSR_EL1.

> I was expecting that we would always pass 0 for the flags when handling
> the case where we get an error back from arm_spe_perf_aux_output_begin().

We can look at this another way.  If we do not call arm_spe_pmu_stop()
in the interrupt handler and instead defer stopping the trace to
arm_spe_pmu_del(), the PERF_EF_UPDATE flag is used.  This patch simply
keeps the same behavior while stopping the trace earlier in the
interrupt handler.  Make sense?

Thanks,
Leo
Re: [PATCH v3] perf: arm_spe: Properly set hw.state on failures
Posted by Will Deacon 5 days, 22 hours ago
On Thu, Jan 22, 2026 at 04:30:10PM +0000, Leo Yan wrote:
> On Thu, Jan 22, 2026 at 01:38:24PM +0000, Will Deacon wrote:
> > On Wed, Jan 21, 2026 at 11:33:21AM +0000, Leo Yan wrote:
> 
> [...]
> 
> > > +static void arm_spe_pmu_stop(struct perf_event *event, int flags);
> > 
> > This is fine, but I'm also happy if you want to move the functions around
> > to avoid the forward declaration.
> 
> I also dislike the forward declaration, but the current code is well
> organized (the PMU start/stop/add/del helpers are grouped together).
> 
> Moving arm_spe_pmu_stop() elsewhere might hurt the readability.

Ok, up to you.

> > > @@ -642,6 +643,7 @@ static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
> > >  
> > >  out_write_limit:
> > >  	write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
> > > +	return (limit & PMBLIMITR_EL1_E) ? 0 : -EAGAIN;
> > 
> > I'd probably go with -EIO here. -EAGAIN implies that if the caller
> > retries the operation then it might succeed, which probably isn't the
> > case for these failures.
> 
> Will do.

Thanks.

> > >  static void arm_spe_perf_aux_output_end(struct perf_output_handle *handle)
> > > @@ -781,7 +783,10 @@ static irqreturn_t arm_spe_pmu_irq_handler(int irq, void *dev)
> > >  		 * when we get to it.
> > >  		 */
> > >  		if (!(handle->aux_flags & PERF_AUX_FLAG_TRUNCATED)) {
> > > -			arm_spe_perf_aux_output_begin(handle, event);
> > > +			if (arm_spe_perf_aux_output_begin(handle, event)) {
> > > +				arm_spe_pmu_stop(event, PERF_EF_UPDATE);
> > 
> > Why do you need to pass PERF_EF_UPDATE in this case?
> 
> The main purpose is to read PMSICR_EL1 and save the left count to
> "hwc->period_left".  This might be used in next enable.
>
> 
> > It looks to me
> > like we're going to get into a mess with PMBSR_EL1, as that will get
> > re-read by arm_spe_pmu_buf_get_fault_act() in arm_spe_pmu_stop()
> > before we've cleared it here in the irq handler.
> 
> When arm_spe_perf_aux_output_begin() fails, either because
> perf_aux_output_begin() fails to start a new session or because an
> invalid limit is detected and perf_aux_output_end(handle, 0) is
> called.
> 
> In either case, perf_get_aux(handle) returns NULL after the failure,
> and arm_spe_pmu_stop() has no chance to run into the path that
> re-reads PMBSR_EL1.

Got it.

> > I was expecting that we would always pass 0 for the flags when handling
> > the case where we get an error back from arm_spe_perf_aux_output_begin().
> 
> We can look at this another way.  If we do not call arm_spe_pmu_stop()
> in the interrupt handler and instead defer stopping the trace to
> arm_spe_pmu_del(), the PERF_EF_UPDATE flag is used.  This patch simply
> keeps the same behavior while stopping the trace earlier in the
> interrupt handler.  Make sense?

Yes, fair enough. Please post an updated patch.

Will