[PATCH] perf: Fix the throttle error of some clock events

kan.liang@linux.intel.com posted 1 patch 8 months, 2 weeks ago
There is a newer version of this series
include/linux/perf_event.h |  1 +
kernel/events/core.c       | 23 ++++++++++++++++++++---
2 files changed, 21 insertions(+), 3 deletions(-)
[PATCH] perf: Fix the throttle error of some clock events
Posted by kan.liang@linux.intel.com 8 months, 2 weeks ago
From: Kan Liang <kan.liang@linux.intel.com>

The Arm CI reports RCU stall, which can be reproduced by the below perf
command.
  perf record -a -e cpu-clock -- sleep 2

The cpu-clock and task_clock are two special SW events, which rely on
the hrtimer. Instead of invoking the stop(), the HRTIMER_NORESTART is
returned to stop the timer. Because the hrtimer interrupt handler cannot
cancel itself, which causes infinite loop.

There may be two ways to fix it.
- Add a check of MAX_INTERRUPTS in the event_stop. Return immediately if
the stop is invoked by the throttle.
- Introduce a PMU flag to track the case. Avoid the event_stop in
perf_event_throttle() if the flag is detected.

The latter looks more generic. It may be used if there are more other
cases that want to avoid the stop later. The latter is implemented.

Reported-by: Leo Yan <leo.yan@arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 23 ++++++++++++++++++++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 947ad12dfdbe..66f02f46595c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -303,6 +303,7 @@ struct perf_event_pmu_context;
 #define PERF_PMU_CAP_AUX_OUTPUT			0x0080
 #define PERF_PMU_CAP_EXTENDED_HW_TYPE		0x0100
 #define PERF_PMU_CAP_AUX_PAUSE			0x0200
+#define PERF_PMU_CAP_NO_THROTTLE_STOP		0x0400
 
 /**
  * pmu::scope
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8327ab0ee641..4df274705038 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2655,7 +2655,22 @@ static void perf_event_unthrottle(struct perf_event *event, bool start)
 
 static void perf_event_throttle(struct perf_event *event)
 {
-	event->pmu->stop(event, 0);
+	/*
+	 * Some PMUs, e.g., cpu-clock and task_clock, may rely on
+	 * a special mechanism (hrtimer) to manipulate counters.
+	 * The regular stop doesn't work, since the hrtimer interrupt
+	 * handler cannot cancel itself.
+	 *
+	 * The stop should be avoided for such cases. Let the
+	 * driver-specific code handle it.
+	 *
+	 * The counters will eventually be disabled in the driver-specific
+	 * code. In unthrottle, they still need to be re-enabled.
+	 * There is no handling for PERF_PMU_CAP_NO_THROTTLE_STOP in
+	 * the perf_event_unthrottle().
+	 */
+	if (!(event->pmu->capabilities & PERF_PMU_CAP_NO_THROTTLE_STOP))
+		event->pmu->stop(event, 0);
 	event->hw.interrupts = MAX_INTERRUPTS;
 	perf_log_throttle(event, 0);
 }
@@ -11846,7 +11861,8 @@ static int cpu_clock_event_init(struct perf_event *event)
 static struct pmu perf_cpu_clock = {
 	.task_ctx_nr	= perf_sw_context,
 
-	.capabilities	= PERF_PMU_CAP_NO_NMI,
+	.capabilities	= PERF_PMU_CAP_NO_NMI |
+			  PERF_PMU_CAP_NO_THROTTLE_STOP,
 	.dev		= PMU_NULL_DEV,
 
 	.event_init	= cpu_clock_event_init,
@@ -11928,7 +11944,8 @@ static int task_clock_event_init(struct perf_event *event)
 static struct pmu perf_task_clock = {
 	.task_ctx_nr	= perf_sw_context,
 
-	.capabilities	= PERF_PMU_CAP_NO_NMI,
+	.capabilities	= PERF_PMU_CAP_NO_NMI |
+			  PERF_PMU_CAP_NO_THROTTLE_STOP,
 	.dev		= PMU_NULL_DEV,
 
 	.event_init	= task_clock_event_init,
-- 
2.38.1
Re: [PATCH] perf: Fix the throttle error of some clock events
Posted by Leo Yan 8 months, 2 weeks ago
On Wed, May 28, 2025 at 07:48:23AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The Arm CI reports RCU stall, which can be reproduced by the below perf
> command.
>   perf record -a -e cpu-clock -- sleep 2
> 
> The cpu-clock and task_clock are two special SW events, which rely on
> the hrtimer. Instead of invoking the stop(), the HRTIMER_NORESTART is
> returned to stop the timer. Because the hrtimer interrupt handler cannot
> cancel itself, which causes infinite loop.
> 
> There may be two ways to fix it.
> - Add a check of MAX_INTERRUPTS in the event_stop. Return immediately if
> the stop is invoked by the throttle.
> - Introduce a PMU flag to track the case. Avoid the event_stop in
> perf_event_throttle() if the flag is detected.
> 
> The latter looks more generic. It may be used if there are more other
> cases that want to avoid the stop later. The latter is implemented.
> 
> Reported-by: Leo Yan <leo.yan@arm.com>
> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>

Thanks for adding me and my colleague's name!

> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
>  include/linux/perf_event.h |  1 +
>  kernel/events/core.c       | 23 ++++++++++++++++++++---
>  2 files changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 947ad12dfdbe..66f02f46595c 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -303,6 +303,7 @@ struct perf_event_pmu_context;
>  #define PERF_PMU_CAP_AUX_OUTPUT			0x0080
>  #define PERF_PMU_CAP_EXTENDED_HW_TYPE		0x0100
>  #define PERF_PMU_CAP_AUX_PAUSE			0x0200
> +#define PERF_PMU_CAP_NO_THROTTLE_STOP		0x0400

When applying the patch on the top of tip/sched/core, I found it is
conflict with a new added PERF_PMU_CAP_AUX_PREFER_LARGE.  So need to
change the PERF_PMU_CAP_NO_THROTTLE_STOP to 0x0800.

With the change, I tested on Arm64 board with two commands:

  perf record -a -e cpu-clock -- sleep 2
  perf test "perftool-testsuite_report"

Tested-by: Leo Yan <leo.yan@arm.com>

Thanks for quick fixing.

>  /**
>   * pmu::scope
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 8327ab0ee641..4df274705038 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2655,7 +2655,22 @@ static void perf_event_unthrottle(struct perf_event *event, bool start)
>  
>  static void perf_event_throttle(struct perf_event *event)
>  {
> -	event->pmu->stop(event, 0);
> +	/*
> +	 * Some PMUs, e.g., cpu-clock and task_clock, may rely on
> +	 * a special mechanism (hrtimer) to manipulate counters.
> +	 * The regular stop doesn't work, since the hrtimer interrupt
> +	 * handler cannot cancel itself.
> +	 *
> +	 * The stop should be avoided for such cases. Let the
> +	 * driver-specific code handle it.
> +	 *
> +	 * The counters will eventually be disabled in the driver-specific
> +	 * code. In unthrottle, they still need to be re-enabled.
> +	 * There is no handling for PERF_PMU_CAP_NO_THROTTLE_STOP in
> +	 * the perf_event_unthrottle().
> +	 */
> +	if (!(event->pmu->capabilities & PERF_PMU_CAP_NO_THROTTLE_STOP))
> +		event->pmu->stop(event, 0);
>  	event->hw.interrupts = MAX_INTERRUPTS;
>  	perf_log_throttle(event, 0);
>  }
> @@ -11846,7 +11861,8 @@ static int cpu_clock_event_init(struct perf_event *event)
>  static struct pmu perf_cpu_clock = {
>  	.task_ctx_nr	= perf_sw_context,
>  
> -	.capabilities	= PERF_PMU_CAP_NO_NMI,
> +	.capabilities	= PERF_PMU_CAP_NO_NMI |
> +			  PERF_PMU_CAP_NO_THROTTLE_STOP,
>  	.dev		= PMU_NULL_DEV,
>  
>  	.event_init	= cpu_clock_event_init,
> @@ -11928,7 +11944,8 @@ static int task_clock_event_init(struct perf_event *event)
>  static struct pmu perf_task_clock = {
>  	.task_ctx_nr	= perf_sw_context,
>  
> -	.capabilities	= PERF_PMU_CAP_NO_NMI,
> +	.capabilities	= PERF_PMU_CAP_NO_NMI |
> +			  PERF_PMU_CAP_NO_THROTTLE_STOP,
>  	.dev		= PMU_NULL_DEV,
>  
>  	.event_init	= task_clock_event_init,
> -- 
> 2.38.1
>