From nobody Tue Oct 7 22:35:59 2025
From: Wander Lairson Costa
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Masami Hiramatsu, Mathieu Desnoyers,
 David Woodhouse, Boqun Feng, Thomas Gleixner, Wander Lairson Costa,
 linux-kernel@vger.kernel.org (open list),
 linux-trace-kernel@vger.kernel.org (open list:TRACING)
Cc: Arnaldo Carvalho de Melo, Clark Williams, Gabriele Monaco
Subject: [PATCH v3 1/2] trace/preemptirq: reduce overhead of irq_enable/disable tracepoints
Date: Fri, 4 Jul 2025 14:07:42 -0300
Message-ID: <20250704170748.97632-2-wander@redhat.com>
In-Reply-To: <20250704170748.97632-1-wander@redhat.com>
References: <20250704170748.97632-1-wander@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

The irqsoff tracer is rarely enabled in production systems due to the
non-negligible overhead it introduces, even when unused. The overhead
comes from local_irq_enable/disable() always calling
trace_hardirqs_on/off(), which only then evaluate the tracepoint static
key.

This patch reduces the overhead in the common case where the tracepoint
is disabled by performing the static key check earlier, avoiding the
call to trace_hardirqs_on/off() entirely. This makes the impact of
disabled preemptirq IRQ tracing negligible in performance-sensitive
environments.

We also move the atomic.h include from tracepoint-defs.h to
tracepoint.h due to a circular dependency when building for 32-bit ARM.
The failure occurs because the new logic in irqflags.h calls
tracepoint_enabled(), which requires the tracepoint-defs.h header file.
This header, in turn, includes atomic.h.
On ARM32, the include path for kernel/bounds.c creates a circular
dependency:

  atomic.h -> cmpxchg.h -> irqflags.h -> tracepoint.h -> atomic.h

Signed-off-by: Wander Lairson Costa
Suggested-by: Steven Rostedt
Cc: Arnaldo Carvalho de Melo
Cc: Clark Williams
Cc: Gabriele Monaco
Cc: Juri Lelli
---
 include/linux/irqflags.h        | 30 +++++++++++++++++++++---------
 include/linux/tracepoint-defs.h |  1 -
 include/linux/tracepoint.h      |  1 +
 kernel/trace/trace_preemptirq.c |  3 +++
 4 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 57b074e0cfbb..40e456fa3d10 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 
 struct task_struct;
 
@@ -197,9 +198,17 @@ extern void warn_bogus_irq_restore(void);
  */
 #ifdef CONFIG_TRACE_IRQFLAGS
 
+DECLARE_TRACEPOINT(irq_enable);
+DECLARE_TRACEPOINT(irq_disable);
+
+#define __trace_enabled(tp)				\
+	(IS_ENABLED(CONFIG_PROVE_LOCKING) ||		\
+	 tracepoint_enabled(tp))
+
 #define local_irq_enable()				\
 	do {						\
-		trace_hardirqs_on();			\
+		if (__trace_enabled(irq_enable))	\
+			trace_hardirqs_on();		\
 		raw_local_irq_enable();			\
 	} while (0)
 
@@ -207,31 +216,34 @@ extern void warn_bogus_irq_restore(void);
 	do {						\
 		bool was_disabled = raw_irqs_disabled();\
 		raw_local_irq_disable();		\
-		if (!was_disabled)			\
+		if (__trace_enabled(irq_disable) &&	\
+		    !was_disabled)			\
 			trace_hardirqs_off();		\
 	} while (0)
 
 #define local_irq_save(flags)				\
 	do {						\
 		raw_local_irq_save(flags);		\
-		if (!raw_irqs_disabled_flags(flags))	\
+		if (__trace_enabled(irq_disable) &&	\
+		    !raw_irqs_disabled_flags(flags))	\
 			trace_hardirqs_off();		\
 	} while (0)
 
 #define local_irq_restore(flags)			\
 	do {						\
-		if (!raw_irqs_disabled_flags(flags))	\
+		if (__trace_enabled(irq_enable) &&	\
+		    !raw_irqs_disabled_flags(flags))	\
 			trace_hardirqs_on();		\
 		raw_local_irq_restore(flags);		\
 	} while (0)
 
-#define safe_halt()				\
-	do {					\
-		trace_hardirqs_on();		\
-		raw_safe_halt();		\
+#define safe_halt()					\
+	do {						\
+		if (__trace_enabled(irq_enable))	\
+			trace_hardirqs_on();		\
+		raw_safe_halt();			\
 	} while (0)
 
-
 #else /* !CONFIG_TRACE_IRQFLAGS */
 
 #define local_irq_enable()	do { raw_local_irq_enable(); } while (0)
diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index aebf0571c736..cb1f15a4e43f 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -8,7 +8,6 @@
  * trace_print_flags{_u64}. Otherwise linux/tracepoint.h should be used.
  */
 
-#include
 #include
 
 struct static_call_key;
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 826ce3f8e1f8..2fd91ef49b7f 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 struct module;
 struct tracepoint;
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
index 0c42b15c3800..90ee65db4516 100644
--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -111,6 +111,9 @@ void trace_hardirqs_off(void)
 }
 EXPORT_SYMBOL(trace_hardirqs_off);
 NOKPROBE_SYMBOL(trace_hardirqs_off);
+
+EXPORT_TRACEPOINT_SYMBOL(irq_disable);
+EXPORT_TRACEPOINT_SYMBOL(irq_enable);
 #endif /* CONFIG_TRACE_IRQFLAGS */
 
 #ifdef CONFIG_TRACE_PREEMPT_TOGGLE
-- 
2.50.0

From nobody Tue Oct 7 22:35:59 2025
From: Wander Lairson Costa
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Masami Hiramatsu, Mathieu Desnoyers,
 David Woodhouse, Thomas Gleixner, Wander Lairson Costa, Boqun Feng,
 linux-kernel@vger.kernel.org (open list),
 linux-trace-kernel@vger.kernel.org (open list:TRACING)
Cc: Arnaldo Carvalho de Melo, Clark Williams, Gabriele Monaco
Subject: [PATCH v3 2/2] tracing/preemptirq: Optimize preempt_disable/enable() tracepoint overhead
Date: Fri, 4 Jul 2025 14:07:43 -0300
Message-ID: <20250704170748.97632-3-wander@redhat.com>
In-Reply-To: <20250704170748.97632-1-wander@redhat.com>
References: <20250704170748.97632-1-wander@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Similar to the IRQ tracepoint, the preempt tracepoints are typically
disabled in production systems due to the significant overhead they
introduce even when not in use.
The overhead primarily comes from two sources: First, when tracepoints
are compiled into the kernel, preempt_count_add() and
preempt_count_sub() become external function calls rather than inlined
operations. Second, these functions perform unnecessary preempt_count()
checks even when the tracepoint itself is disabled.

This optimization introduces an early check of the tracepoint static
key, which allows us to skip both the function call overhead and the
redundant preempt_count() checks when tracing is disabled. The change
maintains all existing functionality when tracing is active while
significantly reducing overhead for the common case where tracing is
inactive.

Signed-off-by: Wander Lairson Costa
Suggested-by: Steven Rostedt
Cc: Arnaldo Carvalho de Melo
Cc: Clark Williams
Cc: Gabriele Monaco
Cc: Juri Lelli
---
 include/linux/preempt.h         | 35 ++++++++++++++++++++++++++++++---
 kernel/sched/core.c             | 12 +-----------
 kernel/trace/trace_preemptirq.c | 19 ++++++++++++++++++
 3 files changed, 52 insertions(+), 14 deletions(-)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index b0af8d4ef6e6..d13c755cd934 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * We put the hardirq and softirq counter into the preemption
@@ -191,17 +192,45 @@ static __always_inline unsigned char interrupt_context_level(void)
  */
 #define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
 
-#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
+#if defined(CONFIG_DEBUG_PREEMPT)
 extern void preempt_count_add(int val);
 extern void preempt_count_sub(int val);
-#define preempt_count_dec_and_test() \
-	({ preempt_count_sub(1); should_resched(0); })
+#elif defined(CONFIG_TRACE_PREEMPT_TOGGLE)
+extern void __trace_preempt_on(void);
+extern void __trace_preempt_off(void);
+
+DECLARE_TRACEPOINT(preempt_enable);
+DECLARE_TRACEPOINT(preempt_disable);
+
+#define __preempt_trace_enabled(type) \
+	(tracepoint_enabled(preempt_##type) && preempt_count() == val)
+
+static inline void preempt_count_add(int val)
+{
+	__preempt_count_add(val);
+
+	if (__preempt_trace_enabled(disable))
+		__trace_preempt_off();
+}
+
+static inline void preempt_count_sub(int val)
+{
+	if (__preempt_trace_enabled(enable))
+		__trace_preempt_on();
+
+	__preempt_count_sub(val);
+}
 #else
 #define preempt_count_add(val)	__preempt_count_add(val)
 #define preempt_count_sub(val)	__preempt_count_sub(val)
 #define preempt_count_dec_and_test() __preempt_count_dec_and_test()
 #endif
 
+#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
+#define preempt_count_dec_and_test() \
+	({ preempt_count_sub(1); should_resched(0); })
+#endif
+
 #define __preempt_count_inc() __preempt_count_add(1)
 #define __preempt_count_dec() __preempt_count_sub(1)
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8988d38d46a3..4feba4738d79 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5840,8 +5840,7 @@ static inline void sched_tick_start(int cpu) { }
 static inline void sched_tick_stop(int cpu) { }
 #endif
 
-#if defined(CONFIG_PREEMPTION) && (defined(CONFIG_DEBUG_PREEMPT) || \
-				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
+#if defined(CONFIG_PREEMPTION) && defined(CONFIG_DEBUG_PREEMPT)
 /*
  * If the value passed in is equal to the current preempt count
  * then we just disabled preemption. Start timing the latency.
@@ -5850,30 +5849,24 @@ static inline void preempt_latency_start(int val)
 {
 	if (preempt_count() == val) {
 		unsigned long ip = get_lock_parent_ip();
-#ifdef CONFIG_DEBUG_PREEMPT
 		current->preempt_disable_ip = ip;
-#endif
 		trace_preempt_off(CALLER_ADDR0, ip);
 	}
 }
 
 void preempt_count_add(int val)
 {
-#ifdef CONFIG_DEBUG_PREEMPT
 	/*
 	 * Underflow?
 	 */
 	if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
 		return;
-#endif
 	__preempt_count_add(val);
-#ifdef CONFIG_DEBUG_PREEMPT
 	/*
 	 * Spinlock count overflowing soon?
 	 */
 	DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
 				PREEMPT_MASK - 10);
-#endif
 	preempt_latency_start(val);
 }
 EXPORT_SYMBOL(preempt_count_add);
@@ -5891,7 +5884,6 @@ static inline void preempt_latency_stop(int val)
 
 void preempt_count_sub(int val)
 {
-#ifdef CONFIG_DEBUG_PREEMPT
 	/*
 	 * Underflow?
 	 */
@@ -5903,14 +5895,12 @@ void preempt_count_sub(int val)
 	if (DEBUG_LOCKS_WARN_ON((val < PREEMPT_MASK) &&
 			!(preempt_count() & PREEMPT_MASK)))
 		return;
-#endif
 
 	preempt_latency_stop(val);
 	__preempt_count_sub(val);
 }
 EXPORT_SYMBOL(preempt_count_sub);
 NOKPROBE_SYMBOL(preempt_count_sub);
-
 #else
 static inline void preempt_latency_start(int val) { }
 static inline void preempt_latency_stop(int val) { }
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
index 90ee65db4516..deb2428b34a2 100644
--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -118,6 +118,25 @@ EXPORT_TRACEPOINT_SYMBOL(irq_enable);
 
 #ifdef CONFIG_TRACE_PREEMPT_TOGGLE
 
+#if !defined(CONFIG_DEBUG_PREEMPT)
+EXPORT_SYMBOL(__tracepoint_preempt_disable);
+EXPORT_SYMBOL(__tracepoint_preempt_enable);
+
+void __trace_preempt_on(void)
+{
+	trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
+}
+EXPORT_SYMBOL(__trace_preempt_on);
+NOKPROBE_SYMBOL(__trace_preempt_on);
+
+void __trace_preempt_off(void)
+{
+	trace_preempt_off(CALLER_ADDR0, get_lock_parent_ip());
+}
+EXPORT_SYMBOL(__trace_preempt_off);
+NOKPROBE_SYMBOL(__trace_preempt_off);
+#endif /* !CONFIG_DEBUG_PREEMPT */
+
 void trace_preempt_on(unsigned long a0, unsigned long a1)
 {
 	trace(preempt_enable, TP_ARGS(a0, a1));
-- 
2.50.0
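
Both patches in this series rely on the same idea: test a cheap inline
condition first, and only then pay for the out-of-line tracing call. The
userspace sketch below illustrates that control flow only. It rests on
loud assumptions: a plain bool stands in for the tracepoint static key
(the kernel patches a branch in place rather than reading a variable),
and every demo_* name is hypothetical and does not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* ASSUMPTION: plain flag standing in for a tracepoint static key. */
static bool demo_irq_enable_key;

/* Counts how many times the out-of-line hook was actually entered. */
static unsigned long demo_hook_calls;

/* Hypothetical stand-in for trace_hardirqs_on(): the costly call. */
static void demo_trace_hardirqs_on(void)
{
	demo_hook_calls++;
}

/* Mirrors the shape of __trace_enabled(): a cheap inline test. */
#define demo_trace_enabled(key)	(key)

/* Mirrors the patched local_irq_enable(): check first, call second. */
static inline void demo_local_irq_enable(void)
{
	if (demo_trace_enabled(demo_irq_enable_key))
		demo_trace_hardirqs_on();
	/* raw_local_irq_enable() would follow here in the kernel. */
}

/* Runs n enable operations and returns how many reached the hook. */
unsigned long demo_run(bool tracing, int n)
{
	assert(n >= 0);
	demo_irq_enable_key = tracing;
	demo_hook_calls = 0;
	for (int i = 0; i < n; i++)
		demo_local_irq_enable();
	return demo_hook_calls;
}
```

With the flag cleared, demo_run(false, n) never enters the hook; with it
set, every iteration does. In the kernel the disabled case is cheaper
still, since a static key compiles down to a patched jump or no-op
rather than a memory load.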