From nobody Sun Feb 8 11:26:30 2026
From: Frederic Weisbecker
To: Peter Zijlstra, Ingo Molnar
Cc: LKML, Frederic Weisbecker, "Liang, Kan", Adrian Hunter,
 Alexander Shishkin, Arnaldo Carvalho de Melo, Ian Rogers, Jiri Olsa,
 Mark Rutland, Namhyung Kim, Ravi Bangoria, linux-perf-users@vger.kernel.org
Subject: [PATCH 1/4] perf: Fix failing inherit_event() doing extra refcount decrement on parent
Date: Thu, 24 Apr 2025 18:11:25 +0200
Message-ID: <20250424161128.29176-2-frederic@kernel.org>
In-Reply-To: <20250424161128.29176-1-frederic@kernel.org>
References: <20250424161128.29176-1-frederic@kernel.org>

When inherit_event() fails after the child allocation but before the
parent refcount has been incremented, calling put_event() wrongly
decrements the reference to the parent, risking freeing it too early.

Note that pmu_get_event() can't concurrently hold a reference to the
child at this point, since it runs under a pmus_srcu critical section.

Fix this by restoring the deleted free_event() function and calling it
on the failing child, freeing it directly under the verified assumption
that its refcount is exactly 1. The refcount decrement on the parent is
then deliberately omitted.
Fixes: da916e96e2de ("perf: Make perf_pmu_unregister() useable")
Signed-off-by: Frederic Weisbecker
---
 kernel/events/core.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 07414cb1279b..7bcb02ffb93a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5627,6 +5627,22 @@ static void _free_event(struct perf_event *event)
 	__free_event(event);
 }
 
+/*
+ * Used to free events which have a known refcount of 1, such as in error paths
+ * of inherited events.
+ */
+static void free_event(struct perf_event *event)
+{
+	if (WARN(atomic_long_cmpxchg(&event->refcount, 1, 0) != 1,
+		 "unexpected event refcount: %ld; ptr=%p\n",
+		 atomic_long_read(&event->refcount), event)) {
+		/* leak to avoid use-after-free */
+		return;
+	}
+
+	_free_event(event);
+}
+
 /*
  * Remove user event from the owner task.
  */
@@ -14184,7 +14200,7 @@ inherit_event(struct perf_event *parent_event,
 
 	pmu_ctx = find_get_pmu_context(child_event->pmu, child_ctx, child_event);
 	if (IS_ERR(pmu_ctx)) {
-		put_event(child_event);
+		free_event(child_event);
 		return ERR_CAST(pmu_ctx);
 	}
 	child_event->pmu_ctx = pmu_ctx;
@@ -14199,7 +14215,7 @@ inherit_event(struct perf_event *parent_event,
 	if (is_orphaned_event(parent_event) ||
 	    !atomic_long_inc_not_zero(&parent_event->refcount)) {
 		mutex_unlock(&parent_event->child_mutex);
-		put_event(child_event);
+		free_event(child_event);
 		return NULL;
 	}
 
-- 
2.48.1
From nobody Sun Feb 8 11:26:30 2026
From: Frederic Weisbecker
To: Peter Zijlstra, Ingo Molnar
Cc: LKML, Frederic Weisbecker, "Liang, Kan", Adrian Hunter,
 Alexander Shishkin, Arnaldo Carvalho de Melo, Ian Rogers, Jiri Olsa,
 Mark Rutland, Namhyung Kim, Ravi Bangoria, linux-perf-users@vger.kernel.org
Subject: [PATCH 2/4] perf: Fix irq work dereferencing garbage
Date: Thu, 24 Apr 2025 18:11:26 +0200
Message-ID: <20250424161128.29176-3-frederic@kernel.org>
In-Reply-To: <20250424161128.29176-1-frederic@kernel.org>
References: <20250424161128.29176-1-frederic@kernel.org>

The following commit:

	da916e96e2de ("perf: Make perf_pmu_unregister() useable")

introduced two significant changes to the lifecycle of an event's
parent:

1) An event that has exited now has EVENT_TOMBSTONE as a parent.
   This can result in a situation where the delayed wakeup irq_work
   accidentally dereferences EVENT_TOMBSTONE:

   CPU 0                              CPU 1
   -----                              -----
   __schedule()
     local_irq_disable()
     rq_lock()
     perf_event_overflow()
       irq_work_queue(&child->pending_irq)
     perf_event_task_sched_out()
       raw_spin_lock(&ctx->lock)
       ctx_sched_out()
       ctx->is_active = 0
       event_sched_out(child)
       raw_spin_unlock(&ctx->lock)
                                      perf_event_release_kernel(parent)
                                        perf_remove_from_context(child)
                                          raw_spin_lock_irq(&ctx->lock)
                                          // Sees !ctx->is_active
                                          // Removes from context inline
                                          __perf_remove_from_context(child)
                                            perf_child_detach(child)
                                              event->parent = EVENT_TOMBSTONE
     raw_spin_rq_unlock_irq(rq);
   perf_pending_irq()
     perf_event_wakeup(child)
       ring_buffer_wakeup(child)
         rcu_dereference(child->parent->rb) <--- CRASH

   This also concerns the call to kill_fasync() on parent->fasync.

2) The final parent reference count decrement can now happen before
   the final child reference count decrement, i.e. the parent can now
   be freed before its child.
   On PREEMPT_RT, this can result in a situation where the delayed
   wakeup irq_work accidentally dereferences a freed parent:

   CPU 0                        CPU 1                        CPU 2
   -----                        -----                        -----
                                perf_pmu_unregister()
                                  pmu_detach_events()
                                    pmu_get_event()
                                      atomic_long_inc_not_zero(&child->refcount)

   perf_event_overflow()
     irq_work_queue(&child->pending_irq);

   irq_work_run()
     wake_irq_workd()

   preempt_schedule_irq()
   =========> SWITCH to workd

   irq_work_run_list()
     perf_pending_irq()
       perf_event_wakeup(child)
         ring_buffer_wakeup(child)
           event = child->parent
                                                             perf_event_release_kernel(parent)
                                                               // Not last ref, PMU holds it
                                put_event(child)
                                  // Last ref
                                  put_event(parent)
                                    free_event()
                                      call_rcu(...)
                                rcu_core()
                                  free_event_rcu()
           rcu_dereference(event->rb) <--- CRASH

   This also concerns the call to kill_fasync() on parent->fasync.

The "easy" solution to 1) is to check in perf_event_wakeup() that
event->parent is not EVENT_TOMBSTONE (covering both the ring buffer
and the fasync uses). The "easy" solution to 2) is to make
perf_event_wakeup() run entirely under rcu_read_lock(). However,
because of 2), sanity would then prescribe making event::parent an
__rcu pointer and annotating each and every user to prove they are
reliable.

Propose an alternate solution instead: keep the pointer to the parent
stable until all its children have called _free_event() themselves, so
that no further accident can happen. Also revert the EVENT_TOMBSTONE
design, which mostly serves to determine which caller of
perf_event_exit_event() must perform the refcount decrement on a child
event, matching the increment in inherit_event(). Arrange instead for
checking the attach state of an event prior to its removal, and
decrement the child's refcount accordingly.
Fixes: da916e96e2de ("perf: Make perf_pmu_unregister() useable")
Signed-off-by: Frederic Weisbecker
---
 kernel/events/core.c | 87 ++++++++++++++++++++++++--------------------
 1 file changed, 48 insertions(+), 39 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7bcb02ffb93a..968a1d14bc8b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -208,7 +208,6 @@ static void perf_ctx_unlock(struct perf_cpu_context *cpuctx,
 }
 
 #define TASK_TOMBSTONE ((void *)-1L)
-#define EVENT_TOMBSTONE ((void *)-1L)
 
 static bool is_kernel_event(struct perf_event *event)
 {
@@ -2338,12 +2337,6 @@ static void perf_child_detach(struct perf_event *event)
 
 	sync_child_event(event);
 	list_del_init(&event->child_list);
-	/*
-	 * Cannot set to NULL, as that would confuse the situation vs
-	 * not being a child event. See for example unaccount_event().
-	 */
-	event->parent = EVENT_TOMBSTONE;
-	put_event(parent_event);
 }
 
 static bool is_orphaned_event(struct perf_event *event)
@@ -2469,6 +2462,11 @@ ctx_time_update_event(struct perf_event_context *ctx, struct perf_event *event)
 #define DETACH_REVOKE	0x08UL
 #define DETACH_DEAD	0x10UL
 
+struct perf_remove_data {
+	unsigned int detach_flags;
+	unsigned int old_state;
+};
+
 /*
  * Cross CPU call to remove a performance event
  *
@@ -2483,28 +2481,30 @@ __perf_remove_from_context(struct perf_event *event,
 {
 	struct perf_event_pmu_context *pmu_ctx = event->pmu_ctx;
 	enum perf_event_state state = PERF_EVENT_STATE_OFF;
-	unsigned long flags = (unsigned long)info;
+	struct perf_remove_data *prd = info;
 
 	ctx_time_update(cpuctx, ctx);
 
+	prd->old_state = event->attach_state;
+
 	/*
 	 * Ensure event_sched_out() switches to OFF, at the very least
 	 * this avoids raising perf_pending_task() at this time.
 	 */
-	if (flags & DETACH_EXIT)
+	if (prd->detach_flags & DETACH_EXIT)
 		state = PERF_EVENT_STATE_EXIT;
-	if (flags & DETACH_REVOKE)
+	if (prd->detach_flags & DETACH_REVOKE)
 		state = PERF_EVENT_STATE_REVOKED;
-	if (flags & DETACH_DEAD) {
+	if (prd->detach_flags & DETACH_DEAD) {
 		event->pending_disable = 1;
 		state = PERF_EVENT_STATE_DEAD;
 	}
 	event_sched_out(event, ctx);
 	perf_event_set_state(event, min(event->state, state));
 
-	if (flags & DETACH_GROUP)
+	if (prd->detach_flags & DETACH_GROUP)
 		perf_group_detach(event);
-	if (flags & DETACH_CHILD)
+	if (prd->detach_flags & DETACH_CHILD)
 		perf_child_detach(event);
 	list_del_event(event, ctx);
 
@@ -2541,7 +2541,7 @@ __perf_remove_from_context(struct perf_event *event,
  * When called from perf_event_exit_task, it's OK because the
  * context has been detached from its task.
  */
-static void perf_remove_from_context(struct perf_event *event, unsigned long flags)
+static void perf_remove_from_context(struct perf_event *event, struct perf_remove_data *prd)
 {
 	struct perf_event_context *ctx = event->ctx;
 
@@ -2555,13 +2555,13 @@ static void perf_remove_from_context(struct perf_event *event, unsigned long fla
 	raw_spin_lock_irq(&ctx->lock);
 	if (!ctx->is_active) {
 		__perf_remove_from_context(event, this_cpu_ptr(&perf_cpu_context),
-					   ctx, (void *)flags);
+					   ctx, (void *)prd);
 		raw_spin_unlock_irq(&ctx->lock);
 		return;
 	}
 	raw_spin_unlock_irq(&ctx->lock);
 
-	event_function_call(event, __perf_remove_from_context, (void *)flags);
+	event_function_call(event, __perf_remove_from_context, (void *)prd);
 }
 
 /*
@@ -5705,7 +5705,7 @@ static void put_event(struct perf_event *event)
 	_free_event(event);
 
 	/* Matches the refcount bump in inherit_event() */
-	if (parent && parent != EVENT_TOMBSTONE)
+	if (parent)
 		put_event(parent);
 }
 
@@ -5718,6 +5718,7 @@ int perf_event_release_kernel(struct perf_event *event)
 {
 	struct perf_event_context *ctx = event->ctx;
 	struct perf_event *child, *tmp;
+	struct perf_remove_data prd = { .old_state = 0 };
 
 	/*
 	 * If we got here through err_alloc: free_event(event); we will not
@@ -5747,7 +5748,8 @@ int perf_event_release_kernel(struct perf_event *event)
 	 * child events.
 	 */
 	if (event->state > PERF_EVENT_STATE_REVOKED) {
-		perf_remove_from_context(event, DETACH_GROUP|DETACH_DEAD);
+		prd.detach_flags = DETACH_GROUP | DETACH_DEAD;
+		perf_remove_from_context(event, &prd);
 	} else {
 		event->state = PERF_EVENT_STATE_DEAD;
 	}
@@ -5789,7 +5791,8 @@ int perf_event_release_kernel(struct perf_event *event)
 		tmp = list_first_entry_or_null(&event->child_list,
 					       struct perf_event, child_list);
 		if (tmp == child) {
-			perf_remove_from_context(child, DETACH_GROUP | DETACH_CHILD);
+			prd.detach_flags = DETACH_GROUP | DETACH_CHILD;
+			perf_remove_from_context(child, &prd);
 		} else {
 			child = NULL;
 		}
@@ -13583,11 +13586,12 @@ SYSCALL_DEFINE5(perf_event_open,
 	 */
 
 	if (move_group) {
-		perf_remove_from_context(group_leader, 0);
+		struct perf_remove_data prd = { 0 };
+		perf_remove_from_context(group_leader, &prd);
 		put_pmu_ctx(group_leader->pmu_ctx);
 
 		for_each_sibling_event(sibling, group_leader) {
-			perf_remove_from_context(sibling, 0);
+			perf_remove_from_context(sibling, &prd);
 			put_pmu_ctx(sibling->pmu_ctx);
 		}
 
@@ -13789,14 +13793,15 @@ static void __perf_pmu_remove(struct perf_event_context *ctx,
 			      struct list_head *events)
 {
 	struct perf_event *event, *sibling;
+	struct perf_remove_data prd = { 0 };
 
 	perf_event_groups_for_cpu_pmu(event, groups, cpu, pmu) {
-		perf_remove_from_context(event, 0);
+		perf_remove_from_context(event, &prd);
 		put_pmu_ctx(event->pmu_ctx);
 		list_add(&event->migrate_entry, events);
 
 		for_each_sibling_event(sibling, event) {
-			perf_remove_from_context(sibling, 0);
+			perf_remove_from_context(sibling, &prd);
 			put_pmu_ctx(sibling->pmu_ctx);
 			list_add(&sibling->migrate_entry, events);
 		}
@@ -13921,11 +13926,7 @@ perf_event_exit_event(struct perf_event *event,
 		      struct perf_event_context *ctx, bool revoke)
 {
 	struct perf_event *parent_event = event->parent;
-	unsigned long detach_flags = DETACH_EXIT;
-	bool is_child = !!parent_event;
-
-	if (parent_event == EVENT_TOMBSTONE)
-		parent_event = NULL;
+	struct perf_remove_data prd = { .detach_flags = DETACH_EXIT };
 
 	if (parent_event) {
 		/*
@@ -13940,29 +13941,36 @@ perf_event_exit_event(struct perf_event *event,
 		 * Do destroy all inherited groups, we don't care about those
 		 * and being thorough is better.
 		 */
-		detach_flags |= DETACH_GROUP | DETACH_CHILD;
+		prd.detach_flags |= DETACH_GROUP | DETACH_CHILD;
 		mutex_lock(&parent_event->child_mutex);
 	}
 
 	if (revoke)
-		detach_flags |= DETACH_GROUP | DETACH_REVOKE;
+		prd.detach_flags |= DETACH_GROUP | DETACH_REVOKE;
 
-	perf_remove_from_context(event, detach_flags);
+	perf_remove_from_context(event, &prd);
 	/*
 	 * Child events can be freed.
 	 */
-	if (is_child) {
-		if (parent_event) {
-			mutex_unlock(&parent_event->child_mutex);
-			/*
-			 * Kick perf_poll() for is_event_hup();
-			 */
-			perf_event_wakeup(parent_event);
+	if (parent_event) {
+		mutex_unlock(&parent_event->child_mutex);
+		/*
+		 * Kick perf_poll() for is_event_hup();
+		 */
+		perf_event_wakeup(parent_event);
+
+		/*
+		 * Match the refcount initialization. Make sure it doesn't happen
+		 * twice if pmu_detach_event() calls it on an already exited task.
+		 */
+		if (prd.old_state & PERF_ATTACH_CHILD) {
 			/*
 			 * pmu_detach_event() will have an extra refcount.
+			 * perf_pending_task() might have one too.
 			 */
 			put_event(event);
 		}
+		return;
 	}
 
@@ -14532,13 +14540,14 @@ static void perf_swevent_init_cpu(unsigned int cpu)
 static void __perf_event_exit_context(void *__info)
 {
 	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
+	struct perf_remove_data prd = { .detach_flags = DETACH_GROUP };
 	struct perf_event_context *ctx = __info;
 	struct perf_event *event;
 
 	raw_spin_lock(&ctx->lock);
 	ctx_sched_out(ctx, NULL, EVENT_TIME);
 	list_for_each_entry(event, &ctx->event_list, event_entry)
-		__perf_remove_from_context(event, cpuctx, ctx, (void *)DETACH_GROUP);
+		__perf_remove_from_context(event, cpuctx, ctx, (void *)&prd);
 	raw_spin_unlock(&ctx->lock);
 }
 
-- 
2.48.1
From nobody Sun Feb 8 11:26:30 2026
From: Frederic Weisbecker
To: Peter Zijlstra, Ingo Molnar
Cc: LKML, Frederic Weisbecker, "Liang, Kan", Adrian Hunter,
 Alexander Shishkin, Arnaldo Carvalho de Melo, Ian Rogers, Jiri Olsa,
 Mark Rutland, Namhyung Kim, Ravi Bangoria, linux-perf-users@vger.kernel.org
Subject: [PATCH 3/4] perf: Remove too early and redundant CPU hotplug handling
Date: Thu, 24 Apr 2025 18:11:27 +0200
Message-ID: <20250424161128.29176-4-frederic@kernel.org>
In-Reply-To: <20250424161128.29176-1-frederic@kernel.org>
References: <20250424161128.29176-1-frederic@kernel.org>

The CPU hotplug handlers are called twice: at the prepare and at the
online stage. Their role is to:

1) Enable/disable a CPU context. This is irrelevant and even buggy at
   the prepare stage because the CPU is still offline. On early
   secondary CPU up, creating an event attached to that CPU might
   silently fail because the CPU context is observed as online but the
   context installation's IPI failure is ignored.

2) Update the scope cpumasks and re-migrate the events accordingly in
   the CPU down case. This is irrelevant at the prepare stage.

3) Remove the events attached to the context of the offlining CPU. It
   even uses an (unnecessary) IPI for it. This is also irrelevant at
   the prepare stage.

Also, none of the *_PREPARE and *_STARTING architecture perf-related
CPU hotplug callbacks rely on CPUHP_PERF_PREPARE. CPUHP_AP_PERF_ONLINE
is enough and the right place to perform the work.

Signed-off-by: Frederic Weisbecker
---
 include/linux/cpuhotplug.h | 1 -
 kernel/cpu.c               | 5 -----
 2 files changed, 6 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 1987400000b4..df366ee15456 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -60,7 +60,6 @@ enum cpuhp_state {
 	/* PREPARE section invoked on a control CPU */
 	CPUHP_OFFLINE = 0,
 	CPUHP_CREATE_THREADS,
-	CPUHP_PERF_PREPARE,
 	CPUHP_PERF_X86_PREPARE,
 	CPUHP_PERF_X86_AMD_UNCORE_PREP,
 	CPUHP_PERF_POWER,
diff --git a/kernel/cpu.c b/kernel/cpu.c
index b08bb34b1718..a59e009e0be4 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2069,11 +2069,6 @@ static struct cpuhp_step cpuhp_hp_states[] = {
 		.teardown.single	= NULL,
 		.cant_stop		= true,
 	},
-	[CPUHP_PERF_PREPARE] = {
-		.name			= "perf:prepare",
-		.startup.single		= perf_event_init_cpu,
-		.teardown.single	= perf_event_exit_cpu,
-	},
 	[CPUHP_RANDOM_PREPARE] = {
 		.name			= "random:prepare",
 		.startup.single		= random_prepare_cpu,
-- 
2.48.1
From nobody Sun Feb 8 11:26:30 2026
From: Frederic Weisbecker
To: Peter Zijlstra, Ingo Molnar
Cc: LKML, Frederic Weisbecker, "Liang, Kan", Adrian Hunter,
 Alexander Shishkin, Arnaldo Carvalho de Melo, Ian Rogers, Jiri Olsa,
 Mark Rutland, Namhyung Kim, Ravi Bangoria, linux-perf-users@vger.kernel.org
Subject: [PATCH 4/4] perf: Fix confusing aux iteration
Date: Thu, 24 Apr 2025 18:11:28 +0200
Message-ID: <20250424161128.29176-5-frederic@kernel.org>
In-Reply-To: <20250424161128.29176-1-frederic@kernel.org>
References: <20250424161128.29176-1-frederic@kernel.org>

While an event tears down all links to it as an aux, the iteration
happens on the event's group leader instead of the event itself. If
the event is a group leader, this has no effect because the event is
also its own group leader. But otherwise there would be a risk of
detaching all the sibling events from the wrong group leader.

It only happens to work because each sibling's aux link is tested
against the right event before proceeding. Also, the ctx lock is the
same for the events and their group leader, so the iteration is safe.

Yet the iteration is confusing. Clarify the actual intent.

Signed-off-by: Frederic Weisbecker
---
 kernel/events/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 968a1d14bc8b..0d25bde536c9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2171,7 +2171,7 @@ static void perf_put_aux_event(struct perf_event *event)
 	 * If the event is an aux_event, tear down all links to
 	 * it from other events.
 	 */
-	for_each_sibling_event(iter, event->group_leader) {
+	for_each_sibling_event(iter, event) {
 		if (iter->aux_event != event)
 			continue;
 
-- 
2.48.1