Date: Tue, 08 Apr 2025 19:05:00 -0000
From: "tip-bot2 for Peter Zijlstra"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Cc: "Peter Zijlstra (Intel)" , Ravi Bangoria , x86@kernel.org, linux-kernel@vger.kernel.org
Subject: [tip: perf/core] perf: Make perf_pmu_unregister() useable
In-Reply-To: <20250307193723.525402029@infradead.org>
References: <20250307193723.525402029@infradead.org>
Message-ID: <174413910060.31282.8504333237561774159.tip-bot2@tip-bot2>

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     da916e96e2dedcb2d40de77a7def833d315b81a6
Gitweb:        https://git.kernel.org/tip/da916e96e2dedcb2d40de77a7def833d315b81a6
Author:        Peter Zijlstra
AuthorDate:    Fri, 25 Oct 2024 10:21:41 +02:00
Committer:     Peter Zijlstra
CommitterDate: Tue, 08 Apr 2025 20:55:48 +02:00

perf: Make perf_pmu_unregister() useable

Previously it was only safe to call perf_pmu_unregister() if there
were no active events of that pmu around -- which was impossible to
guarantee since it races all sorts against perf_init_event().

Rework the whole thing by:

 - keeping track of all events for a given pmu

 - 'hiding' the pmu from perf_init_event()

 - waiting for the appropriate (s)rcu grace periods such that all
   prior references to the PMU will be completed

 - detaching all still existing events of that pmu (see first point)
   and moving them to a new REVOKED state.

 - actually freeing the pmu data.

Where notably the new REVOKED state must inhibit all event actions from
reaching code that wants to use event->pmu.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Ravi Bangoria
Link: https://lkml.kernel.org/r/20250307193723.525402029@infradead.org
---
 include/linux/perf_event.h |  15 +-
 kernel/events/core.c       | 320 ++++++++++++++++++++++++++++++------
 2 files changed, 280 insertions(+), 55 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0069ba6..7f49a58 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -325,6 +325,9 @@ struct perf_output_handle;
 struct pmu {
 	struct list_head		entry;
 
+	spinlock_t			events_lock;
+	struct list_head		events;
+
 	struct module			*module;
 	struct device			*dev;
 	struct device			*parent;
@@ -622,9 +625,10 @@ struct perf_addr_filter_range {
  * enum perf_event_state - the states of an event:
  */
 enum perf_event_state {
-	PERF_EVENT_STATE_DEAD		= -4,
-	PERF_EVENT_STATE_EXIT		= -3,
-	PERF_EVENT_STATE_ERROR		= -2,
+	PERF_EVENT_STATE_DEAD		= -5,
+	PERF_EVENT_STATE_REVOKED	= -4, /* pmu gone, must not touch */
+	PERF_EVENT_STATE_EXIT		= -3, /* task died, still inherit */
+	PERF_EVENT_STATE_ERROR		= -2, /* scheduling error, can enable */
 	PERF_EVENT_STATE_OFF		= -1,
 	PERF_EVENT_STATE_INACTIVE	=  0,
 	PERF_EVENT_STATE_ACTIVE		=  1,
@@ -865,6 +869,7 @@ struct perf_event {
 	void *security;
 #endif
 	struct list_head		sb_list;
+	struct list_head		pmu_list;
 
 	/*
 	 * Certain events gets forwarded to another pmu internally by over-
@@ -1155,7 +1160,7 @@ extern void perf_aux_output_flag(struct perf_output_handle *handle, u64 flags);
 extern void perf_event_itrace_started(struct perf_event *event);
 
 extern int perf_pmu_register(struct pmu *pmu, const char *name, int type);
-extern void perf_pmu_unregister(struct pmu *pmu);
+extern int perf_pmu_unregister(struct pmu *pmu);
 
 extern void __perf_event_task_sched_in(struct task_struct *prev,
				       struct task_struct *task);
@@ -1760,7 +1765,7 @@ static inline bool needs_branch_stack(struct perf_event *event)
 
 static inline bool has_aux(struct perf_event *event)
 {
-	return event->pmu->setup_aux;
+	return event->pmu && event->pmu->setup_aux;
 }
 
 static inline bool has_aux_action(struct perf_event *event)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 985b5c7..2eb9cd5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -208,6 +208,7 @@ static void perf_ctx_unlock(struct perf_cpu_context *cpuctx,
 }
 
 #define TASK_TOMBSTONE	((void *)-1L)
+#define EVENT_TOMBSTONE	((void *)-1L)
 
 static bool is_kernel_event(struct perf_event *event)
 {
@@ -2336,6 +2337,11 @@ static void perf_child_detach(struct perf_event *event)
 
 	sync_child_event(event);
 	list_del_init(&event->child_list);
+	/*
+	 * Cannot set to NULL, as that would confuse the situation vs
+	 * not being a child event. See for example unaccount_event().
+	 */
+	event->parent = EVENT_TOMBSTONE;
 }
 
 static bool is_orphaned_event(struct perf_event *event)
@@ -2457,8 +2463,9 @@ ctx_time_update_event(struct perf_event_context *ctx, struct perf_event *event)
 
 #define DETACH_GROUP	0x01UL
 #define DETACH_CHILD	0x02UL
-#define DETACH_DEAD	0x04UL
-#define DETACH_EXIT	0x08UL
+#define DETACH_EXIT	0x04UL
+#define DETACH_REVOKE	0x08UL
+#define DETACH_DEAD	0x10UL
 
 /*
  * Cross CPU call to remove a performance event
@@ -2484,18 +2491,21 @@ __perf_remove_from_context(struct perf_event *event,
 	 */
 	if (flags & DETACH_EXIT)
 		state = PERF_EVENT_STATE_EXIT;
+	if (flags & DETACH_REVOKE)
+		state = PERF_EVENT_STATE_REVOKED;
 	if (flags & DETACH_DEAD) {
 		event->pending_disable = 1;
 		state = PERF_EVENT_STATE_DEAD;
 	}
 	event_sched_out(event, ctx);
-	perf_event_set_state(event, min(event->state, state));
 	if (flags & DETACH_GROUP)
 		perf_group_detach(event);
 	if (flags & DETACH_CHILD)
 		perf_child_detach(event);
 	list_del_event(event, ctx);
 
+	event->state = min(event->state, state);
+
 	if (!pmu_ctx->nr_events) {
 		pmu_ctx->rotate_necessary = 0;
 
@@ -4523,7 +4533,8 @@ out:
 
 static void perf_remove_from_owner(struct perf_event *event);
 static void perf_event_exit_event(struct perf_event *event,
-				  struct perf_event_context *ctx);
+				  struct perf_event_context *ctx,
+				  bool revoke);
 
 /*
  * Removes all events from the current task that have been marked
@@ -4550,7 +4561,7 @@ static void perf_event_remove_on_exec(struct perf_event_context *ctx)
 
 		modified = true;
 
-		perf_event_exit_event(event, ctx);
+		perf_event_exit_event(event, ctx, false);
 	}
 
 	raw_spin_lock_irqsave(&ctx->lock, flags);
@@ -5132,6 +5143,7 @@ static bool is_sb_event(struct perf_event *event)
 	    attr->context_switch || attr->text_poke ||
 	    attr->bpf_event)
 		return true;
+
 	return false;
 }
 
@@ -5528,6 +5540,8 @@ static void perf_free_addr_filters(struct perf_event *event);
 /* vs perf_event_alloc() error */
 static void __free_event(struct perf_event *event)
 {
+	struct pmu *pmu = event->pmu;
+
 	if (event->attach_state & PERF_ATTACH_CALLCHAIN)
 		put_callchain_buffers();
 
@@ -5557,6 +5571,7 @@ static void __free_event(struct perf_event *event)
 	 * put_pmu_ctx() needs an event->ctx reference, because of
 	 * epc->ctx.
 	 */
+	WARN_ON_ONCE(!pmu);
 	WARN_ON_ONCE(!event->ctx);
 	WARN_ON_ONCE(event->pmu_ctx->ctx != event->ctx);
 	put_pmu_ctx(event->pmu_ctx);
@@ -5569,8 +5584,13 @@ static void __free_event(struct perf_event *event)
 	if (event->ctx)
 		put_ctx(event->ctx);
 
-	if (event->pmu)
-		module_put(event->pmu->module);
+	if (pmu) {
+		module_put(pmu->module);
+		scoped_guard (spinlock, &pmu->events_lock) {
+			list_del(&event->pmu_list);
+			wake_up_var(pmu);
+		}
+	}
 
 	call_rcu(&event->rcu_head, free_event_rcu);
 }
@@ -5606,22 +5626,6 @@ static void _free_event(struct perf_event *event)
 }
 
 /*
- * Used to free events which have a known refcount of 1, such as in error paths
- * where the event isn't exposed yet and inherited events.
- */
-static void free_event(struct perf_event *event)
-{
-	if (WARN(atomic_long_cmpxchg(&event->refcount, 1, 0) != 1,
-		 "unexpected event refcount: %ld; ptr=%p\n",
-		 atomic_long_read(&event->refcount), event)) {
-		/* leak to avoid use-after-free */
-		return;
-	}
-
-	_free_event(event);
-}
-
-/*
  * Remove user event from the owner task.
  */
 static void perf_remove_from_owner(struct perf_event *event)
@@ -5724,7 +5728,11 @@ int perf_event_release_kernel(struct perf_event *event)
 	 * Thus this guarantees that we will in fact observe and kill _ALL_
 	 * child events.
 	 */
-	perf_remove_from_context(event, DETACH_GROUP|DETACH_DEAD);
+	if (event->state > PERF_EVENT_STATE_REVOKED) {
+		perf_remove_from_context(event, DETACH_GROUP|DETACH_DEAD);
+	} else {
+		event->state = PERF_EVENT_STATE_DEAD;
+	}
 
 	perf_event_ctx_unlock(event, ctx);
 
@@ -6013,7 +6021,7 @@ __perf_read(struct perf_event *event, char __user *buf, size_t count)
 	 * error state (i.e. because it was pinned but it couldn't be
 	 * scheduled on to the CPU at some point).
 	 */
-	if (event->state == PERF_EVENT_STATE_ERROR)
+	if (event->state <= PERF_EVENT_STATE_ERROR)
 		return 0;
 
 	if (count < event->read_size)
@@ -6052,8 +6060,14 @@ static __poll_t perf_poll(struct file *file, poll_table *wait)
 	struct perf_buffer *rb;
 	__poll_t events = EPOLLHUP;
 
+	if (event->state <= PERF_EVENT_STATE_REVOKED)
+		return EPOLLERR;
+
 	poll_wait(file, &event->waitq, wait);
 
+	if (event->state <= PERF_EVENT_STATE_REVOKED)
+		return EPOLLERR;
+
 	if (is_event_hup(event))
 		return events;
 
@@ -6232,6 +6246,9 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned long arg)
 	void (*func)(struct perf_event *);
 	u32 flags = arg;
 
+	if (event->state <= PERF_EVENT_STATE_REVOKED)
+		return -ENODEV;
+
 	switch (cmd) {
 	case PERF_EVENT_IOC_ENABLE:
 		func = _perf_event_enable;
@@ -6607,9 +6624,22 @@ void ring_buffer_put(struct perf_buffer *rb)
 	call_rcu(&rb->rcu_head, rb_free_rcu);
 }
 
+typedef void (*mapped_f)(struct perf_event *event, struct mm_struct *mm);
+
+#define get_mapped(event, func)					\
+({	struct pmu *pmu;					\
+	mapped_f f = NULL;					\
+	guard(rcu)();						\
+	pmu = READ_ONCE(event->pmu);				\
+	if (pmu)						\
+		f = pmu->func;					\
+	f;							\
+})
+
 static void perf_mmap_open(struct vm_area_struct *vma)
 {
 	struct perf_event *event = vma->vm_file->private_data;
+	mapped_f mapped = get_mapped(event, event_mapped);
 
 	atomic_inc(&event->mmap_count);
 	atomic_inc(&event->rb->mmap_count);
@@ -6617,8 +6647,8 @@ static void perf_mmap_open(struct vm_area_struct *vma)
 	if (vma->vm_pgoff)
 		atomic_inc(&event->rb->aux_mmap_count);
 
-	if (event->pmu->event_mapped)
-		event->pmu->event_mapped(event, vma->vm_mm);
+	if (mapped)
+		mapped(event, vma->vm_mm);
 }
 
 static void perf_pmu_output_stop(struct perf_event *event);
@@ -6634,14 +6664,16 @@ static void perf_pmu_output_stop(struct perf_event *event);
 static void perf_mmap_close(struct vm_area_struct *vma)
 {
 	struct perf_event *event = vma->vm_file->private_data;
+	mapped_f unmapped = get_mapped(event, event_unmapped);
 	struct perf_buffer *rb = ring_buffer_get(event);
 	struct user_struct *mmap_user = rb->mmap_user;
 	int mmap_locked = rb->mmap_locked;
 	unsigned long size = perf_data_size(rb);
 	bool detach_rest = false;
 
-	if (event->pmu->event_unmapped)
-		event->pmu->event_unmapped(event, vma->vm_mm);
+	/* FIXIES vs perf_pmu_unregister() */
+	if (unmapped)
+		unmapped(event, vma->vm_mm);
 
 	/*
 	 * The AUX buffer is strictly a sub-buffer, serialize using aux_mutex
@@ -6834,6 +6866,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 	unsigned long nr_pages;
 	long user_extra = 0, extra = 0;
 	int ret, flags = 0;
+	mapped_f mapped;
 
 	/*
 	 * Don't allow mmap() of inherited per-task counters. This would
@@ -6864,6 +6897,16 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 	mutex_lock(&event->mmap_mutex);
 	ret = -EINVAL;
 
+	/*
+	 * This relies on __pmu_detach_event() taking mmap_mutex after marking
+	 * the event REVOKED. Either we observe the state, or __pmu_detach_event()
+	 * will detach the rb created here.
+	 */
+	if (event->state <= PERF_EVENT_STATE_REVOKED) {
+		ret = -ENODEV;
+		goto unlock;
+	}
+
 	if (vma->vm_pgoff == 0) {
 		nr_pages -= 1;
 
@@ -7042,8 +7085,9 @@ aux_unlock:
 	if (!ret)
 		ret = map_range(rb, vma);
 
-	if (!ret && event->pmu->event_mapped)
-		event->pmu->event_mapped(event, vma->vm_mm);
+	mapped = get_mapped(event, event_mapped);
+	if (mapped)
+		mapped(event, vma->vm_mm);
 
 	return ret;
 }
@@ -7054,6 +7098,9 @@ static int perf_fasync(int fd, struct file *filp, int on)
 	struct perf_event *event = filp->private_data;
 	int retval;
 
+	if (event->state <= PERF_EVENT_STATE_REVOKED)
+		return -ENODEV;
+
 	inode_lock(inode);
 	retval = fasync_helper(fd, filp, on, &event->fasync);
 	inode_unlock(inode);
@@ -11062,6 +11109,9 @@ static int __perf_event_set_bpf_prog(struct perf_event *event,
 {
 	bool is_kprobe, is_uprobe, is_tracepoint, is_syscall_tp;
 
+	if (event->state <= PERF_EVENT_STATE_REVOKED)
+		return -ENODEV;
+
 	if (!perf_event_is_tracing(event))
 		return perf_event_set_bpf_handler(event, prog, bpf_cookie);
 
@@ -12245,6 +12295,9 @@ int perf_pmu_register(struct pmu *_pmu, const char *name, int type)
 	if (!pmu->event_idx)
 		pmu->event_idx = perf_event_idx_default;
 
+	INIT_LIST_HEAD(&pmu->events);
+	spin_lock_init(&pmu->events_lock);
+
 	/*
 	 * Now that the PMU is complete, make it visible to perf_try_init_event().
 	 */
@@ -12258,21 +12311,143 @@ int perf_pmu_register(struct pmu *_pmu, const char *name, int type)
 }
 EXPORT_SYMBOL_GPL(perf_pmu_register);
 
-void perf_pmu_unregister(struct pmu *pmu)
+static void __pmu_detach_event(struct pmu *pmu, struct perf_event *event,
+			       struct perf_event_context *ctx)
+{
+	/*
+	 * De-schedule the event and mark it REVOKED.
+	 */
+	perf_event_exit_event(event, ctx, true);
+
+	/*
+	 * All _free_event() bits that rely on event->pmu:
+	 *
+	 * Notably, perf_mmap() relies on the ordering here.
+	 */
+	scoped_guard (mutex, &event->mmap_mutex) {
+		WARN_ON_ONCE(pmu->event_unmapped);
+		/*
+		 * Mostly an empty lock sequence, such that perf_mmap(), which
+		 * relies on mmap_mutex, is sure to observe the state change.
+		 */
+	}
+
+	perf_event_free_bpf_prog(event);
+	perf_free_addr_filters(event);
+
+	if (event->destroy) {
+		event->destroy(event);
+		event->destroy = NULL;
+	}
+
+	if (event->pmu_ctx) {
+		put_pmu_ctx(event->pmu_ctx);
+		event->pmu_ctx = NULL;
+	}
+
+	exclusive_event_destroy(event);
+	module_put(pmu->module);
+
+	event->pmu = NULL; /* force fault instead of UAF */
+}
+
+static void pmu_detach_event(struct pmu *pmu, struct perf_event *event)
+{
+	struct perf_event_context *ctx;
+
+	ctx = perf_event_ctx_lock(event);
+	__pmu_detach_event(pmu, event, ctx);
+	perf_event_ctx_unlock(event, ctx);
+
+	scoped_guard (spinlock, &pmu->events_lock)
+		list_del(&event->pmu_list);
+}
+
+static struct perf_event *pmu_get_event(struct pmu *pmu)
+{
+	struct perf_event *event;
+
+	guard(spinlock)(&pmu->events_lock);
+	list_for_each_entry(event, &pmu->events, pmu_list) {
+		if (atomic_long_inc_not_zero(&event->refcount))
+			return event;
+	}
+
+	return NULL;
+}
+
+static bool pmu_empty(struct pmu *pmu)
+{
+	guard(spinlock)(&pmu->events_lock);
+	return list_empty(&pmu->events);
+}
+
+static void pmu_detach_events(struct pmu *pmu)
+{
+	struct perf_event *event;
+
+	for (;;) {
+		event = pmu_get_event(pmu);
+		if (!event)
+			break;
+
+		pmu_detach_event(pmu, event);
+		put_event(event);
+	}
+
+	/*
+	 * wait for pending _free_event()s
+	 */
+	wait_var_event(pmu, pmu_empty(pmu));
+}
+
+int perf_pmu_unregister(struct pmu *pmu)
 {
 	scoped_guard (mutex, &pmus_lock) {
+		if (!idr_cmpxchg(&pmu_idr, pmu->type, pmu, NULL))
+			return -EINVAL;
+
 		list_del_rcu(&pmu->entry);
-		idr_remove(&pmu_idr, pmu->type);
 	}
 
 	/*
 	 * We dereference the pmu list under both SRCU and regular RCU, so
 	 * synchronize against both of those.
+	 *
+	 * Notably, the entirety of event creation, from perf_init_event()
+	 * (which will now fail, because of the above) until
+	 * perf_install_in_context() should be under SRCU such that
+	 * this synchronizes against event creation. This avoids trying to
+	 * detach events that are not fully formed.
 	 */
 	synchronize_srcu(&pmus_srcu);
 	synchronize_rcu();
 
+	if (pmu->event_unmapped && !pmu_empty(pmu)) {
+		/*
+		 * Can't force remove events when pmu::event_unmapped()
+		 * is used in perf_mmap_close().
+		 */
+		guard(mutex)(&pmus_lock);
+		idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu);
+		list_add_rcu(&pmu->entry, &pmus);
+		return -EBUSY;
+	}
+
+	scoped_guard (mutex, &pmus_lock)
+		idr_remove(&pmu_idr, pmu->type);
+
+	/*
+	 * PMU is removed from the pmus list, so no new events will
+	 * be created, now take care of the existing ones.
+	 */
+	pmu_detach_events(pmu);
+
+	/*
+	 * PMU is unused, make it go away.
+	 */
 	perf_pmu_free(pmu);
+	return 0;
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);
 
@@ -12366,7 +12541,7 @@ static struct pmu *perf_init_event(struct perf_event *event)
 	struct pmu *pmu;
 	int type, ret;
 
-	guard(srcu)(&pmus_srcu);
+	guard(srcu)(&pmus_srcu); /* pmu idr/list access */
 
 	/*
 	 * Save original type before calling pmu->event_init() since certain
@@ -12590,6 +12765,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	INIT_LIST_HEAD(&event->active_entry);
 	INIT_LIST_HEAD(&event->addr_filters.list);
 	INIT_HLIST_NODE(&event->hlist_entry);
+	INIT_LIST_HEAD(&event->pmu_list);
 
 
 	init_waitqueue_head(&event->waitq);
@@ -12768,6 +12944,13 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	/* symmetric to unaccount_event() in _free_event() */
 	account_event(event);
 
+	/*
+	 * Event creation should be under SRCU, see perf_pmu_unregister().
+	 */
+	lockdep_assert_held(&pmus_srcu);
+	scoped_guard (spinlock, &pmu->events_lock)
+		list_add(&event->pmu_list, &pmu->events);
+
 	return_ptr(event);
 }
 
@@ -12967,6 +13150,9 @@ set:
 		goto unlock;
 
 	if (output_event) {
+		if (output_event->state <= PERF_EVENT_STATE_REVOKED)
+			goto unlock;
+
 		/* get the rb we want to redirect to */
 		rb = ring_buffer_get(output_event);
 		if (!rb)
@@ -13148,6 +13334,11 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (event_fd < 0)
 		return event_fd;
 
+	/*
+	 * Event creation should be under SRCU, see perf_pmu_unregister().
+	 */
+	guard(srcu)(&pmus_srcu);
+
 	CLASS(fd, group)(group_fd);     // group_fd == -1 => empty
 	if (group_fd != -1) {
 		if (!is_perf_file(group)) {
@@ -13155,6 +13346,10 @@ SYSCALL_DEFINE5(perf_event_open,
 			goto err_fd;
 		}
 		group_leader = fd_file(group)->private_data;
+		if (group_leader->state <= PERF_EVENT_STATE_REVOKED) {
+			err = -ENODEV;
+			goto err_fd;
+		}
 		if (flags & PERF_FLAG_FD_OUTPUT)
 			output_event = group_leader;
 		if (flags & PERF_FLAG_FD_NO_GROUP)
@@ -13451,7 +13646,7 @@ err_cred:
 	if (task)
 		up_read(&task->signal->exec_update_lock);
 err_alloc:
-	free_event(event);
+	put_event(event);
 err_task:
 	if (task)
 		put_task_struct(task);
@@ -13488,6 +13683,11 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 	if (attr->aux_output || attr->aux_action)
 		return ERR_PTR(-EINVAL);
 
+	/*
+	 * Event creation should be under SRCU, see perf_pmu_unregister().
+	 */
+	guard(srcu)(&pmus_srcu);
+
 	event = perf_event_alloc(attr, cpu, task, NULL, NULL,
				 overflow_handler, context, -1);
 	if (IS_ERR(event)) {
@@ -13559,7 +13759,7 @@ err_unlock:
 	perf_unpin_context(ctx);
 	put_ctx(ctx);
 err_alloc:
-	free_event(event);
+	put_event(event);
 err:
 	return ERR_PTR(err);
 }
@@ -13699,10 +13899,15 @@ static void sync_child_event(struct perf_event *child_event)
 }
 
 static void
-perf_event_exit_event(struct perf_event *event, struct perf_event_context *ctx)
+perf_event_exit_event(struct perf_event *event,
+		      struct perf_event_context *ctx, bool revoke)
 {
 	struct perf_event *parent_event = event->parent;
-	unsigned long detach_flags = 0;
+	unsigned long detach_flags = DETACH_EXIT;
+	bool is_child = !!parent_event;
+
+	if (parent_event == EVENT_TOMBSTONE)
+		parent_event = NULL;
 
 	if (parent_event) {
 		/*
@@ -13717,22 +13922,29 @@ perf_event_exit_event(struct perf_event *event, struct perf_event_context *ctx)
 		 * Do destroy all inherited groups, we don't care about those
 		 * and being thorough is better.
 		 */
-		detach_flags = DETACH_GROUP | DETACH_CHILD;
+		detach_flags |= DETACH_GROUP | DETACH_CHILD;
 		mutex_lock(&parent_event->child_mutex);
 	}
 
-	perf_remove_from_context(event, detach_flags | DETACH_EXIT);
+	if (revoke)
+		detach_flags |= DETACH_GROUP | DETACH_REVOKE;
 
+	perf_remove_from_context(event, detach_flags);
 	/*
 	 * Child events can be freed.
 	 */
-	if (parent_event) {
-		mutex_unlock(&parent_event->child_mutex);
-		/*
-		 * Kick perf_poll() for is_event_hup();
-		 */
-		perf_event_wakeup(parent_event);
-		put_event(event);
+	if (is_child) {
+		if (parent_event) {
+			mutex_unlock(&parent_event->child_mutex);
+			/*
+			 * Kick perf_poll() for is_event_hup();
+			 */
+			perf_event_wakeup(parent_event);
+			/*
+			 * pmu_detach_event() will have an extra refcount.
+			 */
+			put_event(event);
+		}
 		return;
 	}
 
@@ -13796,7 +14008,7 @@ static void perf_event_exit_task_context(struct task_struct *task, bool exit)
 	perf_event_task(task, ctx, 0);
 
 	list_for_each_entry_safe(child_event, next, &ctx->event_list, event_entry)
-		perf_event_exit_event(child_event, ctx);
+		perf_event_exit_event(child_event, ctx, false);
 
 	mutex_unlock(&ctx->mutex);
 
@@ -13949,6 +14161,14 @@ inherit_event(struct perf_event *parent_event,
 	if (parent_event->parent)
 		parent_event = parent_event->parent;
 
+	if (parent_event->state <= PERF_EVENT_STATE_REVOKED)
+		return NULL;
+
+	/*
+	 * Event creation should be under SRCU, see perf_pmu_unregister().
+	 */
+	guard(srcu)(&pmus_srcu);
+
 	child_event = perf_event_alloc(&parent_event->attr,
					   parent_event->cpu,
					   child,
@@ -13962,7 +14182,7 @@ inherit_event(struct perf_event *parent_event,
 
 	pmu_ctx = find_get_pmu_context(child_event->pmu, child_ctx, child_event);
 	if (IS_ERR(pmu_ctx)) {
-		free_event(child_event);
+		put_event(child_event);
 		return ERR_CAST(pmu_ctx);
 	}
 	child_event->pmu_ctx = pmu_ctx;
@@ -13977,7 +14197,7 @@ inherit_event(struct perf_event *parent_event,
 	if (is_orphaned_event(parent_event) ||
 	    !atomic_long_inc_not_zero(&parent_event->refcount)) {
 		mutex_unlock(&parent_event->child_mutex);
-		free_event(child_event);
+		put_event(child_event);
 		return NULL;
 	}