From nobody Tue Dec 16 18:24:18 2025 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 473DB18027 for ; Mon, 15 Jan 2024 17:01:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="aM57N1Ez" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1705338097; x=1736874097; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rDlcveJOd/QxYfdylxHGM+KIbu+7YyGNcC1nda6F/iw=; b=aM57N1Eze0zbGuhan2KsgmvcXVwyYiXCFEdnFhCw/Pax4p6I6tMC2O2O qWMrq5O7dA9lwuCIW1QP0QvTx3Xn0WAmdCxw0DNZVeCqejDh8U9nnsxPS gk8ZJOzN1CEf3P4vhIumtpglnoW1jiFhpjLf3FLA8V4fxU7pPjc8GW6Mj evERt0Ol69SU00KYNHsape+jIPTLN9RcW53Tozic7CNlGf95ij/dEVi3J 2w1G9J5nyfZxlJD04uUHaPH0hDT9md5gukvYz5y5OKTs3B4Qi1QRiKpiT 727QoUsjIIYfN1k/f/S5WcBlVyWiHceSEbsaF3vW4ij7kJ4VQuL/SEIh4 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10954"; a="6408196" X-IronPort-AV: E=Sophos;i="6.04,197,1695711600"; d="scan'208";a="6408196" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jan 2024 09:01:37 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10954"; a="907101324" X-IronPort-AV: E=Sophos;i="6.04,197,1695711600"; d="scan'208";a="907101324" Received: from mleonvig-mobl1.ger.corp.intel.com (HELO localhost.localdomain) ([10.213.223.101]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jan 2024 09:01:31 -0800 From: Tvrtko Ursulin To: linux-kernel@vger.kernel.org, tvrtko.ursulin@linux.intel.com 
Cc: Tvrtko Ursulin, Peter Zijlstra, Umesh Nerlige Ramappa, Aravind Iddamsetty
Subject: [RFC 1/3] perf: Add new late event free callback
Date: Mon, 15 Jan 2024 17:01:18 +0000
Message-Id: <20240115170120.662220-2-tvrtko.ursulin@linux.intel.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240115170120.662220-1-tvrtko.ursulin@linux.intel.com>
References: <20240115170120.662220-1-tvrtko.ursulin@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Tvrtko Ursulin

This allows drivers to implement a PMU with support for unbinding the
device, for example by making event->pmu reference counted on the driver
side and its lifetime matching the struct perf_event init/free.

Otherwise, if an open perf fd is kept past driver unbind, the perf code
can dereference the potentially freed struct pmu from the _free_event()
steps which follow the existing destroy callback.

TODO/FIXME/QQQ:

A simpler version would be to move the ->destroy() call to later in
_free_event(). However, a comment there claims there are steps which need
to run after the existing destroy callbacks, hence I opted for an
initially cautious approach.
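To illustrate the ordering problem described above, here is a minimal
userspace sketch (not kernel code). All model_* names are hypothetical
stand-ins invented for this illustration, the driver-side reference count
is modelled as a plain int, and "freeing" only sets a flag so the outcome
can be inspected. It assumes the driver drops its own reference at unbind
and each event holds one more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pmu and struct perf_event. */
struct model_pmu {
	int refcount;	/* driver-side reference count */
	bool freed;	/* set instead of really freeing, for inspection */
};

struct model_event {
	struct model_pmu *pmu;
	void (*destroy)(struct model_event *);
	void (*free)(struct model_event *);
	bool touched_freed_pmu;
};

static void model_pmu_put(struct model_pmu *pmu)
{
	if (--pmu->refcount == 0)
		pmu->freed = true;
}

/* Driver callback: drop the reference taken at event init. */
static void model_drop_ref(struct model_event *event)
{
	model_pmu_put(event->pmu);
}

/* Mirrors the callback ordering of the patched _free_event(). */
static void model_free_event(struct model_event *event)
{
	if (event->destroy)
		event->destroy(event);

	/* Stand-in for module_put(event->pmu->module) and similar steps. */
	if (event->pmu->freed)
		event->touched_freed_pmu = true;

	if (event->free)
		event->free(event);
}

/* Returns true if teardown looked at an already released pmu. */
bool model_teardown_after_unbind(bool use_free_callback)
{
	struct model_pmu pmu = { .refcount = 2, .freed = false };
	struct model_event event = { .pmu = &pmu };

	if (use_free_callback)
		event.free = model_drop_ref;
	else
		event.destroy = model_drop_ref;

	model_pmu_put(&pmu);		/* driver unbind drops its reference */
	model_free_event(&event);	/* perf fd is closed afterwards */

	return event.touched_freed_pmu;
}
```

Under these assumptions model_teardown_after_unbind(false) reports the
stale access, while model_teardown_after_unbind(true) does not, because
the reference is only dropped after the last step that uses event->pmu.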

Signed-off-by: Tvrtko Ursulin
Cc: Peter Zijlstra
Cc: Umesh Nerlige Ramappa
Cc: Aravind Iddamsetty
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 13 +++++++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 5547ba68e6e4..a567d2d98be1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -799,6 +799,7 @@ struct perf_event {
 	struct perf_event *aux_event;
 
 	void (*destroy)(struct perf_event *);
+	void (*free)(struct perf_event *);
 	struct rcu_head rcu_head;
 
 	struct pid_namespace *ns;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a64165af45c1..4b62d2201ca7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5242,6 +5242,9 @@ static void _free_event(struct perf_event *event)
 	exclusive_event_destroy(event);
 	module_put(event->pmu->module);
 
+	if (event->free)
+		event->free(event);
+
 	call_rcu(&event->rcu_head, free_event_rcu);
 }
 
@@ -11662,8 +11665,12 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
 		    event_has_any_exclude_flag(event))
 			ret = -EINVAL;
 
-		if (ret && event->destroy)
-			event->destroy(event);
+		if (ret) {
+			if (event->destroy)
+				event->destroy(event);
+			if (event->free)
+				event->free(event);
+		}
 	}
 
 	if (ret)
@@ -12090,6 +12097,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	perf_detach_cgroup(event);
 	if (event->destroy)
 		event->destroy(event);
+	if (event->free)
+		event->free(event);
 	module_put(pmu->module);
 err_ns:
 	if (event->hw.target)
-- 
2.40.1

From nobody Tue Dec 16 18:24:18 2025
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 92B0218035 for ; Mon, 15 Jan 2024 17:01:38 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none)
header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="BnhGgSoF" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1705338098; x=1736874098; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hgHSCCkYdUQvQN3OmRimDqcNGoAmnLSX2bEwJ8BLOuU=; b=BnhGgSoFC9E3ZqxrABuvUoDn9QU/D4US16ZoZQfQlpOjhRtfs/pY+5il ro+wmM2n6ItAM0kgEwy73rnQ/sSW6kapEH5HbeST6ro4GvsUeZiFGoEbN Ae5kcBL0ju6YUUipSl6mn+MUo8jKVvrLWmADHResMZu3ZRNKzePTra1ea j01H8vgi4uyXz7CsJA3Z48wgP180Ux5wyftefEa9P7P6DpmymO/nsq7Xx xCljgaWPprBV/qnptO0Gg42rZiHjhlsCQgtnPJFj6Ggn4tZVLcxqz62BW h+Pf4uh6mwGQpgZH/YF0GJ0VFwdkyQT7K0Xd7uuaRAeaSs/XVdYk26PW0 g==; X-IronPort-AV: E=McAfee;i="6600,9927,10954"; a="6408200" X-IronPort-AV: E=Sophos;i="6.04,197,1695711600"; d="scan'208";a="6408200" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jan 2024 09:01:38 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10954"; a="907101347" X-IronPort-AV: E=Sophos;i="6.04,197,1695711600"; d="scan'208";a="907101347" Received: from mleonvig-mobl1.ger.corp.intel.com (HELO localhost.localdomain) ([10.213.223.101]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jan 2024 09:01:34 -0800 From: Tvrtko Ursulin To: linux-kernel@vger.kernel.org, tvrtko.ursulin@linux.intel.com Cc: Tvrtko Ursulin , Peter Zijlstra , Umesh Nerlige Ramappa , Aravind Iddamsetty Subject: [RFC 2/3] drm/i915/pmu: Move i915 reference drop to new event->free() Date: Mon, 15 Jan 2024 17:01:19 +0000 Message-Id: <20240115170120.662220-3-tvrtko.ursulin@linux.intel.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240115170120.662220-1-tvrtko.ursulin@linux.intel.com> References: 
 <20240115170120.662220-1-tvrtko.ursulin@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Tvrtko Ursulin

Avoid a use-after-free in the perf core code on the event destruction
path, after the PCI driver has been unbound while perf file descriptors
are still open.

Signed-off-by: Tvrtko Ursulin
Cc: Peter Zijlstra
Cc: Umesh Nerlige Ramappa
Cc: Aravind Iddamsetty
---
 drivers/gpu/drm/i915/i915_pmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 21eb0c5b320d..010763a5bc39 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -514,7 +514,7 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
 	return HRTIMER_RESTART;
 }
 
-static void i915_pmu_event_destroy(struct perf_event *event)
+static void i915_pmu_event_free(struct perf_event *event)
 {
 	struct i915_pmu *pmu = event_to_pmu(event);
 	struct drm_i915_private *i915 = pmu_to_i915(pmu);
@@ -630,7 +630,7 @@ static int i915_pmu_event_init(struct perf_event *event)
 
 	if (!event->parent) {
 		drm_dev_get(&i915->drm);
-		event->destroy = i915_pmu_event_destroy;
+		event->free = i915_pmu_event_free;
 	}
 
 	return 0;
-- 
2.40.1

From nobody Tue Dec 16 18:24:18 2025
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 079531805A for ; Mon, 15 Jan 2024 17:01:39 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com
Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key)
header.d=intel.com header.i=@intel.com header.b="n8LhiVe3" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1705338100; x=1736874100; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tuXsnLpj2x6E0qxbdbl5zZtSQpYNIsi1oslLdV3rXHU=; b=n8LhiVe3SzGVt9nn0Fx+Mccn6RQNc8jOCp0P2z/H/EJ5T4W1R24SgmFj f4xWDC6cU80zfXdV1rfxBhQRxlM9iPOUlH379Oj2t87ywMXo4w4iYILtH 096yHLdLrBEObRCt/4BBF42tBv7zwf4WeAC6ez+xWy2f5+a052IzzW8CK Q4FEYXjxkbZRf7CHcoCea9PgsJyG/fwgpXRiwA4JFODzXxUJs18TriN0T I3GkFMza6KM/tGu11mxRA0aThJibuVXKUb6uGjZA+c6XHLrXVc6yL4MH/ ifg+jB+1aV5+F6GZvWMxRxnDZhDNFpU4L0duifHR6qf05RxbQ5A2IjJCB g==; X-IronPort-AV: E=McAfee;i="6600,9927,10954"; a="6408206" X-IronPort-AV: E=Sophos;i="6.04,197,1695711600"; d="scan'208";a="6408206" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jan 2024 09:01:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10954"; a="907101354" X-IronPort-AV: E=Sophos;i="6.04,197,1695711600"; d="scan'208";a="907101354" Received: from mleonvig-mobl1.ger.corp.intel.com (HELO localhost.localdomain) ([10.213.223.101]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jan 2024 09:01:37 -0800 From: Tvrtko Ursulin To: linux-kernel@vger.kernel.org, tvrtko.ursulin@linux.intel.com Cc: Tvrtko Ursulin , Peter Zijlstra , Umesh Nerlige Ramappa , Aravind Iddamsetty Subject: [RFC 3/3] perf: Reference count struct perf_cpu_pmu_context to fix driver unbind Date: Mon, 15 Jan 2024 17:01:20 +0000 Message-Id: <20240115170120.662220-4-tvrtko.ursulin@linux.intel.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240115170120.662220-1-tvrtko.ursulin@linux.intel.com> References: <20240115170120.662220-1-tvrtko.ursulin@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Tvrtko Ursulin

If a PMU driver is a PCI driver which can be unbound at runtime, there is
currently a use-after-free issue with CPU contexts when an active perf fd
is kept around.

Specifically, once perf_pmu_unregister() calls free_pmu_context() on the
PMU-provided per-CPU struct perf_cpu_pmu_context storage, any call path
which ends up in event_sched_out() (such as __perf_remove_from_context()
via perf_event_release_kernel()) will dereference a freed event->pmu_ctx.

Furthermore, if the same percpu area has in the meantime been
reallocated, the use-after-free will corrupt someone else's per-CPU
storage area.

To fix it, we attempt to add reference counting to struct
perf_cpu_pmu_context such that the object remains until the last user is
done with it.

TODO/FIXME/QQQ:

1)

I am really not sure about the locking here. I *think* I needed a per
struct pmu counter and, looking at what find_get_pmu_context() does when
it takes a slot from the driver-provided pmu->cpu_pmu_context under
ctx->lock, it looked like that should be sufficient. Maybe even without
it being an atomic_t. Or maybe ctx->lock is not enough.

2)

I believe pmu->pmu_disable_count will need similar treatment, but as I
wasn't sure of the locking model, or even whether this all makes sense at
a high level, I left it out for now. For instance, does the idea of
reference counting even fly, or will a completely different solution be
needed?
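The intended lifetime rule can be sketched in userspace as follows. This
is only an illustration under stated assumptions: the model_* names are
hypothetical, a heap array stands in for the percpu area, C11 atomics
stand in for atomic_t, and the sketch is single threaded, so it says
nothing about the locking questions raised above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pmu; not the kernel definition. */
struct model_pmu {
	int *cpu_ctx;			/* models the percpu cpu_pmu_context */
	atomic_int cpu_ctx_refcount;	/* models cpu_pmu_context_refcount */
};

/* Models perf_pmu_register(): registration itself holds one reference. */
static int model_register(struct model_pmu *pmu)
{
	pmu->cpu_ctx = calloc(4, sizeof(*pmu->cpu_ctx));
	if (!pmu->cpu_ctx)
		return -1;
	atomic_init(&pmu->cpu_ctx_refcount, 1);
	return 0;
}

/* Models the atomic_inc() added to find_get_pmu_context(). */
static void model_get_ctx(struct model_pmu *pmu)
{
	atomic_fetch_add(&pmu->cpu_ctx_refcount, 1);
}

/* Models put_pmu_context(): the last put releases the storage. */
static void model_put_ctx(struct model_pmu *pmu)
{
	/* atomic_fetch_sub() returns the old value; 1 means we reached zero. */
	if (atomic_fetch_sub(&pmu->cpu_ctx_refcount, 1) == 1) {
		free(pmu->cpu_ctx);
		pmu->cpu_ctx = NULL;	/* only so the sketch can observe it */
	}
}

/* Returns true if the storage outlives unregister and is freed on last put. */
bool model_ctx_outlives_unregister(void)
{
	struct model_pmu pmu;
	bool alive_after_unregister, freed_at_end;

	if (model_register(&pmu))
		return false;

	model_get_ctx(&pmu);	/* event opened, fd kept around */
	model_put_ctx(&pmu);	/* perf_pmu_unregister() on driver unbind */

	alive_after_unregister = pmu.cpu_ctx != NULL;
	if (alive_after_unregister)
		pmu.cpu_ctx[0] = 1;	/* late event_sched_out() style access */

	model_put_ctx(&pmu);	/* last event released */
	freed_at_end = pmu.cpu_ctx == NULL;

	return alive_after_unregister && freed_at_end;
}
```

With the count starting at 1 for the registration, the unregister path
only drops that one reference, so the storage stays valid for as long as
an event still holds its own.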

Signed-off-by: Tvrtko Ursulin
Cc: Peter Zijlstra
Cc: Umesh Nerlige Ramappa
Cc: Aravind Iddamsetty
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 21 ++++++++++++++-------
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a567d2d98be1..bd1c8f3c1736 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -317,6 +317,7 @@ struct pmu {
 
 	int __percpu *pmu_disable_count;
 	struct perf_cpu_pmu_context __percpu *cpu_pmu_context;
+	atomic_t cpu_pmu_context_refcount;
 	atomic_t exclusive_cnt; /* < 0: cpu; > 0: tsk */
 	int task_ctx_nr;
 	int hrtimer_interval_ms;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4b62d2201ca7..0c95aecf560a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4873,6 +4873,9 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
 			atomic_inc(&epc->refcount);
 		}
 		raw_spin_unlock_irq(&ctx->lock);
+
+		atomic_inc(&pmu->cpu_pmu_context_refcount);
+
 		return epc;
 	}
 
@@ -4928,6 +4931,12 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
 	return epc;
 }
 
+static void put_pmu_context(struct pmu *pmu)
+{
+	if (atomic_dec_and_test(&pmu->cpu_pmu_context_refcount))
+		free_percpu(pmu->cpu_pmu_context);
+}
+
 static void get_pmu_ctx(struct perf_event_pmu_context *epc)
 {
 	WARN_ON_ONCE(!atomic_inc_not_zero(&epc->refcount));
@@ -4967,8 +4976,10 @@ static void put_pmu_ctx(struct perf_event_pmu_context *epc)
 
 	raw_spin_unlock_irqrestore(&ctx->lock, flags);
 
-	if (epc->embedded)
+	if (epc->embedded) {
+		put_pmu_context(epc->pmu);
 		return;
+	}
 
 	call_rcu(&epc->rcu_head, free_epc_rcu);
 }
@@ -11347,11 +11358,6 @@ static int perf_event_idx_default(struct perf_event *event)
 	return 0;
 }
 
-static void free_pmu_context(struct pmu *pmu)
-{
-	free_percpu(pmu->cpu_pmu_context);
-}
-
 /*
  * Let userspace know that this PMU supports address range filtering:
  */
@@ -11573,6 +11579,7 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
 		pmu->event_idx = perf_event_idx_default;
 
 	list_add_rcu(&pmu->entry, &pmus);
+	atomic_set(&pmu->cpu_pmu_context_refcount, 1);
 	atomic_set(&pmu->exclusive_cnt, 0);
 	ret = 0;
 unlock:
@@ -11615,7 +11622,7 @@ void perf_pmu_unregister(struct pmu *pmu)
 		device_del(pmu->dev);
 		put_device(pmu->dev);
 	}
-	free_pmu_context(pmu);
+	put_pmu_context(pmu);
 	mutex_unlock(&pmus_lock);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);
-- 
2.40.1