From nobody Mon Dec 15 22:38:08 2025
Message-ID: <20250307193722.827444714@infradead.org>
User-Agent: quilt/0.66
Date: Fri, 07 Mar 2025 20:33:06 +0100
From: Peter Zijlstra
To: mingo@kernel.org, ravi.bangoria@amd.com, lucas.demarchi@intel.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, acme@kernel.org,
 namhyung@kernel.org, mark.rutland@arm.com,
 alexander.shishkin@linux.intel.com, jolsa@kernel.org, irogers@google.com,
 adrian.hunter@intel.com, kan.liang@linux.intel.com
Subject: [PATCH v3 1/7] perf: Ensure bpf_perf_link path is properly serialized
References: <20250307193305.486326750@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Ravi reported that the bpf_perf_link_attach() usage of
perf_event_set_bpf_prog() is not serialized by ctx->mutex, unlike the
PERF_EVENT_IOC_SET_BPF case.
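As an illustration only (not part of the patch), the split between a locked entry point and an unlocked helper can be sketched in userspace C. Every demo_* name below is hypothetical, a pthread mutex stands in for ctx->mutex, and the "already attached" check is an assumption made for the demo rather than the kernel's actual attach logic.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a perf event context; not the kernel struct. */
struct demo_ctx {
	pthread_mutex_t mutex;
};

struct demo_event {
	struct demo_ctx *ctx;
	int prog;	/* 0 means "no program attached" (demo assumption) */
};

/*
 * Unlocked helper: the caller must already hold event->ctx->mutex.
 * The patch marks this role with a "__" prefix; a suffix is used here
 * because such names are reserved in userspace C.
 */
static int demo_set_prog_nolock(struct demo_event *event, int prog)
{
	assert(event != NULL);
	if (event->prog)
		return -1;	/* already attached */
	event->prog = prog;
	return 0;
}

/* Locked wrapper for callers that do not hold the mutex themselves. */
static int demo_set_prog(struct demo_event *event, int prog)
{
	int ret;

	pthread_mutex_lock(&event->ctx->mutex);
	ret = demo_set_prog_nolock(event, prog);
	pthread_mutex_unlock(&event->ctx->mutex);

	return ret;
}
```

A caller that already holds the mutex (the ioctl path in the patch) would use the unlocked helper directly, while every other caller goes through the wrapper.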
Reported-by: Ravi Bangoria
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Ravi Bangoria
---
 kernel/events/core.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6087,6 +6087,9 @@ static int perf_event_set_output(struct
 static int perf_event_set_filter(struct perf_event *event, void __user *arg);
 static int perf_copy_attr(struct perf_event_attr __user *uattr,
 			  struct perf_event_attr *attr);
+static int __perf_event_set_bpf_prog(struct perf_event *event,
+				     struct bpf_prog *prog,
+				     u64 bpf_cookie);
 
 static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned long arg)
 {
@@ -6149,7 +6152,7 @@ static long _perf_ioctl(struct perf_even
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 
-		err = perf_event_set_bpf_prog(event, prog, 0);
+		err = __perf_event_set_bpf_prog(event, prog, 0);
 		if (err) {
 			bpf_prog_put(prog);
 			return err;
@@ -10875,8 +10878,9 @@ static inline bool perf_event_is_tracing
 	return false;
 }
 
-int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog,
-			    u64 bpf_cookie)
+static int __perf_event_set_bpf_prog(struct perf_event *event,
+				     struct bpf_prog *prog,
+				     u64 bpf_cookie)
 {
 	bool is_kprobe, is_uprobe, is_tracepoint, is_syscall_tp;
 
@@ -10914,6 +10918,20 @@ int perf_event_set_bpf_prog(struct perf_
 	return perf_event_attach_bpf_prog(event, prog, bpf_cookie);
 }
 
+int perf_event_set_bpf_prog(struct perf_event *event,
+			    struct bpf_prog *prog,
+			    u64 bpf_cookie)
+{
+	struct perf_event_context *ctx;
+	int ret;
+
+	ctx = perf_event_ctx_lock(event);
+	ret = __perf_event_set_bpf_prog(event, prog, bpf_cookie);
+	perf_event_ctx_unlock(event, ctx);
+
+	return ret;
+}
+
 void perf_event_free_bpf_prog(struct perf_event *event)
 {
 	if (!event->prog)
@@ -10936,7 +10954,15 @@ static void perf_event_free_filter(struc
 {
 }
 
-int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog,
+static int __perf_event_set_bpf_prog(struct perf_event *event,
+				     struct bpf_prog *prog,
+				     u64 bpf_cookie)
+{
+	return -ENOENT;
+}
+
+int perf_event_set_bpf_prog(struct perf_event *event,
+			    struct bpf_prog *prog,
 			    u64 bpf_cookie)
 {
 	return -ENOENT;

From nobody Mon Dec 15 22:38:08 2025
Message-ID: <20250307193722.935929443@infradead.org>
User-Agent: quilt/0.66
Date: Fri, 07 Mar 2025 20:33:07 +0100
From: Peter Zijlstra
To: mingo@kernel.org, ravi.bangoria@amd.com, lucas.demarchi@intel.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, acme@kernel.org,
 namhyung@kernel.org, mark.rutland@arm.com,
 alexander.shishkin@linux.intel.com, jolsa@kernel.org, irogers@google.com,
 adrian.hunter@intel.com, kan.liang@linux.intel.com
Subject: [PATCH v3 2/7] perf: Simplify child event tear-down
References: <20250307193305.486326750@infradead.org>

Currently perf_event_release_kernel() will iterate the child events and
attempt tear-down. However, it removes them from the child_list using
list_move(), notably skipping the state management done by
perf_child_detach().
Crucially, it fails to clear PERF_ATTACH_CHILD, which opens the door for
a concurrent perf_remove_from_context() to race. Pass DETACH_CHILD to
perf_remove_from_context() instead, so that child_list management stays
fully serialized using child_mutex.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Ravi Bangoria
---
 kernel/events/core.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2338,7 +2338,11 @@ static void perf_child_detach(struct per
 	if (WARN_ON_ONCE(!parent_event))
 		return;
 
+	/*
+	 * Can't check this from an IPI, the holder is likely another CPU.
+	 *
 	lockdep_assert_held(&parent_event->child_mutex);
+	 */
 
 	sync_child_event(event);
 	list_del_init(&event->child_list);
@@ -5611,8 +5615,8 @@ int perf_event_release_kernel(struct per
 		tmp = list_first_entry_or_null(&event->child_list,
 					       struct perf_event, child_list);
 		if (tmp == child) {
-			perf_remove_from_context(child, DETACH_GROUP);
-			list_move(&child->child_list, &free_list);
+			perf_remove_from_context(child, DETACH_GROUP | DETACH_CHILD);
+			list_add(&child->child_list, &free_list);
 			/*
 			 * This matches the refcount bump in inherit_event();
 			 * this can't be the last reference.

From nobody Mon Dec 15 22:38:08 2025
Message-ID: <20250307193723.044499344@infradead.org>
User-Agent: quilt/0.66
Date: Fri, 07 Mar 2025 20:33:08 +0100
From: Peter Zijlstra
To: mingo@kernel.org, ravi.bangoria@amd.com, lucas.demarchi@intel.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, acme@kernel.org,
 namhyung@kernel.org, mark.rutland@arm.com,
 alexander.shishkin@linux.intel.com, jolsa@kernel.org, irogers@google.com,
 adrian.hunter@intel.com, kan.liang@linux.intel.com
Subject: [PATCH v3 3/7] perf: Simplify perf_event_free_task() wait
References: <20250307193305.486326750@infradead.org>

Simplify the code by moving the duplicated wakeup condition into
put_ctx().

Notably, wait_var_event() is in perf_event_free_task() and will have set
ctx->task = TASK_TOMBSTONE.
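For readers less familiar with the wait_var_event() / wake_up_var() pairing, the general idea of "the put side wakes the waiter" can be sketched in userspace C. This is only an analogy: every demo_* name is hypothetical, a mutex and condition variable replace the kernel's wait-var machinery and memory barriers, a boolean replaces the TASK_TOMBSTONE marker, and the choice to signal on every put of a dead context is an assumption of this sketch, not a transcription of the diff below.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted context. */
struct demo_ctx {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int refcount;
	bool dead;	/* plays the role of ctx->task == TASK_TOMBSTONE */
};

/* Drop one reference; if the context is already dead, wake any waiter. */
static void demo_put_ctx(struct demo_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	ctx->refcount--;
	if (ctx->dead)
		pthread_cond_broadcast(&ctx->cond);
	pthread_mutex_unlock(&ctx->lock);
}

/* Mark the context dead, then wait until only our own reference is left. */
static void demo_wait_last_ref(struct demo_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	ctx->dead = true;
	while (ctx->refcount != 1)
		pthread_cond_wait(&ctx->cond, &ctx->lock);
	pthread_mutex_unlock(&ctx->lock);
}

/* Thread body: another holder dropping its reference. */
static void *demo_releaser(void *arg)
{
	demo_put_ctx(arg);
	return NULL;
}

/* Runs one waiter and one releaser; returns the refcount seen after the wait. */
static int demo_run(void)
{
	struct demo_ctx ctx = { .refcount = 2, .dead = false };
	pthread_t thread;
	int seen;

	pthread_mutex_init(&ctx.lock, NULL);
	pthread_cond_init(&ctx.cond, NULL);

	if (pthread_create(&thread, NULL, demo_releaser, &ctx) != 0)
		return -1;

	demo_wait_last_ref(&ctx);
	seen = ctx.refcount;

	pthread_join(thread, NULL);
	pthread_cond_destroy(&ctx.cond);
	pthread_mutex_destroy(&ctx.lock);
	return seen;
}
```

Because the dead flag and the count are updated under the same lock, the waiter sees either the drop or the wakeup regardless of which thread runs first. Keeping the wakeup inside the put function means callers no longer need to repeat it.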
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Ravi Bangoria
---
 kernel/events/core.c | 32 ++++++++------------------------
 1 file changed, 8 insertions(+), 24 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1223,8 +1223,14 @@ static void put_ctx(struct perf_event_co
 	if (refcount_dec_and_test(&ctx->refcount)) {
 		if (ctx->parent_ctx)
 			put_ctx(ctx->parent_ctx);
-		if (ctx->task && ctx->task != TASK_TOMBSTONE)
-			put_task_struct(ctx->task);
+		if (ctx->task) {
+			if (ctx->task == TASK_TOMBSTONE) {
+				smp_mb(); /* pairs with wait_var_event() */
+				wake_up_var(&ctx->refcount);
+			} else {
+				put_task_struct(ctx->task);
+			}
+		}
 		call_rcu(&ctx->rcu_head, free_ctx);
 	}
 }
@@ -5492,8 +5498,6 @@ int perf_event_release_kernel(struct per
 again:
 	mutex_lock(&event->child_mutex);
 	list_for_each_entry(child, &event->child_list, child_list) {
-		void *var = NULL;
-
 		/*
 		 * Cannot change, child events are not migrated, see the
 		 * comment with perf_event_ctx_lock_nested().
@@ -5533,39 +5537,19 @@ int perf_event_release_kernel(struct per
 			 * this can't be the last reference.
 			 */
 			put_event(event);
-		} else {
-			var = &ctx->refcount;
 		}
 
 		mutex_unlock(&event->child_mutex);
 		mutex_unlock(&ctx->mutex);
 		put_ctx(ctx);
 
-		if (var) {
-			/*
-			 * If perf_event_free_task() has deleted all events from the
-			 * ctx while the child_mutex got released above, make sure to
-			 * notify about the preceding put_ctx().
-			 */
-			smp_mb(); /* pairs with wait_var_event() */
-			wake_up_var(var);
-		}
 		goto again;
 	}
 	mutex_unlock(&event->child_mutex);
 
 	list_for_each_entry_safe(child, tmp, &free_list, child_list) {
-		void *var = &child->ctx->refcount;
-
 		list_del(&child->child_list);
 		free_event(child);
-
-		/*
-		 * Wake any perf_event_free_task() waiting for this event to be
-		 * freed.
-		 */
-		smp_mb(); /* pairs with wait_var_event() */
-		wake_up_var(var);
 	}
 
 no_ctx:

From nobody Mon Dec 15 22:38:08 2025
Message-ID: <20250307193723.151721102@infradead.org>
User-Agent: quilt/0.66
Date: Fri, 07 Mar 2025 20:33:09 +0100
From: Peter Zijlstra
To: mingo@kernel.org, ravi.bangoria@amd.com, lucas.demarchi@intel.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, acme@kernel.org,
 namhyung@kernel.org, mark.rutland@arm.com,
 alexander.shishkin@linux.intel.com, jolsa@kernel.org, irogers@google.com,
 adrian.hunter@intel.com, kan.liang@linux.intel.com
Subject: [PATCH v3 4/7] perf: Simplify perf_event_release_kernel()
References: <20250307193305.486326750@infradead.org>

There is no good reason to have the free list anymore. It is possible to
call free_event() after the locks have been dropped in the main loop.
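The shape of the resulting loop (detach one child under the lock, drop the lock, free it, go around again) can be illustrated with a small userspace sketch. Every demo_* name is hypothetical, a pthread mutex stands in for child_mutex, and a singly linked list replaces the kernel's list_head; it shows the pattern only and is not the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical child object on a singly linked list. */
struct demo_child {
	struct demo_child *next;
};

struct demo_parent {
	pthread_mutex_t child_mutex;
	struct demo_child *children;	/* protected by child_mutex */
};

/* Demo helper: push a freshly allocated child; returns 0 on success. */
static int demo_add_child(struct demo_parent *parent)
{
	struct demo_child *child = malloc(sizeof(*child));

	if (!child)
		return -1;

	pthread_mutex_lock(&parent->child_mutex);
	child->next = parent->children;
	parent->children = child;
	pthread_mutex_unlock(&parent->child_mutex);
	return 0;
}

/* Unlink and free every child; returns how many were freed. */
static int demo_release_children(struct demo_parent *parent)
{
	int freed = 0;

	for (;;) {
		struct demo_child *child;

		pthread_mutex_lock(&parent->child_mutex);
		child = parent->children;
		if (child)
			parent->children = child->next;	/* detach under the lock */
		pthread_mutex_unlock(&parent->child_mutex);

		if (!child)
			break;

		free(child);	/* freed only after the lock is dropped */
		freed++;
	}

	return freed;
}
```

Since each child is freed as soon as the lock is released, no separate list of pending frees has to be kept alongside the loop.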
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Ravi Bangoria
---
 kernel/events/core.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5462,7 +5462,6 @@ int perf_event_release_kernel(struct per
 {
 	struct perf_event_context *ctx = event->ctx;
 	struct perf_event *child, *tmp;
-	LIST_HEAD(free_list);
 
 	/*
 	 * If we got here through err_alloc: free_event(event); we will not
@@ -5531,27 +5530,26 @@ int perf_event_release_kernel(struct per
 					       struct perf_event, child_list);
 		if (tmp == child) {
 			perf_remove_from_context(child, DETACH_GROUP | DETACH_CHILD);
-			list_add(&child->child_list, &free_list);
 			/*
 			 * This matches the refcount bump in inherit_event();
 			 * this can't be the last reference.
 			 */
 			put_event(event);
+		} else {
+			child = NULL;
 		}
 
 		mutex_unlock(&event->child_mutex);
 		mutex_unlock(&ctx->mutex);
+
+		if (child)
+			free_event(child);
 		put_ctx(ctx);
 
 		goto again;
 	}
 	mutex_unlock(&event->child_mutex);
 
-	list_for_each_entry_safe(child, tmp, &free_list, child_list) {
-		list_del(&child->child_list);
-		free_event(child);
-	}
-
 no_ctx:
 	put_event(event); /* Must be the 'last' reference */
 	return 0;

From nobody Mon Dec 15 22:38:08 2025
Message-ID: <20250307193723.274039710@infradead.org>
User-Agent: quilt/0.66
Date: Fri, 07 Mar 2025 20:33:10 +0100
From: Peter Zijlstra
To: mingo@kernel.org, ravi.bangoria@amd.com, lucas.demarchi@intel.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, acme@kernel.org,
 namhyung@kernel.org, mark.rutland@arm.com,
 alexander.shishkin@linux.intel.com, jolsa@kernel.org, irogers@google.com,
 adrian.hunter@intel.com, kan.liang@linux.intel.com
Subject: [PATCH v3 5/7] perf: Unify perf_event_free_task() / perf_event_exit_task_context()
References: <20250307193305.486326750@infradead.org>

perf_event_free_task() and perf_event_exit_task_context() are very
similar, except that perf_event_exit_task_context() is a little more
generic and makes fewer assumptions.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Ravi Bangoria
---
 kernel/events/core.c | 88 ++++++++++++---------------------------------------
 1 file changed, 22 insertions(+), 66 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -13488,7 +13488,7 @@ perf_event_exit_event(struct perf_event
 	perf_event_wakeup(event);
 }
 
-static void perf_event_exit_task_context(struct task_struct *child)
+static void perf_event_exit_task_context(struct task_struct *child, bool exit)
 {
 	struct perf_event_context *child_ctx, *clone_ctx = NULL;
 	struct perf_event *child_event, *next;
@@ -13539,13 +13539,31 @@ static void perf_event_exit_task_context
 	 * won't get any samples after PERF_RECORD_EXIT. We can however still
 	 * get a few PERF_RECORD_READ events.
 	 */
-	perf_event_task(child, child_ctx, 0);
+	if (exit)
+		perf_event_task(child, child_ctx, 0);
 
 	list_for_each_entry_safe(child_event, next, &child_ctx->event_list, event_entry)
 		perf_event_exit_event(child_event, child_ctx);
 
 	mutex_unlock(&child_ctx->mutex);
 
+	if (!exit) {
+		/*
+		 * perf_event_release_kernel() could still have a reference on
+		 * this context. In that case we must wait for these events to
+		 * have been freed (in particular all their references to this
+		 * task must've been dropped).
+		 *
+		 * Without this copy_process() will unconditionally free this
+		 * task (irrespective of its reference count) and
+		 * _free_event()'s put_task_struct(event->hw.target) will be a
+		 * use-after-free.
+		 *
+		 * Wait for all events to drop their context reference.
+		 */
+		wait_var_event(&child_ctx->refcount,
+			       refcount_read(&child_ctx->refcount) == 1);
+	}
 	put_ctx(child_ctx);
 }
 
@@ -13573,7 +13591,7 @@ void perf_event_exit_task(struct task_st
 	}
 	mutex_unlock(&child->perf_event_mutex);
 
-	perf_event_exit_task_context(child);
+	perf_event_exit_task_context(child, true);
 
 	/*
 	 * The perf_event_exit_task_context calls perf_event_task
@@ -13584,27 +13602,6 @@ void perf_event_exit_task(struct task_st
 	perf_event_task(child, NULL, 0);
 }
 
-static void perf_free_event(struct perf_event *event,
-			    struct perf_event_context *ctx)
-{
-	struct perf_event *parent = event->parent;
-
-	if (WARN_ON_ONCE(!parent))
-		return;
-
-	mutex_lock(&parent->child_mutex);
-	list_del_init(&event->child_list);
-	mutex_unlock(&parent->child_mutex);
-
-	put_event(parent);
-
-	raw_spin_lock_irq(&ctx->lock);
-	perf_group_detach(event);
-	list_del_event(event, ctx);
-	raw_spin_unlock_irq(&ctx->lock);
-	free_event(event);
-}
-
 /*
  * Free a context as created by inheritance by perf_event_init_task() below,
  * used by fork() in case of fail.
@@ -13614,48 +13611,7 @@ static void perf_free_event(struct perf_
  */
 void perf_event_free_task(struct task_struct *task)
 {
-	struct perf_event_context *ctx;
-	struct perf_event *event, *tmp;
-
-	ctx = rcu_access_pointer(task->perf_event_ctxp);
-	if (!ctx)
-		return;
-
-	mutex_lock(&ctx->mutex);
-	raw_spin_lock_irq(&ctx->lock);
-	/*
-	 * Destroy the task <-> ctx relation and mark the context dead.
-	 *
-	 * This is important because even though the task hasn't been
-	 * exposed yet the context has been (through child_list).
-	 */
-	RCU_INIT_POINTER(task->perf_event_ctxp, NULL);
-	WRITE_ONCE(ctx->task, TASK_TOMBSTONE);
-	put_task_struct(task); /* cannot be last */
-	raw_spin_unlock_irq(&ctx->lock);
-
-
-	list_for_each_entry_safe(event, tmp, &ctx->event_list, event_entry)
-		perf_free_event(event, ctx);
-
-	mutex_unlock(&ctx->mutex);
-
-	/*
-	 * perf_event_release_kernel() could've stolen some of our
-	 * child events and still have them on its free_list. In that
-	 * case we must wait for these events to have been freed (in
-	 * particular all their references to this task must've been
-	 * dropped).
-	 *
-	 * Without this copy_process() will unconditionally free this
-	 * task (irrespective of its reference count) and
-	 * _free_event()'s put_task_struct(event->hw.target) will be a
-	 * use-after-free.
-	 *
-	 * Wait for all events to drop their context reference.
-	 */
-	wait_var_event(&ctx->refcount, refcount_read(&ctx->refcount) == 1);
-	put_ctx(ctx); /* must be last */
+	perf_event_exit_task_context(task, false);
 }
 
 void perf_event_delayed_put(struct task_struct *task)

From nobody Mon Dec 15 22:38:08 2025
Message-ID: <20250307193723.417881572@infradead.org>
User-Agent: quilt/0.66
Date: Fri, 07 Mar 2025 20:33:11 +0100
From: Peter Zijlstra
To: mingo@kernel.org, ravi.bangoria@amd.com, lucas.demarchi@intel.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, acme@kernel.org,
 namhyung@kernel.org, mark.rutland@arm.com,
 alexander.shishkin@linux.intel.com, jolsa@kernel.org, irogers@google.com,
 adrian.hunter@intel.com, kan.liang@linux.intel.com
Subject: [PATCH v3 6/7] perf: Rename perf_event_exit_task(.child)
References: <20250307193305.486326750@infradead.org>

The task passed to perf_event_exit_task() is not a child, it is current.
Fix this confusing naming, since much of the rest of the code also
relies on it being current. Specifically, both exec() and exit() callers
use it with current as the argument.
Notably, task_ctx_sched_out() doesn't make much sense outside of current. Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Ravi Bangoria --- kernel/events/core.c | 60 ++++++++++++++++++++++++++--------------------= ----- 1 file changed, 31 insertions(+), 29 deletions(-) --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -13488,15 +13488,15 @@ perf_event_exit_event(struct perf_event perf_event_wakeup(event); } =20 -static void perf_event_exit_task_context(struct task_struct *child, bool e= xit) +static void perf_event_exit_task_context(struct task_struct *task, bool ex= it) { - struct perf_event_context *child_ctx, *clone_ctx =3D NULL; + struct perf_event_context *ctx, *clone_ctx =3D NULL; struct perf_event *child_event, *next; =20 - WARN_ON_ONCE(child !=3D current); + WARN_ON_ONCE(task !=3D current); =20 - child_ctx =3D perf_pin_task_context(child); - if (!child_ctx) + ctx =3D perf_pin_task_context(task); + if (!ctx) return; =20 /* @@ -13509,27 +13509,27 @@ static void perf_event_exit_task_context * without ctx::mutex (it cannot because of the move_group double mutex * lock thing). See the comments in perf_install_in_context(). */ - mutex_lock(&child_ctx->mutex); + mutex_lock(&ctx->mutex); =20 /* * In a single ctx::lock section, de-schedule the events and detach the * context from the task such that we cannot ever get it scheduled back * in. */ - raw_spin_lock_irq(&child_ctx->lock); - task_ctx_sched_out(child_ctx, NULL, EVENT_ALL); + raw_spin_lock_irq(&ctx->lock); + task_ctx_sched_out(ctx, NULL, EVENT_ALL); =20 /* * Now that the context is inactive, destroy the task <-> ctx relation * and mark the context dead. 
*/ - RCU_INIT_POINTER(child->perf_event_ctxp, NULL); - put_ctx(child_ctx); /* cannot be last */ - WRITE_ONCE(child_ctx->task, TASK_TOMBSTONE); + RCU_INIT_POINTER(task->perf_event_ctxp, NULL); + put_ctx(ctx); /* cannot be last */ + WRITE_ONCE(ctx->task, TASK_TOMBSTONE); put_task_struct(current); /* cannot be last */ =20 - clone_ctx =3D unclone_ctx(child_ctx); - raw_spin_unlock_irq(&child_ctx->lock); + clone_ctx =3D unclone_ctx(ctx); + raw_spin_unlock_irq(&ctx->lock); =20 if (clone_ctx) put_ctx(clone_ctx); @@ -13540,12 +13540,12 @@ static void perf_event_exit_task_context * get a few PERF_RECORD_READ events. */ if (exit) - perf_event_task(child, child_ctx, 0); + perf_event_task(task, ctx, 0); =20 - list_for_each_entry_safe(child_event, next, &child_ctx->event_list, event= _entry) - perf_event_exit_event(child_event, child_ctx); + list_for_each_entry_safe(child_event, next, &ctx->event_list, event_entry) + perf_event_exit_event(child_event, ctx); =20 - mutex_unlock(&child_ctx->mutex); + mutex_unlock(&ctx->mutex); =20 if (!exit) { /* @@ -13561,24 +13561,26 @@ static void perf_event_exit_task_context * * Wait for all events to drop their context reference. */ - wait_var_event(&child_ctx->refcount, - refcount_read(&child_ctx->refcount) =3D=3D 1); + wait_var_event(&ctx->refcount, + refcount_read(&ctx->refcount) =3D=3D 1); } - put_ctx(child_ctx); + put_ctx(ctx); } =20 /* - * When a child task exits, feed back event values to parent events. + * When a task exits, feed back event values to parent events. * * Can be called with exec_update_lock held when called from * setup_new_exec(). 
*/ -void perf_event_exit_task(struct task_struct *child) +void perf_event_exit_task(struct task_struct *task) { struct perf_event *event, *tmp; =20 - mutex_lock(&child->perf_event_mutex); - list_for_each_entry_safe(event, tmp, &child->perf_event_list, + WARN_ON_ONCE(task !=3D current); + + mutex_lock(&task->perf_event_mutex); + list_for_each_entry_safe(event, tmp, &task->perf_event_list, owner_entry) { list_del_init(&event->owner_entry); =20 @@ -13589,17 +13591,17 @@ void perf_event_exit_task(struct task_st */ smp_store_release(&event->owner, NULL); } - mutex_unlock(&child->perf_event_mutex); + mutex_unlock(&task->perf_event_mutex); =20 - perf_event_exit_task_context(child, true); + perf_event_exit_task_context(task, true); =20 /* * The perf_event_exit_task_context calls perf_event_task - * with child's task_ctx, which generates EXIT events for - * child contexts and sets child->perf_event_ctxp[] to NULL. + * with task's task_ctx, which generates EXIT events for + * task contexts and sets task->perf_event_ctxp[] to NULL. * At this point we need to send EXIT events to cpu contexts. 
*/ - perf_event_task(child, NULL, 0); + perf_event_task(task, NULL, 0); } =20 /*

From nobody Mon Dec 15 22:38:08 2025
Message-ID: <20250307193723.525402029@infradead.org>
User-Agent: quilt/0.66
Date: Fri, 07 Mar 2025 20:33:12 +0100
From: Peter Zijlstra
To: mingo@kernel.org, ravi.bangoria@amd.com, lucas.demarchi@intel.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com, alexander.shishkin@linux.intel.com, jolsa@kernel.org, irogers@google.com, adrian.hunter@intel.com, kan.liang@linux.intel.com
Subject: [PATCH v3 7/7] perf: Make perf_pmu_unregister() useable
References: <20250307193305.486326750@infradead.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Previously it was only safe to call perf_pmu_unregister() if there were no active events of that pmu around -- which was impossible to guarantee since it races all sorts against perf_init_event().
Rework the whole thing by: - keeping track of all events for a given pmu - 'hiding' the pmu from perf_init_event() - waiting for the appropriate (s)rcu grace periods such that all prior references to the PMU will be completed - detaching all still existing events of that pmu (see first point) and moving them to a new REVOKED state. - actually freeing the pmu data. Where notably the new REVOKED state must inhibit all event actions from reaching code that wants to use event->pmu. Signed-off-by: Peter Zijlstra (Intel) Reported-by: "Mi, Dapeng" Reported-by: James Clark Reviewed-by: Ravi Bangoria --- include/linux/perf_event.h | 15 +- kernel/events/core.c | 294 ++++++++++++++++++++++++++++++++++++++++= ----- 2 files changed, 274 insertions(+), 35 deletions(-) --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -325,6 +325,9 @@ struct perf_output_handle; struct pmu { struct list_head entry; =20 + spinlock_t events_lock; + struct list_head events; + struct module *module; struct device *dev; struct device *parent; @@ -632,9 +635,10 @@ struct perf_addr_filter_range { * enum perf_event_state - the states of an event: */ enum perf_event_state { - PERF_EVENT_STATE_DEAD =3D -4, - PERF_EVENT_STATE_EXIT =3D -3, - PERF_EVENT_STATE_ERROR =3D -2, + PERF_EVENT_STATE_DEAD =3D -5, + PERF_EVENT_STATE_REVOKED =3D -4, /* pmu gone, must not touch */ + PERF_EVENT_STATE_EXIT =3D -3, /* task died, still inherit */ + PERF_EVENT_STATE_ERROR =3D -2, /* scheduling error, can enable */ PERF_EVENT_STATE_OFF =3D -1, PERF_EVENT_STATE_INACTIVE =3D 0, PERF_EVENT_STATE_ACTIVE =3D 1, @@ -875,6 +879,7 @@ struct perf_event { void *security; #endif struct list_head sb_list; + struct list_head pmu_list; =20 /* * Certain events gets forwarded to another pmu internally by over- @@ -1132,7 +1137,7 @@ extern void perf_aux_output_flag(struct extern void perf_event_itrace_started(struct perf_event *event); =20 extern int perf_pmu_register(struct pmu *pmu, const char *name, int type); -extern void 
perf_pmu_unregister(struct pmu *pmu); +extern int perf_pmu_unregister(struct pmu *pmu); =20 extern void __perf_event_task_sched_in(struct task_struct *prev, struct task_struct *task); @@ -1734,7 +1739,7 @@ static inline bool needs_branch_stack(st =20 static inline bool has_aux(struct perf_event *event) { - return event->pmu->setup_aux; + return event->pmu && event->pmu->setup_aux; } =20 static inline bool has_aux_action(struct perf_event *event) --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -207,6 +207,7 @@ static void perf_ctx_unlock(struct perf_ } =20 #define TASK_TOMBSTONE ((void *)-1L) +#define EVENT_TOMBSTONE ((void *)-1L) =20 static bool is_kernel_event(struct perf_event *event) { @@ -2348,6 +2349,11 @@ static void perf_child_detach(struct per =20 sync_child_event(event); list_del_init(&event->child_list); + /* + * Cannot set to NULL, as that would confuse the situation vs + * not being a child event. See for example unaccount_event(). + */ + event->parent =3D EVENT_TOMBSTONE; } =20 static bool is_orphaned_event(struct perf_event *event) @@ -2469,7 +2475,9 @@ ctx_time_update_event(struct perf_event_ =20 #define DETACH_GROUP 0x01UL #define DETACH_CHILD 0x02UL -#define DETACH_DEAD 0x04UL +#define DETACH_EXIT 0x04UL +#define DETACH_REVOKE 0x08UL +#define DETACH_DEAD 0x10UL =20 /* * Cross CPU call to remove a performance event @@ -2484,6 +2492,7 @@ __perf_remove_from_context(struct perf_e void *info) { struct perf_event_pmu_context *pmu_ctx =3D event->pmu_ctx; + enum perf_event_state state =3D PERF_EVENT_STATE_OFF; unsigned long flags =3D (unsigned long)info; =20 ctx_time_update(cpuctx, ctx); @@ -2492,16 +2501,22 @@ __perf_remove_from_context(struct perf_e * Ensure event_sched_out() switches to OFF, at the very least * this avoids raising perf_pending_task() at this time. 
*/ - if (flags & DETACH_DEAD) + if (flags & DETACH_EXIT) + state =3D PERF_EVENT_STATE_EXIT; + if (flags & DETACH_REVOKE) + state =3D PERF_EVENT_STATE_REVOKED; + if (flags & DETACH_DEAD) { event->pending_disable =3D 1; + state =3D PERF_EVENT_STATE_DEAD; + } event_sched_out(event, ctx); if (flags & DETACH_GROUP) perf_group_detach(event); if (flags & DETACH_CHILD) perf_child_detach(event); list_del_event(event, ctx); - if (flags & DETACH_DEAD) - event->state =3D PERF_EVENT_STATE_DEAD; + + event->state =3D min(event->state, state); =20 if (!pmu_ctx->nr_events) { pmu_ctx->rotate_necessary =3D 0; @@ -4560,7 +4575,8 @@ static void perf_event_enable_on_exec(st =20 static void perf_remove_from_owner(struct perf_event *event); static void perf_event_exit_event(struct perf_event *event, - struct perf_event_context *ctx); + struct perf_event_context *ctx, + bool revoke); =20 /* * Removes all events from the current task that have been marked @@ -4587,7 +4603,7 @@ static void perf_event_remove_on_exec(st =20 modified =3D true; =20 - perf_event_exit_event(event, ctx); + perf_event_exit_event(event, ctx, false); } =20 raw_spin_lock_irqsave(&ctx->lock, flags); @@ -5187,6 +5203,7 @@ static bool is_sb_event(struct perf_even attr->context_switch || attr->text_poke || attr->bpf_event) return true; + return false; } =20 @@ -5388,6 +5405,8 @@ static void perf_pending_task_sync(struc /* vs perf_event_alloc() error */ static void __free_event(struct perf_event *event) { + struct pmu *pmu =3D event->pmu; + if (event->attach_state & PERF_ATTACH_CALLCHAIN) put_callchain_buffers(); =20 @@ -5414,6 +5433,7 @@ static void __free_event(struct perf_eve * put_pmu_ctx() needs an event->ctx reference, because of * epc->ctx. 
*/ + WARN_ON_ONCE(!pmu); WARN_ON_ONCE(!event->ctx); WARN_ON_ONCE(event->pmu_ctx->ctx !=3D event->ctx); put_pmu_ctx(event->pmu_ctx); @@ -5426,8 +5446,13 @@ static void __free_event(struct perf_eve if (event->ctx) put_ctx(event->ctx); =20 - if (event->pmu) - module_put(event->pmu->module); + if (pmu) { + module_put(pmu->module); + scoped_guard (spinlock, &pmu->events_lock) { + list_del(&event->pmu_list); + wake_up_var(pmu); + } + } =20 call_rcu(&event->rcu_head, free_event_rcu); } @@ -5575,7 +5600,11 @@ int perf_event_release_kernel(struct per * Thus this guarantees that we will in fact observe and kill _ALL_ * child events. */ - perf_remove_from_context(event, DETACH_GROUP|DETACH_DEAD); + if (event->state > PERF_EVENT_STATE_REVOKED) { + perf_remove_from_context(event, DETACH_GROUP|DETACH_DEAD); + } else { + event->state =3D PERF_EVENT_STATE_DEAD; + } =20 perf_event_ctx_unlock(event, ctx); =20 @@ -5628,7 +5657,7 @@ int perf_event_release_kernel(struct per mutex_unlock(&ctx->mutex); =20 if (child) - free_event(child); + put_event(child); put_ctx(ctx); =20 goto again; @@ -5863,7 +5892,7 @@ __perf_read(struct perf_event *event, ch * error state (i.e. because it was pinned but it couldn't be * scheduled on to the CPU at some point). 
*/ - if (event->state =3D=3D PERF_EVENT_STATE_ERROR) + if (event->state <=3D PERF_EVENT_STATE_ERROR) return 0; =20 if (count < event->read_size) @@ -5902,8 +5931,14 @@ static __poll_t perf_poll(struct file *f struct perf_buffer *rb; __poll_t events =3D EPOLLHUP; =20 + if (event->state <=3D PERF_EVENT_STATE_REVOKED) + return EPOLLERR; + poll_wait(file, &event->waitq, wait); =20 + if (event->state <=3D PERF_EVENT_STATE_REVOKED) + return EPOLLERR; + if (is_event_hup(event)) return events; =20 @@ -6078,6 +6113,9 @@ static long _perf_ioctl(struct perf_even void (*func)(struct perf_event *); u32 flags =3D arg; =20 + if (event->state <=3D PERF_EVENT_STATE_REVOKED) + return -ENODEV; + switch (cmd) { case PERF_EVENT_IOC_ENABLE: func =3D _perf_event_enable; @@ -6453,9 +6491,22 @@ void ring_buffer_put(struct perf_buffer call_rcu(&rb->rcu_head, rb_free_rcu); } =20 +typedef void (*mapped_f)(struct perf_event *event, struct mm_struct *mm); + +#define get_mapped(event, func) \ +({ struct pmu *pmu; \ + mapped_f f =3D NULL; \ + guard(rcu)(); \ + pmu =3D READ_ONCE(event->pmu); \ + if (pmu) \ + f =3D pmu->func; \ + f; \ +}) + static void perf_mmap_open(struct vm_area_struct *vma) { struct perf_event *event =3D vma->vm_file->private_data; + mapped_f mapped =3D get_mapped(event, event_mapped); =20 atomic_inc(&event->mmap_count); atomic_inc(&event->rb->mmap_count); @@ -6463,8 +6514,8 @@ static void perf_mmap_open(struct vm_are if (vma->vm_pgoff) atomic_inc(&event->rb->aux_mmap_count); =20 - if (event->pmu->event_mapped) - event->pmu->event_mapped(event, vma->vm_mm); + if (mapped) + mapped(event, vma->vm_mm); } =20 static void perf_pmu_output_stop(struct perf_event *event); @@ -6480,14 +6531,16 @@ static void perf_pmu_output_stop(struct static void perf_mmap_close(struct vm_area_struct *vma) { struct perf_event *event =3D vma->vm_file->private_data; + mapped_f unmapped =3D get_mapped(event, event_unmapped); struct perf_buffer *rb =3D ring_buffer_get(event); struct user_struct *mmap_user 
=3D rb->mmap_user; int mmap_locked =3D rb->mmap_locked; unsigned long size =3D perf_data_size(rb); bool detach_rest =3D false; =20 - if (event->pmu->event_unmapped) - event->pmu->event_unmapped(event, vma->vm_mm); + /* FIXIES vs perf_pmu_unregister() */ + if (unmapped) + unmapped(event, vma->vm_mm); =20 /* * The AUX buffer is strictly a sub-buffer, serialize using aux_mutex @@ -6680,6 +6733,7 @@ static int perf_mmap(struct file *file, unsigned long nr_pages; long user_extra =3D 0, extra =3D 0; int ret, flags =3D 0; + mapped_f mapped; =20 /* * Don't allow mmap() of inherited per-task counters. This would @@ -6710,6 +6764,16 @@ static int perf_mmap(struct file *file, mutex_lock(&event->mmap_mutex); ret =3D -EINVAL; =20 + /* + * This relies on __pmu_detach_event() taking mmap_mutex after marking + * the event REVOKED. Either we observe the state, or __pmu_detach_event() + * will detach the rb created here. + */ + if (event->state <=3D PERF_EVENT_STATE_REVOKED) { + ret =3D -ENODEV; + goto unlock; + } + if (vma->vm_pgoff =3D=3D 0) { nr_pages -=3D 1; =20 @@ -6888,8 +6952,9 @@ static int perf_mmap(struct file *file, if (!ret) ret =3D map_range(rb, vma); =20 - if (!ret && event->pmu->event_mapped) - event->pmu->event_mapped(event, vma->vm_mm); + mapped =3D get_mapped(event, event_mapped); + if (mapped) + mapped(event, vma->vm_mm); =20 return ret; } @@ -6900,6 +6965,9 @@ static int perf_fasync(int fd, struct fi struct perf_event *event =3D filp->private_data; int retval; =20 + if (event->state <=3D PERF_EVENT_STATE_REVOKED) + return -ENODEV; + inode_lock(inode); retval =3D fasync_helper(fd, filp, on, &event->fasync); inode_unlock(inode); @@ -10866,6 +10934,9 @@ static int __perf_event_set_bpf_prog(str { bool is_kprobe, is_uprobe, is_tracepoint, is_syscall_tp; =20 + if (event->state <=3D PERF_EVENT_STATE_REVOKED) + return -ENODEV; + if (!perf_event_is_tracing(event)) return perf_event_set_bpf_handler(event, prog, bpf_cookie); =20 @@ -12049,6 +12120,9 @@ int 
perf_pmu_register(struct pmu *_pmu, if (!pmu->event_idx) pmu->event_idx =3D perf_event_idx_default; =20 + INIT_LIST_HEAD(&pmu->events); + spin_lock_init(&pmu->events_lock); + /* * Now that the PMU is complete, make it visible to perf_try_init_event(). */ @@ -12062,21 +12136,143 @@ int perf_pmu_register(struct pmu *_pmu, } EXPORT_SYMBOL_GPL(perf_pmu_register); =20 -void perf_pmu_unregister(struct pmu *pmu) +static void __pmu_detach_event(struct pmu *pmu, struct perf_event *event, + struct perf_event_context *ctx) +{ + /* + * De-schedule the event and mark it REVOKED. + */ + perf_event_exit_event(event, ctx, true); + + /* + * All _free_event() bits that rely on event->pmu: + * + * Notably, perf_mmap() relies on the ordering here. + */ + scoped_guard (mutex, &event->mmap_mutex) { + WARN_ON_ONCE(pmu->event_unmapped); + /* + * Mostly an empty lock sequence, such that perf_mmap(), which + * relies on mmap_mutex, is sure to observe the state change. + */ + } + + perf_event_free_bpf_prog(event); + perf_free_addr_filters(event); + + if (event->destroy) { + event->destroy(event); + event->destroy =3D NULL; + } + + if (event->pmu_ctx) { + put_pmu_ctx(event->pmu_ctx); + event->pmu_ctx =3D NULL; + } + + exclusive_event_destroy(event); + module_put(pmu->module); + + event->pmu =3D NULL; /* force fault instead of UAF */ +} + +static void pmu_detach_event(struct pmu *pmu, struct perf_event *event) +{ + struct perf_event_context *ctx; + + ctx =3D perf_event_ctx_lock(event); + __pmu_detach_event(pmu, event, ctx); + perf_event_ctx_unlock(event, ctx); + + scoped_guard (spinlock, &pmu->events_lock) + list_del(&event->pmu_list); +} + +static struct perf_event *pmu_get_event(struct pmu *pmu) +{ + struct perf_event *event; + + guard(spinlock)(&pmu->events_lock); + list_for_each_entry(event, &pmu->events, pmu_list) { + if (atomic_long_inc_not_zero(&event->refcount)) + return event; + } + + return NULL; +} + +static bool pmu_empty(struct pmu *pmu) +{ + guard(spinlock)(&pmu->events_lock); + 
return list_empty(&pmu->events); +} + +static void pmu_detach_events(struct pmu *pmu) +{ + struct perf_event *event; + + for (;;) { + event =3D pmu_get_event(pmu); + if (!event) + break; + + pmu_detach_event(pmu, event); + put_event(event); + } + + /* + * wait for pending _free_event()s + */ + wait_var_event(pmu, pmu_empty(pmu)); +} + +int perf_pmu_unregister(struct pmu *pmu) { scoped_guard (mutex, &pmus_lock) { + if (!idr_cmpxchg(&pmu_idr, pmu->type, pmu, NULL)) + return -EINVAL; + list_del_rcu(&pmu->entry); - idr_remove(&pmu_idr, pmu->type); } =20 /* * We dereference the pmu list under both SRCU and regular RCU, so * synchronize against both of those. + * + * Notably, the entirety of event creation, from perf_init_event() + * (which will now fail, because of the above) until + * perf_install_in_context() should be under SRCU such that + * this synchronizes against event creation. This avoids trying to + * detach events that are not fully formed. */ synchronize_srcu(&pmus_srcu); synchronize_rcu(); =20 + if (pmu->event_unmapped && !pmu_empty(pmu)) { + /* + * Can't force remove events when pmu::event_unmapped() + * is used in perf_mmap_close(). + */ + guard(mutex)(&pmus_lock); + idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu); + list_add_rcu(&pmu->entry, &pmus); + return -EBUSY; + } + + scoped_guard (mutex, &pmus_lock) + idr_remove(&pmu_idr, pmu->type); + + /* + * PMU is removed from the pmus list, so no new events will + * be created, now take care of the existing ones. + */ + pmu_detach_events(pmu); + + /* + * PMU is unused, make it go away. 
+ */ perf_pmu_free(pmu); + return 0; } EXPORT_SYMBOL_GPL(perf_pmu_unregister); =20 @@ -12170,7 +12366,7 @@ static struct pmu *perf_init_event(struc struct pmu *pmu; int type, ret; =20 - guard(srcu)(&pmus_srcu); + guard(srcu)(&pmus_srcu); /* pmu idr/list access */ =20 /* * Save original type before calling pmu->event_init() since certain @@ -12394,6 +12590,7 @@ perf_event_alloc(struct perf_event_attr INIT_LIST_HEAD(&event->active_entry); INIT_LIST_HEAD(&event->addr_filters.list); INIT_HLIST_NODE(&event->hlist_entry); + INIT_LIST_HEAD(&event->pmu_list); =20 =20 init_waitqueue_head(&event->waitq); @@ -12561,6 +12758,13 @@ perf_event_alloc(struct perf_event_attr /* symmetric to unaccount_event() in _free_event() */ account_event(event); =20 + /* + * Event creation should be under SRCU, see perf_pmu_unregister(). + */ + lockdep_assert_held(&pmus_srcu); + scoped_guard (spinlock, &pmu->events_lock) + list_add(&event->pmu_list, &pmu->events); + return_ptr(event); } =20 @@ -12760,6 +12964,9 @@ perf_event_set_output(struct perf_event goto unlock; =20 if (output_event) { + if (output_event->state <=3D PERF_EVENT_STATE_REVOKED) + goto unlock; + /* get the rb we want to redirect to */ rb =3D ring_buffer_get(output_event); if (!rb) @@ -12941,6 +13148,11 @@ SYSCALL_DEFINE5(perf_event_open, if (event_fd < 0) return event_fd; =20 + /* + * Event creation should be under SRCU, see perf_pmu_unregister(). 
+ */ + guard(srcu)(&pmus_srcu); + CLASS(fd, group)(group_fd); // group_fd =3D=3D -1 =3D> empty if (group_fd !=3D -1) { if (!is_perf_file(group)) { @@ -12948,6 +13160,10 @@ SYSCALL_DEFINE5(perf_event_open, goto err_fd; } group_leader =3D fd_file(group)->private_data; + if (group_leader->state <=3D PERF_EVENT_STATE_REVOKED) { + err =3D -ENODEV; + goto err_fd; + } if (flags & PERF_FLAG_FD_OUTPUT) output_event =3D group_leader; if (flags & PERF_FLAG_FD_NO_GROUP) @@ -13281,6 +13497,11 @@ perf_event_create_kernel_counter(struct if (attr->aux_output || attr->aux_action) return ERR_PTR(-EINVAL); =20 + /* + * Event creation should be under SRCU, see perf_pmu_unregister(). + */ + guard(srcu)(&pmus_srcu); + event =3D perf_event_alloc(attr, cpu, task, NULL, NULL, overflow_handler, context, -1); if (IS_ERR(event)) { @@ -13492,10 +13713,14 @@ static void sync_child_event(struct perf } =20 static void -perf_event_exit_event(struct perf_event *event, struct perf_event_context = *ctx) +perf_event_exit_event(struct perf_event *event, + struct perf_event_context *ctx, bool revoke) { struct perf_event *parent_event =3D event->parent; - unsigned long detach_flags =3D 0; + unsigned long detach_flags =3D DETACH_EXIT; + + if (parent_event =3D=3D EVENT_TOMBSTONE) + parent_event =3D NULL; =20 if (parent_event) { /* @@ -13510,16 +13735,14 @@ perf_event_exit_event(struct perf_event * Do destroy all inherited groups, we don't care about those * and being thorough is better. */ - detach_flags =3D DETACH_GROUP | DETACH_CHILD; + detach_flags |=3D DETACH_GROUP | DETACH_CHILD; mutex_lock(&parent_event->child_mutex); } =20 - perf_remove_from_context(event, detach_flags); + if (revoke) + detach_flags |=3D DETACH_GROUP | DETACH_REVOKE; =20 - raw_spin_lock_irq(&ctx->lock); - if (event->state > PERF_EVENT_STATE_EXIT) - perf_event_set_state(event, PERF_EVENT_STATE_EXIT); - raw_spin_unlock_irq(&ctx->lock); + perf_remove_from_context(event, detach_flags); =20 /* * Child events can be freed. 
@@ -13530,7 +13753,10 @@ perf_event_exit_event(struct perf_event * Kick perf_poll() for is_event_hup(); */ perf_event_wakeup(parent_event); - free_event(event); + /* + * pmu_detach_event() will have an extra refcount. + */ + put_event(event); put_event(parent_event); return; } @@ -13596,7 +13822,7 @@ static void perf_event_exit_task_context perf_event_task(task, ctx, 0); =20 list_for_each_entry_safe(child_event, next, &ctx->event_list, event_entry) - perf_event_exit_event(child_event, ctx); + perf_event_exit_event(child_event, ctx, false); =20 mutex_unlock(&ctx->mutex); =20 @@ -13743,6 +13969,14 @@ inherit_event(struct perf_event *parent_ if (parent_event->parent) parent_event =3D parent_event->parent; =20 + if (parent_event->state <=3D PERF_EVENT_STATE_REVOKED) + return NULL; + + /* + * Event creation should be under SRCU, see perf_pmu_unregister(). + */ + guard(srcu)(&pmus_srcu); + child_event =3D perf_event_alloc(&parent_event->attr, parent_event->cpu, child,

From nobody Mon Dec 15 22:38:08 2025
Date: Tue, 08 Apr 2025 19:05:05 -0000
From: "tip-bot2 for Peter Zijlstra"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: perf/core] perf: Ensure bpf_perf_link path is properly serialized
Cc: Ravi Bangoria, "Peter Zijlstra (Intel)", x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20250307193305.486326750@infradead.org>
References: <20250307193305.486326750@infradead.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Message-ID: <174413910546.31282.13108128229694254483.tip-bot2@tip-bot2>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 7ed9138a72829d2035ecbd8dbd35b1bc3c137c40
Gitweb: https://git.kernel.org/tip/7ed9138a72829d2035ecbd8dbd35b1bc3c137c40
Author: Peter Zijlstra
AuthorDate: Fri, 17 Jan 2025 10:54:50 +01:00
Committer: Peter Zijlstra
CommitterDate: Tue, 08 Apr 2025 20:55:46 +02:00

perf: Ensure bpf_perf_link path is properly serialized

Ravi reported that the bpf_perf_link_attach() usage of perf_event_set_bpf_prog() is not serialized by ctx->mutex, unlike the PERF_EVENT_IOC_SET_BPF case.
Reported-by: Ravi Bangoria
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Ravi Bangoria
Link: https://lkml.kernel.org/r/20250307193305.486326750@infradead.org
---
 kernel/events/core.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index e93c195..a85d63b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6239,6 +6239,9 @@ static int perf_event_set_output(struct perf_event *event,
 static int perf_event_set_filter(struct perf_event *event, void __user *arg);
 static int perf_copy_attr(struct perf_event_attr __user *uattr,
 			  struct perf_event_attr *attr);
+static int __perf_event_set_bpf_prog(struct perf_event *event,
+				     struct bpf_prog *prog,
+				     u64 bpf_cookie);
 
 static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned long arg)
 {
@@ -6301,7 +6304,7 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 
-		err = perf_event_set_bpf_prog(event, prog, 0);
+		err = __perf_event_set_bpf_prog(event, prog, 0);
 		if (err) {
 			bpf_prog_put(prog);
 			return err;
@@ -11069,8 +11072,9 @@ static inline bool perf_event_is_tracing(struct perf_event *event)
 	return false;
 }
 
-int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog,
-			    u64 bpf_cookie)
+static int __perf_event_set_bpf_prog(struct perf_event *event,
+				     struct bpf_prog *prog,
+				     u64 bpf_cookie)
 {
 	bool is_kprobe, is_uprobe, is_tracepoint, is_syscall_tp;
 
@@ -11108,6 +11112,20 @@ int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog,
 	return perf_event_attach_bpf_prog(event, prog, bpf_cookie);
 }
 
+int perf_event_set_bpf_prog(struct perf_event *event,
+			    struct bpf_prog *prog,
+			    u64 bpf_cookie)
+{
+	struct perf_event_context *ctx;
+	int ret;
+
+	ctx = perf_event_ctx_lock(event);
+	ret = __perf_event_set_bpf_prog(event, prog, bpf_cookie);
+	perf_event_ctx_unlock(event, ctx);
+
+	return ret;
+}
+
 void perf_event_free_bpf_prog(struct perf_event *event)
 {
 	if (!event->prog)
@@ -11130,7 +11148,15 @@ static void perf_event_free_filter(struct perf_event *event)
 {
 }
 
-int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog,
+static int __perf_event_set_bpf_prog(struct perf_event *event,
+				     struct bpf_prog *prog,
+				     u64 bpf_cookie)
+{
+	return -ENOENT;
+}
+
+int perf_event_set_bpf_prog(struct perf_event *event,
+			    struct bpf_prog *prog,
 			    u64 bpf_cookie)
 {
 	return -ENOENT;

From nobody Mon Dec 15 22:38:08 2025
Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 26A182528ED; Tue, 8 Apr 2025 19:05:06 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744139109; cv=none; b=eQsDIX9h7TvmUbLKS+pXGPebMAJsEDhlRH152JfjXtaz75+MWyYCGaUJU+zpy3CXAqrV81knLmc3/9ZDfHWxL6aZb1ndEwTym1j7Pj46Wn3IxGF6im0/t4+2dNnQ5/8jQ0owwlHo8Ce2RQrD5XJRkPDXkI5DHFNH7+cyC8WQFK4=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744139109; c=relaxed/simple; bh=/ai1heGd72j5PThisRlYees80ah3AMc9rQi5nwAAwGQ=; h=Date:From:To:Subject:Cc:In-Reply-To:References:MIME-Version: Message-ID:Content-Type; b=pCUNR1JeFP3qrLmDWpIlXprv/964x0eLlG7FfUtucs2oLmzutF9z7iF3buTX2vqHcKhyd7AakBQaVkd+zPkPTZoVs+QM2pzkjeJryiHCIwdXQpd6E04PdlaaYcg8Xtv6SbzFx2a0393neYY9VowRk0eQSMPfWAPIh84i0GLw8/I=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=gJxZbRJZ; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=Gr9s7HH5; arc=none
smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="gJxZbRJZ"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="Gr9s7HH5" Date: Tue, 08 Apr 2025 19:05:04 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1744139105; h=from:from:sender:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gLCwYNFocq6ShE2lDB25ZIRbAWEGo8QgOQcJOT7ZIWA=; b=gJxZbRJZwHKBJVIQ/GgOmUIsM+p1tPwLXR84xCaj295YIhyDOWADiqG+TrqcuWGN1NtoxN PGuZH7GB6VH8DpOq2rqn5J1VPd+PEg6aoPYBbPF7NYkiPIHPg7Sn1UCpcC2ZSEl34qchT/ mcKcycAh6Ca4v5vimmiXMVFMZzNsV60kFiubFc/dqJpmRAUb2wDw7zGcizGRUTRHEnLumW X4vnitqYOYARjQJbyYxO0NRr/3s6d0d/yt4quZAQQpqd/29iJxO4hQtBdM8CTrn3PZeZSc 6ZLxOMPEzlpKKUSfM6ZA5iqyFn3U1ZdBNGM9f2ANtBtoJ05H+Vbh3j3Xwdr8mA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1744139105; h=from:from:sender:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gLCwYNFocq6ShE2lDB25ZIRbAWEGo8QgOQcJOT7ZIWA=; b=Gr9s7HH5kp34XAajw21RLVwnNQir2HSLEl6UvOEOtA5FFOHGqfpDgSv0t5Ck7jidSc27Yi JbrVbocAiDiWICCA== From: "tip-bot2 for Peter Zijlstra" Sender: tip-bot2@linutronix.de Reply-to: linux-kernel@vger.kernel.org To: linux-tip-commits@vger.kernel.org Subject: [tip: perf/core] perf: Simplify child event tear-down Cc: "Peter Zijlstra (Intel)" , Ravi Bangoria , 
 x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20250307193305.486326750@infradead.org>
References: <20250307193305.486326750@infradead.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Message-ID: <174413910477.31282.11766952801429346476.tip-bot2@tip-bot2>
Robot-ID:
Robot-Unsubscribe: Contact to get blacklisted from these emails
Precedence: bulk
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     0a00a43b8c200df5b9ca2b3e1726479b5916264b
Gitweb:        https://git.kernel.org/tip/0a00a43b8c200df5b9ca2b3e1726479b5916264b
Author:        Peter Zijlstra
AuthorDate:    Fri, 17 Jan 2025 15:25:23 +01:00
Committer:     Peter Zijlstra
CommitterDate: Tue, 08 Apr 2025 20:55:46 +02:00

perf: Simplify child event tear-down

Currently perf_event_release_kernel() will iterate the child events and
attempt tear-down. However, it removes them from the child_list using
list_move(), notably skipping the state management done by
perf_child_detach().

Crucially, it fails to clear PERF_ATTACH_CHILD, which opens the door for
a concurrent perf_remove_from_context() to race.

Pass DETACH_CHILD to perf_remove_from_context() instead, so that the
removal goes through perf_child_detach(). This way child_list management
stays fully serialized using child_mutex.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Ravi Bangoria
Link: https://lkml.kernel.org/r/20250307193305.486326750@infradead.org
---
 kernel/events/core.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index a85d63b..3c92b75 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2325,7 +2325,11 @@ static void perf_child_detach(struct perf_event *event)
 	if (WARN_ON_ONCE(!parent_event))
 		return;
 
+	/*
+	 * Can't check this from an IPI, the holder is likely another CPU.
+	 *
 	lockdep_assert_held(&parent_event->child_mutex);
+	 */
 
 	sync_child_event(event);
 	list_del_init(&event->child_list);
@@ -5759,8 +5763,8 @@ again:
 		tmp = list_first_entry_or_null(&event->child_list,
 					       struct perf_event, child_list);
 		if (tmp == child) {
-			perf_remove_from_context(child, DETACH_GROUP);
-			list_move(&child->child_list, &free_list);
+			perf_remove_from_context(child, DETACH_GROUP | DETACH_CHILD);
+			list_add(&child->child_list, &free_list);
 		} else {
 			var = &ctx->refcount;
 		}

From nobody Mon Dec 15 22:38:08 2025
Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0588C244EAB; Tue, 8 Apr 2025 19:05:03 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744139106; cv=none; b=C43/SwAeA/vR9b/9QJW1NCY6Y3w5I28SulbDKGw/QNMqjfnA8SNdFrAAwjyTk/c7KIboCxfZP4qMFfEMRnixlqbWu4E2uQsjhMAVeDvIgEihgt8sDHQJ9ukzgZCoiYFfVDFosFaluJPMM2qw2FlixxQ/l9kuyhmAL1iUMRyWYs0=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744139106; c=relaxed/simple; bh=JRhNMd7xTxqboMD0FX/Ihyk/W+msvNQQTYpMLGvBT1Y=; h=Date:From:To:Subject:Cc:In-Reply-To:References:MIME-Version: Message-ID:Content-Type; b=tm0wuaEvj22xhsXUZ352sQeRswRWNau3WRnfeaa38BOJuFAfpnXDoRjVfUWHUkcbFpHaVRdcWXH46QiwxkFnbg/CGNUXJRDVuaWmgbZHgS4f51KeZWFmlhpdUeYrkOCcw4GmuAcNztcxOt1e5QmSibm9BHr1iqcJ6ImNh1/uN70=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=s64nWPtg; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=NlXd84oz; arc=none smtp.client-ip=193.142.43.55
Authentication-Results:
smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="s64nWPtg"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="NlXd84oz" Date: Tue, 08 Apr 2025 19:05:01 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1744139102; h=from:from:sender:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=glmAbEgD3RNqNOkbjLIV1w7OIgaSSIhfghEcLItK2Jc=; b=s64nWPtgyEhqG5N4bKdXz3jUIhaPMjELIHAblC0E79pNYmeG2iJcKdxIxMDooCc618dE2f tQY0hkdkx1tDrtW/3fg6MQ5a2sphnp/cIGcFQAbTTcCZ0YCcubn5ItbGLpkMVmK3H5oq2B rAM0WoooVhCoW2M1brkbkJ0i2rriPJN1lZJXyimnx9iYz91TTFKhr3XwaIv4T9iXjRyrMP 54An4C/cSArI0jmpfFN/mA3IPNuoePcjJA2QCLsztGJf+nJ1SQQioybDRDFPBi7R17bpd/ awubC7oN5PE+iU0dNoxGNsBzXJkgYC6ywwUDRKgKczm/UtfIefHnnz3NwoKqtA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1744139102; h=from:from:sender:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=glmAbEgD3RNqNOkbjLIV1w7OIgaSSIhfghEcLItK2Jc=; b=NlXd84ozL1coM4EYGtdT+gkwg2UgEaEQ8I07oq2dMwpENQtLmUNpP6jOWW6rIl6PcoAbgE qTtpcnES/erEq2Cw== From: "tip-bot2 for Peter Zijlstra" Sender: tip-bot2@linutronix.de Reply-to: linux-kernel@vger.kernel.org To: linux-tip-commits@vger.kernel.org Subject: [tip: perf/core] perf: Rename perf_event_exit_task(.child) Cc: "Peter Zijlstra (Intel)" , Ravi Bangoria , x86@kernel.org, linux-kernel@vger.kernel.org 
In-Reply-To: <20250307193305.486326750@infradead.org>
References: <20250307193305.486326750@infradead.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Message-ID: <174413910123.31282.973495586484753335.tip-bot2@tip-bot2>
Robot-ID:
Robot-Unsubscribe: Contact to get blacklisted from these emails
Precedence: bulk
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     4da0600edae1cf15d12bebacc66d7237e2c33fc6
Gitweb:        https://git.kernel.org/tip/4da0600edae1cf15d12bebacc66d7237e2c33fc6
Author:        Peter Zijlstra
AuthorDate:    Fri, 14 Feb 2025 13:23:45 +01:00
Committer:     Peter Zijlstra
CommitterDate: Tue, 08 Apr 2025 20:55:48 +02:00

perf: Rename perf_event_exit_task(.child)

The task passed to perf_event_exit_task() is not a child, it is current. Fix
this confusing naming, since much of the rest of the code also relies on it
being current.

Specifically, both exec() and exit() callers use it with current as the
argument.

Notably, task_ctx_sched_out() doesn't make much sense outside of current.
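The rename rests on the invariant that the function only ever runs on the calling task itself. A rough userspace analogue of asserting such an invariant with a warn-once check is sketched below. Everything in it is a hypothetical stand-in rather than kernel API: struct task, warn_on_once() and task_exit() are invented for illustration, pthread_self() plays the role of current, and the warn-once helper uses a single shared flag instead of one flag per call site.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for task_struct: records the owning thread. */
struct task {
	pthread_t thread;
	bool exited;
};

/* Simplified warn-once helper: one shared flag, not one per call site. */
static bool warned;

static bool warn_on_once(bool cond, const char *what)
{
	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/*
 * Tear-down that is only meaningful for the calling thread's own task.
 * As with a warning-only check, tear-down still proceeds on a mismatch;
 * the return value reports whether the invariant held.
 */
bool task_exit(struct task *task)
{
	bool foreign = warn_on_once(!pthread_equal(task->thread, pthread_self()),
				    "task is not the calling thread");

	task->exited = true;
	return !foreign;
}
```

Naming the parameter after what it is (the calling task) and checking it at the top of the function keeps the assumption visible to both readers and the runtime.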
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Ravi Bangoria
Link: https://lkml.kernel.org/r/20250307193305.486326750@infradead.org
---
 kernel/events/core.c | 62 ++++++++++++++++++++++---------------------
 1 file changed, 32 insertions(+), 30 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 85c8b79..985b5c7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -13742,13 +13742,13 @@ perf_event_exit_event(struct perf_event *event, struct perf_event_context *ctx)
 	perf_event_wakeup(event);
 }
 
-static void perf_event_exit_task_context(struct task_struct *child, bool exit)
+static void perf_event_exit_task_context(struct task_struct *task, bool exit)
 {
-	struct perf_event_context *child_ctx, *clone_ctx = NULL;
+	struct perf_event_context *ctx, *clone_ctx = NULL;
 	struct perf_event *child_event, *next;
 
-	child_ctx = perf_pin_task_context(child);
-	if (!child_ctx)
+	ctx = perf_pin_task_context(task);
+	if (!ctx)
 		return;
 
 	/*
@@ -13761,28 +13761,28 @@ static void perf_event_exit_task_context(struct task_struct *child, bool exit)
 	 * without ctx::mutex (it cannot because of the move_group double mutex
 	 * lock thing). See the comments in perf_install_in_context().
 	 */
-	mutex_lock(&child_ctx->mutex);
+	mutex_lock(&ctx->mutex);
 
 	/*
 	 * In a single ctx::lock section, de-schedule the events and detach the
 	 * context from the task such that we cannot ever get it scheduled back
 	 * in.
 	 */
-	raw_spin_lock_irq(&child_ctx->lock);
+	raw_spin_lock_irq(&ctx->lock);
 	if (exit)
-		task_ctx_sched_out(child_ctx, NULL, EVENT_ALL);
+		task_ctx_sched_out(ctx, NULL, EVENT_ALL);
 
 	/*
 	 * Now that the context is inactive, destroy the task <-> ctx relation
 	 * and mark the context dead.
 	 */
-	RCU_INIT_POINTER(child->perf_event_ctxp, NULL);
-	put_ctx(child_ctx); /* cannot be last */
-	WRITE_ONCE(child_ctx->task, TASK_TOMBSTONE);
-	put_task_struct(child); /* cannot be last */
+	RCU_INIT_POINTER(task->perf_event_ctxp, NULL);
+	put_ctx(ctx); /* cannot be last */
+	WRITE_ONCE(ctx->task, TASK_TOMBSTONE);
+	put_task_struct(task); /* cannot be last */
 
-	clone_ctx = unclone_ctx(child_ctx);
-	raw_spin_unlock_irq(&child_ctx->lock);
+	clone_ctx = unclone_ctx(ctx);
+	raw_spin_unlock_irq(&ctx->lock);
 
 	if (clone_ctx)
 		put_ctx(clone_ctx);
@@ -13793,12 +13793,12 @@ static void perf_event_exit_task_context(struct task_struct *child, bool exit)
 	 * get a few PERF_RECORD_READ events.
 	 */
 	if (exit)
-		perf_event_task(child, child_ctx, 0);
+		perf_event_task(task, ctx, 0);
 
-	list_for_each_entry_safe(child_event, next, &child_ctx->event_list, event_entry)
-		perf_event_exit_event(child_event, child_ctx);
+	list_for_each_entry_safe(child_event, next, &ctx->event_list, event_entry)
+		perf_event_exit_event(child_event, ctx);
 
-	mutex_unlock(&child_ctx->mutex);
+	mutex_unlock(&ctx->mutex);
 
 	if (!exit) {
 		/*
@@ -13814,24 +13814,26 @@ static void perf_event_exit_task_context(struct task_struct *child, bool exit)
 		 *
 		 * Wait for all events to drop their context reference.
 		 */
-		wait_var_event(&child_ctx->refcount,
-			       refcount_read(&child_ctx->refcount) == 1);
+		wait_var_event(&ctx->refcount,
+			       refcount_read(&ctx->refcount) == 1);
 	}
-	put_ctx(child_ctx);
+	put_ctx(ctx);
 }
 
 /*
- * When a child task exits, feed back event values to parent events.
+ * When a task exits, feed back event values to parent events.
  *
  * Can be called with exec_update_lock held when called from
  * setup_new_exec().
  */
-void perf_event_exit_task(struct task_struct *child)
+void perf_event_exit_task(struct task_struct *task)
 {
 	struct perf_event *event, *tmp;
 
-	mutex_lock(&child->perf_event_mutex);
-	list_for_each_entry_safe(event, tmp, &child->perf_event_list,
+	WARN_ON_ONCE(task != current);
+
+	mutex_lock(&task->perf_event_mutex);
+	list_for_each_entry_safe(event, tmp, &task->perf_event_list,
 				 owner_entry) {
 		list_del_init(&event->owner_entry);
 
@@ -13842,23 +13844,23 @@ void perf_event_exit_task(struct task_struct *child)
 		 */
 		smp_store_release(&event->owner, NULL);
 	}
-	mutex_unlock(&child->perf_event_mutex);
+	mutex_unlock(&task->perf_event_mutex);
 
-	perf_event_exit_task_context(child, true);
+	perf_event_exit_task_context(task, true);
 
 	/*
 	 * The perf_event_exit_task_context calls perf_event_task
-	 * with child's task_ctx, which generates EXIT events for
-	 * child contexts and sets child->perf_event_ctxp[] to NULL.
+	 * with task's task_ctx, which generates EXIT events for
+	 * task contexts and sets task->perf_event_ctxp[] to NULL.
 	 * At this point we need to send EXIT events to cpu contexts.
 	 */
-	perf_event_task(child, NULL, 0);
+	perf_event_task(task, NULL, 0);
 
 	/*
 	 * Detach the perf_ctx_data for the system-wide event.
 	 */
 	guard(percpu_read)(&global_ctx_data_rwsem);
-	detach_task_ctx_data(child);
+	detach_task_ctx_data(task);
 }
 
 /*