From nobody Sat Feb 7 23:45:26 2026
Message-Id: <20240807115549.920950699@infradead.org>
Date: Wed, 07 Aug 2024 13:29:25 +0200
From: Peter Zijlstra
To: mingo@kernel.org
Cc: peterz@infradead.org, acme@kernel.org, namhyung@kernel.org,
    mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
    jolsa@kernel.org, irogers@google.com, adrian.hunter@intel.com,
    kan.liang@linux.intel.com, linux-perf-users@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH 1/5] perf: Optimize context reschedule for single PMU cases
References: <20240807112924.448091402@infradead.org>

Currently re-scheduling a context will reschedule all active PMUs for
that context, even if it is known only a single event is added. Namhyung reported that changing this to only reschedule the affected PMU when possible provides significant performance gains under certain conditions. Therefore, allow partial context reschedules for a specific PMU, that of the event modified. While the patch looks somewhat noisy, it mostly just propagates a new @pmu argument through the callchain and modifies the epc loop to only pick the 'epc->pmu =3D=3D @pmu' case. Reported-by: Namhyung Kim Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Kan Liang Reviewed-by: Namhyung Kim --- kernel/events/core.c | 164 +++++++++++++++++++++++++++-------------------= ----- 1 file changed, 88 insertions(+), 76 deletions(-) --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -685,30 +685,32 @@ do { \ ___p; \ }) =20 +#define for_each_epc(_epc, _ctx, _pmu, _cgroup) \ + list_for_each_entry(_epc, &((_ctx)->pmu_ctx_list), pmu_ctx_entry) \ + if (_cgroup && !_epc->nr_cgroups) \ + continue; \ + else if (_pmu && _epc->pmu !=3D _pmu) \ + continue; \ + else + static void perf_ctx_disable(struct perf_event_context *ctx, bool cgroup) { struct perf_event_pmu_context *pmu_ctx; =20 - list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) { - if (cgroup && !pmu_ctx->nr_cgroups) - continue; + for_each_epc(pmu_ctx, ctx, NULL, cgroup) perf_pmu_disable(pmu_ctx->pmu); - } } =20 static void perf_ctx_enable(struct perf_event_context *ctx, bool cgroup) { struct perf_event_pmu_context *pmu_ctx; =20 - list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) { - if (cgroup && !pmu_ctx->nr_cgroups) - continue; + for_each_epc(pmu_ctx, ctx, NULL, cgroup) perf_pmu_enable(pmu_ctx->pmu); - } } =20 -static void ctx_sched_out(struct perf_event_context *ctx, enum event_type_= t event_type); -static void ctx_sched_in(struct perf_event_context *ctx, enum event_type_t= event_type); +static void ctx_sched_out(struct perf_event_context *ctx, struct pmu *pmu,= enum event_type_t event_type); +static void ctx_sched_in(struct perf_event_context *ctx, struct pmu *pmu, = enum event_type_t event_type); =20 #ifdef CONFIG_CGROUP_PERF =20 @@ -865,7 +867,7 @@ static void perf_cgroup_switch(struct ta perf_ctx_lock(cpuctx, cpuctx->task_ctx); perf_ctx_disable(&cpuctx->ctx, true); =20 - ctx_sched_out(&cpuctx->ctx, EVENT_ALL|EVENT_CGROUP); + ctx_sched_out(&cpuctx->ctx, NULL, EVENT_ALL|EVENT_CGROUP); /* * must not be done before ctxswout due * to update_cgrp_time_from_cpuctx() in @@ -877,7 +879,7 @@ static void perf_cgroup_switch(struct ta * perf_cgroup_set_timestamp() in ctx_sched_in() * to not have to pass task around */ - ctx_sched_in(&cpuctx->ctx, EVENT_ALL|EVENT_CGROUP); + ctx_sched_in(&cpuctx->ctx, NULL, EVENT_ALL|EVENT_CGROUP); =20 perf_ctx_enable(&cpuctx->ctx, true); perf_ctx_unlock(cpuctx, cpuctx->task_ctx); @@ -2656,7 +2658,8 @@ static void add_event_to_ctx(struct perf } =20 static void task_ctx_sched_out(struct perf_event_context *ctx, - enum event_type_t event_type) + struct pmu *pmu, + enum event_type_t event_type) { struct perf_cpu_context *cpuctx =3D this_cpu_ptr(&perf_cpu_context); =20 @@ -2666,18 +2669,19 @@ static void task_ctx_sched_out(struct pe if (WARN_ON_ONCE(ctx !=3D cpuctx->task_ctx)) return; =20 - ctx_sched_out(ctx, event_type); + ctx_sched_out(ctx, pmu, event_type); } =20 static void perf_event_sched_in(struct perf_cpu_context *cpuctx, - struct perf_event_context *ctx) + struct perf_event_context *ctx, + struct pmu *pmu) { - ctx_sched_in(&cpuctx->ctx, EVENT_PINNED); + ctx_sched_in(&cpuctx->ctx, pmu, 
EVENT_PINNED); if (ctx) - ctx_sched_in(ctx, EVENT_PINNED); - ctx_sched_in(&cpuctx->ctx, EVENT_FLEXIBLE); + ctx_sched_in(ctx, pmu, EVENT_PINNED); + ctx_sched_in(&cpuctx->ctx, pmu, EVENT_FLEXIBLE); if (ctx) - ctx_sched_in(ctx, EVENT_FLEXIBLE); + ctx_sched_in(ctx, pmu, EVENT_FLEXIBLE); } =20 /* @@ -2695,16 +2699,12 @@ static void perf_event_sched_in(struct p * event_type is a bit mask of the types of events involved. For CPU event= s, * event_type is only either EVENT_PINNED or EVENT_FLEXIBLE. */ -/* - * XXX: ctx_resched() reschedule entire perf_event_context while adding new - * event to the context or enabling existing event in the context. We can - * probably optimize it by rescheduling only affected pmu_ctx. - */ static void ctx_resched(struct perf_cpu_context *cpuctx, struct perf_event_context *task_ctx, - enum event_type_t event_type) + struct pmu *pmu, enum event_type_t event_type) { bool cpu_event =3D !!(event_type & EVENT_CPU); + struct perf_event_pmu_context *epc; =20 /* * If pinned groups are involved, flexible groups also need to be @@ -2715,10 +2715,14 @@ static void ctx_resched(struct perf_cpu_ =20 event_type &=3D EVENT_ALL; =20 - perf_ctx_disable(&cpuctx->ctx, false); + for_each_epc(epc, &cpuctx->ctx, pmu, false) + perf_pmu_disable(epc->pmu); + if (task_ctx) { - perf_ctx_disable(task_ctx, false); - task_ctx_sched_out(task_ctx, event_type); + for_each_epc(epc, task_ctx, pmu, false) + perf_pmu_disable(epc->pmu); + + task_ctx_sched_out(task_ctx, pmu, event_type); } =20 /* @@ -2729,15 +2733,19 @@ static void ctx_resched(struct perf_cpu_ * - otherwise, do nothing more. */ if (cpu_event) - ctx_sched_out(&cpuctx->ctx, event_type); + ctx_sched_out(&cpuctx->ctx, pmu, event_type); else if (event_type & EVENT_PINNED) - ctx_sched_out(&cpuctx->ctx, EVENT_FLEXIBLE); + ctx_sched_out(&cpuctx->ctx, pmu, EVENT_FLEXIBLE); =20 - perf_event_sched_in(cpuctx, task_ctx); + perf_event_sched_in(cpuctx, task_ctx, pmu); =20 - perf_ctx_enable(&cpuctx->ctx, false); - if (task_ctx) - perf_ctx_enable(task_ctx, false); + for_each_epc(epc, &cpuctx->ctx, pmu, false) + perf_pmu_enable(epc->pmu); + + if (task_ctx) { + for_each_epc(epc, task_ctx, pmu, false) + perf_pmu_enable(epc->pmu); + } } =20 void perf_pmu_resched(struct pmu *pmu) @@ -2746,7 +2754,7 @@ void perf_pmu_resched(struct pmu *pmu) struct perf_event_context *task_ctx =3D cpuctx->task_ctx; =20 perf_ctx_lock(cpuctx, task_ctx); - ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU); + ctx_resched(cpuctx, task_ctx, pmu, EVENT_ALL|EVENT_CPU); perf_ctx_unlock(cpuctx, task_ctx); } =20 @@ -2802,9 +2810,10 @@ static int __perf_install_in_context(vo #endif =20 if (reprogram) { - ctx_sched_out(ctx, EVENT_TIME); + ctx_sched_out(ctx, NULL, EVENT_TIME); add_event_to_ctx(event, ctx); - ctx_resched(cpuctx, task_ctx, get_event_type(event)); + ctx_resched(cpuctx, task_ctx, event->pmu_ctx->pmu, + get_event_type(event)); } else { add_event_to_ctx(event, ctx); } @@ -2948,7 +2957,7 @@ static void __perf_event_enable(struct p return; =20 if (ctx->is_active) - ctx_sched_out(ctx, EVENT_TIME); + ctx_sched_out(ctx, NULL, EVENT_TIME); =20 perf_event_set_state(event, PERF_EVENT_STATE_INACTIVE); perf_cgroup_event_enable(event, ctx); @@ -2957,7 +2966,7 @@ static void __perf_event_enable(struct p return; =20 if (!event_filter_match(event)) { - ctx_sched_in(ctx, EVENT_TIME); + ctx_sched_in(ctx, NULL, EVENT_TIME); return; } =20 @@ -2966,7 +2975,7 @@ static void __perf_event_enable(struct p * then don't put it on unless the group is on. 
*/ if (leader !=3D event && leader->state !=3D PERF_EVENT_STATE_ACTIVE) { - ctx_sched_in(ctx, EVENT_TIME); + ctx_sched_in(ctx, NULL, EVENT_TIME); return; } =20 @@ -2974,7 +2983,7 @@ static void __perf_event_enable(struct p if (ctx->task) WARN_ON_ONCE(task_ctx !=3D ctx); =20 - ctx_resched(cpuctx, task_ctx, get_event_type(event)); + ctx_resched(cpuctx, task_ctx, event->pmu_ctx->pmu, get_event_type(event)); } =20 /* @@ -3276,8 +3285,17 @@ static void __pmu_ctx_sched_out(struct p perf_pmu_enable(pmu); } =20 +/* + * Be very careful with the @pmu argument since this will change ctx state. + * The @pmu argument works for ctx_resched(), because that is symmetric in + * ctx_sched_out() / ctx_sched_in() usage and the ctx state ends up invari= ant. + * + * However, if you were to be asymmetrical, you could end up with messed up + * state, eg. ctx->is_active cleared even though most EPCs would still act= ually + * be active. + */ static void -ctx_sched_out(struct perf_event_context *ctx, enum event_type_t event_type) +ctx_sched_out(struct perf_event_context *ctx, struct pmu *pmu, enum event_= type_t event_type) { struct perf_cpu_context *cpuctx =3D this_cpu_ptr(&perf_cpu_context); struct perf_event_pmu_context *pmu_ctx; @@ -3331,11 +3349,8 @@ ctx_sched_out(struct perf_event_context =20 is_active ^=3D ctx->is_active; /* changed bits */ =20 - list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) { - if (cgroup && !pmu_ctx->nr_cgroups) - continue; + for_each_epc(pmu_ctx, ctx, pmu, cgroup) __pmu_ctx_sched_out(pmu_ctx, is_active); - } } =20 /* @@ -3579,7 +3594,7 @@ perf_event_context_sched_out(struct task =20 inside_switch: perf_ctx_sched_task_cb(ctx, false); - task_ctx_sched_out(ctx, EVENT_ALL); + task_ctx_sched_out(ctx, NULL, EVENT_ALL); =20 perf_ctx_enable(ctx, false); raw_spin_unlock(&ctx->lock); @@ -3877,29 +3892,22 @@ static void pmu_groups_sched_in(struct p merge_sched_in, &can_add_hw); } =20 -static void ctx_groups_sched_in(struct perf_event_context *ctx, - struct perf_event_groups *groups, - bool cgroup) +static void __pmu_ctx_sched_in(struct perf_event_pmu_context *pmu_ctx, + enum event_type_t event_type) { - struct perf_event_pmu_context *pmu_ctx; - - list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) { - if (cgroup && !pmu_ctx->nr_cgroups) - continue; - pmu_groups_sched_in(ctx, groups, pmu_ctx->pmu); - } -} + struct perf_event_context *ctx =3D pmu_ctx->ctx; =20 -static void __pmu_ctx_sched_in(struct perf_event_context *ctx, - struct pmu *pmu) -{ - pmu_groups_sched_in(ctx, &ctx->flexible_groups, pmu); + if (event_type & EVENT_PINNED) + pmu_groups_sched_in(ctx, &ctx->pinned_groups, pmu_ctx->pmu); + if (event_type & EVENT_FLEXIBLE) + pmu_groups_sched_in(ctx, &ctx->flexible_groups, pmu_ctx->pmu); } =20 static void -ctx_sched_in(struct perf_event_context *ctx, enum event_type_t event_type) +ctx_sched_in(struct perf_event_context *ctx, struct pmu *pmu, enum event_t= ype_t event_type) { struct perf_cpu_context *cpuctx =3D this_cpu_ptr(&perf_cpu_context); + struct perf_event_pmu_context *pmu_ctx; int is_active =3D ctx->is_active; bool cgroup =3D event_type & EVENT_CGROUP; =20 @@ -3935,12 +3943,16 @@ ctx_sched_in(struct perf_event_context * * First go through the list and put on any pinned groups * in order to give them the best chance of going on. 
*/ - if (is_active & EVENT_PINNED) - ctx_groups_sched_in(ctx, &ctx->pinned_groups, cgroup); + if (is_active & EVENT_PINNED) { + for_each_epc(pmu_ctx, ctx, pmu, cgroup) + __pmu_ctx_sched_in(pmu_ctx, EVENT_PINNED); + } =20 /* Then walk through the lower prio flexible groups */ - if (is_active & EVENT_FLEXIBLE) - ctx_groups_sched_in(ctx, &ctx->flexible_groups, cgroup); + if (is_active & EVENT_FLEXIBLE) { + for_each_epc(pmu_ctx, ctx, pmu, cgroup) + __pmu_ctx_sched_in(pmu_ctx, EVENT_FLEXIBLE); + } } =20 static void perf_event_context_sched_in(struct task_struct *task) @@ -3983,10 +3995,10 @@ static void perf_event_context_sched_in( */ if (!RB_EMPTY_ROOT(&ctx->pinned_groups.tree)) { perf_ctx_disable(&cpuctx->ctx, false); - ctx_sched_out(&cpuctx->ctx, EVENT_FLEXIBLE); + ctx_sched_out(&cpuctx->ctx, NULL, EVENT_FLEXIBLE); } =20 - perf_event_sched_in(cpuctx, ctx); + perf_event_sched_in(cpuctx, ctx, NULL); =20 perf_ctx_sched_task_cb(cpuctx->task_ctx, true); =20 @@ -4327,14 +4339,14 @@ static bool perf_rotate_context(struct p update_context_time(&cpuctx->ctx); __pmu_ctx_sched_out(cpu_epc, EVENT_FLEXIBLE); rotate_ctx(&cpuctx->ctx, cpu_event); - __pmu_ctx_sched_in(&cpuctx->ctx, pmu); + __pmu_ctx_sched_in(cpu_epc, EVENT_FLEXIBLE); } =20 if (task_event) rotate_ctx(task_epc->ctx, task_event); =20 if (task_event || (task_epc && cpu_event)) - __pmu_ctx_sched_in(task_epc->ctx, pmu); + __pmu_ctx_sched_in(task_epc, EVENT_FLEXIBLE); =20 perf_pmu_enable(pmu); perf_ctx_unlock(cpuctx, cpuctx->task_ctx); @@ -4400,7 +4412,7 @@ static void perf_event_enable_on_exec(st =20 cpuctx =3D this_cpu_ptr(&perf_cpu_context); perf_ctx_lock(cpuctx, ctx); - ctx_sched_out(ctx, EVENT_TIME); + ctx_sched_out(ctx, NULL, EVENT_TIME); =20 list_for_each_entry(event, &ctx->event_list, event_entry) { enabled |=3D event_enable_on_exec(event, ctx); @@ -4412,9 +4424,9 @@ static void perf_event_enable_on_exec(st */ if (enabled) { clone_ctx =3D unclone_ctx(ctx); - ctx_resched(cpuctx, ctx, event_type); + ctx_resched(cpuctx, ctx, NULL, event_type); } else { - ctx_sched_in(ctx, EVENT_TIME); + ctx_sched_in(ctx, NULL, EVENT_TIME); } perf_ctx_unlock(cpuctx, ctx); =20 @@ -13202,7 +13214,7 @@ static void perf_event_exit_task_context * in. 
 	 */
 	raw_spin_lock_irq(&child_ctx->lock);
-	task_ctx_sched_out(child_ctx, EVENT_ALL);
+	task_ctx_sched_out(child_ctx, NULL, EVENT_ALL);
 
 	/*
 	 * Now that the context is inactive, destroy the task <-> ctx relation
@@ -13751,7 +13763,7 @@ static void __perf_event_exit_context(vo
 	struct perf_event *event;
 
 	raw_spin_lock(&ctx->lock);
-	ctx_sched_out(ctx, EVENT_TIME);
+	ctx_sched_out(ctx, NULL, EVENT_TIME);
 	list_for_each_entry(event, &ctx->event_list, event_entry)
 		__perf_remove_from_context(event, cpuctx, ctx, (void *)DETACH_GROUP);
 	raw_spin_unlock(&ctx->lock);

From nobody Sat Feb 7 23:45:26 2026
Message-Id: <20240807115550.031212518@infradead.org>
Date: Wed, 07 Aug 2024 13:29:26 +0200
From: Peter Zijlstra
To: mingo@kernel.org
Cc: peterz@infradead.org, acme@kernel.org, namhyung@kernel.org,
    mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
    jolsa@kernel.org, irogers@google.com, adrian.hunter@intel.com,
    kan.liang@linux.intel.com, linux-perf-users@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH 2/5] perf: Extract a few helpers
References: <20240807112924.448091402@infradead.org>

The context time update code is repeated verbatim a few times.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Kan Liang
Reviewed-by: Namhyung Kim
---
 kernel/events/core.c |   39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2330,6 +2330,24 @@ group_sched_out(struct perf_event *group
 	event_sched_out(event, ctx);
 }
 
+static inline void
+ctx_time_update(struct perf_cpu_context *cpuctx, struct perf_event_context *ctx)
+{
+	if (ctx->is_active & EVENT_TIME) {
+		update_context_time(ctx);
+		update_cgrp_time_from_cpuctx(cpuctx, false);
+	}
+}
+
+static inline void
+ctx_time_update_event(struct perf_event_context *ctx, struct perf_event *event)
+{
+	if (ctx->is_active & EVENT_TIME) {
+		update_context_time(ctx);
+		update_cgrp_time_from_event(event);
+	}
+}
+
 #define DETACH_GROUP	0x01UL
 #define DETACH_CHILD	0x02UL
 #define DETACH_DEAD	0x04UL
@@ -2349,10 +2367,7 @@ __perf_remove_from_context(struct perf_e
 	struct perf_event_pmu_context *pmu_ctx = event->pmu_ctx;
 	unsigned long flags = (unsigned long)info;
 
-	if (ctx->is_active & EVENT_TIME) {
-		update_context_time(ctx);
-		update_cgrp_time_from_cpuctx(cpuctx, false);
-	}
+	ctx_time_update(cpuctx, ctx);
 
 	/*
 	 * Ensure event_sched_out() switches to OFF, at the very least
@@ -2437,12 +2452,8 @@ static void __perf_event_disable(struct
 	if (event->state < PERF_EVENT_STATE_INACTIVE)
 		return;
 
-	if (ctx->is_active & EVENT_TIME) {
-		update_context_time(ctx);
-		update_cgrp_time_from_event(event);
-	}
-
 	perf_pmu_disable(event->pmu_ctx->pmu);
+	ctx_time_update_event(ctx, event);
 
 	if (event == event->group_leader)
 		group_sched_out(event, ctx);
@@ -4529,10 +4540,7 @@ static void __perf_event_read(void *info
 		return;
 
 	raw_spin_lock(&ctx->lock);
-	if (ctx->is_active & EVENT_TIME) {
-		update_context_time(ctx);
-		update_cgrp_time_from_event(event);
-	}
+	ctx_time_update_event(ctx, event);
 
 	perf_event_update_time(event);
 	if (data->group)
@@ -4732,10 +4740,7 @@ static int perf_event_read(struct perf_e
 	 * May read while context is not active (e.g., thread is
 	 * blocked), in that case we cannot update context time
 	 */
-	if (ctx->is_active & EVENT_TIME) {
-		update_context_time(ctx);
-		update_cgrp_time_from_event(event);
-	}
+	ctx_time_update_event(ctx, event);
 
 	perf_event_update_time(event);
 	if (group)

From nobody Sat Feb 7 23:45:26 2026
Message-Id: <20240807115550.138301094@infradead.org>
Date: Wed, 07 Aug 2024 13:29:27 +0200
From: Peter Zijlstra
To: mingo@kernel.org
Cc: peterz@infradead.org, acme@kernel.org, namhyung@kernel.org,
    mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
    jolsa@kernel.org, irogers@google.com, adrian.hunter@intel.com,
    kan.liang@linux.intel.com, linux-perf-users@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH 3/5] perf: Fix event_function_call() locking
References: <20240807112924.448091402@infradead.org>

All the event_function/@func call context already uses perf_ctx_lock()
except for the !ctx->is_active case. Make it all consistent.
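For reference, a sketch of the existing lock pair being switched to (as
it stands at this point in the series; shown only to make the change
below easier to follow):

  static void perf_ctx_lock(struct perf_cpu_context *cpuctx,
                            struct perf_event_context *ctx)
  {
          /* CPU context lock first; the task context lock nests inside */
          raw_spin_lock(&cpuctx->ctx.lock);
          if (ctx)
                  raw_spin_lock(&ctx->lock);
  }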
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Kan Liang
Reviewed-by: Namhyung Kim
---
 kernel/events/core.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -263,6 +263,7 @@ static int event_function(void *info)
 static void event_function_call(struct perf_event *event, event_f func, void *data)
 {
 	struct perf_event_context *ctx = event->ctx;
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
 	struct task_struct *task = READ_ONCE(ctx->task); /* verified in event_function */
 	struct event_function_struct efs = {
 		.event = event,
@@ -291,22 +292,22 @@ static void event_function_call(struct p
 	if (!task_function_call(task, event_function, &efs))
 		return;
 
-	raw_spin_lock_irq(&ctx->lock);
+	perf_ctx_lock(cpuctx, ctx);
 	/*
 	 * Reload the task pointer, it might have been changed by
 	 * a concurrent perf_event_context_sched_out().
 	 */
 	task = ctx->task;
 	if (task == TASK_TOMBSTONE) {
-		raw_spin_unlock_irq(&ctx->lock);
+		perf_ctx_unlock(cpuctx, ctx);
 		return;
 	}
 	if (ctx->is_active) {
-		raw_spin_unlock_irq(&ctx->lock);
+		perf_ctx_unlock(cpuctx, ctx);
 		goto again;
 	}
 	func(event, NULL, ctx, data);
-	raw_spin_unlock_irq(&ctx->lock);
+	perf_ctx_unlock(cpuctx, ctx);
 }
 
 /*

From nobody Sat Feb 7 23:45:26 2026
Message-Id: <20240807115550.250637571@infradead.org>
Date: Wed, 07 Aug 2024 13:29:28 +0200
From: Peter Zijlstra
To: mingo@kernel.org
Cc: peterz@infradead.org, acme@kernel.org, namhyung@kernel.org,
    mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
    jolsa@kernel.org, irogers@google.com, adrian.hunter@intel.com,
    kan.liang@linux.intel.com, linux-perf-users@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH 4/5] perf: Add context time freeze
References: <20240807112924.448091402@infradead.org>

Many of the context reschedule users are of the form:

  ctx_sched_out(.type = EVENT_TIME);
  ... modify context
  ctx_resched();

with the idea that the whole reschedule happens with a single
time-stamp, rather than with each ctx_sched_out() advancing time and
ctx_sched_in() re-starting time, creating a non-atomic experience.

However, Kan noticed that since this completely stops time, it actually
loses a bit of time between the stop and start. Worse, now that we can
do partial (per-PMU) reschedules, the PMUs that are not scheduled out
still observe the time glitch.

Replace this with:

  ctx_time_freeze();
  ... modify context
  ctx_resched();

with the assumption that this happens inside a perf_ctx_lock() /
perf_ctx_unlock() pair. The new ctx_time_freeze() will update time and
set EVENT_FROZEN, and ensure EVENT_TIME and EVENT_FROZEN remain set;
this prevents perf_event_time_now() from observing a time wobble from
not seeing EVENT_TIME for a little while.

Additionally, this avoids losing time between ctx_sched_out(EVENT_TIME)
and ctx_sched_in(), which would re-set the timestamp.
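Illustrative only (using the helpers introduced by the diff below), the
intended shape inside the lock pair is:

	perf_ctx_lock(cpuctx, ctx);
	ctx_time_freeze(cpuctx, ctx);	/* update time once, set EVENT_FROZEN */
	/* ... modify context ... */
	ctx_resched(cpuctx, task_ctx, pmu, event_type);
	perf_ctx_unlock(cpuctx, ctx);	/* clears EVENT_FROZEN; zeroes is_active
					   if nothing got rescheduled */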
Reported-by: Kan Liang Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Kan Liang Reviewed-by: Namhyung Kim --- kernel/events/core.c | 128 ++++++++++++++++++++++++++++++++++------------= ----- 1 file changed, 86 insertions(+), 42 deletions(-) --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -155,20 +155,55 @@ static int cpu_function_call(int cpu, re return data.ret; } =20 +enum event_type_t { + EVENT_FLEXIBLE =3D 0x01, + EVENT_PINNED =3D 0x02, + EVENT_TIME =3D 0x04, + EVENT_FROZEN =3D 0x08, + /* see ctx_resched() for details */ + EVENT_CPU =3D 0x10, + EVENT_CGROUP =3D 0x20, + + /* compound helpers */ + EVENT_ALL =3D EVENT_FLEXIBLE | EVENT_PINNED, + EVENT_TIME_FROZEN =3D EVENT_TIME | EVENT_FROZEN, +}; + +static inline void __perf_ctx_lock(struct perf_event_context *ctx) +{ + raw_spin_lock(&ctx->lock); + WARN_ON_ONCE(ctx->is_active & EVENT_FROZEN); +} + static void perf_ctx_lock(struct perf_cpu_context *cpuctx, struct perf_event_context *ctx) { - raw_spin_lock(&cpuctx->ctx.lock); + __perf_ctx_lock(&cpuctx->ctx); if (ctx) - raw_spin_lock(&ctx->lock); + __perf_ctx_lock(ctx); +} + +static inline void __perf_ctx_unlock(struct perf_event_context *ctx) +{ + /* + * If ctx_sched_in() didn't again set any ALL flags, clean up + * after ctx_sched_out() by clearing is_active. + */ + if (ctx->is_active & EVENT_FROZEN) { + if (!(ctx->is_active & EVENT_ALL)) + ctx->is_active =3D 0; + else + ctx->is_active &=3D ~EVENT_FROZEN; + } + raw_spin_unlock(&ctx->lock); } =20 static void perf_ctx_unlock(struct perf_cpu_context *cpuctx, struct perf_event_context *ctx) { if (ctx) - raw_spin_unlock(&ctx->lock); - raw_spin_unlock(&cpuctx->ctx.lock); + __perf_ctx_unlock(ctx); + __perf_ctx_unlock(&cpuctx->ctx); } =20 #define TASK_TOMBSTONE ((void *)-1L) @@ -370,16 +405,6 @@ static void event_function_local(struct (PERF_SAMPLE_BRANCH_KERNEL |\ PERF_SAMPLE_BRANCH_HV) =20 -enum event_type_t { - EVENT_FLEXIBLE =3D 0x1, - EVENT_PINNED =3D 0x2, - EVENT_TIME =3D 0x4, - /* see ctx_resched() for details */ - EVENT_CPU =3D 0x8, - EVENT_CGROUP =3D 0x10, - EVENT_ALL =3D EVENT_FLEXIBLE | EVENT_PINNED, -}; - /* * perf_sched_events : >0 events exist */ @@ -2332,18 +2357,39 @@ group_sched_out(struct perf_event *group } =20 static inline void -ctx_time_update(struct perf_cpu_context *cpuctx, struct perf_event_context= *ctx) +__ctx_time_update(struct perf_cpu_context *cpuctx, struct perf_event_conte= xt *ctx, bool final) { if (ctx->is_active & EVENT_TIME) { + if (ctx->is_active & EVENT_FROZEN) + return; update_context_time(ctx); - update_cgrp_time_from_cpuctx(cpuctx, false); + update_cgrp_time_from_cpuctx(cpuctx, final); } } =20 static inline void +ctx_time_update(struct perf_cpu_context *cpuctx, struct perf_event_context= *ctx) +{ + __ctx_time_update(cpuctx, ctx, false); +} + +/* + * To be used inside perf_ctx_lock() / perf_ctx_unlock(). Lasts until perf= _ctx_unlock(). 
+ */ +static inline void +ctx_time_freeze(struct perf_cpu_context *cpuctx, struct perf_event_context= *ctx) +{ + ctx_time_update(cpuctx, ctx); + if (ctx->is_active & EVENT_TIME) + ctx->is_active |=3D EVENT_FROZEN; +} + +static inline void ctx_time_update_event(struct perf_event_context *ctx, struct perf_event *e= vent) { if (ctx->is_active & EVENT_TIME) { + if (ctx->is_active & EVENT_FROZEN) + return; update_context_time(ctx); update_cgrp_time_from_event(event); } @@ -2822,7 +2868,7 @@ static int __perf_install_in_context(vo #endif =20 if (reprogram) { - ctx_sched_out(ctx, NULL, EVENT_TIME); + ctx_time_freeze(cpuctx, ctx); add_event_to_ctx(event, ctx); ctx_resched(cpuctx, task_ctx, event->pmu_ctx->pmu, get_event_type(event)); @@ -2968,8 +3014,7 @@ static void __perf_event_enable(struct p event->state <=3D PERF_EVENT_STATE_ERROR) return; =20 - if (ctx->is_active) - ctx_sched_out(ctx, NULL, EVENT_TIME); + ctx_time_freeze(cpuctx, ctx); =20 perf_event_set_state(event, PERF_EVENT_STATE_INACTIVE); perf_cgroup_event_enable(event, ctx); @@ -2977,19 +3022,15 @@ static void __perf_event_enable(struct p if (!ctx->is_active) return; =20 - if (!event_filter_match(event)) { - ctx_sched_in(ctx, NULL, EVENT_TIME); + if (!event_filter_match(event)) return; - } =20 /* * If the event is in a group and isn't the group leader, * then don't put it on unless the group is on. */ - if (leader !=3D event && leader->state !=3D PERF_EVENT_STATE_ACTIVE) { - ctx_sched_in(ctx, NULL, EVENT_TIME); + if (leader !=3D event && leader->state !=3D PERF_EVENT_STATE_ACTIVE) return; - } =20 task_ctx =3D cpuctx->task_ctx; if (ctx->task) @@ -3263,7 +3304,7 @@ static void __pmu_ctx_sched_out(struct p struct perf_event *event, *tmp; struct pmu *pmu =3D pmu_ctx->pmu; =20 - if (ctx->task && !ctx->is_active) { + if (ctx->task && !(ctx->is_active & EVENT_ALL)) { struct perf_cpu_pmu_context *cpc; =20 cpc =3D this_cpu_ptr(pmu->cpu_pmu_context); @@ -3338,24 +3379,29 @@ ctx_sched_out(struct perf_event_context * * would only update time for the pinned events. */ - if (is_active & EVENT_TIME) { - /* update (and stop) ctx time */ - update_context_time(ctx); - update_cgrp_time_from_cpuctx(cpuctx, ctx =3D=3D &cpuctx->ctx); + __ctx_time_update(cpuctx, ctx, ctx =3D=3D &cpuctx->ctx); + + /* + * CPU-release for the below ->is_active store, + * see __load_acquire() in perf_event_time_now() + */ + barrier(); + ctx->is_active &=3D ~event_type; + + if (!(ctx->is_active & EVENT_ALL)) { /* - * CPU-release for the below ->is_active store, - * see __load_acquire() in perf_event_time_now() + * For FROZEN, preserve TIME|FROZEN such that perf_event_time_now() + * does not observe a hole. perf_ctx_unlock() will clean up. 
 		 */
-		barrier();
+		if (ctx->is_active & EVENT_FROZEN)
+			ctx->is_active &= EVENT_TIME_FROZEN;
+		else
+			ctx->is_active = 0;
 	}
 
-	ctx->is_active &= ~event_type;
-	if (!(ctx->is_active & EVENT_ALL))
-		ctx->is_active = 0;
-
 	if (ctx->task) {
 		WARN_ON_ONCE(cpuctx->task_ctx != ctx);
-		if (!ctx->is_active)
+		if (!(ctx->is_active & EVENT_ALL))
 			cpuctx->task_ctx = NULL;
 	}
 
@@ -3943,7 +3989,7 @@ ctx_sched_in(struct perf_event_context *
 
 	ctx->is_active |= (event_type | EVENT_TIME);
 	if (ctx->task) {
-		if (!is_active)
+		if (!(is_active & EVENT_ALL))
 			cpuctx->task_ctx = ctx;
 		else
 			WARN_ON_ONCE(cpuctx->task_ctx != ctx);
 	}
@@ -4424,7 +4470,7 @@ static void perf_event_enable_on_exec(st
 
 	cpuctx = this_cpu_ptr(&perf_cpu_context);
 	perf_ctx_lock(cpuctx, ctx);
-	ctx_sched_out(ctx, NULL, EVENT_TIME);
+	ctx_time_freeze(cpuctx, ctx);
 
 	list_for_each_entry(event, &ctx->event_list, event_entry) {
 		enabled |= event_enable_on_exec(event, ctx);
@@ -4437,8 +4483,6 @@ static void perf_event_enable_on_exec(st
 
 	if (enabled) {
 		clone_ctx = unclone_ctx(ctx);
 		ctx_resched(cpuctx, ctx, NULL, event_type);
-	} else {
-		ctx_sched_in(ctx, NULL, EVENT_TIME);
 	}
 	perf_ctx_unlock(cpuctx, ctx);

From nobody Sat Feb 7 23:45:26 2026
Message-Id: <20240807115550.392851915@infradead.org>
Date: Wed, 07 Aug 2024 13:29:29 +0200
From: Peter Zijlstra
To: mingo@kernel.org
Cc: peterz@infradead.org, acme@kernel.org, namhyung@kernel.org,
    mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
    jolsa@kernel.org, irogers@google.com, adrian.hunter@intel.com,
    kan.liang@linux.intel.com, linux-perf-users@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH 5/5] perf: Optimize __pmu_ctx_sched_out()
References: <20240807112924.448091402@infradead.org>

There is no point in doing the perf_pmu_disable() dance just to do
nothing. This happens for ctx_sched_out(.type = EVENT_TIME) for
instance.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Kan Liang
Reviewed-by: Namhyung Kim
---
 kernel/events/core.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3304,7 +3304,7 @@ static void __pmu_ctx_sched_out(struct p
 		cpc->task_epc = NULL;
 	}
 
-	if (!event_type)
+	if (!(event_type & EVENT_ALL))
 		return;
 
 	perf_pmu_disable(pmu);