Since __perf_remove_from_context() updates event->state before calling
list_del_event(), the state check in list_del_event() never passes and
perf_cgroup_event_disable() is not called. As a result, nr_cgroups and
cpuctx->cgrp are not updated.

To fix this problem, move perf_cgroup_event_disable() into
__perf_remove_from_context(), as:

  commit a3c3c66670ce ("perf/core: Fix child_total_time_enabled accounting bug at task exit")

did.

Fixes: a3c3c66670ce ("perf/core: Fix child_total_time_enabled accounting bug at task exit")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
---
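The ordering problem described above can be illustrated with a small
user-space model. This is only a sketch, not kernel code: every toy_*
name, the struct layouts, and the numeric state values are hypothetical
stand-ins chosen for illustration, and the model assumes the target state
passed down the removal path is OFF or lower.

```c
#include <assert.h>

/*
 * Simplified user-space model, NOT kernel code. All toy_* names and the
 * numeric state values are hypothetical; only their ordering matters.
 */
enum toy_state {
	TOY_STATE_DEAD     = -4,
	TOY_STATE_OFF      = -1,
	TOY_STATE_INACTIVE =  0,
	TOY_STATE_ACTIVE   =  1,
};

struct toy_ctx   { int nr_cgroups; };
struct toy_event { enum toy_state state; };

/* Stand-in for perf_cgroup_event_disable(): drop one cgroup event. */
static void toy_cgroup_event_disable(struct toy_ctx *ctx)
{
	assert(ctx->nr_cgroups > 0);
	ctx->nr_cgroups--;
}

/* Models the state check that the patch removes from list_del_event(). */
static void toy_list_del_event_old(struct toy_event *ev, struct toy_ctx *ctx)
{
	if (ev->state > TOY_STATE_OFF) {
		toy_cgroup_event_disable(ctx);
		ev->state = TOY_STATE_OFF;
	}
}

/* Buggy order: the state is lowered first, so the check above never passes. */
static int toy_remove_buggy(struct toy_event *ev, struct toy_ctx *ctx)
{
	enum toy_state target = TOY_STATE_OFF;

	ev->state = ev->state < target ? ev->state : target;
	toy_list_del_event_old(ev, ctx);
	return ctx->nr_cgroups;
}

/* Patched order: update the accounting while the old state is still visible. */
static int toy_remove_fixed(struct toy_event *ev, struct toy_ctx *ctx)
{
	enum toy_state target = TOY_STATE_OFF;

	if (ev->state > TOY_STATE_OFF)
		toy_cgroup_event_disable(ctx);
	ev->state = ev->state < target ? ev->state : target;
	return ctx->nr_cgroups;
}
```

In this model, removing an INACTIVE event from a context with
nr_cgroups == 1 leaves the count at 1 with toy_remove_buggy(), while
toy_remove_fixed() brings it down to 0.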
kernel/events/core.c | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f34c99f8ce8f..280d42b40b34 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2120,18 +2120,6 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 	if (event->group_leader == event)
 		del_event_from_groups(event, ctx);
 
-	/*
-	 * If event was in error state, then keep it
-	 * that way, otherwise bogus counts will be
-	 * returned on read(). The only way to get out
-	 * of error state is by explicit re-enabling
-	 * of the event
-	 */
-	if (event->state > PERF_EVENT_STATE_OFF) {
-		perf_cgroup_event_disable(event, ctx);
-		perf_event_set_state(event, PERF_EVENT_STATE_OFF);
-	}
-
 	ctx->generation++;
 	event->pmu_ctx->nr_events--;
 }
@@ -2498,6 +2486,9 @@ __perf_remove_from_context(struct perf_event *event,
 		state = PERF_EVENT_STATE_DEAD;
 	}
 	event_sched_out(event, ctx);
+
+	if (event->state > PERF_EVENT_STATE_OFF)
+		perf_cgroup_event_disable(event, ctx);
 	perf_event_set_state(event, min(event->state, state));
 
 	if (flags & DETACH_GROUP)
--
2.34.1
On Wed, Jun 04, 2025 at 03:39:23AM +0000, Luo Gengkun wrote:
> Since __perf_remove_from_context() updates event->state before calling
> list_del_event(), the state check in list_del_event() never passes and
> perf_cgroup_event_disable() is not called. As a result, nr_cgroups and
> cpuctx->cgrp are not updated.
>
> To fix this problem, move perf_cgroup_event_disable into
> __perf_remove_from_context as:
>
> commit a3c3c66670ce ("perf/core: Fix child_total_time_enabled accounting bug at task exit")
>
> did.
>
> Fixes: a3c3c66670ce ("perf/core: Fix child_total_time_enabled accounting bug at task exit")
> Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
So I already have Yeoreum's patch for this:
https://lkml.kernel.org/r/20250603144414.GC38114@noisy.programming.kicks-ass.net
On 2025/6/4 17:19, Peter Zijlstra wrote:
> On Wed, Jun 04, 2025 at 03:39:23AM +0000, Luo Gengkun wrote:
>> Since __perf_remove_from_context() updates event->state before calling
>> list_del_event(), the state check in list_del_event() never passes and
>> perf_cgroup_event_disable() is not called. As a result, nr_cgroups and
>> cpuctx->cgrp are not updated.
>>
>> To fix this problem, move perf_cgroup_event_disable into
>> __perf_remove_from_context as:
>>
>> commit a3c3c66670ce ("perf/core: Fix child_total_time_enabled accounting bug at task exit")
>>
>> did.
>>
>> Fixes: a3c3c66670ce ("perf/core: Fix child_total_time_enabled accounting bug at task exit")
>> Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
> So I already have Yeoreum's patch for this:
>
> https://lkml.kernel.org/r/20250603144414.GC38114@noisy.programming.kicks-ass.net

Oh, okay. But the second patch can still be reviewed. Thanks.