From nobody Sun Jun 21 10:10:14 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5A782C433EF for ; Tue, 29 Mar 2022 15:45:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238959AbiC2Pri (ORCPT ); Tue, 29 Mar 2022 11:47:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42652 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238901AbiC2Prg (ORCPT ); Tue, 29 Mar 2022 11:47:36 -0400 Received: from mail-pg1-x52c.google.com (mail-pg1-x52c.google.com [IPv6:2607:f8b0:4864:20::52c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B959B3AA61 for ; Tue, 29 Mar 2022 08:45:52 -0700 (PDT) Received: by mail-pg1-x52c.google.com with SMTP id w21so15157375pgm.7 for ; Tue, 29 Mar 2022 08:45:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=sR+AGUD6IJfSlX//tE9IvBlX5391lNRFVMozD/58NMw=; b=vVx9AbYAQXR2dNV5eh0x3EJ+LbogFysar/7XN+VBC2k1xy/kv/3ZPB/z55oUbDf02Q 6fC8PQbzMis9xzotcHHE3cVoQ4Or9C+wptqK6kJnWp2y/cfcih5irmscWYDcTfbDIqKh FLRPchgp+CbllrsAH3bSs9nI1Y9xCFGvBcGRiyRkhg07DOcTTX/7l01/SJVqW1mzhSE7 TC1pjXxNSGB4xqnmMiaItc42TDS9hGm5eb7C+JLtll547vyLcwkL72fEkoTll1L2y1ac J0ng+qisZGHoZ8DBYv36BcIu7PHMb/KObOrOrViA8yyWrm7Ll1WoElKQGjpsZrUMUcl6 v3MA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=sR+AGUD6IJfSlX//tE9IvBlX5391lNRFVMozD/58NMw=; b=Ebh/GaNoDB1a6+m2FIs99CO8p6v4Z9yi7YD+Om4EDmrv+SVxvP7EALw0NpgX+bfQ+L Mp3e0pO7wgiREuc7lbIyD+Z/yuioGS9J3TArNQE/46bc2oponoel5kjMLyte984e1HaC 
nVZS3zsuS3oYK3HgyRPZQcKUQCbUUxa2mipiGTVaJHh4PpJU0A1VK8lguh3FJIDMicx0 NIJEU3j8GFwi5epxRuSMsIavyYo/AyZZzzC7Fsg3CymFhar2hIaAprbSjHbQPldiM7O4 Ku6qJut7G6PHHLITEHfZZFU2zJm8J0ZegL4mhCzspicCjG3HlLcEsHncdzZGFAyawcFn 71Fw== X-Gm-Message-State: AOAM5338mWTxSGEIZOdxw6lxZuxDHeCURWrAzpQYVGObRgfe3lQNQA8f G4kmimErdJ3k8A4xSuhBLKbqbA== X-Google-Smtp-Source: ABdhPJx6s1QCzN95yDkgEce4/qTzdIQESwqGAFZDm+gKDFnKo5L8OFL4QeMNiVLBngAd+LjJzWVtTQ== X-Received: by 2002:a62:8308:0:b0:4fa:7bcd:d0e6 with SMTP id h8-20020a628308000000b004fa7bcdd0e6mr28934815pfe.35.1648568751759; Tue, 29 Mar 2022 08:45:51 -0700 (PDT) Received: from localhost.localdomain ([2409:8a28:e65:74c0:705b:241a:6dc0:a4ac]) by smtp.gmail.com with ESMTPSA id u19-20020a056a00125300b004fafa43330csm17930733pfi.163.2022.03.29.08.45.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Mar 2022 08:45:51 -0700 (PDT) From: Chengming Zhou To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, mark.rutland@arm.com, alexander.shishkin@linux.intel.com, jolsa@kernel.org, namhyung@kernel.org, eranian@google.com Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, duanxiongchun@bytedance.com, songmuchun@bytedance.com, Chengming Zhou Subject: [PATCH v4 1/4] perf/core: Don't pass task around when ctx sched in Date: Tue, 29 Mar 2022 23:45:20 +0800 Message-Id: <20220329154523.86438-2-zhouchengming@bytedance.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220329154523.86438-1-zhouchengming@bytedance.com> References: <20220329154523.86438-1-zhouchengming@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The current code passes task around to ctx_sched_in() only to get the perf_cgroup of the task, then update its timestamp and those of its ancestors and set them to active. 
But we can use cpuctx->cgrp to get the active perf_cgroup and its ancestors, since cpuctx->cgrp has already been set before ctx_sched_in() is called. Remove the task argument from ctx_sched_in() and clean up the related code. Signed-off-by: Chengming Zhou --- kernel/events/core.c | 58 ++++++++++++++++++++------------------------ 1 file changed, 26 insertions(+), 32 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index cfde994ce61c..d50f45012c05 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -574,8 +574,7 @@ static void cpu_ctx_sched_out(struct perf_cpu_context *= cpuctx, enum event_type_t event_type); =20 static void cpu_ctx_sched_in(struct perf_cpu_context *cpuctx, - enum event_type_t event_type, - struct task_struct *task); + enum event_type_t event_type); =20 static void update_context_time(struct perf_event_context *ctx); static u64 perf_event_time(struct perf_event *event); @@ -801,10 +800,10 @@ static inline void update_cgrp_time_from_event(struct= perf_event *event) } =20 static inline void -perf_cgroup_set_timestamp(struct task_struct *task, - struct perf_event_context *ctx) +perf_cgroup_set_timestamp(struct perf_cpu_context *cpuctx) { - struct perf_cgroup *cgrp; + struct perf_event_context *ctx =3D &cpuctx->ctx; + struct perf_cgroup *cgrp =3D cpuctx->cgrp; struct perf_cgroup_info *info; struct cgroup_subsys_state *css; =20 @@ -813,10 +812,10 @@ perf_cgroup_set_timestamp(struct task_struct *task, * ensure we do not access cgroup data * unless we have the cgroup pinned (css_get) */ - if (!task || !ctx->nr_cgroups) + if (!cgrp) return; =20 - cgrp =3D perf_cgroup_from_task(task, ctx); + WARN_ON_ONCE(!ctx->nr_cgroups); =20 for (css =3D &cgrp->css; css; css =3D css->parent) { cgrp =3D container_of(css, struct perf_cgroup, css); @@ -869,14 +868,14 @@ static void perf_cgroup_switch(struct task_struct *ta= sk, int mode) WARN_ON_ONCE(cpuctx->cgrp); /* * set cgrp before ctxsw in to allow - * event_filter_match() to not have to pass - * task around 
+ * perf_cgroup_set_timestamp() in ctx_sched_in() + * to not have to pass task around * we pass the cpuctx->ctx to perf_cgroup_from_task() * because cgorup events are only per-cpu */ cpuctx->cgrp =3D perf_cgroup_from_task(task, &cpuctx->ctx); - cpu_ctx_sched_in(cpuctx, EVENT_ALL, task); + cpu_ctx_sched_in(cpuctx, EVENT_ALL); } perf_pmu_enable(cpuctx->ctx.pmu); perf_ctx_unlock(cpuctx, cpuctx->task_ctx); @@ -1118,8 +1117,7 @@ static inline int perf_cgroup_connect(pid_t pid, stru= ct perf_event *event, } =20 static inline void -perf_cgroup_set_timestamp(struct task_struct *task, - struct perf_event_context *ctx) +perf_cgroup_set_timestamp(struct perf_cpu_context *cpuctx) { } =20 @@ -2713,8 +2711,7 @@ static void ctx_sched_out(struct perf_event_context *= ctx, static void ctx_sched_in(struct perf_event_context *ctx, struct perf_cpu_context *cpuctx, - enum event_type_t event_type, - struct task_struct *task); + enum event_type_t event_type); =20 static void task_ctx_sched_out(struct perf_cpu_context *cpuctx, struct perf_event_context *ctx, @@ -2730,15 +2727,14 @@ static void task_ctx_sched_out(struct perf_cpu_cont= ext *cpuctx, } =20 static void perf_event_sched_in(struct perf_cpu_context *cpuctx, - struct perf_event_context *ctx, - struct task_struct *task) + struct perf_event_context *ctx) { - cpu_ctx_sched_in(cpuctx, EVENT_PINNED, task); + cpu_ctx_sched_in(cpuctx, EVENT_PINNED); if (ctx) - ctx_sched_in(ctx, cpuctx, EVENT_PINNED, task); - cpu_ctx_sched_in(cpuctx, EVENT_FLEXIBLE, task); + ctx_sched_in(ctx, cpuctx, EVENT_PINNED); + cpu_ctx_sched_in(cpuctx, EVENT_FLEXIBLE); if (ctx) - ctx_sched_in(ctx, cpuctx, EVENT_FLEXIBLE, task); + ctx_sched_in(ctx, cpuctx, EVENT_FLEXIBLE); } =20 /* @@ -2788,7 +2784,7 @@ static void ctx_resched(struct perf_cpu_context *cpuc= tx, else if (ctx_event_type & EVENT_PINNED) cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE); =20 - perf_event_sched_in(cpuctx, task_ctx, current); + perf_event_sched_in(cpuctx, task_ctx); 
perf_pmu_enable(cpuctx->ctx.pmu); } =20 @@ -3011,7 +3007,7 @@ static void __perf_event_enable(struct perf_event *ev= ent, return; =20 if (!event_filter_match(event)) { - ctx_sched_in(ctx, cpuctx, EVENT_TIME, current); + ctx_sched_in(ctx, cpuctx, EVENT_TIME); return; } =20 @@ -3020,7 +3016,7 @@ static void __perf_event_enable(struct perf_event *ev= ent, * then don't put it on unless the group is on. */ if (leader !=3D event && leader->state !=3D PERF_EVENT_STATE_ACTIVE) { - ctx_sched_in(ctx, cpuctx, EVENT_TIME, current); + ctx_sched_in(ctx, cpuctx, EVENT_TIME); return; } =20 @@ -3865,8 +3861,7 @@ ctx_flexible_sched_in(struct perf_event_context *ctx, static void ctx_sched_in(struct perf_event_context *ctx, struct perf_cpu_context *cpuctx, - enum event_type_t event_type, - struct task_struct *task) + enum event_type_t event_type) { int is_active =3D ctx->is_active; =20 @@ -3878,7 +3873,7 @@ ctx_sched_in(struct perf_event_context *ctx, if (is_active ^ EVENT_TIME) { /* start ctx time */ __update_context_time(ctx, false); - perf_cgroup_set_timestamp(task, ctx); + perf_cgroup_set_timestamp(cpuctx); /* * CPU-release for the below ->is_active store, * see __load_acquire() in perf_event_time_now() @@ -3909,12 +3904,11 @@ ctx_sched_in(struct perf_event_context *ctx, } =20 static void cpu_ctx_sched_in(struct perf_cpu_context *cpuctx, - enum event_type_t event_type, - struct task_struct *task) + enum event_type_t event_type) { struct perf_event_context *ctx =3D &cpuctx->ctx; =20 - ctx_sched_in(ctx, cpuctx, event_type, task); + ctx_sched_in(ctx, cpuctx, event_type); } =20 static void perf_event_context_sched_in(struct perf_event_context *ctx, @@ -3956,7 +3950,7 @@ static void perf_event_context_sched_in(struct perf_e= vent_context *ctx, */ if (!RB_EMPTY_ROOT(&ctx->pinned_groups.tree)) cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE); - perf_event_sched_in(cpuctx, ctx, task); + perf_event_sched_in(cpuctx, ctx); =20 if (cpuctx->sched_cb_usage && pmu->sched_task) 
pmu->sched_task(cpuctx->task_ctx, true); @@ -4267,7 +4261,7 @@ static bool perf_rotate_context(struct perf_cpu_conte= xt *cpuctx) if (cpu_event) rotate_ctx(&cpuctx->ctx, cpu_event); =20 - perf_event_sched_in(cpuctx, task_ctx, current); + perf_event_sched_in(cpuctx, task_ctx); =20 perf_pmu_enable(cpuctx->ctx.pmu); perf_ctx_unlock(cpuctx, cpuctx->task_ctx); @@ -4339,7 +4333,7 @@ static void perf_event_enable_on_exec(int ctxn) clone_ctx =3D unclone_ctx(ctx); ctx_resched(cpuctx, ctx, event_type); } else { - ctx_sched_in(ctx, cpuctx, EVENT_TIME, current); + ctx_sched_in(ctx, cpuctx, EVENT_TIME); } perf_ctx_unlock(cpuctx, ctx); =20 --=20 2.35.1 From nobody Sun Jun 21 10:10:14 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9457EC433EF for ; Tue, 29 Mar 2022 15:46:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238978AbiC2Prs (ORCPT ); Tue, 29 Mar 2022 11:47:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238961AbiC2Pro (ORCPT ); Tue, 29 Mar 2022 11:47:44 -0400 Received: from mail-pg1-x52b.google.com (mail-pg1-x52b.google.com [IPv6:2607:f8b0:4864:20::52b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4C34C3B3D4 for ; Tue, 29 Mar 2022 08:45:59 -0700 (PDT) Received: by mail-pg1-x52b.google.com with SMTP id c2so15152159pga.10 for ; Tue, 29 Mar 2022 08:45:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=zysooMn8IJ98VrrDS0WMswXwEoXCxIeE5qgjfz/kBk0=; b=tP53GDW0V6xO3Mc6lQ9a/ywqGjPQrFBuXylv2E+lWz1fyum/gB3R0xzsKhs8Qcny5w 
Bp5hw/OO2KHU9LnnO2++npNbSOm6AvfiZjg2XW59Ze415jjONWFRfBRQ2zvXQ1u/eHQO XpgfAMAa75Lerlc4uCJXdT8LRg43l4wUEbLCZ2Rspjhnb9/ZzSpFyxL0KqY6+63TTqQ3 slhWFTfgGIHBfRCPg13N56XKxv/iJ8XMX20oe3NcthQCiENRu0GyxZC67yrthteVdlQk Q4Edw4TyDf56JBRFblClP2GXYJ7S8wXKtTv8uuIv84jsm1F8Re5nf42DEy+/q0e7dEdT B1Ww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=zysooMn8IJ98VrrDS0WMswXwEoXCxIeE5qgjfz/kBk0=; b=WdTy612EP0YE4dAZ/KizCOjxo+EDQFZoddkWUqEeqnHdJ9sKKvHyCD4xhX5r7ojhbC c+uW5bIP2UP+8rR4cboHnN+3GFtu+8FpQdZIDRjzyN6oL/lh4lgSfc37NWL0UxK+Uyt3 4TnewvJ7sHO0RZV9ghMxq2TU2eU4aX/BOnHeMn52YX5XVEwDHd7Cxkfez3EcbPnA8Kx1 FEzy0XJeAVfTRqO/TAphNc6PrCfKAU0wJfKVyZIhkgFal78Yn1QCbPQm6yHo+NUTICXP wWWAlW6nQgJ/LrMFBH1JHiaswYgA8Ff8O/FhfAqqeZAoLVybKI64KLPk+AUOcJvrLjVm 9r8g== X-Gm-Message-State: AOAM533xzb0praM5aU3xgdmpCNXJOcCIYH93zMdM6PLo1Roe0RakzqDn 2bSqqjBiMxd6klb0u1ZMG+pWUA== X-Google-Smtp-Source: ABdhPJzc/ykja27CyhJnywo1svS553ci+1IAjRy0v1iRDw+KcTufkHUAhz0uSj45kr4a+uoaDKbRkQ== X-Received: by 2002:a63:2ad0:0:b0:398:31d7:9955 with SMTP id q199-20020a632ad0000000b0039831d79955mr2400322pgq.198.1648568758750; Tue, 29 Mar 2022 08:45:58 -0700 (PDT) Received: from localhost.localdomain ([2409:8a28:e65:74c0:705b:241a:6dc0:a4ac]) by smtp.gmail.com with ESMTPSA id u19-20020a056a00125300b004fafa43330csm17930733pfi.163.2022.03.29.08.45.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Mar 2022 08:45:58 -0700 (PDT) From: Chengming Zhou To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, mark.rutland@arm.com, alexander.shishkin@linux.intel.com, jolsa@kernel.org, namhyung@kernel.org, eranian@google.com Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, duanxiongchun@bytedance.com, songmuchun@bytedance.com, Chengming Zhou Subject: [PATCH v4 2/4] perf/core: Use perf_cgroup_info->active to check if cgroup is active Date: Tue, 29 
Mar 2022 23:45:21 +0800 Message-Id: <20220329154523.86438-3-zhouchengming@bytedance.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220329154523.86438-1-zhouchengming@bytedance.com> References: <20220329154523.86438-1-zhouchengming@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" perf_cgroup_set_timestamp() starts the cgroup time and sets active to 1, and update_cgrp_time_from_cpuctx() stops the cgroup time and sets active to 0, so we can use info->active directly to check whether the cgroup is active. Signed-off-by: Chengming Zhou --- kernel/events/core.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index d50f45012c05..dd985c77bc37 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -780,7 +780,6 @@ static inline void update_cgrp_time_from_cpuctx(struct = perf_cpu_context *cpuctx, static inline void update_cgrp_time_from_event(struct perf_event *event) { struct perf_cgroup_info *info; - struct perf_cgroup *cgrp; =20 /* * ensure we access cgroup data only when needed and @@ -789,14 +788,12 @@ static inline void update_cgrp_time_from_event(struct= perf_event *event) if (!is_cgroup_event(event)) return; =20 - cgrp =3D perf_cgroup_from_task(current, event->ctx); + info =3D this_cpu_ptr(event->cgrp->info); /* * Do not update time when cgroup is not active */ - if (cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup)) { - info =3D this_cpu_ptr(event->cgrp->info); + if (info->active) __update_cgrp_time(info, perf_clock(), true); - } } =20 static inline void --=20 2.35.1 From nobody Sun Jun 21 10:10:14 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 
A66D1C43217 for ; Tue, 29 Mar 2022 15:46:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238990AbiC2PsR (ORCPT ); Tue, 29 Mar 2022 11:48:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43760 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238992AbiC2Pru (ORCPT ); Tue, 29 Mar 2022 11:47:50 -0400 Received: from mail-pl1-x634.google.com (mail-pl1-x634.google.com [IPv6:2607:f8b0:4864:20::634]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1D2CD3AA61 for ; Tue, 29 Mar 2022 08:46:07 -0700 (PDT) Received: by mail-pl1-x634.google.com with SMTP id i11so18026750plr.1 for ; Tue, 29 Mar 2022 08:46:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=uC+G029pRUbANRtsBSWrhj/k5Z97cXUw0xB+VoEAtDM=; b=S1cNNRliR63v2Hn/AZFrMYIQKTI3MtamEdSp9uZ2v7hwVzXkNgEmMfwTwKokufqtSn wVjrnu3Wdm2RBACe0XVLvqzAHCxbYFvyOiIGGF6hVvfFig8qY/IZc8cUfGYrd1F/uTT6 WcA8esrQfIC1SjhvT/192FZWD8pYLDrlA0wOPA90EM3/lxqIe2vltEFheTVWhBDCgR/x wVKvNm8853UMiNRnN8fzKYoZcN7+CIARw2lnnbd9+NsEB8W1Qy0vqFt0EHXqFjrIPzGi b3w0bVbGeFQ3mQWgMS3Wfz1HUPDR76doXcVUUEEZX91eXII4Z/EWnsLZsNL3zAEAVnVd XeQA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=uC+G029pRUbANRtsBSWrhj/k5Z97cXUw0xB+VoEAtDM=; b=0ekRzr0O/0ipg2EnLy0ydsV10LaQQDcu72lZ0LOG5noXr+0QlmYJSs0LTWNVDwpDsp ZtNiKgN5/FqRO2HVYuhysRC1sm8SkjPUR5C1ItZsqzFx7VhtVtu22hIAYtOPmVfkSsrn 0UywnlO6otxLs7BJCHOmp6loU6YS+e+rf2O9SYxw414XJbxdZxzndsOWGslMZ+7Qpu4k wT7/XJlufSsrBuaem2B2k5J3HPHA3j3bDFsaI8o5NkMpFLlmE3qDKWIopRTY/R6KCFRa r+29hl7+oUpQ7698thPxQnU0C8Io6sTgxaZeEJyVLudo4xTgpjDvngjhkqzap4p0XjSw TBxA== X-Gm-Message-State: 
AOAM530tGHsRRppKIlJh9OyJTpomHitTO8V4tr7gD6cywX1SKBQOKnd+ 6DnoxTGuvilOpsvJUU/2V0b44vQTqt3bFw== X-Google-Smtp-Source: ABdhPJzat1rQkcKsQ8VTqNECqthm5riWpxvqqpb3wNtS9WoIR+3YOsUau/2UIPL9Ux2rw9QRYTbwSg== X-Received: by 2002:a17:902:864b:b0:14c:d45e:a77b with SMTP id y11-20020a170902864b00b0014cd45ea77bmr31190050plt.143.1648568765776; Tue, 29 Mar 2022 08:46:05 -0700 (PDT) Received: from localhost.localdomain ([2409:8a28:e65:74c0:705b:241a:6dc0:a4ac]) by smtp.gmail.com with ESMTPSA id u19-20020a056a00125300b004fafa43330csm17930733pfi.163.2022.03.29.08.45.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Mar 2022 08:46:05 -0700 (PDT) From: Chengming Zhou To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, mark.rutland@arm.com, alexander.shishkin@linux.intel.com, jolsa@kernel.org, namhyung@kernel.org, eranian@google.com Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, duanxiongchun@bytedance.com, songmuchun@bytedance.com, Chengming Zhou Subject: [PATCH v4 3/4] perf/core: Fix perf_cgroup_switch() Date: Tue, 29 Mar 2022 23:45:22 +0800 Message-Id: <20220329154523.86438-4-zhouchengming@bytedance.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220329154523.86438-1-zhouchengming@bytedance.com> References: <20220329154523.86438-1-zhouchengming@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

There is a race problem that can trigger WARN_ON_ONCE(cpuctx->cgrp)
in perf_cgroup_switch().

CPU1                                            CPU2
perf_cgroup_sched_out(prev, next)
  cgrp1 =3D perf_cgroup_from_task(prev)
  cgrp2 =3D perf_cgroup_from_task(next)
  if (cgrp1 !=3D cgrp2)
    perf_cgroup_switch(prev, PERF_CGROUP_SWOUT)
                                                cgroup_migrate_execute()
                                                  task->cgroups =3D ?
                                                  perf_cgroup_attach()
                                                    task_function_call(task, __perf_cgroup_move)
perf_cgroup_sched_in(prev, next)
  cgrp1 =3D perf_cgroup_from_task(prev)
  cgrp2 =3D perf_cgroup_from_task(next)
  if (cgrp1 !=3D cgrp2)
    perf_cgroup_switch(next, PERF_CGROUP_SWIN)
__perf_cgroup_move()
  perf_cgroup_switch(task, PERF_CGROUP_SWOUT | PERF_CGROUP_SWIN)

Commit a8d757ef076f ("perf events: Fix slow and broken cgroup
context switch code") wants to skip perf_cgroup_switch() when the
perf_cgroup of "prev" and "next" are the same.

But task->cgroups can change concurrently with context_switch(), in
cgroup_migrate_execute(). If cgrp1 =3D=3D cgrp2 in sched_out(), cpuctx
won't do sched_out. If task->cgroups then changes so that
cgrp1 !=3D cgrp2 in sched_in(), cpuctx will do sched_in, which triggers
WARN_ON_ONCE(cpuctx->cgrp).

Even though __perf_cgroup_move() is serialized against the context
switch, which runs with interrupts disabled, context_switch() can still
observe task->cgroups changing midway, since task->cgroups is changed
before the IPI is sent.

So we have to combine perf_cgroup_sched_in() into
perf_cgroup_sched_out(), unified as perf_cgroup_switch(), to fix the
inconsistency between perf_cgroup_sched_out() and
perf_cgroup_sched_in().

But we can't just compare prev->cgroups with next->cgroups to decide
whether to skip cpuctx sched_out/in, since prev->cgroups may be
changing too. For example:

CPU1                                            CPU2
cgroup_migrate_execute()
  prev->cgroups =3D ?
  perf_cgroup_attach()
    task_function_call(task, __perf_cgroup_move)
                                                perf_cgroup_switch(task)
                                                  cgrp1 =3D perf_cgroup_from_task(prev)
                                                  cgrp2 =3D perf_cgroup_from_task(next)
                                                  if (cgrp1 !=3D cgrp2)
                                                    cpuctx sched_out/in ...
    task_function_call() will return -ESRCH

In the above example, the change to prev->cgroups makes
(cgrp1 =3D=3D cgrp2) true, so cpuctx sched_out/in is skipped. Later,
task_function_call() returns -ESRCH since the prev task is no longer
running on the CPU. So we would leave the perf_events of the old
prev->cgroups scheduled on the CPU, which is wrong.

The solution is that we should use cpuctx->cgrp to compare with the next task's perf_cgroup. Since cpuctx->cgrp can only be changed on local CPU, and we have irq disabled, we can read cpuctx->cgrp to compare without holding ctx lock. Fixes: a8d757ef076f ("perf events: Fix slow and broken cgroup context switc= h code") Signed-off-by: Chengming Zhou --- kernel/events/core.c | 132 ++++++++----------------------------------- 1 file changed, 25 insertions(+), 107 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index dd985c77bc37..782b9f5e3fc7 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -824,17 +824,12 @@ perf_cgroup_set_timestamp(struct perf_cpu_context *cp= uctx) =20 static DEFINE_PER_CPU(struct list_head, cgrp_cpuctx_list); =20 -#define PERF_CGROUP_SWOUT 0x1 /* cgroup switch out every event */ -#define PERF_CGROUP_SWIN 0x2 /* cgroup switch in events based on task */ - /* * reschedule events based on the cgroup constraint of task. - * - * mode SWOUT : schedule out everything - * mode SWIN : schedule in based on cgroup for next */ -static void perf_cgroup_switch(struct task_struct *task, int mode) +static void perf_cgroup_switch(struct task_struct *task) { + struct perf_cgroup *cgrp; struct perf_cpu_context *cpuctx, *tmp; struct list_head *list; unsigned long flags; @@ -845,35 +840,31 @@ static void perf_cgroup_switch(struct task_struct *ta= sk, int mode) */ local_irq_save(flags); =20 + cgrp =3D perf_cgroup_from_task(task, NULL); + list =3D this_cpu_ptr(&cgrp_cpuctx_list); list_for_each_entry_safe(cpuctx, tmp, list, cgrp_cpuctx_entry) { WARN_ON_ONCE(cpuctx->ctx.nr_cgroups =3D=3D 0); + if (READ_ONCE(cpuctx->cgrp) =3D=3D cgrp) + continue; =20 perf_ctx_lock(cpuctx, cpuctx->task_ctx); perf_pmu_disable(cpuctx->ctx.pmu); =20 - if (mode & PERF_CGROUP_SWOUT) { - cpu_ctx_sched_out(cpuctx, EVENT_ALL); - /* - * must not be done before ctxswout due - * to event_filter_match() in event_sched_out() - */ - cpuctx->cgrp =3D NULL; - } + 
cpu_ctx_sched_out(cpuctx, EVENT_ALL); + /* + * must not be done before ctxswout due + * to update_cgrp_time_from_cpuctx() in + * ctx_sched_out() + */ + cpuctx->cgrp =3D cgrp; + /* + * set cgrp before ctxsw in to allow + * perf_cgroup_set_timestamp() in ctx_sched_in() + * to not have to pass task around + */ + cpu_ctx_sched_in(cpuctx, EVENT_ALL); =20 - if (mode & PERF_CGROUP_SWIN) { - WARN_ON_ONCE(cpuctx->cgrp); - /* - * set cgrp before ctxsw in to allow - * perf_cgroup_set_timestamp() in ctx_sched_in() - * to not have to pass task around - * we pass the cpuctx->ctx to perf_cgroup_from_task() - * because cgorup events are only per-cpu - */ - cpuctx->cgrp =3D perf_cgroup_from_task(task, - &cpuctx->ctx); - cpu_ctx_sched_in(cpuctx, EVENT_ALL); - } perf_pmu_enable(cpuctx->ctx.pmu); perf_ctx_unlock(cpuctx, cpuctx->task_ctx); } @@ -881,58 +872,6 @@ static void perf_cgroup_switch(struct task_struct *tas= k, int mode) local_irq_restore(flags); } =20 -static inline void perf_cgroup_sched_out(struct task_struct *task, - struct task_struct *next) -{ - struct perf_cgroup *cgrp1; - struct perf_cgroup *cgrp2 =3D NULL; - - rcu_read_lock(); - /* - * we come here when we know perf_cgroup_events > 0 - * we do not need to pass the ctx here because we know - * we are holding the rcu lock - */ - cgrp1 =3D perf_cgroup_from_task(task, NULL); - cgrp2 =3D perf_cgroup_from_task(next, NULL); - - /* - * only schedule out current cgroup events if we know - * that we are switching to a different cgroup. Otherwise, - * do no touch the cgroup events. 
- */ - if (cgrp1 !=3D cgrp2) - perf_cgroup_switch(task, PERF_CGROUP_SWOUT); - - rcu_read_unlock(); -} - -static inline void perf_cgroup_sched_in(struct task_struct *prev, - struct task_struct *task) -{ - struct perf_cgroup *cgrp1; - struct perf_cgroup *cgrp2 =3D NULL; - - rcu_read_lock(); - /* - * we come here when we know perf_cgroup_events > 0 - * we do not need to pass the ctx here because we know - * we are holding the rcu lock - */ - cgrp1 =3D perf_cgroup_from_task(task, NULL); - cgrp2 =3D perf_cgroup_from_task(prev, NULL); - - /* - * only need to schedule in cgroup events if we are changing - * cgroup during ctxsw. Cgroup events were not scheduled - * out of ctxsw out if that was not the case. - */ - if (cgrp1 !=3D cgrp2) - perf_cgroup_switch(task, PERF_CGROUP_SWIN); - - rcu_read_unlock(); -} - static int perf_cgroup_ensure_storage(struct perf_event *event, struct cgroup_subsys_state *css) { @@ -1096,16 +1035,6 @@ static inline void update_cgrp_time_from_cpuctx(stru= ct perf_cpu_context *cpuctx, { } =20 -static inline void perf_cgroup_sched_out(struct task_struct *task, - struct task_struct *next) -{ -} - -static inline void perf_cgroup_sched_in(struct task_struct *prev, - struct task_struct *task) -{ -} - static inline int perf_cgroup_connect(pid_t pid, struct perf_event *event, struct perf_event_attr *attr, struct perf_event *group_leader) @@ -1118,11 +1047,6 @@ perf_cgroup_set_timestamp(struct perf_cpu_context *c= puctx) { } =20 -static inline void -perf_cgroup_switch(struct task_struct *task, struct task_struct *next) -{ -} - static inline u64 perf_cgroup_event_time(struct perf_event *event) { return 0; @@ -1142,6 +1066,10 @@ static inline void perf_cgroup_event_disable(struct perf_event *event, struct perf_event_cont= ext *ctx) { } + +static void perf_cgroup_switch(struct task_struct *task) +{ +} #endif =20 /* @@ -3661,7 +3589,7 @@ void __perf_event_task_sched_out(struct task_struct *= task, * cgroup event are system-wide mode only */ if 
(atomic_read(this_cpu_ptr(&perf_cgroup_events))) - perf_cgroup_sched_out(task, next); + perf_cgroup_switch(next); } =20 /* @@ -3975,16 +3903,6 @@ void __perf_event_task_sched_in(struct task_struct *= prev, struct perf_event_context *ctx; int ctxn; =20 - /* - * If cgroup events exist on this CPU, then we need to check if we have - * to switch in PMU state; cgroup event are system-wide mode only. - * - * Since cgroup events are CPU events, we must schedule these in before - * we schedule in the task events. - */ - if (atomic_read(this_cpu_ptr(&perf_cgroup_events))) - perf_cgroup_sched_in(prev, task); - for_each_task_context_nr(ctxn) { ctx =3D task->perf_event_ctxp[ctxn]; if (likely(!ctx)) @@ -13553,7 +13471,7 @@ static int __perf_cgroup_move(void *info) { struct task_struct *task =3D info; rcu_read_lock(); - perf_cgroup_switch(task, PERF_CGROUP_SWOUT | PERF_CGROUP_SWIN); + perf_cgroup_switch(task); rcu_read_unlock(); return 0; } --=20 2.35.1 From nobody Sun Jun 21 10:10:14 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D5DAEC433F5 for ; Tue, 29 Mar 2022 15:46:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239021AbiC2Ps0 (ORCPT ); Tue, 29 Mar 2022 11:48:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44234 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239036AbiC2Pr5 (ORCPT ); Tue, 29 Mar 2022 11:47:57 -0400 Received: from mail-pj1-x102e.google.com (mail-pj1-x102e.google.com [IPv6:2607:f8b0:4864:20::102e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 883A03A727 for ; Tue, 29 Mar 2022 08:46:13 -0700 (PDT) Received: by mail-pj1-x102e.google.com with SMTP id y16so5123750pju.4 for ; Tue, 29 Mar 2022 08:46:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Rhwh9jxyPwcfCpsQO0KCczugdPBmMe7ki4JPplrKdX0=; b=1ZOjuYVV7IS49mm+29IIWCx9Kgw806xN7HcEro53PMxH+61SjdUcSsXzGPUYBBEH88 uwKJaOwzbkvmfD6JHHJo/DrDfG7P1GwtqU9KbajcVCiZBlnrjlcspNGkc30DBibvKO0a WzNULbGBUuWwjO8xuVOTcMfY1exC724DLjk4ZtC3nNeV7FSLgKLprzgTrGQrDoABGLeH Gwq9ad9A6tDfoGOobXdgBJyW4FaOySqhkOSHUWmiUdatAzSGW0fWmqF5RIy7zn52Gjkh GDMB70CeSy9CZzN37x+re8L/OOI4ghmWSNrFRkdk28aZLVcUjiOiGm/GYi7hEpYn9L2y GiVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Rhwh9jxyPwcfCpsQO0KCczugdPBmMe7ki4JPplrKdX0=; b=zAgBW2uwLroB+5YgSg0F1/WU+JAbcByNprOjyidWZ7X+u6zIRgb+IHmZsvQt5LSU8l SSzJ2fyFIL5ufJmtCNkzjxYM9w779c4iK2VyLcR18F2EOiJNYvsewlFiVsz/mOu1sEjI /wqZ3AcvohCDHZwVivWdo0YVnCiCcbI04a+JaEgRKWNIBruBG+8KkQlpn9Ra9ADX4NtU 5bi5dwb9O4aBvU85rvEiA/074Cxc8FNqSB1HSAAqjvYG5n7UQg3geo/OXiif4hw721hI aEAr3NOXnaOVk9rCk2VlMHMLumWJ5LhCgL+24jJyUpFkDQMv6u8LvScG7QFUQT1K8NNa EOgg== X-Gm-Message-State: AOAM532GhMnkG2CW8FNt8jfpA5nHw8klAlTNGrxLE4jeateYfiLsFWmt JHpB+pg6ktKjtM84E490DtdnQiVd3Lh4bw== X-Google-Smtp-Source: ABdhPJxI2f60H+9yDfbq7bCXzPA5q5GFVR35gKrDEchQo7UVAUEKkvLhORP9cHiUPxiVJ/8hzds0VA== X-Received: by 2002:a17:902:f70a:b0:153:88c7:774 with SMTP id h10-20020a170902f70a00b0015388c70774mr31141798plo.166.1648568772925; Tue, 29 Mar 2022 08:46:12 -0700 (PDT) Received: from localhost.localdomain ([2409:8a28:e65:74c0:705b:241a:6dc0:a4ac]) by smtp.gmail.com with ESMTPSA id u19-20020a056a00125300b004fafa43330csm17930733pfi.163.2022.03.29.08.46.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Mar 2022 08:46:12 -0700 (PDT) From: Chengming Zhou To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, mark.rutland@arm.com, alexander.shishkin@linux.intel.com, jolsa@kernel.org, 
namhyung@kernel.org, eranian@google.com Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, duanxiongchun@bytedance.com, songmuchun@bytedance.com, Chengming Zhou Subject: [PATCH v4 4/4] perf/core: Always set cpuctx cgrp when enable cgroup event Date: Tue, 29 Mar 2022 23:45:23 +0800 Message-Id: <20220329154523.86438-5-zhouchengming@bytedance.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220329154523.86438-1-zhouchengming@bytedance.com> References: <20220329154523.86438-1-zhouchengming@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" When a cgroup event is enabled, setting cpuctx->cgrp is conditional on the current task's cgrp matching the event's cgroup, so it has to be done for every new event. This brings complexity but no advantage. To keep it simple, always set cpuctx->cgrp when the first cgroup event is enabled, and reset it to NULL when the last cgroup event is disabled. Signed-off-by: Chengming Zhou --- kernel/events/core.c | 18 ++---------------- 1 file changed, 2 insertions(+), 16 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 782b9f5e3fc7..a1f6f0a54ef7 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -967,22 +967,10 @@ perf_cgroup_event_enable(struct perf_event *event, st= ruct perf_event_context *ct */ cpuctx =3D container_of(ctx, struct perf_cpu_context, ctx); =20 - /* - * Since setting cpuctx->cgrp is conditional on the current @cgrp - * matching the event's cgroup, we must do this for every new event, - * because if the first would mismatch, the second would not try again - * and we would leave cpuctx->cgrp unset. 
- */ - if (ctx->is_active && !cpuctx->cgrp) { - struct perf_cgroup *cgrp =3D perf_cgroup_from_task(current, ctx); - - if (cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup)) - cpuctx->cgrp =3D cgrp; - } - if (ctx->nr_cgroups++) return; =20 + cpuctx->cgrp =3D perf_cgroup_from_task(current, ctx); list_add(&cpuctx->cgrp_cpuctx_entry, per_cpu_ptr(&cgrp_cpuctx_list, event->cpu)); } @@ -1004,9 +992,7 @@ perf_cgroup_event_disable(struct perf_event *event, st= ruct perf_event_context *c if (--ctx->nr_cgroups) return; =20 - if (ctx->is_active && cpuctx->cgrp) - cpuctx->cgrp =3D NULL; - + cpuctx->cgrp =3D NULL; list_del(&cpuctx->cgrp_cpuctx_entry); } =20 --=20 2.35.1