From: Robin Murphy
To: will@kernel.org
Cc: mark.rutland@arm.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, jeremy.linton@arm.com,
 ilkka@os.amperecomputing.com, renyu.zj@linux.alibaba.com
Subject: [PATCH 3/3] perf/arm-cmn: Enable per-DTC counter allocation
Date: Fri, 20 Oct 2023 18:51:27 +0100
Message-Id: <849f65566582cb102c6d0843d0f26e231180f8ac.1697824215.git.robin.murphy@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Finally enable independent per-DTC-domain counter allocation, except on
CMN-600 where we still need to cope with not knowing the domain topology
and thus keep counter indices synchronised across domains. This allows
users to simultaneously count up to 8 targeted events per domain, rather
than 8 globally, for up to 4x wider coverage on maximum configurations.

Even though this now looks deceptively simple, I stand by my previous
assertion that it was a flippin' nightmare to implement; all the real
head-scratchers are hidden in the foundations in the previous patch...

Signed-off-by: Robin Murphy
Reviewed-by: Ilkka Koskinen
---
 drivers/perf/arm-cmn.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index 675f1638013e..9479e919c063 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1570,7 +1570,7 @@ struct arm_cmn_val {
 	u8 dtm_count[CMN_MAX_DTMS];
 	u8 occupid[CMN_MAX_DTMS][SEL_MAX];
 	u8 wp[CMN_MAX_DTMS][4];
-	int dtc_count;
+	int dtc_count[CMN_MAX_DTCS];
 	bool cycles;
 };
 
@@ -1591,7 +1591,8 @@ static void arm_cmn_val_add_event(struct arm_cmn *cmn, struct arm_cmn_val *val,
 		return;
 	}
 
-	val->dtc_count++;
+	for_each_hw_dtc_idx(hw, dtc, idx)
+		val->dtc_count[dtc]++;
 
 	for_each_hw_dn(hw, dn, i) {
 		int wp_idx, dtm = dn->dtm, sel = hw->filter_sel;
@@ -1638,8 +1639,9 @@ static int arm_cmn_validate_group(struct arm_cmn *cmn, struct perf_event *event)
 		goto done;
 	}
 
-	if (val->dtc_count == CMN_DT_NUM_COUNTERS)
-		goto done;
+	for (i = 0; i < CMN_MAX_DTCS; i++)
+		if (val->dtc_count[i] == CMN_DT_NUM_COUNTERS)
+			goto done;
 
 	for_each_hw_dn(hw, dn, i) {
 		int wp_idx, wp_cmb, dtm = dn->dtm, sel = hw->filter_sel;
@@ -1806,9 +1808,9 @@ static int arm_cmn_event_add(struct perf_event *event, int flags)
 		return 0;
 	}
 
-	/* Grab a free global counter first... */
+	/* Grab the global counters first... */
 	for_each_hw_dtc_idx(hw, j, idx) {
-		if (j > 0) {
+		if (cmn->part == PART_CMN600 && j > 0) {
 			idx = hw->dtc_idx[0];
 		} else {
 			idx = 0;
@@ -1819,10 +1821,10 @@ static int arm_cmn_event_add(struct perf_event *event, int flags)
 		hw->dtc_idx[j] = idx;
 	}
 
-	/* ...then the local counters to feed it. */
+	/* ...then the local counters to feed them */
 	for_each_hw_dn(hw, dn, i) {
 		struct arm_cmn_dtm *dtm = &cmn->dtms[dn->dtm] + hw->dtm_offset;
-		unsigned int dtm_idx, shift, d = 0;
+		unsigned int dtm_idx, shift, d = max_t(int, dn->dtc, 0);
 		u64 reg;
 
 		dtm_idx = 0;
-- 
2.39.2.101.g768bb238c484.dirty
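
For readers who want to see the accounting change in isolation, here is a minimal standalone sketch of the per-domain bookkeeping that the validation hunks switch to. It is not driver code: the sketch_* names, the bitmask argument and both constants are assumptions made up for illustration (4 domains is inferred from the "up to 4x" figure, 8 counters per domain from the commit message), and the real driver walks the event's DTCs with for_each_hw_dtc_idx() rather than taking a mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration only; see the lead-in above. */
#define SKETCH_MAX_DTCS     4
#define SKETCH_NUM_COUNTERS 8

/* Hypothetical stand-in for the dtc_count[] part of struct arm_cmn_val. */
struct sketch_val {
	int dtc_count[SKETCH_MAX_DTCS];
};

/*
 * Account for one event. dtc_mask has bit N set if the event needs a
 * counter in DTC domain N (the mask is a simplification for this sketch).
 */
static void sketch_val_add_event(struct sketch_val *val, unsigned int dtc_mask)
{
	/* Reject masks that name a domain beyond the assumed maximum. */
	assert((dtc_mask >> SKETCH_MAX_DTCS) == 0);

	for (int dtc = 0; dtc < SKETCH_MAX_DTCS; dtc++)
		if (dtc_mask & (1U << dtc))
			val->dtc_count[dtc]++;
}

/*
 * Mirror the shape of the new check in the validation hunk: there is
 * room only while no domain has used all of its counters.
 */
static bool sketch_val_has_room(const struct sketch_val *val)
{
	for (int i = 0; i < SKETCH_MAX_DTCS; i++)
		if (val->dtc_count[i] == SKETCH_NUM_COUNTERS)
			return false;
	return true;
}
```

With a single global count, the eighth accounted event would exhaust the budget no matter which domain it targeted. With the array, seven events in each of four domains (28 in total) still leave room, and the check only fails once some individual domain reaches 8.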