From: Besar Wicaksono
Subject: [PATCH 4/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU
Date: Mon, 26 Jan 2026 18:11:51 +0000
Message-ID: <20260126181155.2776097-5-bwicaksono@nvidia.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260126181155.2776097-1-bwicaksono@nvidia.com>
References: <20260126181155.2776097-1-bwicaksono@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Adds PCIE PMU support in Tegra410 SOC.

Signed-off-by: Besar Wicaksono
---
 .../admin-guide/perf/nvidia-tegra410-pmu.rst  | 162 ++++++++++++++
 drivers/perf/arm_cspmu/nvidia_cspmu.c         | 208 +++++++++++++++++-
 2 files changed, 368 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
index 7b7ba5700ca1..8528685ddb61 100644
--- a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
@@ -6,6 +6,7 @@ The NVIDIA Tegra410 SoC includes various system PMUs to measure key performance
 metrics like memory bandwidth, latency, and utilization:
 
 * Unified Coherence Fabric (UCF)
+* PCIE
 
 PMU Driver
 ----------
@@ -104,3 +105,164 @@ Example usage:
   destination filter = remote memory::
 
     perf stat -a -e nvidia_ucf_pmu_1/event=0x0,src_loc_noncpu=0x1,dst_rem=0x1/
+
+PCIE PMU
+--------
+
+This PMU monitors all read/write traffic from the root port(s) or a particular
+BDF in a PCIE root complex (RC) to local or remote memory. There is one PMU per
+PCIE RC in the SoC. Each RC can have up to 16 lanes that can be bifurcated into
+up to 8 root ports. The traffic from each root port can be filtered using an RP or
+BDF filter. For example, specifying "src_rp_mask=0xFF" means the PMU counter will
+capture traffic from all RPs. Please see below for more details.
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_source/devices/nvidia_pcie_pmu__rc_.
+
+The events in this PMU can be used to measure bandwidth, utilization, and
+latency:
+
+  * rd_req: count the number of read requests by PCIE device.
+  * wr_req: count the number of write requests by PCIE device.
+  * rd_bytes: count the number of bytes transferred by rd_req.
+  * wr_bytes: count the number of bytes transferred by wr_req.
+  * rd_cum_outs: count outstanding rd_req each cycle.
+  * cycles: count the PCIE cycles.
+
+The average bandwidth is calculated as::
+
+    AVG_RD_BANDWIDTH_IN_GBPS = RD_BYTES / ELAPSED_TIME_IN_NS
+    AVG_WR_BANDWIDTH_IN_GBPS = WR_BYTES / ELAPSED_TIME_IN_NS
+
+The average request rate is calculated as::
+
+    AVG_RD_REQUEST_RATE = RD_REQ / CYCLES
+    AVG_WR_REQUEST_RATE = WR_REQ / CYCLES
+
+
+The average latency is calculated as::
+
+    FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+    AVG_LATENCY_IN_CYCLES = RD_CUM_OUTS / RD_REQ
+    AVG_LATENCY_IN_NS = AVG_LATENCY_IN_CYCLES / FREQ_IN_GHZ
+
+The PMU events can be filtered based on the traffic source and destination.
+The source filter indicates the PCIE devices that will be monitored. The
+destination filter specifies the destination memory type, e.g. local system
+memory (CMEM), local GPU memory (GMEM), or remote memory. The local/remote
+classification of the destination filter is based on the home socket of the
+address, not where the data actually resides. These filters can be found in
+/sys/bus/event_source/devices/nvidia_pcie_pmu__rc_/format/.
+
+The list of event filters:
+
+* Source filter:
+
+  * src_rp_mask: bitmask of root ports that will be monitored. Each bit in this
+    bitmask represents the RP index in the RC. If the bit is set, all devices under
+    the associated RP will be monitored. E.g. "src_rp_mask=0xF" will monitor
+    devices in root port 0 to 3.
+  * src_bdf: the BDF that will be monitored. This is a 16-bit value that
+    follows the formula: (bus << 8) + (device << 3) + (function). For example, the
+    value of BDF 27:01.1 is 0x2709.
+  * src_bdf_en: enable the BDF filter. If this is set, the BDF filter value in
+    "src_bdf" is used to filter the traffic.
+
+  Note that Root-Port and BDF filters are mutually exclusive and the PMU in
+  each RC can only have one BDF filter for all of its counters. If the BDF filter
+  is enabled, the BDF filter value will be applied to all events.
+
+* Destination filter:
+
+  * dst_loc_cmem: if set, count events to local system memory (CMEM) address
+  * dst_loc_gmem: if set, count events to local GPU memory (GMEM) address
+  * dst_loc_pcie_p2p: if set, count events to local PCIE peer address
+  * dst_loc_pcie_cxl: if set, count events to local CXL memory address
+  * dst_rem: if set, count events to remote memory address
+
+If the source filter is not specified, the PMU will count events from all root
+ports. If the destination filter is not specified, the PMU will count events
+to all destinations.
+
+Example usage:
+
+* Count event id 0x0 from root port 0 of PCIE RC-0 on socket 0 targeting all
+  destinations::
+
+    perf stat -a -e nvidia_pcie_pmu_0_rc_0/event=0x0,src_rp_mask=0x1/
+
+* Count event id 0x1 from root port 0 and 1 of PCIE RC-1 on socket 0 and
+  targeting just local CMEM of socket 0::
+
+    perf stat -a -e nvidia_pcie_pmu_0_rc_1/event=0x1,src_rp_mask=0x3,dst_loc_cmem=0x1/
+
+* Count event id 0x2 from root port 0 of PCIE RC-2 on socket 1 targeting all
+  destinations::
+
+    perf stat -a -e nvidia_pcie_pmu_1_rc_2/event=0x2,src_rp_mask=0x1/
+
+* Count event id 0x3 from root port 0 and 1 of PCIE RC-3 on socket 1 and
+  targeting just local CMEM of socket 1::
+
+    perf stat -a -e nvidia_pcie_pmu_1_rc_3/event=0x3,src_rp_mask=0x3,dst_loc_cmem=0x1/
+
+* Count event id 0x4 from BDF 01:01.0 of PCIE RC-4 on socket 0 targeting all
+  destinations::
+
+    perf stat -a -e nvidia_pcie_pmu_0_rc_4/event=0x4,src_bdf=0x0108,src_bdf_en=0x1/
+
+Mapping the RC# to the lspci segment number can be non-trivial; hence a new NVIDIA
+Designated Vendor-Specific Extended Capability (DVSEC) register is added to the PCIE
+config space of each RP. This DVSEC has vendor id "10de" and DVSEC id "0x4". The DVSEC
+register contains the following information to map PCIE devices under the RP back to its RC#:
+
+  - Bus# (byte 0xc) : bus number as reported by the lspci output
+  - Segment# (byte 0xd) : segment number as reported by the lspci output
+  - RP# (byte 0xe) : port number as reported by LnkCap attribute from lspci for a device with Root Port capability
+  - RC# (byte 0xf): root complex number associated with the RP
+  - Socket# (byte 0x10): socket number associated with the RP
+
+Example script for mapping lspci BDF to RC# and socket#::
+
+  #!/bin/bash
+  while read bdf rest; do
+    dvsec4_reg=$(lspci -vv -s $bdf | awk '
+      /Designated Vendor-Specific: Vendor=10de ID=0004/ {
+        match($0, /\[([0-9a-fA-F]+)/, arr);
+        print "0x" arr[1];
+        exit
+      }
+    ')
+    if [ -n "$dvsec4_reg" ]; then
+      bus=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xc))).b)
+      segment=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xd))).b)
+      rp=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xe))).b)
+      rc=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xf))).b)
+      socket=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0x10))).b)
+      echo "$bdf: Bus=$bus, Segment=$segment, RP=$rp, RC=$rc, Socket=$socket"
+    fi
+  done < <(lspci -d 10de:)
+
+Example output::
+
+  0001:00:00.0: Bus=00, Segment=01, RP=00, RC=00, Socket=00
+  0002:80:00.0: Bus=80, Segment=02, RP=01, RC=01, Socket=00
+  0002:a0:00.0: Bus=a0, Segment=02, RP=02, RC=01, Socket=00
+  0002:c0:00.0: Bus=c0, Segment=02, RP=03, RC=01, Socket=00
+  0002:e0:00.0: Bus=e0, Segment=02, RP=04, RC=01, Socket=00
+  0003:00:00.0: Bus=00, Segment=03, RP=00, RC=02, Socket=00
+  0004:00:00.0: Bus=00, Segment=04, RP=00, RC=03, Socket=00
+  0005:00:00.0: Bus=00, Segment=05, RP=00, RC=04, Socket=00
+  0005:40:00.0: Bus=40, Segment=05, RP=01, RC=04, Socket=00
+  0005:c0:00.0: Bus=c0, Segment=05, RP=02, RC=04, Socket=00
+  0006:00:00.0: Bus=00, Segment=06, RP=00, RC=05, Socket=00
+  0009:00:00.0: Bus=00, Segment=09, RP=00, RC=00, Socket=01
+  000a:80:00.0: Bus=80, Segment=0a, RP=01, RC=01, Socket=01
+  000a:a0:00.0: Bus=a0, Segment=0a, RP=02, RC=01, Socket=01
+  000a:e0:00.0: Bus=e0, Segment=0a, RP=03, RC=01, Socket=01
+  000b:00:00.0: Bus=00, Segment=0b, RP=00, RC=02, Socket=01
+  000c:00:00.0: Bus=00, Segment=0c, RP=00, RC=03, Socket=01
+  000d:00:00.0: Bus=00, Segment=0d, RP=00, RC=04, Socket=01
+  000d:40:00.0: Bus=40, Segment=0d, RP=01, RC=04, Socket=01
+  000d:c0:00.0: Bus=c0, Segment=0d, RP=02, RC=04, Socket=01
+  000e:00:00.0: Bus=00, Segment=0e, RP=00, RC=05, Socket=01
diff --git a/drivers/perf/arm_cspmu/nvidia_cspmu.c b/drivers/perf/arm_cspmu/nvidia_cspmu.c
index c67667097a3c..3a5531d1f94c 100644
--- a/drivers/perf/arm_cspmu/nvidia_cspmu.c
+++ b/drivers/perf/arm_cspmu/nvidia_cspmu.c
@@ -8,6 +8,7 @@
 
 #include
 #include
+#include
 #include
 
 #include "arm_cspmu.h"
@@ -28,6 +29,19 @@
 #define NV_UCF_FILTER_DST GENMASK_ULL(11, 8)
 #define NV_UCF_FILTER_DEFAULT (NV_UCF_FILTER_SRC | NV_UCF_FILTER_DST)
 
+#define NV_PCIE_V2_PORT_COUNT 8ULL
+#define NV_PCIE_V2_FILTER_ID_MASK GENMASK_ULL(24, 0)
+#define NV_PCIE_V2_FILTER_PORT GENMASK_ULL(NV_PCIE_V2_PORT_COUNT - 1, 0)
+#define NV_PCIE_V2_FILTER_BDF_VAL GENMASK_ULL(23, NV_PCIE_V2_PORT_COUNT)
+#define NV_PCIE_V2_FILTER_BDF_EN BIT(24)
+#define NV_PCIE_V2_FILTER_BDF_VAL_EN GENMASK_ULL(24, NV_PCIE_V2_PORT_COUNT)
+#define NV_PCIE_V2_FILTER_DEFAULT NV_PCIE_V2_FILTER_PORT
+
+#define NV_PCIE_V2_DST_COUNT 5ULL
+#define NV_PCIE_V2_FILTER2_ID_MASK GENMASK_ULL(4, 0)
+#define NV_PCIE_V2_FILTER2_DST GENMASK_ULL(NV_PCIE_V2_DST_COUNT - 1, 0)
+#define NV_PCIE_V2_FILTER2_DEFAULT NV_PCIE_V2_FILTER2_DST
+
 #define NV_GENERIC_FILTER_ID_MASK GENMASK_ULL(31, 0)
 
 #define NV_PRODID_MASK (PMIIDR_PRODUCTID | PMIIDR_VARIANT | PMIIDR_REVISION)
@@ -162,6 +176,16 @@ static struct attribute *ucf_pmu_event_attrs[] = {
 	NULL,
 };
 
+static struct attribute *pcie_v2_pmu_event_attrs[] = {
+	ARM_CSPMU_EVENT_ATTR(rd_bytes, 0x0),
+	ARM_CSPMU_EVENT_ATTR(wr_bytes, 0x1),
+	ARM_CSPMU_EVENT_ATTR(rd_req, 0x2),
+	ARM_CSPMU_EVENT_ATTR(wr_req, 0x3),
+	ARM_CSPMU_EVENT_ATTR(rd_cum_outs, 0x4),
+	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
 static struct attribute *generic_pmu_event_attrs[] = {
 	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
 	NULL,
@@ -202,6 +226,19 @@ static struct attribute *ucf_pmu_format_attrs[] = {
 	NULL,
 };
 
+static struct attribute *pcie_v2_pmu_format_attrs[] = {
+	ARM_CSPMU_FORMAT_EVENT_ATTR,
+	ARM_CSPMU_FORMAT_ATTR(src_rp_mask, "config1:0-7"),
+	ARM_CSPMU_FORMAT_ATTR(src_bdf, "config1:8-23"),
+	ARM_CSPMU_FORMAT_ATTR(src_bdf_en, "config1:24"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_cmem, "config2:0"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_gmem, "config2:1"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_pcie_p2p, "config2:2"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_pcie_cxl, "config2:3"),
+	ARM_CSPMU_FORMAT_ATTR(dst_rem, "config2:4"),
+	NULL,
+};
+
 static struct attribute *generic_pmu_format_attrs[] = {
 	ARM_CSPMU_FORMAT_EVENT_ATTR,
 	ARM_CSPMU_FORMAT_FILTER_ATTR,
@@ -233,6 +270,32 @@ nv_cspmu_get_name(const struct arm_cspmu *cspmu)
 	return ctx->name;
 }
 
+#if defined(CONFIG_ACPI)
+static int nv_cspmu_get_inst_id(const struct arm_cspmu *cspmu, u32 *id)
+{
+	struct fwnode_handle *fwnode;
+	struct acpi_device *adev;
+	int ret;
+
+	adev = arm_cspmu_acpi_dev_get(cspmu);
+	if (!adev)
+		return -ENODEV;
+
+	fwnode = acpi_fwnode_handle(adev);
+	ret = fwnode_property_read_u32(fwnode, "instance_id", id);
+	if (ret)
+		dev_err(cspmu->dev, "Failed to get instance ID\n");
+
+	acpi_dev_put(adev);
+	return ret;
+}
+#else
+static int nv_cspmu_get_inst_id(const struct arm_cspmu *cspmu, u32 *id)
+{
+	return -EINVAL;
+}
+#endif
+
 static u32 nv_cspmu_event_filter(const struct perf_event *event)
 {
 	const struct nv_cspmu_ctx *ctx =
@@ -278,6 +341,20 @@ static void nv_cspmu_set_ev_filter(struct arm_cspmu *cspmu,
 	}
 }
 
+static void nv_cspmu_reset_ev_filter(struct arm_cspmu *cspmu,
+				     const struct perf_event *event)
+{
+	const struct nv_cspmu_ctx *ctx =
+		to_nv_cspmu_ctx(to_arm_cspmu(event->pmu));
+	const u32 offset = 4 * event->hw.idx;
+
+	if (ctx->get_filter)
+		writel(0, cspmu->base0 + PMEVFILTR + offset);
+
+	if (ctx->get_filter2)
+		writel(0, cspmu->base0 + PMEVFILT2R + offset);
+}
+
 static void nv_cspmu_set_cc_filter(struct arm_cspmu *cspmu,
 				   const struct perf_event *event)
 {
@@ -308,9 +385,103 @@ static u32 ucf_pmu_event_filter(const struct perf_event *event)
 	return ret;
 }
 
+static u32 pcie_v2_pmu_bdf_val_en(u32 filter)
+{
+	const u32 bdf_en = FIELD_GET(NV_PCIE_V2_FILTER_BDF_EN, filter);
+
+	/* Returns both BDF value and enable bit if BDF filtering is enabled. */
+	if (bdf_en)
+		return FIELD_GET(NV_PCIE_V2_FILTER_BDF_VAL_EN, filter);
+
+	/* Ignore the BDF value if BDF filter is not enabled. */
+	return 0;
+}
+
+static u32 pcie_v2_pmu_event_filter(const struct perf_event *event)
+{
+	u32 filter, lead_filter, lead_bdf;
+	struct perf_event *leader;
+	const struct nv_cspmu_ctx *ctx =
+		to_nv_cspmu_ctx(to_arm_cspmu(event->pmu));
+
+	filter = event->attr.config1 & ctx->filter_mask;
+	if (filter != 0)
+		return filter;
+
+	leader = event->group_leader;
+
+	/* Use leader's filter value if its BDF filtering is enabled. */
+	if (event != leader) {
+		lead_filter = pcie_v2_pmu_event_filter(leader);
+		lead_bdf = pcie_v2_pmu_bdf_val_en(lead_filter);
+		if (lead_bdf != 0)
+			return lead_filter;
+	}
+
+	/* Otherwise, return default filter value. */
+	return ctx->filter_default_val;
+}
+
+static int pcie_v2_pmu_validate_event(struct arm_cspmu *cspmu,
+				      struct perf_event *new_ev)
+{
+	/*
+	 * Make sure the events are using same BDF filter since the PCIE-SRC PMU
+	 * only supports one common BDF filter setting for all of the counters.
+	 */
+
+	int idx;
+	u32 new_filter, new_rp, new_bdf, new_lead_filter, new_lead_bdf;
+	struct perf_event *leader, *new_leader;
+
+	if (cspmu->impl.ops.is_cycle_counter_event(new_ev))
+		return 0;
+
+	new_leader = new_ev->group_leader;
+
+	new_filter = pcie_v2_pmu_event_filter(new_ev);
+	new_lead_filter = pcie_v2_pmu_event_filter(new_leader);
+
+	new_bdf = pcie_v2_pmu_bdf_val_en(new_filter);
+	new_lead_bdf = pcie_v2_pmu_bdf_val_en(new_lead_filter);
+
+	new_rp = FIELD_GET(NV_PCIE_V2_FILTER_PORT, new_filter);
+
+	if (new_rp != 0 && new_bdf != 0) {
+		dev_err(cspmu->dev,
+			"RP and BDF filtering are mutually exclusive\n");
+		return -EINVAL;
+	}
+
+	if (new_bdf != new_lead_bdf) {
+		dev_err(cspmu->dev,
+			"sibling and leader BDF value should be equal\n");
+		return -EINVAL;
+	}
+
+	/* Compare BDF filter on existing events. */
+	idx = find_first_bit(cspmu->hw_events.used_ctrs,
+			     cspmu->cycle_counter_logical_idx);
+
+	if (idx != cspmu->cycle_counter_logical_idx) {
+		leader = cspmu->hw_events.events[idx]->group_leader;
+
+		const u32 lead_filter = pcie_v2_pmu_event_filter(leader);
+		const u32 lead_bdf = pcie_v2_pmu_bdf_val_en(lead_filter);
+
+		if (new_lead_bdf != lead_bdf) {
+			dev_err(cspmu->dev, "only one BDF value is supported\n");
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 enum nv_cspmu_name_fmt {
 	NAME_FMT_GENERIC,
-	NAME_FMT_SOCKET
+	NAME_FMT_SOCKET,
+	NAME_FMT_SOCKET_INST
 };
 
 struct nv_cspmu_match {
@@ -430,6 +601,27 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
 		.init_data = NULL
 	  },
 	},
+	{
+	  .prodid = 0x10301000,
+	  .prodid_mask = NV_PRODID_MASK,
+	  .name_pattern = "nvidia_pcie_pmu_%u_rc_%u",
+	  .name_fmt = NAME_FMT_SOCKET_INST,
+	  .template_ctx = {
+		.event_attr = pcie_v2_pmu_event_attrs,
+		.format_attr = pcie_v2_pmu_format_attrs,
+		.filter_mask = NV_PCIE_V2_FILTER_ID_MASK,
+		.filter_default_val = NV_PCIE_V2_FILTER_DEFAULT,
+		.filter2_mask = NV_PCIE_V2_FILTER2_ID_MASK,
+		.filter2_default_val = NV_PCIE_V2_FILTER2_DEFAULT,
+		.get_filter = pcie_v2_pmu_event_filter,
+		.get_filter2 = nv_cspmu_event_filter2,
+		.init_data = NULL
+	  },
+	  .ops = {
+		.validate_event = pcie_v2_pmu_validate_event,
+		.reset_ev_filter = nv_cspmu_reset_ev_filter,
+	  }
+	},
 	{
 	  .prodid = 0,
 	  .prodid_mask = 0,
@@ -453,7 +645,7 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
 static char *nv_cspmu_format_name(const struct arm_cspmu *cspmu,
 				  const struct nv_cspmu_match *match)
 {
-	char *name;
+	char *name = NULL;
 	struct device *dev = cspmu->dev;
 
 	static atomic_t pmu_generic_idx = {0};
@@ -467,6 +659,16 @@ static char *nv_cspmu_format_name(const struct arm_cspmu *cspmu,
 				       socket);
 		break;
 	}
+	case NAME_FMT_SOCKET_INST: {
+		const int cpu = cpumask_first(&cspmu->associated_cpus);
+		const int socket = cpu_to_node(cpu);
+		u32 inst_id;
+
+		if (!nv_cspmu_get_inst_id(cspmu, &inst_id))
+			name = devm_kasprintf(dev, GFP_KERNEL,
+					match->name_pattern, socket, inst_id);
+		break;
+	}
 	case NAME_FMT_GENERIC:
 		name = devm_kasprintf(dev, GFP_KERNEL, match->name_pattern,
 				       atomic_fetch_inc(&pmu_generic_idx));
@@ -514,8 +716,10 @@ static int nv_cspmu_init_ops(struct arm_cspmu *cspmu)
 	cspmu->impl.ctx = ctx;
 
 	/* NVIDIA specific callbacks. */
+	SET_OP(validate_event, impl_ops, match, NULL);
 	SET_OP(set_cc_filter, impl_ops, match, nv_cspmu_set_cc_filter);
 	SET_OP(set_ev_filter, impl_ops, match, nv_cspmu_set_ev_filter);
+	SET_OP(reset_ev_filter, impl_ops, match, NULL);
 	SET_OP(get_event_attrs, impl_ops, match, nv_cspmu_get_event_attrs);
 	SET_OP(get_format_attrs, impl_ops, match, nv_cspmu_get_format_attrs);
 	SET_OP(get_name, impl_ops, match, nv_cspmu_get_name);
-- 
2.43.0
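
A minimal sketch, not part of the patch, of how a src_bdf value and the
resulting config1 word could be composed in userspace. It assumes the formula
stated in the documentation hunk, (bus << 8) + (device << 3) + (function), and
the config1 field positions listed in pcie_v2_pmu_format_attrs (src_rp_mask in
bits 0-7, src_bdf in bits 8-23, src_bdf_en in bit 24). The helper and macro
names below are hypothetical and exist only for illustration:

```c
#include <stdint.h>

/* Field positions taken from the format attributes in the patch above. */
#define SRC_RP_MASK_SHIFT 0   /* config1:0-7  */
#define SRC_BDF_SHIFT     8   /* config1:8-23 */
#define SRC_BDF_EN_SHIFT  24  /* config1:24   */

/* Hypothetical helper: pack bus/device/function per the documented formula. */
static inline uint16_t pcie_pmu_src_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
	/* device is a 5-bit field and function a 3-bit field */
	return (uint16_t)(((unsigned int)bus << 8) +
			  ((dev & 0x1fu) << 3) + (fn & 0x7u));
}

/*
 * Hypothetical helper: config1 for a BDF-filtered event. The RP mask is
 * left at zero because the RP and BDF filters are mutually exclusive.
 */
static inline uint32_t pcie_pmu_config1_bdf(uint16_t bdf)
{
	return ((uint32_t)bdf << SRC_BDF_SHIFT) | (1u << SRC_BDF_EN_SHIFT);
}

/* Hypothetical helper: config1 for an RP-mask-filtered event. */
static inline uint32_t pcie_pmu_config1_rp(uint8_t rp_mask)
{
	return (uint32_t)rp_mask << SRC_RP_MASK_SHIFT;
}
```

Under these assumptions, BDF 05:02.3 packs to 0x0513, and a BDF-filtered event
carries config1 = 0x1051300, which perf would build from
src_bdf=0x0513,src_bdf_en=0x1. An RP filter on ports 0 and 1 is simply
config1 = 0x3.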