From: Besar Wicaksono
Subject: [PATCH 7/8] perf: add NVIDIA Tegra410 C2C PMU
Date: Mon, 26 Jan 2026 18:11:54 +0000
Message-ID: <20260126181155.2776097-8-bwicaksono@nvidia.com>
In-Reply-To: <20260126181155.2776097-1-bwicaksono@nvidia.com>
References: <20260126181155.2776097-1-bwicaksono@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add NVIDIA C2C PMU support for the Tegra410 SoC.
Signed-off-by: Besar Wicaksono
---
 .../admin-guide/perf/nvidia-tegra410-pmu.rst |  151 +++
 drivers/perf/Kconfig                         |    7 +
 drivers/perf/Makefile                        |    1 +
 drivers/perf/nvidia_t410_c2c_pmu.c           | 1061 +++++++++++++++++
 4 files changed, 1220 insertions(+)
 create mode 100644 drivers/perf/nvidia_t410_c2c_pmu.c

diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
index 11fc1c88346a..f81f356debe1 100644
--- a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
@@ -9,6 +9,9 @@ metrics like memory bandwidth, latency, and utilization:
 * PCIE
 * PCIE-TGT
 * CPU Memory (CMEM) Latency
+* NVLink-C2C
+* NV-CLink
+* NV-DLink
 
 PMU Driver
 ----------
@@ -367,3 +370,151 @@ see /sys/bus/event_source/devices/nvidia_cmem_latency_pmu_.
 Example usage::
 
   perf stat -a -e '{nvidia_cmem_latency_pmu_0/rd_req/,nvidia_cmem_latency_pmu_0/rd_cum_outs/,nvidia_cmem_latency_pmu_0/cycles/}'
+
+NVLink-C2C PMU
+--------------
+
+This PMU monitors latency events of memory read/write requests that pass through
+the NVIDIA Chip-to-Chip (C2C) interface. Bandwidth events are not available
+in this PMU, unlike the C2C PMU in Grace (Tegra241 SoC).
+
+The events and configuration options of this PMU device are available in sysfs,
+see /sys/bus/event_source/devices/nvidia_nvlink_c2c_pmu_.
+
+The list of events:
+
+ * IN_RD_CUM_OUTS: accumulated outstanding requests (in cycles) of incoming read requests.
+ * IN_RD_REQ: the number of incoming read requests.
+ * IN_WR_CUM_OUTS: accumulated outstanding requests (in cycles) of incoming write requests.
+ * IN_WR_REQ: the number of incoming write requests.
+ * OUT_RD_CUM_OUTS: accumulated outstanding requests (in cycles) of outgoing read requests.
+ * OUT_RD_REQ: the number of outgoing read requests.
+ * OUT_WR_CUM_OUTS: accumulated outstanding requests (in cycles) of outgoing write requests.
+ * OUT_WR_REQ: the number of outgoing write requests.
+ * CYCLES: NVLink-C2C interface cycle counts.
+
+The incoming events count the reads/writes from a remote device to the SoC.
+The outgoing events count the reads/writes from the SoC to a remote device.
+
+The sysfs file /sys/bus/event_source/devices/nvidia_nvlink_c2c_pmu_/peer
+contains information about the connected device.
+
+When the C2C interface is connected to GPU(s), the user can use the
+"gpu_mask" parameter to filter traffic to/from specific GPU(s). Each bit
+represents a GPU index, e.g. "gpu_mask=0x1" corresponds to GPU 0 and
+"gpu_mask=0x3" to GPU 0 and GPU 1. If not specified, the PMU monitors all
+GPUs by default.
+
+When connected to another SoC, only the read events are available.
+
+The events can be used to calculate the average latency of the read/write requests::
+
+  C2C_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+
+  IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ
+  IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
+
+  IN_WR_AVG_LATENCY_IN_CYCLES = IN_WR_CUM_OUTS / IN_WR_REQ
+  IN_WR_AVG_LATENCY_IN_NS = IN_WR_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
+
+  OUT_RD_AVG_LATENCY_IN_CYCLES = OUT_RD_CUM_OUTS / OUT_RD_REQ
+  OUT_RD_AVG_LATENCY_IN_NS = OUT_RD_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
+
+  OUT_WR_AVG_LATENCY_IN_CYCLES = OUT_WR_CUM_OUTS / OUT_WR_REQ
+  OUT_WR_AVG_LATENCY_IN_NS = OUT_WR_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
+
+Example usage:
+
+ * Count incoming traffic from all GPUs connected via NVLink-C2C::
+
+     perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_req/
+
+ * Count incoming traffic from GPU 0 connected via NVLink-C2C::
+
+     perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_cum_outs,gpu_mask=0x1/
+
+ * Count incoming traffic from GPU 1 connected via NVLink-C2C::
+
+     perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_cum_outs,gpu_mask=0x2/
+
+ * Count outgoing traffic to all GPUs connected via NVLink-C2C::
+
+     perf stat -a -e nvidia_nvlink_c2c_pmu_0/out_rd_req/
+
+ * Count outgoing traffic to GPU 0 connected via NVLink-C2C::
+
+     perf stat -a -e nvidia_nvlink_c2c_pmu_0/out_rd_cum_outs,gpu_mask=0x1/
+
+ * Count outgoing traffic to GPU 1 connected via NVLink-C2C::
+
+     perf stat -a -e nvidia_nvlink_c2c_pmu_0/out_rd_cum_outs,gpu_mask=0x2/
+
+NV-CLink PMU
+------------
+
+This PMU monitors latency events of memory read requests that pass through
+the NV-CLink interface. Bandwidth events are not available in this PMU.
+In the Tegra410 SoC, the NV-CLink interface is used to connect to another
+Tegra410 SoC, and this PMU counts only read traffic.
+
+The events and configuration options of this PMU device are available in sysfs,
+see /sys/bus/event_source/devices/nvidia_nvclink_pmu_.
+
+The list of events:
+
+ * IN_RD_CUM_OUTS: accumulated outstanding requests (in cycles) of incoming read requests.
+ * IN_RD_REQ: the number of incoming read requests.
+ * OUT_RD_CUM_OUTS: accumulated outstanding requests (in cycles) of outgoing read requests.
+ * OUT_RD_REQ: the number of outgoing read requests.
+ * CYCLES: NV-CLink interface cycle counts.
+
+The incoming events count the reads from a remote device to the SoC.
+The outgoing events count the reads from the SoC to a remote device.
+
+The events can be used to calculate the average latency of the read requests::
+
+  CLINK_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+
+  IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ
+  IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / CLINK_FREQ_IN_GHZ
+
+  OUT_RD_AVG_LATENCY_IN_CYCLES = OUT_RD_CUM_OUTS / OUT_RD_REQ
+  OUT_RD_AVG_LATENCY_IN_NS = OUT_RD_AVG_LATENCY_IN_CYCLES / CLINK_FREQ_IN_GHZ
+
+Example usage:
+
+ * Count incoming read traffic from a remote SoC connected via NV-CLink::
+
+     perf stat -a -e nvidia_nvclink_pmu_0/in_rd_req/
+
+ * Count outgoing read traffic to a remote SoC connected via NV-CLink::
+
+     perf stat -a -e nvidia_nvclink_pmu_0/out_rd_req/
+
+NV-DLink PMU
+------------
+
+This PMU monitors latency events of memory read requests that pass through
+the NV-DLink interface. Bandwidth events are not available in this PMU.
+In the Tegra410 SoC, this PMU counts only CXL memory read traffic.
+
+The events and configuration options of this PMU device are available in sysfs,
+see /sys/bus/event_source/devices/nvidia_nvdlink_pmu_.
+
+The list of events:
+
+ * IN_RD_CUM_OUTS: accumulated outstanding read requests (in cycles) to CXL memory.
+ * IN_RD_REQ: the number of read requests to CXL memory.
+ * CYCLES: NV-DLink interface cycle counts.
+
+The events can be used to calculate the average latency of the read requests::
+
+  DLINK_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+
+  IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ
+  IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / DLINK_FREQ_IN_GHZ
+
+Example usage:
+
+ * Count read events to CXL memory::
+
+     perf stat -a -e '{nvidia_nvdlink_pmu_0/in_rd_req/,nvidia_nvdlink_pmu_0/in_rd_cum_outs/}'
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 9fed3c41d5ea..7ee36efe6bc0 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -318,4 +318,11 @@ config NVIDIA_TEGRA410_CMEM_LATENCY_PMU
 	  Enable perf support for CPU memory latency counters monitoring
 	  on NVIDIA Tegra410 SoC.
 
+config NVIDIA_TEGRA410_C2C_PMU
+	tristate "NVIDIA Tegra410 C2C PMU"
+	depends on ARM64 && ACPI
+	help
+	  Enable perf support for counters in the NVIDIA C2C interface of the
+	  NVIDIA Tegra410 SoC.
+
 endmenu
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index 4aa6aad393c2..eb8a022dad9a 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -36,3 +36,4 @@ obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += arm_cspmu/
 obj-$(CONFIG_MESON_DDR_PMU) += amlogic/
 obj-$(CONFIG_CXL_PMU) += cxl_pmu.o
 obj-$(CONFIG_NVIDIA_TEGRA410_CMEM_LATENCY_PMU) += nvidia_t410_cmem_latency_pmu.o
+obj-$(CONFIG_NVIDIA_TEGRA410_C2C_PMU) += nvidia_t410_c2c_pmu.o
diff --git a/drivers/perf/nvidia_t410_c2c_pmu.c b/drivers/perf/nvidia_t410_c2c_pmu.c
new file mode 100644
index 000000000000..362e0e5f8b24
--- /dev/null
+++ b/drivers/perf/nvidia_t410_c2c_pmu.c
@@ -0,0 +1,1061 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVIDIA Tegra410 C2C PMU driver.
+ *
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/* The C2C interface types in Tegra410. */
+#define C2C_TYPE_NVLINK		0x0
+#define C2C_TYPE_NVCLINK	0x1
+#define C2C_TYPE_NVDLINK	0x2
+#define C2C_TYPE_COUNT		0x3
+
+/* The type of the peer device connected to the C2C interface. */
+#define C2C_PEER_TYPE_CPU	0x0
+#define C2C_PEER_TYPE_GPU	0x1
+#define C2C_PEER_TYPE_CXLMEM	0x2
+#define C2C_PEER_TYPE_COUNT	0x3
+
+/* The number of peer devices that can be connected to the C2C interface. */
+#define C2C_NR_PEER_CPU		0x1
+#define C2C_NR_PEER_GPU		0x2
+#define C2C_NR_PEER_CXLMEM	0x1
+#define C2C_NR_PEER_MAX		0x2
+
+/* Number of instances on each interface. */
+#define C2C_NR_INST_NVLINK	14
+#define C2C_NR_INST_NVCLINK	12
+#define C2C_NR_INST_NVDLINK	16
+#define C2C_NR_INST_MAX		16
+
+/* Register offsets. */
+#define C2C_CTRL			0x864
+#define C2C_IN_STATUS			0x868
+#define C2C_CYCLE_CNTR			0x86c
+#define C2C_IN_RD_CUM_OUTS_CNTR		0x874
+#define C2C_IN_RD_REQ_CNTR		0x87c
+#define C2C_IN_WR_CUM_OUTS_CNTR		0x884
+#define C2C_IN_WR_REQ_CNTR		0x88c
+#define C2C_OUT_STATUS			0x890
+#define C2C_OUT_RD_CUM_OUTS_CNTR	0x898
+#define C2C_OUT_RD_REQ_CNTR		0x8a0
+#define C2C_OUT_WR_CUM_OUTS_CNTR	0x8a8
+#define C2C_OUT_WR_REQ_CNTR		0x8b0
+
+/* C2C_IN_STATUS register fields. */
+#define C2C_IN_STATUS_CYCLE_OVF			BIT(0)
+#define C2C_IN_STATUS_IN_RD_CUM_OUTS_OVF	BIT(1)
+#define C2C_IN_STATUS_IN_RD_REQ_OVF		BIT(2)
+#define C2C_IN_STATUS_IN_WR_CUM_OUTS_OVF	BIT(3)
+#define C2C_IN_STATUS_IN_WR_REQ_OVF		BIT(4)
+
+/* C2C_OUT_STATUS register fields. */
+#define C2C_OUT_STATUS_OUT_RD_CUM_OUTS_OVF	BIT(0)
+#define C2C_OUT_STATUS_OUT_RD_REQ_OVF		BIT(1)
+#define C2C_OUT_STATUS_OUT_WR_CUM_OUTS_OVF	BIT(2)
+#define C2C_OUT_STATUS_OUT_WR_REQ_OVF		BIT(3)
+
+/* Events. */
+#define C2C_EVENT_CYCLES		0x0
+#define C2C_EVENT_IN_RD_CUM_OUTS	0x1
+#define C2C_EVENT_IN_RD_REQ		0x2
+#define C2C_EVENT_IN_WR_CUM_OUTS	0x3
+#define C2C_EVENT_IN_WR_REQ		0x4
+#define C2C_EVENT_OUT_RD_CUM_OUTS	0x5
+#define C2C_EVENT_OUT_RD_REQ		0x6
+#define C2C_EVENT_OUT_WR_CUM_OUTS	0x7
+#define C2C_EVENT_OUT_WR_REQ		0x8
+
+#define C2C_NUM_EVENTS			0x9
+#define C2C_MASK_EVENT			0xFF
+#define C2C_MAX_ACTIVE_EVENTS		32
+
+#define C2C_ACTIVE_CPU_MASK		0x0
+#define C2C_ASSOCIATED_CPU_MASK		0x1
+
+/*
+ * Maximum poll count for reading a counter value using the
+ * high-low-high sequence.
+ */
+#define HILOHI_MAX_POLL			1000
+
+static unsigned long nv_c2c_pmu_cpuhp_state;
+
+/* PMU descriptor. */
+
+/* Tracks the events assigned to the PMU for a given logical index. */
+struct nv_c2c_pmu_hw_events {
+	/* The events that are active. */
+	struct perf_event *events[C2C_MAX_ACTIVE_EVENTS];
+
+	/*
+	 * Each bit indicates a logical counter is being used (or not) for an
+	 * event.
+	 */
+	DECLARE_BITMAP(used_ctrs, C2C_MAX_ACTIVE_EVENTS);
+};
+
+struct nv_c2c_pmu {
+	struct pmu pmu;
+	struct device *dev;
+	struct acpi_device *acpi_dev;
+
+	const char *name;
+	const char *identifier;
+
+	unsigned int c2c_type;
+	unsigned int peer_type;
+	unsigned int socket;
+	unsigned int nr_inst;
+	unsigned int nr_peer;
+	unsigned long peer_insts[C2C_NR_PEER_MAX][BITS_TO_LONGS(C2C_NR_INST_MAX)];
+	u32 filter_default;
+
+	struct nv_c2c_pmu_hw_events hw_events;
+
+	cpumask_t associated_cpus;
+	cpumask_t active_cpu;
+
+	struct hlist_node cpuhp_node;
+
+	struct attribute **formats;
+	const struct attribute_group *attr_groups[6];
+
+	void __iomem *base_broadcast;
+	void __iomem *base[C2C_NR_INST_MAX];
+};
+
+#define to_c2c_pmu(p) (container_of(p, struct nv_c2c_pmu, pmu))
+
+/* Get the event type from a perf_event. */
+static inline u32 get_event_type(struct perf_event *event)
+{
+	return (event->attr.config) & C2C_MASK_EVENT;
+}
+
+static inline u32 get_filter_mask(struct perf_event *event)
+{
+	u32 filter;
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
+
+	filter = ((u32)event->attr.config1) & c2c_pmu->filter_default;
+	if (filter == 0)
+		filter = c2c_pmu->filter_default;
+
+	return filter;
+}
+
+/* PMU operations. */
+
+static int nv_c2c_pmu_get_event_idx(struct nv_c2c_pmu_hw_events *hw_events,
+				    struct perf_event *event)
+{
+	u32 idx;
+
+	idx = find_first_zero_bit(hw_events->used_ctrs, C2C_MAX_ACTIVE_EVENTS);
+	if (idx >= C2C_MAX_ACTIVE_EVENTS)
+		return -EAGAIN;
+
+	set_bit(idx, hw_events->used_ctrs);
+
+	return idx;
+}
+
+static bool
+nv_c2c_pmu_validate_event(struct pmu *pmu,
+			  struct nv_c2c_pmu_hw_events *hw_events,
+			  struct perf_event *event)
+{
+	if (is_software_event(event))
+		return true;
+
+	/* Reject groups spanning multiple HW PMUs. */
+	if (event->pmu != pmu)
+		return false;
+
+	return nv_c2c_pmu_get_event_idx(hw_events, event) >= 0;
+}
+
+/*
+ * Make sure the group of events can be scheduled at once
+ * on the PMU.
+ */
+static bool nv_c2c_pmu_validate_group(struct perf_event *event)
+{
+	struct perf_event *sibling, *leader = event->group_leader;
+	struct nv_c2c_pmu_hw_events fake_hw_events;
+
+	if (event->group_leader == event)
+		return true;
+
+	memset(&fake_hw_events, 0, sizeof(fake_hw_events));
+
+	if (!nv_c2c_pmu_validate_event(event->pmu, &fake_hw_events, leader))
+		return false;
+
+	for_each_sibling_event(sibling, leader) {
+		if (!nv_c2c_pmu_validate_event(event->pmu, &fake_hw_events,
+					       sibling))
+			return false;
+	}
+
+	return nv_c2c_pmu_validate_event(event->pmu, &fake_hw_events, event);
+}
+
+static int nv_c2c_pmu_event_init(struct perf_event *event)
+{
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+	u32 event_type = get_event_type(event);
+
+	if (event->attr.type != event->pmu->type ||
+	    event_type >= C2C_NUM_EVENTS)
+		return -ENOENT;
+
+	/*
+	 * Following other "uncore" PMUs, we do not support sampling mode or
+	 * attaching to a task (per-process mode).
+	 */
+	if (is_sampling_event(event)) {
+		dev_dbg(c2c_pmu->pmu.dev, "Can't support sampling events\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
+		dev_dbg(c2c_pmu->pmu.dev, "Can't support per-task counters\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Make sure the CPU assignment is on one of the CPUs associated with
+	 * this PMU.
+	 */
+	if (!cpumask_test_cpu(event->cpu, &c2c_pmu->associated_cpus)) {
+		dev_dbg(c2c_pmu->pmu.dev,
+			"Requested cpu is not associated with the PMU\n");
+		return -EINVAL;
+	}
+
+	/* Enforce the current active CPU to handle the events in this PMU. */
+	event->cpu = cpumask_first(&c2c_pmu->active_cpu);
+	if (event->cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	if (!nv_c2c_pmu_validate_group(event))
+		return -EINVAL;
+
+	hwc->idx = -1;
+	hwc->config = event_type;
+
+	return 0;
+}
+
+/*
+ * Read a 64-bit register as a pair of 32-bit registers using a hi-lo-hi
+ * sequence.
+ */
+static u64 read_reg64_hilohi(const void __iomem *addr, u32 max_poll_count)
+{
+	u32 val_lo, val_hi;
+	u64 val;
+
+	/* Use a high-low-high sequence to avoid tearing. */
+	do {
+		if (max_poll_count-- == 0) {
+			pr_err("NV C2C PMU: timeout in high-low-high sequence\n");
+			return 0;
+		}
+
+		val_hi = readl(addr + 4);
+		val_lo = readl(addr);
+	} while (val_hi != readl(addr + 4));
+
+	val = (((u64)val_hi << 32) | val_lo);
+
+	return val;
+}
+
+static void nv_c2c_pmu_check_status(struct nv_c2c_pmu *c2c_pmu, u32 instance)
+{
+	u32 in_status, out_status;
+
+	in_status = readl(c2c_pmu->base[instance] + C2C_IN_STATUS);
+	out_status = readl(c2c_pmu->base[instance] + C2C_OUT_STATUS);
+
+	if (in_status || out_status)
+		dev_warn(c2c_pmu->dev,
+			 "C2C PMU overflow in: 0x%x, out: 0x%x\n",
+			 in_status, out_status);
+}
+
+static u32 nv_c2c_ctr_offset[C2C_NUM_EVENTS] = {
+	[C2C_EVENT_CYCLES]		= C2C_CYCLE_CNTR,
+	[C2C_EVENT_IN_RD_CUM_OUTS]	= C2C_IN_RD_CUM_OUTS_CNTR,
+	[C2C_EVENT_IN_RD_REQ]		= C2C_IN_RD_REQ_CNTR,
+	[C2C_EVENT_IN_WR_CUM_OUTS]	= C2C_IN_WR_CUM_OUTS_CNTR,
+	[C2C_EVENT_IN_WR_REQ]		= C2C_IN_WR_REQ_CNTR,
+	[C2C_EVENT_OUT_RD_CUM_OUTS]	= C2C_OUT_RD_CUM_OUTS_CNTR,
+	[C2C_EVENT_OUT_RD_REQ]		= C2C_OUT_RD_REQ_CNTR,
+	[C2C_EVENT_OUT_WR_CUM_OUTS]	= C2C_OUT_WR_CUM_OUTS_CNTR,
+	[C2C_EVENT_OUT_WR_REQ]		= C2C_OUT_WR_REQ_CNTR,
+};
+
+static u64 nv_c2c_pmu_read_counter(struct perf_event *event)
+{
+	u32 ctr_id, ctr_offset, filter_mask, filter_idx, inst_idx;
+	unsigned long *inst_mask;
+	DECLARE_BITMAP(filter_bitmap, C2C_NR_PEER_MAX);
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
+	u64 val = 0;
+
+	filter_mask = get_filter_mask(event);
+	bitmap_from_arr32(filter_bitmap, &filter_mask, c2c_pmu->nr_peer);
+
+	ctr_id = event->hw.config;
+	ctr_offset = nv_c2c_ctr_offset[ctr_id];
+
+	for_each_set_bit(filter_idx, filter_bitmap, c2c_pmu->nr_peer) {
+		inst_mask = c2c_pmu->peer_insts[filter_idx];
+		for_each_set_bit(inst_idx, inst_mask, c2c_pmu->nr_inst) {
+			nv_c2c_pmu_check_status(c2c_pmu, inst_idx);
+
+			/*
+			 * The instances share the same clock and the driver
+			 * always enables all instances, so the counts from
+			 * one instance suffice for the cycle counter.
+			 */
+			if (ctr_id == C2C_EVENT_CYCLES)
+				return read_reg64_hilohi(
+					c2c_pmu->base[inst_idx] + ctr_offset,
+					HILOHI_MAX_POLL);
+
+			/*
+			 * For other events, sum up the counts from all
+			 * instances.
+			 */
+			val += read_reg64_hilohi(
+				c2c_pmu->base[inst_idx] + ctr_offset,
+				HILOHI_MAX_POLL);
+		}
+	}
+
+	return val;
+}
+
+static void nv_c2c_pmu_event_update(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	u64 prev, now;
+
+	do {
+		prev = local64_read(&hwc->prev_count);
+		now = nv_c2c_pmu_read_counter(event);
+	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
+
+	local64_add(now - prev, &event->count);
+}
+
+static void nv_c2c_pmu_start(struct perf_event *event, int pmu_flags)
+{
+	event->hw.state = 0;
+}
+
+static void nv_c2c_pmu_stop(struct perf_event *event, int pmu_flags)
+{
+	event->hw.state |= PERF_HES_STOPPED;
+}
+
+static int nv_c2c_pmu_add(struct perf_event *event, int flags)
+{
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
+	struct nv_c2c_pmu_hw_events *hw_events = &c2c_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx;
+
+	if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &c2c_pmu->associated_cpus)))
+		return -ENOENT;
+
+	idx = nv_c2c_pmu_get_event_idx(hw_events, event);
+	if (idx < 0)
+		return idx;
+
+	hw_events->events[idx] = event;
+	hwc->idx = idx;
+	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
+	if (flags & PERF_EF_START)
+		nv_c2c_pmu_start(event, PERF_EF_RELOAD);
+
+	/* Propagate changes to the userspace mapping. */
+	perf_event_update_userpage(event);
+
+	return 0;
+}
+
+static void nv_c2c_pmu_del(struct perf_event *event, int flags)
+{
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
+	struct nv_c2c_pmu_hw_events *hw_events = &c2c_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	nv_c2c_pmu_stop(event, PERF_EF_UPDATE);
+
+	hw_events->events[idx] = NULL;
+
+	clear_bit(idx, hw_events->used_ctrs);
+
+	perf_event_update_userpage(event);
+}
+
+static void nv_c2c_pmu_read(struct perf_event *event)
+{
+	nv_c2c_pmu_event_update(event);
+}
+
+static void nv_c2c_pmu_enable(struct pmu *pmu)
+{
+	void __iomem *bcast;
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(pmu);
+
+	/* Skip if no events are in use. */
+	if (bitmap_empty(c2c_pmu->hw_events.used_ctrs, C2C_MAX_ACTIVE_EVENTS))
+		return;
+
+	/* Enable all the counters. */
+	bcast = c2c_pmu->base_broadcast;
+	writel(0x1UL, bcast + C2C_CTRL);
+}
+
+static void nv_c2c_pmu_disable(struct pmu *pmu)
+{
+	unsigned int idx;
+	void __iomem *bcast;
+	struct perf_event *event;
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(pmu);
+
+	/* Disable all the counters. */
+	bcast = c2c_pmu->base_broadcast;
+	writel(0x0UL, bcast + C2C_CTRL);
+
+	/*
+	 * The counters will start from 0 again on restart.
+	 * Update the events immediately to avoid losing the counts.
+	 */
+	for_each_set_bit(idx, c2c_pmu->hw_events.used_ctrs,
+			 C2C_MAX_ACTIVE_EVENTS) {
+		event = c2c_pmu->hw_events.events[idx];
+
+		if (!event)
+			continue;
+
+		nv_c2c_pmu_event_update(event);
+
+		local64_set(&event->hw.prev_count, 0ULL);
+	}
+}
+
+/* PMU identifier attribute. */
+
+static ssize_t nv_c2c_pmu_identifier_show(struct device *dev,
+					  struct device_attribute *attr,
+					  char *page)
+{
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(dev_get_drvdata(dev));
+
+	return sysfs_emit(page, "%s\n", c2c_pmu->identifier);
+}
+
+static struct device_attribute nv_c2c_pmu_identifier_attr =
+	__ATTR(identifier, 0444, nv_c2c_pmu_identifier_show, NULL);
+
+static struct attribute *nv_c2c_pmu_identifier_attrs[] = {
+	&nv_c2c_pmu_identifier_attr.attr,
+	NULL,
+};
+
+static struct attribute_group nv_c2c_pmu_identifier_attr_group = {
+	.attrs = nv_c2c_pmu_identifier_attrs,
+};
+
+/* Peer attribute. */
+
+static ssize_t nv_c2c_pmu_peer_show(struct device *dev,
+				    struct device_attribute *attr,
+				    char *page)
+{
+	const char *peer_type[C2C_PEER_TYPE_COUNT] = {
+		[C2C_PEER_TYPE_CPU]	= "cpu",
+		[C2C_PEER_TYPE_GPU]	= "gpu",
+		[C2C_PEER_TYPE_CXLMEM]	= "cxlmem",
+	};
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(dev_get_drvdata(dev));
+
+	return sysfs_emit(page, "nr_%s=%u\n", peer_type[c2c_pmu->peer_type],
+			  c2c_pmu->nr_peer);
+}
+
+static struct device_attribute nv_c2c_pmu_peer_attr =
+	__ATTR(peer, 0444, nv_c2c_pmu_peer_show, NULL);
+
+static struct attribute *nv_c2c_pmu_peer_attrs[] = {
+	&nv_c2c_pmu_peer_attr.attr,
+	NULL,
+};
+
+static struct attribute_group nv_c2c_pmu_peer_attr_group = {
+	.attrs = nv_c2c_pmu_peer_attrs,
+};
+
+/* Format attributes. */
+
+#define NV_C2C_PMU_EXT_ATTR(_name, _func, _config)			\
+	(&((struct dev_ext_attribute[]){				\
+		{							\
+			.attr = __ATTR(_name, 0444, _func, NULL),	\
+			.var = (void *)_config				\
+		}							\
+	})[0].attr.attr)
+
+#define NV_C2C_PMU_FORMAT_ATTR(_name, _config)				\
+	NV_C2C_PMU_EXT_ATTR(_name, device_show_string, _config)
+
+#define NV_C2C_PMU_FORMAT_EVENT_ATTR					\
+	NV_C2C_PMU_FORMAT_ATTR(event, "config:0-3")
+
+static struct attribute *nv_c2c_nvlink_pmu_formats[] = {
+	NV_C2C_PMU_FORMAT_EVENT_ATTR,
+	NV_C2C_PMU_FORMAT_ATTR(gpu_mask, "config1:0-1"),
+	NULL,
+};
+
+static struct attribute *nv_c2c_pmu_formats[] = {
+	NV_C2C_PMU_FORMAT_EVENT_ATTR,
+	NULL,
+};
+
+static struct attribute_group *
+nv_c2c_pmu_alloc_format_attr_group(struct nv_c2c_pmu *c2c_pmu)
+{
+	struct attribute_group *format_group;
+	struct device *dev = c2c_pmu->dev;
+
+	format_group =
+		devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
+	if (!format_group)
+		return NULL;
+
+	format_group->name = "format";
+	format_group->attrs = c2c_pmu->formats;
+
+	return format_group;
+}
+
+/* Event attributes. */
+
+static ssize_t nv_c2c_pmu_sysfs_event_show(struct device *dev,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	struct perf_pmu_events_attr *pmu_attr;
+
+	pmu_attr = container_of(attr, typeof(*pmu_attr), attr);
+	return sysfs_emit(buf, "event=0x%llx\n", pmu_attr->id);
+}
+
+#define NV_C2C_PMU_EVENT_ATTR(_name, _config)				\
+	PMU_EVENT_ATTR_ID(_name, nv_c2c_pmu_sysfs_event_show, _config)
+
+static struct attribute *nv_c2c_pmu_events[] = {
+	NV_C2C_PMU_EVENT_ATTR(cycles, C2C_EVENT_CYCLES),
+	NV_C2C_PMU_EVENT_ATTR(in_rd_cum_outs, C2C_EVENT_IN_RD_CUM_OUTS),
+	NV_C2C_PMU_EVENT_ATTR(in_rd_req, C2C_EVENT_IN_RD_REQ),
+	NV_C2C_PMU_EVENT_ATTR(in_wr_cum_outs, C2C_EVENT_IN_WR_CUM_OUTS),
+	NV_C2C_PMU_EVENT_ATTR(in_wr_req, C2C_EVENT_IN_WR_REQ),
+	NV_C2C_PMU_EVENT_ATTR(out_rd_cum_outs, C2C_EVENT_OUT_RD_CUM_OUTS),
+	NV_C2C_PMU_EVENT_ATTR(out_rd_req, C2C_EVENT_OUT_RD_REQ),
+	NV_C2C_PMU_EVENT_ATTR(out_wr_cum_outs, C2C_EVENT_OUT_WR_CUM_OUTS),
+	NV_C2C_PMU_EVENT_ATTR(out_wr_req, C2C_EVENT_OUT_WR_REQ),
+	NULL
+};
+
+static umode_t
+nv_c2c_pmu_event_attr_is_visible(struct kobject *kobj, struct attribute *attr,
+				 int unused)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(dev_get_drvdata(dev));
+	struct perf_pmu_events_attr *eattr;
+
+	eattr = container_of(attr, typeof(*eattr), attr.attr);
+
+	if (c2c_pmu->c2c_type == C2C_TYPE_NVDLINK) {
+		/* Only incoming reads are available. */
+		switch (eattr->id) {
+		case C2C_EVENT_IN_WR_CUM_OUTS:
+		case C2C_EVENT_IN_WR_REQ:
+		case C2C_EVENT_OUT_RD_CUM_OUTS:
+		case C2C_EVENT_OUT_RD_REQ:
+		case C2C_EVENT_OUT_WR_CUM_OUTS:
+		case C2C_EVENT_OUT_WR_REQ:
+			return 0;
+		default:
+			return attr->mode;
+		}
+	} else {
+		/* Hide the write events if C2C is connected to another SoC. */
+		if (c2c_pmu->peer_type == C2C_PEER_TYPE_CPU) {
+			switch (eattr->id) {
+			case C2C_EVENT_IN_WR_CUM_OUTS:
+			case C2C_EVENT_IN_WR_REQ:
+			case C2C_EVENT_OUT_WR_CUM_OUTS:
+			case C2C_EVENT_OUT_WR_REQ:
+				return 0;
+			default:
+				return attr->mode;
+			}
+		}
+	}
+
+	return attr->mode;
+}
+
+static const struct attribute_group nv_c2c_pmu_events_group = {
+	.name = "events",
+	.attrs = nv_c2c_pmu_events,
+	.is_visible = nv_c2c_pmu_event_attr_is_visible,
+};
+
+/* Cpumask attributes. */
+
+static ssize_t nv_c2c_pmu_cpumask_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(pmu);
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	unsigned long mask_id = (unsigned long)eattr->var;
+	const cpumask_t *cpumask;
+
+	switch (mask_id) {
+	case C2C_ACTIVE_CPU_MASK:
+		cpumask = &c2c_pmu->active_cpu;
+		break;
+	case C2C_ASSOCIATED_CPU_MASK:
+		cpumask = &c2c_pmu->associated_cpus;
+		break;
+	default:
+		return 0;
+	}
+	return cpumap_print_to_pagebuf(true, buf, cpumask);
+}
+
+#define NV_C2C_PMU_CPUMASK_ATTR(_name, _config)				\
+	NV_C2C_PMU_EXT_ATTR(_name, nv_c2c_pmu_cpumask_show,		\
+			    (unsigned long)_config)
+
+static struct attribute *nv_c2c_pmu_cpumask_attrs[] = {
+	NV_C2C_PMU_CPUMASK_ATTR(cpumask, C2C_ACTIVE_CPU_MASK),
+	NV_C2C_PMU_CPUMASK_ATTR(associated_cpus, C2C_ASSOCIATED_CPU_MASK),
+	NULL,
+};
+
+static const struct attribute_group nv_c2c_pmu_cpumask_attr_group = {
+	.attrs = nv_c2c_pmu_cpumask_attrs,
+};
+
+/* Per PMU device attribute groups.
*/ + +static int nv_c2c_pmu_alloc_attr_groups(struct nv_c2c_pmu *c2c_pmu) +{ + const struct attribute_group **attr_groups =3D c2c_pmu->attr_groups; + + attr_groups[0] =3D nv_c2c_pmu_alloc_format_attr_group(c2c_pmu); + attr_groups[1] =3D &nv_c2c_pmu_events_group; + attr_groups[2] =3D &nv_c2c_pmu_cpumask_attr_group; + attr_groups[3] =3D &nv_c2c_pmu_identifier_attr_group; + attr_groups[4] =3D &nv_c2c_pmu_peer_attr_group; + + if (!attr_groups[0]) + return -ENOMEM; + + return 0; +} + +static int nv_c2c_pmu_online_cpu(unsigned int cpu, struct hlist_node *node) +{ + struct nv_c2c_pmu *c2c_pmu =3D + hlist_entry_safe(node, struct nv_c2c_pmu, cpuhp_node); + + if (!cpumask_test_cpu(cpu, &c2c_pmu->associated_cpus)) + return 0; + + /* If the PMU is already managed, there is nothing to do */ + if (!cpumask_empty(&c2c_pmu->active_cpu)) + return 0; + + /* Use this CPU for event counting */ + cpumask_set_cpu(cpu, &c2c_pmu->active_cpu); + + return 0; +} + +static int nv_c2c_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *no= de) +{ + unsigned int dst; + + struct nv_c2c_pmu *c2c_pmu =3D + hlist_entry_safe(node, struct nv_c2c_pmu, cpuhp_node); + + /* Nothing to do if this CPU doesn't own the PMU */ + if (!cpumask_test_and_clear_cpu(cpu, &c2c_pmu->active_cpu)) + return 0; + + /* Choose a new CPU to migrate ownership of the PMU to */ + dst =3D cpumask_any_and_but(&c2c_pmu->associated_cpus, + cpu_online_mask, cpu); + if (dst >=3D nr_cpu_ids) + return 0; + + /* Use this CPU for event counting */ + perf_pmu_migrate_context(&c2c_pmu->pmu, cpu, dst); + cpumask_set_cpu(dst, &c2c_pmu->active_cpu); + + return 0; +} + +static int nv_c2c_pmu_get_cpus(struct nv_c2c_pmu *c2c_pmu) +{ + int ret =3D 0, socket =3D c2c_pmu->socket, cpu; + + for_each_possible_cpu(cpu) { + if (cpu_to_node(cpu) =3D=3D socket) + cpumask_set_cpu(cpu, &c2c_pmu->associated_cpus); + } + + if (cpumask_empty(&c2c_pmu->associated_cpus)) { + dev_dbg(c2c_pmu->dev, + "No cpu associated with C2C PMU socket-%u\n", socket); + ret 
=3D -ENODEV; + } + + return ret; +} + +static int nv_c2c_pmu_init_socket(struct nv_c2c_pmu *c2c_pmu) +{ + const char *uid_str; + int ret, socket; + + uid_str =3D acpi_device_uid(c2c_pmu->acpi_dev); + if (!uid_str) { + ret =3D -ENODEV; + goto fail; + } + + ret =3D kstrtou32(uid_str, 0, &socket); + if (ret) + goto fail; + + c2c_pmu->socket =3D socket; + return 0; + +fail: + dev_err(c2c_pmu->dev, "Failed to initialize socket\n"); + return ret; +} + +static int nv_c2c_pmu_init_id(struct nv_c2c_pmu *c2c_pmu) +{ + const char *name_fmt[C2C_TYPE_COUNT] =3D { + [C2C_TYPE_NVLINK] =3D "nvidia_nvlink_c2c_pmu_%u", + [C2C_TYPE_NVCLINK] =3D "nvidia_nvclink_pmu_%u", + [C2C_TYPE_NVDLINK] =3D "nvidia_nvdlink_pmu_%u", + }; + + char *name; + int ret; + + name =3D devm_kasprintf(c2c_pmu->dev, GFP_KERNEL, + name_fmt[c2c_pmu->c2c_type], c2c_pmu->socket); + if (!name) { + ret =3D -ENOMEM; + goto fail; + } + + c2c_pmu->name =3D name; + + c2c_pmu->identifier =3D acpi_device_hid(c2c_pmu->acpi_dev); + + return 0; + +fail: + dev_err(c2c_pmu->dev, "Failed to initialize name\n"); + return ret; +} + +static int nv_c2c_pmu_init_filter(struct nv_c2c_pmu *c2c_pmu) +{ + u32 cpu_en =3D 0; + struct device *dev =3D c2c_pmu->dev; + + if (c2c_pmu->c2c_type =3D=3D C2C_TYPE_NVDLINK) { + c2c_pmu->peer_type =3D C2C_PEER_TYPE_CXLMEM; + + c2c_pmu->nr_inst =3D C2C_NR_INST_NVDLINK; + c2c_pmu->peer_insts[0][0] =3D (1UL << c2c_pmu->nr_inst) - 1; + + c2c_pmu->nr_peer =3D C2C_NR_PEER_CXLMEM; + c2c_pmu->filter_default =3D (1 << c2c_pmu->nr_peer) - 1; + + c2c_pmu->formats =3D nv_c2c_pmu_formats; + + return 0; + } + + c2c_pmu->nr_inst =3D (c2c_pmu->c2c_type =3D=3D C2C_TYPE_NVLINK) ? + C2C_NR_INST_NVLINK : C2C_NR_INST_NVCLINK; + + if (device_property_read_u32(dev, "cpu_en_mask", &cpu_en)) + dev_dbg(dev, "no cpu_en_mask property\n"); + + if (cpu_en) { + c2c_pmu->peer_type =3D C2C_PEER_TYPE_CPU; + + /* Fill peer_insts bitmap with instances connected to peer CPU. 
*/ + bitmap_from_arr32(c2c_pmu->peer_insts[0], &cpu_en, + c2c_pmu->nr_inst); + + c2c_pmu->nr_peer =3D 1; + c2c_pmu->formats =3D nv_c2c_pmu_formats; + } else { + u32 i; + u32 gpu_en =3D 0; + const char *props[C2C_NR_PEER_MAX] =3D { + "gpu0_en_mask", "gpu1_en_mask" + }; + + for (i =3D 0; i < C2C_NR_PEER_MAX; i++) { + if (device_property_read_u32(dev, props[i], &gpu_en)) + dev_dbg(dev, "no %s property\n", props[i]); + + if (gpu_en) { + /* Fill peer_insts bitmap with instances connected to peer GPU. */ + bitmap_from_arr32(c2c_pmu->peer_insts[i], &gpu_en, + c2c_pmu->nr_inst); + + c2c_pmu->nr_peer++; + } + } + + if (c2c_pmu->nr_peer =3D=3D 0) { + dev_err(dev, "No GPU is enabled\n"); + return -EINVAL; + } + + c2c_pmu->peer_type =3D C2C_PEER_TYPE_GPU; + c2c_pmu->formats =3D nv_c2c_nvlink_pmu_formats; + } + + c2c_pmu->filter_default =3D (1 << c2c_pmu->nr_peer) - 1; + + return 0; +} + +static void *nv_c2c_pmu_init_pmu(struct platform_device *pdev) +{ + int ret; + struct nv_c2c_pmu *c2c_pmu; + struct acpi_device *acpi_dev; + struct device *dev =3D &pdev->dev; + + acpi_dev =3D ACPI_COMPANION(dev); + if (!acpi_dev) + return ERR_PTR(-ENODEV); + + c2c_pmu =3D devm_kzalloc(dev, sizeof(*c2c_pmu), GFP_KERNEL); + if (!c2c_pmu) + return ERR_PTR(-ENOMEM); + + c2c_pmu->dev =3D dev; + c2c_pmu->acpi_dev =3D acpi_dev; + c2c_pmu->c2c_type =3D (unsigned int)(unsigned long)device_get_match_data(= dev); + platform_set_drvdata(pdev, c2c_pmu); + + ret =3D nv_c2c_pmu_init_socket(c2c_pmu); + if (ret) + goto done; + + ret =3D nv_c2c_pmu_init_id(c2c_pmu); + if (ret) + goto done; + + ret =3D nv_c2c_pmu_init_filter(c2c_pmu); + if (ret) + goto done; + +done: + if (ret) + return ERR_PTR(ret); + + return c2c_pmu; +} + +static int nv_c2c_pmu_init_mmio(struct nv_c2c_pmu *c2c_pmu) +{ + int i; + struct device *dev =3D c2c_pmu->dev; + struct platform_device *pdev =3D to_platform_device(dev); + + /* Map the address of all the instances. 
*/ + for (i =3D 0; i < c2c_pmu->nr_inst; i++) { + c2c_pmu->base[i] =3D devm_platform_ioremap_resource(pdev, i); + if (IS_ERR(c2c_pmu->base[i])) { + dev_err(dev, "Failed map address for instance %d\n", i); + return PTR_ERR(c2c_pmu->base[i]); + } + } + + /* Map broadcast address. */ + c2c_pmu->base_broadcast =3D devm_platform_ioremap_resource(pdev, + c2c_pmu->nr_inst); + if (IS_ERR(c2c_pmu->base_broadcast)) { + dev_err(dev, "Failed map broadcast address\n"); + return PTR_ERR(c2c_pmu->base_broadcast); + } + + return 0; +} + +static int nv_c2c_pmu_register_pmu(struct nv_c2c_pmu *c2c_pmu) +{ + int ret; + + ret =3D cpuhp_state_add_instance(nv_c2c_pmu_cpuhp_state, + &c2c_pmu->cpuhp_node); + if (ret) { + dev_err(c2c_pmu->dev, "Error %d registering hotplug\n", ret); + return ret; + } + + c2c_pmu->pmu =3D (struct pmu) { + .parent =3D c2c_pmu->dev, + .task_ctx_nr =3D perf_invalid_context, + .pmu_enable =3D nv_c2c_pmu_enable, + .pmu_disable =3D nv_c2c_pmu_disable, + .event_init =3D nv_c2c_pmu_event_init, + .add =3D nv_c2c_pmu_add, + .del =3D nv_c2c_pmu_del, + .start =3D nv_c2c_pmu_start, + .stop =3D nv_c2c_pmu_stop, + .read =3D nv_c2c_pmu_read, + .attr_groups =3D c2c_pmu->attr_groups, + .capabilities =3D PERF_PMU_CAP_NO_EXCLUDE | + PERF_PMU_CAP_NO_INTERRUPT, + }; + + ret =3D perf_pmu_register(&c2c_pmu->pmu, c2c_pmu->name, -1); + if (ret) { + dev_err(c2c_pmu->dev, "Failed to register C2C PMU: %d\n", ret); + cpuhp_state_remove_instance(nv_c2c_pmu_cpuhp_state, + &c2c_pmu->cpuhp_node); + return ret; + } + + return 0; +} + +static int nv_c2c_pmu_probe(struct platform_device *pdev) +{ + int ret; + struct nv_c2c_pmu *c2c_pmu; + + c2c_pmu =3D nv_c2c_pmu_init_pmu(pdev); + if (IS_ERR(c2c_pmu)) + return PTR_ERR(c2c_pmu); + + ret =3D nv_c2c_pmu_init_mmio(c2c_pmu); + if (ret) + return ret; + + ret =3D nv_c2c_pmu_get_cpus(c2c_pmu); + if (ret) + return ret; + + ret =3D nv_c2c_pmu_alloc_attr_groups(c2c_pmu); + if (ret) + return ret; + + ret =3D nv_c2c_pmu_register_pmu(c2c_pmu); + if (ret) + 
return ret; + + dev_dbg(c2c_pmu->dev, "Registered %s PMU\n", c2c_pmu->name); + + return 0; +} + +static void nv_c2c_pmu_device_remove(struct platform_device *pdev) +{ + struct nv_c2c_pmu *c2c_pmu =3D platform_get_drvdata(pdev); + + perf_pmu_unregister(&c2c_pmu->pmu); + cpuhp_state_remove_instance(nv_c2c_pmu_cpuhp_state, &c2c_pmu->cpuhp_node); +} + +static const struct acpi_device_id nv_c2c_pmu_acpi_match[] =3D { + { "NVDA2023", (kernel_ulong_t)C2C_TYPE_NVLINK }, + { "NVDA2022", (kernel_ulong_t)C2C_TYPE_NVCLINK }, + { "NVDA2020", (kernel_ulong_t)C2C_TYPE_NVDLINK }, + { } +}; +MODULE_DEVICE_TABLE(acpi, nv_c2c_pmu_acpi_match); + +static struct platform_driver nv_c2c_pmu_driver =3D { + .driver =3D { + .name =3D "nvidia-t410-c2c-pmu", + .acpi_match_table =3D ACPI_PTR(nv_c2c_pmu_acpi_match), + .suppress_bind_attrs =3D true, + }, + .probe =3D nv_c2c_pmu_probe, + .remove =3D nv_c2c_pmu_device_remove, +}; + +static int __init nv_c2c_pmu_init(void) +{ + int ret; + + ret =3D cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, + "perf/nvidia/c2c:online", + nv_c2c_pmu_online_cpu, + nv_c2c_pmu_cpu_teardown); + if (ret < 0) + return ret; + + nv_c2c_pmu_cpuhp_state =3D ret; + return platform_driver_register(&nv_c2c_pmu_driver); +} + +static void __exit nv_c2c_pmu_exit(void) +{ + platform_driver_unregister(&nv_c2c_pmu_driver); + cpuhp_remove_multi_state(nv_c2c_pmu_cpuhp_state); +} + +module_init(nv_c2c_pmu_init); +module_exit(nv_c2c_pmu_exit); + +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("NVIDIA Tegra410 C2C PMU driver"); +MODULE_AUTHOR("Besar Wicaksono "); --=20 2.43.0