From nobody Fri Apr 3 04:41:06 2026
From: Besar Wicaksono
Subject: [PATCH v3 1/8] perf/arm_cspmu: nvidia: Rename doc to Tegra241
Date: Tue, 24 Mar 2026 01:29:45 +0000
Message-ID: <20260324012952.1923296-2-bwicaksono@nvidia.com>
In-Reply-To: <20260324012952.1923296-1-bwicaksono@nvidia.com>
References: <20260324012952.1923296-1-bwicaksono@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The documentation in nvidia-pmu.rst covers PMUs specific to the NVIDIA
Tegra241 SoC. Rename the file after this specific SoC to better
distinguish it from documentation for other NVIDIA SoCs.
Signed-off-by: Besar Wicaksono
---
 Documentation/admin-guide/perf/index.rst              | 2 +-
 .../perf/{nvidia-pmu.rst => nvidia-tegra241-pmu.rst}  | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)
 rename Documentation/admin-guide/perf/{nvidia-pmu.rst => nvidia-tegra241-pmu.rst} (98%)

diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index 47d9a3df6329..c407bb44b08e 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -24,7 +24,7 @@ Performance monitor support
    thunderx2-pmu
    alibaba_pmu
    dwc_pcie_pmu
-   nvidia-pmu
+   nvidia-tegra241-pmu
    meson-ddr-pmu
    cxl
    ampere_cspmu
diff --git a/Documentation/admin-guide/perf/nvidia-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra241-pmu.rst
similarity index 98%
rename from Documentation/admin-guide/perf/nvidia-pmu.rst
rename to Documentation/admin-guide/perf/nvidia-tegra241-pmu.rst
index f538ef67e0e8..fad5bc4cee6c 100644
--- a/Documentation/admin-guide/perf/nvidia-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-tegra241-pmu.rst
@@ -1,8 +1,8 @@
-=========================================================
-NVIDIA Tegra SoC Uncore Performance Monitoring Unit (PMU)
-=========================================================
+============================================================
+NVIDIA Tegra241 SoC Uncore Performance Monitoring Unit (PMU)
+============================================================
 
-The NVIDIA Tegra SoC includes various system PMUs to measure key performance
+The NVIDIA Tegra241 SoC includes various system PMUs to measure key performance
 metrics like memory bandwidth, latency, and utilization:
 
 * Scalable Coherency Fabric (SCF)
-- 
2.43.0

From nobody Fri Apr 3 04:41:07 2026
From: Besar Wicaksono
Subject: [PATCH v3 2/8] perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU
Date: Tue, 24 Mar 2026 01:29:46 +0000
Message-ID: <20260324012952.1923296-3-bwicaksono@nvidia.com>
In-Reply-To: <20260324012952.1923296-1-bwicaksono@nvidia.com>
References: <20260324012952.1923296-1-bwicaksono@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The Unified Coherence Fabric (UCF) contains the last-level cache and the
cache-coherent interconnect in the Tegra410 SoC. The PMU in this device
can be used to capture events related to last-level cache and memory
accesses from different sources.
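As a hedged illustration (not part of the patch itself): the driver change in
this patch defaults to counting from all sources and to all destinations when
no filter bit is set. The sketch below mirrors that widening in Python, using
the config1 bit positions from the patch (source select in bits 2:0,
destination select in bits 11:8); `ucf_filter` is a made-up name for this
sketch only.

```python
# Sketch of the UCF filter widening, mirroring ucf_pmu_event_filter():
# src occupies config1 bits [2:0], dst occupies config1 bits [11:8].
NV_UCF_SRC_COUNT = 3
NV_UCF_DST_COUNT = 4

def ucf_filter(config1: int) -> int:
    src = config1 & 0b111          # like FIELD_GET(NV_UCF_FILTER_SRC, ...)
    dst = (config1 >> 8) & 0b1111  # like FIELD_GET(NV_UCF_FILTER_DST, ...)
    if src == 0:                   # no source chosen: monitor all sources
        src = (1 << NV_UCF_SRC_COUNT) - 1
    if dst == 0:                   # no destination chosen: monitor all
        dst = (1 << NV_UCF_DST_COUNT) - 1
    return (dst << 8) | src        # reassemble the filter value

print(hex(ucf_filter(0x000)))  # no bits set -> 0xf07 (all src, all dst)
print(hex(ucf_filter(0x002)))  # src_loc_cpu only -> 0xf02
print(hex(ucf_filter(0x104)))  # src_rem + dst_loc_cmem -> 0x104 (unchanged)
```

This also explains the NV_UCF_FILTER_DEFAULT value in the patch: with both
fields fully set, an event with no filter attributes behaves the same as one
that selects every source and destination explicitly.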
Reviewed-by: Ilkka Koskinen
Signed-off-by: Besar Wicaksono
---
 Documentation/admin-guide/perf/index.rst       |   1 +
 .../admin-guide/perf/nvidia-tegra410-pmu.rst   | 106 ++++++++++++++++++
 drivers/perf/arm_cspmu/nvidia_cspmu.c          |  87 +++++++++++++-
 3 files changed, 193 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst

diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index c407bb44b08e..aa12708ddb96 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -25,6 +25,7 @@ Performance monitor support
    alibaba_pmu
    dwc_pcie_pmu
    nvidia-tegra241-pmu
+   nvidia-tegra410-pmu
    meson-ddr-pmu
    cxl
    ampere_cspmu
diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
new file mode 100644
index 000000000000..7b7ba5700ca1
--- /dev/null
+++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
@@ -0,0 +1,106 @@
+============================================================
+NVIDIA Tegra410 SoC Uncore Performance Monitoring Unit (PMU)
+============================================================
+
+The NVIDIA Tegra410 SoC includes various system PMUs to measure key performance
+metrics like memory bandwidth, latency, and utilization:
+
+* Unified Coherence Fabric (UCF)
+
+PMU Driver
+----------
+
+The PMU driver describes the available events and configuration of each PMU in
+sysfs. Please see the sections below to get the sysfs path of each PMU. Like
+other uncore PMU drivers, the driver provides a "cpumask" sysfs attribute to
+show the CPU id used to handle the PMU events. There is also an
+"associated_cpus" sysfs attribute, which contains a list of CPUs associated
+with the PMU instance.
+
+UCF PMU
+-------
+
+The Unified Coherence Fabric (UCF) in the NVIDIA Tegra410 SoC serves as a
+distributed cache, last level for CPU memory and CXL memory, and a cache
+coherent interconnect that supports hardware coherence across multiple
+coherently caching agents, including:
+
+ * CPU clusters
+ * GPU
+ * PCIe Ordering Controller Unit (OCU)
+ * Other IO-coherent requesters
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_source/devices/nvidia_ucf_pmu_<socket-id>.
+
+Some of the events available in this PMU can be used to measure bandwidth and
+utilization:
+
+ * slc_access_rd: count the number of read requests to the SLC.
+ * slc_access_wr: count the number of write requests to the SLC.
+ * slc_bytes_rd: count the number of bytes transferred by slc_access_rd.
+ * slc_bytes_wr: count the number of bytes transferred by slc_access_wr.
+ * mem_access_rd: count the number of read requests to local or remote memory.
+ * mem_access_wr: count the number of write requests to local or remote memory.
+ * mem_bytes_rd: count the number of bytes transferred by mem_access_rd.
+ * mem_bytes_wr: count the number of bytes transferred by mem_access_wr.
+ * cycles: count the UCF cycles.
+
+The average bandwidth is calculated as::
+
+  AVG_SLC_READ_BANDWIDTH_IN_GBPS  = SLC_BYTES_RD / ELAPSED_TIME_IN_NS
+  AVG_SLC_WRITE_BANDWIDTH_IN_GBPS = SLC_BYTES_WR / ELAPSED_TIME_IN_NS
+  AVG_MEM_READ_BANDWIDTH_IN_GBPS  = MEM_BYTES_RD / ELAPSED_TIME_IN_NS
+  AVG_MEM_WRITE_BANDWIDTH_IN_GBPS = MEM_BYTES_WR / ELAPSED_TIME_IN_NS
+
+The average request rate is calculated as::
+
+  AVG_SLC_READ_REQUEST_RATE  = SLC_ACCESS_RD / CYCLES
+  AVG_SLC_WRITE_REQUEST_RATE = SLC_ACCESS_WR / CYCLES
+  AVG_MEM_READ_REQUEST_RATE  = MEM_ACCESS_RD / CYCLES
+  AVG_MEM_WRITE_REQUEST_RATE = MEM_ACCESS_WR / CYCLES
+
+More details about the other available events can be found in the Tegra410 SoC
+technical reference manual.
+
+The events can be filtered based on source or destination. The source filter
+indicates the traffic initiator to the SLC, e.g. local CPU, non-CPU device, or
+remote socket. The destination filter specifies the destination memory type,
+e.g. local system memory (CMEM), local GPU memory (GMEM), or remote memory. The
+local/remote classification of the destination filter is based on the home
+socket of the address, not where the data actually resides. The available
+filters are described in
+/sys/bus/event_source/devices/nvidia_ucf_pmu_<socket-id>/format/.
+
+The list of UCF PMU event filters:
+
+* Source filter:
+
+  * src_loc_cpu: if set, count events from the local CPU
+  * src_loc_noncpu: if set, count events from local non-CPU devices
+  * src_rem: if set, count events from the CPU, GPU, and PCIe devices of the
+    remote socket
+
+* Destination filter:
+
+  * dst_loc_cmem: if set, count events to local system memory (CMEM) addresses
+  * dst_loc_gmem: if set, count events to local GPU memory (GMEM) addresses
+  * dst_loc_other: if set, count events to local CXL memory addresses
+  * dst_rem: if set, count events to CPU, GPU, and CXL memory addresses of the
+    remote socket
+
+If the source is not specified, the PMU will count events from all sources. If
+the destination is not specified, the PMU will count events to all
+destinations.
+
+Example usage:
+
+* Count event id 0x0 in socket 0 from all sources and to all destinations::
+
+    perf stat -a -e nvidia_ucf_pmu_0/event=0x0/
+
+* Count event id 0x0 in socket 0 with source filter = local CPU and destination
+  filter = local system memory (CMEM)::
+
+    perf stat -a -e nvidia_ucf_pmu_0/event=0x0,src_loc_cpu=0x1,dst_loc_cmem=0x1/
+
+* Count event id 0x0 in socket 1 with source filter = local non-CPU device and
+  destination filter = remote memory::
+
+    perf stat -a -e nvidia_ucf_pmu_1/event=0x0,src_loc_noncpu=0x1,dst_rem=0x1/
diff --git a/drivers/perf/arm_cspmu/nvidia_cspmu.c b/drivers/perf/arm_cspmu/nvidia_cspmu.c
index e06a06d3407b..8e37cbe3bae9 100644
--- a/drivers/perf/arm_cspmu/nvidia_cspmu.c
+++ b/drivers/perf/arm_cspmu/nvidia_cspmu.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * Copyright (c) 2022-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* */ =20 @@ -21,6 +21,13 @@ #define NV_CNVL_PORT_COUNT 4ULL #define NV_CNVL_FILTER_ID_MASK GENMASK_ULL(NV_CNVL_PORT_COUNT - 1, 0) =20 +#define NV_UCF_SRC_COUNT 3ULL +#define NV_UCF_DST_COUNT 4ULL +#define NV_UCF_FILTER_ID_MASK GENMASK_ULL(11, 0) +#define NV_UCF_FILTER_SRC GENMASK_ULL(2, 0) +#define NV_UCF_FILTER_DST GENMASK_ULL(11, 8) +#define NV_UCF_FILTER_DEFAULT (NV_UCF_FILTER_SRC | NV_UCF_FILTER_DS= T) + #define NV_GENERIC_FILTER_ID_MASK GENMASK_ULL(31, 0) =20 #define NV_PRODID_MASK (PMIIDR_PRODUCTID | PMIIDR_VARIANT | PMIIDR_REVISIO= N) @@ -124,6 +131,36 @@ static struct attribute *mcf_pmu_event_attrs[] =3D { NULL, }; =20 +static struct attribute *ucf_pmu_event_attrs[] =3D { + ARM_CSPMU_EVENT_ATTR(bus_cycles, 0x1D), + + ARM_CSPMU_EVENT_ATTR(slc_allocate, 0xF0), + ARM_CSPMU_EVENT_ATTR(slc_wb, 0xF3), + ARM_CSPMU_EVENT_ATTR(slc_refill_rd, 0x109), + ARM_CSPMU_EVENT_ATTR(slc_refill_wr, 0x10A), + ARM_CSPMU_EVENT_ATTR(slc_hit_rd, 0x119), + + ARM_CSPMU_EVENT_ATTR(slc_access_dataless, 0x183), + ARM_CSPMU_EVENT_ATTR(slc_access_atomic, 0x184), + + ARM_CSPMU_EVENT_ATTR(slc_access_rd, 0x111), + ARM_CSPMU_EVENT_ATTR(slc_access_wr, 0x112), + ARM_CSPMU_EVENT_ATTR(slc_bytes_rd, 0x113), + ARM_CSPMU_EVENT_ATTR(slc_bytes_wr, 0x114), + + ARM_CSPMU_EVENT_ATTR(mem_access_rd, 0x121), + ARM_CSPMU_EVENT_ATTR(mem_access_wr, 0x122), + ARM_CSPMU_EVENT_ATTR(mem_bytes_rd, 0x123), + ARM_CSPMU_EVENT_ATTR(mem_bytes_wr, 0x124), + + ARM_CSPMU_EVENT_ATTR(local_snoop, 0x180), + ARM_CSPMU_EVENT_ATTR(ext_snp_access, 0x181), + ARM_CSPMU_EVENT_ATTR(ext_snp_evict, 0x182), + + ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT), + NULL +}; + static struct attribute *generic_pmu_event_attrs[] =3D { ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT), NULL, @@ -152,6 +189,18 @@ static struct attribute *cnvlink_pmu_format_attrs[] = =3D { NULL, }; =20 +static struct attribute *ucf_pmu_format_attrs[] =3D { + ARM_CSPMU_FORMAT_EVENT_ATTR, + ARM_CSPMU_FORMAT_ATTR(src_loc_noncpu, "config1:0"), + 
ARM_CSPMU_FORMAT_ATTR(src_loc_cpu, "config1:1"), + ARM_CSPMU_FORMAT_ATTR(src_rem, "config1:2"), + ARM_CSPMU_FORMAT_ATTR(dst_loc_cmem, "config1:8"), + ARM_CSPMU_FORMAT_ATTR(dst_loc_gmem, "config1:9"), + ARM_CSPMU_FORMAT_ATTR(dst_loc_other, "config1:10"), + ARM_CSPMU_FORMAT_ATTR(dst_rem, "config1:11"), + NULL +}; + static struct attribute *generic_pmu_format_attrs[] =3D { ARM_CSPMU_FORMAT_EVENT_ATTR, ARM_CSPMU_FORMAT_FILTER_ATTR, @@ -236,6 +285,27 @@ static void nv_cspmu_set_cc_filter(struct arm_cspmu *c= spmu, writel(filter, cspmu->base0 + PMCCFILTR); } =20 +static u32 ucf_pmu_event_filter(const struct perf_event *event) +{ + u32 ret, filter, src, dst; + + filter =3D nv_cspmu_event_filter(event); + + /* Monitor all sources if none is selected. */ + src =3D FIELD_GET(NV_UCF_FILTER_SRC, filter); + if (src =3D=3D 0) + src =3D GENMASK_ULL(NV_UCF_SRC_COUNT - 1, 0); + + /* Monitor all destinations if none is selected. */ + dst =3D FIELD_GET(NV_UCF_FILTER_DST, filter); + if (dst =3D=3D 0) + dst =3D GENMASK_ULL(NV_UCF_DST_COUNT - 1, 0); + + ret =3D FIELD_PREP(NV_UCF_FILTER_SRC, src); + ret |=3D FIELD_PREP(NV_UCF_FILTER_DST, dst); + + return ret; +} =20 enum nv_cspmu_name_fmt { NAME_FMT_GENERIC, @@ -342,6 +412,21 @@ static const struct nv_cspmu_match nv_cspmu_match[] = =3D { .init_data =3D NULL }, }, + { + .prodid =3D 0x2CF20000, + .prodid_mask =3D NV_PRODID_MASK, + .name_pattern =3D "nvidia_ucf_pmu_%u", + .name_fmt =3D NAME_FMT_SOCKET, + .template_ctx =3D { + .event_attr =3D ucf_pmu_event_attrs, + .format_attr =3D ucf_pmu_format_attrs, + .filter_mask =3D NV_UCF_FILTER_ID_MASK, + .filter_default_val =3D NV_UCF_FILTER_DEFAULT, + .filter2_mask =3D 0x0, + .filter2_default_val =3D 0x0, + .get_filter =3D ucf_pmu_event_filter, + }, + }, { .prodid =3D 0, .prodid_mask =3D 0, --=20 2.43.0 From nobody Fri Apr 3 04:41:07 2026 Received: from CY3PR05CU001.outbound.protection.outlook.com (mail-westcentralusazon11013042.outbound.protection.outlook.com [40.93.201.42]) (using TLSv1.2 with 
From: Besar Wicaksono <bwicaksono@nvidia.com>
Subject: [PATCH v3 3/8] perf/arm_cspmu: Add arm_cspmu_acpi_dev_get
Date: Tue, 24 Mar 2026 01:29:47 +0000
Message-ID: <20260324012952.1923296-4-bwicaksono@nvidia.com>
In-Reply-To: <20260324012952.1923296-1-bwicaksono@nvidia.com>
References: <20260324012952.1923296-1-bwicaksono@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Add an interface to get the ACPI device associated with the PMU. This ACPI
device may contain additional properties not covered by the standard
properties.

Reviewed-by: Ilkka Koskinen
Signed-off-by: Besar Wicaksono
---
 drivers/perf/arm_cspmu/arm_cspmu.c | 19 ++++++++++++++++++-
 drivers/perf/arm_cspmu/arm_cspmu.h | 17 ++++++++++++++++-
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
index 34430b68f602..49e8a1f38131 100644
--- a/drivers/perf/arm_cspmu/arm_cspmu.c
+++ b/drivers/perf/arm_cspmu/arm_cspmu.c
@@ -16,7 +16,7 @@
  * The user should refer to the vendor technical documentation to get details
  * about the supported events.
  *
- * Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * Copyright (c) 2022-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
  *
  */
 
@@ -1132,6 +1132,23 @@ static int arm_cspmu_acpi_get_cpus(struct arm_cspmu *cspmu)
 
 	return 0;
 }
+
+struct acpi_device *arm_cspmu_acpi_dev_get(const struct arm_cspmu *cspmu)
+{
+	char hid[16] = {};
+	char uid[16] = {};
+	const struct acpi_apmt_node *apmt_node;
+
+	apmt_node = arm_cspmu_apmt_node(cspmu->dev);
+	if (!apmt_node || apmt_node->type != ACPI_APMT_NODE_TYPE_ACPI)
+		return NULL;
+
+	memcpy(hid, &apmt_node->inst_primary, sizeof(apmt_node->inst_primary));
+	snprintf(uid, sizeof(uid), "%u", apmt_node->inst_secondary);
+
+	return acpi_dev_get_first_match_dev(hid, uid, -1);
+}
+EXPORT_SYMBOL_GPL(arm_cspmu_acpi_dev_get);
 #else
 static int arm_cspmu_acpi_get_cpus(struct arm_cspmu *cspmu)
 {
diff --git a/drivers/perf/arm_cspmu/arm_cspmu.h b/drivers/perf/arm_cspmu/arm_cspmu.h
index cd65a58dbd88..3fc5c8d77266 100644
--- a/drivers/perf/arm_cspmu/arm_cspmu.h
+++ b/drivers/perf/arm_cspmu/arm_cspmu.h
@@ -1,13 +1,14 @@
 /* SPDX-License-Identifier: GPL-2.0
  *
  * ARM CoreSight Architecture PMU driver.
- * Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * Copyright (c) 2022-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
  *
  */
 
 #ifndef __ARM_CSPMU_H__
 #define __ARM_CSPMU_H__
 
+#include
 #include
 #include
 #include
@@ -255,4 +256,18 @@ int arm_cspmu_impl_register(const struct arm_cspmu_impl_match *impl_match);
 
 /* Unregister vendor backend. */
 void arm_cspmu_impl_unregister(const struct arm_cspmu_impl_match *impl_match);
 
+#if defined(CONFIG_ACPI) && defined(CONFIG_ARM64)
+/**
+ * Get ACPI device associated with the PMU.
+ * The caller is responsible for calling acpi_dev_put() on the returned device.
+ */
+struct acpi_device *arm_cspmu_acpi_dev_get(const struct arm_cspmu *cspmu);
+#else
+static inline struct acpi_device *
+arm_cspmu_acpi_dev_get(const struct arm_cspmu *cspmu)
+{
+	return NULL;
+}
+#endif
+
 #endif /* __ARM_CSPMU_H__ */
-- 
2.43.0

From nobody Fri Apr 3 04:41:07 2026
From: Besar Wicaksono <bwicaksono@nvidia.com>
Subject: [PATCH v3 4/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU
Date: Tue, 24 Mar 2026 01:29:48 +0000
Message-ID: <20260324012952.1923296-5-bwicaksono@nvidia.com>
In-Reply-To: <20260324012952.1923296-1-bwicaksono@nvidia.com>
References: <20260324012952.1923296-1-bwicaksono@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Add PCIE PMU support for the Tegra410 SOC. This PMU is instantiated in each
root complex in the SOC and can capture traffic from PCIE devices to various
memory types. The PMU can filter traffic based on the originating root port
or BDF and on the target memory type (CPU DRAM, GPU memory, CXL memory, or
remote memory).
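The BDF filter value used by this PMU follows the standard PCI BDF encoding,
(bus << 8) + (device << 3) + function, as described in the documentation added
further below. A minimal sketch of that encoding as a shell helper;
`bdf_to_filter` is a hypothetical name for illustration, not part of this patch:

```shell
# Encode a PCIe bus/device/function triple into the 16-bit src_bdf
# filter value: (bus << 8) + (device << 3) + function.
bdf_to_filter() {
  local bus=$1 dev=$2 fn=$3
  printf '0x%04x\n' $(( (bus << 8) | (dev << 3) | fn ))
}

bdf_to_filter 0x27 0x10 0x1   # BDF 27:10.1 -> 0x2781
```

Under this encoding, a device's src_bdf value can be read straight off its
lspci address, e.g. 01:10.0 encodes to 0x0180.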
Reviewed-by: Ilkka Koskinen
Signed-off-by: Besar Wicaksono
---
 .../admin-guide/perf/nvidia-tegra410-pmu.rst | 163 ++++++++++++++
 drivers/perf/arm_cspmu/nvidia_cspmu.c        | 210 +++++++++++++++++-
 2 files changed, 368 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
index 7b7ba5700ca1..b8cfbb80be1c 100644
--- a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
@@ -6,6 +6,7 @@ The NVIDIA Tegra410 SoC includes various system PMUs to measure key performance
 metrics like memory bandwidth, latency, and utilization:
 
 * Unified Coherence Fabric (UCF)
+* PCIE
 
 PMU Driver
 ----------
@@ -104,3 +105,165 @@ Example usage:
   destination filter = remote memory::
 
     perf stat -a -e nvidia_ucf_pmu_1/event=0x0,src_loc_noncpu=0x1,dst_rem=0x1/
+
+PCIE PMU
+--------
+
+This PMU is located in the SOC fabric connecting the PCIE root complex (RC) and
+the memory subsystem. It monitors all read/write traffic from the root port(s)
+or a particular BDF in a PCIE RC to local or remote memory. There is one PMU per
+PCIE RC in the SoC. Each RC can have up to 16 lanes that can be bifurcated into
+up to 8 root ports. The traffic from each root port can be filtered using the RP or
+BDF filter. For example, specifying "src_rp_mask=0xFF" means the PMU counter will
+capture traffic from all RPs. Please see below for more details.
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_source/devices/nvidia_pcie_pmu_<socket>_rc_<rc>.
+
+The events in this PMU can be used to measure bandwidth, utilization, and
+latency:
+
+  * rd_req: count the number of read requests by the PCIE device.
+  * wr_req: count the number of write requests by the PCIE device.
+  * rd_bytes: count the number of bytes transferred by rd_req.
+  * wr_bytes: count the number of bytes transferred by wr_req.
+  * rd_cum_outs: count outstanding rd_req each cycle.
+  * cycles: count the clock cycles of the SOC fabric connected to the PCIE interface.
+
+The average bandwidth is calculated as::
+
+  AVG_RD_BANDWIDTH_IN_GBPS = RD_BYTES / ELAPSED_TIME_IN_NS
+  AVG_WR_BANDWIDTH_IN_GBPS = WR_BYTES / ELAPSED_TIME_IN_NS
+
+The average request rate is calculated as::
+
+  AVG_RD_REQUEST_RATE = RD_REQ / CYCLES
+  AVG_WR_REQUEST_RATE = WR_REQ / CYCLES
+
+The average latency is calculated as::
+
+  FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+  AVG_LATENCY_IN_CYCLES = RD_CUM_OUTS / RD_REQ
+  AVG_LATENCY_IN_NS = AVG_LATENCY_IN_CYCLES / FREQ_IN_GHZ
+
+The PMU events can be filtered based on the traffic source and destination.
+The source filter indicates the PCIE devices that will be monitored. The
+destination filter specifies the destination memory type, e.g. local system
+memory (CMEM), local GPU memory (GMEM), or remote memory. The local/remote
+classification of the destination filter is based on the home socket of the
+address, not where the data actually resides. These filters can be found in
+/sys/bus/event_source/devices/nvidia_pcie_pmu_<socket>_rc_<rc>/format/.
+
+The list of event filters:
+
+* Source filter:
+
+  * src_rp_mask: bitmask of root ports that will be monitored. Each bit in this
+    bitmask represents the RP index in the RC. If the bit is set, all devices under
+    the associated RP will be monitored. E.g. "src_rp_mask=0xF" will monitor
+    devices in root ports 0 to 3.
+  * src_bdf: the BDF that will be monitored. This is a 16-bit value that
+    follows the formula (bus << 8) + (device << 3) + (function). For example, the
+    value of BDF 27:10.1 is 0x2781.
+  * src_bdf_en: enable the BDF filter. If this is set, the BDF filter value in
+    "src_bdf" is used to filter the traffic.
+
+  Note that Root-Port and BDF filters are mutually exclusive and the PMU in
+  each RC can only have one BDF filter for the whole counters. If the BDF filter
+  is enabled, the BDF filter value will be applied to all events.
+
+* Destination filter:
+
+  * dst_loc_cmem: if set, count events to local system memory (CMEM) addresses
+  * dst_loc_gmem: if set, count events to local GPU memory (GMEM) addresses
+  * dst_loc_pcie_p2p: if set, count events to local PCIE peer addresses
+  * dst_loc_pcie_cxl: if set, count events to local CXL memory addresses
+  * dst_rem: if set, count events to remote memory addresses
+
+If the source filter is not specified, the PMU will count events from all root
+ports. If the destination filter is not specified, the PMU will count events
+to all destinations.
+
+Example usage:
+
+* Count event id 0x0 from root port 0 of PCIE RC-0 on socket 0 targeting all
+  destinations::
+
+    perf stat -a -e nvidia_pcie_pmu_0_rc_0/event=0x0,src_rp_mask=0x1/
+
+* Count event id 0x1 from root ports 0 and 1 of PCIE RC-1 on socket 0,
+  targeting just local CMEM of socket 0::
+
+    perf stat -a -e nvidia_pcie_pmu_0_rc_1/event=0x1,src_rp_mask=0x3,dst_loc_cmem=0x1/
+
+* Count event id 0x2 from root port 0 of PCIE RC-2 on socket 1 targeting all
+  destinations::
+
+    perf stat -a -e nvidia_pcie_pmu_1_rc_2/event=0x2,src_rp_mask=0x1/
+
+* Count event id 0x3 from root ports 0 and 1 of PCIE RC-3 on socket 1,
+  targeting just local CMEM of socket 1::
+
+    perf stat -a -e nvidia_pcie_pmu_1_rc_3/event=0x3,src_rp_mask=0x3,dst_loc_cmem=0x1/
+
+* Count event id 0x4 from BDF 01:10.0 of PCIE RC-4 on socket 0 targeting all
+  destinations::
+
+    perf stat -a -e nvidia_pcie_pmu_0_rc_4/event=0x4,src_bdf=0x0180,src_bdf_en=0x1/
+
+Mapping the RC# to the lspci segment number can be non-trivial; hence a new NVIDIA
+Designated Vendor Specific Capability (DVSEC) register is added into the PCIE
+config space for each RP. This DVSEC has vendor id "10de" and DVSEC id "0x4".
+The DVSEC register contains the following information to map PCIE devices under
+the RP back to its RC#:
+
+ - Bus# (byte 0xc): bus number as reported by the lspci output
+ - Segment# (byte 0xd): segment number as reported by the lspci output
+ - RP# (byte 0xe): port number as reported by the LnkCap attribute from lspci
+   for a device with Root Port capability
+ - RC# (byte 0xf): root complex number associated with the RP
+ - Socket# (byte 0x10): socket number associated with the RP
+
+Example script for mapping lspci BDF to RC# and socket#::
+
+  #!/bin/bash
+  while read bdf rest; do
+    dvsec4_reg=$(lspci -vv -s $bdf | awk '
+      /Designated Vendor-Specific: Vendor=10de ID=0004/ {
+        match($0, /\[([0-9a-fA-F]+)/, arr);
+        print "0x" arr[1];
+        exit
+      }
+    ')
+    if [ -n "$dvsec4_reg" ]; then
+      bus=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xc))).b)
+      segment=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xd))).b)
+      rp=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xe))).b)
+      rc=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xf))).b)
+      socket=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0x10))).b)
+      echo "$bdf: Bus=$bus, Segment=$segment, RP=$rp, RC=$rc, Socket=$socket"
+    fi
+  done < <(lspci -d 10de:)
+
+Example output::
+
+  0001:00:00.0: Bus=00, Segment=01, RP=00, RC=00, Socket=00
+  0002:80:00.0: Bus=80, Segment=02, RP=01, RC=01, Socket=00
+  0002:a0:00.0: Bus=a0, Segment=02, RP=02, RC=01, Socket=00
+  0002:c0:00.0: Bus=c0, Segment=02, RP=03, RC=01, Socket=00
+  0002:e0:00.0: Bus=e0, Segment=02, RP=04, RC=01, Socket=00
+  0003:00:00.0: Bus=00, Segment=03, RP=00, RC=02, Socket=00
+  0004:00:00.0: Bus=00, Segment=04, RP=00, RC=03, Socket=00
+  0005:00:00.0: Bus=00, Segment=05, RP=00, RC=04, Socket=00
+  0005:40:00.0: Bus=40, Segment=05, RP=01, RC=04, Socket=00
+  0005:c0:00.0: Bus=c0, Segment=05, RP=02, RC=04, Socket=00
+  0006:00:00.0: Bus=00, Segment=06, RP=00, RC=05, Socket=00
+  0009:00:00.0: Bus=00, Segment=09, RP=00, RC=00, Socket=01
+  000a:80:00.0: Bus=80, Segment=0a, RP=01, RC=01, Socket=01
+  000a:a0:00.0: Bus=a0, Segment=0a, RP=02, RC=01, Socket=01
+  000a:e0:00.0: Bus=e0, Segment=0a, RP=03, RC=01, Socket=01
+  000b:00:00.0: Bus=00, Segment=0b, RP=00, RC=02, Socket=01
+  000c:00:00.0: Bus=00, Segment=0c, RP=00, RC=03, Socket=01
+  000d:00:00.0: Bus=00, Segment=0d, RP=00, RC=04, Socket=01
+  000d:40:00.0: Bus=40, Segment=0d, RP=01, RC=04, Socket=01
+  000d:c0:00.0: Bus=c0, Segment=0d, RP=02, RC=04, Socket=01
+  000e:00:00.0: Bus=00, Segment=0e, RP=00, RC=05, Socket=01
diff --git a/drivers/perf/arm_cspmu/nvidia_cspmu.c b/drivers/perf/arm_cspmu/nvidia_cspmu.c
index 8e37cbe3bae9..61fde84ea343 100644
--- a/drivers/perf/arm_cspmu/nvidia_cspmu.c
+++ b/drivers/perf/arm_cspmu/nvidia_cspmu.c
@@ -8,6 +8,7 @@
 
 #include
 #include
+#include
 #include
 
 #include "arm_cspmu.h"
@@ -28,6 +29,19 @@
 #define NV_UCF_FILTER_DST	GENMASK_ULL(11, 8)
 #define NV_UCF_FILTER_DEFAULT	(NV_UCF_FILTER_SRC | NV_UCF_FILTER_DST)
 
+#define NV_PCIE_V2_PORT_COUNT		8ULL
+#define NV_PCIE_V2_FILTER_ID_MASK	GENMASK_ULL(24, 0)
+#define NV_PCIE_V2_FILTER_PORT		GENMASK_ULL(NV_PCIE_V2_PORT_COUNT - 1, 0)
+#define NV_PCIE_V2_FILTER_BDF_VAL	GENMASK_ULL(23, NV_PCIE_V2_PORT_COUNT)
+#define NV_PCIE_V2_FILTER_BDF_EN	BIT(24)
+#define NV_PCIE_V2_FILTER_BDF_VAL_EN	GENMASK_ULL(24, NV_PCIE_V2_PORT_COUNT)
+#define NV_PCIE_V2_FILTER_DEFAULT	NV_PCIE_V2_FILTER_PORT
+
+#define NV_PCIE_V2_DST_COUNT		5ULL
+#define NV_PCIE_V2_FILTER2_ID_MASK	GENMASK_ULL(4, 0)
+#define NV_PCIE_V2_FILTER2_DST		GENMASK_ULL(NV_PCIE_V2_DST_COUNT - 1, 0)
+#define NV_PCIE_V2_FILTER2_DEFAULT	NV_PCIE_V2_FILTER2_DST
+
 #define NV_GENERIC_FILTER_ID_MASK	GENMASK_ULL(31, 0)
 
 #define NV_PRODID_MASK	(PMIIDR_PRODUCTID | PMIIDR_VARIANT | PMIIDR_REVISION)
@@ -161,6 +175,16 @@ static struct attribute *ucf_pmu_event_attrs[] = {
 	NULL
 };
 
+static struct attribute *pcie_v2_pmu_event_attrs[] = {
+	ARM_CSPMU_EVENT_ATTR(rd_bytes, 0x0),
+	ARM_CSPMU_EVENT_ATTR(wr_bytes, 0x1),
+	ARM_CSPMU_EVENT_ATTR(rd_req, 0x2),
+	ARM_CSPMU_EVENT_ATTR(wr_req, 0x3),
+	ARM_CSPMU_EVENT_ATTR(rd_cum_outs, 0x4),
+	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
+	NULL
+};
+
 static struct attribute *generic_pmu_event_attrs[] = {
 	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
 	NULL,
@@ -201,6 +225,19 @@ static struct attribute *ucf_pmu_format_attrs[] = {
 	NULL
 };
 
+static struct attribute *pcie_v2_pmu_format_attrs[] = {
+	ARM_CSPMU_FORMAT_EVENT_ATTR,
+	ARM_CSPMU_FORMAT_ATTR(src_rp_mask, "config1:0-7"),
+	ARM_CSPMU_FORMAT_ATTR(src_bdf, "config1:8-23"),
+	ARM_CSPMU_FORMAT_ATTR(src_bdf_en, "config1:24"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_cmem, "config2:0"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_gmem, "config2:1"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_pcie_p2p, "config2:2"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_pcie_cxl, "config2:3"),
+	ARM_CSPMU_FORMAT_ATTR(dst_rem, "config2:4"),
+	NULL
+};
+
 static struct attribute *generic_pmu_format_attrs[] = {
 	ARM_CSPMU_FORMAT_EVENT_ATTR,
 	ARM_CSPMU_FORMAT_FILTER_ATTR,
@@ -232,6 +269,32 @@ nv_cspmu_get_name(const struct arm_cspmu *cspmu)
 	return ctx->name;
 }
 
+#if defined(CONFIG_ACPI) && defined(CONFIG_ARM64)
+static int nv_cspmu_get_inst_id(const struct arm_cspmu *cspmu, u32 *id)
+{
+	struct fwnode_handle *fwnode;
+	struct acpi_device *adev;
+	int ret;
+
+	adev = arm_cspmu_acpi_dev_get(cspmu);
+	if (!adev)
+		return -ENODEV;
+
+	fwnode = acpi_fwnode_handle(adev);
+	ret = fwnode_property_read_u32(fwnode, "instance_id", id);
+	if (ret)
+		dev_err(cspmu->dev, "Failed to get instance ID\n");
+
+	acpi_dev_put(adev);
+	return ret;
+}
+#else
+static int nv_cspmu_get_inst_id(const struct arm_cspmu *cspmu, u32 *id)
+{
+	return -EINVAL;
+}
+#endif
+
 static u32 nv_cspmu_event_filter(const struct perf_event *event)
 {
 	const struct nv_cspmu_ctx *ctx =
@@ -277,6 +340,20 @@ static void nv_cspmu_set_ev_filter(struct arm_cspmu *cspmu,
 	}
 }
 
+static void nv_cspmu_reset_ev_filter(struct arm_cspmu *cspmu,
+				     const struct perf_event *event)
+{
+	const struct nv_cspmu_ctx *ctx =
+		to_nv_cspmu_ctx(to_arm_cspmu(event->pmu));
+	const u32 offset = 4 * event->hw.idx;
+
+	if (ctx->get_filter)
+		writel(0, cspmu->base0 + PMEVFILTR + offset);
+
+	if (ctx->get_filter2)
+		writel(0, cspmu->base0 + PMEVFILT2R + offset);
+}
+
 static void nv_cspmu_set_cc_filter(struct arm_cspmu *cspmu,
 				   const struct perf_event *event)
 {
@@ -307,9 +384,103 @@ static u32 ucf_pmu_event_filter(const struct perf_event *event)
 	return ret;
 }
 
+static u32 pcie_v2_pmu_bdf_val_en(u32 filter)
+{
+	const u32 bdf_en = FIELD_GET(NV_PCIE_V2_FILTER_BDF_EN, filter);
+
+	/* Returns both BDF value and enable bit if BDF filtering is enabled. */
+	if (bdf_en)
+		return FIELD_GET(NV_PCIE_V2_FILTER_BDF_VAL_EN, filter);
+
+	/* Ignore the BDF value if BDF filter is not enabled. */
+	return 0;
+}
+
+static u32 pcie_v2_pmu_event_filter(const struct perf_event *event)
+{
+	u32 filter, lead_filter, lead_bdf;
+	struct perf_event *leader;
+	const struct nv_cspmu_ctx *ctx =
+		to_nv_cspmu_ctx(to_arm_cspmu(event->pmu));
+
+	filter = event->attr.config1 & ctx->filter_mask;
+	if (filter != 0)
+		return filter;
+
+	leader = event->group_leader;
+
+	/* Use leader's filter value if its BDF filtering is enabled. */
+	if (event != leader) {
+		lead_filter = pcie_v2_pmu_event_filter(leader);
+		lead_bdf = pcie_v2_pmu_bdf_val_en(lead_filter);
+		if (lead_bdf != 0)
+			return lead_filter;
+	}
+
+	/* Otherwise, return default filter value. */
+	return ctx->filter_default_val;
+}
+
+static int pcie_v2_pmu_validate_event(struct arm_cspmu *cspmu,
+				      struct perf_event *new_ev)
+{
+	/*
+	 * Make sure the events are using same BDF filter since the PCIE-SRC PMU
+	 * only supports one common BDF filter setting for all of the counters.
+	 */
+
+	int idx;
+	u32 new_filter, new_rp, new_bdf, new_lead_filter, new_lead_bdf;
+	struct perf_event *new_leader;
+
+	if (cspmu->impl.ops.is_cycle_counter_event(new_ev))
+		return 0;
+
+	new_leader = new_ev->group_leader;
+
+	new_filter = pcie_v2_pmu_event_filter(new_ev);
+	new_lead_filter = pcie_v2_pmu_event_filter(new_leader);
+
+	new_bdf = pcie_v2_pmu_bdf_val_en(new_filter);
+	new_lead_bdf = pcie_v2_pmu_bdf_val_en(new_lead_filter);
+
+	new_rp = FIELD_GET(NV_PCIE_V2_FILTER_PORT, new_filter);
+
+	if (new_rp != 0 && new_bdf != 0) {
+		dev_err(cspmu->dev,
+			"RP and BDF filtering are mutually exclusive\n");
+		return -EINVAL;
+	}
+
+	if (new_bdf != new_lead_bdf) {
+		dev_err(cspmu->dev,
+			"sibling and leader BDF value should be equal\n");
+		return -EINVAL;
+	}
+
+	/* Compare BDF filter on existing events. */
+	idx = find_first_bit(cspmu->hw_events.used_ctrs,
+			     cspmu->cycle_counter_logical_idx);
+
+	if (idx != cspmu->cycle_counter_logical_idx) {
+		struct perf_event *leader = cspmu->hw_events.events[idx]->group_leader;
+		const u32 lead_filter = pcie_v2_pmu_event_filter(leader);
+		const u32 lead_bdf = pcie_v2_pmu_bdf_val_en(lead_filter);
+
+		if (new_lead_bdf != lead_bdf) {
+			dev_err(cspmu->dev, "only one BDF value is supported\n");
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 enum nv_cspmu_name_fmt {
 	NAME_FMT_GENERIC,
-	NAME_FMT_SOCKET
+	NAME_FMT_SOCKET,
+	NAME_FMT_SOCKET_INST,
 };
 
 struct nv_cspmu_match {
@@ -427,6 +598,26 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
 			.get_filter = ucf_pmu_event_filter,
 		},
 	},
+	{
+		.prodid = 0x10301000,
+		.prodid_mask = NV_PRODID_MASK,
+		.name_pattern = "nvidia_pcie_pmu_%u_rc_%u",
+		.name_fmt = NAME_FMT_SOCKET_INST,
+		.template_ctx = {
+			.event_attr = pcie_v2_pmu_event_attrs,
+			.format_attr = pcie_v2_pmu_format_attrs,
+			.filter_mask = NV_PCIE_V2_FILTER_ID_MASK,
+			.filter_default_val = NV_PCIE_V2_FILTER_DEFAULT,
+			.filter2_mask = NV_PCIE_V2_FILTER2_ID_MASK,
+			.filter2_default_val = NV_PCIE_V2_FILTER2_DEFAULT,
+			.get_filter = pcie_v2_pmu_event_filter,
+			.get_filter2 = nv_cspmu_event_filter2,
+		},
+		.ops = {
+			.validate_event = pcie_v2_pmu_validate_event,
+			.reset_ev_filter = nv_cspmu_reset_ev_filter,
+		}
+	},
 	{
 		.prodid = 0,
 		.prodid_mask = 0,
@@ -450,7 +641,7 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
 static char *nv_cspmu_format_name(const struct arm_cspmu *cspmu,
 				  const struct nv_cspmu_match *match)
 {
-	char *name;
+	char *name = NULL;
 	struct device *dev = cspmu->dev;
 
 	static atomic_t pmu_generic_idx = {0};
@@ -464,13 +655,20 @@ static char *nv_cspmu_format_name(const struct arm_cspmu *cspmu,
 			socket);
 		break;
 	}
+	case NAME_FMT_SOCKET_INST: {
+		const int cpu = cpumask_first(&cspmu->associated_cpus);
+		const int socket =
cpu_to_node(cpu); + u32 inst_id; + + if (!nv_cspmu_get_inst_id(cspmu, &inst_id)) + name =3D devm_kasprintf(dev, GFP_KERNEL, + match->name_pattern, socket, inst_id); + break; + } case NAME_FMT_GENERIC: name =3D devm_kasprintf(dev, GFP_KERNEL, match->name_pattern, atomic_fetch_inc(&pmu_generic_idx)); break; - default: - name =3D NULL; - break; } =20 return name; @@ -511,8 +709,10 @@ static int nv_cspmu_init_ops(struct arm_cspmu *cspmu) cspmu->impl.ctx =3D ctx; =20 /* NVIDIA specific callbacks. */ + SET_OP(validate_event, impl_ops, match, NULL); SET_OP(set_cc_filter, impl_ops, match, nv_cspmu_set_cc_filter); SET_OP(set_ev_filter, impl_ops, match, nv_cspmu_set_ev_filter); + SET_OP(reset_ev_filter, impl_ops, match, NULL); SET_OP(get_event_attrs, impl_ops, match, nv_cspmu_get_event_attrs); SET_OP(get_format_attrs, impl_ops, match, nv_cspmu_get_format_attrs); SET_OP(get_name, impl_ops, match, nv_cspmu_get_name); --=20 2.43.0 From nobody Fri Apr 3 04:41:07 2026 Received: from PH0PR06CU001.outbound.protection.outlook.com (mail-westus3azon11011032.outbound.protection.outlook.com [40.107.208.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7104921CC58; Tue, 24 Mar 2026 01:30:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.208.32 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774315852; cv=fail; b=dzSqPI5nOgDUeKSxfbyY0whFsDNEjsCmC1/Qk9RLpezNdrSlRDXWg/M6SNySaFMQODJ+jdvBswnagUwUa9N7jq6SPVuplxi9nq0r9ubVqfP6kWA7EqdL1J7kgIP9CRqYNUmvxVNiF0RLZoATkeeKI86aElhDlaHd1VWH1FUe+CU= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774315852; c=relaxed/simple; bh=gul9uVclujkVyo86GcG+TAaA17VqcHaPHjlxIBy5YFw=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; 
From: Besar Wicaksono
To: , , , ,
CC: , , , , , , , , , , , , , , , , Besar Wicaksono
Subject: [PATCH v3 5/8] perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU
Date: Tue, 24 Mar 2026 01:29:49 +0000
Message-ID: <20260324012952.1923296-6-bwicaksono@nvidia.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260324012952.1923296-1-bwicaksono@nvidia.com>
References: <20260324012952.1923296-1-bwicaksono@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add PCIE-TGT PMU support for the Tegra410 SOC. This PMU is instantiated
in each root complex in the SOC and captures traffic originating from
any source towards the PCIE BAR and CXL HDM ranges. The traffic can be
filtered based on the destination root port or target address range.

Reviewed-by: Ilkka Koskinen
Signed-off-by: Besar Wicaksono
---
 .../admin-guide/perf/nvidia-tegra410-pmu.rst  |  77 +++++
 drivers/perf/arm_cspmu/nvidia_cspmu.c         | 321 ++++++++++++++++++
 2 files changed, 398 insertions(+)

diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
index b8cfbb80be1c..c065764d41fe 100644
--- a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
@@ -7,6 +7,7 @@ metrics like memory bandwidth, latency, and utilization:
 
 * Unified Coherence Fabric (UCF)
 * PCIE
+* PCIE-TGT
 
 PMU Driver
 ----------
@@ -212,6 +213,11 @@ Example usage:
 
   perf stat -a -e nvidia_pcie_pmu_0_rc_4/event=0x4,src_bdf=0x0180,src_bdf_en=0x1/
 
+.. _NVIDIA_T410_PCIE_PMU_RC_Mapping_Section:
+
+Mapping the RC# to lspci segment number
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 Mapping the RC# to lspci segment number can be non-trivial; hence a new NVIDIA
 Designated Vendor Specific Capability (DVSEC) register is added into the PCIE
 config space for each RP. This DVSEC has vendor id "10de" and DVSEC id of
 "0x4". The DVSEC register
@@ -267,3 +273,74 @@ Example output::
 
   000d:40:00.0: Bus=40, Segment=0d, RP=01, RC=04, Socket=01
   000d:c0:00.0: Bus=c0, Segment=0d, RP=02, RC=04, Socket=01
   000e:00:00.0: Bus=00, Segment=0e, RP=00, RC=05, Socket=01
+
+PCIE-TGT PMU
+------------
+
+This PMU is located in the SOC fabric connecting the PCIE root complex (RC) and
+the memory subsystem. It monitors traffic targeting PCIE BAR and CXL HDM ranges.
+There is one PCIE-TGT PMU per PCIE RC in the SoC. Each RC in the Tegra410 SoC can
+have up to 16 lanes that can be bifurcated into up to 8 root ports (RP). The PMU
+provides an RP filter to count PCIE BAR traffic to each RP and an address filter
+to count accesses to the PCIE BAR or CXL HDM ranges. The details of the filters
+are described in the following sections.
+
+Mapping the RC# to lspci segment number is similar to the PCIE PMU. Please see
+:ref:`NVIDIA_T410_PCIE_PMU_RC_Mapping_Section` for more info.
+
+The events and configuration options of this PMU device are available in sysfs,
+see /sys/bus/event_source/devices/nvidia_pcie_tgt_pmu_<socket-id>_rc_<rc-id>.
+
+The events in this PMU can be used to measure bandwidth and utilization:
+
+ * rd_req: count the number of read requests to PCIE.
+ * wr_req: count the number of write requests to PCIE.
+ * rd_bytes: count the number of bytes transferred by rd_req.
+ * wr_bytes: count the number of bytes transferred by wr_req.
+ * cycles: count the clock cycles of the SOC fabric connected to the PCIE interface.
+
+The average bandwidth is calculated as::
+
+   AVG_RD_BANDWIDTH_IN_GBPS = RD_BYTES / ELAPSED_TIME_IN_NS
+   AVG_WR_BANDWIDTH_IN_GBPS = WR_BYTES / ELAPSED_TIME_IN_NS
+
+The average request rate is calculated as::
+
+   AVG_RD_REQUEST_RATE = RD_REQ / CYCLES
+   AVG_WR_REQUEST_RATE = WR_REQ / CYCLES
+
+The PMU events can be filtered based on the destination root port or target
+address range. Filtering based on RP is only available for PCIE BAR traffic.
+The address filter works for both PCIE BAR and CXL HDM ranges. These filters
+can be found in sysfs, see
+/sys/bus/event_source/devices/nvidia_pcie_tgt_pmu_<socket-id>_rc_<rc-id>/format/.
+
+Destination filter settings:
+
+* dst_rp_mask: bitmask to select the root port(s) to monitor. E.g. "dst_rp_mask=0xFF"
+  corresponds to all root ports (from 0 to 7) in the PCIE RC. Note that this filter
+  is only available for PCIE BAR traffic.
+* dst_addr_base: BAR or CXL HDM filter base address.
+* dst_addr_mask: BAR or CXL HDM filter address mask.
+* dst_addr_en: enable the BAR or CXL HDM address range filter. If this is set, the
+  address range specified by "dst_addr_base" and "dst_addr_mask" will be used to
+  filter the PCIE BAR and CXL HDM traffic address. The PMU uses the following
+  comparison to determine if the traffic destination address falls within the
+  filter range::
+
+    (txn's addr & dst_addr_mask) == (dst_addr_base & dst_addr_mask)
+
+  If the comparison succeeds, then the event will be counted.
+
+If the destination filter is not specified, the RP filter will be configured by
+default to count PCIE BAR traffic to all root ports.
+
+Example usage:
+
+* Count event id 0x0 to root port 0 and 1 of PCIE RC-0 on socket 0::
+
+   perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_0/event=0x0,dst_rp_mask=0x3/
+
+* Count event id 0x1 for accesses to PCIE BAR or CXL HDM address range
+  0x10000 to 0x100FF on socket 0's PCIE RC-1::
+
+   perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_1/event=0x1,dst_addr_base=0x10000,dst_addr_mask=0xFFF00,dst_addr_en=0x1/
diff --git a/drivers/perf/arm_cspmu/nvidia_cspmu.c b/drivers/perf/arm_cspmu/nvidia_cspmu.c
index 61fde84ea343..bac83e424d6d 100644
--- a/drivers/perf/arm_cspmu/nvidia_cspmu.c
+++ b/drivers/perf/arm_cspmu/nvidia_cspmu.c
@@ -42,6 +42,24 @@
 #define NV_PCIE_V2_FILTER2_DST		GENMASK_ULL(NV_PCIE_V2_DST_COUNT - 1, 0)
 #define NV_PCIE_V2_FILTER2_DEFAULT	NV_PCIE_V2_FILTER2_DST
 
+#define NV_PCIE_TGT_PORT_COUNT		8ULL
+#define NV_PCIE_TGT_EV_TYPE_CC		0x4
+#define NV_PCIE_TGT_EV_TYPE_COUNT	3ULL
+#define NV_PCIE_TGT_EV_TYPE_MASK	GENMASK_ULL(NV_PCIE_TGT_EV_TYPE_COUNT - 1, 0)
+#define NV_PCIE_TGT_FILTER2_MASK	GENMASK_ULL(NV_PCIE_TGT_PORT_COUNT, 0)
+#define NV_PCIE_TGT_FILTER2_PORT	GENMASK_ULL(NV_PCIE_TGT_PORT_COUNT - 1, 0)
+#define NV_PCIE_TGT_FILTER2_ADDR_EN	BIT(NV_PCIE_TGT_PORT_COUNT)
+#define NV_PCIE_TGT_FILTER2_ADDR	GENMASK_ULL(15, NV_PCIE_TGT_PORT_COUNT)
+#define NV_PCIE_TGT_FILTER2_DEFAULT	NV_PCIE_TGT_FILTER2_PORT
+
+#define NV_PCIE_TGT_ADDR_COUNT		8ULL
+#define NV_PCIE_TGT_ADDR_STRIDE		20
+#define NV_PCIE_TGT_ADDR_CTRL		0xD38
+#define NV_PCIE_TGT_ADDR_BASE_LO	0xD3C
+#define NV_PCIE_TGT_ADDR_BASE_HI	0xD40
+#define NV_PCIE_TGT_ADDR_MASK_LO	0xD44
+#define NV_PCIE_TGT_ADDR_MASK_HI	0xD48
+
 #define NV_GENERIC_FILTER_ID_MASK	GENMASK_ULL(31, 0)
 
 #define NV_PRODID_MASK (PMIIDR_PRODUCTID | PMIIDR_VARIANT | PMIIDR_REVISION)
@@ -185,6 +203,15 @@ static struct attribute *pcie_v2_pmu_event_attrs[] = {
 	NULL
 };
 
+static struct attribute *pcie_tgt_pmu_event_attrs[] = {
+	ARM_CSPMU_EVENT_ATTR(rd_bytes, 0x0),
+	ARM_CSPMU_EVENT_ATTR(wr_bytes, 0x1),
+	ARM_CSPMU_EVENT_ATTR(rd_req, 0x2),
+	ARM_CSPMU_EVENT_ATTR(wr_req, 0x3),
+	ARM_CSPMU_EVENT_ATTR(cycles, NV_PCIE_TGT_EV_TYPE_CC),
+	NULL
+};
+
 static struct attribute *generic_pmu_event_attrs[] = {
 	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
 	NULL,
@@ -238,6 +265,15 @@ static struct attribute *pcie_v2_pmu_format_attrs[] = {
 	NULL
 };
 
+static struct attribute *pcie_tgt_pmu_format_attrs[] = {
+	ARM_CSPMU_FORMAT_ATTR(event, "config:0-2"),
+	ARM_CSPMU_FORMAT_ATTR(dst_rp_mask, "config:3-10"),
+	ARM_CSPMU_FORMAT_ATTR(dst_addr_en, "config:11"),
+	ARM_CSPMU_FORMAT_ATTR(dst_addr_base, "config1:0-63"),
+	ARM_CSPMU_FORMAT_ATTR(dst_addr_mask, "config2:0-63"),
+	NULL
+};
+
 static struct attribute *generic_pmu_format_attrs[] = {
 	ARM_CSPMU_FORMAT_EVENT_ATTR,
 	ARM_CSPMU_FORMAT_FILTER_ATTR,
@@ -477,6 +513,267 @@ static int pcie_v2_pmu_validate_event(struct arm_cspmu *cspmu,
 	return 0;
 }
 
+struct pcie_tgt_addr_filter {
+	u32 refcount;
+	u64 base;
+	u64 mask;
+};
+
+struct pcie_tgt_data {
+	struct pcie_tgt_addr_filter addr_filter[NV_PCIE_TGT_ADDR_COUNT];
+	void __iomem *addr_filter_reg;
+};
+
+#if defined(CONFIG_ACPI) && defined(CONFIG_ARM64)
+static int pcie_tgt_init_data(struct arm_cspmu *cspmu)
+{
+	int ret;
+	struct acpi_device *adev;
+	struct pcie_tgt_data *data;
+	struct list_head resource_list;
+	struct resource_entry *rentry;
+	struct nv_cspmu_ctx *ctx = to_nv_cspmu_ctx(cspmu);
+	struct device *dev = cspmu->dev;
+
+	data = devm_kzalloc(dev, sizeof(struct pcie_tgt_data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	adev = arm_cspmu_acpi_dev_get(cspmu);
+	if (!adev) {
+		dev_err(dev, "failed to get associated PCIE-TGT device\n");
+		return -ENODEV;
+	}
+
+	INIT_LIST_HEAD(&resource_list);
+	ret = acpi_dev_get_memory_resources(adev, &resource_list);
+	if (ret < 0) {
+		dev_err(dev, "failed to get PCIE-TGT device memory resources\n");
+		acpi_dev_put(adev);
+		return ret;
+	}
+
+	rentry = list_first_entry_or_null(
+			&resource_list, struct resource_entry, node);
+	if (rentry) {
+		data->addr_filter_reg = devm_ioremap_resource(dev, rentry->res);
+		ret = 0;
+	}
+
+	if (IS_ERR(data->addr_filter_reg)) {
+		dev_err(dev, "failed to get address filter resource\n");
+		ret = PTR_ERR(data->addr_filter_reg);
+	}
+
+	acpi_dev_free_resource_list(&resource_list);
+	acpi_dev_put(adev);
+
+	ctx->data = data;
+
+	return ret;
+}
+#else
+static int pcie_tgt_init_data(struct arm_cspmu *cspmu)
+{
+	return -ENODEV;
+}
+#endif
+
+static struct pcie_tgt_data *pcie_tgt_get_data(struct arm_cspmu *cspmu)
+{
+	struct nv_cspmu_ctx *ctx = to_nv_cspmu_ctx(cspmu);
+
+	return ctx->data;
+}
+
+/* Find the first available address filter slot. */
+static int pcie_tgt_find_addr_idx(struct arm_cspmu *cspmu, u64 base, u64 mask,
+				  bool is_reset)
+{
+	int i;
+	struct pcie_tgt_data *data = pcie_tgt_get_data(cspmu);
+
+	for (i = 0; i < NV_PCIE_TGT_ADDR_COUNT; i++) {
+		if (!is_reset && data->addr_filter[i].refcount == 0)
+			return i;
+
+		if (data->addr_filter[i].base == base &&
+		    data->addr_filter[i].mask == mask)
+			return i;
+	}
+
+	return -ENODEV;
+}
+
+static u32 pcie_tgt_pmu_event_filter(const struct perf_event *event)
+{
+	u32 filter;
+
+	filter = (event->attr.config >> NV_PCIE_TGT_EV_TYPE_COUNT) &
+		 NV_PCIE_TGT_FILTER2_MASK;
+
+	return filter;
+}
+
+static bool pcie_tgt_pmu_addr_en(const struct perf_event *event)
+{
+	u32 filter = pcie_tgt_pmu_event_filter(event);
+
+	return FIELD_GET(NV_PCIE_TGT_FILTER2_ADDR_EN, filter) != 0;
+}
+
+static u32 pcie_tgt_pmu_port_filter(const struct perf_event *event)
+{
+	u32 filter = pcie_tgt_pmu_event_filter(event);
+
+	return FIELD_GET(NV_PCIE_TGT_FILTER2_PORT, filter);
+}
+
+static u64 pcie_tgt_pmu_dst_addr_base(const struct perf_event *event)
+{
+	return event->attr.config1;
+}
+
+static u64 pcie_tgt_pmu_dst_addr_mask(const struct perf_event *event)
+{
+	return event->attr.config2;
+}
+
+static int pcie_tgt_pmu_validate_event(struct arm_cspmu *cspmu,
+				       struct perf_event *new_ev)
+{
+	u64 base, mask;
+	int idx;
+
+	if (!pcie_tgt_pmu_addr_en(new_ev))
+		return 0;
+
+	/* Make sure there is a slot available for the address filter. */
+	base = pcie_tgt_pmu_dst_addr_base(new_ev);
+	mask = pcie_tgt_pmu_dst_addr_mask(new_ev);
+	idx = pcie_tgt_find_addr_idx(cspmu, base, mask, false);
+	if (idx < 0)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void pcie_tgt_pmu_config_addr_filter(struct arm_cspmu *cspmu,
+					    bool en, u64 base, u64 mask, int idx)
+{
+	struct pcie_tgt_data *data;
+	struct pcie_tgt_addr_filter *filter;
+	void __iomem *filter_reg;
+
+	data = pcie_tgt_get_data(cspmu);
+	filter = &data->addr_filter[idx];
+	filter_reg = data->addr_filter_reg + (idx * NV_PCIE_TGT_ADDR_STRIDE);
+
+	if (en) {
+		filter->refcount++;
+		if (filter->refcount == 1) {
+			filter->base = base;
+			filter->mask = mask;
+
+			writel(lower_32_bits(base), filter_reg + NV_PCIE_TGT_ADDR_BASE_LO);
+			writel(upper_32_bits(base), filter_reg + NV_PCIE_TGT_ADDR_BASE_HI);
+			writel(lower_32_bits(mask), filter_reg + NV_PCIE_TGT_ADDR_MASK_LO);
+			writel(upper_32_bits(mask), filter_reg + NV_PCIE_TGT_ADDR_MASK_HI);
+			writel(1, filter_reg + NV_PCIE_TGT_ADDR_CTRL);
+		}
+	} else {
+		filter->refcount--;
+		if (filter->refcount == 0) {
+			writel(0, filter_reg + NV_PCIE_TGT_ADDR_CTRL);
+			writel(0, filter_reg + NV_PCIE_TGT_ADDR_BASE_LO);
+			writel(0, filter_reg + NV_PCIE_TGT_ADDR_BASE_HI);
+			writel(0, filter_reg + NV_PCIE_TGT_ADDR_MASK_LO);
+			writel(0, filter_reg + NV_PCIE_TGT_ADDR_MASK_HI);
+
+			filter->base = 0;
+			filter->mask = 0;
+		}
+	}
+}
+
+static void pcie_tgt_pmu_set_ev_filter(struct arm_cspmu *cspmu,
+				       const struct perf_event *event)
+{
+	bool addr_filter_en;
+	int idx;
+	u32 filter2_val, filter2_offset, port_filter;
+	u64 base, mask;
+
+	filter2_val = 0;
+	filter2_offset = PMEVFILT2R + (4 * event->hw.idx);
+
+	addr_filter_en = pcie_tgt_pmu_addr_en(event);
+	if (addr_filter_en) {
+		base = pcie_tgt_pmu_dst_addr_base(event);
+		mask = pcie_tgt_pmu_dst_addr_mask(event);
+		idx = pcie_tgt_find_addr_idx(cspmu, base, mask, false);
+
+		if (idx < 0) {
+			dev_err(cspmu->dev,
+				"Unable to find a slot for address filtering\n");
+			writel(0, cspmu->base0 + filter2_offset);
+			return;
+		}
+
+		/* Configure address range filter registers. */
+		pcie_tgt_pmu_config_addr_filter(cspmu, true, base, mask, idx);
+
+		/* Config the counter to use the selected address filter slot. */
+		filter2_val |= FIELD_PREP(NV_PCIE_TGT_FILTER2_ADDR, 1U << idx);
+	}
+
+	port_filter = pcie_tgt_pmu_port_filter(event);
+
+	/* Monitor all ports if no filter is selected. */
+	if (!addr_filter_en && port_filter == 0)
+		port_filter = NV_PCIE_TGT_FILTER2_PORT;
+
+	filter2_val |= FIELD_PREP(NV_PCIE_TGT_FILTER2_PORT, port_filter);
+
+	writel(filter2_val, cspmu->base0 + filter2_offset);
+}
+
+static void pcie_tgt_pmu_reset_ev_filter(struct arm_cspmu *cspmu,
+					 const struct perf_event *event)
+{
+	bool addr_filter_en;
+	u64 base, mask;
+	int idx;
+
+	addr_filter_en = pcie_tgt_pmu_addr_en(event);
+	if (!addr_filter_en)
+		return;
+
+	base = pcie_tgt_pmu_dst_addr_base(event);
+	mask = pcie_tgt_pmu_dst_addr_mask(event);
+	idx = pcie_tgt_find_addr_idx(cspmu, base, mask, true);
+
+	if (idx < 0) {
+		dev_err(cspmu->dev,
+			"Unable to find the address filter slot to reset\n");
+		return;
+	}
+
+	pcie_tgt_pmu_config_addr_filter(cspmu, false, base, mask, idx);
+}
+
+static u32 pcie_tgt_pmu_event_type(const struct perf_event *event)
+{
+	return event->attr.config & NV_PCIE_TGT_EV_TYPE_MASK;
+}
+
+static bool pcie_tgt_pmu_is_cycle_counter_event(const struct perf_event *event)
+{
+	u32 event_type = pcie_tgt_pmu_event_type(event);
+
+	return event_type == NV_PCIE_TGT_EV_TYPE_CC;
+}
+
 enum nv_cspmu_name_fmt {
 	NAME_FMT_GENERIC,
 	NAME_FMT_SOCKET,
@@ -618,6 +915,28 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
 			.reset_ev_filter = nv_cspmu_reset_ev_filter,
 		}
 	},
+	{
+		.prodid = 0x10700000,
+		.prodid_mask = NV_PRODID_MASK,
+		.name_pattern = "nvidia_pcie_tgt_pmu_%u_rc_%u",
+		.name_fmt = NAME_FMT_SOCKET_INST,
+		.template_ctx = {
+			.event_attr = pcie_tgt_pmu_event_attrs,
+			.format_attr = pcie_tgt_pmu_format_attrs,
+			.filter_mask = 0x0,
+			.filter_default_val = 0x0,
+			.filter2_mask = NV_PCIE_TGT_FILTER2_MASK,
+			.filter2_default_val = NV_PCIE_TGT_FILTER2_DEFAULT,
+			.init_data = pcie_tgt_init_data
+		},
+		.ops = {
+			.is_cycle_counter_event = pcie_tgt_pmu_is_cycle_counter_event,
+			.event_type = pcie_tgt_pmu_event_type,
+			.validate_event = pcie_tgt_pmu_validate_event,
+			.set_ev_filter = pcie_tgt_pmu_set_ev_filter,
+			.reset_ev_filter = pcie_tgt_pmu_reset_ev_filter,
+		}
+	},
 	{
 		.prodid = 0,
 		.prodid_mask = 0,
@@ -710,6 +1029,8 @@ static int nv_cspmu_init_ops(struct arm_cspmu *cspmu)
 
 	/* NVIDIA specific callbacks. */
 	SET_OP(validate_event, impl_ops, match, NULL);
+	SET_OP(event_type, impl_ops, match, NULL);
+	SET_OP(is_cycle_counter_event, impl_ops, match, NULL);
 	SET_OP(set_cc_filter, impl_ops, match, nv_cspmu_set_cc_filter);
 	SET_OP(set_ev_filter, impl_ops, match, nv_cspmu_set_ev_filter);
 	SET_OP(reset_ev_filter, impl_ops, match, NULL);
-- 
2.43.0

From nobody Fri Apr 3 04:41:07 2026
From: Besar Wicaksono
Subject: [PATCH v3 6/8] perf: add NVIDIA Tegra410 CPU Memory Latency PMU
Date: Tue, 24 Mar 2026 01:29:50 +0000
Message-ID: <20260324012952.1923296-7-bwicaksono@nvidia.com>
In-Reply-To: <20260324012952.1923296-1-bwicaksono@nvidia.com>
References: <20260324012952.1923296-1-bwicaksono@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Add CPU Memory (CMEM) Latency PMU support on the Tegra410 SoC. The PMU
measures the latency of memory read requests from the edge of the Unified
Coherence Fabric (UCF) to the local system DRAM.

Reviewed-by: Ilkka Koskinen
Signed-off-by: Besar Wicaksono
---
 .../admin-guide/perf/nvidia-tegra410-pmu.rst |  25 +
 drivers/perf/Kconfig                         |   7 +
 drivers/perf/Makefile                        |   1 +
 drivers/perf/nvidia_t410_cmem_latency_pmu.c  | 736 ++++++++++++++++++
 4 files changed, 769 insertions(+)
 create mode 100644 drivers/perf/nvidia_t410_cmem_latency_pmu.c

diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
index c065764d41fe..9945c43f6a7a 100644
--- a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
@@ -8,6 +8,7 @@ metrics like memory bandwidth, latency, and utilization:
 * Unified Coherence Fabric (UCF)
 * PCIE
 * PCIE-TGT
+* CPU Memory (CMEM) Latency
 
 PMU Driver
 ----------
@@ -344,3 +345,27 @@ Example usage:
    0x10000 to 0x100FF on socket 0's PCIE RC-1::
 
     perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_1/event=0x1,dst_addr_base=0x10000,dst_addr_mask=0xFFF00,dst_addr_en=0x1/
+
+CPU Memory (CMEM) Latency PMU
+-----------------------------
+
+This PMU monitors latency events of memory read requests from the edge of the
+Unified Coherence Fabric (UCF) to the local CPU DRAM:
+
+ * RD_REQ counters: count read requests (32B per request).
+ * RD_CUM_OUTS counters: accumulated outstanding-request counters, which
+   track how many cycles the read requests are in flight.
+ * CYCLES counter: counts the number of elapsed cycles.
+
+The average latency is calculated as::
+
+   FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+   AVG_LATENCY_IN_CYCLES = RD_CUM_OUTS / RD_REQ
+   AVG_LATENCY_IN_NS = AVG_LATENCY_IN_CYCLES / FREQ_IN_GHZ
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_source/devices/nvidia_cmem_latency_pmu_.
+
+Example usage::
+
+  perf stat -a -e '{nvidia_cmem_latency_pmu_0/rd_req/,nvidia_cmem_latency_pmu_0/rd_cum_outs/,nvidia_cmem_latency_pmu_0/cycles/}'
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 638321fc9800..26e86067d8f9 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -311,4 +311,11 @@ config MARVELL_PEM_PMU
 	  Enable support for PCIe Interface performance monitoring
 	  on Marvell platform.
 
+config NVIDIA_TEGRA410_CMEM_LATENCY_PMU
+	tristate "NVIDIA Tegra410 CPU Memory Latency PMU"
+	depends on ARM64 && ACPI
+	help
+	  Enable perf support for CPU memory latency counter monitoring on
+	  the NVIDIA Tegra410 SoC.
+
 endmenu
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index ea52711a87e3..4aa6aad393c2 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -35,3 +35,4 @@ obj-$(CONFIG_DWC_PCIE_PMU) += dwc_pcie_pmu.o
 obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += arm_cspmu/
 obj-$(CONFIG_MESON_DDR_PMU) += amlogic/
 obj-$(CONFIG_CXL_PMU) += cxl_pmu.o
+obj-$(CONFIG_NVIDIA_TEGRA410_CMEM_LATENCY_PMU) += nvidia_t410_cmem_latency_pmu.o
diff --git a/drivers/perf/nvidia_t410_cmem_latency_pmu.c b/drivers/perf/nvidia_t410_cmem_latency_pmu.c
new file mode 100644
index 000000000000..acb8f5571522
--- /dev/null
+++ b/drivers/perf/nvidia_t410_cmem_latency_pmu.c
@@ -0,0 +1,736 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVIDIA Tegra410 CPU Memory (CMEM) Latency PMU driver.
+ *
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitmap.h>
+#include <linux/cpuhotplug.h>
+#include <linux/cpumask.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+
+#define NUM_INSTANCES 14
+
+/* Register offsets. */
+#define CMEM_LAT_CG_CTRL		0x800
+#define CMEM_LAT_CTRL			0x808
+#define CMEM_LAT_STATUS			0x810
+#define CMEM_LAT_CYCLE_CNTR		0x818
+#define CMEM_LAT_MC0_REQ_CNTR		0x820
+#define CMEM_LAT_MC0_AOR_CNTR		0x830
+#define CMEM_LAT_MC1_REQ_CNTR		0x838
+#define CMEM_LAT_MC1_AOR_CNTR		0x848
+#define CMEM_LAT_MC2_REQ_CNTR		0x850
+#define CMEM_LAT_MC2_AOR_CNTR		0x860
+
+/* CMEM_LAT_CTRL values. */
+#define CMEM_LAT_CTRL_DISABLE		0x0ULL
+#define CMEM_LAT_CTRL_ENABLE		0x1ULL
+#define CMEM_LAT_CTRL_CLR		0x2ULL
+
+/* CMEM_LAT_CG_CTRL values. */
+#define CMEM_LAT_CG_CTRL_DISABLE	0x0ULL
+#define CMEM_LAT_CG_CTRL_ENABLE		0x1ULL
+
+/* CMEM_LAT_STATUS register fields. */
+#define CMEM_LAT_STATUS_CYCLE_OVF	BIT(0)
+#define CMEM_LAT_STATUS_MC0_AOR_OVF	BIT(1)
+#define CMEM_LAT_STATUS_MC0_REQ_OVF	BIT(3)
+#define CMEM_LAT_STATUS_MC1_AOR_OVF	BIT(4)
+#define CMEM_LAT_STATUS_MC1_REQ_OVF	BIT(6)
+#define CMEM_LAT_STATUS_MC2_AOR_OVF	BIT(7)
+#define CMEM_LAT_STATUS_MC2_REQ_OVF	BIT(9)
+
+/* Events. */
+#define CMEM_LAT_EVENT_CYCLES		0x0
+#define CMEM_LAT_EVENT_REQ		0x1
+#define CMEM_LAT_EVENT_AOR		0x2
+
+#define CMEM_LAT_NUM_EVENTS		0x3
+#define CMEM_LAT_MASK_EVENT		0x3
+#define CMEM_LAT_MAX_ACTIVE_EVENTS	32
+
+#define CMEM_LAT_ACTIVE_CPU_MASK	0x0
+#define CMEM_LAT_ASSOCIATED_CPU_MASK	0x1
+
+static unsigned long cmem_lat_pmu_cpuhp_state;
+
+struct cmem_lat_pmu_hw_events {
+	struct perf_event *events[CMEM_LAT_MAX_ACTIVE_EVENTS];
+	DECLARE_BITMAP(used_ctrs, CMEM_LAT_MAX_ACTIVE_EVENTS);
+};
+
+struct cmem_lat_pmu {
+	struct pmu pmu;
+	struct device *dev;
+	const char *name;
+	const char *identifier;
+	void __iomem *base_broadcast;
+	void __iomem *base[NUM_INSTANCES];
+	cpumask_t associated_cpus;
+	cpumask_t active_cpu;
+	struct hlist_node node;
+	struct cmem_lat_pmu_hw_events hw_events;
+};
+
+#define to_cmem_lat_pmu(p) \
+	container_of(p, struct cmem_lat_pmu, pmu)
+
+/* Get event type from perf_event. */
+static inline u32 get_event_type(struct perf_event *event)
+{
+	return (event->attr.config) & CMEM_LAT_MASK_EVENT;
+}
+
+/* PMU operations. */
+static int cmem_lat_pmu_get_event_idx(struct cmem_lat_pmu_hw_events *hw_events,
+				      struct perf_event *event)
+{
+	unsigned int idx;
+
+	idx = find_first_zero_bit(hw_events->used_ctrs, CMEM_LAT_MAX_ACTIVE_EVENTS);
+	if (idx >= CMEM_LAT_MAX_ACTIVE_EVENTS)
+		return -EAGAIN;
+
+	set_bit(idx, hw_events->used_ctrs);
+
+	return idx;
+}
+
+static bool cmem_lat_pmu_validate_event(struct pmu *pmu,
+					struct cmem_lat_pmu_hw_events *hw_events,
+					struct perf_event *event)
+{
+	int ret;
+
+	if (is_software_event(event))
+		return true;
+
+	/* Reject groups spanning multiple HW PMUs. */
+	if (event->pmu != pmu)
+		return false;
+
+	ret = cmem_lat_pmu_get_event_idx(hw_events, event);
+	if (ret < 0)
+		return false;
+
+	return true;
+}
+
+/* Make sure the group of events can be scheduled at once on the PMU. */
+static bool cmem_lat_pmu_validate_group(struct perf_event *event)
+{
+	struct perf_event *sibling, *leader = event->group_leader;
+	struct cmem_lat_pmu_hw_events fake_hw_events;
+
+	if (event->group_leader == event)
+		return true;
+
+	memset(&fake_hw_events, 0, sizeof(fake_hw_events));
+
+	if (!cmem_lat_pmu_validate_event(event->pmu, &fake_hw_events, leader))
+		return false;
+
+	for_each_sibling_event(sibling, leader) {
+		if (!cmem_lat_pmu_validate_event(event->pmu, &fake_hw_events, sibling))
+			return false;
+	}
+
+	return cmem_lat_pmu_validate_event(event->pmu, &fake_hw_events, event);
+}
+
+static int cmem_lat_pmu_event_init(struct perf_event *event)
+{
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+	u32 event_type = get_event_type(event);
+
+	if (event->attr.type != event->pmu->type ||
+	    event_type >= CMEM_LAT_NUM_EVENTS)
+		return -ENOENT;
+
+	/*
+	 * Sampling, per-process mode, and per-task counters are not supported
+	 * since this PMU is shared across all CPUs.
+	 */
+	if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK) {
+		dev_dbg(cmem_lat_pmu->pmu.dev,
+			"Can't support sampling and per-process mode\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (event->cpu < 0) {
+		dev_dbg(cmem_lat_pmu->pmu.dev, "Can't support per-task counters\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Make sure the CPU assignment is on one of the CPUs associated with
+	 * this PMU.
+	 */
+	if (!cpumask_test_cpu(event->cpu, &cmem_lat_pmu->associated_cpus)) {
+		dev_dbg(cmem_lat_pmu->pmu.dev,
+			"Requested cpu is not associated with the PMU\n");
+		return -EINVAL;
+	}
+
+	/* Enforce the current active CPU to handle the events in this PMU. */
+	event->cpu = cpumask_first(&cmem_lat_pmu->active_cpu);
+	if (event->cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	if (!cmem_lat_pmu_validate_group(event))
+		return -EINVAL;
+
+	hwc->idx = -1;
+	hwc->config = event_type;
+
+	return 0;
+}
+
+static u64 cmem_lat_pmu_read_status(struct cmem_lat_pmu *cmem_lat_pmu,
+				    unsigned int inst)
+{
+	return readq(cmem_lat_pmu->base[inst] + CMEM_LAT_STATUS);
+}
+
+static u64 cmem_lat_pmu_read_cycle_counter(struct perf_event *event)
+{
+	const unsigned int instance = 0;
+	u64 status;
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
+	struct device *dev = cmem_lat_pmu->dev;
+
+	/*
+	 * Use the reading from the first instance since all instances are
+	 * identical.
+	 */
+	status = cmem_lat_pmu_read_status(cmem_lat_pmu, instance);
+	if (status & CMEM_LAT_STATUS_CYCLE_OVF)
+		dev_warn(dev, "Cycle counter overflow\n");
+
+	return readq(cmem_lat_pmu->base[instance] + CMEM_LAT_CYCLE_CNTR);
+}
+
+static u64 cmem_lat_pmu_read_req_counter(struct perf_event *event)
+{
+	unsigned int i;
+	u64 status, val = 0;
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
+	struct device *dev = cmem_lat_pmu->dev;
+
+	/* Sum up the counts from all instances. */
+	for (i = 0; i < NUM_INSTANCES; i++) {
+		status = cmem_lat_pmu_read_status(cmem_lat_pmu, i);
+		if (status & CMEM_LAT_STATUS_MC0_REQ_OVF)
+			dev_warn(dev, "MC0 request counter overflow\n");
+		if (status & CMEM_LAT_STATUS_MC1_REQ_OVF)
+			dev_warn(dev, "MC1 request counter overflow\n");
+		if (status & CMEM_LAT_STATUS_MC2_REQ_OVF)
+			dev_warn(dev, "MC2 request counter overflow\n");
+
+		val += readq(cmem_lat_pmu->base[i] + CMEM_LAT_MC0_REQ_CNTR);
+		val += readq(cmem_lat_pmu->base[i] + CMEM_LAT_MC1_REQ_CNTR);
+		val += readq(cmem_lat_pmu->base[i] + CMEM_LAT_MC2_REQ_CNTR);
+	}
+
+	return val;
+}
+
+static u64 cmem_lat_pmu_read_aor_counter(struct perf_event *event)
+{
+	unsigned int i;
+	u64 status, val = 0;
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
+	struct device *dev = cmem_lat_pmu->dev;
+
+	/* Sum up the counts from all instances. */
+	for (i = 0; i < NUM_INSTANCES; i++) {
+		status = cmem_lat_pmu_read_status(cmem_lat_pmu, i);
+		if (status & CMEM_LAT_STATUS_MC0_AOR_OVF)
+			dev_warn(dev, "MC0 AOR counter overflow\n");
+		if (status & CMEM_LAT_STATUS_MC1_AOR_OVF)
+			dev_warn(dev, "MC1 AOR counter overflow\n");
+		if (status & CMEM_LAT_STATUS_MC2_AOR_OVF)
+			dev_warn(dev, "MC2 AOR counter overflow\n");
+
+		val += readq(cmem_lat_pmu->base[i] + CMEM_LAT_MC0_AOR_CNTR);
+		val += readq(cmem_lat_pmu->base[i] + CMEM_LAT_MC1_AOR_CNTR);
+		val += readq(cmem_lat_pmu->base[i] + CMEM_LAT_MC2_AOR_CNTR);
+	}
+
+	return val;
+}
+
+static u64 (*read_counter_fn[CMEM_LAT_NUM_EVENTS])(struct perf_event *) = {
+	[CMEM_LAT_EVENT_CYCLES] = cmem_lat_pmu_read_cycle_counter,
+	[CMEM_LAT_EVENT_REQ] = cmem_lat_pmu_read_req_counter,
+	[CMEM_LAT_EVENT_AOR] = cmem_lat_pmu_read_aor_counter,
+};
+
+static void cmem_lat_pmu_event_update(struct perf_event *event)
+{
+	u32 event_type;
+	u64 prev, now;
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (hwc->state & PERF_HES_STOPPED)
+		return;
+
+	event_type = hwc->config;
+
+	do {
+		prev = local64_read(&hwc->prev_count);
+		now = read_counter_fn[event_type](event);
+	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
+
+	local64_add(now - prev, &event->count);
+
+	hwc->state |= PERF_HES_UPTODATE;
+}
+
+static void cmem_lat_pmu_start(struct perf_event *event, int pmu_flags)
+{
+	event->hw.state = 0;
+}
+
+static void cmem_lat_pmu_stop(struct perf_event *event, int pmu_flags)
+{
+	event->hw.state |= PERF_HES_STOPPED;
+}
+
+static int cmem_lat_pmu_add(struct perf_event *event, int flags)
+{
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
+	struct cmem_lat_pmu_hw_events *hw_events = &cmem_lat_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx;
+
+	if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &cmem_lat_pmu->associated_cpus)))
+		return -ENOENT;
+
+	idx = cmem_lat_pmu_get_event_idx(hw_events, event);
+	if (idx < 0)
+		return idx;
+
+	hw_events->events[idx] = event;
+	hwc->idx = idx;
+	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
+	if (flags & PERF_EF_START)
+		cmem_lat_pmu_start(event, PERF_EF_RELOAD);
+
+	/* Propagate changes to the userspace mapping. */
+	perf_event_update_userpage(event);
+
+	return 0;
+}
+
+static void cmem_lat_pmu_del(struct perf_event *event, int flags)
+{
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(event->pmu);
+	struct cmem_lat_pmu_hw_events *hw_events = &cmem_lat_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	cmem_lat_pmu_stop(event, PERF_EF_UPDATE);
+
+	hw_events->events[idx] = NULL;
+
+	clear_bit(idx, hw_events->used_ctrs);
+
+	perf_event_update_userpage(event);
+}
+
+static void cmem_lat_pmu_read(struct perf_event *event)
+{
+	cmem_lat_pmu_event_update(event);
+}
+
+static inline void cmem_lat_pmu_cg_ctrl(struct cmem_lat_pmu *cmem_lat_pmu,
+					u64 val)
+{
+	writeq(val, cmem_lat_pmu->base_broadcast + CMEM_LAT_CG_CTRL);
+}
+
+static inline void cmem_lat_pmu_ctrl(struct cmem_lat_pmu *cmem_lat_pmu, u64 val)
+{
+	writeq(val, cmem_lat_pmu->base_broadcast + CMEM_LAT_CTRL);
+}
+
+static void cmem_lat_pmu_enable(struct pmu *pmu)
+{
+	bool disabled;
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(pmu);
+
+	disabled = bitmap_empty(cmem_lat_pmu->hw_events.used_ctrs,
+				CMEM_LAT_MAX_ACTIVE_EVENTS);
+
+	if (disabled)
+		return;
+
+	/* Enable all the counters. */
+	cmem_lat_pmu_cg_ctrl(cmem_lat_pmu, CMEM_LAT_CG_CTRL_ENABLE);
+	cmem_lat_pmu_ctrl(cmem_lat_pmu, CMEM_LAT_CTRL_ENABLE);
+}
+
+static void cmem_lat_pmu_disable(struct pmu *pmu)
+{
+	int idx;
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(pmu);
+
+	/* Disable all the counters. */
+	cmem_lat_pmu_ctrl(cmem_lat_pmu, CMEM_LAT_CTRL_DISABLE);
+
+	/*
+	 * The counters will start from 0 again on restart.
+	 * Update the events immediately to avoid losing the counts.
+	 */
+	for_each_set_bit(idx, cmem_lat_pmu->hw_events.used_ctrs,
+			 CMEM_LAT_MAX_ACTIVE_EVENTS) {
+		struct perf_event *event = cmem_lat_pmu->hw_events.events[idx];
+
+		if (!event)
+			continue;
+
+		cmem_lat_pmu_event_update(event);
+
+		local64_set(&event->hw.prev_count, 0ULL);
+	}
+
+	cmem_lat_pmu_ctrl(cmem_lat_pmu, CMEM_LAT_CTRL_CLR);
+	cmem_lat_pmu_cg_ctrl(cmem_lat_pmu, CMEM_LAT_CG_CTRL_DISABLE);
+}
+
+/* PMU identifier attribute. */
+static ssize_t cmem_lat_pmu_identifier_show(struct device *dev,
+					    struct device_attribute *attr,
+					    char *page)
+{
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(dev_get_drvdata(dev));
+
+	return sysfs_emit(page, "%s\n", cmem_lat_pmu->identifier);
+}
+
+static struct device_attribute cmem_lat_pmu_identifier_attr =
+	__ATTR(identifier, 0444, cmem_lat_pmu_identifier_show, NULL);
+
+static struct attribute *cmem_lat_pmu_identifier_attrs[] = {
+	&cmem_lat_pmu_identifier_attr.attr,
+	NULL
+};
+
+static struct attribute_group cmem_lat_pmu_identifier_attr_group = {
+	.attrs = cmem_lat_pmu_identifier_attrs,
+};
+
+/* Format attributes. */
+#define NV_PMU_EXT_ATTR(_name, _func, _config)			\
+	(&((struct dev_ext_attribute[]){			\
+		{						\
+			.attr = __ATTR(_name, 0444, _func, NULL), \
+			.var = (void *)_config			\
+		}						\
+	})[0].attr.attr)
+
+static struct attribute *cmem_lat_pmu_formats[] = {
+	NV_PMU_EXT_ATTR(event, device_show_string, "config:0-1"),
+	NULL
+};
+
+static const struct attribute_group cmem_lat_pmu_format_group = {
+	.name = "format",
+	.attrs = cmem_lat_pmu_formats,
+};
+
+/* Event attributes. */
+static ssize_t cmem_lat_pmu_sysfs_event_show(struct device *dev,
+					     struct device_attribute *attr,
+					     char *buf)
+{
+	struct perf_pmu_events_attr *pmu_attr;
+
+	pmu_attr = container_of(attr, typeof(*pmu_attr), attr);
+	return sysfs_emit(buf, "event=0x%llx\n", pmu_attr->id);
+}
+
+#define NV_PMU_EVENT_ATTR(_name, _config) \
+	PMU_EVENT_ATTR_ID(_name, cmem_lat_pmu_sysfs_event_show, _config)
+
+static struct attribute *cmem_lat_pmu_events[] = {
+	NV_PMU_EVENT_ATTR(cycles, CMEM_LAT_EVENT_CYCLES),
+	NV_PMU_EVENT_ATTR(rd_req, CMEM_LAT_EVENT_REQ),
+	NV_PMU_EVENT_ATTR(rd_cum_outs, CMEM_LAT_EVENT_AOR),
+	NULL
+};
+
+static const struct attribute_group cmem_lat_pmu_events_group = {
+	.name = "events",
+	.attrs = cmem_lat_pmu_events,
+};
+
+/* Cpumask attributes. */
+static ssize_t cmem_lat_pmu_cpumask_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct cmem_lat_pmu *cmem_lat_pmu = to_cmem_lat_pmu(pmu);
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	unsigned long mask_id = (unsigned long)eattr->var;
+	const cpumask_t *cpumask;
+
+	switch (mask_id) {
+	case CMEM_LAT_ACTIVE_CPU_MASK:
+		cpumask = &cmem_lat_pmu->active_cpu;
+		break;
+	case CMEM_LAT_ASSOCIATED_CPU_MASK:
+		cpumask = &cmem_lat_pmu->associated_cpus;
+		break;
+	default:
+		return 0;
+	}
+	return cpumap_print_to_pagebuf(true, buf, cpumask);
+}
+
+#define NV_PMU_CPUMASK_ATTR(_name, _config) \
+	NV_PMU_EXT_ATTR(_name, cmem_lat_pmu_cpumask_show, \
+			(unsigned long)_config)
+
+static struct attribute *cmem_lat_pmu_cpumask_attrs[] = {
+	NV_PMU_CPUMASK_ATTR(cpumask, CMEM_LAT_ACTIVE_CPU_MASK),
+	NV_PMU_CPUMASK_ATTR(associated_cpus, CMEM_LAT_ASSOCIATED_CPU_MASK),
+	NULL
+};
+
+static const struct attribute_group cmem_lat_pmu_cpumask_attr_group = {
+	.attrs = cmem_lat_pmu_cpumask_attrs,
+};
+
+/* Per PMU device attribute groups. */
+static const struct attribute_group *cmem_lat_pmu_attr_groups[] = {
+	&cmem_lat_pmu_identifier_attr_group,
+	&cmem_lat_pmu_format_group,
+	&cmem_lat_pmu_events_group,
+	&cmem_lat_pmu_cpumask_attr_group,
+	NULL
+};
+
+static int cmem_lat_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+	struct cmem_lat_pmu *cmem_lat_pmu =
+		hlist_entry_safe(node, struct cmem_lat_pmu, node);
+
+	if (!cpumask_test_cpu(cpu, &cmem_lat_pmu->associated_cpus))
+		return 0;
+
+	/* If the PMU is already managed, there is nothing to do */
+	if (!cpumask_empty(&cmem_lat_pmu->active_cpu))
+		return 0;
+
+	/* Use this CPU for event counting */
+	cpumask_set_cpu(cpu, &cmem_lat_pmu->active_cpu);
+
+	return 0;
+}
+
+static int cmem_lat_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
+{
+	unsigned int dst;
+	struct cmem_lat_pmu *cmem_lat_pmu =
+		hlist_entry_safe(node, struct cmem_lat_pmu, node);
+
+	/* Nothing to do if this CPU doesn't own the PMU */
+	if (!cpumask_test_and_clear_cpu(cpu, &cmem_lat_pmu->active_cpu))
+		return 0;
+
+	/* Choose a new CPU to migrate ownership of the PMU to */
+	dst = cpumask_any_and_but(&cmem_lat_pmu->associated_cpus,
+				  cpu_online_mask, cpu);
+	if (dst >= nr_cpu_ids)
+		return 0;
+
+	/* Use this CPU for event counting */
+	perf_pmu_migrate_context(&cmem_lat_pmu->pmu, cpu, dst);
+	cpumask_set_cpu(dst, &cmem_lat_pmu->active_cpu);
+
+	return 0;
+}
+
+static int cmem_lat_pmu_get_cpus(struct cmem_lat_pmu *cmem_lat_pmu,
+				 unsigned int socket)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (cpu_to_node(cpu) == socket)
+			cpumask_set_cpu(cpu, &cmem_lat_pmu->associated_cpus);
+	}
+
+	if (cpumask_empty(&cmem_lat_pmu->associated_cpus)) {
+		dev_dbg(cmem_lat_pmu->dev,
+			"No cpu associated with PMU socket-%u\n", socket);
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+static int cmem_lat_pmu_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct acpi_device *acpi_dev;
+	struct cmem_lat_pmu *cmem_lat_pmu;
+	char *name, *uid_str;
+	int ret, i;
+	u32 socket;
+
+	acpi_dev = ACPI_COMPANION(dev);
+	if (!acpi_dev)
+		return -ENODEV;
+
+	uid_str = acpi_device_uid(acpi_dev);
+	if (!uid_str)
+		return -ENODEV;
+
+	ret = kstrtou32(uid_str, 0, &socket);
+	if (ret)
+		return ret;
+
+	cmem_lat_pmu = devm_kzalloc(dev, sizeof(*cmem_lat_pmu), GFP_KERNEL);
+	name = devm_kasprintf(dev, GFP_KERNEL, "nvidia_cmem_latency_pmu_%u", socket);
+	if (!cmem_lat_pmu || !name)
+		return -ENOMEM;
+
+	cmem_lat_pmu->dev = dev;
+	cmem_lat_pmu->name = name;
+	cmem_lat_pmu->identifier = acpi_device_hid(acpi_dev);
+	platform_set_drvdata(pdev, cmem_lat_pmu);
+
+	cmem_lat_pmu->pmu = (struct pmu) {
+		.parent = &pdev->dev,
+		.task_ctx_nr = perf_invalid_context,
+		.pmu_enable = cmem_lat_pmu_enable,
+		.pmu_disable = cmem_lat_pmu_disable,
+		.event_init = cmem_lat_pmu_event_init,
+		.add = cmem_lat_pmu_add,
+		.del = cmem_lat_pmu_del,
+		.start = cmem_lat_pmu_start,
+		.stop = cmem_lat_pmu_stop,
+		.read = cmem_lat_pmu_read,
+		.attr_groups = cmem_lat_pmu_attr_groups,
+		.capabilities = PERF_PMU_CAP_NO_EXCLUDE |
+				PERF_PMU_CAP_NO_INTERRUPT,
+	};
+
+	/* Map the address of all the instances. */
+	for (i = 0; i < NUM_INSTANCES; i++) {
+		cmem_lat_pmu->base[i] = devm_platform_ioremap_resource(pdev, i);
+		if (IS_ERR(cmem_lat_pmu->base[i])) {
+			dev_err(dev, "Failed to map address for instance %d\n", i);
+			return PTR_ERR(cmem_lat_pmu->base[i]);
+		}
+	}
+
+	/* Map the broadcast address. */
+	cmem_lat_pmu->base_broadcast = devm_platform_ioremap_resource(pdev,
+								      NUM_INSTANCES);
+	if (IS_ERR(cmem_lat_pmu->base_broadcast)) {
+		dev_err(dev, "Failed to map broadcast address\n");
+		return PTR_ERR(cmem_lat_pmu->base_broadcast);
+	}
+
+	ret = cmem_lat_pmu_get_cpus(cmem_lat_pmu, socket);
+	if (ret)
+		return ret;
+
+	ret = cpuhp_state_add_instance(cmem_lat_pmu_cpuhp_state,
+				       &cmem_lat_pmu->node);
+	if (ret) {
+		dev_err(&pdev->dev, "Error %d registering hotplug\n", ret);
+		return ret;
+	}
+
+	cmem_lat_pmu_cg_ctrl(cmem_lat_pmu, CMEM_LAT_CG_CTRL_ENABLE);
+	cmem_lat_pmu_ctrl(cmem_lat_pmu, CMEM_LAT_CTRL_CLR);
+	cmem_lat_pmu_cg_ctrl(cmem_lat_pmu, CMEM_LAT_CG_CTRL_DISABLE);
+
+	ret = perf_pmu_register(&cmem_lat_pmu->pmu, name, -1);
+	if (ret) {
+		dev_err(&pdev->dev, "Failed to register PMU: %d\n", ret);
+		cpuhp_state_remove_instance(cmem_lat_pmu_cpuhp_state,
+					    &cmem_lat_pmu->node);
+		return ret;
+	}
+
+	dev_dbg(&pdev->dev, "Registered %s PMU\n", name);
+
+	return 0;
+}
+
+static void cmem_lat_pmu_device_remove(struct platform_device *pdev)
+{
+	struct cmem_lat_pmu *cmem_lat_pmu = platform_get_drvdata(pdev);
+
+	perf_pmu_unregister(&cmem_lat_pmu->pmu);
+	cpuhp_state_remove_instance(cmem_lat_pmu_cpuhp_state,
+				    &cmem_lat_pmu->node);
+}
+
+static const struct acpi_device_id cmem_lat_pmu_acpi_match[] = {
+	{ "NVDA2021" },
+	{ }
+};
+MODULE_DEVICE_TABLE(acpi, cmem_lat_pmu_acpi_match);
+
+static struct platform_driver cmem_lat_pmu_driver = {
+	.driver = {
+		.name = "nvidia-t410-cmem-latency-pmu",
+		.acpi_match_table = ACPI_PTR(cmem_lat_pmu_acpi_match),
+		.suppress_bind_attrs = true,
+	},
+	.probe = cmem_lat_pmu_probe,
+	.remove = cmem_lat_pmu_device_remove,
+};
+
+static int __init cmem_lat_pmu_init(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+				      "perf/nvidia/cmem_latency:online",
+				      cmem_lat_pmu_cpu_online,
+				      cmem_lat_pmu_cpu_teardown);
+	if (ret < 0)
+		return ret;
+
+	cmem_lat_pmu_cpuhp_state = ret;
+
+	return platform_driver_register(&cmem_lat_pmu_driver);
+}
+
+static void __exit cmem_lat_pmu_exit(void)
+{
+	platform_driver_unregister(&cmem_lat_pmu_driver);
+	cpuhp_remove_multi_state(cmem_lat_pmu_cpuhp_state);
+}
+
+module_init(cmem_lat_pmu_init);
+module_exit(cmem_lat_pmu_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("NVIDIA Tegra410 CPU Memory Latency PMU driver");
+MODULE_AUTHOR("Besar Wicaksono");
--
2.43.0
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Mon, 23 Mar 2026 18:30:22 -0700 Received: from build-bwicaksono-noble-20251018.internal (10.127.8.11) by mail.nvidia.com (10.126.190.180) with Microsoft SMTP Server id 15.2.2562.20 via Frontend Transport; Mon, 23 Mar 2026 18:30:22 -0700 From: Besar Wicaksono To: , , , , CC: , , , , , , , , , , , , , , , , Besar Wicaksono Subject: [PATCH v3 7/8] perf: add NVIDIA Tegra410 C2C PMU Date: Tue, 24 Mar 2026 01:29:51 +0000 Message-ID: <20260324012952.1923296-8-bwicaksono@nvidia.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260324012952.1923296-1-bwicaksono@nvidia.com> References: <20260324012952.1923296-1-bwicaksono@nvidia.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-NV-OnPremToCloud: ExternallySecured X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: MN1PEPF0000F0E5:EE_|LV3PR12MB9355:EE_ X-MS-Office365-Filtering-Correlation-Id: ae1cb6be-1dac-43b9-d1f4-08de8944f8ae X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|1800799024|376014|7416014|36860700016|82310400026|56012099003|18002099003|22082099003; X-Microsoft-Antispam-Message-Info: 
Adds NVIDIA C2C PMU support in the Tegra410 SoC. This PMU is used to measure
memory latency between the SoC and device memory, e.g. GPU Memory (GMEM),
CXL memory, or memory on a remote Tegra410 SoC.

Reviewed-by: Ilkka Koskinen
Signed-off-by: Besar Wicaksono
---
 .../admin-guide/perf/nvidia-tegra410-pmu.rst |  151 +++
 drivers/perf/Kconfig                         |    7 +
 drivers/perf/Makefile                        |    1 +
 drivers/perf/nvidia_t410_c2c_pmu.c           | 1051 +++++++++++++++++
 4 files changed, 1210 insertions(+)
 create mode 100644 drivers/perf/nvidia_t410_c2c_pmu.c

diff --git a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
index 9945c43f6a7a..0656223b61d4 100644
--- a/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
@@ -9,6 +9,9 @@ metrics like memory bandwidth, latency, and utilization:
 * PCIE
 * PCIE-TGT
 * CPU Memory (CMEM) Latency
+* NVLink-C2C
+* NV-CLink
+* NV-DLink
 
 PMU Driver
 ----------
@@ -369,3 +372,151 @@ see /sys/bus/event_source/devices/nvidia_cmem_latency_pmu_.
 Example usage::
 
   perf stat -a -e '{nvidia_cmem_latency_pmu_0/rd_req/,nvidia_cmem_latency_pmu_0/rd_cum_outs/,nvidia_cmem_latency_pmu_0/cycles/}'
+
+NVLink-C2C PMU
+--------------
+
+This PMU monitors latency events of memory read/write requests that pass through
+the NVIDIA Chip-to-Chip (C2C) interface. Bandwidth events are not available
+in this PMU, unlike the C2C PMU in Grace (Tegra241 SoC).
+
+The events and configuration options of this PMU device are available in sysfs,
+see /sys/bus/event_source/devices/nvidia_nvlink_c2c_pmu_.
+
+The list of events:
+
+ * IN_RD_CUM_OUTS: accumulated outstanding requests (in cycles) of incoming read requests.
+ * IN_RD_REQ: the number of incoming read requests.
+ * IN_WR_CUM_OUTS: accumulated outstanding requests (in cycles) of incoming write requests.
+ * IN_WR_REQ: the number of incoming write requests.
+ * OUT_RD_CUM_OUTS: accumulated outstanding requests (in cycles) of outgoing read requests.
+ * OUT_RD_REQ: the number of outgoing read requests.
+ * OUT_WR_CUM_OUTS: accumulated outstanding requests (in cycles) of outgoing write requests.
+ * OUT_WR_REQ: the number of outgoing write requests.
+ * CYCLES: NVLink-C2C interface cycle counts.
+
+The incoming events count the reads/writes from the remote device to the SoC.
+The outgoing events count the reads/writes from the SoC to the remote device.
+
+The sysfs file /sys/bus/event_source/devices/nvidia_nvlink_c2c_pmu_/peer
+contains information about the connected device.
+
+When the C2C interface is connected to GPU(s), the user can use the
+"gpu_mask" parameter to filter traffic to/from specific GPU(s). Each bit
+represents a GPU index, e.g. "gpu_mask=0x1" corresponds to GPU 0 and
+"gpu_mask=0x3" to GPU 0 and GPU 1. If not specified, the PMU monitors all
+GPUs by default.
+
+When connected to another SoC, only the read events are available.
+
+The events can be used to calculate the average latency of the read/write
+requests::
+
+  C2C_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+
+  IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ
+  IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
+
+  IN_WR_AVG_LATENCY_IN_CYCLES = IN_WR_CUM_OUTS / IN_WR_REQ
+  IN_WR_AVG_LATENCY_IN_NS = IN_WR_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
+
+  OUT_RD_AVG_LATENCY_IN_CYCLES = OUT_RD_CUM_OUTS / OUT_RD_REQ
+  OUT_RD_AVG_LATENCY_IN_NS = OUT_RD_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
+
+  OUT_WR_AVG_LATENCY_IN_CYCLES = OUT_WR_CUM_OUTS / OUT_WR_REQ
+  OUT_WR_AVG_LATENCY_IN_NS = OUT_WR_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ
+
+Example usage:
+
+ * Count incoming traffic from all GPUs connected via NVLink-C2C::
+
+     perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_req/
+
+ * Count incoming traffic from GPU 0 connected via NVLink-C2C::
+
+     perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_cum_outs,gpu_mask=0x1/
+
+ * Count incoming traffic from GPU 1 connected via NVLink-C2C::
+
+     perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_cum_outs,gpu_mask=0x2/
+
+ * Count outgoing traffic to all GPUs connected via NVLink-C2C::
+
+     perf stat -a -e nvidia_nvlink_c2c_pmu_0/out_rd_req/
+
+ * Count outgoing traffic to GPU 0 connected via NVLink-C2C::
+
+     perf stat -a -e nvidia_nvlink_c2c_pmu_0/out_rd_cum_outs,gpu_mask=0x1/
+
+ * Count outgoing traffic to GPU 1 connected via NVLink-C2C::
+
+     perf stat -a -e nvidia_nvlink_c2c_pmu_0/out_rd_cum_outs,gpu_mask=0x2/
+
+NV-CLink PMU
+------------
+
+This PMU monitors latency events of memory read requests that pass through
+the NV-CLink interface. Bandwidth events are not available in this PMU.
+In the Tegra410 SoC, the NV-CLink interface is used to connect to another
+Tegra410 SoC and this PMU only counts read traffic.
+
+The events and configuration options of this PMU device are available in sysfs,
+see /sys/bus/event_source/devices/nvidia_nvclink_pmu_.
+
+The list of events:
+
+ * IN_RD_CUM_OUTS: accumulated outstanding requests (in cycles) of incoming read requests.
+ * IN_RD_REQ: the number of incoming read requests.
+ * OUT_RD_CUM_OUTS: accumulated outstanding requests (in cycles) of outgoing read requests.
+ * OUT_RD_REQ: the number of outgoing read requests.
+ * CYCLES: NV-CLink interface cycle counts.
+
+The incoming events count the reads from the remote device to the SoC.
+The outgoing events count the reads from the SoC to the remote device.
+
+The events can be used to calculate the average latency of the read requests::
+
+  CLINK_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+
+  IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ
+  IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / CLINK_FREQ_IN_GHZ
+
+  OUT_RD_AVG_LATENCY_IN_CYCLES = OUT_RD_CUM_OUTS / OUT_RD_REQ
+  OUT_RD_AVG_LATENCY_IN_NS = OUT_RD_AVG_LATENCY_IN_CYCLES / CLINK_FREQ_IN_GHZ
+
+Example usage:
+
+ * Count incoming read traffic from the remote SoC connected via NV-CLink::
+
+     perf stat -a -e nvidia_nvclink_pmu_0/in_rd_req/
+
+ * Count outgoing read traffic to the remote SoC connected via NV-CLink::
+
+     perf stat -a -e nvidia_nvclink_pmu_0/out_rd_req/
+
+NV-DLink PMU
+------------
+
+This PMU monitors latency events of memory read requests that pass through
+the NV-DLink interface. Bandwidth events are not available in this PMU.
+In the Tegra410 SoC, this PMU only counts CXL memory read traffic.
+
+The events and configuration options of this PMU device are available in sysfs,
+see /sys/bus/event_source/devices/nvidia_nvdlink_pmu_.
+
+The list of events:
+
+ * IN_RD_CUM_OUTS: accumulated outstanding read requests (in cycles) to CXL memory.
+ * IN_RD_REQ: the number of read requests to CXL memory.
+ * CYCLES: NV-DLink interface cycle counts.
+
+The events can be used to calculate the average latency of the read requests::
+
+  DLINK_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+
+  IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ
+  IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / DLINK_FREQ_IN_GHZ
+
+Example usage:
+
+ * Count read events to CXL memory::
+
+     perf stat -a -e '{nvidia_nvdlink_pmu_0/in_rd_req/,nvidia_nvdlink_pmu_0/in_rd_cum_outs/}'
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 26e86067d8f9..ab90932fc2d0 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -318,4 +318,11 @@ config NVIDIA_TEGRA410_CMEM_LATENCY_PMU
	  Enable perf support for CPU memory latency counters monitoring
	  on NVIDIA Tegra410 SoC.
 
+config NVIDIA_TEGRA410_C2C_PMU
+	tristate "NVIDIA Tegra410 C2C PMU"
+	depends on ARM64 && ACPI
+	help
+	  Enable perf support for counters in the NVIDIA C2C interface of the
+	  NVIDIA Tegra410 SoC.
+
 endmenu
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index 4aa6aad393c2..eb8a022dad9a 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -36,3 +36,4 @@ obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += arm_cspmu/
 obj-$(CONFIG_MESON_DDR_PMU) += amlogic/
 obj-$(CONFIG_CXL_PMU) += cxl_pmu.o
 obj-$(CONFIG_NVIDIA_TEGRA410_CMEM_LATENCY_PMU) += nvidia_t410_cmem_latency_pmu.o
+obj-$(CONFIG_NVIDIA_TEGRA410_C2C_PMU) += nvidia_t410_c2c_pmu.o
diff --git a/drivers/perf/nvidia_t410_c2c_pmu.c b/drivers/perf/nvidia_t410_c2c_pmu.c
new file mode 100644
index 000000000000..411987153ff3
--- /dev/null
+++ b/drivers/perf/nvidia_t410_c2c_pmu.c
@@ -0,0 +1,1051 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVIDIA Tegra410 C2C PMU driver.
+ *
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/* The C2C interface types in Tegra410. */
+#define C2C_TYPE_NVLINK		0x0
+#define C2C_TYPE_NVCLINK	0x1
+#define C2C_TYPE_NVDLINK	0x2
+#define C2C_TYPE_COUNT		0x3
+
+/* The type of the peer device connected to the C2C interface. */
+#define C2C_PEER_TYPE_CPU	0x0
+#define C2C_PEER_TYPE_GPU	0x1
+#define C2C_PEER_TYPE_CXLMEM	0x2
+#define C2C_PEER_TYPE_COUNT	0x3
+
+/* The number of peer devices that can be connected to the C2C interface. */
+#define C2C_NR_PEER_CPU		0x1
+#define C2C_NR_PEER_GPU		0x2
+#define C2C_NR_PEER_CXLMEM	0x1
+#define C2C_NR_PEER_MAX		0x2
+
+/* Number of instances on each interface. */
+#define C2C_NR_INST_NVLINK	14
+#define C2C_NR_INST_NVCLINK	12
+#define C2C_NR_INST_NVDLINK	16
+#define C2C_NR_INST_MAX		16
+
+/* Register offsets. */
+#define C2C_CTRL			0x864
+#define C2C_IN_STATUS			0x868
+#define C2C_CYCLE_CNTR			0x86c
+#define C2C_IN_RD_CUM_OUTS_CNTR		0x874
+#define C2C_IN_RD_REQ_CNTR		0x87c
+#define C2C_IN_WR_CUM_OUTS_CNTR		0x884
+#define C2C_IN_WR_REQ_CNTR		0x88c
+#define C2C_OUT_STATUS			0x890
+#define C2C_OUT_RD_CUM_OUTS_CNTR	0x898
+#define C2C_OUT_RD_REQ_CNTR		0x8a0
+#define C2C_OUT_WR_CUM_OUTS_CNTR	0x8a8
+#define C2C_OUT_WR_REQ_CNTR		0x8b0
+
+/* C2C_IN_STATUS register fields. */
+#define C2C_IN_STATUS_CYCLE_OVF			BIT(0)
+#define C2C_IN_STATUS_IN_RD_CUM_OUTS_OVF	BIT(1)
+#define C2C_IN_STATUS_IN_RD_REQ_OVF		BIT(2)
+#define C2C_IN_STATUS_IN_WR_CUM_OUTS_OVF	BIT(3)
+#define C2C_IN_STATUS_IN_WR_REQ_OVF		BIT(4)
+
+/* C2C_OUT_STATUS register fields. */
+#define C2C_OUT_STATUS_OUT_RD_CUM_OUTS_OVF	BIT(0)
+#define C2C_OUT_STATUS_OUT_RD_REQ_OVF		BIT(1)
+#define C2C_OUT_STATUS_OUT_WR_CUM_OUTS_OVF	BIT(2)
+#define C2C_OUT_STATUS_OUT_WR_REQ_OVF		BIT(3)
+
+/* Events. */
+#define C2C_EVENT_CYCLES		0x0
+#define C2C_EVENT_IN_RD_CUM_OUTS	0x1
+#define C2C_EVENT_IN_RD_REQ		0x2
+#define C2C_EVENT_IN_WR_CUM_OUTS	0x3
+#define C2C_EVENT_IN_WR_REQ		0x4
+#define C2C_EVENT_OUT_RD_CUM_OUTS	0x5
+#define C2C_EVENT_OUT_RD_REQ		0x6
+#define C2C_EVENT_OUT_WR_CUM_OUTS	0x7
+#define C2C_EVENT_OUT_WR_REQ		0x8
+
+#define C2C_NUM_EVENTS		0x9
+#define C2C_MASK_EVENT		0xFF
+#define C2C_MAX_ACTIVE_EVENTS	32
+
+#define C2C_ACTIVE_CPU_MASK	0x0
+#define C2C_ASSOCIATED_CPU_MASK	0x1
+
+/*
+ * Maximum poll count for reading a counter value using the high-low-high
+ * sequence.
+ */
+#define HILOHI_MAX_POLL	1000
+
+static unsigned long nv_c2c_pmu_cpuhp_state;
+
+/* PMU descriptor. */
+
+/* C2C type information. */
+struct nv_c2c_pmu_data {
+	unsigned int c2c_type;
+	unsigned int nr_inst;
+	const char *name_fmt;
+};
+
+static const struct nv_c2c_pmu_data nv_c2c_pmu_data[] = {
+	[C2C_TYPE_NVLINK] = {
+		.c2c_type = C2C_TYPE_NVLINK,
+		.nr_inst = C2C_NR_INST_NVLINK,
+		.name_fmt = "nvidia_nvlink_c2c_pmu_%u",
+	},
+	[C2C_TYPE_NVCLINK] = {
+		.c2c_type = C2C_TYPE_NVCLINK,
+		.nr_inst = C2C_NR_INST_NVCLINK,
+		.name_fmt = "nvidia_nvclink_pmu_%u",
+	},
+	[C2C_TYPE_NVDLINK] = {
+		.c2c_type = C2C_TYPE_NVDLINK,
+		.nr_inst = C2C_NR_INST_NVDLINK,
+		.name_fmt = "nvidia_nvdlink_pmu_%u",
+	},
+};
+
+/* Tracks the events assigned to the PMU for a given logical index. */
+struct nv_c2c_pmu_hw_events {
+	/* The events that are active. */
+	struct perf_event *events[C2C_MAX_ACTIVE_EVENTS];
+
+	/*
+	 * Each bit indicates a logical counter is being used (or not) for an
+	 * event.
+	 */
+	DECLARE_BITMAP(used_ctrs, C2C_MAX_ACTIVE_EVENTS);
+};
+
+struct nv_c2c_pmu {
+	struct pmu pmu;
+	struct device *dev;
+	struct acpi_device *acpi_dev;
+
+	const char *name;
+	const char *identifier;
+
+	const struct nv_c2c_pmu_data *data;
+	unsigned int peer_type;
+	unsigned int socket;
+	unsigned int nr_peer;
+	unsigned long peer_insts[C2C_NR_PEER_MAX][BITS_TO_LONGS(C2C_NR_INST_MAX)];
+	u32 filter_default;
+
+	struct nv_c2c_pmu_hw_events hw_events;
+
+	cpumask_t associated_cpus;
+	cpumask_t active_cpu;
+
+	struct hlist_node cpuhp_node;
+
+	const struct attribute_group **attr_groups;
+
+	void __iomem *base_broadcast;
+	void __iomem *base[C2C_NR_INST_MAX];
+};
+
+#define to_c2c_pmu(p) (container_of(p, struct nv_c2c_pmu, pmu))
+
+/* Get the event type from a perf_event. */
+static inline u32 get_event_type(struct perf_event *event)
+{
+	return (event->attr.config) & C2C_MASK_EVENT;
+}
+
+static inline u32 get_filter_mask(struct perf_event *event)
+{
+	u32 filter;
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
+
+	filter = ((u32)event->attr.config1) & c2c_pmu->filter_default;
+	if (filter == 0)
+		filter = c2c_pmu->filter_default;
+
+	return filter;
+}
+
+/* PMU operations. */
+
+static int nv_c2c_pmu_get_event_idx(struct nv_c2c_pmu_hw_events *hw_events,
+				    struct perf_event *event)
+{
+	u32 idx;
+
+	idx = find_first_zero_bit(hw_events->used_ctrs, C2C_MAX_ACTIVE_EVENTS);
+	if (idx >= C2C_MAX_ACTIVE_EVENTS)
+		return -EAGAIN;
+
+	set_bit(idx, hw_events->used_ctrs);
+
+	return idx;
+}
+
+static bool
+nv_c2c_pmu_validate_event(struct pmu *pmu,
+			  struct nv_c2c_pmu_hw_events *hw_events,
+			  struct perf_event *event)
+{
+	if (is_software_event(event))
+		return true;
+
+	/* Reject groups spanning multiple HW PMUs. */
+	if (event->pmu != pmu)
+		return false;
+
+	return nv_c2c_pmu_get_event_idx(hw_events, event) >= 0;
+}
+
+/*
+ * Make sure the group of events can be scheduled at once
+ * on the PMU.
+ */
+static bool nv_c2c_pmu_validate_group(struct perf_event *event)
+{
+	struct perf_event *sibling, *leader = event->group_leader;
+	struct nv_c2c_pmu_hw_events fake_hw_events;
+
+	if (event->group_leader == event)
+		return true;
+
+	memset(&fake_hw_events, 0, sizeof(fake_hw_events));
+
+	if (!nv_c2c_pmu_validate_event(event->pmu, &fake_hw_events, leader))
+		return false;
+
+	for_each_sibling_event(sibling, leader) {
+		if (!nv_c2c_pmu_validate_event(event->pmu, &fake_hw_events,
+					       sibling))
+			return false;
+	}
+
+	return nv_c2c_pmu_validate_event(event->pmu, &fake_hw_events, event);
+}
+
+static int nv_c2c_pmu_event_init(struct perf_event *event)
+{
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+	u32 event_type = get_event_type(event);
+
+	if (event->attr.type != event->pmu->type ||
+	    event_type >= C2C_NUM_EVENTS)
+		return -ENOENT;
+
+	/*
+	 * Following other "uncore" PMUs, we do not support sampling mode or
+	 * attaching to a task (per-process mode).
+	 */
+	if (is_sampling_event(event)) {
+		dev_dbg(c2c_pmu->pmu.dev, "Can't support sampling events\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
+		dev_dbg(c2c_pmu->pmu.dev, "Can't support per-task counters\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Make sure the CPU assignment is on one of the CPUs associated with
+	 * this PMU.
+	 */
+	if (!cpumask_test_cpu(event->cpu, &c2c_pmu->associated_cpus)) {
+		dev_dbg(c2c_pmu->pmu.dev,
+			"Requested cpu is not associated with the PMU\n");
+		return -EINVAL;
+	}
+
+	/* Enforce the current active CPU to handle the events in this PMU. */
+	event->cpu = cpumask_first(&c2c_pmu->active_cpu);
+	if (event->cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	if (!nv_c2c_pmu_validate_group(event))
+		return -EINVAL;
+
+	hwc->idx = -1;
+	hwc->config = event_type;
+
+	return 0;
+}
+
+/*
+ * Read a 64-bit register as a pair of 32-bit registers using a hi-lo-hi
+ * sequence.
+ */
+static u64 read_reg64_hilohi(const void __iomem *addr, u32 max_poll_count)
+{
+	u32 val_lo, val_hi;
+	u64 val;
+
+	/* Use a high-low-high sequence to avoid tearing. */
+	do {
+		if (max_poll_count-- == 0) {
+			pr_err("NV C2C PMU: timeout in high-low-high sequence\n");
+			return 0;
+		}
+
+		val_hi = readl(addr + 4);
+		val_lo = readl(addr);
+	} while (val_hi != readl(addr + 4));
+
+	val = (((u64)val_hi << 32) | val_lo);
+
+	return val;
+}
+
+static void nv_c2c_pmu_check_status(struct nv_c2c_pmu *c2c_pmu, u32 instance)
+{
+	u32 in_status, out_status;
+
+	in_status = readl(c2c_pmu->base[instance] + C2C_IN_STATUS);
+	out_status = readl(c2c_pmu->base[instance] + C2C_OUT_STATUS);
+
+	if (in_status || out_status)
+		dev_warn(c2c_pmu->dev,
+			 "C2C PMU overflow in: 0x%x, out: 0x%x\n",
+			 in_status, out_status);
+}
+
+static u32 nv_c2c_ctr_offset[C2C_NUM_EVENTS] = {
+	[C2C_EVENT_CYCLES]		= C2C_CYCLE_CNTR,
+	[C2C_EVENT_IN_RD_CUM_OUTS]	= C2C_IN_RD_CUM_OUTS_CNTR,
+	[C2C_EVENT_IN_RD_REQ]		= C2C_IN_RD_REQ_CNTR,
+	[C2C_EVENT_IN_WR_CUM_OUTS]	= C2C_IN_WR_CUM_OUTS_CNTR,
+	[C2C_EVENT_IN_WR_REQ]		= C2C_IN_WR_REQ_CNTR,
+	[C2C_EVENT_OUT_RD_CUM_OUTS]	= C2C_OUT_RD_CUM_OUTS_CNTR,
+	[C2C_EVENT_OUT_RD_REQ]		= C2C_OUT_RD_REQ_CNTR,
+	[C2C_EVENT_OUT_WR_CUM_OUTS]	= C2C_OUT_WR_CUM_OUTS_CNTR,
+	[C2C_EVENT_OUT_WR_REQ]		= C2C_OUT_WR_REQ_CNTR,
+};
+
+static u64 nv_c2c_pmu_read_counter(struct perf_event *event)
+{
+	u32 ctr_id, ctr_offset, filter_mask, filter_idx, inst_idx;
+	unsigned long *inst_mask;
+	DECLARE_BITMAP(filter_bitmap, C2C_NR_PEER_MAX);
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
+	u64 val = 0;
+
+	filter_mask = get_filter_mask(event);
+	bitmap_from_arr32(filter_bitmap, &filter_mask, c2c_pmu->nr_peer);
+
+	ctr_id = event->hw.config;
+	ctr_offset = nv_c2c_ctr_offset[ctr_id];
+
+	for_each_set_bit(filter_idx, filter_bitmap, c2c_pmu->nr_peer) {
+		inst_mask = c2c_pmu->peer_insts[filter_idx];
+		for_each_set_bit(inst_idx, inst_mask,
+				 c2c_pmu->data->nr_inst) {
+			nv_c2c_pmu_check_status(c2c_pmu, inst_idx);
+
+			/*
+			 * All instances share the same clock and the driver
+			 * always enables all instances, so the cycle count
+			 * can be taken from any single instance.
+			 */
+			if (ctr_id == C2C_EVENT_CYCLES)
+				return read_reg64_hilohi(
+					c2c_pmu->base[inst_idx] + ctr_offset,
+					HILOHI_MAX_POLL);
+
+			/*
+			 * For other events, sum up the counts from all
+			 * instances.
+			 */
+			val += read_reg64_hilohi(
+				c2c_pmu->base[inst_idx] + ctr_offset,
+				HILOHI_MAX_POLL);
+		}
+	}
+
+	return val;
+}
+
+static void nv_c2c_pmu_event_update(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	u64 prev, now;
+
+	do {
+		prev = local64_read(&hwc->prev_count);
+		now = nv_c2c_pmu_read_counter(event);
+	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
+
+	local64_add(now - prev, &event->count);
+}
+
+static void nv_c2c_pmu_start(struct perf_event *event, int pmu_flags)
+{
+	event->hw.state = 0;
+}
+
+static void nv_c2c_pmu_stop(struct perf_event *event, int pmu_flags)
+{
+	event->hw.state |= PERF_HES_STOPPED;
+}
+
+static int nv_c2c_pmu_add(struct perf_event *event, int flags)
+{
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
+	struct nv_c2c_pmu_hw_events *hw_events = &c2c_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx;
+
+	if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &c2c_pmu->associated_cpus)))
+		return -ENOENT;
+
+	idx = nv_c2c_pmu_get_event_idx(hw_events, event);
+	if (idx < 0)
+		return idx;
+
+	hw_events->events[idx] = event;
+	hwc->idx = idx;
+	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
+	if (flags & PERF_EF_START)
+		nv_c2c_pmu_start(event, PERF_EF_RELOAD);
+
+	/* Propagate changes to the userspace mapping. */
+	perf_event_update_userpage(event);
+
+	return 0;
+}
+
+static void nv_c2c_pmu_del(struct perf_event *event, int flags)
+{
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(event->pmu);
+	struct nv_c2c_pmu_hw_events *hw_events = &c2c_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	nv_c2c_pmu_stop(event, PERF_EF_UPDATE);
+
+	hw_events->events[idx] = NULL;
+
+	clear_bit(idx, hw_events->used_ctrs);
+
+	perf_event_update_userpage(event);
+}
+
+static void nv_c2c_pmu_read(struct perf_event *event)
+{
+	nv_c2c_pmu_event_update(event);
+}
+
+static void nv_c2c_pmu_enable(struct pmu *pmu)
+{
+	void __iomem *bcast;
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(pmu);
+
+	/* Nothing to do if no event is active. */
+	if (bitmap_empty(c2c_pmu->hw_events.used_ctrs, C2C_MAX_ACTIVE_EVENTS))
+		return;
+
+	/* Enable all the counters. */
+	bcast = c2c_pmu->base_broadcast;
+	writel(0x1UL, bcast + C2C_CTRL);
+}
+
+static void nv_c2c_pmu_disable(struct pmu *pmu)
+{
+	unsigned int idx;
+	void __iomem *bcast;
+	struct perf_event *event;
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(pmu);
+
+	/* Disable all the counters. */
+	bcast = c2c_pmu->base_broadcast;
+	writel(0x0UL, bcast + C2C_CTRL);
+
+	/*
+	 * The counters will start from 0 again on restart.
+	 * Update the events immediately to avoid losing the counts.
+	 */
+	for_each_set_bit(idx, c2c_pmu->hw_events.used_ctrs,
+			 C2C_MAX_ACTIVE_EVENTS) {
+		event = c2c_pmu->hw_events.events[idx];
+
+		if (!event)
+			continue;
+
+		nv_c2c_pmu_event_update(event);
+
+		local64_set(&event->hw.prev_count, 0ULL);
+	}
+}
+
+/* PMU identifier attribute. */
+
+static ssize_t nv_c2c_pmu_identifier_show(struct device *dev,
+					  struct device_attribute *attr,
+					  char *page)
+{
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(dev_get_drvdata(dev));
+
+	return sysfs_emit(page, "%s\n", c2c_pmu->identifier);
+}
+
+static struct device_attribute nv_c2c_pmu_identifier_attr =
+	__ATTR(identifier, 0444, nv_c2c_pmu_identifier_show, NULL);
+
+static struct attribute *nv_c2c_pmu_identifier_attrs[] = {
+	&nv_c2c_pmu_identifier_attr.attr,
+	NULL,
+};
+
+static struct attribute_group nv_c2c_pmu_identifier_attr_group = {
+	.attrs = nv_c2c_pmu_identifier_attrs,
+};
+
+/* Peer attribute. */
+
+static ssize_t nv_c2c_pmu_peer_show(struct device *dev,
+				    struct device_attribute *attr,
+				    char *page)
+{
+	const char *peer_type[C2C_PEER_TYPE_COUNT] = {
+		[C2C_PEER_TYPE_CPU] = "cpu",
+		[C2C_PEER_TYPE_GPU] = "gpu",
+		[C2C_PEER_TYPE_CXLMEM] = "cxlmem",
+	};
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(dev_get_drvdata(dev));
+
+	return sysfs_emit(page, "nr_%s=%u\n", peer_type[c2c_pmu->peer_type],
+			  c2c_pmu->nr_peer);
+}
+
+static struct device_attribute nv_c2c_pmu_peer_attr =
+	__ATTR(peer, 0444, nv_c2c_pmu_peer_show, NULL);
+
+static struct attribute *nv_c2c_pmu_peer_attrs[] = {
+	&nv_c2c_pmu_peer_attr.attr,
+	NULL,
+};
+
+static struct attribute_group nv_c2c_pmu_peer_attr_group = {
+	.attrs = nv_c2c_pmu_peer_attrs,
+};
+
+/* Format attributes. */
+
+#define NV_C2C_PMU_EXT_ATTR(_name, _func, _config)			\
+	(&((struct dev_ext_attribute[]){				\
+		{							\
+			.attr = __ATTR(_name, 0444, _func, NULL),	\
+			.var = (void *)_config				\
+		}							\
+	})[0].attr.attr)
+
+#define NV_C2C_PMU_FORMAT_ATTR(_name, _config)				\
+	NV_C2C_PMU_EXT_ATTR(_name, device_show_string, _config)
+
+#define NV_C2C_PMU_FORMAT_EVENT_ATTR					\
+	NV_C2C_PMU_FORMAT_ATTR(event, "config:0-3")
+
+static struct attribute *nv_c2c_pmu_gpu_formats[] = {
+	NV_C2C_PMU_FORMAT_EVENT_ATTR,
+	NV_C2C_PMU_FORMAT_ATTR(gpu_mask, "config1:0-1"),
+	NULL,
+};
+
+static const struct attribute_group nv_c2c_pmu_gpu_format_group = {
+	.name = "format",
+	.attrs = nv_c2c_pmu_gpu_formats,
+};
+
+static struct attribute *nv_c2c_pmu_formats[] = {
+	NV_C2C_PMU_FORMAT_EVENT_ATTR,
+	NULL,
+};
+
+static const struct attribute_group nv_c2c_pmu_format_group = {
+	.name = "format",
+	.attrs = nv_c2c_pmu_formats,
+};
+
+/* Event attributes. */
+
+static ssize_t nv_c2c_pmu_sysfs_event_show(struct device *dev,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	struct perf_pmu_events_attr *pmu_attr;
+
+	pmu_attr = container_of(attr, typeof(*pmu_attr), attr);
+	return sysfs_emit(buf, "event=0x%llx\n", pmu_attr->id);
+}
+
+#define NV_C2C_PMU_EVENT_ATTR(_name, _config)				\
+	PMU_EVENT_ATTR_ID(_name, nv_c2c_pmu_sysfs_event_show, _config)
+
+static struct attribute *nv_c2c_pmu_gpu_events[] = {
+	NV_C2C_PMU_EVENT_ATTR(cycles, C2C_EVENT_CYCLES),
+	NV_C2C_PMU_EVENT_ATTR(in_rd_cum_outs, C2C_EVENT_IN_RD_CUM_OUTS),
+	NV_C2C_PMU_EVENT_ATTR(in_rd_req, C2C_EVENT_IN_RD_REQ),
+	NV_C2C_PMU_EVENT_ATTR(in_wr_cum_outs, C2C_EVENT_IN_WR_CUM_OUTS),
+	NV_C2C_PMU_EVENT_ATTR(in_wr_req, C2C_EVENT_IN_WR_REQ),
+	NV_C2C_PMU_EVENT_ATTR(out_rd_cum_outs, C2C_EVENT_OUT_RD_CUM_OUTS),
+	NV_C2C_PMU_EVENT_ATTR(out_rd_req, C2C_EVENT_OUT_RD_REQ),
+	NV_C2C_PMU_EVENT_ATTR(out_wr_cum_outs, C2C_EVENT_OUT_WR_CUM_OUTS),
+	NV_C2C_PMU_EVENT_ATTR(out_wr_req, C2C_EVENT_OUT_WR_REQ),
+	NULL
+};
+
+static const struct attribute_group nv_c2c_pmu_gpu_events_group = {
+	.name = "events",
+	.attrs = nv_c2c_pmu_gpu_events,
+};
+
+static struct attribute *nv_c2c_pmu_cpu_events[] = {
+	NV_C2C_PMU_EVENT_ATTR(cycles, C2C_EVENT_CYCLES),
+	NV_C2C_PMU_EVENT_ATTR(in_rd_cum_outs, C2C_EVENT_IN_RD_CUM_OUTS),
+	NV_C2C_PMU_EVENT_ATTR(in_rd_req, C2C_EVENT_IN_RD_REQ),
+	NV_C2C_PMU_EVENT_ATTR(out_rd_cum_outs, C2C_EVENT_OUT_RD_CUM_OUTS),
+	NV_C2C_PMU_EVENT_ATTR(out_rd_req, C2C_EVENT_OUT_RD_REQ),
+	NULL
+};
+
+static const struct attribute_group nv_c2c_pmu_cpu_events_group = {
+	.name = "events",
+	.attrs = nv_c2c_pmu_cpu_events,
+};
+
+static struct attribute *nv_c2c_pmu_cxlmem_events[] = {
+	NV_C2C_PMU_EVENT_ATTR(cycles, C2C_EVENT_CYCLES),
+	NV_C2C_PMU_EVENT_ATTR(in_rd_cum_outs, C2C_EVENT_IN_RD_CUM_OUTS),
+	NV_C2C_PMU_EVENT_ATTR(in_rd_req, C2C_EVENT_IN_RD_REQ),
+	NULL
+};
+
+static const struct attribute_group nv_c2c_pmu_cxlmem_events_group = {
+	.name = "events",
+	.attrs = nv_c2c_pmu_cxlmem_events,
+};
+
+/* Cpumask attributes. */
+
+static ssize_t nv_c2c_pmu_cpumask_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct nv_c2c_pmu *c2c_pmu = to_c2c_pmu(pmu);
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	unsigned long mask_id = (unsigned long)eattr->var;
+	const cpumask_t *cpumask;
+
+	switch (mask_id) {
+	case C2C_ACTIVE_CPU_MASK:
+		cpumask = &c2c_pmu->active_cpu;
+		break;
+	case C2C_ASSOCIATED_CPU_MASK:
+		cpumask = &c2c_pmu->associated_cpus;
+		break;
+	default:
+		return 0;
+	}
+	return cpumap_print_to_pagebuf(true, buf, cpumask);
+}
+
+#define NV_C2C_PMU_CPUMASK_ATTR(_name, _config)				\
+	NV_C2C_PMU_EXT_ATTR(_name, nv_c2c_pmu_cpumask_show,		\
+			    (unsigned long)_config)
+
+static struct attribute *nv_c2c_pmu_cpumask_attrs[] = {
+	NV_C2C_PMU_CPUMASK_ATTR(cpumask, C2C_ACTIVE_CPU_MASK),
+	NV_C2C_PMU_CPUMASK_ATTR(associated_cpus, C2C_ASSOCIATED_CPU_MASK),
+	NULL,
+};
+
+static const struct attribute_group nv_c2c_pmu_cpumask_attr_group = {
+	.attrs = nv_c2c_pmu_cpumask_attrs,
+};
+
+/* Attribute groups for a C2C PMU connecting the SoC and GPUs. */
+static const struct attribute_group *nv_c2c_pmu_gpu_attr_groups[] = {
+	&nv_c2c_pmu_gpu_format_group,
+	&nv_c2c_pmu_gpu_events_group,
+	&nv_c2c_pmu_cpumask_attr_group,
+	&nv_c2c_pmu_identifier_attr_group,
+	&nv_c2c_pmu_peer_attr_group,
+	NULL
+};
+
+/* Attribute groups for a C2C PMU connecting multiple SoCs. */
+static const struct attribute_group *nv_c2c_pmu_cpu_attr_groups[] = {
+	&nv_c2c_pmu_format_group,
+	&nv_c2c_pmu_cpu_events_group,
+	&nv_c2c_pmu_cpumask_attr_group,
+	&nv_c2c_pmu_identifier_attr_group,
+	&nv_c2c_pmu_peer_attr_group,
+	NULL
+};
+
+/* Attribute groups for a C2C PMU connecting the SoC and CXL memory. */
+static const struct attribute_group *nv_c2c_pmu_cxlmem_attr_groups[] = {
+	&nv_c2c_pmu_format_group,
+	&nv_c2c_pmu_cxlmem_events_group,
+	&nv_c2c_pmu_cpumask_attr_group,
+	&nv_c2c_pmu_identifier_attr_group,
+	&nv_c2c_pmu_peer_attr_group,
+	NULL
+};
+
+static int nv_c2c_pmu_online_cpu(unsigned int cpu, struct hlist_node *node)
+{
+	struct nv_c2c_pmu *c2c_pmu =
+		hlist_entry_safe(node, struct nv_c2c_pmu, cpuhp_node);
+
+	if (!cpumask_test_cpu(cpu, &c2c_pmu->associated_cpus))
+		return 0;
+
+	/* If the PMU is already managed, there is nothing to do. */
+	if (!cpumask_empty(&c2c_pmu->active_cpu))
+		return 0;
+
+	/* Use this CPU for event counting. */
+	cpumask_set_cpu(cpu, &c2c_pmu->active_cpu);
+
+	return 0;
+}
+
+static int nv_c2c_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
+{
+	unsigned int dst;
+	struct nv_c2c_pmu *c2c_pmu =
+		hlist_entry_safe(node, struct nv_c2c_pmu, cpuhp_node);
+
+	/* Nothing to do if this CPU doesn't own the PMU. */
+	if (!cpumask_test_and_clear_cpu(cpu, &c2c_pmu->active_cpu))
+		return 0;
+
+	/* Choose a new CPU to migrate ownership of the PMU to. */
+	dst = cpumask_any_and_but(&c2c_pmu->associated_cpus,
+				  cpu_online_mask, cpu);
+	if (dst >= nr_cpu_ids)
+		return 0;
+
+	/* Use the new CPU for event counting. */
+	perf_pmu_migrate_context(&c2c_pmu->pmu, cpu, dst);
+	cpumask_set_cpu(dst, &c2c_pmu->active_cpu);
+
+	return 0;
+}
+
+static int nv_c2c_pmu_get_cpus(struct nv_c2c_pmu *c2c_pmu)
+{
+	int socket = c2c_pmu->socket, cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (cpu_to_node(cpu) == socket)
+			cpumask_set_cpu(cpu, &c2c_pmu->associated_cpus);
+	}
+
+	if (cpumask_empty(&c2c_pmu->associated_cpus)) {
+		dev_dbg(c2c_pmu->dev,
+			"No cpu associated with C2C PMU socket-%u\n", socket);
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+static int nv_c2c_pmu_init_socket(struct nv_c2c_pmu *c2c_pmu)
+{
+	const char *uid_str;
+	u32 socket;
+	int ret;
+
+	uid_str = acpi_device_uid(c2c_pmu->acpi_dev);
+	if (!uid_str) {
+		dev_err(c2c_pmu->dev, "No ACPI device UID\n");
+		return -ENODEV;
+	}
+
+	ret = kstrtou32(uid_str, 0, &socket);
+	if (ret) {
+		dev_err(c2c_pmu->dev, "Failed to parse ACPI device UID\n");
+		return ret;
+	}
+
+	c2c_pmu->socket = socket;
+	return 0;
+}
+
+static int nv_c2c_pmu_init_id(struct nv_c2c_pmu *c2c_pmu)
+{
+	char *name;
+
+	name = devm_kasprintf(c2c_pmu->dev, GFP_KERNEL, c2c_pmu->data->name_fmt,
+			      c2c_pmu->socket);
+	if (!name)
+		return -ENOMEM;
+
+	c2c_pmu->name = name;
+
+	c2c_pmu->identifier = acpi_device_hid(c2c_pmu->acpi_dev);
+
+	return 0;
+}
+
+static int nv_c2c_pmu_init_filter(struct nv_c2c_pmu *c2c_pmu)
+{
+	u32 cpu_en = 0;
+	struct device *dev = c2c_pmu->dev;
+	const struct nv_c2c_pmu_data *data = c2c_pmu->data;
+
+	if (data->c2c_type == C2C_TYPE_NVDLINK) {
+		c2c_pmu->peer_type = C2C_PEER_TYPE_CXLMEM;
+
+		c2c_pmu->peer_insts[0][0] = (1UL << data->nr_inst) - 1;
+
+		c2c_pmu->nr_peer = C2C_NR_PEER_CXLMEM;
+		c2c_pmu->filter_default = (1 << c2c_pmu->nr_peer) - 1;
+
+		c2c_pmu->attr_groups = nv_c2c_pmu_cxlmem_attr_groups;
+
+		return 0;
+	}
+
+	if (device_property_read_u32(dev, "cpu_en_mask", &cpu_en))
+		dev_dbg(dev, "no cpu_en_mask property\n");
+
+	if (cpu_en) {
+		c2c_pmu->peer_type = C2C_PEER_TYPE_CPU;
+
+		/* Fill peer_insts bitmap with instances connected to peer CPU. */
+		bitmap_from_arr32(c2c_pmu->peer_insts[0], &cpu_en, data->nr_inst);
+
+		c2c_pmu->nr_peer = 1;
+		c2c_pmu->attr_groups = nv_c2c_pmu_cpu_attr_groups;
+	} else {
+		u32 i;
+		const char *props[C2C_NR_PEER_MAX] = {
+			"gpu0_en_mask", "gpu1_en_mask"
+		};
+
+		for (i = 0; i < C2C_NR_PEER_MAX; i++) {
+			u32 gpu_en = 0;
+
+			if (device_property_read_u32(dev, props[i], &gpu_en))
+				dev_dbg(dev, "no %s property\n", props[i]);
+
+			if (gpu_en) {
+				/* Fill peer_insts bitmap with instances connected to peer GPU.
+				 */
+				bitmap_from_arr32(c2c_pmu->peer_insts[i], &gpu_en,
+						  data->nr_inst);
+
+				c2c_pmu->nr_peer++;
+			}
+		}
+
+		if (c2c_pmu->nr_peer == 0) {
+			dev_err(dev, "No GPU is enabled\n");
+			return -EINVAL;
+		}
+
+		c2c_pmu->peer_type = C2C_PEER_TYPE_GPU;
+		c2c_pmu->attr_groups = nv_c2c_pmu_gpu_attr_groups;
+	}
+
+	c2c_pmu->filter_default = (1 << c2c_pmu->nr_peer) - 1;
+
+	return 0;
+}
+
+static void *nv_c2c_pmu_init_pmu(struct platform_device *pdev)
+{
+	int ret;
+	struct nv_c2c_pmu *c2c_pmu;
+	struct acpi_device *acpi_dev;
+	struct device *dev = &pdev->dev;
+
+	acpi_dev = ACPI_COMPANION(dev);
+	if (!acpi_dev)
+		return ERR_PTR(-ENODEV);
+
+	c2c_pmu = devm_kzalloc(dev, sizeof(*c2c_pmu), GFP_KERNEL);
+	if (!c2c_pmu)
+		return ERR_PTR(-ENOMEM);
+
+	c2c_pmu->dev = dev;
+	c2c_pmu->acpi_dev = acpi_dev;
+	c2c_pmu->data = device_get_match_data(dev);
+	if (!c2c_pmu->data)
+		return ERR_PTR(-EINVAL);
+
+	platform_set_drvdata(pdev, c2c_pmu);
+
+	ret = nv_c2c_pmu_init_socket(c2c_pmu);
+	if (ret)
+		return ERR_PTR(ret);
+
+	ret = nv_c2c_pmu_init_id(c2c_pmu);
+	if (ret)
+		return ERR_PTR(ret);
+
+	ret = nv_c2c_pmu_init_filter(c2c_pmu);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return c2c_pmu;
+}
+
+static int nv_c2c_pmu_init_mmio(struct nv_c2c_pmu *c2c_pmu)
+{
+	int i;
+	struct device *dev = c2c_pmu->dev;
+	struct platform_device *pdev = to_platform_device(dev);
+	const struct nv_c2c_pmu_data *data = c2c_pmu->data;
+
+	/* Map the registers of all the instances. */
+	for (i = 0; i < data->nr_inst; i++) {
+		c2c_pmu->base[i] = devm_platform_ioremap_resource(pdev, i);
+		if (IS_ERR(c2c_pmu->base[i])) {
+			dev_err(dev, "Failed to map address for instance %d\n", i);
+			return PTR_ERR(c2c_pmu->base[i]);
+		}
+	}
+
+	/* Map the broadcast address.
+	 */
+	c2c_pmu->base_broadcast = devm_platform_ioremap_resource(pdev,
+								 data->nr_inst);
+	if (IS_ERR(c2c_pmu->base_broadcast)) {
+		dev_err(dev, "Failed to map broadcast address\n");
+		return PTR_ERR(c2c_pmu->base_broadcast);
+	}
+
+	return 0;
+}
+
+static int nv_c2c_pmu_register_pmu(struct nv_c2c_pmu *c2c_pmu)
+{
+	int ret;
+
+	ret = cpuhp_state_add_instance(nv_c2c_pmu_cpuhp_state,
+				       &c2c_pmu->cpuhp_node);
+	if (ret) {
+		dev_err(c2c_pmu->dev, "Error %d registering hotplug\n", ret);
+		return ret;
+	}
+
+	c2c_pmu->pmu = (struct pmu) {
+		.parent = c2c_pmu->dev,
+		.task_ctx_nr = perf_invalid_context,
+		.pmu_enable = nv_c2c_pmu_enable,
+		.pmu_disable = nv_c2c_pmu_disable,
+		.event_init = nv_c2c_pmu_event_init,
+		.add = nv_c2c_pmu_add,
+		.del = nv_c2c_pmu_del,
+		.start = nv_c2c_pmu_start,
+		.stop = nv_c2c_pmu_stop,
+		.read = nv_c2c_pmu_read,
+		.attr_groups = c2c_pmu->attr_groups,
+		.capabilities = PERF_PMU_CAP_NO_EXCLUDE |
+				PERF_PMU_CAP_NO_INTERRUPT,
+	};
+
+	ret = perf_pmu_register(&c2c_pmu->pmu, c2c_pmu->name, -1);
+	if (ret) {
+		dev_err(c2c_pmu->dev, "Failed to register C2C PMU: %d\n", ret);
+		cpuhp_state_remove_instance(nv_c2c_pmu_cpuhp_state,
+					    &c2c_pmu->cpuhp_node);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int nv_c2c_pmu_probe(struct platform_device *pdev)
+{
+	int ret;
+	struct nv_c2c_pmu *c2c_pmu;
+
+	c2c_pmu = nv_c2c_pmu_init_pmu(pdev);
+	if (IS_ERR(c2c_pmu))
+		return PTR_ERR(c2c_pmu);
+
+	ret = nv_c2c_pmu_init_mmio(c2c_pmu);
+	if (ret)
+		return ret;
+
+	ret = nv_c2c_pmu_get_cpus(c2c_pmu);
+	if (ret)
+		return ret;
+
+	ret = nv_c2c_pmu_register_pmu(c2c_pmu);
+	if (ret)
+		return ret;
+
+	dev_dbg(c2c_pmu->dev, "Registered %s PMU\n", c2c_pmu->name);
+
+	return 0;
+}
+
+static void nv_c2c_pmu_device_remove(struct platform_device *pdev)
+{
+	struct nv_c2c_pmu *c2c_pmu = platform_get_drvdata(pdev);
+
+	perf_pmu_unregister(&c2c_pmu->pmu);
+	cpuhp_state_remove_instance(nv_c2c_pmu_cpuhp_state, &c2c_pmu->cpuhp_node);
+}
+
+static
const struct acpi_device_id nv_c2c_pmu_acpi_match[] = {
+	{ "NVDA2023", (kernel_ulong_t)&nv_c2c_pmu_data[C2C_TYPE_NVLINK] },
+	{ "NVDA2022", (kernel_ulong_t)&nv_c2c_pmu_data[C2C_TYPE_NVCLINK] },
+	{ "NVDA2020", (kernel_ulong_t)&nv_c2c_pmu_data[C2C_TYPE_NVDLINK] },
+	{ }
+};
+MODULE_DEVICE_TABLE(acpi, nv_c2c_pmu_acpi_match);
+
+static struct platform_driver nv_c2c_pmu_driver = {
+	.driver = {
+		.name = "nvidia-t410-c2c-pmu",
+		.acpi_match_table = nv_c2c_pmu_acpi_match,
+		.suppress_bind_attrs = true,
+	},
+	.probe = nv_c2c_pmu_probe,
+	.remove = nv_c2c_pmu_device_remove,
+};
+
+static int __init nv_c2c_pmu_init(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+				      "perf/nvidia/c2c:online",
+				      nv_c2c_pmu_online_cpu,
+				      nv_c2c_pmu_cpu_teardown);
+	if (ret < 0)
+		return ret;
+
+	nv_c2c_pmu_cpuhp_state = ret;
+	return platform_driver_register(&nv_c2c_pmu_driver);
+}
+
+static void __exit nv_c2c_pmu_exit(void)
+{
+	platform_driver_unregister(&nv_c2c_pmu_driver);
+	cpuhp_remove_multi_state(nv_c2c_pmu_cpuhp_state);
+}
+
+module_init(nv_c2c_pmu_init);
+module_exit(nv_c2c_pmu_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("NVIDIA Tegra410 C2C PMU driver");
+MODULE_AUTHOR("Besar Wicaksono ");
-- 
2.43.0

From nobody Fri Apr 3 04:41:07 2026
From: Besar Wicaksono
To: , , , ,
CC: , , , , , , , , , , , , , , , , Besar Wicaksono
Subject: [PATCH v3 8/8] arm64: defconfig: Enable NVIDIA TEGRA410 PMU
Date: Tue, 24 Mar 2026 01:29:52 +0000
Message-ID: <20260324012952.1923296-9-bwicaksono@nvidia.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260324012952.1923296-1-bwicaksono@nvidia.com>
References: <20260324012952.1923296-1-bwicaksono@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Enable the drivers for the NVIDIA Tegra410 CMEM Latency and C2C PMUs
present in the Tegra410 SoC, which is an ACPI-based arm64 platform.

Signed-off-by: Besar Wicaksono
---
 arch/arm64/configs/defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 45288ec9eaf7..3d0e438cb997 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -1723,6 +1723,8 @@ CONFIG_ARM_DMC620_PMU=m
 CONFIG_HISI_PMU=y
 CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU=m
 CONFIG_NVIDIA_CORESIGHT_PMU_ARCH_SYSTEM_PMU=m
+CONFIG_NVIDIA_TEGRA410_CMEM_LATENCY_PMU=m
+CONFIG_NVIDIA_TEGRA410_C2C_PMU=m
 CONFIG_MESON_DDR_PMU=m
 CONFIG_NVMEM_LAYOUT_SL28_VPD=m
 CONFIG_NVMEM_IMX_OCOTP=y
-- 
2.43.0