From nobody Sat Feb 7 09:30:16 2026
From: Besar Wicaksono
To: , , , , , , , , , , , , ,
CC: , , , , , , , , , , , , Besar Wicaksono
Subject: [PATCH] perf vendor events arm64: Add Tegra410 Olympus PMU events
Date: Tue, 27 Jan 2026 22:59:09 +0000
Message-ID: <20260127225909.3296202-1-bwicaksono@nvidia.com>
X-Mailer: git-send-email 2.43.0
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add JSON files for the NVIDIA Tegra410 Olympus core PMU events. Also
update common-and-microarch.json with the architectural events that the
new files reference.
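For quick verification on hardware whose MIDR matches the new
mapfile.csv entry (implementer 0x4e is NVIDIA), the events added here
can be listed and counted by name with the perf tool built from this
tree. A minimal usage sketch; "my_workload" is a placeholder for any
command:

  # List the vendor events picked up from the t410 JSON files
  ./perf list pmu

  # Count a few of the newly added events
  ./perf stat -e BR_MIS_PRED,BR_PRED,L1D_CACHE_REFILL -- my_workload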
Signed-off-by: Besar Wicaksono
---
 .../arch/arm64/common-and-microarch.json      |  90 +++
 tools/perf/pmu-events/arch/arm64/mapfile.csv  |   1 +
 .../arch/arm64/nvidia/t410/branch.json        |  45 ++
 .../arch/arm64/nvidia/t410/brbe.json          |   6 +
 .../arch/arm64/nvidia/t410/bus.json           |  48 ++
 .../arch/arm64/nvidia/t410/exception.json     |  62 ++
 .../arch/arm64/nvidia/t410/fp_operation.json  |  78 ++
 .../arch/arm64/nvidia/t410/general.json       |  15 +
 .../arch/arm64/nvidia/t410/l1d_cache.json     | 122 +++
 .../arch/arm64/nvidia/t410/l1i_cache.json     | 114 +++
 .../arch/arm64/nvidia/t410/l2d_cache.json     | 134 ++++
 .../arch/arm64/nvidia/t410/ll_cache.json      | 107 +++
 .../arch/arm64/nvidia/t410/memory.json        |  46 ++
 .../arch/arm64/nvidia/t410/metrics.json       | 722 ++++++++++++++++++
 .../arch/arm64/nvidia/t410/misc.json          | 646 ++++++++++++++++
 .../arch/arm64/nvidia/t410/retired.json       |  94 +++
 .../arch/arm64/nvidia/t410/spe.json           |  42 +
 .../arm64/nvidia/t410/spec_operation.json     | 230 ++++++
 .../arch/arm64/nvidia/t410/stall.json         | 145 ++++
 .../arch/arm64/nvidia/t410/tlb.json           | 158 ++++
 20 files changed, 2905 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/brbe.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/bus.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/exception.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/fp_operation.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/general.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/l1d_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/l1i_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/ll_cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/memory.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/metrics.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/misc.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/retired.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/spe.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/spec_operation.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/stall.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/tlb.json

diff --git a/tools/perf/pmu-events/arch/arm64/common-and-microarch.json b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
index 468cb085d879..6af15776ff17 100644
--- a/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
+++ b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
@@ -179,6 +179,11 @@
         "EventName": "BUS_CYCLES",
         "BriefDescription": "Bus cycle"
     },
+    {
+        "EventCode": "0x001E",
+        "EventName": "CHAIN",
+        "BriefDescription": "Chain a pair of event counters."
+    },
     {
         "PublicDescription": "Level 1 data cache allocation without refill",
         "EventCode": "0x1F",
@@ -1512,11 +1517,26 @@
         "EventName": "L2D_CACHE_REFILL_PRFM",
         "BriefDescription": "Level 2 data cache refill, software preload"
     },
+    {
+        "EventCode": "0x8150",
+        "EventName": "L3D_CACHE_RW",
+        "BriefDescription": "Level 3 data cache demand access."
+ }, + { + "EventCode": "0x8151", + "EventName": "L3D_CACHE_PRFM", + "BriefDescription": "Level 3 data cache software prefetch" + }, { "EventCode": "0x8152", "EventName": "L3D_CACHE_MISS", "BriefDescription": "Level 3 data cache demand access miss" }, + { + "EventCode": "0x8153", + "EventName": "L3D_CACHE_REFILL_PRFM", + "BriefDescription": "Level 3 data cache refill, software prefetch." + }, { "EventCode": "0x8154", "EventName": "L1D_CACHE_HWPRF", @@ -1527,6 +1547,11 @@ "EventName": "L2D_CACHE_HWPRF", "BriefDescription": "Level 2 data cache hardware prefetch." }, + { + "EventCode": "0x8156", + "EventName": "L3D_CACHE_HWPRF", + "BriefDescription": "Level 3 data cache hardware prefetch." + }, { "EventCode": "0x8158", "EventName": "STALL_FRONTEND_MEMBOUND", @@ -1682,6 +1707,11 @@ "EventName": "L2D_CACHE_REFILL_HWPRF", "BriefDescription": "Level 2 data cache refill, hardware prefetch." }, + { + "EventCode": "0x81BE", + "EventName": "L3D_CACHE_REFILL_HWPRF", + "BriefDescription": "Level 3 data cache refill, hardware prefetch." + }, { "EventCode": "0x81C0", "EventName": "L1I_CACHE_HIT_RD", @@ -1712,11 +1742,31 @@ "EventName": "L1I_CACHE_HIT_RD_FPRFM", "BriefDescription": "Level 1 instruction cache demand fetch first = hit, fetched by software preload" }, + { + "EventCode": "0x81DC", + "EventName": "L1D_CACHE_HIT_RW_FPRFM", + "BriefDescription": "Level 1 data cache demand access first hit, f= etched by software prefetch." + }, { "EventCode": "0x81E0", "EventName": "L1I_CACHE_HIT_RD_FHWPRF", "BriefDescription": "Level 1 instruction cache demand fetch first = hit, fetched by hardware prefetcher" }, + { + "EventCode": "0x81EC", + "EventName": "L1D_CACHE_HIT_RW_FHWPRF", + "BriefDescription": "Level 1 data cache demand access first hit, f= etched by hardware prefetcher." + }, + { + "EventCode": "0x81F0", + "EventName": "L1I_CACHE_HIT_RD_FPRF", + "BriefDescription": "Level 1 instruction cache demand fetch first = hit, fetched by prefetch." + }, + { + "EventCode": "0x81FC", + "EventName": "L1D_CACHE_HIT_RW_FPRF", + "BriefDescription": "Level 1 data cache demand access first hit, f= etched by prefetch." + }, { "EventCode": "0x8200", "EventName": "L1I_CACHE_HIT", @@ -1767,11 +1817,26 @@ "EventName": "L1I_LFB_HIT_RD_FPRFM", "BriefDescription": "Level 1 instruction cache demand fetch line-f= ill buffer first hit, recently fetched by software preload" }, + { + "EventCode": "0x825C", + "EventName": "L1D_LFB_HIT_RW_FPRFM", + "BriefDescription": "Level 1 data cache demand access line-fill bu= ffer first hit, recently fetched by software prefetch." + }, { "EventCode": "0x8260", "EventName": "L1I_LFB_HIT_RD_FHWPRF", "BriefDescription": "Level 1 instruction cache demand fetch line-f= ill buffer first hit, recently fetched by hardware prefetcher" }, + { + "EventCode": "0x826C", + "EventName": "L1D_LFB_HIT_RW_FHWPRF", + "BriefDescription": "Level 1 data cache demand access line-fill bu= ffer first hit, recently fetched by hardware prefetcher." + }, + { + "EventCode": "0x827C", + "EventName": "L1D_LFB_HIT_RW_FPRF", + "BriefDescription": "Level 1 data cache demand access line-fill bu= ffer first hit, recently fetched by prefetch." + }, { "EventCode": "0x8280", "EventName": "L1I_CACHE_PRF", @@ -1807,6 +1872,11 @@ "EventName": "LL_CACHE_REFILL", "BriefDescription": "Last level cache refill" }, + { + "EventCode": "0x828E", + "EventName": "L3D_CACHE_REFILL_PRF", + "BriefDescription": "Level 3 data cache refill, prefetch." 
+ }, { "EventCode": "0x8320", "EventName": "L1D_CACHE_REFILL_PERCYC", @@ -1872,6 +1942,16 @@ "EventName": "FP_FP8_MIN_SPEC", "BriefDescription": "Floating-point operation speculatively_execut= ed, smallest type is 8-bit floating-point." }, + { + "EventCode": "0x8480", + "EventName": "FP_SP_FIXED_MIN_OPS_SPEC", + "BriefDescription": "Non-scalable element arithmetic operations sp= eculatively executed, smallest type is single-precision floating-point." + }, + { + "EventCode": "0x8482", + "EventName": "FP_HP_FIXED_MIN_OPS_SPEC", + "BriefDescription": "Non-scalable element arithmetic operations sp= eculatively executed, smallest type is half-precision floating-point." + }, { "EventCode": "0x8483", "EventName": "FP_BF16_FIXED_MIN_OPS_SPEC", @@ -1882,6 +1962,16 @@ "EventName": "FP_FP8_FIXED_MIN_OPS_SPEC", "BriefDescription": "Non-scalable element arithmetic operations sp= eculatively executed, smallest type is 8-bit floating-point." }, + { + "EventCode": "0x8488", + "EventName": "FP_SP_SCALE_MIN_OPS_SPEC", + "BriefDescription": "Scalable element arithmetic operations specul= atively executed, smallest type is single-precision floating-point." + }, + { + "EventCode": "0x848A", + "EventName": "FP_HP_SCALE_MIN_OPS_SPEC", + "BriefDescription": "Scalable element arithmetic operations specul= atively executed, smallest type is half-precision floating-point." + }, { "EventCode": "0x848B", "EventName": "FP_BF16_SCALE_MIN_OPS_SPEC", diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-= events/arch/arm64/mapfile.csv index bb3fa8a33496..7f0eaa702048 100644 --- a/tools/perf/pmu-events/arch/arm64/mapfile.csv +++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv @@ -46,3 +46,4 @@ 0x00000000500f0000,v1,ampere/emag,core 0x00000000c00fac30,v1,ampere/ampereone,core 0x00000000c00fac40,v1,ampere/ampereonex,core +0x000000004e0f0100,v1,nvidia/t410,core diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json b/too= ls/perf/pmu-events/arch/arm64/nvidia/t410/branch.json new file mode 100644 index 000000000000..532bc59dc573 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json @@ -0,0 +1,45 @@ +[ + { + "ArchStdEvent": "BR_MIS_PRED", + "PublicDescription": "The Event counts Branches which are speculat= ively executed and mis-predicted." + }, + { + "ArchStdEvent": "BR_PRED", + "PublicDescription": "The Event counts all speculatively executed = Branches." + }, + { + "EventCode": "0x017e", + "EventName": "BR_PRED_BTB_CTX_UPDATE", + "PublicDescription": "Branch context table update." + }, + { + "EventCode": "0x0188", + "EventName": "BR_MIS_PRED_DIR_RESOLVED", + "PublicDescription": "Number of Branch misprediction due to direct= ion misprediction." + }, + { + "EventCode": "0x0189", + "EventName": "BR_MIS_PRED_DIR_UNCOND_RESOLVED", + "PublicDescription": "Number of Branch misprediction due to direct= ion misprediction for unconditional Branches." + }, + { + "EventCode": "0x018a", + "EventName": "BR_MIS_PRED_DIR_UNCOND_DIRECT_RESOLVED", + "PublicDescription": "Number of Branch misprediction due to direct= ion misprediction for unconditional direct Branches." + }, + { + "EventCode": "0x018b", + "EventName": "BR_PRED_MULTI_RESOLVED", + "PublicDescription": "Number of resolved branch which made predict= ion by polymorphic indirect predictor." + }, + { + "EventCode": "0x018c", + "EventName": "BR_MIS_PRED_MULTI_RESOLVED", + "PublicDescription": "Number of branch misprediction which made pr= ediction by polymorphic indirect predictor." 
+ }, + { + "EventCode": "0x01e4", + "EventName": "BR_RGN_RECLAIM", + "PublicDescription": "The Event counts the Indirect predictor entr= ies flushed by region reclamation." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/brbe.json b/tools= /perf/pmu-events/arch/arm64/nvidia/t410/brbe.json new file mode 100644 index 000000000000..7c43a01c9707 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/brbe.json @@ -0,0 +1,6 @@ +[ + { + "ArchStdEvent": "BRB_FILTRATE", + "PublicDescription": "The Event counts each valid Branch record ca= ptured in the Branch record buffer. Branch records that are not captured be= cause they are removed by filtering are not counted." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/bus.json b/tools/= perf/pmu-events/arch/arm64/nvidia/t410/bus.json new file mode 100644 index 000000000000..c4cee0be1242 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/bus.json @@ -0,0 +1,48 @@ +[ + { + "ArchStdEvent": "BUS_ACCESS", + "PublicDescription": "The Event counts the number of Data-beat acc= ess between the CPU and the external bus. This count includes access due to= Read, Write, and Snoop. Each beat of Data is counted individually." + }, + { + "ArchStdEvent": "BUS_CYCLES", + "PublicDescription": "The Event counts bus cycles in the CPU. Bus = cycles represent a clock cycle in which a transaction could be sent or rece= ived on the interface from the CPU to the external bus. Since that interfac= e is driven at the same clock speed as the CPU, this Event increments at th= e rate of CPU clock. Regardless of the WFE/WFI state of the PE, this Event = increment on each processor clock." + }, + { + "ArchStdEvent": "BUS_ACCESS_RD", + "PublicDescription": "The Event counts memory Read transactions se= en on the external bus. Each beat of Data is counted individually." + }, + { + "ArchStdEvent": "BUS_ACCESS_WR", + "PublicDescription": "The Event counts memory Write transactions s= een on the external bus. Each beat of Data is counted individually." + }, + { + "EventCode": "0x0154", + "EventName": "BUS_REQUEST_REQ", + "PublicDescription": "Bus request, request." + }, + { + "EventCode": "0x0155", + "EventName": "BUS_REQUEST_RETRY", + "PublicDescription": "Bus request, retry." + }, + { + "EventCode": "0x0198", + "EventName": "L2_CHI_CBUSY0", + "PublicDescription": "Number of RXDAT or RXRSP response received w= idth CBusy of 0." + }, + { + "EventCode": "0x0199", + "EventName": "L2_CHI_CBUSY1", + "PublicDescription": "Number of RXDAT or RXRSP response received w= idth CBusy of 1." + }, + { + "EventCode": "0x019a", + "EventName": "L2_CHI_CBUSY2", + "PublicDescription": "Number of RXDAT or RXRSP response received w= idth CBusy of 2." + }, + { + "EventCode": "0x019b", + "EventName": "L2_CHI_CBUSY3", + "PublicDescription": "Number of RXDAT or RXRSP response received w= idth CBusy of 3." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/exception.json b/= tools/perf/pmu-events/arch/arm64/nvidia/t410/exception.json new file mode 100644 index 000000000000..2f31fb2e67a2 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/exception.json @@ -0,0 +1,62 @@ +[ + { + "ArchStdEvent": "EXC_TAKEN", + "PublicDescription": "The Event counts any taken architecturally v= isible exceptions such as IRQ, FIQ, SError, and other synchronous exception= s. Exceptions are counted whether or not they are taken locally." 
+ }, + { + "ArchStdEvent": "EXC_RETURN", + "PublicDescription": "The Event counts any architecturally execute= d exception return Instructions. For example: AArch64: ERET." + }, + { + "ArchStdEvent": "EXC_UNDEF", + "PublicDescription": "The Event counts the number of synchronous e= xceptions which are taken locally that are due to attempting to execute an = Instruction that is UNDEFINED. Attempting to execute Instruction bit patter= ns that have not been allocated. Attempting to execute Instructions when th= ey are disabled. Attempting to execute Instructions at an inappropriate Exc= eption level. Attempting to execute an Instruction when the value of PSTATE= .IL is 1." + }, + { + "ArchStdEvent": "EXC_SVC", + "PublicDescription": "The Event counts SVC exceptions taken locall= y." + }, + { + "ArchStdEvent": "EXC_PABORT", + "PublicDescription": "The Event counts synchronous exceptions that= are taken locally and caused by Instruction Aborts." + }, + { + "ArchStdEvent": "EXC_DABORT", + "PublicDescription": "The Event counts exceptions that are taken l= ocally and are caused by Data aborts or SErrors. Conditions that could caus= e those exceptions are attempting to read or write memory where the MMU gen= erates a fault, attempting to read or write memory with a misaligned addres= s, Interrupts from the nSEI inputs and internally generated SErrors." + }, + { + "ArchStdEvent": "EXC_IRQ", + "PublicDescription": "The Event counts IRQ exceptions including th= e virtual IRQs that are taken locally." + }, + { + "ArchStdEvent": "EXC_FIQ", + "PublicDescription": "The Event counts FIQ exceptions including th= e virtual FIQs that are taken locally." + }, + { + "ArchStdEvent": "EXC_SMC", + "PublicDescription": "The Event counts SMC exceptions take to EL3." + }, + { + "ArchStdEvent": "EXC_HVC", + "PublicDescription": "The Event counts HVC exceptions taken to EL2= ." + }, + { + "ArchStdEvent": "EXC_TRAP_PABORT", + "PublicDescription": "The Event counts exceptions which are traps = not taken locally and are caused by Instruction Aborts. For example, attemp= ting to execute an Instruction with a misaligned PC." + }, + { + "ArchStdEvent": "EXC_TRAP_DABORT", + "PublicDescription": "The Event counts exceptions which are traps = not taken locally and are caused by Data Aborts or SError Interrupts. Condi= tions that could cause those exceptions are: * Attempting to read or write = memory where the MMU generates a fault, * Attempting to read or write memor= y with a misaligned address, * Interrupts from the SEI input. * internally = generated SErrors." + }, + { + "ArchStdEvent": "EXC_TRAP_OTHER", + "PublicDescription": "The Event counts the number of synchronous t= rap exceptions which are not taken locally and are not SVC, SMC, HVC, Data = aborts, Instruction Aborts, or Interrupts." + }, + { + "ArchStdEvent": "EXC_TRAP_IRQ", + "PublicDescription": "The Event counts IRQ exceptions including th= e virtual IRQs that are not taken locally." + }, + { + "ArchStdEvent": "EXC_TRAP_FIQ", + "PublicDescription": "The Event counts FIQs which are not taken lo= cally but taken from EL0, EL1, or EL2 to EL3 (which would be the normal beh= avior for FIQs when not executing in EL3)." 
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/fp_operation.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/fp_operation.json
new file mode 100644
index 000000000000..0b6d04720751
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/fp_operation.json
@@ -0,0 +1,78 @@
+[
+    {
+        "ArchStdEvent": "FP_HP_SPEC",
+        "PublicDescription": "The Event counts speculatively executed half precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_SP_SPEC",
+        "PublicDescription": "The Event counts speculatively executed single precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_DP_SPEC",
+        "PublicDescription": "The Event counts speculatively executed double precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_SCALE_OPS_SPEC",
+        "PublicDescription": "The Event counts speculatively executed scalable single precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_FIXED_OPS_SPEC",
+        "PublicDescription": "The Event counts speculatively executed non-scalable single precision floating point operations."
+    },
+    {
+        "ArchStdEvent": "FP_HP_SCALE_OPS_SPEC",
+        "PublicDescription": "The Event increments by v for each speculatively executed scalable element arithmetic operation, due to an Instruction where the largest type was half-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or Instruction which causes the Event to increment. The Event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_HP_FIXED_OPS_SPEC",
+        "PublicDescription": "The Event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an Instruction where the largest type was half-precision floating-point, where v is the number of arithmetic operations carried out by the operation or Instruction which causes the Event to increment. The Event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_SP_SCALE_OPS_SPEC",
+        "PublicDescription": "The Event increments by v for each speculatively executed scalable element arithmetic operation, due to an Instruction where the largest type was single-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or Instruction which causes the Event to increment. The Event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_SP_FIXED_OPS_SPEC",
+        "PublicDescription": "The Event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an Instruction where the largest type was single-precision floating-point, where v is the number of arithmetic operations carried out by the operation or Instruction which causes the Event to increment. The Event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_DP_SCALE_OPS_SPEC",
+        "PublicDescription": "The Event increments by v for each speculatively executed scalable element arithmetic operation, due to an Instruction where the largest type was double-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or Instruction which causes the Event to increment. The Event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_DP_FIXED_OPS_SPEC",
+        "PublicDescription": "The Event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an Instruction where the largest type was double-precision floating-point, where v is the number of arithmetic operations carried out by the operation or Instruction which causes the Event to increment. The Event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_BF16_FIXED_MIN_OPS_SPEC",
+        "PublicDescription": "The Event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an Instruction where the smallest type was BFloat16 floating-point, where v is the number of arithmetic operations carried out by the operation or Instruction which causes the Event to increment. The Event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_BF16_SCALE_MIN_OPS_SPEC",
+        "PublicDescription": "The Event increments by v for each speculatively executed scalable element arithmetic operation, due to an Instruction where the smallest type was BFloat16 floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or Instruction which causes the Event to increment. The Event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_FP8_FIXED_MIN_OPS_SPEC",
+        "PublicDescription": "The Event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an Instruction where the smallest type was 8-bit floating-point, where v is the number of arithmetic operations carried out by the operation or Instruction which causes the Event to increment. The Event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_FP8_SCALE_MIN_OPS_SPEC",
+        "PublicDescription": "The Event increments by v for each speculatively executed scalable element arithmetic operation, due to an Instruction where the smallest type was 8-bit floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or Instruction which causes the Event to increment. The Event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_HP_FIXED_MIN_OPS_SPEC",
+        "PublicDescription": "The Event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an Instruction where the smallest type was half-precision floating-point, where v is the number of arithmetic operations carried out by the operation or Instruction which causes the Event to increment. The Event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_HP_SCALE_MIN_OPS_SPEC",
+        "PublicDescription": "The Event increments by v for each speculatively executed scalable element arithmetic operation, due to an Instruction where the smallest type was half-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or Instruction which causes the Event to increment. The Event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_SP_FIXED_MIN_OPS_SPEC",
+        "PublicDescription": "The Event increments by v for each speculatively executed non-scalable element arithmetic operation, due to an Instruction where the smallest type was single-precision floating-point, where v is the number of arithmetic operations carried out by the operation or Instruction which causes the Event to increment. The Event does not count operations that are counted by FP_SCALE_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    },
+    {
+        "ArchStdEvent": "FP_SP_SCALE_MIN_OPS_SPEC",
+        "PublicDescription": "The Event increments by v for each speculatively executed scalable element arithmetic operation, due to an Instruction where the smallest type was single-precision floating-point, where v is a value such that (v*(VL/128)) is the number of arithmetic operations carried out by the operation or Instruction which causes the Event to increment. The Event does not count operations that are counted by FP_FIXED_OPS_SPEC or FP_SCALE2_OPS_SPEC."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/general.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/general.json
new file mode 100644
index 000000000000..bbeb4a75d261
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/general.json
@@ -0,0 +1,15 @@
+[
+    {
+        "ArchStdEvent": "CPU_CYCLES",
+        "PublicDescription": "The Event counts CPU clock cycles, when the PE is not in WFE/WFI. The clock measured by this Event is defined as the physical clock driving the CPU logic."
+    },
+    {
+        "ArchStdEvent": "CNT_CYCLES",
+        "PublicDescription": "The Event increments at a constant frequency equal to the rate of increment of the System Counter, CNTPCT_EL0. This Event does not increment when the PE is in WFE/WFI."
+    },
+    {
+        "EventCode": "0x01e1",
+        "EventName": "CPU_SLOT",
+        "PublicDescription": "Entitled CPU slots. The Event counts the number of slots. When in ST mode, this Event shall increment by PMMIR_EL1.SLOTS quantities, and when in SMT partitioned resource mode (whether in WFI state or otherwise), this Event is incremented by PMMIR_EL1.SLOTS/2 quantities."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1d_cache.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1d_cache.json
new file mode 100644
index 000000000000..7c9b2fc4b38c
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1d_cache.json
@@ -0,0 +1,122 @@
+[
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL",
+        "PublicDescription": "The Event counts L1 D-cache refills caused by speculatively executed Load or Store operations, preload Instructions, or hardware cache prefetching that missed in the L1 D-cache. This Event only counts one Event per cache line. Since the caches are Write-back only for this processor, there are no Write-through cache accesses."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE",
+        "PublicDescription": "The Event counts L1 D-cache accesses from any Load/Store operation, software preload or hardware prefetch operation. Atomic operations that resolve in the CPUs caches (near atomic operations) count as both a Write access and Read access. Each access to a cache line is counted, including the multiple accesses caused by single Instructions such as LDM or STM. Each access to other L1 Data or unified memory structures, for example refill buffers, Write buffers, and Write-back buffers, is also counted. This Event counts the sum of L1D_CACHE_RD, L1D_CACHE_WR, L1D_CACHE_PRFM and L1D_CACHE_HWPRF."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB",
+        "PublicDescription": "The Event counts Write-backs of dirty Data from the L1 D-cache to the L2 cache. This occurs when either a dirty cache line is evicted from L1 D-cache and allocated in the L2 cache or dirty Data is written to the L2 and possibly to the next level of cache. This Event counts both victim cache line evictions and cache Write-backs from snoops or cache maintenance operations. The following cache operations are not counted: * Invalidations which do not result in Data being transferred out of the L1 (such as evictions of clean Data), * Full line Writes which write to L2 without writing L1, such as Write streaming mode. This Event is the sum of the L1D_CACHE_WB_CLEAN and L1D_CACHE_WB_VICTIM Events."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_LMISS_RD",
+        "PublicDescription": "The Event counts cache line refills into the L1 D-cache from any memory Read operations that incurred additional latency. Counts the same as L1D_CACHE_REFILL_RD on this CPU."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_RD",
+        "PublicDescription": "The Event counts L1 D-cache accesses from any Load operation. Atomic Load operations that resolve in the CPUs caches count as both a Write access and Read access."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WR",
+        "PublicDescription": "The Event counts L1 D-cache accesses generated by Store operations. This Event also counts accesses caused by a DC ZVA (D-cache zero, specified by virtual address) Instruction. Near atomic operations that resolve in the CPUs caches count as a Write access and Read access. This Event is a subset of the L1D_CACHE Event, except this Event only counts memory Write operations."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_RD",
+        "PublicDescription": "The Event counts L1 D-cache refills caused by speculatively executed Load Instructions where the memory Read operation misses in the L1 D-cache. This Event only counts one Event per cache line. This Event is a subset of the L1D_CACHE_REFILL Event, but only counts memory Read operations. This Event does not count reads caused by cache maintenance operations or preload Instructions."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_WR",
+        "PublicDescription": "The Event counts L1 D-cache refills caused by speculatively executed Store Instructions where the memory Write operation misses in the L1 D-cache. This Event only counts one Event per cache line. This Event is a subset of the L1D_CACHE_REFILL Event, but only counts memory Write operations."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_INNER",
+        "PublicDescription": "The Event counts L1 D-cache refills (L1D_CACHE_REFILL) where the cache line Data came from caches inside the immediate Cluster of the Core (L2 cache)."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_OUTER",
+        "PublicDescription": "The Event counts L1 D-cache refills (L1D_CACHE_REFILL) for which the cache line Data came from outside the immediate Cluster of the Core, like an SLC in the system interconnect, DRAM, or a remote socket."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB_VICTIM",
+        "PublicDescription": "The Event counts dirty cache line evictions from the L1 D-cache caused by a new cache line allocation. This Event does not count evictions caused by cache maintenance operations. This Event is a subset of the L1D_CACHE_WB Event, but the Event only counts Write-backs that are a result of the line being allocated for an access made by the CPU."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB_CLEAN",
+        "PublicDescription": "The Event counts Write-backs from the L1 D-cache that are a result of a coherency operation made by another CPU. The Event count includes cache maintenance operations. This Event is a subset of the L1D_CACHE_WB Event."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_INVAL",
+        "PublicDescription": "The Event counts each explicit invalidation of a cache line in the L1 D-cache caused by: * Cache Maintenance Operations (CMO) that operate by a virtual address. * Broadcast cache coherency operations from another CPU in the system. This Event does not count for the following conditions: * A cache refill invalidates a cache line. * A CMO which is executed on that CPU and invalidates a cache line specified by Set/Way. Note that CMOs that operate by Set/Way cannot be broadcast from one CPU to another."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_RW",
+        "PublicDescription": "The Event counts L1 Data demand cache accesses from any Load or Store operation. Near atomic operations that resolve in the CPUs caches count as both a Write access and Read access. This Event is implemented as L1D_CACHE_RD + L1D_CACHE_WR."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_PRFM",
+        "PublicDescription": "The Event counts L1 D-cache accesses from software preload or prefetch Instructions."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_PRFM",
+        "PublicDescription": "The Event counts L1 D-cache refills where the cache line access was generated by software preload or prefetch Instructions."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_HWPRF",
+        "PublicDescription": "The Event counts L1 D-cache accesses from any Load/Store operations generated by the hardware prefetcher."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_MISS",
+        "PublicDescription": "The Event counts each demand access counted by L1D_CACHE_RW that misses in the L1 Data or unified cache, causing an access to outside of the L1 caches of this PE."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_HWPRF",
+        "PublicDescription": "The Event counts each hardware prefetch access counted by L1D_CACHE_HWPRF that causes a refill of the L1 D-cache from outside of the L1 D-cache."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_HIT_RW_FPRFM",
+        "PublicDescription": "The Event counts each demand access first hit counted by L1D_CACHE_HIT_RW_FPRF where the cache line was fetched in response to a prefetch Instruction. That is, the L1D_CACHE_REFILL_PRFM Event was generated when the cache line was fetched into the cache. Only the first hit by a demand access is counted. After this Event is generated for a cache line, the Event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_HIT_RW_FHWPRF",
+        "PublicDescription": "The Event counts each demand access first hit counted by L1D_CACHE_HIT_RW_FPRF where the cache line was fetched by a hardware prefetcher. That is, the L1D_CACHE_REFILL_HWPRF Event was generated when the cache line was fetched into the cache. Only the first hit by a demand access is counted. After this Event is generated for a cache line, the Event is not generated again for the same cache line while it remains in the cache."
+ }, + { + "ArchStdEvent": "L1D_CACHE_HIT_RW_FPRF", + "PublicDescription": "The Event counts each demand access first hi= t counted by L1D_CACHE_HIT_RW where the cache line was fetched in response = to a prefetch Instruction or by a hardware prefetcher. That is, the L1D_CAC= HE_REFILL_PRF Event was generated when the cache line was fetched into the = cache. Only the first hit by a demand access is counted. After this Event i= s generated for a cache line, the Event is not generated again for the same= cache line while it remains in the cache." + }, + { + "ArchStdEvent": "L1D_LFB_HIT_RW_FHWPRF", + "PublicDescription": "The Event counts each demand access line-fil= l buffer first hit counted by L1D_LFB_HIT_RW_FPRF, where the cache line was= fetched by a hardware prefetcher. That is, the access hits a cache line th= at is in the process of being loaded into the L1 D-cache, and so does not g= enerate a new refill, but has to wait for the previous refill to complete, = and the L1D_CACHE_REFILL_HWPRF Event was generated when the cache line was = fetched into the cache. Only the first hit by a demand access is counted. A= fter this Event is generated for a cache line, the Event is not generated a= gain for the same cache line while it remains in the cache." + }, + { + "ArchStdEvent": "L1D_LFB_HIT_RW_FPRFM", + "PublicDescription": "The Event counts each demand access line-fil= l buffer first hit counted by L1D_LFB_HIT_RW_FPRF where the cache line was = fetched in response to a prefetch Instruction. That is, the access hits a c= ache line that is in the process of being loaded into the L1 D-cache, and s= o does not generate a new refill, but has to wait for the previous refill t= o complete, and theL1D_CACHE_REFILL_PRFM Event was generated when the cache= line was fetched into the cache. Only the first hit by a demand access is = counted. After this Event is generated for a cache line, the Event is not g= enerated again for the same cache line while it remains in the cache." + }, + { + "ArchStdEvent": "L1D_LFB_HIT_RW_FPRF", + "PublicDescription": "The Event counts each demand access line-fil= l buffer first hit counted by L1D_LFB_HIT_RW where the cache line was fetch= ed in response to a prefetch Instruction or by a hardware prefetcher. That = is, the access hits a cache line that is in the process of being loaded int= o the L1 D-cache, and so does not generate a new refill, but has to wait fo= r the previous refill to complete, and the L1D_CACHE_REFILL_PRF Event was g= enerated when the cache line was fetched into the cache. Only the first hit= by a demand access is counted. After this Event is generated for a cache l= ine, the Event is not generated again for the same cache line while it rema= ins in the cache." + }, + { + "EventCode": "0x0204", + "EventName": "L1D_CACHE_REFILL_OUTER_LLC", + "PublicDescription": "The Event counts L1D_CACHE_REFILL from L3 D-= cache." + }, + { + "EventCode": "0x0205", + "EventName": "L1D_CACHE_REFILL_OUTER_DRAM", + "PublicDescription": "The Event counts L1D_CACHE_REFILL from local= memory." + }, + { + "EventCode": "0x0206", + "EventName": "L1D_CACHE_REFILL_OUTER_REMOTE", + "PublicDescription": "The Event counts L1D_CACHE_REFILL from a rem= ote memory." + }, + { + "EventCode": "0x01f5", + "EventName": "L1D_CACHE_REFILL_RW", + "PublicDescription": "L1 D-cache refill, demand Read and Write. Th= e Event counts demand Read and Write accesses that causes a refill of the L= 1 D-cache of this PE, from outside of this cache." 
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1i_cache.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1i_cache.json
new file mode 100644
index 000000000000..a1faa284b80f
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l1i_cache.json
@@ -0,0 +1,114 @@
+[
+    {
+        "ArchStdEvent": "L1I_CACHE_REFILL",
+        "PublicDescription": "The Event counts cache line refills in the L1 I-cache caused by a missed Instruction fetch (Demand, hardware prefetch and software preload accesses). Instruction fetches may include accessing multiple Instructions, but the single cache line allocation is counted once."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE",
+        "PublicDescription": "The Event counts Instruction fetches (Demand, hardware prefetch and software preload accesses) which access the L1 Instruction Cache. Instruction Cache accesses caused by cache maintenance operations are not counted."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_LMISS",
+        "PublicDescription": "The Event counts cache line refills into the L1 I-cache that incurred additional latency. Counts the same as L1I_CACHE_REFILL in this CPU."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_RD",
+        "PublicDescription": "The Event counts demand Instruction fetches which access the L1 I-cache."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_PRFM",
+        "PublicDescription": "The Event counts Instruction fetches generated by software preload or prefetch Instructions which access the L1 I-cache."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_HWPRF",
+        "PublicDescription": "The Event counts Instruction fetches which access the L1 I-cache generated by the hardware prefetcher."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_REFILL_PRFM",
+        "PublicDescription": "The Event counts cache line refills in the L1 I-cache caused by a missed Instruction fetch generated by software preload or prefetch Instructions. Instruction fetches may include accessing multiple Instructions, but the single cache line allocation is counted once."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_HIT_RD",
+        "PublicDescription": "The Event counts demand Instruction fetches that access the L1 I-cache and hit in the L1 I-cache."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_HIT",
+        "PublicDescription": "The Event counts Instruction fetches that access the L1 I-cache (Demand, hardware prefetch and software preload accesses) and hit in the L1 I-cache. I-cache accesses caused by cache maintenance operations are not counted."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_HIT_PRFM",
+        "PublicDescription": "The Event counts Instruction fetches generated by software preload or prefetch Instructions that access the L1 I-cache and hit in the L1 I-cache."
+    },
+    {
+        "ArchStdEvent": "L1I_LFB_HIT_RD",
+        "PublicDescription": "The Event counts demand Instruction fetches that access the L1 I-cache and hit in a line that is in the process of being loaded into the L1 I-cache."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_REFILL_HWPRF",
+        "PublicDescription": "The Event counts each hardware prefetch access counted by L1I_CACHE_HWPRF that causes a refill of the L1 I-cache from outside of the L1 I-cache."
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_HIT_RD_FPRF",
+        "PublicDescription": "The Event counts each demand fetch first hit counted by L1I_CACHE_HIT_RD where the cache line was fetched in response to a software preload or by a hardware prefetcher. That is, the L1I_CACHE_REFILL_PRF Event was generated when the cache line was fetched into the cache. Only the first hit by a demand access is counted. After this Event is generated for a cache line, the Event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "EventCode": "0x0174",
+        "EventName": "L1I_HWPRF_REQ_DROP",
+        "PublicDescription": "L1 I-cache hardware prefetch dropped."
+    },
+    {
+        "EventCode": "0x01ea",
+        "EventName": "L1I_CFC_ENTRIES",
+        "PublicDescription": "The Event counts the CFC (Cache Fill Control) entries. The CFC is the fill buffer for the I-cache."
+    },
+    {
+        "EventCode": "0x0228",
+        "EventName": "L1I_CACHE_HIT_PRFM_FPRF",
+        "PublicDescription": "L1 I-cache software prefetch access first hit, fetched by hardware or software prefetch. The Event counts each software preload access first hit where the cache line was fetched in response to a hardware prefetcher or software preload Instruction. Only the first hit is counted. After this Event is generated for a cache line, the Event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "EventCode": "0x0212",
+        "EventName": "L1I_CACHE_HIT_HWPRF",
+        "PublicDescription": "The Event counts each hardware prefetch access that hits in the L1 I-cache."
+    },
+    {
+        "EventCode": "0x022a",
+        "EventName": "L1I_CACHE_HIT_HWPRF_FPRF",
+        "PublicDescription": "L1 I-cache hardware prefetch access first hit, fetched by hardware or software prefetch. The Event counts each hardware prefetch access first hit where the cache line was fetched in response to a hardware prefetcher or a prefetch Instruction. Only the first hit is counted. After this Event is generated for a cache line, the Event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "EventCode": "0x0215",
+        "EventName": "L1I_LFB_HIT",
+        "PublicDescription": "L1 line-fill buffer hit. The Event counts each Demand, software preload, or hardware prefetch induced Instruction fetch that hits an L1 I-cache line that is in the process of being loaded into the L1 Instruction cache, and so does not generate a new refill, but has to wait for the previous refill to complete."
+    },
+    {
+        "EventCode": "0x0216",
+        "EventName": "L1I_LFB_HIT_PRFM",
+        "PublicDescription": "The Event counts each software prefetch access that hits a cache line that is in the process of being loaded into the L1 Instruction cache, and so does not generate a new refill, but has to wait for the previous refill to complete."
+    },
+    {
+        "EventCode": "0x0219",
+        "EventName": "L1I_LFB_HIT_HWPRF",
+        "PublicDescription": "The Event counts each hardware prefetch access that hits a cache line that is in the process of being loaded into the L1 Instruction cache, and so does not generate a new refill, but has to wait for the previous refill to complete."
+    },
+    {
+        "EventCode": "0x0221",
+        "EventName": "L1I_PRFM_REQ",
+        "PublicDescription": "L1 I-cache software prefetch requests."
+    },
+    {
+        "EventCode": "0x0222",
+        "EventName": "L1I_HWPRF_REQ",
+        "PublicDescription": "L1 I-cache hardware prefetch requests."
+    },
+    {
+        "EventCode": "0x01e3",
+        "EventName": "L1I_CACHE_REFILL_RD",
+        "PublicDescription": "L1 I-cache refill, Read. The Event counts demand Instruction fetches that cause a refill of the L1 I-cache of this PE, from outside of this cache."
+    },
+    {
+        "EventCode": "0x01ef",
+        "EventName": "L1I_CACHE_INVAL",
+        "PublicDescription": "L1 I-cache invalidate. The Event counts each explicit invalidation of a cache line in the L1 I-cache caused by: * Broadcast cache coherency operations from another CPU in the system. * Invalidation due to capacity eviction in the L2 D-cache. This Event does not count for the following conditions: * A cache refill invalidates a cache line. * A CMO which is executed on that CPU Core and invalidates a cache line specified by Set/Way. * Cache Maintenance Operations (CMO) that operate by a virtual address. Note that: * CMOs that operate by Set/Way cannot be broadcast from one CPU Core to another. * The CMO is treated as a No-op for the purposes of L1 I-cache line invalidation, as this Core implements a fully coherent I-cache."
+    }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json
new file mode 100644
index 000000000000..ac3f8095a997
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json
@@ -0,0 +1,134 @@
+[
+    {
+        "ArchStdEvent": "L2D_CACHE",
+        "PublicDescription": "The Event counts accesses to the L2 cache due to Data accesses. The L2 cache is a unified cache for Data and Instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses. This Event also counts Write-backs of dirty Data from the L1 D-cache to the L2 cache. This CPU includes I-cache accesses in this counter as the L2I equivalent Event was not implemented. This Event is the sum of the L2D_CACHE_RD, L2D_CACHE_WR, L2D_CACHE_PRFM, and L2D_CACHE_HWPRF Events."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL",
+        "PublicDescription": "The Event counts cache line refills into the L2 cache. The L2 cache is a unified cache for Data and Instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses. This CPU includes I-cache refills in this counter as the L2I equivalent Event was not implemented. This Event is the sum of L2D_CACHE_REFILL_RD, L2D_CACHE_REFILL_WR, L2D_CACHE_REFILL_HWPRF, and L2D_CACHE_REFILL_PRFM."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB",
+        "PublicDescription": "The Event counts Write-backs of Data from the L2 cache to outside the CPU. This includes snoops to the L2 (from other CPUs) which return Data even if the snoops cause an invalidation. L2 cache line invalidations which do not write Data outside the CPU and snoops which return Data from an L1 cache are not counted. Data would not be written outside the cache when invalidating a clean cache line. This Event is the sum of the L2D_CACHE_WB_VICTIM and L2D_CACHE_WB_CLEAN Events."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_RD",
+        "PublicDescription": "The Event counts L2 D-cache accesses due to memory Read operations. The L2 cache is a unified cache for Data and Instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses. This CPU includes I-cache accesses in this counter as the L2I equivalent Event was not implemented. This Event is a subset of the L2D_CACHE Event, but this Event only counts memory Read operations."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WR",
+        "PublicDescription": "The Event counts L2 cache accesses due to memory Write operations. The L2 cache is a unified cache for Data and Instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses. This Event is a subset of the L2D_CACHE Event, but this Event only counts memory Write operations."
diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json
new file mode 100644
index 000000000000..ac3f8095a997
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json
@@ -0,0 +1,134 @@
+[
+    {
+        "ArchStdEvent": "L2D_CACHE",
+        "PublicDescription": "The Event counts accesses to the L2 cache due to Data accesses. The L2 cache is a unified cache for Data and Instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses. This Event also counts Write-back of dirty Data from the L1 D-cache to the L2 cache. This CPU includes I-cache accesses in this counter, as the L2I equivalent Event was not implemented. This Event is the sum of the L2D_CACHE_RD, L2D_CACHE_WR, L2D_CACHE_PRFM, and L2D_CACHE_HWPRF Events."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL",
+        "PublicDescription": "The Event counts cache line refills into the L2 cache. The L2 cache is a unified cache for Data and Instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses. This CPU includes I-cache refills in this counter, as the L2I equivalent Event was not implemented. This Event is the sum of L2D_CACHE_REFILL_RD, L2D_CACHE_REFILL_WR, L2D_CACHE_REFILL_HWPRF, and L2D_CACHE_REFILL_PRFM."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB",
+        "PublicDescription": "The Event counts Write-backs of Data from the L2 cache to outside the CPU. This includes snoops to the L2 (from other CPUs) which return Data even if the snoops cause an invalidation. L2 cache line invalidations which do not write Data outside the CPU and snoops which return Data from an L1 cache are not counted. Data would not be written outside the cache when invalidating a clean cache line. This Event is the sum of the L2D_CACHE_WB_VICTIM and L2D_CACHE_WB_CLEAN Events."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_RD",
+        "PublicDescription": "The Event counts L2 D-cache accesses due to memory Read operations. The L2 cache is a unified cache for Data and Instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses. This CPU includes I-cache accesses in this counter, as the L2I equivalent Event was not implemented. This Event is a subset of the L2D_CACHE Event, but this Event only counts memory Read operations."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WR",
+        "PublicDescription": "The Event counts L2 cache accesses due to memory Write operations. The L2 cache is a unified cache for Data and Instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses. This Event is a subset of the L2D_CACHE Event, but this Event only counts memory Write operations."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_RD",
+        "PublicDescription": "The Event counts refills for memory accesses due to memory Read operations counted by L2D_CACHE_RD. The L2 cache is a unified cache for Data and Instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses. This CPU includes I-cache refills in this counter, as the L2I equivalent Event was not implemented. This Event is a subset of the L2D_CACHE_REFILL Event. This Event does not count L2 refills caused by stashes into L2. This count includes demand requests that encounter an L2 prefetch request or an L2 software prefetch request to the same cache line, which is still pending in the L2 LFB."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_WR",
+        "PublicDescription": "The Event counts refills for memory accesses due to memory Write operations counted by L2D_CACHE_WR. The L2 cache is a unified cache for Data and Instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses. This count includes demand requests that encounter an L2 prefetch request or an L2 software prefetch request to the same cache line, which is still pending in the L2 LFB."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB_VICTIM",
+        "PublicDescription": "The Event counts evictions from the L2 cache because of a line being allocated into the L2 cache. This Event is a subset of the L2D_CACHE_WB Event."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB_CLEAN",
+        "PublicDescription": "The Event counts Write-backs from the L2 cache that are a result of any of the following: * Cache maintenance operations. * Snoop responses. * Direct cache transfers to another CPU due to a forwarding snoop request. This Event is a subset of the L2D_CACHE_WB Event."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_INVAL",
+        "PublicDescription": "The Event counts each explicit invalidation of a cache line in the L2 cache by cache maintenance operations that operate by a virtual address, or by external coherency operations. This Event does not count if either: * A cache refill invalidates a cache line, or * A Cache Maintenance Operation (CMO), which invalidates a cache line specified by Set/Way, is executed on that CPU. CMOs that operate by Set/Way cannot be broadcast from one CPU to another."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_LMISS_RD",
+        "PublicDescription": "The Event counts cache line refills into the L2 unified cache from any memory Read operations that incurred additional latency. Counts the same as L2D_CACHE_REFILL_RD in this CPU."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_RW",
+        "PublicDescription": "The Event counts L2 cache demand accesses from any Load/Store operations. The L2 cache is a unified cache for Data and Instruction accesses; accesses are for misses in the L1 D-cache or translation resolutions due to accesses. This CPU includes I-cache accesses in this counter, as the L2I equivalent Event was not implemented. This Event is the sum of the L2D_CACHE_RD and L2D_CACHE_WR Events."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_PRFM",
+        "PublicDescription": "The Event counts L2 D-cache accesses generated by software preload or prefetch Instructions with target = L1/L2/L3 cache. Note that a software preload or prefetch Instruction with target = L1/L2/L3 that hits in the L1D will not result in an L2 D-cache access. Therefore, such software preload or prefetch Instructions will not be counted by this Event."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_MISS",
+        "PublicDescription": "The Event counts cache line misses in the L2 cache. The L2 cache is a unified cache for Data and Instruction accesses. Accesses are for misses in the L1 D-cache or translation resolutions due to accesses. Counts the same as L2D_CACHE_REFILL_RD in this CPU."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_PRFM",
+        "PublicDescription": "The Event counts refills due to accesses generated as a result of software preload or prefetch Instructions as counted by L2D_CACHE_PRFM. This CPU includes I-cache refills in this counter, as the L2I equivalent Event was not implemented."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_HWPRF",
+        "PublicDescription": "The Event counts each hardware prefetch access counted by L2D_CACHE_HWPRF that causes a refill of the L2 cache, or any L1 Data or Instruction cache of this PE, from outside of those caches. This does not include prefetch requests that are pending a refill in the LFB when a new demand request to the same cache line hits the LFB entry. All such refills are counted as L2D_LFB_HIT_RWL1PRF_FHWPRF."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_PRF",
+        "PublicDescription": "The Event counts each access to the L2 cache due to a prefetch Instruction or hardware prefetch that causes a refill of the L2 or any Level 1 cache, from outside of those caches."
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_HWPRF",
+        "PublicDescription": "The Event counts the L2 D-cache accesses caused by the L1 or L2 hardware prefetcher."
+    },
+    {
+        "EventCode": "0x0108",
+        "EventName": "L2D_CACHE_IF_REFILL",
+        "PublicDescription": "L2 D-cache refill, Instruction fetch. The Event counts each demand Instruction fetch that causes a refill of the L2 cache or L1 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x0109",
+        "EventName": "L2D_CACHE_TBW_REFILL",
+        "PublicDescription": "L2 D-cache refill, Page table walk. The Event counts each demand translation table walk that causes a refill of the L2 cache or L1 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x010a",
+        "EventName": "L2D_CACHE_PF_REFILL",
+        "PublicDescription": "L2 D-cache refill, prefetch. The Event counts L1 or L2 hardware or software prefetch accesses that cause a refill of the L2 cache or L1 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x0201",
+        "EventName": "L2D_CACHE_BACKSNOOP_L1D_VIRT_ALIASING",
+        "PublicDescription": "The Event counts when the L2 D-cache sends an invalidating back-snoop to the L1 D for an access initiated by the L1 D, where the corresponding line is already present in the L1 D-cache. The L2 D-cache line tags the PE that refilled the line. It also retains specific bits of the VA to identify virtually aliased addresses. The L1 D request requiring a back-snoop can originate either from the same PE that refilled the L2 D line or from a different PE. In either case, this Event only counts those back-snoops where the requested VA mismatches the VA stored in the L2 D tag. This Event is counted only by the PE that initiated the original request necessitating a back-snoop. Note: as the L1 D is VIPT, it identifies this access as a miss; conversely, as the L2 is PIPT, it identifies it as a hit. The L2 D utilizes the back-snoop mechanism to refill the L1 D with the snooped Data."
+    },
+    {
+        "EventCode": "0x0179",
+        "EventName": "L2D_CACHE_HIT_RWL1PRF_FHWPRF",
+        "PublicDescription": "L2 D-cache demand Read, demand Write and L1 prefetch hit, fetched by hardware prefetch. The Event counts each demand Read, demand Write and L1 hardware or software prefetch request that hits an L2 D-cache line that was refilled into the L2 D-cache in response to an L2 hardware prefetch. Only the first hit is counted. After this Event is generated for a cache line, the Event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "EventCode": "0x020c",
+        "EventName": "L2D_CACHE_HIT_RWL1PRF_FPRFM",
+        "PublicDescription": "L2 D-cache demand Read, demand Write and L1 prefetch hit, fetched by software prefetch. The Event counts each demand Read, demand Write and L1 hardware or software prefetch request that hits an L2 D-cache line that was refilled into the L2 D-cache in response to an L2 software prefetch. Only the first hit is counted. After this Event is generated for a cache line, the Event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "EventCode": "0x020e",
+        "EventName": "L2D_CACHE_HIT_RWL1PRF_FPRF",
+        "PublicDescription": "L2 D-cache demand Read, demand Write and L1 prefetch hit, fetched by software or hardware prefetch. The Event counts each demand Read, demand Write and L1 hardware or software prefetch request that hits an L2 D-cache line that was refilled into the L2 D-cache in response to an L2 hardware prefetch or software prefetch. Only the first hit is counted. After this Event is generated for a cache line, the Event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "EventCode": "0x010b",
+        "EventName": "L2D_LFB_HIT_RWL1PRF_FHWPRF",
+        "PublicDescription": "L2 line fill buffer demand Read, demand Write or L1 prefetch first hit, fetched by hardware prefetch. The Event counts each of the following accesses that hit the line-fill buffer when the same cache line is already being fetched due to an L2 hardware prefetcher: * Demand Read or Write. * L1I-HWPRF. * L1D-HWPRF. * L1I PRFM. * L1D PRFM. These accesses hit a cache line that is currently being loaded into the L2 cache as a result of a hardware prefetch to the same line. Consequently, the access does not initiate a new refill but waits for the completion of the previous refill. Only the first hit is counted. After this Event is generated for a cache line, the Event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "EventCode": "0x01b9",
+        "EventName": "L2D_CACHE_REFILL_L1PRF",
+        "PublicDescription": "L2 D-cache refill, L1 hardware or software prefetch. The Event counts each access counted by L2D_CACHE_L1PRF that causes a refill of the L2 cache or any L1 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x020a",
+        "EventName": "L2D_CACHE_REFILL_RWL1PRF",
+        "PublicDescription": "L2 D-cache refill, demand Read, demand Write or L1 hardware or software prefetch. The Event counts each access counted by L2D_CACHE_RWL1PRF that causes a refill of the L2 cache, or any L1 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x01b8",
+        "EventName": "L2D_CACHE_L1PRF",
+        "PublicDescription": "L2 D-cache access, L1 hardware or software prefetch. The Event counts L1 hardware or software prefetch accesses to the L2 D-cache."
+    },
+    {
+        "EventCode": "0x0208",
+        "EventName": "L2D_CACHE_RWL1PRF",
+        "PublicDescription": "L2 D-cache access, demand Read, demand Write or L1 hardware or software prefetch. The Event counts each access to the L2 D-cache due to the following: * Demand Read or Write. * L1 hardware or software prefetch."
+    }
+]
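[Editorial note, not part of the patch: L2D_CACHE is documented above as the exact sum of its RD/WR/PRFM/HWPRF components, which makes a cheap consistency check possible. A minimal sketch, with hypothetical counter values standing in for a single perf stat run:]

    #include <stdio.h>

    /* Illustrative sketch only: check the stated decomposition
     * L2D_CACHE = L2D_CACHE_RD + L2D_CACHE_WR + L2D_CACHE_PRFM + L2D_CACHE_HWPRF. */
    int main(void)
    {
        double l2d_cache       = 100000.0;
        double l2d_cache_rd    =  52000.0;
        double l2d_cache_wr    =  23000.0;
        double l2d_cache_prfm  =   5000.0;
        double l2d_cache_hwprf =  20000.0;

        double sum = l2d_cache_rd + l2d_cache_wr + l2d_cache_prfm + l2d_cache_hwprf;

        /* Multiplexed or unsynchronized counters will not match exactly;
         * a large residue suggests the events were not counted together. */
        printf("residue = %.0f (%.2f%% of L2D_CACHE)\n",
               l2d_cache - sum, 100.0 * (l2d_cache - sum) / l2d_cache);
        return 0;
    }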
diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/ll_cache.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/ll_cache.json
new file mode 100644
index 000000000000..661cbed4ee34
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/ll_cache.json
@@ -0,0 +1,107 @@
+[
+    {
+        "ArchStdEvent": "LL_CACHE_RD",
+        "PublicDescription": "This is an alias to the Event L3D_CACHE_RD (0x00a0)."
+    },
+    {
+        "ArchStdEvent": "LL_CACHE_MISS_RD",
+        "PublicDescription": "This is an alias to the Event L3D_CACHE_REFILL_RD (0x00a2)."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_ALLOCATE",
+        "PublicDescription": "The Event counts each memory Write operation that writes an entire line into the L3 Data without fetching Data from outside the L3 Data. These are allocations of cache lines in the L3 Data that are not refills counted by L3D_CACHE_REFILL. For example: * A Write-back of an entire cache line from an L2 cache to the L3 D-cache. * A Write of an entire cache line from a coalescing Write buffer. * An operation such as DC ZVA. This counter does not count writes of an entire line to beyond Level 3; thus it does not count streaming Writes to beyond the L3 cache."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_REFILL",
+        "PublicDescription": "The Event counts each access counted by L3D_CACHE that causes a refill of the L3 Data, or any L1 Data, Instruction, or L2 cache of this PE, from outside of those caches. This includes refills due to hardware prefetch and software prefetch accesses. This Event is a sum of the L3D_CACHE_MISS, L3D_CACHE_REFILL_PRFM and L3D_CACHE_REFILL_HWPRF Events. A refill includes any access that causes Data to be fetched from outside of the L1 to L3 caches, even if the Data is ultimately not allocated into the L3 D-cache."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE",
+        "PublicDescription": "The Event counts each memory Read operation or memory Write operation that causes a cache access to the Level 3. This Event is a sum of the following Events: * L3D_CACHE_RD(0x00a0) * L3D_CACHE_ALLOCATE(0x0029) * L3D_CACHE_PRFM(0x8151) * L3D_CACHE_HWPRF(0x8156) * L2D_CACHE_WB(0x0018)"
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_RD",
+        "PublicDescription": "The Event counts each memory Read operation to the L3 D-cache from Instruction Fetch, Load/Store, and MMU translation table accesses. This does not include HWPRF or PRFM accesses. This includes L1 and L2 prefetcher accesses to the L3 D-cache."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_REFILL_RD",
+        "PublicDescription": "The Event counts each access counted by both L3D_CACHE_RD and L3D_CACHE_REFILL. That is, every refill of the L3 cache counted by L3D_CACHE_REFILL that is caused by a memory Read operation. The L3D_CACHE_MISS(0x8152), L3D_CACHE_REFILL_RD(0x00a2) and L3D_CACHE_LMISS_RD(0x400b) Events count the same Event in the hardware."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_LMISS_RD",
+        "PublicDescription": "The Event counts each memory Read operation to the L3 cache counted by L3D_CACHE that incurs additional latency because it returns Data from outside of the L1 to L3 caches. The L3D_CACHE_MISS(0x8152), L3D_CACHE_REFILL_RD(0x00a2) and L3D_CACHE_LMISS_RD(0x400b) Events count the same Event in the hardware."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_RW",
+        "PublicDescription": "The Event counts each access counted by L3D_CACHE that is due to a demand memory Read operation or demand memory Write operation. This Event is a sum of L3D_CACHE_RD(0x00a0), L3D_CACHE_ALLOCATE(0x0029) and L2D_CACHE_WB(0x0018). Note that this counter does not count writes of an entire line to beyond Level 3; thus it does not count streaming Writes to beyond the L3 cache."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_PRFM",
+        "PublicDescription": "The Event counts each access counted by L3D_CACHE that is due to a prefetch Instruction. This includes L3 Data accesses due to an L1, L2, or L3 prefetch Instruction."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_MISS",
+        "PublicDescription": "The Event counts each demand Read access counted by L3D_CACHE_RD that misses in the L1 to L3 Data, causing an access to outside of the L3 cache. The L3D_CACHE_MISS(0x8152), L3D_CACHE_REFILL_RD(0x00a2) and L3D_CACHE_LMISS_RD(0x400b) Events count the same Event in the hardware."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_REFILL_PRFM",
+        "PublicDescription": "The Event counts each access counted by L3D_CACHE_PRFM that causes a refill of the L3 cache, or any L1 or L2 Data, from outside of those caches."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_HWPRF",
+        "PublicDescription": "The Event counts each access to the L3 cache that is due to a hardware prefetcher. This includes L3D accesses due to the Level-1, Level-2, or Level-3 hardware prefetcher."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_REFILL_HWPRF",
+        "PublicDescription": "The Event counts each hardware prefetch counted by L3D_CACHE_HWPRF that causes a refill of the L3 Data or unified cache, or any L1 or L2 Data, Instruction, or unified cache of this PE, from outside of those caches."
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_REFILL_PRF",
+        "PublicDescription": "The Event counts each access to the L3 cache due to a prefetch Instruction or hardware prefetch that causes a refill of the L3 Data, or any L1 or L2 Data, from outside of those caches."
+    },
+    {
+        "EventCode": "0x01f7",
+        "EventName": "L3D_CACHE_HIT_RWL1PRFL2PRF_FPRF",
+        "PublicDescription": "L3 cache demand Read, demand Write, L1 prefetch or L2 prefetch first hit, fetched by software or hardware prefetch. The Event counts each demand Read, demand Write, L1 hardware or software prefetch request and L2 hardware or software prefetch that hits an L3 D-cache line that was refilled into the L3 D-cache in response to an L3 hardware prefetch or software prefetch. Only the first hit is counted. After this Event is generated for a cache line, the Event is not generated again for the same cache line while it remains in the cache."
+    },
+    {
+        "EventCode": "0x01e9",
+        "EventName": "L3D_CACHE_REFILL_RWL1PRFL2PRF",
+        "PublicDescription": "L3 cache refill, demand Read, demand Write, L1 hardware or software prefetch or L2 hardware or software prefetch. The Event counts each access counted by L3D_CACHE_RWL1PRFL2PRF that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x01e8",
+        "EventName": "L3D_CACHE_RWL1PRFL2PRF",
+        "PublicDescription": "L3 cache access, demand Read, demand Write, L1 hardware or software prefetch or L2 hardware or software prefetch. The Event counts each access to the L3 D-cache due to the following: * Demand Read or Write. * L1 hardware or software prefetch. * L2 hardware or software prefetch."
+    },
+    {
+        "EventCode": "0x0225",
+        "EventName": "L3D_CACHE_REFILL_IF",
+        "PublicDescription": "L3 cache refill, Instruction fetch. The Event counts each demand Instruction fetch that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x0226",
+        "EventName": "L3D_CACHE_REFILL_MM",
+        "PublicDescription": "L3 cache refill, translation table walk access. The Event counts each demand translation table access that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x0227",
+        "EventName": "L3D_CACHE_REFILL_L1PRF",
+        "PublicDescription": "The Event counts each access counted by L3D_CACHE_L1PRF that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x01f6",
+        "EventName": "L3D_CACHE_REFILL_L2PRF",
+        "PublicDescription": "The Event counts each access counted by L3D_CACHE_L2PRF that causes a refill of the L3 cache, or any L1 or L2 cache of this PE, from outside of those caches."
+    },
+    {
+        "EventCode": "0x022c",
+        "EventName": "L3D_CACHE_L1PRF",
+        "PublicDescription": "The Event counts the L3 D-cache accesses due to an L1 hardware prefetch or software prefetch request. The L1 hardware prefetch or software prefetch requests that miss the L1I, L1D, and L2 D-cache are counted by this counter."
+    },
+    {
+        "EventCode": "0x022d",
+        "EventName": "L3D_CACHE_L2PRF",
+        "PublicDescription": "The Event counts the L3 D-cache accesses due to an L2 hardware prefetch or software prefetch request. The L2 hardware prefetch or software prefetch requests that miss the L2 D-cache are counted by this counter."
+    }
+]
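[Editorial note, not part of the patch: the two LL_CACHE aliases at the top of this file are what the last-level effectiveness metrics in metrics.json are built from. A minimal sketch of the ll_cache_read_hit_ratio expression, with hypothetical counter values:]

    #include <stdio.h>

    /* Illustrative sketch only: last-level read hit ratio from the two
     * aliased events defined in this file. Values are hypothetical. */
    int main(void)
    {
        double ll_cache_rd      = 40000.0; /* alias of L3D_CACHE_RD (0x00a0)        */
        double ll_cache_miss_rd =  6000.0; /* alias of L3D_CACHE_REFILL_RD (0x00a2) */

        printf("ll_cache_read_hit_ratio = %.3f per cache access\n",
               (ll_cache_rd - ll_cache_miss_rd) / ll_cache_rd);
        return 0;
    }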
diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/memory.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/memory.json
new file mode 100644
index 000000000000..64fced85a988
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/memory.json
@@ -0,0 +1,46 @@
+[
+    {
+        "ArchStdEvent": "MEM_ACCESS",
+        "PublicDescription": "The Event counts memory accesses issued by the CPU Load/Store unit, where those accesses are issued due to Load or Store operations. This Event counts memory accesses no matter whether the Data is received from any level of the cache hierarchy or from external memory. If memory accesses are broken up into smaller transactions than what were specified in the Load or Store Instructions, then the Event counts those smaller memory transactions. Memory accesses generated by the following Instructions or activity are not counted: Instruction fetches, cache maintenance Instructions, Translation table walks or prefetches, Memory prefetch operations. This Event counts the sum of the MEM_ACCESS_RD and MEM_ACCESS_WR Events."
+    },
+    {
+        "ArchStdEvent": "MEMORY_ERROR",
+        "PublicDescription": "The Event counts any detected correctable or uncorrectable physical memory errors (ECC or parity) in protected CPU RAMs. On the Core, this Event counts errors in the caches (including Data and tag RAMs). Any detected memory error (from either a speculative and abandoned access, or an architecturally executed access) is counted. Note that errors are only detected when the actual protected memory is accessed by an operation."
+    },
+    {
+        "ArchStdEvent": "REMOTE_ACCESS",
+        "PublicDescription": "The Event counts each external bus Read access that causes an access to a remote device, that is, a socket that does not contain the PE."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_RD",
+        "PublicDescription": "The Event counts memory accesses issued by the CPU due to Load operations. The Event counts any memory Load access, no matter whether the Data is received from any level of the cache hierarchy or from external memory. The Event also counts atomic Load operations. If memory accesses are broken up by the Load/Store unit into smaller transactions that are issued by the bus interface, then the Event counts those smaller transactions. The following Instructions are not counted: 1) Instruction fetches, 2) Cache maintenance Instructions, 3) Translation table walks or prefetches, 4) Memory prefetch operations. This Event is a subset of the MEM_ACCESS Event, but only counts memory Read operations."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_WR",
+        "PublicDescription": "The Event counts memory accesses issued by the CPU due to Store operations. The Event counts any memory Store access, no matter whether the Data is located in any level of cache or in external memory. The Event also counts atomic Load and Store operations. If memory accesses are broken up by the Load/Store unit into smaller transactions that are issued by the bus interface, then the Event counts those smaller transactions."
+    },
+    {
+        "ArchStdEvent": "LDST_ALIGN_LAT",
+        "PublicDescription": "The Event counts the number of memory Read and Write accesses in a cycle that incurred additional latency, due to the alignment of the address and the size of Data being accessed, which results in the access crossing a single cache line. This Event is implemented as the sum of LD_ALIGN_LAT and ST_ALIGN_LAT on this CPU."
+    },
+    {
+        "ArchStdEvent": "LD_ALIGN_LAT",
+        "PublicDescription": "The Event counts the number of memory Read accesses in a cycle that incurred additional latency, due to the alignment of the address and the size of Data being accessed, which results in the Load crossing a single cache line."
+    },
+    {
+        "ArchStdEvent": "ST_ALIGN_LAT",
+        "PublicDescription": "The Event counts the number of memory Write accesses in a cycle that incurred additional latency, due to the alignment of the address and the size of Data being accessed, which results in the Store crossing a single cache line."
+    },
+    {
+        "ArchStdEvent": "INST_FETCH_PERCYC",
+        "PublicDescription": "The Event counts the number of Instruction fetches outstanding per cycle, which can be used to derive the average latency of an Instruction fetch."
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_RD_PERCYC",
+        "PublicDescription": "The Event counts the number of outstanding Loads or memory Read accesses per cycle."
+    },
+    {
+        "ArchStdEvent": "INST_FETCH",
+        "PublicDescription": "The Event counts Instruction memory accesses that the PE makes."
+    }
+]
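[Editorial note, not part of the patch: the PERCYC events above enable an occupancy-based latency estimate. INST_FETCH_PERCYC accumulates the number of outstanding fetches each cycle, so dividing by INST_FETCH yields an average latency in cycles, which is exactly the instruction_fetch_average_latency metric in metrics.json below. A minimal sketch with hypothetical values:]

    #include <stdio.h>

    /* Illustrative sketch only: occupancy-based average latency. */
    int main(void)
    {
        double inst_fetch_percyc = 1.2e6; /* sum over cycles of outstanding fetches */
        double inst_fetch        = 3.0e5; /* number of Instruction fetches          */

        /* Average cycles each fetch stayed outstanding. */
        printf("avg fetch latency = %.1f cycles\n",
               inst_fetch_percyc / inst_fetch);
        return 0;
    }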
"MetricGroup": "Topdown_Backend" + }, + { + "MetricName": "backend_mem_tlb_bound", + "MetricExpr": "100 * (STALL_BACKEND_TLB / STALL_BACKEND_MEMBOUND)", + "BriefDescription": "This metric is the percentage of total cycles= stalled in the backend due to memory access latency issues caused by Data = TLB misses.", + "ScaleUnit": "1percent of cycles", + "MetricGroup": "Topdown_Backend" + }, + { + "MetricName": "backend_stalled_cycles", + "MetricExpr": "100 * (STALL_BACKEND / CPU_CYCLES)", + "BriefDescription": "This metric is the percentage of cycles that = were stalled due to resource constraints in the backend unit of the process= or.", + "ScaleUnit": "1percent of cycles", + "MetricGroup": "Cycle_Accounting" + }, + { + "MetricName": "bad_speculation", + "MetricExpr": "100 - (frontend_bound + retiring + backend_bound)", + "BriefDescription": "This metric is the percentage of total slots = that executed operations and didn't retire due to a pipeline flush. This in= dicates cycles that were utilized but inefficiently.", + "ScaleUnit": "1percent of slots", + "MetricGroup": "TopdownL1" + }, + { + "MetricName": "bus_bandwidth", + "MetricExpr": "BUS_ACCESS * 32 / duration_time ", + "BriefDescription": "This metrics measures the bus-bandwidth of th= e Data transferred between this PE's L2 with unCore in the system", + "ScaleUnit": "1Bytes/sec" + }, + { + "MetricName": "barrier_percentage", + "MetricExpr": "100 * ((ISB_SPEC + DSB_SPEC + DMB_SPEC) / INST_SPEC= )", + "BriefDescription": "This metric measures Instruction and Data bar= rier operations as a percentage of operations speculatively executed.", + "ScaleUnit": "1percent of operations", + "MetricGroup": "Operation_Mix" + }, + { + "MetricName": "branch_direct_ratio", + "MetricExpr": "BR_IMMED_RETIRED / BR_RETIRED", + "BriefDescription": "This metric measures the ratio of direct bran= ches retired to the total number of Branches architecturally executed.", + "ScaleUnit": "1per branch", + "MetricGroup": "Branch_Effectiveness" + }, + { + "MetricName": "branch_indirect_ratio", + "MetricExpr": "BR_IND_RETIRED / BR_RETIRED", + "BriefDescription": "This metric measures the ratio of indirect Br= anches retired, including function returns, to the total number of Branches= architecturally executed.", + "ScaleUnit": "1per branch", + "MetricGroup": "Branch_Effectiveness" + }, + { + "MetricName": "branch_misprediction_ratio", + "MetricExpr": "BR_MIS_PRED_RETIRED / BR_RETIRED", + "BriefDescription": "This metric measures the ratio of Branches mi= spredicted to the total number of Branches architecturally executed. 
+    {
+        "MetricName": "bus_bandwidth",
+        "MetricExpr": "BUS_ACCESS * 32 / duration_time",
+        "BriefDescription": "This metric measures the bus bandwidth of the Data transferred between this PE's L2 and the uncore in the system.",
+        "ScaleUnit": "1Bytes/sec"
+    },
+    {
+        "MetricName": "barrier_percentage",
+        "MetricExpr": "100 * ((ISB_SPEC + DSB_SPEC + DMB_SPEC) / INST_SPEC)",
+        "BriefDescription": "This metric measures Instruction and Data barrier operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "Operation_Mix"
+    },
+    {
+        "MetricName": "branch_direct_ratio",
+        "MetricExpr": "BR_IMMED_RETIRED / BR_RETIRED",
+        "BriefDescription": "This metric measures the ratio of direct Branches retired to the total number of Branches architecturally executed.",
+        "ScaleUnit": "1per branch",
+        "MetricGroup": "Branch_Effectiveness"
+    },
+    {
+        "MetricName": "branch_indirect_ratio",
+        "MetricExpr": "BR_IND_RETIRED / BR_RETIRED",
+        "BriefDescription": "This metric measures the ratio of indirect Branches retired, including function returns, to the total number of Branches architecturally executed.",
+        "ScaleUnit": "1per branch",
+        "MetricGroup": "Branch_Effectiveness"
+    },
+    {
+        "MetricName": "branch_misprediction_ratio",
+        "MetricExpr": "BR_MIS_PRED_RETIRED / BR_RETIRED",
+        "BriefDescription": "This metric measures the ratio of Branches mispredicted to the total number of Branches architecturally executed. This gives an indication of the effectiveness of the Branch prediction unit.",
+        "ScaleUnit": "1per branch",
+        "MetricGroup": "Miss_Ratio;Branch_Effectiveness"
+    },
+    {
+        "MetricName": "branch_mpki",
+        "MetricExpr": "1000 * (BR_MIS_PRED_RETIRED / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of Branch mispredictions per thousand Instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;Branch_Effectiveness"
+    },
+    {
+        "MetricName": "branch_percentage",
+        "MetricExpr": "100 * ((BR_IMMED_SPEC + BR_INDIRECT_SPEC) / INST_SPEC)",
+        "BriefDescription": "This metric measures Branch operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "Operation_Mix"
+    },
+    {
+        "MetricName": "branch_return_ratio",
+        "MetricExpr": "BR_RETURN_RETIRED / BR_RETIRED",
+        "BriefDescription": "This metric measures the ratio of Branches retired that are function returns to the total number of Branches architecturally executed.",
+        "ScaleUnit": "1per branch",
+        "MetricGroup": "Branch_Effectiveness"
+    },
+    {
+        "MetricName": "cpu_cycles_fraction_in_st_mode",
+        "MetricExpr": "((CPU_SLOT/CPU_CYCLES) - 5) / 5",
+        "BriefDescription": "This metric counts the fraction of the CPU cycles spent in ST mode during program execution.",
+        "ScaleUnit": "1fraction of cycles",
+        "MetricGroup": "SMT"
+    },
+    {
+        "MetricName": "cpu_cycles_in_smt_mode",
+        "MetricExpr": "(1 - cpu_cycles_fraction_in_st_mode) * CPU_CYCLES",
+        "BriefDescription": "This metric counts CPU cycles in SMT mode during program execution.",
+        "ScaleUnit": "1CPU cycles",
+        "MetricGroup": "SMT"
+    },
+    {
+        "MetricName": "cpu_cycles_in_st_mode",
+        "MetricExpr": "cpu_cycles_fraction_in_st_mode * CPU_CYCLES",
+        "BriefDescription": "This metric counts CPU cycles in ST mode during program execution.",
+        "ScaleUnit": "1CPU cycles",
+        "MetricGroup": "SMT"
+    },
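[Editorial note, not part of the patch: a worked reading of the ST/SMT split above. The expression only makes sense if a thread is credited 5 slots per cycle in SMT mode and 10 in ST mode, so the observed slots-per-cycle interpolates linearly between the two; that interpretation is my assumption, not something the patch states.]

    #include <stdio.h>

    /* Illustrative sketch only, under the 5-vs-10 slots/cycle assumption. */
    int main(void)
    {
        double cpu_slot = 7.5e5, cpu_cycles = 1.0e5; /* 7.5 slots per cycle */

        double frac_st = ((cpu_slot / cpu_cycles) - 5.0) / 5.0; /* -> 0.5 */
        printf("ST cycles %.0f, SMT cycles %.0f\n",
               frac_st * cpu_cycles, (1.0 - frac_st) * cpu_cycles);
        return 0;
    }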
+    {
+        "MetricName": "crypto_percentage",
+        "MetricExpr": "100 * (CRYPTO_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures crypto operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "Operation_Mix"
+    },
+    {
+        "MetricName": "dtlb_mpki",
+        "MetricExpr": "1000 * (DTLB_WALK / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of Data TLB Walks per thousand Instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;DTLB_Effectiveness"
+    },
+    {
+        "MetricName": "dtlb_walk_average_latency",
+        "MetricExpr": "DTLB_WALK_PERCYC / DTLB_WALK",
+        "BriefDescription": "This metric measures the average latency of Data TLB walks in CPU cycles.",
+        "ScaleUnit": "1CPU cycles",
+        "MetricGroup": "Average_Latency"
+    },
+    {
+        "MetricName": "dtlb_walk_ratio",
+        "MetricExpr": "DTLB_WALK / L1D_TLB",
+        "BriefDescription": "This metric measures the ratio of Data TLB Walks to the total number of Data TLB accesses. This gives an indication of the effectiveness of the Data TLB accesses.",
+        "ScaleUnit": "1per TLB access",
+        "MetricGroup": "Miss_Ratio;DTLB_Effectiveness"
+    },
+    {
+        "MetricName": "fp_ops_per_cycle",
+        "MetricExpr": "(FP_SCALE_OPS_SPEC + FP_FIXED_OPS_SPEC) / CPU_CYCLES",
+        "BriefDescription": "This metric measures floating point operations per cycle in any precision performed by any Instruction. Operations are counted by computation and by vector lanes; fused computations such as multiply-add count as two operations per vector lane, for example.",
+        "ScaleUnit": "1operations per cycle",
+        "MetricGroup": "FP_Arithmetic_Intensity"
+    },
+    {
+        "MetricName": "fp16_percentage",
+        "MetricExpr": "100 * (FP_HP_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures half-precision floating point operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "FP_Precision_Mix"
+    },
+    {
+        "MetricName": "fp32_percentage",
+        "MetricExpr": "100 * (FP_SP_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures single-precision floating point operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "FP_Precision_Mix"
+    },
+    {
+        "MetricName": "fp64_percentage",
+        "MetricExpr": "100 * (FP_DP_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures double-precision floating point operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "FP_Precision_Mix"
+    },
+    {
+        "MetricName": "frontend_bound",
+        "MetricExpr": "100 * (STALL_SLOT_FRONTEND_WITHOUT_MISPRED / CPU_SLOT)",
+        "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.",
+        "ScaleUnit": "1percent of slots",
+        "MetricGroup": "TopdownL1"
+    },
+    {
+        "MetricName": "frontend_cache_l1i_bound",
+        "MetricExpr": "100 * (STALL_FRONTEND_L1I / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM))",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by L1 I-cache misses.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_cache_l2i_bound",
+        "MetricExpr": "100 * (STALL_FRONTEND_MEM / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM))",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by L2 I-cache misses.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_core_bound",
+        "MetricExpr": "100 * (STALL_FRONTEND_CPUBOUND / STALL_FRONTEND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend Core resource constraints not related to Instruction fetch latency issues caused by memory access components.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_core_flow_bound",
+        "MetricExpr": "100 * (STALL_FRONTEND_FLOW / STALL_FRONTEND_CPUBOUND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend as the decode unit is awaiting input from the Branch prediction unit.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_core_flush_bound",
+        "MetricExpr": "100 * (STALL_FRONTEND_FLUSH / STALL_FRONTEND_CPUBOUND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend as the processor is recovering from a pipeline flush caused by bad speculation or other machine resteers.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_mem_bound",
+        "MetricExpr": "100 * (STALL_FRONTEND_MEMBOUND / STALL_FRONTEND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend Core resource constraints related to Instruction fetch latency issues caused by memory access components.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_mem_cache_bound",
+        "MetricExpr": "100 * ((STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) / STALL_FRONTEND_MEMBOUND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to Instruction fetch latency issues caused by I-cache misses.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_mem_tlb_bound",
+        "MetricExpr": "100 * (STALL_FRONTEND_TLB / STALL_FRONTEND_MEMBOUND)",
+        "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to Instruction fetch latency issues caused by Instruction TLB misses.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Topdown_Frontend"
+    },
+    {
+        "MetricName": "frontend_stalled_cycles",
+        "MetricExpr": "100 * (STALL_FRONTEND / CPU_CYCLES)",
+        "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the frontend unit of the processor.",
+        "ScaleUnit": "1percent of cycles",
+        "MetricGroup": "Cycle_Accounting"
+    },
+    {
+        "MetricName": "instruction_fetch_average_latency",
+        "MetricExpr": "INST_FETCH_PERCYC / INST_FETCH",
+        "BriefDescription": "This metric measures the average latency of Instruction fetches in CPU cycles.",
+        "ScaleUnit": "1CPU cycles",
+        "MetricGroup": "Average_Latency"
+    },
+    {
+        "MetricName": "integer_dp_percentage",
+        "MetricExpr": "100 * (DP_SPEC / INST_SPEC)",
+        "BriefDescription": "This metric measures scalar integer operations as a percentage of operations speculatively executed.",
+        "ScaleUnit": "1percent of operations",
+        "MetricGroup": "Operation_Mix"
+    },
+    {
+        "MetricName": "ipc",
+        "MetricExpr": "INST_RETIRED / CPU_CYCLES",
+        "BriefDescription": "This metric measures the number of Instructions retired per cycle.",
+        "ScaleUnit": "1per cycle",
+        "MetricGroup": "General"
+    },
+    {
+        "MetricName": "itlb_mpki",
+        "MetricExpr": "1000 * (ITLB_WALK / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of Instruction TLB Walks per thousand Instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;ITLB_Effectiveness"
+    },
+    {
+        "MetricName": "itlb_walk_average_latency",
+        "MetricExpr": "ITLB_WALK_PERCYC / ITLB_WALK",
+        "BriefDescription": "This metric measures the average latency of Instruction TLB walks in CPU cycles.",
+        "ScaleUnit": "1CPU cycles",
+        "MetricGroup": "Average_Latency"
+    },
+    {
+        "MetricName": "itlb_walk_ratio",
+        "MetricExpr": "ITLB_WALK / L1I_TLB",
+        "BriefDescription": "This metric measures the ratio of Instruction TLB Walks to the total number of Instruction TLB accesses. This gives an indication of the effectiveness of the Instruction TLB accesses.",
+        "ScaleUnit": "1per TLB access",
+        "MetricGroup": "Miss_Ratio;ITLB_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_cache_miss_ratio",
+        "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
+        "BriefDescription": "This metric measures the ratio of L1 D-cache accesses missed to the total number of L1 D-cache accesses. This gives an indication of the effectiveness of the L1 D-cache.",
+        "ScaleUnit": "1per cache access",
+        "MetricGroup": "Miss_Ratio;L1D_Cache_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_cache_mpki",
+        "MetricExpr": "1000 * (L1D_CACHE_REFILL / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of L1 D-cache accesses missed per thousand Instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;L1D_Cache_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_cache_rw_miss_ratio",
+        "MetricExpr": "l1d_demand_misses / l1d_demand_accesses",
+        "BriefDescription": "This metric measures the ratio of L1 D-cache demand accesses missed to the total number of L1 D-cache demand accesses. This gives an indication of the effectiveness of the L1 D-cache for demand Load or Store traffic.",
+        "ScaleUnit": "1per cache access",
+        "MetricGroup": "L1D_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_demand_accesses",
+        "MetricExpr": "L1D_CACHE_RW",
+        "BriefDescription": "This metric measures the count of L1 D-cache accesses incurred on a Load or Store by the Instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1D_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_demand_misses",
+        "MetricExpr": "L1D_CACHE_REFILL_RW",
+        "BriefDescription": "This metric measures the count of L1 D-cache misses incurred on a Load or Store by the Instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1D_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_prf_accuracy",
+        "MetricExpr": "100 * (l1d_useful_prf / l1d_refilled_prf)",
+        "BriefDescription": "This metric measures the fraction of prefetched memory addresses that are used by the Instruction stream.",
+        "ScaleUnit": "1percent of prefetch",
+        "MetricGroup": "L1D_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_prf_coverage",
+        "MetricExpr": "100 * (l1d_useful_prf / (l1d_demand_misses + l1d_refilled_prf))",
+        "BriefDescription": "This metric measures the fraction of the baseline demand cache misses which the prefetcher brings into the cache.",
+        "ScaleUnit": "1percent of cache access",
+        "MetricGroup": "L1D_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_refilled_prf",
+        "MetricExpr": "L1D_CACHE_REFILL_HWPRF + L1D_CACHE_REFILL_PRFM + L1D_LFB_HIT_RW_FHWPRF + L1D_LFB_HIT_RW_FPRFM",
+        "BriefDescription": "This metric measures the count of cache lines refilled by the L1 Data prefetcher (hardware prefetches or software preload) into the L1 D-cache.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1D_Prefetcher_Effectiveness"
+    },
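[Editorial note, not part of the patch: a minimal sketch of the L1D prefetcher accuracy and coverage expressions above, using l1d_useful_prf as defined just below; the event sums are replaced by hypothetical totals.]

    #include <stdio.h>

    /* Illustrative sketch only: prefetcher accuracy and coverage. */
    int main(void)
    {
        double useful        =  8000.0; /* l1d_useful_prf    */
        double refilled      = 10000.0; /* l1d_refilled_prf  */
        double demand_misses = 30000.0; /* l1d_demand_misses */

        /* Accuracy: of the lines the prefetcher brought in, how many were used. */
        printf("l1d_prf_accuracy = %.1f%%\n", 100.0 * useful / refilled);
        /* Coverage: share of the baseline misses the prefetcher covered. */
        printf("l1d_prf_coverage = %.1f%%\n",
               100.0 * useful / (demand_misses + refilled));
        return 0;
    }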
+    {
+        "MetricName": "l1d_tlb_miss_ratio",
+        "MetricExpr": "L1D_TLB_REFILL / L1D_TLB",
+        "BriefDescription": "This metric measures the ratio of L1 Data TLB accesses missed to the total number of L1 Data TLB accesses. This gives an indication of the effectiveness of the L1 Data TLB.",
+        "ScaleUnit": "1per TLB access",
+        "MetricGroup": "Miss_Ratio;DTLB_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_tlb_mpki",
+        "MetricExpr": "1000 * (L1D_TLB_REFILL / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of L1 Data TLB accesses missed per thousand Instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;DTLB_Effectiveness"
+    },
+    {
+        "MetricName": "l1d_useful_prf",
+        "MetricExpr": "L1D_CACHE_HIT_RW_FPRF + L1D_LFB_HIT_RW_FHWPRF + L1D_LFB_HIT_RW_FPRFM",
+        "BriefDescription": "This metric measures the count of cache lines refilled by the L1 Data prefetcher (hardware prefetches or software preload) into the L1 D-cache which are further used by a Load or Store from the Instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1D_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_cache_miss_ratio",
+        "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
+        "BriefDescription": "This metric measures the ratio of L1 I-cache accesses missed to the total number of L1 I-cache accesses. This gives an indication of the effectiveness of the L1 I-cache.",
+        "ScaleUnit": "1per cache access",
+        "MetricGroup": "Miss_Ratio;L1I_Cache_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_cache_mpki",
+        "MetricExpr": "1000 * (L1I_CACHE_REFILL / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of L1 I-cache accesses missed per thousand Instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;L1I_Cache_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_cache_rd_miss_ratio",
+        "MetricExpr": "l1i_demand_misses / l1i_demand_accesses",
+        "BriefDescription": "This metric measures the ratio of L1 I-cache Read accesses missed to the total number of L1 I-cache accesses. This gives an indication of the effectiveness of the L1 I-cache for demand Instruction fetch traffic. Note that demand accesses to this cache are Instruction fetches.",
+        "ScaleUnit": "1per cache access",
+        "MetricGroup": "L1I_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_demand_accesses",
+        "MetricExpr": "L1I_CACHE_RD",
+        "BriefDescription": "This metric measures the count of L1 I-cache accesses caused by an Instruction fetch by the Instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1I_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_demand_misses",
+        "MetricExpr": "L1I_CACHE_REFILL_RD",
+        "BriefDescription": "This metric measures the count of L1 I-cache misses caused by an Instruction fetch by the Instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1I_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_prf_accuracy",
+        "MetricExpr": "100 * (l1i_useful_prf / l1i_refilled_prf)",
+        "BriefDescription": "This metric measures the fraction of prefetched memory addresses that are used by the Instruction stream.",
+        "ScaleUnit": "1percent of prefetch",
+        "MetricGroup": "L1I_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_prf_coverage",
+        "MetricExpr": "100 * (l1i_useful_prf / (l1i_demand_misses + l1i_refilled_prf))",
+        "BriefDescription": "This metric measures the fraction of the baseline demand cache misses which the prefetcher brings into the cache.",
+        "ScaleUnit": "1percent of cache access",
+        "MetricGroup": "L1I_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_refilled_prf",
+        "MetricExpr": "L1I_CACHE_REFILL_HWPRF + L1I_CACHE_REFILL_PRFM",
+        "BriefDescription": "This metric measures the count of cache lines refilled by the L1 Instruction prefetcher (hardware prefetches or software preload) into the L1 I-cache.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1I_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_tlb_miss_ratio",
+        "MetricExpr": "L1I_TLB_REFILL / L1I_TLB",
+        "BriefDescription": "This metric measures the ratio of L1 Instruction TLB accesses missed to the total number of L1 Instruction TLB accesses. This gives an indication of the effectiveness of the L1 Instruction TLB.",
+        "ScaleUnit": "1per TLB access",
+        "MetricGroup": "Miss_Ratio;ITLB_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_tlb_mpki",
+        "MetricExpr": "1000 * (L1I_TLB_REFILL / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of L1 Instruction TLB accesses missed per thousand Instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;ITLB_Effectiveness"
+    },
+    {
+        "MetricName": "l1i_useful_prf",
+        "MetricExpr": "L1I_CACHE_HIT_RD_FPRF",
+        "BriefDescription": "This metric measures the count of cache lines refilled by the L1 Instruction prefetcher (hardware prefetches or software preload) into the L1 I-cache which are further used by the Instruction stream of the program.",
+        "ScaleUnit": "1count",
+        "MetricGroup": "L1I_Prefetcher_Effectiveness"
+    },
+    {
+        "MetricName": "l2_cache_miss_ratio",
+        "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
+        "BriefDescription": "This metric measures the ratio of L2 cache accesses missed to the total number of L2 cache accesses. This gives an indication of the effectiveness of the L2 cache, which is a unified cache that stores both Data and Instructions.",
+        "ScaleUnit": "1per cache access",
+        "MetricGroup": "Miss_Ratio;L2_Cache_Effectiveness"
+    },
+    {
+        "MetricName": "l2_cache_mpki",
+        "MetricExpr": "1000 * (L2D_CACHE_REFILL / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of L2 unified cache accesses missed per thousand Instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;L2_Cache_Effectiveness"
+    },
+    {
+        "MetricName": "l2_tlb_miss_ratio",
+        "MetricExpr": "L2D_TLB_REFILL / L2D_TLB",
+        "BriefDescription": "This metric measures the ratio of L2 unified TLB accesses missed to the total number of L2 unified TLB accesses.",
+        "ScaleUnit": "1per TLB access",
+        "MetricGroup": "Miss_Ratio;ITLB_Effectiveness;DTLB_Effectiveness"
+    },
+    {
+        "MetricName": "l2_tlb_mpki",
+        "MetricExpr": "1000 * (L2D_TLB_REFILL / INST_RETIRED)",
+        "BriefDescription": "This metric measures the number of L2 unified TLB accesses missed per thousand Instructions executed.",
+        "ScaleUnit": "1MPKI",
+        "MetricGroup": "MPKI;ITLB_Effectiveness;DTLB_Effectiveness"
+    },
prefetcher (hardware prefetches or software preload) i= nto L2 D-cache which are further used by Instruction Fetch, Load, Store, or= L1 prefetcher accesses from the Instruction stream of the program.", + "ScaleUnit": "1count", + "MetricGroup": "L2_Prefetcher_Effectiveness" + }, + { + "MetricName": "l3d_cache_rwl1prfl2prf_miss_ratio", + "MetricExpr": "l3d_demand_misses / l3d_demand_accesses", + "BriefDescription": "This metric measures the ratio of L3 D-cache = Read accesses missed to the total number of L3 D-cache accesses. This gives= an indication of the effectiveness of the L2 D-cache for demand Instructio= n Fetch, Load, Store, L1 prefetcher, or L2 prefetcher accesses traffic.", + "ScaleUnit": "1per cache access", + "MetricGroup": "L3_Prefetcher_Effectiveness" + }, + { + "MetricName": "l3d_demand_accesses", + "MetricExpr": "L3D_CACHE_RWL1PRFL2PRF", + "BriefDescription": "This metric measures the count of L3 D-cache = accesses incurred on an Instruction Fetch, Load, Store, L1 prefetcher, or L= 2 prefetcher accesses by the Instructions stream of the program", + "ScaleUnit": "1count", + "MetricGroup": "L3_Prefetcher_Effectiveness" + }, + { + "MetricName": "l3d_demand_misses", + "MetricExpr": "L3D_CACHE_REFILL_RWL1PRFL2PRF", + "BriefDescription": "This metric measures the count of L3 D-cache = misses incurred on an Instruction Fetch, Load, Store, L1 prefetcher, or L2 = prefetcher accesses by the Instructions stream of the program", + "ScaleUnit": "1count", + "MetricGroup": "L3_Prefetcher_Effectiveness" + }, + { + "MetricName": "l3d_prf_accuracy", + "MetricExpr": "100 * (l3d_useful_prf / l3d_refilled_prf)", + "BriefDescription": "This metric measures the fraction of prefetch= ed memory addresses that are used by the Instruction stream", + "ScaleUnit": "1percent of prefetch", + "MetricGroup": "L3_Prefetcher_Effectiveness" + }, + { + "MetricName": "l3d_prf_coverage", + "MetricExpr": "100 * (l3d_useful_prf / (l3d_demand_misses + l3d_re= filled_prf))", + "BriefDescription": "This metric measures the baseline demand cach= e misses which the prefetcher brings into the cache.", + "ScaleUnit": "1percent of cache access", + "MetricGroup": "L3_Prefetcher_Effectiveness" + }, + { + "MetricName": "l3d_refilled_prf", + "MetricExpr": "L3D_CACHE_REFILL_HWPRF + L3D_CACHE_REFILL_PRFM - L3= D_CACHE_REFILL_L1PRF - L3D_CACHE_REFILL_L2PRF", + "BriefDescription": "This metric measures the count of cache lines= refilled by L3 Data prefetcher (hardware prefetches or software preload) i= nto L3 D-cache.", + "ScaleUnit": "1count", + "MetricGroup": "L3_Prefetcher_Effectiveness" + }, + { + "MetricName": "l3d_useful_prf", + "MetricExpr": "L3D_CACHE_HIT_RWL1PRFL2PRF_FPRF", + "BriefDescription": "This metric measures the count of cache lines= refilled by L3 Data prefetcher (hardware prefetches or software preload) i= nto L3 D-cache which are further used by Instruction Fetch, Load, Store, L1= prefetcher, or L2 prefetcher accesses from the Instruction stream of the p= rogram.", + "ScaleUnit": "1count", + "MetricGroup": "L3_Prefetcher_Effectiveness" + }, + { + "MetricName": "ll_cache_read_hit_ratio", + "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD", + "BriefDescription": "This metric measures the ratio of last level = cache Read accesses hit in the cache to the total number of last level cach= e accesses. This gives an indication of the effectiveness of the last level= cache for Read traffic. 
Note that cache accesses in this cache are either = Data memory access or Instruction fetch as this is a system level cache.", + "ScaleUnit": "1per cache access", + "MetricGroup": "LL_Cache_Effectiveness" + }, + { + "MetricName": "ll_cache_read_miss_ratio", + "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD", + "BriefDescription": "This metric measures the ratio of last level = cache Read accesses missed to the total number of last level cache accesses= . This gives an indication of the effectiveness of the last level cache for= Read traffic. Note that cache accesses in this cache are either Data memor= y access or Instruction fetch as this is a system level cache.", + "ScaleUnit": "1per cache access", + "MetricGroup": "Miss_Ratio;LL_Cache_Effectiveness" + }, + { + "MetricName": "ll_cache_read_mpki", + "MetricExpr": "1000 * (LL_CACHE_MISS_RD / INST_RETIRED)", + "BriefDescription": "This metric measures the number of last level= cache Read accesses missed per thousand Instructions executed.", + "ScaleUnit": "1MPKI", + "MetricGroup": "MPKI;LL_Cache_Effectiveness" + }, + { + "MetricName": "load_average_latency", + "MetricExpr": "MEM_ACCESS_RD_PERCYC / MEM_ACCESS", + "BriefDescription": "This metric measures the average latency of L= oad operations in CPU cycles", + "ScaleUnit": "1CPU cycles", + "MetricGroup": "Average_Latency" + }, + { + "MetricName": "load_percentage", + "MetricExpr": "100 * (LD_SPEC / INST_SPEC)", + "BriefDescription": "This metric measures Load operations as a per= centage of operations speculatively executed.", + "ScaleUnit": "1percent of operations", + "MetricGroup": "Operation_Mix" + }, + { + "MetricName": "nonsve_fp_ops_per_cycle", + "MetricExpr": "FP_FIXED_OPS_SPEC / CPU_CYCLES", + "BriefDescription": "This metric measures floating point operation= s per cycle in any precision performed by an Instruction that is not an SVE= Instruction. 
Operations are counted by computation and by vector lanes, fu= sed computations such as multiply-add count as twice per vector lane for ex= ample.", + "ScaleUnit": "1operations per cycle", + "MetricGroup": "FP_Arithmetic_Intensity" + }, + { + "MetricName": "retiring", + "MetricExpr": "100 * ((OP_RETIRED/OP_SPEC) * (1 - (STALL_SLOT/CPU_= SLOT)))", + "BriefDescription": "This metric is the percentage of total slots = that retired operations, which indicates cycles that were utilized efficien= tly.", + "ScaleUnit": "1percent of slots", + "MetricGroup": "TopdownL1" + }, + { + "MetricName": "scalar_fp_percentage", + "MetricExpr": "100 * (VFP_SPEC / INST_SPEC)", + "BriefDescription": "This metric measures scalar floating point op= erations as a percentage of operations speculatively executed.", + "ScaleUnit": "1percent of operations", + "MetricGroup": "Operation_Mix" + }, + { + "MetricName": "simd_percentage", + "MetricExpr": "100 * (ASE_SPEC / INST_SPEC)", + "BriefDescription": "This metric measures advanced SIMD operations= as a percentage of total operations speculatively executed.", + "ScaleUnit": "1percent of operations", + "MetricGroup": "Operation_Mix" + }, + { + "MetricName": "store_percentage", + "MetricExpr": "100 * (ST_SPEC / INST_SPEC)", + "BriefDescription": "This metric measures Store operations as a pe= rcentage of operations speculatively executed.", + "ScaleUnit": "1percent of operations", + "MetricGroup": "Operation_Mix" + }, + { + "MetricName": "sve_all_percentage", + "MetricExpr": "100 * (SVE_INST_SPEC / INST_SPEC)", + "BriefDescription": "This metric measures scalable vector operatio= ns, including Loads and Stores, as a percentage of operations speculatively= executed.", + "ScaleUnit": "1percent of operations", + "MetricGroup": "Operation_Mix" + }, + { + "MetricName": "sve_fp_ops_per_cycle", + "MetricExpr": "FP_SCALE_OPS_SPEC / CPU_CYCLES", + "BriefDescription": "This metric measures floating point operation= s per cycle in any precision performed by SVE Instructions. 
diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/misc.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/misc.json
new file mode 100644
index 000000000000..0d657789ee48
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/misc.json
@@ -0,0 +1,646 @@
+[
+    {
+        "ArchStdEvent": "SW_INCR",
+        "PublicDescription": "The Event counts software writes to the PMSWINC_EL0 (software PMU increment) register. The PMSWINC_EL0 register is a manually updated counter for use by application software. This Event could be used to measure any user program Event, such as accesses to a particular Data structure (by writing to the PMSWINC_EL0 register each time the Data structure is accessed). To use the PMSWINC_EL0 register and Event, developers must insert Instructions that write to the PMSWINC_EL0 register into the source code. Since the SW_INCR Event records Writes to the PMSWINC_EL0 register, there is no need to do a Read/Increment/Write sequence to the PMSWINC_EL0 register."
+    },
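The SW_INCR description above implies source-level instrumentation. A hedged sketch of what such an Instruction might look like on AArch64 (this assumes EL0 access to PMSWINC_EL0 has been enabled by the kernel, and the counter index is a placeholder for whichever counter was programmed to count SW_INCR):

    /* Illustrative AArch64 sketch: bump the SW_INCR event from application
     * code. Assumes user-space access to PMSWINC_EL0 is enabled; counter 0
     * is a placeholder. One Write suffices; no Read/Increment/Write. */
    static inline void pmu_sw_incr(unsigned int counter)
    {
        unsigned long mask = 1UL << counter;
        __asm__ volatile("msr pmswinc_el0, %0" : : "r"(mask));
    }

    void touch_data_structure(void *p)
    {
        pmu_sw_incr(0);     /* one increment per access of interest */
        (void)p;            /* ... real work on the Data structure ... */
    }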
+ }, + { + "ArchStdEvent": "TRCEXTOUT3", + "PublicDescription": "Trace unit external output 3." + }, + { + "ArchStdEvent": "CTI_TRIGOUT4", + "PublicDescription": "Cross-trigger Interface output trigger 4." + }, + { + "ArchStdEvent": "CTI_TRIGOUT5", + "PublicDescription": "Cross-trigger Interface output trigger 5." + }, + { + "ArchStdEvent": "CTI_TRIGOUT6", + "PublicDescription": "Cross-trigger Interface output trigger 6." + }, + { + "ArchStdEvent": "CTI_TRIGOUT7", + "PublicDescription": "Cross-trigger Interface output trigger 7." + }, + { + "EventCode": "0x00e1", + "EventName": "L1I_PRFM_REQ_DROP", + "PublicDescription": "L1 I-cache software prefetch dropped." + }, + { + "EventCode": "0x0100", + "EventName": "L1_PF_REFILL", + "PublicDescription": "L1 prefetch requests, refilled to L1 cache." + }, + { + "EventCode": "0x0120", + "EventName": "FLUSH", + "PublicDescription": "The Event counts both the CT flush and BX fl= ush. The BR_MIS_PRED counts the BX flushes. So the FLUSH-BR_MIS_PRED gives = the CT flushes." + }, + { + "EventCode": "0x0121", + "EventName": "FLUSH_MEM", + "PublicDescription": "Flushes due to memory hazards. This only inc= ludes CT flushes." + }, + { + "EventCode": "0x0122", + "EventName": "FLUSH_BAD_BRANCH", + "PublicDescription": "Flushes due to bad predicted Branch. This on= ly includes CT flushes." + }, + { + "EventCode": "0x0124", + "EventName": "FLUSH_ISB", + "PublicDescription": "Flushes due to ISB or similar side-effects. = This only includes CT flushes." + }, + { + "EventCode": "0x0125", + "EventName": "FLUSH_OTHER", + "PublicDescription": "Flushes due to other hazards. This only incl= udes CT flushes." + }, + { + "EventCode": "0x0126", + "EventName": "STORE_STREAM", + "PublicDescription": "Stored lines in streaming no-Write-allocate = mode." + }, + { + "EventCode": "0x0127", + "EventName": "NUKE_RAR", + "PublicDescription": "Load/Store nuke due to Read-after-Read order= ing hazard." + }, + { + "EventCode": "0x0128", + "EventName": "NUKE_RAW", + "PublicDescription": "Load/Store nuke due to Read-after-Write orde= ring hazard." + }, + { + "EventCode": "0x0129", + "EventName": "L1_PF_GEN_PAGE", + "PublicDescription": "Load/Store prefetch to L1 generated, Page mo= de." + }, + { + "EventCode": "0x012a", + "EventName": "L1_PF_GEN_STRIDE", + "PublicDescription": "Load/Store prefetch to L1 generated, stride = mode." + }, + { + "EventCode": "0x012b", + "EventName": "L2_PF_GEN_LD", + "PublicDescription": "Load prefetch to L2 generated." + }, + { + "EventCode": "0x012d", + "EventName": "LS_PF_TRAIN_TABLE_ALLOC", + "PublicDescription": "LS prefetch train table entry allocated." + }, + { + "EventCode": "0x0130", + "EventName": "LS_PF_GEN_TABLE_ALLOC", + "PublicDescription": "The Event counts the number of cycles with a= t least one table allocation, for L2 hardware prefetches (including the SW = PRFM that are converted into hardware prefetches due to D-TLB miss). LS pre= fetch gen table allocation (for L2 prefetches)." + }, + { + "EventCode": "0x0131", + "EventName": "LS_PF_GEN_TABLE_ALLOC_PF_PEND", + "PublicDescription": "The Event counts the number of cycles in whi= ch at least one hardware prefetch is dropped due to the inability to identi= fy a victim when the generation table is full. The hardware prefetch consid= ered here includes the software PRFM that is converted into hardware prefet= ches due to D-TLB miss." + }, + { + "EventCode": "0x0132", + "EventName": "TBW", + "PublicDescription": "Tablewalks." 
+ }, + { + "EventCode": "0x0134", + "EventName": "S1L2_HIT", + "PublicDescription": "Translation cache hit on S1L2 walk cache ent= ry." + }, + { + "EventCode": "0x0135", + "EventName": "S1L1_HIT", + "PublicDescription": "Translation cache hit on S1L1 walk cache ent= ry." + }, + { + "EventCode": "0x0136", + "EventName": "S1L0_HIT", + "PublicDescription": "Translation cache hit on S1L0 walk cache ent= ry." + }, + { + "EventCode": "0x0137", + "EventName": "S2L2_HIT", + "PublicDescription": "Translation cache hit for S2L2 IPA walk cach= e entry." + }, + { + "EventCode": "0x0138", + "EventName": "IPA_REQ", + "PublicDescription": "Translation cache lookups for IPA to PA entr= ies." + }, + { + "EventCode": "0x0139", + "EventName": "IPA_REFILL", + "PublicDescription": "Translation cache refills for IPA to PA entr= ies." + }, + { + "EventCode": "0x013a", + "EventName": "S1_FLT", + "PublicDescription": "Stage1 tablewalk fault." + }, + { + "EventCode": "0x013b", + "EventName": "S2_FLT", + "PublicDescription": "Stage2 tablewalk fault." + }, + { + "EventCode": "0x013c", + "EventName": "COLT_REFILL", + "PublicDescription": "Aggregated page refill." + }, + { + "EventCode": "0x0145", + "EventName": "L1_PF_HIT", + "PublicDescription": "L1 prefetch requests, hitting in L1 cache." + }, + { + "EventCode": "0x0146", + "EventName": "L1_PF", + "PublicDescription": "L1 prefetch requests." + }, + { + "EventCode": "0x0147", + "EventName": "CACHE_LS_REFILL", + "PublicDescription": "L2 D-cache refill, Load/Store." + }, + { + "EventCode": "0x0148", + "EventName": "CACHE_PF", + "PublicDescription": "L2 prefetch requests." + }, + { + "EventCode": "0x0149", + "EventName": "CACHE_PF_HIT", + "PublicDescription": "L2 prefetch requests, hitting in L2 cache." + }, + { + "EventCode": "0x0150", + "EventName": "UNUSED_PF", + "PublicDescription": "L2 unused prefetch." + }, + { + "EventCode": "0x0151", + "EventName": "PFT_SENT", + "PublicDescription": "L2 prefetch TGT sent. Note that PFT_SENT != =3D PFT_USEFUL + PFT_DROP. There may be PFT_SENT for which the accesses res= ulted in a SLC hit." + }, + { + "EventCode": "0x0152", + "EventName": "PFT_USEFUL", + "PublicDescription": "L2 prefetch TGT useful." + }, + { + "EventCode": "0x0153", + "EventName": "PFT_DROP", + "PublicDescription": "L2 prefetch TGT dropped." + }, + { + "EventCode": "0x0162", + "EventName": "LRQ_FULL", + "PublicDescription": "The Event counts the number of cycles the LR= Q is full." + }, + { + "EventCode": "0x0163", + "EventName": "FETCH_FQ_EMPTY", + "PublicDescription": "Fetch Queue empty cycles." + }, + { + "EventCode": "0x0164", + "EventName": "FPG2", + "PublicDescription": "Forward progress guarantee. Medium range liv= elock triggered." + }, + { + "EventCode": "0x0165", + "EventName": "FPG", + "PublicDescription": "Forward progress guarantee. Tofu global live= lock buster is triggered." + }, + { + "EventCode": "0x0172", + "EventName": "DEADBLOCK", + "PublicDescription": "Write-back evictions converted to Dataless E= VICT. The victim line is deemed deadblock if the likeliness of a reuse is l= ow. The Core uses Dataless evict to evict a deadblock; And it uses a evict = with Data to evict an L2 line that is not a deadblock." + }, + { + "EventCode": "0x0173", + "EventName": "PF_PRQ_ALLOC_PF_PEND", + "PublicDescription": "L1 prefetch prq allocation (replacing pendin= g)." + }, + { + "EventCode": "0x0178", + "EventName": "FETCH_ICACHE_INSTR", + "PublicDescription": "Instructions fetched from I-cache." 
+ }, + { + "EventCode": "0x017b", + "EventName": "NEAR_CAS", + "PublicDescription": "Near atomics: compare and swap." + }, + { + "EventCode": "0x017c", + "EventName": "NEAR_CAS_PASS", + "PublicDescription": "Near atomics: compare and swap pass." + }, + { + "EventCode": "0x017d", + "EventName": "FAR_CAS", + "PublicDescription": "Far atomics: compare and swap." + }, + { + "EventCode": "0x0186", + "EventName": "L2_BTB_RELOAD_MAIN_BTB", + "PublicDescription": "Number of completed L1 BTB update initiated = by L2 BTB hit which swap Branch information between L1 BTB and L2 BTB." + }, + { + "EventCode": "0x0190", + "EventName": "PF_MODE_0_CYCLES", + "PublicDescription": "Number of cycles in which the hardware prefe= tcher is in the most aggressive mode." + }, + { + "EventCode": "0x0191", + "EventName": "PF_MODE_1_CYCLES", + "PublicDescription": "Number of cycles in which the hardware prefe= tcher is in the more aggressive mode." + }, + { + "EventCode": "0x0192", + "EventName": "PF_MODE_2_CYCLES", + "PublicDescription": "Number of cycles in which the hardware prefe= tcher is in the less aggressive mode." + }, + { + "EventCode": "0x0193", + "EventName": "PF_MODE_3_CYCLES", + "PublicDescription": "Number of cycles in which the hardware prefe= tcher is in the most conservative mode." + }, + { + "EventCode": "0x0194", + "EventName": "TXREQ_LIMIT_MAX_CYCLES", + "PublicDescription": "Number of cycles in which the dynamic TXREQ = limit is the L2_TQ_SIZE." + }, + { + "EventCode": "0x0195", + "EventName": "TXREQ_LIMIT_3QUARTER_CYCLES", + "PublicDescription": "Number of cycles in which the dynamic TXREQ = limit is between 3/4 of the L2_TQ_SIZE and the L2_TQ_SIZE-1." + }, + { + "EventCode": "0x0196", + "EventName": "TXREQ_LIMIT_HALF_CYCLES", + "PublicDescription": "Number of cycles in which the dynamic TXREQ = limit is between 1/2 of the L2_TQ_SIZE and 3/4 of the L2_TQ_SIZE." + }, + { + "EventCode": "0x0197", + "EventName": "TXREQ_LIMIT_1QUARTER_CYCLES", + "PublicDescription": "Number of cycles in which the dynamic TXREQ = limit is between 1/4 of the L2_TQ_SIZE and 1/2 of the L2_TQ_SIZE." + }, + { + "EventCode": "0x019d", + "EventName": "PREFETCH_LATE_CMC", + "PublicDescription": "LS/readclean or LS/readunique lookup hit on = TQ entry allocated by CMC prefetch request." + }, + { + "EventCode": "0x019e", + "EventName": "PREFETCH_LATE_BO", + "PublicDescription": "LS/readclean or LS/readunique lookup hit on = TQ entry allocated by BO prefetch request." + }, + { + "EventCode": "0x019f", + "EventName": "PREFETCH_LATE_STRIDE", + "PublicDescription": "LS/readclean or LS/readunique lookup hit on = TQ entry allocated by STRIDE prefetch request." + }, + { + "EventCode": "0x01a0", + "EventName": "PREFETCH_LATE_SPATIAL", + "PublicDescription": "LS/readclean or LS/readunique lookup hit on = TQ entry allocated by SPATIAL prefetch request." + }, + { + "EventCode": "0x01a2", + "EventName": "PREFETCH_LATE_TBW", + "PublicDescription": "LS/readclean or LS/readunique lookup hit on = TQ entry allocated by TBW prefetch request." + }, + { + "EventCode": "0x01a3", + "EventName": "PREFETCH_LATE_PAGE", + "PublicDescription": "LS/readclean or LS/readunique lookup hit on = TQ entry allocated by PAGE prefetch request." + }, + { + "EventCode": "0x01a4", + "EventName": "PREFETCH_LATE_GSMS", + "PublicDescription": "LS/readclean or LS/readunique lookup hit on = TQ entry allocated by GSMS prefetch request." 
+ }, + { + "EventCode": "0x01a5", + "EventName": "PREFETCH_LATE_SIP_CONS", + "PublicDescription": "LS/readclean or LS/readunique lookup hit on = TQ entry allocated by SIP_CONS prefetch request." + }, + { + "EventCode": "0x01a6", + "EventName": "PREFETCH_REFILL_CMC", + "PublicDescription": "PF/prefetch or PF/readclean request from CMC= pf engine filled the L2 cache." + }, + { + "EventCode": "0x01a7", + "EventName": "PREFETCH_REFILL_BO", + "PublicDescription": "PF/prefetch or PF/readclean request from BO = pf engine filled the L2 cache." + }, + { + "EventCode": "0x01a8", + "EventName": "PREFETCH_REFILL_STRIDE", + "PublicDescription": "PF/prefetch or PF/readclean request from STR= IDE pf engine filled the L2 cache." + }, + { + "EventCode": "0x01a9", + "EventName": "PREFETCH_REFILL_SPATIAL", + "PublicDescription": "PF/prefetch or PF/readclean request from SPA= TIAL pf engine filled the L2 cache." + }, + { + "EventCode": "0x01ab", + "EventName": "PREFETCH_REFILL_TBW", + "PublicDescription": "PF/prefetch or PF/readclean request from TBW= pf engine filled the L2 cache." + }, + { + "EventCode": "0x01ac", + "EventName": "PREFETCH_REFILL_PAGE", + "PublicDescription": "PF/prefetch or PF/readclean request from PAG= E pf engine filled the L2 cache." + }, + { + "EventCode": "0x01ad", + "EventName": "PREFETCH_REFILL_GSMS", + "PublicDescription": "PF/prefetch or PF/readclean request from GSM= S pf engine filled the L2 cache." + }, + { + "EventCode": "0x01ae", + "EventName": "PREFETCH_REFILL_SIP_CONS", + "PublicDescription": "PF/prefetch or PF/readclean request from SIP= _CONS pf engine filled the L2 cache." + }, + { + "EventCode": "0x01af", + "EventName": "CACHE_HIT_LINE_PF_CMC", + "PublicDescription": "LS/readclean or LS/readunique lookup hit in = L2 cache on line filled by CMC prefetch request." + }, + { + "EventCode": "0x01b0", + "EventName": "CACHE_HIT_LINE_PF_BO", + "PublicDescription": "LS/readclean or LS/readunique lookup hit in = L2 cache on line filled by BO prefetch request." + }, + { + "EventCode": "0x01b1", + "EventName": "CACHE_HIT_LINE_PF_STRIDE", + "PublicDescription": "LS/readclean or LS/readunique lookup hit in = L2 cache on line filled by STRIDE prefetch request." + }, + { + "EventCode": "0x01b2", + "EventName": "CACHE_HIT_LINE_PF_SPATIAL", + "PublicDescription": "LS/readclean or LS/readunique lookup hit in = L2 cache on line filled by SPATIAL prefetch request." + }, + { + "EventCode": "0x01b4", + "EventName": "CACHE_HIT_LINE_PF_TBW", + "PublicDescription": "LS/readclean or LS/readunique lookup hit in = L2 cache on line filled by TBW prefetch request." + }, + { + "EventCode": "0x01b5", + "EventName": "CACHE_HIT_LINE_PF_PAGE", + "PublicDescription": "LS/readclean or LS/readunique lookup hit in = L2 cache on line filled by PAGE prefetch request." + }, + { + "EventCode": "0x01b6", + "EventName": "CACHE_HIT_LINE_PF_GSMS", + "PublicDescription": "LS/readclean or LS/readunique lookup hit in = L2 cache on line filled by GSMS prefetch request." + }, + { + "EventCode": "0x01b7", + "EventName": "CACHE_HIT_LINE_PF_SIP_CONS", + "PublicDescription": "LS/readclean or LS/readunique lookup hit in = L2 cache on line filled by SIP_CONS prefetch request." + }, + { + "EventCode": "0x01cb", + "EventName": "L2_TQ_OUTSTANDING", + "PublicDescription": "Outstanding tracker count, per cycle. The Ev= ent increments by the number of valid entries pertaining to this thread in = the L2TQ, in each cycle. The Event can be used to calculate the occupancy o= f L2TQ by dividing this by the CPU_CYCLES Event. 
+    {
+        "EventCode": "0x01d2",
+        "EventName": "DVM_TLBI_RCVD",
+        "PublicDescription": "The Event counts the number of TLBI DVM messages received over the CHI interface, for *this* Core."
+    },
+    {
+        "EventCode": "0x01d6",
+        "EventName": "DSB_COMMITING_LOCAL_TLBI",
+        "PublicDescription": "The Event counts the number of DSBs that retire and commit at least one local TLBI Instruction. This Event increments no more than once per cycle even if the DSB commits multiple local TLBI Instructions."
+    },
+    {
+        "EventCode": "0x01d7",
+        "EventName": "DSB_COMMITING_BROADCAST_TLBI",
+        "PublicDescription": "The Event counts the number of DSBs that retire and commit at least one broadcast TLBI Instruction. This Event increments no more than once per cycle even if the DSB commits multiple broadcast TLBI Instructions."
+    },
+    {
+        "EventCode": "0x01f0",
+        "EventName": "TMS_ST_TO_SMT_LATENCY",
+        "PublicDescription": "The Event counts the number of CPU cycles spent on TMS for an ST-to-SMT switch. This Event is counted by both threads; the Event in both threads increments during TMS for an ST-to-SMT switch."
+    },
+    {
+        "EventCode": "0x01f1",
+        "EventName": "TMS_SMT_TO_ST_LATENCY",
+        "PublicDescription": "The Event counts the number of CPU cycles spent on TMS for an SMT-to-ST switch. The count also includes the CPU cycles spent on an aborted SMT-to-ST TMS attempt. This Event is counted only by the thread that is not in WFI."
+    },
+    {
+        "EventCode": "0x01f2",
+        "EventName": "TMS_ST_TO_SMT_COUNT",
+        "PublicDescription": "The Event counts the number of completed TMS from ST-to-SMT. This Event is counted only by the active thread (the one that is not in WFI). Note: When an active thread enters the Debug state in ST-Full resource mode, it is switched to SMT mode. This is because the inactive thread cannot wake up while the other thread remains in the Debug state. To prevent this issue, threads operating in ST-Full resource mode are transitioned to SMT mode upon entering the Debug state. The Event count will also reflect such switches from ST to SMT mode. (Also see the NV_CPUACTLR14_EL1.chka_prevent_st_tx_to_smt_when_tx_in_debug_state bit to disable this behavior.)"
+    },
+    {
+        "EventCode": "0x01f3",
+        "EventName": "TMS_SMT_TO_ST_COUNT",
+        "PublicDescription": "The Event counts the number of completed TMS from SMT-to-ST. This Event is counted only by the thread that is not in WFI."
+    },
+    {
+        "EventCode": "0x01f4",
+        "EventName": "TMS_SMT_TO_ST_COUNT_ABRT",
+        "PublicDescription": "The Event counts the number of aborted TMS from SMT-to-ST. This Event is counted only by the thread that is not in WFI."
+    },
+    {
+        "EventCode": "0x021c",
+        "EventName": "CWT_ALLOC_ENTRY",
+        "PublicDescription": "Cache Way Tracker Allocate entry."
+    },
+    {
+        "EventCode": "0x021d",
+        "EventName": "CWT_ALLOC_LINE",
+        "PublicDescription": "Cache Way Tracker Allocate line."
+ }, + { + "EventCode": "0x021e", + "EventName": "CWT_HIT", + "PublicDescription": "Cache Way Tracker hit." + }, + { + "EventCode": "0x021f", + "EventName": "CWT_HIT_TAG", + "PublicDescription": "Cache Way Tracker hit when ITAG lookup suppr= essed." + }, + { + "EventCode": "0x0220", + "EventName": "CWT_REPLAY_TAG", + "PublicDescription": "Cache Way Tracker causes ITAG replay due to = miss when ITAG lookup suppressed." + }, + { + "EventCode": "0x0250", + "EventName": "GPT_REQ", + "PublicDescription": "GPT lookup." + }, + { + "EventCode": "0x0251", + "EventName": "GPT_WC_HIT", + "PublicDescription": "GPT lookup hit in Walk cache." + }, + { + "EventCode": "0x0252", + "EventName": "GPT_PG_HIT", + "PublicDescription": "GPT lookup hit in TLB." + }, + { + "EventCode": "0x01ba", + "EventName": "PREFETCH_LATE_STORE_ISSUE", + "PublicDescription": "The Event counts the number of demand reques= ts that matches a Store-issue prefetcher's pending refill request. These ar= e called late prefetch requests and are still counted as useful prefetcher = requests for the sake of accuracy and coverage measurements." + }, + { + "EventCode": "0x01bb", + "EventName": "PREFETCH_LATE_STORE_STRIDE", + "PublicDescription": "The Event counts the number of demand reques= ts that matches a Store-stride prefetcher's pending refill request. These a= re called late prefetch requests and are still counted as useful prefetcher= requests for the sake of accuracy and coverage measurements." + }, + { + "EventCode": "0x01bc", + "EventName": "PREFETCH_LATE_PC_OFFSET", + "PublicDescription": "The Event counts the number of demand reques= ts that matches a PC-offset prefetcher's pending refill request. These are = called late prefetch requests and are still counted as useful prefetcher re= quests for the sake of accuracy and coverage measurements." + }, + { + "EventCode": "0x01bd", + "EventName": "PREFETCH_LATE_IFUPF", + "PublicDescription": "The Event counts the number of demand reques= ts that matches a IFU prefetcher's pending refill request. These are called= late prefetch requests and are still counted as useful prefetcher requests= for the sake of accuracy and coverage measurements." + }, + { + "EventCode": "0x01be", + "EventName": "PREFETCH_REFILL_STORE_ISSUE", + "PublicDescription": "The Event counts the number of cache refills= due to Store-Issue prefetcher." + }, + { + "EventCode": "0x01bf", + "EventName": "PREFETCH_REFILL_STORE_STRIDE", + "PublicDescription": "The Event counts the number of cache refills= due to Store-stride prefetcher." + }, + { + "EventCode": "0x01c0", + "EventName": "PREFETCH_REFILL_PC_OFFSET", + "PublicDescription": "The Event counts the number of cache refills= due to PC-offset prefetcher." + }, + { + "EventCode": "0x01c1", + "EventName": "PREFETCH_REFILL_IFUPF", + "PublicDescription": "The Event counts the number of cache refills= due to IFU prefetcher." + }, + { + "EventCode": "0x01c2", + "EventName": "CACHE_HIT_LINE_PF_STORE_ISSUE", + "PublicDescription": "The Event counts the number of first hit to = a cache line filled by Store-issue prefetcher." + }, + { + "EventCode": "0x01c3", + "EventName": "CACHE_HIT_LINE_PF_STORE_STRIDE", + "PublicDescription": "The Event counts the number of first hit to = a cache line filled by Store-stride prefetcher." + }, + { + "EventCode": "0x01c4", + "EventName": "CACHE_HIT_LINE_PF_PC_OFFSET", + "PublicDescription": "The Event counts the number of first hit to = a cache line filled by PC-offset prefetcher." 
+ }, + { + "EventCode": "0x01c5", + "EventName": "CACHE_HIT_LINE_PF_IFUPF", + "PublicDescription": "The Event counts the number of first hit to = a cache line filled by IFU prefetcher." + }, + { + "EventCode": "0x01c6", + "EventName": "L2_PF_GEN_ST_ISSUE", + "PublicDescription": "Store-issue prefetch to L2 generated." + }, + { + "EventCode": "0x01c7", + "EventName": "L2_PF_GEN_ST_STRIDE", + "PublicDescription": "Store-stride prefetch to L2 generated" + }, + { + "EventCode": "0x01ee", + "EventName": "CACHE_HIT_LINE_PF_CONVERTED_PRFM", + "PublicDescription": "The Event counts the number of first hit to = a cache line filled by Converted-L1D-PRFM or Converted-L2D-PRFM. Note that = L2D_CACHE_HIT_RWL1PRF_FPRFM is inclusive of CACHE_HIT_LINE_PF_CONVERTED_PRF= M, where both the CACHE_HIT_LINE_PF_CONVERTED_PRFM and the L2D_CACHE_HIT_RW= L1PRF_FPRFM increment on a first hit to L2 D-cache filled by Converted-L1D-= PRFM or Converted-L2D-PRFM." + }, + { + "EventCode": "0x01ec", + "EventName": "PREFETCH_LATE_CONVERTED_PRFM", + "PublicDescription": "The Event counts the number of demand reques= ts that matches a Converted-L1D-PRFM or Converted-L2D-PRFM pending refill r= equest at L2 D-cache. These are called late prefetch requests and are still= counted as useful prefetcher requests for the sake of accuracy and coverag= e measurements. Note that this Event is not counted by the L2D_CACHE_HIT_RW= L1PRF_LATE_HWPRF, though the Converted-L1D-PRFM or Converted-L2D-PRFM are r= eplayed by the L2PRQ." + }, + { + "EventCode": "0x01ed", + "EventName": "PREFETCH_REFILL_CONVERTED_PRFM", + "PublicDescription": "The Event counts the number of L2 D-cache re= fills due to Converted-L1D-PRFM or Converted-L2D-PRFM. Note : L2D_CACHE_REF= ILL_PRFM is inclusive of PREFETCH_REFILL_PRFM_CONVERTED, where both the PRE= FETCH_REFILL_PRFM_CONVERTED and the L2D_CACHE_REFILL_PRFM increment when L2= D-cache refills due to Converted-L1D-PRFM or Converted-L2D-PRFM." + }, + { + "EventCode": "0x01eb", + "EventName": "L1DPRFM_L2DPRFM_TO_L2PRQ_CONVERTED", + "PublicDescription": "The Event counts the number of Converted-L1D= -PRFMs and Converted-L2D-PRFM. Activities involving the Converted-L1D-PRFM = are counted by the L1D_CACHE_PRFM. However they are *not* counted by the L1= D_CACHE_REFILL_PRFM, and L1D_CACHE_REFILL, as these Converted-L1D-PRFM are = treated as L2 D hardware prefetches. Activities around the Converted-L1D-PR= FMs and Converted-L2D-PRFMs are counted by the L2D_CACHE_PRFM, L2D_CACHE_RE= FILL_PRFM and L2D_CACHE_REFILL Events." + }, + { + "EventCode": "0x01ce", + "EventName": "L3DPRFM_TO_L2PRQ_CONVERTED", + "PublicDescription": "The Event counts the number of Converted-L3D= -PRFMs. These are indeed L3D PRFM and activities around these PRFM are coun= ted by the L3D_CACHE_PRFM, L3D_CACHE_REFILL_PRFM and L3D_CACHE_REFILL Event= s." + }, + { + "EventCode": "0x0202", + "EventName": "L0I_CACHE_RD", + "PublicDescription": "The Event counts the number of predict block= s serviced out of L0 I-cache. Note: The L0 I-cache performs at most 4 L0 I = look-up in a cycle. Two of which are to service PB from L0 I. And the other= two to refill L0 I-cache from L1 I. This Event count only the L0 I-cache l= ookup pertaining to servicing the PB from L0 I." + }, + { + "EventCode": "0x0203", + "EventName": "L0I_CACHE_REFILL", + "PublicDescription": "The Event counts the number of L0I cache ref= ill from L1 I-cache." 
+ }, + { + "EventCode": "0x0207", + "EventName": "INTR_LATENCY", + "PublicDescription": "The Event counts the number of cycles elapse= d between when an Interrupt is recognized (after masking) to when a uop ass= ociated with the first Instruction in the destination exception level is al= located. If there is some other flush condition that pre-empts the Interrup= t, then the cycles counted terminates early at the first Instruction execut= ed after that flush. In the Event of dropped Interrupts (when an Interrupt = is deasserted before it is taken), this counter measures the number of cycl= es that elapse from the moment an Interrupt is recognized (post-masking) un= til the Interrupt is dropped or deasserted." + }, + { + "EventCode": "0x018f", + "EventName": "L1_PF_GEN_MCMC", + "PublicDescription": "Load/Store prefetch to L1 generated, MCMC." + }, + { + "EventCode": "0x0123", + "EventName": "FLUSH_STDBYPASS", + "PublicDescription": "Flushes due to bad predecode. This only incl= udes CT flushes." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/retired.json b/to= ols/perf/pmu-events/arch/arm64/nvidia/t410/retired.json new file mode 100644 index 000000000000..ab384239541b --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/retired.json @@ -0,0 +1,94 @@ +[ + { + "ArchStdEvent": "INST_RETIRED", + "PublicDescription": "The Event counts Instructions that have been= architecturally executed." + }, + { + "ArchStdEvent": "CID_WRITE_RETIRED", + "PublicDescription": "The Event counts architecturally executed Wr= ites to the CONTEXTIDR_EL1 register, which usually contain the kernel PID a= nd can be output with hardware trace." + }, + { + "ArchStdEvent": "BR_IMMED_RETIRED", + "PublicDescription": "The Event counts architecturally executed di= rect Branches." + }, + { + "ArchStdEvent": "BR_RETURN_RETIRED", + "PublicDescription": "The Event counts architecturally executed pr= ocedure returns." + }, + { + "ArchStdEvent": "TTBR_WRITE_RETIRED", + "PublicDescription": "The Event counts architectural Writes to TTB= R0/1_EL1. If virtualization host extensions are enabled (by setting the HCR= _EL2.E2H bit to 1), then accesses to TTBR0/1_EL1 that are redirected to TTB= R0/1_EL2, or accesses to TTBR0/1_EL12, are counted. TTBRn registers are typ= ically updated when the kernel is swapping user-space threads or applicatio= ns." + }, + { + "ArchStdEvent": "BR_RETIRED", + "PublicDescription": "The Event counts architecturally executed Br= anches, whether the Branch is taken or not. Instructions that explicitly wr= ite to the PC are also counted. Note that exception generating Instructions= , exception return Instructions and context synchronization Instructions ar= e not counted." + }, + { + "ArchStdEvent": "BR_MIS_PRED_RETIRED", + "PublicDescription": "The Event counts Branches counted by BR_RETI= RED which were mispredicted and caused a pipeline flush." + }, + { + "ArchStdEvent": "OP_RETIRED", + "PublicDescription": "The Event counts micro-operations that are a= rchitecturally executed. This is a count of number of micro-operations reti= red from the commit queue in a single cycle." + }, + { + "ArchStdEvent": "BR_INDNR_TAKEN_RETIRED", + "PublicDescription": "The Event counts architecturally executed in= direct Branches excluding procedure returns that were taken." + }, + { + "ArchStdEvent": "BR_IMMED_PRED_RETIRED", + "PublicDescription": "The Event counts architecturally executed di= rect Branches that were correctly predicted." 
+ }, + { + "ArchStdEvent": "BR_IMMED_MIS_PRED_RETIRED", + "PublicDescription": "The Event counts architecturally executed di= rect Branches that were mispredicted and caused a pipeline flush." + }, + { + "ArchStdEvent": "BR_IND_PRED_RETIRED", + "PublicDescription": "The Event counts architecturally executed in= direct Branches including procedure returns that were correctly predicted." + }, + { + "ArchStdEvent": "BR_IND_MIS_PRED_RETIRED", + "PublicDescription": "The Event counts architecturally executed in= direct Branches including procedure returns that were mispredicted and caus= ed a pipeline flush." + }, + { + "ArchStdEvent": "BR_RETURN_PRED_RETIRED", + "PublicDescription": "The Event counts architecturally executed pr= ocedure returns that were correctly predicted." + }, + { + "ArchStdEvent": "BR_RETURN_MIS_PRED_RETIRED", + "PublicDescription": "The Event counts architecturally executed pr= ocedure returns that were mispredicted and caused a pipeline flush." + }, + { + "ArchStdEvent": "BR_INDNR_PRED_RETIRED", + "PublicDescription": "The Event counts architecturally executed in= direct Branches excluding procedure returns that were correctly predicted." + }, + { + "ArchStdEvent": "BR_INDNR_MIS_PRED_RETIRED", + "PublicDescription": "The Event counts architecturally executed in= direct Branches excluding procedure returns that were mispredicted and caus= ed a pipeline flush." + }, + { + "ArchStdEvent": "BR_TAKEN_PRED_RETIRED", + "PublicDescription": "The Event counts architecturally executed Br= anches that were taken and were correctly predicted." + }, + { + "ArchStdEvent": "BR_TAKEN_MIS_PRED_RETIRED", + "PublicDescription": "The Event counts architecturally executed br= anches that were taken and were mispredicted causing a pipeline flush." + }, + { + "ArchStdEvent": "BR_SKIP_PRED_RETIRED", + "PublicDescription": "The Event counts architecturally executed Br= anches that were not taken and were correctly predicted." + }, + { + "ArchStdEvent": "BR_SKIP_MIS_PRED_RETIRED", + "PublicDescription": "The Event counts architecturally executed Br= anches that were not taken and were mispredicted causing a pipeline flush." + }, + { + "ArchStdEvent": "BR_PRED_RETIRED", + "PublicDescription": "The Event counts Branch Instructions counted= by BR_RETIRED which were correctly predicted." + }, + { + "ArchStdEvent": "BR_IND_RETIRED", + "PublicDescription": "The Event counts architecturally executed in= direct Branches including procedure returns." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/spe.json b/tools/= perf/pmu-events/arch/arm64/nvidia/t410/spe.json new file mode 100644 index 000000000000..8d1fe2a8b161 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/spe.json @@ -0,0 +1,42 @@ +[ + { + "ArchStdEvent": "SAMPLE_POP", + "PublicDescription": "The Event counts statistical profiling sampl= e population, the count of all operations that could be sampled but may or = may not be chosen for sampling." + }, + { + "ArchStdEvent": "SAMPLE_FEED", + "PublicDescription": "The Event counts statistical profiling sampl= es taken for sampling." + }, + { + "ArchStdEvent": "SAMPLE_FILTRATE", + "PublicDescription": "The Event counts statistical profiling sampl= es taken which are not removed by filtering." + }, + { + "ArchStdEvent": "SAMPLE_COLLISION", + "PublicDescription": "The Event counts statistical profiling sampl= es that have collided with a previous sample and so therefore not taken." 
+ }, + { + "ArchStdEvent": "SAMPLE_FEED_BR", + "PublicDescription": "The Event counts statistical profiling sampl= es taken which are Branches." + }, + { + "ArchStdEvent": "SAMPLE_FEED_LD", + "PublicDescription": "The Event counts statistical profiling sampl= es taken which are Loads or Load atomic operations." + }, + { + "ArchStdEvent": "SAMPLE_FEED_ST", + "PublicDescription": "The Event counts statistical profiling sampl= es taken which are Stores or Store atomic operations." + }, + { + "ArchStdEvent": "SAMPLE_FEED_OP", + "PublicDescription": "The Event counts statistical profiling sampl= es taken which are matching any operation type filters supported." + }, + { + "ArchStdEvent": "SAMPLE_FEED_EVENT", + "PublicDescription": "The Event counts statistical profiling sampl= es taken which are matching Event packet filter constraints." + }, + { + "ArchStdEvent": "SAMPLE_FEED_LAT", + "PublicDescription": "The Event counts statistical profiling sampl= es taken which are exceeding minimum latency set by operation latency filte= r constraints." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/spec_operation.js= on b/tools/perf/pmu-events/arch/arm64/nvidia/t410/spec_operation.json new file mode 100644 index 000000000000..18df3f701115 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/spec_operation.json @@ -0,0 +1,230 @@ +[ + { + "ArchStdEvent": "INST_SPEC", + "PublicDescription": "The Event counts operations that have been s= peculatively executed." + }, + { + "ArchStdEvent": "OP_SPEC", + "PublicDescription": "The Event counts micro-operations speculativ= ely executed. This is the count of the number of micro-operations dispatche= d in a cycle." + }, + { + "ArchStdEvent": "UNALIGNED_LD_SPEC", + "PublicDescription": "The Event counts unaligned memory Read opera= tions issued by the CPU. This Event counts unaligned accesses (as defined b= y the actual Instruction), even if they are subsequently issued as multiple= aligned accesses. The Event does not count preload operations (PLD, PLI). = This Event is a subset of the UNALIGNED_LDST_SPEC Event." + }, + { + "ArchStdEvent": "UNALIGNED_ST_SPEC", + "PublicDescription": "The Event counts unaligned memory Write oper= ations issued by the CPU. This Event counts unaligned accesses (as defined = by the actual Instruction), even if they are subsequently issued as multipl= e aligned accesses. This Event is a subset of the UNALIGNED_LDST_SPEC Event= ." + }, + { + "ArchStdEvent": "UNALIGNED_LDST_SPEC", + "PublicDescription": "The Event counts unaligned memory operations= issued by the CPU. This Event counts unaligned accesses (as defined by the= actual Instruction), even if they are subsequently issued as multiple alig= ned accesses. This Event is the sum of the UNALIGNED_ST_SPEC and UNALIGNED_= LD_SPEC Events." + }, + { + "ArchStdEvent": "LDREX_SPEC", + "PublicDescription": "The Event counts Load-Exclusive operations t= hat have been speculatively executed. For example: LDREX, LDX" + }, + { + "ArchStdEvent": "STREX_PASS_SPEC", + "PublicDescription": "The Event counts Store-exclusive operations = that have been speculatively executed and have successfully completed the S= tore operation." + }, + { + "ArchStdEvent": "STREX_FAIL_SPEC", + "PublicDescription": "The Event counts Store-exclusive operations = that have been speculatively executed and have not successfully completed t= he Store operation." 
+ }, + { + "ArchStdEvent": "STREX_SPEC", + "PublicDescription": "The Event counts Store-exclusive operations = that have been speculatively executed. This Event is the sum of STREX_PASS_= SPEC and STREX_FAIL_SPEC Events." + }, + { + "ArchStdEvent": "LD_SPEC", + "PublicDescription": "The Event counts speculatively executed Load= operations including Single Instruction Multiple Data (SIMD) Load operatio= ns." + }, + { + "ArchStdEvent": "ST_SPEC", + "PublicDescription": "The Event counts speculatively executed Stor= e operations including Single Instruction Multiple Data (SIMD) Store operat= ions." + }, + { + "ArchStdEvent": "LDST_SPEC", + "PublicDescription": "The Event counts Load and Store operations t= hat have been speculatively executed." + }, + { + "ArchStdEvent": "DP_SPEC", + "PublicDescription": "The Event counts speculatively executed logi= cal or arithmetic Instructions such as MOV/MVN operations." + }, + { + "ArchStdEvent": "ASE_SPEC", + "PublicDescription": "The Event counts speculatively executed Adva= nced SIMD operations excluding Load, Store, and Move micro-operations that = move Data to or from SIMD (vector) registers." + }, + { + "ArchStdEvent": "VFP_SPEC", + "PublicDescription": "The Event counts speculatively executed floa= ting point operations. This Event does not count operations that move Data = to or from floating point (vector) registers." + }, + { + "ArchStdEvent": "PC_WRITE_SPEC", + "PublicDescription": "The Event counts speculatively executed oper= ations which cause software changes of the PC. Those operations include all= taken Branch operations." + }, + { + "ArchStdEvent": "CRYPTO_SPEC", + "PublicDescription": "The Event counts speculatively executed cryp= tographic operations except for PMULL and VMULL operations." + }, + { + "ArchStdEvent": "BR_IMMED_SPEC", + "PublicDescription": "The Event counts direct Branch operations wh= ich are speculatively executed." + }, + { + "ArchStdEvent": "BR_RETURN_SPEC", + "PublicDescription": "The Event counts procedure return operations= (RET, RETAA and RETAB) which are speculatively executed." + }, + { + "ArchStdEvent": "BR_INDIRECT_SPEC", + "PublicDescription": "The Event counts indirect branch operations = including procedure returns, which are speculatively executed. This include= s operations that force a software change of the PC, other than exception-g= enerating operations and direct Branch Instructions. Some examples of the I= nstructions counted by this Event include BR Xn, RET, etc." + }, + { + "ArchStdEvent": "ISB_SPEC", + "PublicDescription": "The Event counts ISB operations that are exe= cuted." + }, + { + "ArchStdEvent": "DSB_SPEC", + "PublicDescription": "The Event counts DSB operations that are spe= culatively issued to Load/Store unit in the CPU." + }, + { + "ArchStdEvent": "DMB_SPEC", + "PublicDescription": "The Event counts DMB operations that are spe= culatively issued to the Load/Store unit in the CPU. This Event does not co= unt implied barriers from Load-acquire/Store-release operations." + }, + { + "ArchStdEvent": "CSDB_SPEC", + "PublicDescription": "The Event counts CSDB operations that are sp= eculatively issued to the Load/Store unit in the CPU. This Event does not c= ount implied barriers from Load-acquire/Store-release operations." + }, + { + "ArchStdEvent": "RC_LD_SPEC", + "PublicDescription": "The Event counts any Load acquire operations= that are speculatively executed. 
For example: LDAR, LDARH, LDARB"
+    },
+    {
+        "ArchStdEvent": "RC_ST_SPEC",
+        "PublicDescription": "The Event counts any Store release operations that are speculatively executed. For example: STLR, STLRH, STLRB"
+    },
+    {
+        "ArchStdEvent": "SIMD_INST_SPEC",
+        "PublicDescription": "The Event counts speculatively executed operations that are SIMD or SVE vector operations or Advanced SIMD non-scalar operations."
+    },
+    {
+        "ArchStdEvent": "ASE_INST_SPEC",
+        "PublicDescription": "The Event counts speculatively executed Advanced SIMD operations."
+    },
+    {
+        "ArchStdEvent": "SVE_INST_SPEC",
+        "PublicDescription": "The Event counts speculatively executed operations that are SVE operations."
+    },
+    {
+        "ArchStdEvent": "INT_SPEC",
+        "PublicDescription": "The Event counts speculatively executed integer arithmetic operations."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_SPEC",
+        "PublicDescription": "The Event counts speculatively executed predicated SVE operations. This counter also counts SVE operations due to Instructions with a Governing predicate operand that determines the Active elements and that do not write to any SVE Z vector destination register using either a zeroing or a merging predicate. Thus, the operations due to Instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT are counted by the SVE_PRED_* Events."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_EMPTY_SPEC",
+        "PublicDescription": "The Event counts speculatively executed predicated SVE operations with no active predicate elements. This counter also counts SVE operations due to Instructions with a Governing predicate operand that determines the Active elements and that do not write to any SVE Z vector destination register using either a zeroing or a merging predicate. Thus, the operations due to Instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT are counted by the SVE_PRED_* Events."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_FULL_SPEC",
+        "PublicDescription": "The Event counts speculatively executed predicated SVE operations with all predicate elements active. This counter also counts SVE operations due to Instructions with a Governing predicate operand that determines the Active elements and that do not write to any SVE Z vector destination register using either a zeroing or a merging predicate. Thus, the operations due to Instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT are counted by the SVE_PRED_* Events."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_PARTIAL_SPEC",
+        "PublicDescription": "The Event counts speculatively executed predicated SVE operations with at least one but not all active predicate elements. This counter also counts SVE operations due to Instructions with a Governing predicate operand that determines the Active elements and that do not write to any SVE Z vector destination register using either a zeroing or a merging predicate. Thus, the operations due to Instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT are counted by the SVE_PRED_* Events."
+    },
+    {
+        "ArchStdEvent": "SVE_PRED_NOT_FULL_SPEC",
+        "PublicDescription": "The Event counts speculatively executed predicated SVE operations with at least one inactive predicate element. This counter also counts SVE operations due to Instructions with a Governing predicate operand that determines the Active elements and that do not write to any SVE Z vector destination register using either a zeroing or a merging predicate.
Thus, the operations due to Instructions such as INCP, DECP, UQINCP, UQDECP, SQINCP, SQDECP and PNEXT are counted by the SVE_PRED_* Events."
+    },
+    {
+        "ArchStdEvent": "PRF_SPEC",
+        "PublicDescription": "The Event counts speculatively executed operations that prefetch memory. For example, Scalar: PRFM, SVE: PRFB, PRFD, PRFH, or PRFW."
+    },
+    {
+        "ArchStdEvent": "SVE_LDFF_SPEC",
+        "PublicDescription": "The Event counts speculatively executed SVE first fault or non-fault Load operations."
+    },
+    {
+        "ArchStdEvent": "SVE_LDFF_FAULT_SPEC",
+        "PublicDescription": "The Event counts speculatively executed SVE first fault or non-fault Load operations that clear at least one bit in the FFR."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT8_SPEC",
+        "PublicDescription": "The Event counts speculatively executed Advanced SIMD or SVE integer operations with the largest Data type an 8-bit integer."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT16_SPEC",
+        "PublicDescription": "The Event counts speculatively executed Advanced SIMD or SVE integer operations with the largest Data type a 16-bit integer."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT32_SPEC",
+        "PublicDescription": "The Event counts speculatively executed Advanced SIMD or SVE integer operations with the largest Data type a 32-bit integer."
+    },
+    {
+        "ArchStdEvent": "ASE_SVE_INT64_SPEC",
+        "PublicDescription": "The Event counts speculatively executed Advanced SIMD or SVE integer operations with the largest Data type a 64-bit integer."
+    },
+    {
+        "EventCode": "0x011d",
+        "EventName": "SPEC_RET_STACK_FULL",
+        "PublicDescription": "The Event counts predict pipe stalls due to the speculative return address predictor being full."
+    },
+    {
+        "EventCode": "0x011f",
+        "EventName": "MOPS_SPEC",
+        "PublicDescription": "Macro-ops speculatively decoded."
+    },
+    {
+        "EventCode": "0x0180",
+        "EventName": "BR_SPEC_PRED_TAKEN",
+        "PublicDescription": "Number of Predicted Taken Branches from the Branch Predictor."
+    },
+    {
+        "EventCode": "0x0181",
+        "EventName": "BR_SPEC_PRED_TAKEN_FROM_L2BTB",
+        "PublicDescription": "Number of Predicted Taken Branches from the L2 BTB."
+    },
+    {
+        "EventCode": "0x0182",
+        "EventName": "BR_SPEC_PRED_TAKEN_MULTI",
+        "PublicDescription": "Number of Predicted Taken for Polymorphic Branches."
+    },
+    {
+        "EventCode": "0x0185",
+        "EventName": "BR_SPEC_PRED_STATIC",
+        "PublicDescription": "Number of post-fetch predictions."
+    },
+    {
+        "EventCode": "0x01d0",
+        "EventName": "TLBI_LOCAL_SPEC",
+        "PublicDescription": "A non-broadcast TLBI Instruction executed (speculatively or otherwise) on *this* PE."
+    },
+    {
+        "EventCode": "0x01d1",
+        "EventName": "TLBI_BROADCAST_SPEC",
+        "PublicDescription": "A broadcast TLBI Instruction executed (speculatively or otherwise) on *this* PE."
+    },
+    {
+        "EventCode": "0x0200",
+        "EventName": "SIMD_CRYPTO_INST_SPEC",
+        "PublicDescription": "SIMD, SVE, and CRYPTO Instructions speculatively decoded."
+    },
+    {
+        "EventCode": "0x01e7",
+        "EventName": "BR_SPEC_PRED_ALN_REDIR",
+        "PublicDescription": "BPU predict pipe align redirect (either AL-APQ hit/miss)."
+    },
+    {
+        "EventCode": "0x022e",
+        "EventName": "VPRED_LD_SPEC",
+        "PublicDescription": "The Event counts the number of speculatively executed Load operations with addresses produced by the value-prediction mechanism. The loaded Data might be discarded if the predicted address differs from the actual address."
+ }, + { + "EventCode": "0x022f", + "EventName": "VPRED_LD_SPEC_MISMATCH", + "PublicDescription": "The Event counts a subset of VPRED_LD_SPEC w= here the predicted Load address and the actual address mismatched." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/stall.json b/tool= s/perf/pmu-events/arch/arm64/nvidia/t410/stall.json new file mode 100644 index 000000000000..652c1e3305d0 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/stall.json @@ -0,0 +1,145 @@ +[ + { + "ArchStdEvent": "STALL_FRONTEND", + "PublicDescription": "The Event counts cycles when frontend could = not send any micro-operations to the rename stage because of frontend resou= rce stalls caused by fetch memory latency or Branch prediction flow stalls.= STALL_FRONTEND_SLOTS counts SLOTS during the cycle when this Event counts.= STALL_SLOT_FRONTEND will count SLOTS when this Event is counted on this CP= U." + }, + { + "ArchStdEvent": "STALL_BACKEND", + "PublicDescription": "The Event counts cycles whenever the rename = unit is unable to send any micro-operations to the backend of the pipeline = because of backend resource constraints. Backend resource constraints can i= nclude issue stage fullness, execution stage fullness, or other internal pi= peline resource fullness. All the backend slots were empty during the cycle= when this Event counts." + }, + { + "ArchStdEvent": "STALL", + "PublicDescription": "The Event counts cycles when no operations a= re sent to the rename unit from the frontend or from the rename unit to the= backend for any reason (either frontend or backend stall). This Event is t= he sum of STALL_FRONTEND and STALL_BACKEND." + }, + { + "ArchStdEvent": "STALL_SLOT_BACKEND", + "PublicDescription": "The Event counts slots per cycle in which no= operations are sent from the rename unit to the backend due to backend res= ource constraints. STALL_BACKEND counts during the cycle when STALL_SLOT_BA= CKEND counts at least 1. STALL_BACKEND counts during the cycle when STALL_S= LOT_BACKEND is SLOTS." + }, + { + "ArchStdEvent": "STALL_SLOT_FRONTEND", + "PublicDescription": "The Event counts slots per cycle in which no= operations are sent to the rename unit from the frontend due to frontend r= esource constraints. STALL_FRONTEND counts during the cycle when STALL_SLOT= _FRONTEND is SLOTS." + }, + { + "ArchStdEvent": "STALL_SLOT", + "PublicDescription": "The Event counts slots per cycle in which no= operations are sent to the rename unit from the frontend or from the renam= e unit to the backend for any reason (either frontend or backend stall). ST= ALL_SLOT is the sum of STALL_SLOT_FRONTEND and STALL_SLOT_BACKEND." + }, + { + "ArchStdEvent": "STALL_BACKEND_MEM", + "PublicDescription": "The Event counts cycles when the backend is = stalled because there is a pending demand Load request in progress in the l= ast level Core cache. Last level cache in this CPU is Level 2, hence this E= vent counts same as STALL_BACKEND_L2D." + }, + { + "ArchStdEvent": "STALL_FRONTEND_MEMBOUND", + "PublicDescription": "The Event counts cycles when the frontend co= uld not send any micro-operations to the rename stage due to resource const= raints in the memory resources." + }, + { + "ArchStdEvent": "STALL_FRONTEND_L1I", + "PublicDescription": "The Event counts cycles when the frontend is= stalled because there is an Instruction fetch request pending in the L1 I-= cache." 
+ }, + { + "ArchStdEvent": "STALL_FRONTEND_MEM", + "PublicDescription": "The Event counts cycles when the frontend is= stalled because there is an Instruction fetch request pending in the last = level Core cache. Last level cache in this CPU is Level 2, hence this Event= counts rather than STALL_FRONTEND_L2I." + }, + { + "ArchStdEvent": "STALL_FRONTEND_TLB", + "PublicDescription": "The Event counts when the frontend is stalle= d on any TLB misses being handled. This Event also counts the TLB accesses = made by hardware prefetches." + }, + { + "ArchStdEvent": "STALL_FRONTEND_CPUBOUND", + "PublicDescription": "The Event counts cycles when the frontend co= uld not send any micro-operations to the rename stage due to resource const= raints in the CPU resources excluding memory resources." + }, + { + "ArchStdEvent": "STALL_FRONTEND_FLOW", + "PublicDescription": "The Event counts cycles when the frontend co= uld not send any micro-operations to the rename stage due to resource const= raints in the Branch prediction unit." + }, + { + "ArchStdEvent": "STALL_FRONTEND_FLUSH", + "PublicDescription": "The Event counts cycles when the frontend co= uld not send any micro-operations to the rename stage as the frontend is re= covering from a machine flush or resteer. Example scenarios that cause a fl= ush include Branch mispredictions, taken exceptions, microarchitectural flu= sh etc." + }, + { + "ArchStdEvent": "STALL_BACKEND_MEMBOUND", + "PublicDescription": "The Event counts cycles when the backend cou= ld not accept any micro-operations due to resource constraints in the memor= y resources." + }, + { + "ArchStdEvent": "STALL_BACKEND_L1D", + "PublicDescription": "The Event counts cycles when the backend is = stalled because there is a pending demand Load request in progress in the L= 1 D-cache." + }, + { + "ArchStdEvent": "STALL_BACKEND_TLB", + "PublicDescription": "The Event counts cycles when the backend is = stalled on any demand TLB misses being handled." + }, + { + "ArchStdEvent": "STALL_BACKEND_ST", + "PublicDescription": "The Event counts cycles when the backend is = stalled and there is a Store that has not reached the pre-commit stage." + }, + { + "ArchStdEvent": "STALL_BACKEND_CPUBOUND", + "PublicDescription": "The Event counts cycles when the backend cou= ld not accept any micro-operations due to any resource constraints in the C= PU excluding memory resources." + }, + { + "ArchStdEvent": "STALL_BACKEND_BUSY", + "PublicDescription": "The Event counts cycles when the backend cou= ld not accept any micro-operations because the issue queues are full to tak= e any operations for execution." + }, + { + "ArchStdEvent": "STALL_BACKEND_ILOCK", + "PublicDescription": "The Event counts cycles when the backend cou= ld not accept any micro-operations due to resource constraints imposed by i= nput dependency." + }, + { + "ArchStdEvent": "STALL_BACKEND_RENAME", + "PublicDescription": "The Event counts cycles when backend is stal= led even when operations are available from the frontend but at least one i= s not ready to be sent to the backend because no rename register is availab= le." + }, + { + "EventCode": "0x0158", + "EventName": "FLAG_DISP_STALL", + "PublicDescription": "Rename stalled due to FRF(Flag register file= ) full." + }, + { + "EventCode": "0x0159", + "EventName": "GEN_DISP_STALL", + "PublicDescription": "Rename stalled due to GRF (General-purpose r= egister file) full." 
+ }, + { + "EventCode": "0x015a", + "EventName": "VEC_DISP_STALL", + "PublicDescription": "Rename stalled due to VRF (Vector register f= ile) full." + }, + { + "EventCode": "0x015c", + "EventName": "SX_IQ_STALL", + "PublicDescription": "Dispatch stalled due to IQ full, SX." + }, + { + "EventCode": "0x015d", + "EventName": "MX_IQ_STALL", + "PublicDescription": "Dispatch stalled due to IQ full, MX." + }, + { + "EventCode": "0x015e", + "EventName": "LS_IQ_STALL", + "PublicDescription": "Dispatch stalled due to IQ full, LS." + }, + { + "EventCode": "0x015f", + "EventName": "VX_IQ_STALL", + "PublicDescription": "Dispatch stalled due to IQ full, VX." + }, + { + "EventCode": "0x0160", + "EventName": "MCQ_FULL_STALL", + "PublicDescription": "Dispatch stalled due to MCQ full." + }, + { + "EventCode": "0x01cf", + "EventName": "PRD_DISP_STALL", + "PublicDescription": "Rename stalled due to predicate registers (p= hysical) are full." + }, + { + "EventCode": "0x01e0", + "EventName": "CSDB_STALL", + "PublicDescription": "Rename stalled due to CSDB." + }, + { + "EventCode": "0x01e2", + "EventName": "STALL_SLOT_FRONTEND_WITHOUT_MISPRED", + "PublicDescription": "Stall slot frontend during non-mispredicted = branch. The Event counts the STALL_STOT_FRONTEND Events, except for the 4 c= ycles following a mispredicted branch Event or 4 cycles following a commit = flush&restart Event." + } +] diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/tlb.json b/tools/= perf/pmu-events/arch/arm64/nvidia/t410/tlb.json new file mode 100644 index 000000000000..9a81a62a2646 --- /dev/null +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/tlb.json @@ -0,0 +1,158 @@ +[ + { + "ArchStdEvent": "L1I_TLB_REFILL", + "PublicDescription": "The Event counts L1 Instruction TLB refills = from any Instruction fetch (Demand, hardware prefetch and software preload = accesses). If there are multiple misses in the TLB that are resolved by the= refill, then this Event only counts once. This Event will not count if the= translation table walk results in a fault (such as a translation or access= fault), since there is no new translation created for the TLB." + }, + { + "ArchStdEvent": "L1D_TLB_REFILL", + "PublicDescription": "The Event counts L1 Data TLB accesses that r= esulted in TLB refills. If there are multiple misses in the TLB that are re= solved by the refill, then this Event only counts once. This Event counts f= or refills caused by preload Instructions or hardware prefetch accesses. Th= is Event counts regardless of whether the miss hits in L2 or results in a t= ranslation table walk. This Event will not count if the translation table w= alk results in a fault (such as a translation or access fault), since there= is no new translation created for the TLB. This Event will not count on an= access from an AT(address translation) Instruction. This Event is the sum = of the L1D_TLB_REFILL_RD and L1D_TLB_REFILL_WR Events." + }, + { + "ArchStdEvent": "L1D_TLB", + "PublicDescription": "The Event counts L1 Data TLB accesses caused= by any memory Load or Store operation. Note that Load or Store Instruction= s can be broken up into multiple memory operations. This Event does not cou= nt TLB maintenance operations." + }, + { + "ArchStdEvent": "L1I_TLB", + "PublicDescription": "The Event counts L1 Instruction TLB accesses= (caused by Demand or hardware prefetch or software preload accesses), whet= her the access hits or misses in the TLB. This Event counts both demand acc= esses and prefetch or preload generated accesses. 
This Event is a superset = of the L1I_TLB_REFILL Event." + }, + { + "ArchStdEvent": "L2D_TLB_REFILL", + "PublicDescription": "The Event counts L2 TLB refills caused by me= mory operations from both Data and Instruction fetch, except for those caus= ed by TLB maintenance operations and hardware prefetches. This Event is the= sum of the L2D_TLB_REFILL_RD and L2D_TLB_REFILL_WR Events." + }, + { + "ArchStdEvent": "L2D_TLB", + "PublicDescription": "The Event counts L2 TLB accesses except thos= e caused by TLB maintenance operations. This Event is the sum of the L2D_TL= B_RD and L2D_TLB_WR Events." + }, + { + "ArchStdEvent": "DTLB_WALK", + "PublicDescription": "The Event counts number of demand Data trans= lation table walks caused by a miss in the L2 TLB and performing at least o= ne memory access. Translation table walks are counted even if the translati= on ended up taking a translation fault for reasons different than EPD, E0PD= and NFD. Note that partial translations that cause a translation table wal= k are also counted. Also note that this Event counts walks triggered by sof= tware preloads, but not walks triggered by hardware prefetchers, and that t= his Event does not count walks triggered by TLB maintenance operations. Thi= s Event does not include prefetches." + }, + { + "ArchStdEvent": "ITLB_WALK", + "PublicDescription": "The Event counts number of Instruction trans= lation table walks caused by a miss in the L2 TLB and performing at least o= ne memory access. Translation table walks are counted even if the translati= on ended up taking a translation fault for reasons different than EPD, E0PD= and NFD. Note that partial translations that cause a translation table wal= k are also counted. Also note that this Event does not count walks triggere= d by TLB maintenance operations. This Event does not include prefetches." + }, + { + "ArchStdEvent": "L1D_TLB_REFILL_RD", + "PublicDescription": "The Event counts L1 Data TLB refills caused = by memory Read operations. If there are multiple misses in the TLB that are= resolved by the refill, then this Event only counts once. This Event count= s for refills caused by preload Instructions or hardware prefetch accesses.= This Event counts regardless of whether the miss hits in L2 or results in = a translation table walk. This Event will not count if the translation tabl= e walk results in a fault (such as a translation or access fault), since th= ere is no new translation created for the TLB. This Event will not count on= an access from an Address Translation (AT) Instruction. This Event is a su= bset of the L1D_TLB_REFILL Event." + }, + { + "ArchStdEvent": "L1D_TLB_REFILL_WR", + "PublicDescription": "The Event counts L1 Data TLB refills caused = by Data side memory Write operations. If there are multiple misses in the T= LB that are resolved by the refill, then this Event only counts once. This = Event counts for refills caused by preload Instructions or hardware prefetc= h accesses. This Event counts regardless of whether the miss hits in L2 or = results in a translation table walk. This Event will not count if the table= walk results in a fault (such as a translation or access fault), since the= re is no new translation created for the TLB. This Event will not count wit= h an access from an Address Translation (AT) Instruction. This Event is a s= ubset of the L1D_TLB_REFILL Event." + }, + { + "ArchStdEvent": "L1D_TLB_RD", + "PublicDescription": "The Event counts L1 Data TLB accesses caused= by memory Read operations. 
This Event counts whether the access hits or mi= sses in the TLB. This Event does not count TLB maintenance operations." + }, + { + "ArchStdEvent": "L1D_TLB_WR", + "PublicDescription": "The Event counts any L1 Data side TLB access= es caused by memory Write operations. This Event counts whether the access = hits or misses in the TLB. This Event does not count TLB maintenance operat= ions." + }, + { + "ArchStdEvent": "L2D_TLB_REFILL_RD", + "PublicDescription": "The Event counts L2 TLB refills caused by me= mory Read operations from both Data and Instruction fetch except for those = caused by TLB maintenance operations or hardware prefetches. This Event is = a subset of the L2D_TLB_REFILL Event." + }, + { + "ArchStdEvent": "L2D_TLB_REFILL_WR", + "PublicDescription": "The Event counts L2 TLB refills caused by me= mory Write operations from both Data and Instruction fetch except for those= caused by TLB maintenance operations. This Event is a subset of the L2D_TL= B_REFILL Event." + }, + { + "ArchStdEvent": "L2D_TLB_RD", + "PublicDescription": "The Event counts L2 TLB accesses caused by m= emory Read operations from both Data and Instruction fetch except for those= caused by TLB maintenance operations. This Event is a subset of the L2D_TL= B Event." + }, + { + "ArchStdEvent": "L2D_TLB_WR", + "PublicDescription": "The Event counts L2 TLB accesses caused by m= emory Write operations from both Data and Instruction fetch except for thos= e caused by TLB maintenance operations. This Event is a subset of the L2D_T= LB Event." + }, + { + "ArchStdEvent": "DTLB_WALK_PERCYC", + "PublicDescription": "The Event counts the number of Data translat= ion table walks in progress per cycle." + }, + { + "ArchStdEvent": "ITLB_WALK_PERCYC", + "PublicDescription": "The Event counts the number of Instruction t= ranslation table walks in progress per cycle." + }, + { + "ArchStdEvent": "L1D_TLB_RW", + "PublicDescription": "The Event counts L1 Data TLB demand accesses= caused by memory Read or Write operations. This Event counts whether the a= ccess hits or misses in the TLB. This Event does not count TLB maintenance = operations." + }, + { + "ArchStdEvent": "L1I_TLB_RD", + "PublicDescription": "The Event counts L1 Instruction TLB demand a= ccesses whether the access hits or misses in the TLB." + }, + { + "ArchStdEvent": "L1D_TLB_PRFM", + "PublicDescription": "The Event counts L1 Data TLB accesses genera= ted by software prefetch or preload memory accesses. Load or Store Instruct= ions can be broken into multiple memory operations. This Event does not cou= nt TLB maintenance operations." + }, + { + "ArchStdEvent": "L1I_TLB_PRFM", + "PublicDescription": "The Event counts L1 Instruction TLB accesses= generated by software preload or prefetch Instructions. This Event counts = whether the access hits or misses in the TLB. This Event does not count TLB= maintenance operations." + }, + { + "ArchStdEvent": "DTLB_HWUPD", + "PublicDescription": "The Event counts number of memory accesses t= riggered by a Data translation table walk and performing an update of a tra= nslation table entry. Memory accesses are counted even if the translation e= nded up taking a translation fault for reasons different than EPD, E0PD and= NFD. Note that this Event counts accesses triggered by software preloads, = but not accesses triggered by hardware prefetchers." 
+    },
+    {
+        "ArchStdEvent": "ITLB_HWUPD",
+        "PublicDescription": "The Event counts the number of memory accesses triggered by an Instruction translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons other than EPD, E0PD and NFD."
+    },
+    {
+        "ArchStdEvent": "DTLB_STEP",
+        "PublicDescription": "The Event counts the number of memory accesses triggered by a demand Data translation table walk and performing a Read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons other than EPD, E0PD and NFD. Note that this Event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers."
+    },
+    {
+        "ArchStdEvent": "ITLB_STEP",
+        "PublicDescription": "The Event counts the number of memory accesses triggered by an Instruction translation table walk and performing a Read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons other than EPD, E0PD and NFD."
+    },
+    {
+        "ArchStdEvent": "DTLB_WALK_LARGE",
+        "PublicDescription": "The Event counts the number of demand Data translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size greater than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_BLOCK is implemented, then it is an alias for this Event in this family. Note that partial translations that cause a translation table walk are also counted. Also note that this Event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this Event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "ITLB_WALK_LARGE",
+        "PublicDescription": "The Event counts the number of Instruction translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size greater than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to the ITLB_WALK_BLOCK Event. Note that partial translations that cause a translation table walk are also counted. Also note that this Event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "DTLB_WALK_SMALL",
+        "PublicDescription": "The Event counts the number of Data translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If the DTLB_WALK_PAGE Event is implemented, then it is an alias for this Event in this family. Note that partial translations that cause a translation table walk are also counted. Also note that this Event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this Event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "ITLB_WALK_SMALL",
+        "PublicDescription": "The Event counts the number of Instruction translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to the ITLB_WALK_PAGE Event. Note that partial translations that cause a translation table walk are also counted. Also note that this Event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "DTLB_WALK_RW",
+        "PublicDescription": "The Event counts the number of demand Data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons other than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this Event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "ITLB_WALK_RD",
+        "PublicDescription": "The Event counts the number of demand Instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons other than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this Event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "DTLB_WALK_PRFM",
+        "PublicDescription": "The Event counts the number of software prefetch or preload generated Data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons other than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this Event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "ArchStdEvent": "ITLB_WALK_PRFM",
+        "PublicDescription": "The Event counts the number of software prefetch or preload generated Instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons other than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this Event does not count walks triggered by TLB maintenance operations."
+    },
+    {
+        "EventCode": "0x010e",
+        "EventName": "L1D_TLB_REFILL_RD_PF",
+        "PublicDescription": "L1 Data TLB refill, Read, prefetch."
+    },
+    {
+        "EventCode": "0x010f",
+        "EventName": "L2TLB_PF_REFILL",
+        "PublicDescription": "L2 Data TLB refill, Read, prefetch. The Event counts MMU refills due to internal PFStream requests."
+    },
+    {
+        "EventCode": "0x0223",
+        "EventName": "L1I_TLB_REFILL_RD",
+        "PublicDescription": "L1 Instruction TLB refills due to Demand miss."
+    },
+    {
+        "EventCode": "0x0224",
+        "EventName": "L1I_TLB_REFILL_PRFM",
+        "PublicDescription": "L1 Instruction TLB refills due to Software prefetch miss."
+    }
+]
-- 
2.43.0
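
[Reviewer note, not part of the patch] For anyone who wants to sanity-check the new event tables before building perf, a minimal sketch along the following lines can help. It assumes the JSON files sit under tools/perf/pmu-events/arch/arm64/nvidia/t410/ as in the diff above, and it infers the required-field rule (either an "ArchStdEvent" reference, or an explicit "EventCode"/"EventName" pair, plus a non-empty "PublicDescription") from the shape of the entries in this patch; jevents.py remains the authoritative validator at build time.

#!/usr/bin/env python3
# Quick sanity check for the t410 PMU event JSON files (illustrative only).
import json
import sys
from pathlib import Path

# Path taken from the diff above; adjust to your kernel tree checkout.
EVENT_DIR = Path("tools/perf/pmu-events/arch/arm64/nvidia/t410")

def check_file(path: Path) -> int:
    """Return the number of malformed entries found in one event file."""
    errors = 0
    entries = json.loads(path.read_text())
    for entry in entries:
        # Each entry should name an architectural standard event or define
        # an implementation-specific one with an explicit code and name.
        has_std = "ArchStdEvent" in entry
        has_impl = "EventCode" in entry and "EventName" in entry
        if not (has_std or has_impl):
            print(f"{path.name}: entry missing event identification: {entry}")
            errors += 1
        elif not entry.get("PublicDescription"):
            name = entry.get("ArchStdEvent") or entry.get("EventName")
            print(f"{path.name}: {name}: empty PublicDescription")
            errors += 1
    print(f"{path.name}: {len(entries)} entries checked, {errors} problem(s)")
    return errors

if __name__ == "__main__":
    total = sum(check_file(p) for p in sorted(EVENT_DIR.glob("*.json")))
    sys.exit(1 if total else 0)

Once the patch is applied and perf is rebuilt, the events should appear by name under `perf list` on a Tegra410 system and be usable directly, e.g. `perf stat -e STALL_BACKEND_MEMBOUND -- <workload>`.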