From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5153B219E8D for ; Sat, 19 Jul 2025 03:45:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896738; cv=none; b=I6nXog3OodFzMd8wQV4GAttxNm4DWxOvqLgZB9neCfjrnzvVdOl6WVojDNjdrmgCg492S0yKwL3zhufZWI6p25lG8TbKf7ArmFNje37VKSwBKFMKnWUiMBp10aOGsSRlYGscBzwln43YldIDTg9bqMZ/HmzoKFo/5xY0ZqRiGQw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896738; c=relaxed/simple; bh=Icq1qSipxewlEoeuGeArJjLisKORy7Lxm7lHr0Tko9s=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=KZTva67Jczv6/qF8SNz2k1TN/zEXEE6nKfACyU+PUCMXTsw/+NZASJBTnFLiQyr918OfLoIv5CxbWHG3SujcfbqJUsF4v7SRK+Amp0aTKhix/AxoRBPQ3LxVlZ5WCf4igD3eTcr9SMJpCVhvrTn2WSKSsalLwjpcomwmdbreX+U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=M3j7VThz; arc=none smtp.client-ip=209.85.215.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="M3j7VThz" Received: by mail-pg1-f202.google.com with SMTP id 41be03b00d2f7-b38ec062983so1825428a12.3 for ; Fri, 18 Jul 2025 20:45:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896727; x=1753501527; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=ajbzU7UIigO+eEFJUqnDB36JF+35fsVTr8mFTknfk04=; b=M3j7VThz89eD6OZDL0l1gmbSLxYPVl0/cBfoxEbsRLs6/NrdeHIIDnNh0lA6jBwUhJ rm3b6F834PiK4mNm0eJnmWsGTrKLdl2Cbv8bg6jnAAULuQ0I8nLMO7WDvfygXGwD7qfn zWommgyWn1DvFgi9lz5yVXeZxS4Bb0hkzpYHuGsXCmZ6JhYH/7bcdG6lX6kSZEArKjtJ z4KvJsLaZS4p3xbn9q3qAekD3NnAPA/E8MMnHlCO6V2mhDOu/p6hH39JG0KXcbHKhcvR BYed1mh/QYwe3vvu6pPx1SVPaUfh2z5IPm0aKIWuaHyxIyVLZ8R+lo+t52ggFO0QNf2Z 0CnQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896728; x=1753501528; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=ajbzU7UIigO+eEFJUqnDB36JF+35fsVTr8mFTknfk04=; b=hjFgKLnaGa04ax1qgHkGLT6bjoFeH5tVFp8CgCGW2T5L0kgdJJFMxsw+YYmFXjovvM 4eI6bSkQ9Jf0HrCUezmFG2V0XZYDbqCZCB/HDlLjTO2zBD9MEzJGMNrYr+t99DBf9HVb rfnKeIQVgCngDxKsdckT5o7F9gOYIDhbZefwMBwvzrnL2nBnWtWlnIqE2pwlU/DmmnWP BDSrd4EEbGovO4AYT4ex743sl70Mpjn4y/dUMJ0pQzr7pOocR61Tza69ZdrcEYmPU+6a S5wknZ4tVIWjHgQeNBGx5GWqMjd34KpFFcTNNqIpfk98IRq5P0CGMDg8KgLUXxdPTgW7 9DbQ== X-Forwarded-Encrypted: i=1; AJvYcCU+1DsHbTlsVzb2V1OaIF7zQuCg/OSZewiegby+KXuElZ7ggFKoMUoPCIJ9mGIvS6qPjVDlkxOl+kEwh0U=@vger.kernel.org X-Gm-Message-State: AOJu0YxuKEnp7494Vy2G103ojbl7F5q2j/Ov13XeOe5rlB92RA5K7UXP Oaua3ng9f6bQtN/HVX5N9NUPK+JaO14TXEtonkZmd3ge82yo42BNL+pNxXl31wqzd5/7JX/ayqN yAOOzecJKEw== X-Google-Smtp-Source: AGHT+IHgjXtZeS9MWnJw0QaRs0K1423cZiYsSGlzGIma3UVNr339vfB91wq8B4i+TbGxa75qWPp/mdN0Y+l9 X-Received: from pjbsh18.prod.google.com ([2002:a17:90b:5252:b0:312:f88d:25f6]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:5305:b0:312:26d9:d5a7 with SMTP id 98e67ed59e1d1-31cc25cd0c8mr8891702a91.20.1752896727620; Fri, 18 Jul 2025 20:45:27 -0700 (PDT) Date: Fri, 18 Jul 2025 20:44:57 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-2-irogers@google.com> Subject: [PATCH v1 01/19] perf vendor events: Update alderlake events/metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update events from v1.31 to v1.33. Update metrics from TMA 5.0 to 5.1. The event updates come from: https://github.com/intel/perfmon/commit/c504da6cb00e52067206166d2049a0c11af= 3b650 https://github.com/intel/perfmon/commit/4c18312c1a22eb564f5dbc94b187b59e438= 3a56a Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/alderlake/adl-metrics.json | 104 +++++------ .../pmu-events/arch/x86/alderlake/cache.json | 99 +++++------ .../arch/x86/alderlake/floating-point.json | 28 ++- .../arch/x86/alderlake/frontend.json | 42 +++-- .../pmu-events/arch/x86/alderlake/memory.json | 12 +- .../pmu-events/arch/x86/alderlake/other.json | 8 +- .../arch/x86/alderlake/pipeline.json | 163 +++++++----------- .../x86/alderlake/uncore-interconnect.json | 2 - .../arch/x86/alderlake/virtual-memory.json | 40 ++--- .../arch/x86/alderlaken/adln-metrics.json | 20 +-- .../x86/alderlaken/uncore-interconnect.json | 2 - tools/perf/pmu-events/arch/x86/mapfile.csv | 4 +- 12 files changed, 231 insertions(+), 293 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json b/to= ols/perf/pmu-events/arch/x86/alderlake/adl-metrics.json index 377dfecd96bd..cae7c0cf02f2 100644 --- a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json @@ -1,56 +1,56 @@ [ { "BriefDescription": "C10 residency percent per package", - "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c10\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C10_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C1 residency percent per core", - "MetricExpr": "cstate_core@c1\\-residency@ / TSC", + "MetricExpr": "cstate_core@c1\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C1_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C8 residency percent per package", - "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c8\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C8_Pkg_Residency", "ScaleUnit": "100%" @@ -552,7 +552,7 @@ }, { "BriefDescription": "Average CPU Utilization", - "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.REF_TSC@ / TSC", + "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.REF_TSC@ / msr@tsc\\,cpu= =3Dcpu_atom@", "MetricName": "tma_info_system_cpu_utilization", "Unit": "cpu_atom" }, @@ -751,7 +751,7 @@ { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "cpu_core@topdown\\-be\\-bound@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-be\\-bound@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "BvOB;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", @@ -789,12 +789,21 @@ "PublicDescription": "Total pipeline cost of instructions used for= program control-flow - a subset of the Retiring category in TMA. Examples = include function calls; loops and alignments. (A lower bound)", "Unit": "cpu_core" }, + { + "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", + "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", + "MetricGroup": "BvCB;Cor;tma_issueComp", + "MetricName": "tma_bottleneck_compute_bound_est", + "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", + "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: ", + "Unit": "cpu_core" + }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Bandwidth related bottlenecks", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma_l1_l= atency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)= ))", "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", - "MetricName": "tma_bottleneck_cache_memory_bandwidth", - "MetricThreshold": "tma_bottleneck_cache_memory_bandwidth > 20", + "MetricName": "tma_bottleneck_data_cache_memory_bandwidth", + "MetricThreshold": "tma_bottleneck_data_cache_memory_bandwidth > 2= 0", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_= system_dram_bw_use, tma_mem_bandwidth, tma_sq_full", "Unit": "cpu_core" }, @@ -802,23 +811,14 @@ "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Latency related bottlenecks", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependen= cy + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_= bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma= _l3_bound + tma_store_bound)) * (tma_lock_latency / (tma_dtlb_load + tma_fb= _full + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + tm= a_store_fwd_blk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tm= a_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_split_l= oads / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_= latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_s= tore_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_split_stores / (tma_dtlb_store + tma_false_sharin= g + tma_split_stores + tma_store_latency + tma_streaming_stores)) + tma_mem= ory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_boun= d + tma_l3_bound + tma_store_bound)) * (tma_store_latency / (tma_dtlb_store= + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming= _stores)))", "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", - "MetricName": "tma_bottleneck_cache_memory_latency", - "MetricThreshold": "tma_bottleneck_cache_memory_latency > 20", + "MetricName": "tma_bottleneck_data_cache_memory_latency", + "MetricThreshold": "tma_bottleneck_data_cache_memory_latency > 20", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_= mem_latency", "Unit": "cpu_core" }, - { - "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", - "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", - "MetricGroup": "BvCB;Cor;tma_issueComp", - "MetricName": "tma_bottleneck_compute_bound_est", - "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", - "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: ", - "Unit": "cpu_core" - }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks (when the front-end could not sustain operations = delivery to the back-end)", - "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - (1 - c= pu_core@INST_RETIRED.REP_ITERATION@ / cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D= 1@) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cl= ears_resteers + tma_mispredicts_resteers * tma_other_mispredicts / tma_bran= ch_mispredicts) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unk= nown_branches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_miss= es + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * t= ma_ms / (tma_dsb + tma_lsd + tma_mite + tma_ms))) - tma_bottleneck_big_code= ", + "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - (1 - c= pu_core@INST_RETIRED.REP_ITERATION@ / cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D= 1@) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cl= ears_resteers + tma_mispredicts_resteers * tma_other_mispredicts / tma_bran= ch_mispredicts) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unk= nown_branches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_miss= es + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_ms)) - tma_bottlene= ck_big_code", "MetricGroup": "BvFB;Fed;FetchBW;Frontend", "MetricName": "tma_bottleneck_instruction_fetch_bw", "MetricThreshold": "tma_bottleneck_instruction_fetch_bw > 20", @@ -826,7 +826,7 @@ }, { "BriefDescription": "Total pipeline cost of irregular execution (e= .g", - "MetricExpr": "100 * ((1 - cpu_core@INST_RETIRED.REP_ITERATION@ / = cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D1@) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + tma_mispredicts_restee= rs * tma_other_mispredicts / tma_branch_mispredicts) / (tma_clears_resteers= + tma_mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers= + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_m= s_switches) + tma_fetch_bandwidth * tma_ms / (tma_dsb + tma_lsd + tma_mite = + tma_ms)) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_bra= nch_mispredicts * tma_branch_mispredicts + tma_machine_clears * tma_other_n= ukes / tma_other_nukes + tma_core_bound * (tma_serializing_operation + cpu_= core@RS.EMPTY_RESOURCE@ / tma_info_thread_clks * tma_ports_utilized_0) / (t= ma_divider + tma_ports_utilization + tma_serializing_operation) + tma_micro= code_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (t= ma_assists / tma_microcode_sequencer) * tma_heavy_operations)", + "MetricExpr": "100 * ((1 - cpu_core@INST_RETIRED.REP_ITERATION@ / = cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D1@) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + tma_mispredicts_restee= rs * tma_other_mispredicts / tma_branch_mispredicts) / (tma_clears_resteers= + tma_mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers= + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_m= s_switches) + tma_ms) + 10 * tma_microcode_sequencer * tma_other_mispredict= s / tma_branch_mispredicts * tma_branch_mispredicts + tma_machine_clears * = tma_other_nukes / tma_other_nukes + tma_core_bound * (tma_serializing_opera= tion + cpu_core@RS.EMPTY_RESOURCE@ / tma_info_thread_clks * tma_ports_utili= zed_0) / (tma_divider + tma_ports_utilization + tma_serializing_operation) = + tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequ= encer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", "MetricName": "tma_bottleneck_irregular_overhead", "MetricThreshold": "tma_bottleneck_irregular_overhead > 10", @@ -862,7 +862,7 @@ }, { "BriefDescription": "Total pipeline cost of remaining bottlenecks = in the back-end", - "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_cache_me= mory_bandwidth + tma_bottleneck_cache_memory_latency + tma_bottleneck_memor= y_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottleneck_comput= e_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_branching_= overhead + tma_bottleneck_useful_work)", + "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_data_cac= he_memory_bandwidth + tma_bottleneck_data_cache_memory_latency + tma_bottle= neck_memory_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottlen= eck_compute_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_= branching_overhead + tma_bottleneck_useful_work)", "MetricGroup": "BvOB;Cor;Offcore", "MetricName": "tma_bottleneck_other_bottlenecks", "MetricThreshold": "tma_bottleneck_other_bottlenecks > 20", @@ -879,7 +879,7 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Branch Misprediction", - "MetricExpr": "cpu_core@topdown\\-br\\-mispredict@ / (cpu_core@top= down\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-re= tiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-br\\-mispredict@ / (cpu_core@top= down\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-re= tiring@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "BadSpec;BrMispredicts;BvMP;TmaL2;TopdownL2;tma_L2_= group;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", @@ -992,7 +992,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to contested acces= ses", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(25 * tma_info_system_core_frequency * (cpu_core@ME= M_LOAD_L3_HIT_RETIRED.XSNP_FWD@ * (cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP= _HITM@ / (cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM@ + cpu_core@OCR.DEM= AND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD@))) + 24 * tma_info_system_core_frequ= ency * cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS@) * (1 + cpu_core@MEM_LOA= D_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2) / tma_info_thre= ad_clks", "MetricGroup": "BvMS;DataSharing;LockCont;Offcore;Snoop;TopdownL4;= tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", "MetricName": "tma_contested_accesses", @@ -1109,7 +1108,7 @@ "MetricGroup": "BvMB;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;t= ma_issueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", - "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_cache_memory_bandwidth, = tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_late= ncy, tma_streaming_stores", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_data_cache_memory_bandwi= dth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store= _latency, tma_streaming_stores", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -1238,7 +1237,7 @@ }, { "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring heavy-weight operations -- instructions that require= two or more uops or micro-coded sequences", - "MetricExpr": "cpu_core@topdown\\-heavy\\-ops@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-heavy\\-ops@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", @@ -1851,7 +1850,7 @@ "Unit": "cpu_core" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "cpu_core@UOPS_EXECUTED.THREAD@ / (cpu_core@UOPS_EXE= CUTED.CORE_CYCLES_GE_1@ / 2 if #SMT_on else cpu_core@UOPS_EXECUTED.THREAD\\= ,cmask\\=3D1@)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute", @@ -1912,7 +1911,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc\\,cpu= =3Dcpu_core@ / 1e9 / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency", "Unit": "cpu_core" @@ -1926,7 +1925,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.REF_TSC@ / TSC", + "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.REF_TSC@ / msr@tsc\\,cpu= =3Dcpu_core@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized", "Unit": "cpu_core" @@ -1936,7 +1935,7 @@ "MetricExpr": "64 * (UNC_ARB_TRK_REQUESTS.ALL + UNC_ARB_COH_TRK_RE= QUESTS.ALL) / 1e6 / tma_info_system_time / 1e3", "MetricGroup": "HPC;MemOffcore;MemoryBW;SoC;tma_issueBW", "MetricName": "tma_info_system_dram_bw_use", - "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_cache_memory_ban= dwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full", + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_data_cache_memor= y_bandwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full", "Unit": "cpu_core" }, { @@ -1980,7 +1979,6 @@ }, { "BriefDescription": "Average latency of data read request to exter= nal memory (in nanoseconds)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(UNC_ARB_TRK_OCCUPANCY.RD + UNC_ARB_DAT_OCCUPANCY.R= D) / UNC_ARB_TRK_REQUESTS.RD", "MetricGroup": "Mem;MemoryLat;SoC", "MetricName": "tma_info_system_mem_read_latency", @@ -2031,6 +2029,13 @@ "MetricName": "tma_info_system_turbo_utilization", "Unit": "cpu_core" }, + { + "BriefDescription": "Measured Average Uncore Frequency for the SoC= [GHz]", + "MetricExpr": "tma_info_system_socket_clks / 1e9 / tma_info_system= _time", + "MetricGroup": "SoC", + "MetricName": "tma_info_system_uncore_frequency", + "Unit": "cpu_core" + }, { "BriefDescription": "Per-Logical Processor actual clocks when the = Logical Processor is active.", "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.THREAD@", @@ -2150,12 +2155,12 @@ "Unit": "cpu_core" }, { - "BriefDescription": "This metric([SKL+] roughly; [LNL]) estimates = fraction of cycles with demand load accesses that hit the L1D cache", + "BriefDescription": "This metric ([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache", "MetricExpr": "min(2 * (cpu_core@MEM_INST_RETIRED.ALL_LOADS@ - cpu= _core@MEM_LOAD_RETIRED.FB_HIT@ - cpu_core@MEM_LOAD_RETIRED.L1_MISS@) * 20 /= 100, max(cpu_core@CYCLE_ACTIVITY.CYCLES_MEM_ANY@ - cpu_core@MEMORY_ACTIVIT= Y.CYCLES_L1D_MISS@, 0)) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound= _group", "MetricName": "tma_l1_latency_dependency", "MetricThreshold": "tma_l1_latency_dependency > 0.1 & (tma_l1_boun= d > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache. The s= hort latency of the L1D cache may be exposed in pointer-chasing memory acce= ss patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", + "PublicDescription": "This metric ([SKL+] roughly; [LNL]) estimate= s fraction of cycles with demand load accesses that hit the L1D cache. The = short latency of the L1D cache may be exposed in pointer-chasing memory acc= ess patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -2171,7 +2176,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles wit= h demand load accesses that hit the L2 cache under unloaded scenarios (poss= ibly L2 latency limited)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "3 * tma_info_system_core_frequency * cpu_core@MEM_L= OAD_RETIRED.L2_HIT@ * (1 + cpu_core@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM= _LOAD_RETIRED.L1_MISS@ / 2) / tma_info_thread_clks", "MetricGroup": "MemoryLat;TopdownL4;tma_L4_group;tma_l2_bound_grou= p", "MetricName": "tma_l2_hit_latency", @@ -2192,12 +2196,11 @@ }, { "BriefDescription": "This metric estimates fraction of cycles with= demand load accesses that hit the L3 cache under unloaded scenarios (possi= bly L3 latency limited)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "9 * tma_info_system_core_frequency * (cpu_core@MEM_= LOAD_RETIRED.L3_HIT@ * (1 + cpu_core@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@ME= M_LOAD_RETIRED.L1_MISS@ / 2)) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat= ;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_cache_memory_latency, tma_mem_latency", + "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_data_cache_memory_latency, tma_mem_latency", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -2279,6 +2282,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU spent handling cache misses due to lock operations", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(16 * max(0, cpu_core@MEM_INST_RETIRED.LOCK_LOADS@ = - cpu_core@L2_RQSTS.ALL_RFO@) + cpu_core@MEM_INST_RETIRED.LOCK_LOADS@ / cpu= _core@MEM_INST_RETIRED.ALL_STORES@ * (10 * cpu_core@L2_RQSTS.RFO_HIT@ + min= (cpu_core@CPU_CLK_UNHALTED.THREAD@, cpu_core@OFFCORE_REQUESTS_OUTSTANDING.C= YCLES_WITH_DEMAND_RFO@))) / tma_info_thread_clks", "MetricGroup": "LockCont;Offcore;TopdownL4;tma_L4_group;tma_issueR= FO;tma_l1_bound_group", "MetricName": "tma_lock_latency", @@ -2314,7 +2318,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_d= ram_bound_group;tma_issueBW", "MetricName": "tma_mem_bandwidth", "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_cache_memory_bandwidth,= tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_data_cache_memory_bandw= idth, tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -2324,13 +2328,13 @@ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_= dram_bound_group;tma_issueLat", "MetricName": "tma_mem_latency", "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_cache_memory_latency, tma_l3_hit_latency= ", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_data_cache_memory_latency, tma_l3_hit_la= tency", "ScaleUnit": "100%", "Unit": "cpu_core" }, { "BriefDescription": "This metric represents fraction of slots the = Memory subsystem within the Backend was a bottleneck", - "MetricExpr": "cpu_core@topdown\\-mem\\-bound@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-mem\\-bound@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", @@ -2341,7 +2345,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to LFENCE Instructions.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * cpu_core@MISC2_RETIRED.LFENCE@ / tma_info_thre= ad_clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_serializing_operation_g= roup", "MetricName": "tma_memory_fence", @@ -2400,7 +2403,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to the Microcode Sequencer (MS) unit = - see Microcode_Sequencer node for details.", - "MetricExpr": "max(cpu_core@IDQ.MS_CYCLES_ANY@, cpu_core@UOPS_RETI= RED.MS\\,cmask\\=3D1@ / (cpu_core@UOPS_RETIRED.SLOTS@ / cpu_core@UOPS_ISSUE= D.ANY@)) / tma_info_core_core_clks / 2", + "MetricExpr": "max(cpu_core@IDQ.MS_CYCLES_ANY@, cpu_core@UOPS_RETI= RED.MS\\,cmask\\=3D1@ / (cpu_core@UOPS_RETIRED.SLOTS@ / cpu_core@UOPS_ISSUE= D.ANY@)) / tma_info_core_core_clks / 2.4", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_fetch_bandwidt= h_group", "MetricName": "tma_ms", "MetricThreshold": "tma_ms > 0.05 & tma_fetch_bandwidth > 0.2", @@ -2439,6 +2442,7 @@ }, { "BriefDescription": "This metric represents the remaining light uo= ps fraction the CPU has executed - remaining means not covered by other sib= ling nodes", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_i= nt_operations + tma_memory_operations + tma_fused_instructions + tma_non_fu= sed_branches))", "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", "MetricName": "tma_other_light_ops", @@ -2507,6 +2511,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU performance was potentially limited due to Core computation issues (non= divider-related)", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "((tma_ports_utilized_0 * tma_info_thread_clks + (cp= u_core@EXE_ACTIVITY.1_PORTS_UTIL@ + tma_retiring * cpu_core@EXE_ACTIVITY.2_= 3_PORTS_UTIL@)) / tma_info_thread_clks if cpu_core@ARITH.DIV_ACTIVE@ < cpu_= core@CYCLE_ACTIVITY.STALLS_TOTAL@ - cpu_core@EXE_ACTIVITY.BOUND_ON_LOADS@ e= lse (cpu_core@EXE_ACTIVITY.1_PORTS_UTIL@ + tma_retiring * cpu_core@EXE_ACTI= VITY.2_3_PORTS_UTIL@) / tma_info_thread_clks)", "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_gr= oup", "MetricName": "tma_ports_utilization", @@ -2517,6 +2522,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed no uops on any execution port (Logical Processor cycles since ICL= , Physical Core cycles otherwise)", + "MetricConstraint": "NO_THRESHOLD_AND_NMI", "MetricExpr": "(cpu_core@EXE_ACTIVITY.EXE_BOUND_0_PORTS@ + max(cpu= _core@RS.EMPTY_RESOURCE@ - cpu_core@RESOURCE_STALLS.SCOREBOARD@, 0)) / tma_= info_thread_clks * (cpu_core@CYCLE_ACTIVITY.STALLS_TOTAL@ - cpu_core@EXE_AC= TIVITY.BOUND_ON_LOADS@) / tma_info_thread_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_0", @@ -2527,6 +2533,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", + "MetricConstraint": "NO_THRESHOLD_AND_NMI", "MetricExpr": "cpu_core@EXE_ACTIVITY.1_PORTS_UTIL@ / tma_info_thre= ad_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_1", @@ -2537,7 +2544,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "cpu_core@EXE_ACTIVITY.2_PORTS_UTIL@ / tma_info_thre= ad_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", @@ -2548,7 +2554,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 3 or more uops per cycle on all execution ports (Logical= Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "cpu_core@UOPS_EXECUTED.CYCLES_GE_3@ / tma_info_thre= ad_clks", "MetricGroup": "BvCB;PortsUtil;TopdownL4;tma_L4_group;tma_ports_ut= ilization_group", "MetricName": "tma_ports_utilized_3m", @@ -2560,7 +2565,7 @@ { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "cpu_core@topdown\\-retiring@ / (cpu_core@topdown\\-= fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@= + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-retiring@ / (cpu_core@topdown\\-= fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@= + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "BvUW;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", @@ -2591,7 +2596,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to PAUSE Instructions", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.PAUSE@ / tma_info_thread_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_serializing_operation_g= roup", "MetricName": "tma_slow_pause", @@ -2626,7 +2630,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_i= ssueBW;tma_l3_bound_group", "MetricName": "tma_sq_full", "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, tma_m= em_bandwidth", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_data_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, = tma_mem_bandwidth", "ScaleUnit": "100%", "Unit": "cpu_core" }, diff --git a/tools/perf/pmu-events/arch/x86/alderlake/cache.json b/tools/pe= rf/pmu-events/arch/x86/alderlake/cache.json index 5461576dafc7..6a56c9ad8e43 100644 --- a/tools/perf/pmu-events/arch/x86/alderlake/cache.json +++ b/tools/perf/pmu-events/arch/x86/alderlake/cache.json @@ -4,7 +4,6 @@ "Counter": "0,1,2,3", "EventCode": "0x51", "EventName": "L1D.HWPF_MISS", - "PublicDescription": "L1D.HWPF_MISS Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x20", "Unit": "cpu_core" @@ -14,7 +13,7 @@ "Counter": "0,1,2,3", "EventCode": "0x51", "EventName": "L1D.REPLACEMENT", - "PublicDescription": "Counts L1D data line replacements including = opportunistic replacements, and replacements that require stall-for-replace= or block-for-replace. Available PDIST counters: 0", + "PublicDescription": "Counts L1D data line replacements including = opportunistic replacements, and replacements that require stall-for-replace= or block-for-replace.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -24,7 +23,7 @@ "Counter": "0,1,2,3", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.FB_FULL", - "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses. Av= ailable PDIST counters: 0", + "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" @@ -36,7 +35,7 @@ "EdgeDetect": "1", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.FB_FULL_PERIODS", - "PublicDescription": "Counts number of phases a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses. Av= ailable PDIST counters: 0", + "PublicDescription": "Counts number of phases a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" @@ -47,7 +46,6 @@ "Deprecated": "1", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.L2_STALL", - "PublicDescription": "This event is deprecated. Refer to new event= L1D_PEND_MISS.L2_STALLS Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x4", "Unit": "cpu_core" @@ -57,7 +55,7 @@ "Counter": "0,1,2,3", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.L2_STALLS", - "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D due to lack of L2 resources. Demand requests include cac= heable/uncacheable demand load, store, lock or SW prefetch accesses. Availa= ble PDIST counters: 0", + "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D due to lack of L2 resources. Demand requests include cac= heable/uncacheable demand load, store, lock or SW prefetch accesses.", "SampleAfterValue": "1000003", "UMask": "0x4", "Unit": "cpu_core" @@ -67,7 +65,7 @@ "Counter": "0,1,2,3", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.PENDING", - "PublicDescription": "Counts number of L1D misses that are outstan= ding in each cycle, that is each cycle the number of Fill Buffers (FB) outs= tanding required by Demand Reads. FB either is held by demand loads, or it = is held by non-demand loads and gets hit at least once by demand. The valid= outstanding interval is defined until the FB deallocation by one of the fo= llowing ways: from FB allocation, if FB is allocated by demand from the dem= and Hit FB, if it is allocated by hardware or software prefetch. Note: In t= he L1D, a Demand Read contains cacheable or noncacheable demand loads, incl= uding ones causing cache-line splits and reads due to page walks resulted f= rom any request type. Available PDIST counters: 0", + "PublicDescription": "Counts number of L1D misses that are outstan= ding in each cycle, that is each cycle the number of Fill Buffers (FB) outs= tanding required by Demand Reads. FB either is held by demand loads, or it = is held by non-demand loads and gets hit at least once by demand. The valid= outstanding interval is defined until the FB deallocation by one of the fo= llowing ways: from FB allocation, if FB is allocated by demand from the dem= and Hit FB, if it is allocated by hardware or software prefetch. Note: In t= he L1D, a Demand Read contains cacheable or noncacheable demand loads, incl= uding ones causing cache-line splits and reads due to page walks resulted f= rom any request type.", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -78,7 +76,7 @@ "CounterMask": "1", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.PENDING_CYCLES", - "PublicDescription": "Counts duration of L1D miss outstanding in c= ycles. Available PDIST counters: 0", + "PublicDescription": "Counts duration of L1D miss outstanding in c= ycles.", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -88,7 +86,7 @@ "Counter": "0,1,2,3", "EventCode": "0x25", "EventName": "L2_LINES_IN.ALL", - "PublicDescription": "Counts the number of L2 cache lines filling = the L2. Counting does not cover rejects. Available PDIST counters: 0", + "PublicDescription": "Counts the number of L2 cache lines filling = the L2. Counting does not cover rejects.", "SampleAfterValue": "100003", "UMask": "0x1f", "Unit": "cpu_core" @@ -98,7 +96,7 @@ "Counter": "0,1,2,3", "EventCode": "0x26", "EventName": "L2_LINES_OUT.NON_SILENT", - "PublicDescription": "Counts the number of lines that are evicted = by L2 cache when triggered by an L2 cache fill. Those lines are in Modified= state. Modified lines are written back to L3 Available PDIST counters: 0", + "PublicDescription": "Counts the number of lines that are evicted = by L2 cache when triggered by an L2 cache fill. Those lines are in Modified= state. Modified lines are written back to L3", "SampleAfterValue": "200003", "UMask": "0x2", "Unit": "cpu_core" @@ -108,7 +106,7 @@ "Counter": "0,1,2,3", "EventCode": "0x26", "EventName": "L2_LINES_OUT.SILENT", - "PublicDescription": "Counts the number of lines that are silently= dropped by L2 cache. These lines are typically in Shared or Exclusive stat= e. A non-threaded event. Available PDIST counters: 0", + "PublicDescription": "Counts the number of lines that are silently= dropped by L2 cache. These lines are typically in Shared or Exclusive stat= e. A non-threaded event.", "SampleAfterValue": "200003", "UMask": "0x1", "Unit": "cpu_core" @@ -118,7 +116,7 @@ "Counter": "0,1,2,3", "EventCode": "0x26", "EventName": "L2_LINES_OUT.USELESS_HWPF", - "PublicDescription": "Counts the number of cache lines that have b= een prefetched by the L2 hardware prefetcher but not used by demand access = when evicted from the L2 cache Available PDIST counters: 0", + "PublicDescription": "Counts the number of cache lines that have b= een prefetched by the L2 hardware prefetcher but not used by demand access = when evicted from the L2 cache", "SampleAfterValue": "200003", "UMask": "0x4", "Unit": "cpu_core" @@ -137,7 +135,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_REQUEST.ALL", - "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_RQSTS.REFERENCES] Available PDIST coun= ters: 0", + "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_RQSTS.REFERENCES]", "SampleAfterValue": "200003", "UMask": "0xff", "Unit": "cpu_core" @@ -167,7 +165,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_REQUEST.MISS", - "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_RQSTS.MISS] Available PDIST coun= ters: 0", + "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_RQSTS.MISS]", "SampleAfterValue": "200003", "UMask": "0x3f", "Unit": "cpu_core" @@ -177,7 +175,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_CODE_RD", - "PublicDescription": "Counts the total number of L2 code requests.= Available PDIST counters: 0", + "PublicDescription": "Counts the total number of L2 code requests.= ", "SampleAfterValue": "200003", "UMask": "0xe4", "Unit": "cpu_core" @@ -187,7 +185,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_DEMAND_DATA_RD", - "PublicDescription": "Counts Demand Data Read requests accessing t= he L2 cache. These requests may hit or miss L2 cache. True-miss exclude mis= ses that were merged with ongoing L2 misses. An access is counted once. Ava= ilable PDIST counters: 0", + "PublicDescription": "Counts Demand Data Read requests accessing t= he L2 cache. These requests may hit or miss L2 cache. True-miss exclude mis= ses that were merged with ongoing L2 misses. An access is counted once.", "SampleAfterValue": "200003", "UMask": "0xe1", "Unit": "cpu_core" @@ -197,7 +195,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_DEMAND_MISS", - "PublicDescription": "Counts demand requests that miss L2 cache. A= vailable PDIST counters: 0", + "PublicDescription": "Counts demand requests that miss L2 cache.", "SampleAfterValue": "200003", "UMask": "0x27", "Unit": "cpu_core" @@ -207,7 +205,6 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_HWPF", - "PublicDescription": "L2_RQSTS.ALL_HWPF Available PDIST counters: = 0", "SampleAfterValue": "200003", "UMask": "0xf0", "Unit": "cpu_core" @@ -217,7 +214,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_RFO", - "PublicDescription": "Counts the total number of RFO (read for own= ership) requests to L2 cache. L2 RFO requests include both L1D demand RFO m= isses as well as L1D RFO prefetches. Available PDIST counters: 0", + "PublicDescription": "Counts the total number of RFO (read for own= ership) requests to L2 cache. L2 RFO requests include both L1D demand RFO m= isses as well as L1D RFO prefetches.", "SampleAfterValue": "200003", "UMask": "0xe2", "Unit": "cpu_core" @@ -227,7 +224,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.CODE_RD_HIT", - "PublicDescription": "Counts L2 cache hits when fetching instructi= ons, code reads. Available PDIST counters: 0", + "PublicDescription": "Counts L2 cache hits when fetching instructi= ons, code reads.", "SampleAfterValue": "200003", "UMask": "0xc4", "Unit": "cpu_core" @@ -237,7 +234,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.CODE_RD_MISS", - "PublicDescription": "Counts L2 cache misses when fetching instruc= tions. Available PDIST counters: 0", + "PublicDescription": "Counts L2 cache misses when fetching instruc= tions.", "SampleAfterValue": "200003", "UMask": "0x24", "Unit": "cpu_core" @@ -247,7 +244,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.DEMAND_DATA_RD_HIT", - "PublicDescription": "Counts the number of demand Data Read reques= ts initiated by load instructions that hit L2 cache. Available PDIST counte= rs: 0", + "PublicDescription": "Counts the number of demand Data Read reques= ts initiated by load instructions that hit L2 cache.", "SampleAfterValue": "200003", "UMask": "0xc1", "Unit": "cpu_core" @@ -257,7 +254,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.DEMAND_DATA_RD_MISS", - "PublicDescription": "Counts demand Data Read requests with true-m= iss in the L2 cache. True-miss excludes misses that were merged with ongoin= g L2 misses. An access is counted once. Available PDIST counters: 0", + "PublicDescription": "Counts demand Data Read requests with true-m= iss in the L2 cache. True-miss excludes misses that were merged with ongoin= g L2 misses. An access is counted once.", "SampleAfterValue": "200003", "UMask": "0x21", "Unit": "cpu_core" @@ -267,7 +264,6 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.HWPF_MISS", - "PublicDescription": "L2_RQSTS.HWPF_MISS Available PDIST counters:= 0", "SampleAfterValue": "200003", "UMask": "0x30", "Unit": "cpu_core" @@ -277,7 +273,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.MISS", - "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_REQUEST.MISS] Available PDIST co= unters: 0", + "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_REQUEST.MISS]", "SampleAfterValue": "200003", "UMask": "0x3f", "Unit": "cpu_core" @@ -287,7 +283,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.REFERENCES", - "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_REQUEST.ALL] Available PDIST counters:= 0", + "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_REQUEST.ALL]", "SampleAfterValue": "200003", "UMask": "0xff", "Unit": "cpu_core" @@ -297,7 +293,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.RFO_HIT", - "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that hit L2 cache. Available PDIST counters: 0", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that hit L2 cache.", "SampleAfterValue": "200003", "UMask": "0xc2", "Unit": "cpu_core" @@ -307,7 +303,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.RFO_MISS", - "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that miss L2 cache. Available PDIST counters: 0", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that miss L2 cache.", "SampleAfterValue": "200003", "UMask": "0x22", "Unit": "cpu_core" @@ -317,7 +313,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.SWPF_HIT", - "PublicDescription": "Counts Software prefetch requests that hit t= he L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when = FB is not full. Available PDIST counters: 0", + "PublicDescription": "Counts Software prefetch requests that hit t= he L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when = FB is not full.", "SampleAfterValue": "200003", "UMask": "0xc8", "Unit": "cpu_core" @@ -327,7 +323,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.SWPF_MISS", - "PublicDescription": "Counts Software prefetch requests that miss = the L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when= FB is not full. Available PDIST counters: 0", + "PublicDescription": "Counts Software prefetch requests that miss = the L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when= FB is not full.", "SampleAfterValue": "200003", "UMask": "0x28", "Unit": "cpu_core" @@ -337,7 +333,7 @@ "Counter": "0,1,2,3", "EventCode": "0x23", "EventName": "L2_TRANS.L2_WB", - "PublicDescription": "Counts L2 writebacks that access L2 cache. A= vailable PDIST counters: 0", + "PublicDescription": "Counts L2 writebacks that access L2 cache.", "SampleAfterValue": "200003", "UMask": "0x40", "Unit": "cpu_core" @@ -357,7 +353,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x2e", "EventName": "LONGEST_LAT_CACHE.MISS", - "PublicDescription": "Counts core-originated cacheable requests th= at miss the L3 cache (Longest Latency cache). Requests include data and cod= e reads, Reads-for-Ownership (RFOs), speculative accesses and hardware pref= etches to the L1 and L2. It does not include hardware prefetches to the L3= , and may not count other types of requests to the L3. Available PDIST coun= ters: 0", + "PublicDescription": "Counts core-originated cacheable requests th= at miss the L3 cache (Longest Latency cache). Requests include data and cod= e reads, Reads-for-Ownership (RFOs), speculative accesses and hardware pref= etches to the L1 and L2. It does not include hardware prefetches to the L3= , and may not count other types of requests to the L3.", "SampleAfterValue": "100003", "UMask": "0x41", "Unit": "cpu_core" @@ -377,7 +373,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x2e", "EventName": "LONGEST_LAT_CACHE.REFERENCE", - "PublicDescription": "Counts core-originated cacheable requests to= the L3 cache (Longest Latency cache). Requests include data and code reads= , Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches = to the L1 and L2. It does not include hardware prefetches to the L3, and m= ay not count other types of requests to the L3. Available PDIST counters: 0= ", + "PublicDescription": "Counts core-originated cacheable requests to= the L3 cache (Longest Latency cache). Requests include data and code reads= , Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches = to the L1 and L2. It does not include hardware prefetches to the L3, and m= ay not count other types of requests to the L3.", "SampleAfterValue": "100003", "UMask": "0x4f", "Unit": "cpu_core" @@ -552,7 +548,7 @@ "Counter": "0,1,2,3", "EventCode": "0x43", "EventName": "MEM_LOAD_COMPLETED.L1_MISS_ANY", - "PublicDescription": "Number of completed demand load requests tha= t missed the L1 data cache including shadow misses (FB hits, merge to an on= going L1D miss) Available PDIST counters: 0", + "PublicDescription": "Number of completed demand load requests tha= t missed the L1 data cache including shadow misses (FB hits, merge to an on= going L1D miss)", "SampleAfterValue": "1000003", "UMask": "0xfd", "Unit": "cpu_core" @@ -853,7 +849,6 @@ "Counter": "0,1,2,3", "EventCode": "0x44", "EventName": "MEM_STORE_RETIRED.L2_HIT", - "PublicDescription": "MEM_STORE_RETIRED.L2_HIT Available PDIST cou= nters: 0", "SampleAfterValue": "200003", "UMask": "0x1", "Unit": "cpu_core" @@ -1050,7 +1045,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe5", "EventName": "MEM_UOP_RETIRED.ANY", - "PublicDescription": "Number of retired micro-operations (uops) fo= r load or store memory accesses Available PDIST counters: 0", + "PublicDescription": "Number of retired micro-operations (uops) fo= r load or store memory accesses", "SampleAfterValue": "1000003", "UMask": "0x3", "Unit": "cpu_core" @@ -1372,7 +1367,6 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.ALL_REQUESTS", - "PublicDescription": "OFFCORE_REQUESTS.ALL_REQUESTS Available PDIS= T counters: 0", "SampleAfterValue": "100003", "UMask": "0x80", "Unit": "cpu_core" @@ -1382,7 +1376,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DATA_RD", - "PublicDescription": "Counts the demand and prefetch data reads. A= ll Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 p= refetchers). Counting also covers reads due to page walks resulted from any= request type. Available PDIST counters: 0", + "PublicDescription": "Counts the demand and prefetch data reads. A= ll Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 p= refetchers). Counting also covers reads due to page walks resulted from any= request type.", "SampleAfterValue": "100003", "UMask": "0x8", "Unit": "cpu_core" @@ -1392,7 +1386,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DEMAND_CODE_RD", - "PublicDescription": "Counts both cacheable and non-cacheable code= read requests. Available PDIST counters: 0", + "PublicDescription": "Counts both cacheable and non-cacheable code= read requests.", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -1402,7 +1396,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DEMAND_DATA_RD", - "PublicDescription": "Counts the Demand Data Read requests sent to= uncore. Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determi= ne average latency in the uncore. Available PDIST counters: 0", + "PublicDescription": "Counts the Demand Data Read requests sent to= uncore. Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determi= ne average latency in the uncore.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -1412,7 +1406,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DEMAND_RFO", - "PublicDescription": "Counts the demand RFO (read for ownership) r= equests including regular RFOs, locks, ItoM. Available PDIST counters: 0", + "PublicDescription": "Counts the demand RFO (read for ownership) r= equests including regular RFOs, locks, ItoM.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -1424,7 +1418,6 @@ "Errata": "ADL038", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD", - "PublicDescription": "This event is deprecated. Refer to new event= OFFCORE_REQUESTS_OUTSTANDING.DATA_RD Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8", "Unit": "cpu_core" @@ -1436,7 +1429,6 @@ "Errata": "ADL038", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD", - "PublicDescription": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DAT= A_RD Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8", "Unit": "cpu_core" @@ -1447,7 +1439,7 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE= _RD", - "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS. Available PDIST counters: 0", + "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS.", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1458,7 +1450,6 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_DATA= _RD", - "PublicDescription": "Cycles where at least 1 outstanding demand d= ata read request is pending. Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1469,7 +1460,7 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO", - "PublicDescription": "Counts the number of offcore outstanding dem= and rfo Reads transactions in the super queue every cycle. The 'Offcore out= standing' state of the transaction lasts from the L2 miss until the sending= transaction completion to requestor (SQ deallocation). See the correspondi= ng Umask under OFFCORE_REQUESTS. Available PDIST counters: 0", + "PublicDescription": "Counts the number of offcore outstanding dem= and rfo Reads transactions in the super queue every cycle. The 'Offcore out= standing' state of the transaction lasts from the L2 miss until the sending= transaction completion to requestor (SQ deallocation). See the correspondi= ng Umask under OFFCORE_REQUESTS.", "SampleAfterValue": "1000003", "UMask": "0x4", "Unit": "cpu_core" @@ -1480,7 +1471,6 @@ "Errata": "ADL038", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DATA_RD", - "PublicDescription": "OFFCORE_REQUESTS_OUTSTANDING.DATA_RD Availab= le PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8", "Unit": "cpu_core" @@ -1490,7 +1480,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD", - "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS. Available PDIST counters: 0", + "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS.", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1500,7 +1490,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD", - "PublicDescription": "For every cycle, increments by the number of= outstanding demand data read requests pending. Requests are considered o= utstanding from the time they miss the core's L2 cache until the transactio= n completion message is sent to the requestor. Available PDIST counters: 0", + "PublicDescription": "For every cycle, increments by the number of= outstanding demand data read requests pending. Requests are considered o= utstanding from the time they miss the core's L2 cache until the transactio= n completion message is sent to the requestor.", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1510,7 +1500,7 @@ "Counter": "0,1,2,3", "EventCode": "0x2c", "EventName": "SQ_MISC.BUS_LOCK", - "PublicDescription": "Counts the more expensive bus lock needed to= enforce cache coherency for certain memory accesses that need to be done a= tomically. Can be created by issuing an atomic instruction (via the LOCK p= refix) which causes a cache line split or accesses uncacheable memory. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts the more expensive bus lock needed to= enforce cache coherency for certain memory accesses that need to be done a= tomically. Can be created by issuing an atomic instruction (via the LOCK p= refix) which causes a cache line split or accesses uncacheable memory.", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -1520,7 +1510,6 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.ANY", - "PublicDescription": "Counts the number of PREFETCHNTA, PREFETCHW,= PREFETCHT0, PREFETCHT1 or PREFETCHT2 instructions executed. Available PDIS= T counters: 0", "SampleAfterValue": "100003", "UMask": "0xf", "Unit": "cpu_core" @@ -1530,7 +1519,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.NTA", - "PublicDescription": "Counts the number of PREFETCHNTA instruction= s executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHNTA instruction= s executed.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -1540,7 +1529,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.PREFETCHW", - "PublicDescription": "Counts the number of PREFETCHW instructions = executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHW instructions = executed.", "SampleAfterValue": "100003", "UMask": "0x8", "Unit": "cpu_core" @@ -1550,7 +1539,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.T0", - "PublicDescription": "Counts the number of PREFETCHT0 instructions= executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHT0 instructions= executed.", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -1560,7 +1549,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.T1_T2", - "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT= 2 instructions executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT= 2 instructions executed.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/alderlake/floating-point.json b= /tools/perf/pmu-events/arch/x86/alderlake/floating-point.json index d01f1b163ed8..62fd70f220e5 100644 --- a/tools/perf/pmu-events/arch/x86/alderlake/floating-point.json +++ b/tools/perf/pmu-events/arch/x86/alderlake/floating-point.json @@ -14,7 +14,6 @@ "CounterMask": "1", "EventCode": "0xb0", "EventName": "ARITH.FPDIV_ACTIVE", - "PublicDescription": "ARITH.FPDIV_ACTIVE Available PDIST counters:= 0", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -33,7 +32,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.FP", - "PublicDescription": "Counts all microcode Floating Point assists.= Available PDIST counters: 0", + "PublicDescription": "Counts all microcode Floating Point assists.= ", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -43,7 +42,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.SSE_AVX_MIX", - "PublicDescription": "ASSISTS.SSE_AVX_MIX Available PDIST counters= : 0", "SampleAfterValue": "1000003", "UMask": "0x10", "Unit": "cpu_core" @@ -53,7 +51,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.PORT_0", - "PublicDescription": "FP_ARITH_DISPATCHED.PORT_0 [This event is al= ias to FP_ARITH_DISPATCHED.V0] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -63,7 +60,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.PORT_1", - "PublicDescription": "FP_ARITH_DISPATCHED.PORT_1 [This event is al= ias to FP_ARITH_DISPATCHED.V1] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -73,7 +69,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.PORT_5", - "PublicDescription": "FP_ARITH_DISPATCHED.PORT_5 [This event is al= ias to FP_ARITH_DISPATCHED.V2] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -83,7 +78,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.V0", - "PublicDescription": "FP_ARITH_DISPATCHED.V0 [This event is alias = to FP_ARITH_DISPATCHED.PORT_0] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -93,7 +87,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.V1", - "PublicDescription": "FP_ARITH_DISPATCHED.V1 [This event is alias = to FP_ARITH_DISPATCHED.PORT_1] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -103,7 +96,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.V2", - "PublicDescription": "FP_ARITH_DISPATCHED.V2 [This event is alias = to FP_ARITH_DISPATCHED.PORT_5] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -113,7 +105,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 2 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as the= y perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR re= gister need to be set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 2 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as the= y perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR re= gister need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -123,7 +115,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE", - "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events. Available PDIST co= unters: 0", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x8", "Unit": "cpu_core" @@ -133,7 +125,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 = calculations per element. The DAZ and FTZ flags in the MXCSR register need = to be set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 = calculations per element. The DAZ and FTZ flags in the MXCSR register need = to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -143,7 +135,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE", - "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events. Available PDIST co= unters: 0", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x20", "Unit": "cpu_core" @@ -153,7 +145,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.4_FLOPS", - "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision and 256-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 2 or/and 4 computation operations, one for each element. = Applies to SSE* and AVX* packed single precision floating-point and packed= double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL= DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB ins= tructions count twice as they perform 2 calculations per element. The DAZ a= nd FTZ flags in the MXCSR register need to be set when using these events. = Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision and 256-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 2 or/and 4 computation operations, one for each element. = Applies to SSE* and AVX* packed single precision floating-point and packed= double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL= DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB ins= tructions count twice as they perform 2 calculations per element. The DAZ a= nd FTZ flags in the MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x18", "Unit": "cpu_core" @@ -163,7 +155,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR", - "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision and double precision floating-point instructions retired; some = instructions will count twice as noted below. Each count represents 1 comp= utational operation. Applies to SSE* and AVX* scalar single precision float= ing-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB= . FM(N)ADD/SUB instructions count twice as they perform 2 calculations per= element. The DAZ and FTZ flags in the MXCSR register need to be set when u= sing these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision and double precision floating-point instructions retired; some = instructions will count twice as noted below. Each count represents 1 comp= utational operation. Applies to SSE* and AVX* scalar single precision float= ing-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB= . FM(N)ADD/SUB instructions count twice as they perform 2 calculations per= element. The DAZ and FTZ flags in the MXCSR register need to be set when u= sing these events.", "SampleAfterValue": "1000003", "UMask": "0x3", "Unit": "cpu_core" @@ -173,7 +165,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational scalar doubl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar double precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions co= unt twice as they perform 2 calculations per element. The DAZ and FTZ flags= in the MXCSR register need to be set when using these events. Available PD= IST counters: 0", + "PublicDescription": "Number of SSE/AVX computational scalar doubl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar double precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions co= unt twice as they perform 2 calculations per element. The DAZ and FTZ flags= in the MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -183,7 +175,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE", - "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar single precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instr= uctions count twice as they perform 2 calculations per element. The DAZ and= FTZ flags in the MXCSR register need to be set when using these events. Av= ailable PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar single precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instr= uctions count twice as they perform 2 calculations per element. The DAZ and= FTZ flags in the MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -193,7 +185,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.VECTOR", - "PublicDescription": "Number of any Vector retired FP arithmetic i= nstructions. The DAZ and FTZ flags in the MXCSR register need to be set wh= en using these events. Available PDIST counters: 0", + "PublicDescription": "Number of any Vector retired FP arithmetic i= nstructions. The DAZ and FTZ flags in the MXCSR register need to be set wh= en using these events.", "SampleAfterValue": "1000003", "UMask": "0xfc", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/alderlake/frontend.json b/tools= /perf/pmu-events/arch/x86/alderlake/frontend.json index dae3174a74fb..ff3b30c2619a 100644 --- a/tools/perf/pmu-events/arch/x86/alderlake/frontend.json +++ b/tools/perf/pmu-events/arch/x86/alderlake/frontend.json @@ -14,7 +14,7 @@ "Counter": "0,1,2,3", "EventCode": "0x60", "EventName": "BACLEARS.ANY", - "PublicDescription": "Number of times the front-end is resteered w= hen it finds a branch instruction in a fetch line. This is called Unknown B= ranch which occurs for the first time a branch instruction is fetched or wh= en the branch is not tracked by the BPU (Branch Prediction Unit) anymore. A= vailable PDIST counters: 0", + "PublicDescription": "Number of times the front-end is resteered w= hen it finds a branch instruction in a fetch line. This is called Unknown B= ranch which occurs for the first time a branch instruction is fetched or wh= en the branch is not tracked by the BPU (Branch Prediction Unit) anymore.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -24,7 +24,7 @@ "Counter": "0,1,2,3", "EventCode": "0x87", "EventName": "DECODE.LCP", - "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk. Available PDIS= T counters: 0", + "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk.", "SampleAfterValue": "500009", "UMask": "0x1", "Unit": "cpu_core" @@ -34,7 +34,6 @@ "Counter": "0,1,2,3", "EventCode": "0x87", "EventName": "DECODE.MS_BUSY", - "PublicDescription": "Cycles the Microcode Sequencer is busy. Avai= lable PDIST counters: 0", "SampleAfterValue": "500009", "UMask": "0x2", "Unit": "cpu_core" @@ -44,7 +43,7 @@ "Counter": "0,1,2,3", "EventCode": "0x61", "EventName": "DSB2MITE_SWITCHES.PENALTY_CYCLES", - "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache th= at holds translations of previously fetched instructions that were decoded = by the legacy x86 decode pipeline (MITE). This event counts fetch penalty c= ycles when a transition occurs from DSB to MITE. Available PDIST counters: = 0", + "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache th= at holds translations of previously fetched instructions that were decoded = by the legacy x86 decode pipeline (MITE). This event counts fetch penalty c= ycles when a transition occurs from DSB to MITE.", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -302,7 +301,7 @@ "Counter": "0,1,2,3", "EventCode": "0x80", "EventName": "ICACHE_DATA.STALLS", - "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The decode pipeline works at a 32= Byte granularity. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The decode pipeline works at a 32= Byte granularity.", "SampleAfterValue": "500009", "UMask": "0x4", "Unit": "cpu_core" @@ -314,7 +313,6 @@ "EdgeDetect": "1", "EventCode": "0x80", "EventName": "ICACHE_DATA.STALL_PERIODS", - "PublicDescription": "ICACHE_DATA.STALL_PERIODS Available PDIST co= unters: 0", "SampleAfterValue": "500009", "UMask": "0x4", "Unit": "cpu_core" @@ -324,7 +322,7 @@ "Counter": "0,1,2,3", "EventCode": "0x83", "EventName": "ICACHE_TAG.STALLS", - "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss.", "SampleAfterValue": "200003", "UMask": "0x4", "Unit": "cpu_core" @@ -335,7 +333,7 @@ "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.DSB_CYCLES_ANY", - "PublicDescription": "Counts the number of cycles uops were delive= red to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) p= ath. Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles uops were delive= red to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) p= ath.", "SampleAfterValue": "2000003", "UMask": "0x8", "Unit": "cpu_core" @@ -346,7 +344,7 @@ "CounterMask": "6", "EventCode": "0x79", "EventName": "IDQ.DSB_CYCLES_OK", - "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the D= SB (Decode Stream Buffer) path. Count includes uops that may 'bypass' the I= DQ. Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the D= SB (Decode Stream Buffer) path. Count includes uops that may 'bypass' the I= DQ.", "SampleAfterValue": "2000003", "UMask": "0x8", "Unit": "cpu_core" @@ -356,7 +354,7 @@ "Counter": "0,1,2,3", "EventCode": "0x79", "EventName": "IDQ.DSB_UOPS", - "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Availab= le PDIST counters: 0", + "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path.", "SampleAfterValue": "2000003", "UMask": "0x8", "Unit": "cpu_core" @@ -367,7 +365,7 @@ "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.MITE_CYCLES_ANY", - "PublicDescription": "Counts the number of cycles uops were delive= red to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipe= line) path. During these cycles uops are not being delivered from the Decod= e Stream Buffer (DSB). Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles uops were delive= red to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipe= line) path. During these cycles uops are not being delivered from the Decod= e Stream Buffer (DSB).", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -378,7 +376,7 @@ "CounterMask": "6", "EventCode": "0x79", "EventName": "IDQ.MITE_CYCLES_OK", - "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the M= ITE (legacy decode pipeline) path. During these cycles uops are not being d= elivered from the Decode Stream Buffer (DSB). Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the M= ITE (legacy decode pipeline) path. During these cycles uops are not being d= elivered from the Decode Stream Buffer (DSB).", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -388,7 +386,7 @@ "Counter": "0,1,2,3", "EventCode": "0x79", "EventName": "IDQ.MITE_UOPS", - "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the MITE path. This also means that uops are= not being delivered from the Decode Stream Buffer (DSB). Available PDIST c= ounters: 0", + "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the MITE path. This also means that uops are= not being delivered from the Decode Stream Buffer (DSB).", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -399,7 +397,7 @@ "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.MS_CYCLES_ANY", - "PublicDescription": "Counts cycles during which uops are being de= livered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS= ) is busy. Uops maybe initiated by Decode Stream Buffer (DSB) or MITE. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts cycles during which uops are being de= livered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS= ) is busy. Uops maybe initiated by Decode Stream Buffer (DSB) or MITE.", "SampleAfterValue": "2000003", "UMask": "0x20", "Unit": "cpu_core" @@ -411,7 +409,7 @@ "EdgeDetect": "1", "EventCode": "0x79", "EventName": "IDQ.MS_SWITCHES", - "PublicDescription": "Number of switches from DSB (Decode Stream B= uffer) or MITE (legacy decode pipeline) to the Microcode Sequencer. Availab= le PDIST counters: 0", + "PublicDescription": "Number of switches from DSB (Decode Stream B= uffer) or MITE (legacy decode pipeline) to the Microcode Sequencer.", "SampleAfterValue": "100003", "UMask": "0x20", "Unit": "cpu_core" @@ -421,7 +419,7 @@ "Counter": "0,1,2,3", "EventCode": "0x79", "EventName": "IDQ.MS_UOPS", - "PublicDescription": "Counts the total number of uops delivered by= the Microcode Sequencer (MS). Available PDIST counters: 0", + "PublicDescription": "Counts the total number of uops delivered by= the Microcode Sequencer (MS).", "SampleAfterValue": "1000003", "UMask": "0x20", "Unit": "cpu_core" @@ -431,7 +429,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x9c", "EventName": "IDQ_BUBBLES.CORE", - "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CORE] Available PDI= ST counters: 0", + "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -442,7 +440,7 @@ "CounterMask": "6", "EventCode": "0x9c", "EventName": "IDQ_BUBBLES.CYCLES_0_UOPS_DELIV.CORE", - "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES= _0_UOPS_DELIV.CORE] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES= _0_UOPS_DELIV.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -454,7 +452,7 @@ "EventCode": "0x9c", "EventName": "IDQ_BUBBLES.CYCLES_FE_WAS_OK", "Invert": "1", - "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_UOPS_N= OT_DELIVERED.CYCLES_FE_WAS_OK] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_UOPS_N= OT_DELIVERED.CYCLES_FE_WAS_OK]", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -464,7 +462,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x9c", "EventName": "IDQ_UOPS_NOT_DELIVERED.CORE", - "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle. [This event is alias to IDQ_BUBBLES.CORE] Available PDIST counters= : 0", + "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle. [This event is alias to IDQ_BUBBLES.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -475,7 +473,7 @@ "CounterMask": "6", "EventCode": "0x9c", "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE", - "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_BUBBLES.CYCLES_0_UOPS_DEL= IV.CORE] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_BUBBLES.CYCLES_0_UOPS_DEL= IV.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -487,7 +485,7 @@ "EventCode": "0x9c", "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK", "Invert": "1", - "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_BUBBLE= S.CYCLES_FE_WAS_OK] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_BUBBLE= S.CYCLES_FE_WAS_OK]", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/alderlake/memory.json b/tools/p= erf/pmu-events/arch/x86/alderlake/memory.json index 07f5786bdbc0..a0260d5b8619 100644 --- a/tools/perf/pmu-events/arch/x86/alderlake/memory.json +++ b/tools/perf/pmu-events/arch/x86/alderlake/memory.json @@ -5,7 +5,6 @@ "CounterMask": "6", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_L3_MISS", - "PublicDescription": "Execution stalls while L3 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x6", "Unit": "cpu_core" @@ -79,7 +78,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc3", "EventName": "MACHINE_CLEARS.MEMORY_ORDERING", - "PublicDescription": "Counts the number of Machine Clears detected= dye to memory ordering. Memory Ordering Machine Clears may apply when a me= mory read may not conform to the memory ordering rules of the x86 architect= ure Available PDIST counters: 0", + "PublicDescription": "Counts the number of Machine Clears detected= dye to memory ordering. Memory Ordering Machine Clears may apply when a me= mory read may not conform to the memory ordering rules of the x86 architect= ure", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -90,7 +89,6 @@ "CounterMask": "2", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.CYCLES_L1D_MISS", - "PublicDescription": "Cycles while L1 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" @@ -101,7 +99,6 @@ "CounterMask": "3", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.STALLS_L1D_MISS", - "PublicDescription": "Execution stalls while L1 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x3", "Unit": "cpu_core" @@ -112,7 +109,7 @@ "CounterMask": "5", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.STALLS_L2_MISS", - "PublicDescription": "Execution stalls while L2 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock). Available PDIST counters: 0", + "PublicDescription": "Execution stalls while L2 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock).", "SampleAfterValue": "1000003", "UMask": "0x5", "Unit": "cpu_core" @@ -123,7 +120,7 @@ "CounterMask": "9", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.STALLS_L3_MISS", - "PublicDescription": "Execution stalls while L3 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock). Available PDIST counters: 0", + "PublicDescription": "Execution stalls while L3 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock).", "SampleAfterValue": "1000003", "UMask": "0x9", "Unit": "cpu_core" @@ -417,7 +414,6 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD", - "PublicDescription": "Counts demand data read requests that miss t= he L3 cache. Available PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -427,7 +423,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.L3_MISS_DEMAND_DATA_RD", - "PublicDescription": "For every cycle, increments by the number of= demand data read requests pending that are known to have missed the L3 cac= he. Note that this does not capture all elapsed cycles while requests are = outstanding - only cycles from when the requests were known by the requesti= ng core to have missed the L3 cache. Available PDIST counters: 0", + "PublicDescription": "For every cycle, increments by the number of= demand data read requests pending that are known to have missed the L3 cac= he. Note that this does not capture all elapsed cycles while requests are = outstanding - only cycles from when the requests were known by the requesti= ng core to have missed the L3 cache.", "SampleAfterValue": "2000003", "UMask": "0x10", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/alderlake/other.json b/tools/pe= rf/pmu-events/arch/x86/alderlake/other.json index 5f64138edfe4..af46cde26b54 100644 --- a/tools/perf/pmu-events/arch/x86/alderlake/other.json +++ b/tools/perf/pmu-events/arch/x86/alderlake/other.json @@ -4,7 +4,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.HARDWARE", - "PublicDescription": "Count all other hardware assists or traps th= at are not necessarily architecturally exposed (through a software handler)= beyond FP; SSE-AVX mix and A/D assists who are counted by dedicated sub-ev= ents. This includes, but not limited to, assists at EXE or MEM uop writeba= ck like AVX* load/store/gather/scatter (non-FP GSSE-assist ) , assists gene= rated by ROB like PEBS and RTIT, Uncore trap, RAR (Remote Action Request) a= nd CET (Control flow Enforcement Technology) assists. the event also counts= for Machine Ordering count. Available PDIST counters: 0", + "PublicDescription": "Count all other hardware assists or traps th= at are not necessarily architecturally exposed (through a software handler)= beyond FP; SSE-AVX mix and A/D assists who are counted by dedicated sub-ev= ents. This includes, but not limited to, assists at EXE or MEM uop writeba= ck like AVX* load/store/gather/scatter (non-FP GSSE-assist ) , assists gene= rated by ROB like PEBS and RTIT, Uncore trap, RAR (Remote Action Request) a= nd CET (Control flow Enforcement Technology) assists. the event also counts= for Machine Ordering count.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -14,7 +14,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.PAGE_FAULT", - "PublicDescription": "ASSISTS.PAGE_FAULT Available PDIST counters:= 0", "SampleAfterValue": "1000003", "UMask": "0x8", "Unit": "cpu_core" @@ -24,7 +23,6 @@ "Counter": "0,1,2,3", "EventCode": "0x28", "EventName": "CORE_POWER.LICENSE_1", - "PublicDescription": "CORE_POWER.LICENSE_1 Available PDIST counter= s: 0", "SampleAfterValue": "200003", "UMask": "0x2", "Unit": "cpu_core" @@ -34,7 +32,6 @@ "Counter": "0,1,2,3", "EventCode": "0x28", "EventName": "CORE_POWER.LICENSE_2", - "PublicDescription": "CORE_POWER.LICENSE_2 Available PDIST counter= s: 0", "SampleAfterValue": "200003", "UMask": "0x4", "Unit": "cpu_core" @@ -44,7 +41,6 @@ "Counter": "0,1,2,3", "EventCode": "0x28", "EventName": "CORE_POWER.LICENSE_3", - "PublicDescription": "CORE_POWER.LICENSE_3 Available PDIST counter= s: 0", "SampleAfterValue": "200003", "UMask": "0x8", "Unit": "cpu_core" @@ -113,7 +109,7 @@ "CounterMask": "1", "EventCode": "0x2d", "EventName": "XQ.FULL_CYCLES", - "PublicDescription": "number of cycles when the thread is active a= nd the uncore cannot take any further requests (for example prefetches, loa= ds or stores initiated by the Core that miss the L2 cache). Available PDIST= counters: 0", + "PublicDescription": "number of cycles when the thread is active a= nd the uncore cannot take any further requests (for example prefetches, loa= ds or stores initiated by the Core that miss the L2 cache).", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json b/tools= /perf/pmu-events/arch/x86/alderlake/pipeline.json index 48ef2a8cc49a..33d1f39e441f 100644 --- a/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json @@ -6,7 +6,6 @@ "Deprecated": "1", "EventCode": "0xb0", "EventName": "ARITH.DIVIDER_ACTIVE", - "PublicDescription": "This event is deprecated. Refer to new event= ARITH.DIV_ACTIVE Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x9", "Unit": "cpu_core" @@ -27,7 +26,7 @@ "CounterMask": "1", "EventCode": "0xb0", "EventName": "ARITH.DIV_ACTIVE", - "PublicDescription": "Counts cycles when divide unit is busy execu= ting divide or square root operations. Accounts for integer and floating-po= int operations. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when divide unit is busy execu= ting divide or square root operations. Accounts for integer and floating-po= int operations.", "SampleAfterValue": "1000003", "UMask": "0x9", "Unit": "cpu_core" @@ -57,7 +56,6 @@ "Deprecated": "1", "EventCode": "0xb0", "EventName": "ARITH.FP_DIVIDER_ACTIVE", - "PublicDescription": "This event is deprecated. Refer to new event= ARITH.FPDIV_ACTIVE Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -78,7 +76,6 @@ "CounterMask": "1", "EventCode": "0xb0", "EventName": "ARITH.IDIV_ACTIVE", - "PublicDescription": "This event counts the cycles the integer div= ider is busy. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8", "Unit": "cpu_core" @@ -108,7 +105,6 @@ "Deprecated": "1", "EventCode": "0xb0", "EventName": "ARITH.INT_DIVIDER_ACTIVE", - "PublicDescription": "This event is deprecated. Refer to new event= ARITH.IDIV_ACTIVE Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8", "Unit": "cpu_core" @@ -118,7 +114,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.ANY", - "PublicDescription": "Counts the number of occurrences where a mic= rocode assist is invoked by hardware. Examples include AD (page Access Dirt= y), FP and AVX related assists. Available PDIST counters: 0", + "PublicDescription": "Counts the number of occurrences where a mic= rocode assist is invoked by hardware. Examples include AD (page Access Dirt= y), FP and AVX related assists.", "SampleAfterValue": "100003", "UMask": "0x1b", "Unit": "cpu_core" @@ -549,7 +545,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.C01", - "PublicDescription": "Counts core clocks when the thread is in the= C0.1 light-weight slower wakeup time but more power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions. Availab= le PDIST counters: 0", + "PublicDescription": "Counts core clocks when the thread is in the= C0.1 light-weight slower wakeup time but more power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions.", "SampleAfterValue": "2000003", "UMask": "0x10", "Unit": "cpu_core" @@ -559,7 +555,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.C02", - "PublicDescription": "Counts core clocks when the thread is in the= C0.2 light-weight faster wakeup time but less power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions. Availab= le PDIST counters: 0", + "PublicDescription": "Counts core clocks when the thread is in the= C0.2 light-weight faster wakeup time but less power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions.", "SampleAfterValue": "2000003", "UMask": "0x20", "Unit": "cpu_core" @@ -569,7 +565,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.C0_WAIT", - "PublicDescription": "Counts core clocks when the thread is in the= C0.1 or C0.2 power saving optimized states (TPAUSE or UMWAIT instructions)= or running the PAUSE instruction. Available PDIST counters: 0", + "PublicDescription": "Counts core clocks when the thread is in the= C0.1 or C0.2 power saving optimized states (TPAUSE or UMWAIT instructions)= or running the PAUSE instruction.", "SampleAfterValue": "2000003", "UMask": "0x70", "Unit": "cpu_core" @@ -597,7 +593,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.DISTRIBUTED", - "PublicDescription": "This event distributes cycle counts between = active hyperthreads, i.e., those in C0. A hyperthread becomes inactive whe= n it executes the HLT or MWAIT instructions. If all other hyperthreads are= inactive (or disabled or do not exist), all counts are attributed to this = hyperthread. To obtain the full count when the Core is active, sum the coun= ts from each hyperthread. Available PDIST counters: 0", + "PublicDescription": "This event distributes cycle counts between = active hyperthreads, i.e., those in C0. A hyperthread becomes inactive whe= n it executes the HLT or MWAIT instructions. If all other hyperthreads are= inactive (or disabled or do not exist), all counts are attributed to this = hyperthread. To obtain the full count when the Core is active, sum the coun= ts from each hyperthread.", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -607,7 +603,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE", - "PublicDescription": "Counts Core crystal clock cycles when curren= t thread is unhalted and the other thread is halted. Available PDIST counte= rs: 0", + "PublicDescription": "Counts Core crystal clock cycles when curren= t thread is unhalted and the other thread is halted.", "SampleAfterValue": "25003", "UMask": "0x2", "Unit": "cpu_core" @@ -617,7 +613,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.PAUSE", - "PublicDescription": "CPU_CLK_UNHALTED.PAUSE Available PDIST count= ers: 0", "SampleAfterValue": "2000003", "UMask": "0x40", "Unit": "cpu_core" @@ -629,7 +624,6 @@ "EdgeDetect": "1", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.PAUSE_INST", - "PublicDescription": "CPU_CLK_UNHALTED.PAUSE_INST Available PDIST = counters: 0", "SampleAfterValue": "2000003", "UMask": "0x40", "Unit": "cpu_core" @@ -649,7 +643,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.REF_DISTRIBUTED", - "PublicDescription": "This event distributes Core crystal clock cy= cle counts between active hyperthreads, i.e., those in C0 sleep-state. A hy= perthread becomes inactive when it executes the HLT or MWAIT instructions. = If one thread is active in a core, all counts are attributed to this hypert= hread. To obtain the full count when the Core is active, sum the counts fro= m each hyperthread. Available PDIST counters: 0", + "PublicDescription": "This event distributes Core crystal clock cy= cle counts between active hyperthreads, i.e., those in C0 sleep-state. A hy= perthread becomes inactive when it executes the HLT or MWAIT instructions. = If one thread is active in a core, all counts are attributed to this hypert= hread. To obtain the full count when the Core is active, sum the counts fro= m each hyperthread.", "SampleAfterValue": "2000003", "UMask": "0x8", "Unit": "cpu_core" @@ -687,7 +681,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.REF_TSC_P", - "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the four (eight when Hyperthr= eading is disabled) programmable counters available for other events. Note:= On all current platforms this event stops counting during 'throttling (TM)= ' states duty off periods the processor is 'halted'. The counter update is= done at a lower clock rate then the core clock the overflow status bit for= this counter may appear 'sticky'. After the counter has overflowed and so= ftware clears the overflow status bit and resets the counter to less than M= AX. The reset value to the counter is not clocked immediately so the overfl= ow status bit will flip 'high (1)' and generate another PMI (if enabled) af= ter which the reset value gets clocked into the counter. Therefore, softwar= e will get the interrupt, read the overflow status bit '1 for bit 34 while = the counter value is less than MAX. Software should ignore this case. Avail= able PDIST counters: 0", + "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the four (eight when Hyperthr= eading is disabled) programmable counters available for other events. Note:= On all current platforms this event stops counting during 'throttling (TM)= ' states duty off periods the processor is 'halted'. The counter update is= done at a lower clock rate then the core clock the overflow status bit for= this counter may appear 'sticky'. After the counter has overflowed and so= ftware clears the overflow status bit and resets the counter to less than M= AX. The reset value to the counter is not clocked immediately so the overfl= ow status bit will flip 'high (1)' and generate another PMI (if enabled) af= ter which the reset value gets clocked into the counter. Therefore, softwar= e will get the interrupt, read the overflow status bit '1 for bit 34 while = the counter value is less than MAX. Software should ignore this case.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -724,7 +718,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.THREAD_P", - "PublicDescription": "This is an architectural event that counts t= he number of thread cycles while the thread is not in a halt state. The thr= ead enters the halt state when it is running the HLT instruction. The core = frequency may change from time to time due to power or thermal throttling. = For this reason, this event may have a changing ratio with regards to wall = clock time. Available PDIST counters: 0", + "PublicDescription": "This is an architectural event that counts t= he number of thread cycles while the thread is not in a halt state. The thr= ead enters the halt state when it is running the HLT instruction. The core = frequency may change from time to time due to power or thermal throttling. = For this reason, this event may have a changing ratio with regards to wall = clock time.", "SampleAfterValue": "2000003", "Unit": "cpu_core" }, @@ -734,7 +728,6 @@ "CounterMask": "8", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_L1D_MISS", - "PublicDescription": "Cycles while L1 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8", "Unit": "cpu_core" @@ -745,7 +738,6 @@ "CounterMask": "1", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_L2_MISS", - "PublicDescription": "Cycles while L2 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -756,7 +748,6 @@ "CounterMask": "16", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_MEM_ANY", - "PublicDescription": "Cycles while memory subsystem has an outstan= ding load. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x10", "Unit": "cpu_core" @@ -767,7 +758,6 @@ "CounterMask": "12", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_L1D_MISS", - "PublicDescription": "Execution stalls while L1 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0xc", "Unit": "cpu_core" @@ -778,7 +768,6 @@ "CounterMask": "5", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_L2_MISS", - "PublicDescription": "Execution stalls while L2 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x5", "Unit": "cpu_core" @@ -789,7 +778,6 @@ "CounterMask": "4", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_TOTAL", - "PublicDescription": "Total execution stalls. Available PDIST coun= ters: 0", "SampleAfterValue": "1000003", "UMask": "0x4", "Unit": "cpu_core" @@ -799,7 +787,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.1_PORTS_UTIL", - "PublicDescription": "Counts cycles during which a total of 1 uop = was executed on all ports and Reservation Station (RS) was not empty. Avail= able PDIST counters: 0", + "PublicDescription": "Counts cycles during which a total of 1 uop = was executed on all ports and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -809,7 +797,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.2_3_PORTS_UTIL", - "PublicDescription": "Cycles total of 2 or 3 uops are executed on = all ports and Reservation Station (RS) was not empty. Available PDIST count= ers: 0", "SampleAfterValue": "2000003", "UMask": "0xc", "Unit": "cpu_core" @@ -819,7 +806,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.2_PORTS_UTIL", - "PublicDescription": "Counts cycles during which a total of 2 uops= were executed on all ports and Reservation Station (RS) was not empty. Ava= ilable PDIST counters: 0", + "PublicDescription": "Counts cycles during which a total of 2 uops= were executed on all ports and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -829,7 +816,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.3_PORTS_UTIL", - "PublicDescription": "Cycles total of 3 uops are executed on all p= orts and Reservation Station (RS) was not empty. Available PDIST counters: = 0", + "PublicDescription": "Cycles total of 3 uops are executed on all p= orts and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x8", "Unit": "cpu_core" @@ -839,7 +826,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.4_PORTS_UTIL", - "PublicDescription": "Cycles total of 4 uops are executed on all p= orts and Reservation Station (RS) was not empty. Available PDIST counters: = 0", + "PublicDescription": "Cycles total of 4 uops are executed on all p= orts and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x10", "Unit": "cpu_core" @@ -850,7 +837,6 @@ "CounterMask": "5", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.BOUND_ON_LOADS", - "PublicDescription": "Execution stalls while memory subsystem has = an outstanding load. Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x21", "Unit": "cpu_core" @@ -861,7 +847,7 @@ "CounterMask": "2", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.BOUND_ON_STORES", - "PublicDescription": "Counts cycles where the Store Buffer was ful= l and no loads caused an execution stall. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where the Store Buffer was ful= l and no loads caused an execution stall.", "SampleAfterValue": "1000003", "UMask": "0x40", "Unit": "cpu_core" @@ -871,7 +857,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.EXE_BOUND_0_PORTS", - "PublicDescription": "Number of cycles total of 0 uops executed on= all ports, Reservation Station (RS) was not empty, the Store Buffer (SB) w= as not full and there was no outstanding load. Available PDIST counters: 0", + "PublicDescription": "Number of cycles total of 0 uops executed on= all ports, Reservation Station (RS) was not empty, the Store Buffer (SB) w= as not full and there was no outstanding load.", "SampleAfterValue": "1000003", "UMask": "0x80", "Unit": "cpu_core" @@ -881,7 +867,7 @@ "Counter": "0,1,2,3", "EventCode": "0x75", "EventName": "INST_DECODED.DECODERS", - "PublicDescription": "Number of decoders utilized in a cycle when = the MITE (legacy decode pipeline) fetches instructions. Available PDIST cou= nters: 0", + "PublicDescription": "Number of decoders utilized in a cycle when = the MITE (legacy decode pipeline) fetches instructions.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -927,7 +913,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc0", "EventName": "INST_RETIRED.MACRO_FUSED", - "PublicDescription": "INST_RETIRED.MACRO_FUSED Available PDIST cou= nters: 0", "SampleAfterValue": "2000003", "UMask": "0x10", "Unit": "cpu_core" @@ -937,7 +922,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc0", "EventName": "INST_RETIRED.NOP", - "PublicDescription": "Counts all retired NOP or ENDBR32/64 instruc= tions Available PDIST counters: 0", + "PublicDescription": "Counts all retired NOP or ENDBR32/64 instruc= tions", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -956,7 +941,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc0", "EventName": "INST_RETIRED.REP_ITERATION", - "PublicDescription": "Number of iterations of Repeat (REP) string = retired instructions such as MOVS, CMPS, and SCAS. Each has a byte, word, a= nd doubleword version and string instructions can be repeated using a repet= ition prefix, REP, that allows their architectural execution to be repeated= a number of times as specified by the RCX register. Note the number of ite= rations is implementation-dependent. Available PDIST counters: 0", + "PublicDescription": "Number of iterations of Repeat (REP) string = retired instructions such as MOVS, CMPS, and SCAS. Each has a byte, word, a= nd doubleword version and string instructions can be repeated using a repet= ition prefix, REP, that allows their architectural execution to be repeated= a number of times as specified by the RCX register. Note the number of ite= rations is implementation-dependent.", "SampleAfterValue": "2000003", "UMask": "0x8", "Unit": "cpu_core" @@ -968,7 +953,7 @@ "EdgeDetect": "1", "EventCode": "0xad", "EventName": "INT_MISC.CLEARS_COUNT", - "PublicDescription": "Counts the number of speculative clears due = to any type of branch misprediction or machine clears Available PDIST count= ers: 0", + "PublicDescription": "Counts the number of speculative clears due = to any type of branch misprediction or machine clears", "SampleAfterValue": "500009", "UMask": "0x1", "Unit": "cpu_core" @@ -978,7 +963,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.CLEAR_RESTEER_CYCLES", - "PublicDescription": "Cycles after recovery from a branch mispredi= ction or machine clear till the first uop is issued from the resteered path= . Available PDIST counters: 0", + "PublicDescription": "Cycles after recovery from a branch mispredi= ction or machine clear till the first uop is issued from the resteered path= .", "SampleAfterValue": "500009", "UMask": "0x80", "Unit": "cpu_core" @@ -988,7 +973,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.RECOVERY_CYCLES", - "PublicDescription": "Counts core cycles when the Resource allocat= or was stalled due to recovery from an earlier branch misprediction or mach= ine clear event. Available PDIST counters: 0", + "PublicDescription": "Counts core cycles when the Resource allocat= or was stalled due to recovery from an earlier branch misprediction or mach= ine clear event.", "SampleAfterValue": "500009", "UMask": "0x1", "Unit": "cpu_core" @@ -1000,7 +985,6 @@ "EventName": "INT_MISC.UNKNOWN_BRANCH_CYCLES", "MSRIndex": "0x3F7", "MSRValue": "0x7", - "PublicDescription": "Bubble cycles of BAClear (Unknown Branch). A= vailable PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x40", "Unit": "cpu_core" @@ -1010,7 +994,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.UOP_DROPPING", - "PublicDescription": "Estimated number of Top-down Microarchitectu= re Analysis slots that got dropped due to non front-end reasons Available P= DIST counters: 0", + "PublicDescription": "Estimated number of Top-down Microarchitectu= re Analysis slots that got dropped due to non front-end reasons", "SampleAfterValue": "1000003", "UMask": "0x10", "Unit": "cpu_core" @@ -1020,7 +1004,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.128BIT", - "PublicDescription": "INT_VEC_RETIRED.128BIT Available PDIST count= ers: 0", "SampleAfterValue": "1000003", "UMask": "0x13", "Unit": "cpu_core" @@ -1030,7 +1013,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.256BIT", - "PublicDescription": "INT_VEC_RETIRED.256BIT Available PDIST count= ers: 0", "SampleAfterValue": "1000003", "UMask": "0xac", "Unit": "cpu_core" @@ -1040,7 +1022,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.ADD_128", - "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 128-bit vector instructions. Available PDIST counters: 0= ", + "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 128-bit vector instructions.", "SampleAfterValue": "1000003", "UMask": "0x3", "Unit": "cpu_core" @@ -1050,7 +1032,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.ADD_256", - "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 256-bit vector instructions. Available PDIST counters: 0= ", + "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 256-bit vector instructions.", "SampleAfterValue": "1000003", "UMask": "0xc", "Unit": "cpu_core" @@ -1060,7 +1042,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.MUL_256", - "PublicDescription": "INT_VEC_RETIRED.MUL_256 Available PDIST coun= ters: 0", "SampleAfterValue": "1000003", "UMask": "0x80", "Unit": "cpu_core" @@ -1070,7 +1051,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.SHUFFLES", - "PublicDescription": "INT_VEC_RETIRED.SHUFFLES Available PDIST cou= nters: 0", "SampleAfterValue": "1000003", "UMask": "0x40", "Unit": "cpu_core" @@ -1080,7 +1060,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.VNNI_128", - "PublicDescription": "INT_VEC_RETIRED.VNNI_128 Available PDIST cou= nters: 0", "SampleAfterValue": "1000003", "UMask": "0x10", "Unit": "cpu_core" @@ -1090,7 +1069,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.VNNI_256", - "PublicDescription": "INT_VEC_RETIRED.VNNI_256 Available PDIST cou= nters: 0", "SampleAfterValue": "1000003", "UMask": "0x20", "Unit": "cpu_core" @@ -1119,7 +1097,7 @@ "Counter": "0,1,2,3", "EventCode": "0x03", "EventName": "LD_BLOCKS.ADDRESS_ALIAS", - "PublicDescription": "Counts the number of times a load got blocke= d due to false dependencies in MOB due to partial compare on address. Avail= able PDIST counters: 0", + "PublicDescription": "Counts the number of times a load got blocke= d due to false dependencies in MOB due to partial compare on address.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -1138,7 +1116,7 @@ "Counter": "0,1,2,3", "EventCode": "0x03", "EventName": "LD_BLOCKS.NO_SR", - "PublicDescription": "Counts the number of times that split load o= perations are temporarily blocked because all resources for handling the sp= lit accesses are in use. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times that split load o= perations are temporarily blocked because all resources for handling the sp= lit accesses are in use.", "SampleAfterValue": "100003", "UMask": "0x88", "Unit": "cpu_core" @@ -1148,7 +1126,7 @@ "Counter": "0,1,2,3", "EventCode": "0x03", "EventName": "LD_BLOCKS.STORE_FORWARD", - "PublicDescription": "Counts the number of times where store forwa= rding was prevented for a load operation. The most common case is a load bl= ocked due to the address of memory access (partially) overlapping with a pr= eceding uncompleted store. Note: See the table of not supported store forwa= rds in the Optimization Guide. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times where store forwa= rding was prevented for a load operation. The most common case is a load bl= ocked due to the address of memory access (partially) overlapping with a pr= eceding uncompleted store. Note: See the table of not supported store forwa= rds in the Optimization Guide.", "SampleAfterValue": "100003", "UMask": "0x82", "Unit": "cpu_core" @@ -1158,7 +1136,7 @@ "Counter": "0,1,2,3", "EventCode": "0x4c", "EventName": "LOAD_HIT_PREFETCH.SWPF", - "PublicDescription": "Counts all software-prefetch load dispatches= that hit the fill buffer (FB) allocated for the software prefetch. It can = also be incremented by some lock instructions. So it should only be used wi= th profiling so that the locks can be excluded by ASM (Assembly File) inspe= ction of the nearby instructions. Available PDIST counters: 0", + "PublicDescription": "Counts all software-prefetch load dispatches= that hit the fill buffer (FB) allocated for the software prefetch. It can = also be incremented by some lock instructions. So it should only be used wi= th profiling so that the locks can be excluded by ASM (Assembly File) inspe= ction of the nearby instructions.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -1169,7 +1147,7 @@ "CounterMask": "1", "EventCode": "0xa8", "EventName": "LSD.CYCLES_ACTIVE", - "PublicDescription": "Counts the cycles when at least one uop is d= elivered by the LSD (Loop-stream detector). Available PDIST counters: 0", + "PublicDescription": "Counts the cycles when at least one uop is d= elivered by the LSD (Loop-stream detector).", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1180,7 +1158,7 @@ "CounterMask": "6", "EventCode": "0xa8", "EventName": "LSD.CYCLES_OK", - "PublicDescription": "Counts the cycles when optimal number of uop= s is delivered by the LSD (Loop-stream detector). Available PDIST counters:= 0", + "PublicDescription": "Counts the cycles when optimal number of uop= s is delivered by the LSD (Loop-stream detector).", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1190,7 +1168,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa8", "EventName": "LSD.UOPS", - "PublicDescription": "Counts the number of uops delivered to the b= ack-end by the LSD(Loop Stream Detector). Available PDIST counters: 0", + "PublicDescription": "Counts the number of uops delivered to the b= ack-end by the LSD(Loop Stream Detector).", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1202,7 +1180,7 @@ "EdgeDetect": "1", "EventCode": "0xc3", "EventName": "MACHINE_CLEARS.COUNT", - "PublicDescription": "Counts the number of machine clears (nukes) = of any type. Available PDIST counters: 0", + "PublicDescription": "Counts the number of machine clears (nukes) = of any type.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -1258,7 +1236,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc3", "EventName": "MACHINE_CLEARS.SMC", - "PublicDescription": "Counts self-modifying code (SMC) detected, w= hich causes a machine clear. Available PDIST counters: 0", + "PublicDescription": "Counts self-modifying code (SMC) detected, w= hich causes a machine clear.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -1268,7 +1246,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe0", "EventName": "MISC2_RETIRED.LFENCE", - "PublicDescription": "number of LFENCE retired instructions Availa= ble PDIST counters: 0", + "PublicDescription": "number of LFENCE retired instructions", "SampleAfterValue": "400009", "UMask": "0x20", "Unit": "cpu_core" @@ -1288,7 +1266,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcc", "EventName": "MISC_RETIRED.LBR_INSERTS", - "PublicDescription": "Increments when an entry is added to the Las= t Branch Record (LBR) array (or removed from the array in case of RETURNs i= n call stack mode). The event requires LBR enable via IA32_DEBUGCTL MSR and= branch type selection via MSR_LBR_SELECT. Available PDIST counters: 0", + "PublicDescription": "Increments when an entry is added to the Las= t Branch Record (LBR) array (or removed from the array in case of RETURNs i= n call stack mode). The event requires LBR enable via IA32_DEBUGCTL MSR and= branch type selection via MSR_LBR_SELECT.", "SampleAfterValue": "100003", "UMask": "0x20", "Unit": "cpu_core" @@ -1298,7 +1276,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa2", "EventName": "RESOURCE_STALLS.SB", - "PublicDescription": "Counts allocation stall cycles caused by the= store buffer (SB) being full. This counts cycles that the pipeline back-en= d blocked uop delivery from the front-end. Available PDIST counters: 0", + "PublicDescription": "Counts allocation stall cycles caused by the= store buffer (SB) being full. This counts cycles that the pipeline back-en= d blocked uop delivery from the front-end.", "SampleAfterValue": "100003", "UMask": "0x8", "Unit": "cpu_core" @@ -1308,7 +1286,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa2", "EventName": "RESOURCE_STALLS.SCOREBOARD", - "PublicDescription": "Counts cycles where the pipeline is stalled = due to serializing operations. Available PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -1318,7 +1295,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa5", "EventName": "RS.EMPTY", - "PublicDescription": "Counts cycles during which the reservation s= tation (RS) is empty for this logical processor. This is usually caused whe= n the front-end pipeline runs into starvation periods (e.g. branch mispredi= ctions or i-cache misses) Available PDIST counters: 0", + "PublicDescription": "Counts cycles during which the reservation s= tation (RS) is empty for this logical processor. This is usually caused whe= n the front-end pipeline runs into starvation periods (e.g. branch mispredi= ctions or i-cache misses)", "SampleAfterValue": "1000003", "UMask": "0x7", "Unit": "cpu_core" @@ -1331,7 +1308,7 @@ "EventCode": "0xa5", "EventName": "RS.EMPTY_COUNT", "Invert": "1", - "PublicDescription": "Counts end of periods where the Reservation = Station (RS) was empty. Could be useful to closely sample on front-end late= ncy issues (see the FRONTEND_RETIRED event of designated precise events) Av= ailable PDIST counters: 0", + "PublicDescription": "Counts end of periods where the Reservation = Station (RS) was empty. Could be useful to closely sample on front-end late= ncy issues (see the FRONTEND_RETIRED event of designated precise events)", "SampleAfterValue": "100003", "UMask": "0x7", "Unit": "cpu_core" @@ -1341,7 +1318,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa5", "EventName": "RS.EMPTY_RESOURCE", - "PublicDescription": "Cycles when Reservation Station (RS) is empt= y due to a resource in the back-end Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1355,7 +1331,6 @@ "EventCode": "0xa5", "EventName": "RS_EMPTY.COUNT", "Invert": "1", - "PublicDescription": "This event is deprecated. Refer to new event= RS.EMPTY_COUNT Available PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x7", "Unit": "cpu_core" @@ -1366,7 +1341,6 @@ "Deprecated": "1", "EventCode": "0xa5", "EventName": "RS_EMPTY.CYCLES", - "PublicDescription": "This event is deprecated. Refer to new event= RS.EMPTY Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x7", "Unit": "cpu_core" @@ -1395,7 +1369,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa4", "EventName": "TOPDOWN.BACKEND_BOUND_SLOTS", - "PublicDescription": "Number of slots in TMA method where no micro= -operations were being issued from front-end to back-end of the machine due= to lack of back-end resources. Available PDIST counters: 0", + "PublicDescription": "Number of slots in TMA method where no micro= -operations were being issued from front-end to back-end of the machine due= to lack of back-end resources.", "SampleAfterValue": "10000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1405,7 +1379,7 @@ "Counter": "0", "EventCode": "0xa4", "EventName": "TOPDOWN.BAD_SPEC_SLOTS", - "PublicDescription": "Number of slots of TMA method that were wast= ed due to incorrect speculation. It covers all types of control-flow or dat= a-related mis-speculations. Available PDIST counters: 0", + "PublicDescription": "Number of slots of TMA method that were wast= ed due to incorrect speculation. It covers all types of control-flow or dat= a-related mis-speculations.", "SampleAfterValue": "10000003", "UMask": "0x4", "Unit": "cpu_core" @@ -1415,7 +1389,7 @@ "Counter": "0", "EventCode": "0xa4", "EventName": "TOPDOWN.BR_MISPREDICT_SLOTS", - "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by (any type of) branch mispredictions. This event es= timates number of speculative operations that were issued but not retired a= s well as the out-of-order engine recovery past a branch misprediction. Ava= ilable PDIST counters: 0", + "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by (any type of) branch mispredictions. This event es= timates number of speculative operations that were issued but not retired a= s well as the out-of-order engine recovery past a branch misprediction.", "SampleAfterValue": "10000003", "UMask": "0x8", "Unit": "cpu_core" @@ -1425,7 +1399,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa4", "EventName": "TOPDOWN.MEMORY_BOUND_SLOTS", - "PublicDescription": "TOPDOWN.MEMORY_BOUND_SLOTS Available PDIST c= ounters: 0", "SampleAfterValue": "10000003", "UMask": "0x10", "Unit": "cpu_core" @@ -1444,7 +1417,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa4", "EventName": "TOPDOWN.SLOTS_P", - "PublicDescription": "Counts the number of available slots for an = unhalted logical processor. The event increments by machine-width of the na= rrowest pipeline as employed by the Top-down Microarchitecture Analysis met= hod. The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core. Available PDIST counters: 0", + "PublicDescription": "Counts the number of available slots for an = unhalted logical processor. The event increments by machine-width of the na= rrowest pipeline as employed by the Top-down Microarchitecture Analysis met= hod. The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core.", "SampleAfterValue": "10000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1661,7 +1634,6 @@ "Counter": "0,1,2,3", "EventCode": "0x76", "EventName": "UOPS_DECODED.DEC0_UOPS", - "PublicDescription": "UOPS_DECODED.DEC0_UOPS Available PDIST count= ers: 0", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1671,7 +1643,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_0", - "PublicDescription": "Number of uops dispatch to execution port 0= . Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution port 0= .", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1681,7 +1653,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_1", - "PublicDescription": "Number of uops dispatch to execution port 1= . Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution port 1= .", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1691,7 +1663,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_2_3_10", - "PublicDescription": "Number of uops dispatch to execution ports 2= , 3 and 10 Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports 2= , 3 and 10", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -1701,7 +1673,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_4_9", - "PublicDescription": "Number of uops dispatch to execution ports 4= and 9 Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports 4= and 9", "SampleAfterValue": "2000003", "UMask": "0x10", "Unit": "cpu_core" @@ -1711,7 +1683,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_5_11", - "PublicDescription": "Number of uops dispatch to execution ports 5= and 11 Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports 5= and 11", "SampleAfterValue": "2000003", "UMask": "0x20", "Unit": "cpu_core" @@ -1721,7 +1693,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_6", - "PublicDescription": "Number of uops dispatch to execution port 6= . Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution port 6= .", "SampleAfterValue": "2000003", "UMask": "0x40", "Unit": "cpu_core" @@ -1731,7 +1703,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_7_8", - "PublicDescription": "Number of uops dispatch to execution ports = 7 and 8. Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports = 7 and 8.", "SampleAfterValue": "2000003", "UMask": "0x80", "Unit": "cpu_core" @@ -1742,7 +1714,7 @@ "CounterMask": "1", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_1", - "PublicDescription": "Counts cycles when at least 1 micro-op is ex= ecuted from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 1 micro-op is ex= ecuted from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1753,7 +1725,7 @@ "CounterMask": "2", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_2", - "PublicDescription": "Counts cycles when at least 2 micro-ops are = executed from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 2 micro-ops are = executed from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1764,7 +1736,7 @@ "CounterMask": "3", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_3", - "PublicDescription": "Counts cycles when at least 3 micro-ops are = executed from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 3 micro-ops are = executed from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1775,7 +1747,7 @@ "CounterMask": "4", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_4", - "PublicDescription": "Counts cycles when at least 4 micro-ops are = executed from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 4 micro-ops are = executed from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1786,7 +1758,7 @@ "CounterMask": "1", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_1", - "PublicDescription": "Cycles where at least 1 uop was executed per= -thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 1 uop was executed per= -thread.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1797,7 +1769,7 @@ "CounterMask": "2", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_2", - "PublicDescription": "Cycles where at least 2 uops were executed p= er-thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 2 uops were executed p= er-thread.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1808,7 +1780,7 @@ "CounterMask": "3", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_3", - "PublicDescription": "Cycles where at least 3 uops were executed p= er-thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 3 uops were executed p= er-thread.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1819,7 +1791,7 @@ "CounterMask": "4", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_4", - "PublicDescription": "Cycles where at least 4 uops were executed p= er-thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 4 uops were executed p= er-thread.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1831,7 +1803,7 @@ "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.STALLS", "Invert": "1", - "PublicDescription": "Counts cycles during which no uops were disp= atched from the Reservation Station (RS) per thread. Available PDIST counte= rs: 0", + "PublicDescription": "Counts cycles during which no uops were disp= atched from the Reservation Station (RS) per thread.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1844,7 +1816,6 @@ "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.STALL_CYCLES", "Invert": "1", - "PublicDescription": "This event is deprecated. Refer to new event= UOPS_EXECUTED.STALLS Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1854,7 +1825,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.THREAD", - "PublicDescription": "Counts the number of uops to be executed per= -thread each cycle. Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1864,7 +1834,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.X87", - "PublicDescription": "Counts the number of x87 uops executed. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts the number of x87 uops executed.", "SampleAfterValue": "2000003", "UMask": "0x10", "Unit": "cpu_core" @@ -1883,7 +1853,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xae", "EventName": "UOPS_ISSUED.ANY", - "PublicDescription": "Counts the number of uops that the Resource = Allocation Table (RAT) issues to the Reservation Station (RS). Available PD= IST counters: 0", + "PublicDescription": "Counts the number of uops that the Resource = Allocation Table (RAT) issues to the Reservation Station (RS).", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1894,7 +1864,6 @@ "CounterMask": "1", "EventCode": "0xae", "EventName": "UOPS_ISSUED.CYCLES", - "PublicDescription": "UOPS_ISSUED.CYCLES Available PDIST counters:= 0", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1913,7 +1882,7 @@ "CounterMask": "1", "EventCode": "0xc2", "EventName": "UOPS_RETIRED.CYCLES", - "PublicDescription": "Counts cycles where at least one uop has ret= ired. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where at least one uop has ret= ired.", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1923,7 +1892,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc2", "EventName": "UOPS_RETIRED.HEAVY", - "PublicDescription": "Counts the number of retired micro-operation= s (uops) except the last uop of each instruction. An instruction that is de= coded into less than two uops does not contribute to the count. Available P= DIST counters: 0", + "PublicDescription": "Counts the number of retired micro-operation= s (uops) except the last uop of each instruction. An instruction that is de= coded into less than two uops does not contribute to the count.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1954,7 +1923,6 @@ "EventName": "UOPS_RETIRED.MS", "MSRIndex": "0x3F7", "MSRValue": "0x8", - "PublicDescription": "UOPS_RETIRED.MS Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -1964,7 +1932,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc2", "EventName": "UOPS_RETIRED.SLOTS", - "PublicDescription": "Counts the retirement slots used each cycle.= Available PDIST counters: 0", + "PublicDescription": "Counts the retirement slots used each cycle.= ", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1976,7 +1944,7 @@ "EventCode": "0xc2", "EventName": "UOPS_RETIRED.STALLS", "Invert": "1", - "PublicDescription": "This event counts cycles without actually re= tired uops. Available PDIST counters: 0", + "PublicDescription": "This event counts cycles without actually re= tired uops.", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1989,7 +1957,6 @@ "EventCode": "0xc2", "EventName": "UOPS_RETIRED.STALL_CYCLES", "Invert": "1", - "PublicDescription": "This event is deprecated. Refer to new event= UOPS_RETIRED.STALLS Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/alderlake/uncore-interconnect.j= son b/tools/perf/pmu-events/arch/x86/alderlake/uncore-interconnect.json index 7c0779c74154..b5604c7534e1 100644 --- a/tools/perf/pmu-events/arch/x86/alderlake/uncore-interconnect.json +++ b/tools/perf/pmu-events/arch/x86/alderlake/uncore-interconnect.json @@ -65,7 +65,6 @@ "Counter": "0,1", "EventCode": "0x81", "EventName": "UNC_ARB_REQ_TRK_REQUEST.DRD", - "Experimental": "1", "PerPkg": "1", "UMask": "0x2", "Unit": "ARB" @@ -103,7 +102,6 @@ "Counter": "0,1", "EventCode": "0x81", "EventName": "UNC_ARB_TRK_REQUESTS.RD", - "Experimental": "1", "PerPkg": "1", "UMask": "0x2", "Unit": "ARB" diff --git a/tools/perf/pmu-events/arch/x86/alderlake/virtual-memory.json b= /tools/perf/pmu-events/arch/x86/alderlake/virtual-memory.json index ffbbd08acc68..132ce48af6d9 100644 --- a/tools/perf/pmu-events/arch/x86/alderlake/virtual-memory.json +++ b/tools/perf/pmu-events/arch/x86/alderlake/virtual-memory.json @@ -4,7 +4,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.STLB_HIT", - "PublicDescription": "Counts loads that miss the DTLB (Data TLB) a= nd hit the STLB (Second level TLB). Available PDIST counters: 0", + "PublicDescription": "Counts loads that miss the DTLB (Data TLB) a= nd hit the STLB (Second level TLB).", "SampleAfterValue": "100003", "UMask": "0x20", "Unit": "cpu_core" @@ -15,7 +15,7 @@ "CounterMask": "1", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_ACTIVE", - "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a demand load. Available PDIST cou= nters: 0", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a demand load.", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -35,7 +35,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED", - "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data loads. This implies it missed in the DTLB and furth= er levels of TLB. The page walk can end with or without a fault. Available = PDIST counters: 0", + "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data loads. This implies it missed in the DTLB and furth= er levels of TLB. The page walk can end with or without a fault.", "SampleAfterValue": "100003", "UMask": "0xe", "Unit": "cpu_core" @@ -45,7 +45,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_1G", - "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= . Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= .", "SampleAfterValue": "100003", "UMask": "0x8", "Unit": "cpu_core" @@ -55,7 +55,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_2M_4M", - "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data loads. This implies address translations missed in the= DTLB and further levels of TLB. The page walk can end with or without a fa= ult. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data loads. This implies address translations missed in the= DTLB and further levels of TLB. The page walk can end with or without a fa= ult.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -65,7 +65,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_4K", - "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= . Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= .", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -75,7 +75,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_PENDING", - "PublicDescription": "Counts the number of page walks outstanding = for a demand load in the PMH (Page Miss Handler) each cycle. Available PDIS= T counters: 0", + "PublicDescription": "Counts the number of page walks outstanding = for a demand load in the PMH (Page Miss Handler) each cycle.", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -85,7 +85,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.STLB_HIT", - "PublicDescription": "Counts stores that miss the DTLB (Data TLB) = and hit the STLB (2nd Level TLB). Available PDIST counters: 0", + "PublicDescription": "Counts stores that miss the DTLB (Data TLB) = and hit the STLB (2nd Level TLB).", "SampleAfterValue": "100003", "UMask": "0x20", "Unit": "cpu_core" @@ -96,7 +96,7 @@ "CounterMask": "1", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_ACTIVE", - "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a store. Available PDIST counters:= 0", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a store.", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -116,7 +116,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED", - "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data stores. This implies it missed in the DTLB and furt= her levels of TLB. The page walk can end with or without a fault. Available= PDIST counters: 0", + "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data stores. This implies it missed in the DTLB and furt= her levels of TLB. The page walk can end with or without a fault.", "SampleAfterValue": "100003", "UMask": "0xe", "Unit": "cpu_core" @@ -126,7 +126,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_1G", - "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t.", "SampleAfterValue": "100003", "UMask": "0x8", "Unit": "cpu_core" @@ -136,7 +136,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M", - "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data stores. This implies address translations missed in th= e DTLB and further levels of TLB. The page walk can end with or without a f= ault. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data stores. This implies address translations missed in th= e DTLB and further levels of TLB. The page walk can end with or without a f= ault.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -146,7 +146,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_4K", - "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t.", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -156,7 +156,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_PENDING", - "PublicDescription": "Counts the number of page walks outstanding = for a store in the PMH (Page Miss Handler) each cycle. Available PDIST coun= ters: 0", + "PublicDescription": "Counts the number of page walks outstanding = for a store in the PMH (Page Miss Handler) each cycle.", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -184,7 +184,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.STLB_HIT", - "PublicDescription": "Counts instruction fetch requests that miss = the ITLB (Instruction TLB) and hit the STLB (Second-level TLB). Available P= DIST counters: 0", + "PublicDescription": "Counts instruction fetch requests that miss = the ITLB (Instruction TLB) and hit the STLB (Second-level TLB).", "SampleAfterValue": "100003", "UMask": "0x20", "Unit": "cpu_core" @@ -195,7 +195,7 @@ "CounterMask": "1", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_ACTIVE", - "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a code (instruction fetch) request= . Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a code (instruction fetch) request= .", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -215,7 +215,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_COMPLETED", - "PublicDescription": "Counts completed page walks (all page sizes)= caused by a code fetch. This implies it missed in the ITLB (Instruction TL= B) and further levels of TLB. The page walk can end with or without a fault= . Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (all page sizes)= caused by a code fetch. This implies it missed in the ITLB (Instruction TL= B) and further levels of TLB. The page walk can end with or without a fault= .", "SampleAfterValue": "100003", "UMask": "0xe", "Unit": "cpu_core" @@ -225,7 +225,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_COMPLETED_2M_4M", - "PublicDescription": "Counts completed page walks (2M/4M page size= s) caused by a code fetch. This implies it missed in the ITLB (Instruction = TLB) and further levels of TLB. The page walk can end with or without a fau= lt. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (2M/4M page size= s) caused by a code fetch. This implies it missed in the ITLB (Instruction = TLB) and further levels of TLB. The page walk can end with or without a fau= lt.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -235,7 +235,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_COMPLETED_4K", - "PublicDescription": "Counts completed page walks (4K page sizes) = caused by a code fetch. This implies it missed in the ITLB (Instruction TLB= ) and further levels of TLB. The page walk can end with or without a fault.= Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (4K page sizes) = caused by a code fetch. This implies it missed in the ITLB (Instruction TLB= ) and further levels of TLB. The page walk can end with or without a fault.= ", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -245,7 +245,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_PENDING", - "PublicDescription": "Counts the number of page walks outstanding = for an outstanding code (instruction fetch) request in the PMH (Page Miss H= andler) each cycle. Available PDIST counters: 0", + "PublicDescription": "Counts the number of page walks outstanding = for an outstanding code (instruction fetch) request in the PMH (Page Miss H= andler) each cycle.", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json b/= tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json index ce93648043ef..0f72c9192df6 100644 --- a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json +++ b/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json @@ -1,56 +1,56 @@ [ { "BriefDescription": "C10 residency percent per package", - "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c10\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C10_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C1 residency percent per core", - "MetricExpr": "cstate_core@c1\\-residency@ / TSC", + "MetricExpr": "cstate_core@c1\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C1_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C8 residency percent per package", - "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c8\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C8_Pkg_Residency", "ScaleUnit": "100%" @@ -460,12 +460,12 @@ }, { "BriefDescription": "Average CPU Utilization", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricName": "tma_info_system_cpu_utilization" }, { "BriefDescription": "Fraction of cycles spent in Kernel mode", - "MetricExpr": "cpu@CPU_CLK_UNHALTED.CORE_P@k / CPU_CLK_UNHALTED.CO= RE", + "MetricExpr": "CPU_CLK_UNHALTED.CORE_P:k / CPU_CLK_UNHALTED.CORE", "MetricGroup": "Summary", "MetricName": "tma_info_system_kernel_utilization" }, diff --git a/tools/perf/pmu-events/arch/x86/alderlaken/uncore-interconnect.= json b/tools/perf/pmu-events/arch/x86/alderlaken/uncore-interconnect.json index 7c0779c74154..b5604c7534e1 100644 --- a/tools/perf/pmu-events/arch/x86/alderlaken/uncore-interconnect.json +++ b/tools/perf/pmu-events/arch/x86/alderlaken/uncore-interconnect.json @@ -65,7 +65,6 @@ "Counter": "0,1", "EventCode": "0x81", "EventName": "UNC_ARB_REQ_TRK_REQUEST.DRD", - "Experimental": "1", "PerPkg": "1", "UMask": "0x2", "Unit": "ARB" @@ -103,7 +102,6 @@ "Counter": "0,1", "EventCode": "0x81", "EventName": "UNC_ARB_TRK_REQUESTS.RD", - "Experimental": "1", "PerPkg": "1", "UMask": "0x2", "Unit": "ARB" diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-ev= ents/arch/x86/mapfile.csv index 354ce241500b..d0a17905c74e 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -1,6 +1,6 @@ Family-model,Version,Filename,EventType -GenuineIntel-6-(97|9A|B7|BA|BF),v1.31,alderlake,core -GenuineIntel-6-BE,v1.31,alderlaken,core +GenuineIntel-6-(97|9A|B7|BA|BF),v1.33,alderlake,core +GenuineIntel-6-BE,v1.33,alderlaken,core GenuineIntel-6-C[56],v1.09,arrowlake,core GenuineIntel-6-(1C|26|27|35|36),v5,bonnell,core GenuineIntel-6-(3D|47),v30,broadwell,core --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4B941218ADD for ; Sat, 19 Jul 2025 03:45:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896736; cv=none; b=HtSJ1DKFAKM/bDWb9FYziDUzO1oQNZaAy7qRIgmiQE0K7+ImF84cLj0vioYEQeS4eIUq941UvcPboJoHlROWcYOlbCa87NlVrmtCgV18ltXhuv6IXMIXSo0OXWrZJMLH02Pwz2eZNBZ9Mlc52x0AeERwvGpEaYC3zgSh8knSQQA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896736; c=relaxed/simple; bh=xFTNvFkmYo1uM2XaL6FP1zrsQ+xIMFA+dyEbfiFXxbo=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=V4Ls6RnkTnsNMQi85irvYEhhl61+XZy/mZEQA4/fpFbmbFqA2jUOiLJqA7VHHzE8lnjXnTIbSoBWFMttnPDp1yIieobNhizbKLl4d9D5AiWWYNUVAt7pe34R/Wcc1RryJsQccygVT9/0ShdCuC29fL6i6WadYqKxtF8gu1AbASE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=jXXxUCv/; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="jXXxUCv/" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-31366819969so2563373a91.0 for ; Fri, 18 Jul 2025 20:45:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896729; x=1753501529; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=2r71u9Bw9kyGry8Dg+LafrpxZTDsysZsOvttEbY9mWw=; b=jXXxUCv/BMTP88ucUkN+6sRKKSKEJ5er1PRLkk/x7IpgC+8gU2mIRwFZjBmAsCJRcu t6zR6Cv8q4MT8RZJRSuM58UaBDgII7iH+0KdMjgpxXzCgvnRT+0+BFcrysFYazhdDa1A M7gS8LdAKk8Fn6LoLHmNoqg+hxcjhvZNZpohnWdxmeQMd5QwhnJO4clnj8feKglSqf06 jT+TaOHw0DZnxSX4Y1z6CXDksxRpz/NoJjx4lzRTyLqSCmVV+IRKkkZMqof1QOZ6AG2o fwfqPwcF1chpsl4I+gzoXFPSIYT+4syG8KG+aSbfjODggoOhgkIg2xi5zII5I+qU+ZU5 Nx4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896729; x=1753501529; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=2r71u9Bw9kyGry8Dg+LafrpxZTDsysZsOvttEbY9mWw=; b=nwv9vuYN5pmOQFI0zLdOHkYax2VFhQn3qIimPTvP6Pn3UGZuUJ2zSrbVAL1rpwdEZA +wcyuv+cIFmbO2aq26n3FjhGV8uJVGwV+f+hQDGNxl1sKCZN6CNgVslzAxUF4M4W8NtU q5h74sTvFcpVDnQQrm9y0IS+XUuY2KxcHaN5nUh+IywBkn31U/Wy/b2EXnMnoWzGPwQh n7Qj7grRiPKLrbMxFPSr/ta8C7elcd9aAss3Bl0SM4QktLI99LbOtLHiUARTahahPVd+ 0a+FU7B1VEz/RzphVPVgV+NB9hpEmg2U4wMTbiovWzTvKj8n7oUL6KmXXhTiGE6JksTZ r0XQ== X-Forwarded-Encrypted: i=1; AJvYcCX2LpYnGw1aTT5H5eOHa2PnlpjYssjtyGhhBiDq7GXCgkGoR0Za3uKsWHAo5SHwN59yqDAGS5L/5IA4Dco=@vger.kernel.org X-Gm-Message-State: AOJu0YyZmbAhF4eUQyFohN9zyIxXCPNCDtdKiQWvfNy6wFZi0uNSPVaI OiPgY8iqKsLpwXdpY/WHDBFAan5H95hTbWdSnpZUie15m+AuzlVCUemKDQ1UaotXClxT2mvQmsg ZdV5OBVXzeg== X-Google-Smtp-Source: AGHT+IGgqjsbrFRfo89CYqDG9JDdK0KGYBQfPO+ICLzH2PRRKfnXTEVU5tzndr4igyz8/Vt973YVp7GlezyR X-Received: from pjbqd16.prod.google.com ([2002:a17:90b:3cd0:b0:31c:15e1:d04]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:3f8d:b0:316:d69d:49fb with SMTP id 98e67ed59e1d1-31c9e70915amr21926694a91.14.1752896729383; Fri, 18 Jul 2025 20:45:29 -0700 (PDT) Date: Fri, 18 Jul 2025 20:44:58 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-3-irogers@google.com> Subject: [PATCH v1 02/19] perf vendor events: Update arrowlake events/metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update events from v1.09 to v1.12. Update metrics from TMA 5.0 to 5.1. The event updates come from: https://github.com/intel/perfmon/commit/d5b1d2e9ee89035198ab7752e0378acba86= 907b6 https://github.com/intel/perfmon/commit/b9c162b7c98874321af36a411ee3c2194ea= 9afeb Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/arrowlake/arl-metrics.json | 180 +++++++++++------- .../pmu-events/arch/x86/arrowlake/cache.json | 122 ++++++++---- .../arch/x86/arrowlake/frontend.json | 40 ++-- .../pmu-events/arch/x86/arrowlake/memory.json | 22 +-- .../arch/x86/arrowlake/pipeline.json | 94 +++++---- tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +- 6 files changed, 282 insertions(+), 178 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/arrowlake/arl-metrics.json b/to= ols/perf/pmu-events/arch/x86/arrowlake/arl-metrics.json index b22a02450e6c..4f1f77404943 100644 --- a/tools/perf/pmu-events/arch/x86/arrowlake/arl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/arrowlake/arl-metrics.json @@ -1,56 +1,56 @@ [ { "BriefDescription": "C10 residency percent per package", - "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c10\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C10_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C1 residency percent per core", - "MetricExpr": "cstate_core@c1\\-residency@ / TSC", + "MetricExpr": "cstate_core@c1\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C1_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C8 residency percent per package", - "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c8\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C8_Pkg_Residency", "ScaleUnit": "100%" @@ -567,7 +567,7 @@ }, { "BriefDescription": "Average CPU Utilization", - "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.REF_TSC@ / TSC", + "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.REF_TSC@ / msr@tsc\\,cpu= =3Dcpu_atom@", "MetricName": "tma_info_system_cpu_utilization", "Unit": "cpu_atom" }, @@ -774,7 +774,7 @@ { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "cpu_core@topdown\\-be\\-bound@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-be\\-bound@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "BvOB;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", @@ -786,7 +786,7 @@ { "BriefDescription": "This category represents fraction of slots wa= sted due to incorrect speculations", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "cpu_core@topdown\\-bad\\-spec@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-bad\\-spec@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", @@ -812,36 +812,36 @@ "PublicDescription": "Total pipeline cost of instructions used for= program control-flow - a subset of the Retiring category in TMA. Examples = include function calls; loops and alignments. (A lower bound)", "Unit": "cpu_core" }, + { + "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", + "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", + "MetricGroup": "BvCB;Cor;tma_issueComp", + "MetricName": "tma_bottleneck_compute_bound_est", + "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", + "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: ", + "Unit": "cpu_core" + }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Bandwidth related bottlenecks", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma_l1_l= atency_capacity + tma_l1_latency_dependency + tma_lock_latency + tma_split_= loads + tma_store_fwd_blk)))", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma_l1_l= atency_capacity + tma_l1_latency_dependency + tma_lock_latency + tma_split_= loads + tma_store_early_blk + tma_store_fwd_blk)))", "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", - "MetricName": "tma_bottleneck_cache_memory_bandwidth", - "MetricThreshold": "tma_bottleneck_cache_memory_bandwidth > 20", + "MetricName": "tma_bottleneck_data_cache_memory_bandwidth", + "MetricThreshold": "tma_bottleneck_data_cache_memory_bandwidth > 2= 0", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_= system_dram_bw_use, tma_mem_bandwidth, tma_sq_full", "Unit": "cpu_core" }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Latency related bottlenecks", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_dtlb_load + tma_fb_full + tma_l1_latency_capacity= + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + tma_sto= re_fwd_blk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_= bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_latency_c= apacity / (tma_dtlb_load + tma_fb_full + tma_l1_latency_capacity + tma_l1_l= atency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)= ) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma= _l2_bound + tma_l3_bound + tma_store_bound)) * (tma_lock_latency / (tma_dtl= b_load + tma_fb_full + tma_l1_latency_capacity + tma_l1_latency_dependency = + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bou= nd * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_split_loads / (tma_dtlb_load + tma_fb_ful= l + tma_l1_latency_capacity + tma_l1_latency_dependency + tma_lock_latency = + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bou= nd / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_sto= re_bound)) * (tma_split_stores / (tma_dtlb_store + tma_false_sharing + tma_= split_stores + tma_store_latency + tma_streaming_stores)) + tma_memory_boun= d * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_store_latency / (tma_dtlb_store + tma_f= alse_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)= ))", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_dtlb_load + tma_fb_full + tma_l1_latency_capacity= + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + tma_sto= re_early_blk + tma_store_fwd_blk)) + tma_memory_bound * (tma_l1_bound / (tm= a_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound= )) * (tma_l1_latency_capacity / (tma_dtlb_load + tma_fb_full + tma_l1_laten= cy_capacity + tma_l1_latency_dependency + tma_lock_latency + tma_split_load= s + tma_store_early_blk + tma_store_fwd_blk)) + tma_memory_bound * (tma_l1_= bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_= store_bound)) * (tma_lock_latency / (tma_dtlb_load + tma_fb_full + tma_l1_l= atency_capacity + tma_l1_latency_dependency + tma_lock_latency + tma_split_= loads + tma_store_early_blk + tma_store_fwd_blk)) + tma_memory_bound * (tma= _l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + = tma_store_bound)) * (tma_split_loads / (tma_dtlb_load + tma_fb_full + tma_l= 1_latency_capacity + tma_l1_latency_dependency + tma_lock_latency + tma_spl= it_loads + tma_store_early_blk + tma_store_fwd_blk)) + tma_memory_bound * (= tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bo= und + tma_store_bound)) * (tma_split_stores / (tma_dtlb_store + tma_false_s= haring + tma_split_stores + tma_store_latency + tma_streaming_stores)) + tm= a_memory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2= _bound + tma_l3_bound + tma_store_bound)) * (tma_store_latency / (tma_dtlb_= store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_stre= aming_stores)))", "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", - "MetricName": "tma_bottleneck_cache_memory_latency", - "MetricThreshold": "tma_bottleneck_cache_memory_latency > 20", + "MetricName": "tma_bottleneck_data_cache_memory_latency", + "MetricThreshold": "tma_bottleneck_data_cache_memory_latency > 20", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_= mem_latency", "Unit": "cpu_core" }, - { - "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", - "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", - "MetricGroup": "BvCB;Cor;tma_issueComp", - "MetricName": "tma_bottleneck_compute_bound_est", - "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", - "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: ", - "Unit": "cpu_core" - }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks (when the front-end could not sustain operations = delivery to the back-end)", - "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - (1 - c= pu_core@INST_RETIRED.REP_ITERATION@ / cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D= 1@) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cl= ears_resteers + tma_mispredicts_resteers * tma_other_mispredicts / tma_bran= ch_mispredicts) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unk= nown_branches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_miss= es + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * t= ma_ms / (tma_dsb + tma_lsd + tma_mite + tma_ms))) - tma_bottleneck_big_code= ", + "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - (1 - c= pu_core@INST_RETIRED.REP_ITERATION@ / cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D= 1@) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cl= ears_resteers + tma_mispredicts_resteers * tma_other_mispredicts / tma_bran= ch_mispredicts) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unk= nown_branches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_miss= es + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_ms)) - tma_bottlene= ck_big_code", "MetricGroup": "BvFB;Fed;FetchBW;Frontend", "MetricName": "tma_bottleneck_instruction_fetch_bw", "MetricThreshold": "tma_bottleneck_instruction_fetch_bw > 20", @@ -849,7 +849,7 @@ }, { "BriefDescription": "Total pipeline cost of irregular execution (e= .g", - "MetricExpr": "100 * ((1 - cpu_core@INST_RETIRED.REP_ITERATION@ / = cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D1@) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + tma_mispredicts_restee= rs * tma_other_mispredicts / tma_branch_mispredicts) / (tma_clears_resteers= + tma_mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers= + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_m= s_switches) + tma_fetch_bandwidth * tma_ms / (tma_dsb + tma_lsd + tma_mite = + tma_ms)) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_bra= nch_mispredicts * tma_branch_mispredicts + tma_machine_clears * tma_other_n= ukes / tma_other_nukes + tma_core_bound * (tma_serializing_operation + cpu_= core@RS.EMPTY_RESOURCE@ / tma_info_thread_clks * tma_ports_utilized_0) / (t= ma_divider + tma_ports_utilization + tma_serializing_operation) + tma_micro= code_sequencer / (tma_microcode_sequencer + tma_few_uops_instructions) * (t= ma_assists / tma_microcode_sequencer) * tma_heavy_operations)", + "MetricExpr": "100 * ((1 - cpu_core@INST_RETIRED.REP_ITERATION@ / = cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D1@) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + tma_mispredicts_restee= rs * tma_other_mispredicts / tma_branch_mispredicts) / (tma_clears_resteers= + tma_mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers= + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_m= s_switches) + tma_ms) + 10 * tma_microcode_sequencer * tma_other_mispredict= s / tma_branch_mispredicts * tma_branch_mispredicts + tma_machine_clears * = tma_other_nukes / tma_other_nukes + tma_core_bound * (tma_serializing_opera= tion + cpu_core@RS.EMPTY_RESOURCE@ / tma_info_thread_clks * tma_ports_utili= zed_0) / (tma_divider + tma_ports_utilization + tma_serializing_operation) = + tma_microcode_sequencer / (tma_microcode_sequencer + tma_few_uops_instruc= tions) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", "MetricName": "tma_bottleneck_irregular_overhead", "MetricThreshold": "tma_bottleneck_irregular_overhead > 10", @@ -858,7 +858,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", - "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / (tma_dram= _bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (= tma_dtlb_load / (tma_dtlb_load + tma_fb_full + tma_l1_latency_capacity + tm= a_l1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_fw= d_blk)) + tma_memory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bo= und + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_dtlb_store / (= tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency += tma_streaming_stores)))", + "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / (tma_dram= _bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (= tma_dtlb_load / (tma_dtlb_load + tma_fb_full + tma_l1_latency_capacity + tm= a_l1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_ea= rly_blk + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_= dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound))= * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores= + tma_store_latency + tma_streaming_stores)))", "MetricGroup": "BvMT;Mem;MemoryTLB;Offcore;tma_issueTLB", "MetricName": "tma_bottleneck_memory_data_tlbs", "MetricThreshold": "tma_bottleneck_memory_data_tlbs > 20", @@ -885,7 +885,7 @@ }, { "BriefDescription": "Total pipeline cost of remaining bottlenecks = in the back-end", - "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_cache_me= mory_bandwidth + tma_bottleneck_cache_memory_latency + tma_bottleneck_memor= y_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottleneck_comput= e_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_branching_= overhead + tma_bottleneck_useful_work)", + "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_data_cac= he_memory_bandwidth + tma_bottleneck_data_cache_memory_latency + tma_bottle= neck_memory_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottlen= eck_compute_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_= branching_overhead + tma_bottleneck_useful_work)", "MetricGroup": "BvOB;Cor;Offcore", "MetricName": "tma_bottleneck_other_bottlenecks", "MetricThreshold": "tma_bottleneck_other_bottlenecks > 20", @@ -902,7 +902,7 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Branch Misprediction", - "MetricExpr": "cpu_core@topdown\\-br\\-mispredict@ / (cpu_core@top= down\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-re= tiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-br\\-mispredict@ / (cpu_core@top= down\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-re= tiring@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "BadSpec;BrMispredicts;BvMP;TmaL2;TopdownL2;tma_L2_= group;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", @@ -1042,7 +1042,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to contested acces= ses", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS@ * min(= cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS@R, 24 * tma_info_system_core_fre= quency) + cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM@ * min(cpu_core@MEM_LO= AD_L3_HIT_RETIRED.XSNP_HITM@R, 25 * tma_info_system_core_frequency)) * (1 += cpu_core@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2= ) / tma_info_thread_clks", "MetricGroup": "BvMS;DataSharing;LockCont;Offcore;Snoop;TopdownL4;= tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", "MetricName": "tma_contested_accesses", @@ -1095,7 +1094,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to DSB (decoded uop cache) fetch pipe= line", - "MetricExpr": "(cpu_core@IDQ.DSB_UOPS\\,cmask\\=3D0x8\\,inv\\=3D0x= 1@ + cpu_core@IDQ.DSB_UOPS@ / (cpu_core@IDQ.DSB_UOPS@ + cpu_core@IDQ.MITE_U= OPS@) * (cpu_core@IDQ_BUBBLES.CYCLES_0_UOPS_DELIV.CORE@ - cpu_core@IDQ_BUBB= LES.FETCH_LATENCY@)) / tma_info_thread_clks", + "MetricExpr": "(cpu_core@IDQ.DSB_UOPS\\,cmask\\=3D0x8\\,inv\\=3D0x= 1@ / 2 + cpu_core@IDQ.DSB_UOPS@ / (cpu_core@IDQ.DSB_UOPS@ + cpu_core@IDQ.MI= TE_UOPS@) * (cpu_core@IDQ_BUBBLES.STARVATION_CYCLES@ - cpu_core@IDQ_BUBBLES= .FETCH_LATENCY@)) / tma_info_thread_clks", "MetricGroup": "DSB;FetchBW;TopdownL3;tma_L3_group;tma_fetch_bandw= idth_group", "MetricName": "tma_dsb", "MetricThreshold": "tma_dsb > 0.15 & tma_fetch_bandwidth > 0.2", @@ -1149,7 +1148,7 @@ "MetricGroup": "BvMB;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;t= ma_issueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", - "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_cache_memory_bandwidth, = tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_late= ncy, tma_streaming_stores", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_data_cache_memory_bandwi= dth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store= _latency, tma_streaming_stores", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -1166,7 +1165,7 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU was stalled due to Frontend latency issues", - "MetricExpr": "cpu_core@topdown\\-fetch\\-lat@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-fetch\\-lat@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", @@ -1216,7 +1215,7 @@ }, { "BriefDescription": "This metric approximates arithmetic floating-= point (FP) scalar uops fraction the CPU has retired", - "MetricExpr": "cpu_core@FP_ARITH_INST_RETIRED.SCALAR@ / (tma_retir= ing * tma_info_thread_slots)", + "MetricExpr": "cpu_core@FP_ARITH_OPS_RETIRED.SCALAR@ / (tma_retiri= ng * tma_info_thread_slots)", "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_= group;tma_issue2P", "MetricName": "tma_fp_scalar", "MetricThreshold": "tma_fp_scalar > 0.1 & (tma_fp_arith > 0.2 & tm= a_light_operations > 0.6)", @@ -1226,7 +1225,7 @@ }, { "BriefDescription": "This metric approximates arithmetic floating-= point (FP) vector uops fraction the CPU has retired aggregated across all v= ector widths", - "MetricExpr": "cpu_core@FP_ARITH_INST_RETIRED.VECTOR@ / (tma_retir= ing * tma_info_thread_slots)", + "MetricExpr": "cpu_core@FP_ARITH_OPS_RETIRED.VECTOR@ / (tma_retiri= ng * tma_info_thread_slots)", "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_= group;tma_issue2P", "MetricName": "tma_fp_vector", "MetricThreshold": "tma_fp_vector > 0.1 & (tma_fp_arith > 0.2 & tm= a_light_operations > 0.6)", @@ -1236,7 +1235,7 @@ }, { "BriefDescription": "This metric approximates arithmetic FP vector= uops fraction the CPU has retired for 128-bit wide vectors", - "MetricExpr": "(cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@= + cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE@) / (tma_retiring * tm= a_info_thread_slots)", + "MetricExpr": "(cpu_core@FP_ARITH_OPS_RETIRED.128B_PACKED_DOUBLE@ = + cpu_core@FP_ARITH_OPS_RETIRED.128B_PACKED_SINGLE@) / (tma_retiring * tma_= info_thread_slots)", "MetricGroup": "Compute;Flops;TopdownL5;tma_L5_group;tma_fp_vector= _group;tma_issue2P", "MetricName": "tma_fp_vector_128b", "MetricThreshold": "tma_fp_vector_128b > 0.1 & (tma_fp_vector > 0.= 1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6))", @@ -1246,7 +1245,7 @@ }, { "BriefDescription": "This metric approximates arithmetic FP vector= uops fraction the CPU has retired for 256-bit wide vectors", - "MetricExpr": "cpu_core@FP_ARITH_INST_RETIRED.VECTOR\\,umask\\=3D0= x30@ / (tma_retiring * tma_info_thread_slots)", + "MetricExpr": "cpu_core@FP_ARITH_OPS_RETIRED.VECTOR\\,umask\\=3D0x= 30@ / (tma_retiring * tma_info_thread_slots)", "MetricGroup": "Compute;Flops;TopdownL5;tma_L5_group;tma_fp_vector= _group;tma_issue2P", "MetricName": "tma_fp_vector_256b", "MetricThreshold": "tma_fp_vector_256b > 0.1 & (tma_fp_vector > 0.= 1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6))", @@ -1257,7 +1256,7 @@ { "BriefDescription": "This category represents fraction of slots wh= ere the processor's Frontend undersupplies its Backend", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "cpu_core@topdown\\-fe\\-bound@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-fe\\-bound@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "BvFB;BvIO;Default;PGO;TmaL1;TopdownL1;tma_L1_group= ", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", @@ -1278,7 +1277,7 @@ }, { "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring heavy-weight operations -- instructions that require= two or more uops or micro-coded sequences", - "MetricExpr": "cpu_core@topdown\\-heavy\\-ops@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-heavy\\-ops@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", @@ -1456,7 +1455,7 @@ }, { "BriefDescription": "Floating Point Operations Per Cycle", - "MetricExpr": "(cpu_core@FP_ARITH_INST_RETIRED.SCALAR@ + 2 * cpu_c= ore@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + 4 * cpu_core@FP_ARITH_INST_= RETIRED.4_FLOPS@ + 8 * cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@) = / tma_info_thread_clks", + "MetricExpr": "(cpu_core@FP_ARITH_OPS_RETIRED.SCALAR@ + 2 * cpu_co= re@FP_ARITH_OPS_RETIRED.128B_PACKED_DOUBLE@ + 4 * cpu_core@FP_ARITH_OPS_RET= IRED.4_FLOPS@ + 8 * cpu_core@FP_ARITH_OPS_RETIRED.256B_PACKED_SINGLE@) / tm= a_info_thread_clks", "MetricGroup": "Flops;Ret", "MetricName": "tma_info_core_flopc", "Unit": "cpu_core" @@ -1597,7 +1596,7 @@ }, { "BriefDescription": "Instructions per FP Arithmetic instruction (l= ower number means higher occurrence rate)", - "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INS= T_RETIRED.SCALAR@ + cpu_core@FP_ARITH_INST_RETIRED.VECTOR@)", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_OPS= _RETIRED.SCALAR@ + cpu_core@FP_ARITH_OPS_RETIRED.VECTOR@)", "MetricGroup": "Flops;InsType", "MetricName": "tma_info_inst_mix_iparith", "MetricThreshold": "tma_info_inst_mix_iparith < 10", @@ -1606,7 +1605,7 @@ }, { "BriefDescription": "Instructions per FP Arithmetic AVX/SSE 128-bi= t instruction (lower number means higher occurrence rate)", - "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INS= T_RETIRED.128B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_= SINGLE@)", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_OPS= _RETIRED.128B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_OPS_RETIRED.128B_PACKED_SI= NGLE@)", "MetricGroup": "Flops;FpVector;InsType", "MetricName": "tma_info_inst_mix_iparith_avx128", "MetricThreshold": "tma_info_inst_mix_iparith_avx128 < 10", @@ -1615,7 +1614,7 @@ }, { "BriefDescription": "Instructions per FP Arithmetic AVX* 256-bit i= nstruction (lower number means higher occurrence rate)", - "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INS= T_RETIRED.256B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_= SINGLE@)", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_OPS= _RETIRED.256B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_OPS_RETIRED.256B_PACKED_SI= NGLE@)", "MetricGroup": "Flops;FpVector;InsType", "MetricName": "tma_info_inst_mix_iparith_avx256", "MetricThreshold": "tma_info_inst_mix_iparith_avx256 < 10", @@ -1624,7 +1623,7 @@ }, { "BriefDescription": "Instructions per FP Arithmetic Scalar Double-= Precision instruction (lower number means higher occurrence rate)", - "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@FP_ARITH_INST= _RETIRED.SCALAR_DOUBLE@", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@FP_ARITH_OPS_= RETIRED.SCALAR_DOUBLE@", "MetricGroup": "Flops;FpScalar;InsType", "MetricName": "tma_info_inst_mix_iparith_scalar_dp", "MetricThreshold": "tma_info_inst_mix_iparith_scalar_dp < 10", @@ -1633,7 +1632,7 @@ }, { "BriefDescription": "Instructions per FP Arithmetic Scalar Single-= Precision instruction (lower number means higher occurrence rate)", - "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@FP_ARITH_INST= _RETIRED.SCALAR_SINGLE@", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@FP_ARITH_OPS_= RETIRED.SCALAR_SINGLE@", "MetricGroup": "Flops;FpScalar;InsType", "MetricName": "tma_info_inst_mix_iparith_scalar_sp", "MetricThreshold": "tma_info_inst_mix_iparith_scalar_sp < 10", @@ -1658,7 +1657,7 @@ }, { "BriefDescription": "Instructions per Floating Point (FP) Operatio= n (lower number means higher occurrence rate)", - "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INS= T_RETIRED.SCALAR@ + 2 * cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ = + 4 * cpu_core@FP_ARITH_INST_RETIRED.4_FLOPS@ + 8 * cpu_core@FP_ARITH_INST_= RETIRED.256B_PACKED_SINGLE@)", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_OPS= _RETIRED.SCALAR@ + 2 * cpu_core@FP_ARITH_OPS_RETIRED.128B_PACKED_DOUBLE@ + = 4 * cpu_core@FP_ARITH_OPS_RETIRED.4_FLOPS@ + 8 * cpu_core@FP_ARITH_OPS_RETI= RED.256B_PACKED_SINGLE@)", "MetricGroup": "Flops;InsType", "MetricName": "tma_info_inst_mix_ipflop", "MetricThreshold": "tma_info_inst_mix_ipflop < 10", @@ -1713,7 +1712,7 @@ }, { "BriefDescription": "Average per-thread data fill bandwidth to the= L1 data cache [GB / sec]", - "MetricExpr": "64 * cpu_core@L1D.REPLACEMENT@ / 1e9 / tma_info_sys= tem_time", + "MetricExpr": "64 * cpu_core@L1D.L1_REPLACEMENT@ / 1e9 / tma_info_= system_time", "MetricGroup": "Mem;MemoryBW", "MetricName": "tma_info_memory_l1d_cache_fill_bw", "Unit": "cpu_core" @@ -1725,6 +1724,13 @@ "MetricName": "tma_info_memory_l1dl0_cache_fill_bw", "Unit": "cpu_core" }, + { + "BriefDescription": "L0 cache true misses per kilo instruction for= retired demand loads", + "MetricExpr": "1e3 * (cpu_core@MEM_LOAD_RETIRED.L1_MISS@ + cpu_cor= e@MEM_LOAD_RETIRED.L1_HIT_L1@) / cpu_core@INST_RETIRED.ANY@", + "MetricGroup": "CacheHits;Mem", + "MetricName": "tma_info_memory_l1dl0_mpki", + "Unit": "cpu_core" + }, { "BriefDescription": "L1 cache true misses per kilo instruction for= retired demand loads", "MetricExpr": "1e3 * cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / cpu_core= @INST_RETIRED.ANY@", @@ -1940,6 +1946,13 @@ "MetricName": "tma_info_pipeline_fetch_mite", "Unit": "cpu_core" }, + { + "BriefDescription": "Average number of uops fetched from MS per cy= cle", + "MetricExpr": "cpu_core@IDQ.MS_UOPS@ / cpu_core@IDQ.MS_UOPS\\,cmas= k\\=3D1@", + "MetricGroup": "Fed;FetchLat;MicroSeq", + "MetricName": "tma_info_pipeline_fetch_ms", + "Unit": "cpu_core" + }, { "BriefDescription": "Instructions per a microcode Assist invocatio= n", "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@ASSISTS.ANY@", @@ -1974,7 +1987,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc\\,cpu= =3Dcpu_core@ / 1e9 / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency", "Unit": "cpu_core" @@ -1988,14 +2001,22 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.REF_TSC@ / TSC", + "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.REF_TSC@ / msr@tsc\\,cpu= =3Dcpu_core@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized", "Unit": "cpu_core" }, + { + "BriefDescription": "Average external Memory Bandwidth Use for rea= ds and writes [GB / sec]", + "MetricExpr": "32 * UNC_M_TOTAL_DATA / 1e9 / tma_info_system_time", + "MetricGroup": "HPC;MemOffcore;MemoryBW;SoC;tma_issueBW", + "MetricName": "tma_info_system_dram_bw_use", + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_data_cache_memor= y_bandwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full", + "Unit": "cpu_core" + }, { "BriefDescription": "Giga Floating Point Operations Per Second", - "MetricExpr": "(cpu_core@FP_ARITH_INST_RETIRED.SCALAR@ + 2 * cpu_c= ore@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + 4 * cpu_core@FP_ARITH_INST_= RETIRED.4_FLOPS@ + 8 * cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@) = / 1e9 / tma_info_system_time", + "MetricExpr": "(cpu_core@FP_ARITH_OPS_RETIRED.SCALAR@ + 2 * cpu_co= re@FP_ARITH_OPS_RETIRED.128B_PACKED_DOUBLE@ + 4 * cpu_core@FP_ARITH_OPS_RET= IRED.4_FLOPS@ + 8 * cpu_core@FP_ARITH_OPS_RETIRED.256B_PACKED_SINGLE@) / 1e= 9 / tma_info_system_time", "MetricGroup": "Cor;Flops;HPC", "MetricName": "tma_info_system_gflops", "PublicDescription": "Giga Floating Point Operations Per Second. A= ggregate across all supported options of: FP precisions, scalar and vector = instructions, vector-width", @@ -2061,6 +2082,13 @@ "MetricName": "tma_info_system_turbo_utilization", "Unit": "cpu_core" }, + { + "BriefDescription": "Measured Average Uncore Frequency for the SoC= [GHz]", + "MetricExpr": "tma_info_system_socket_clks / 1e9 / tma_info_system= _time", + "MetricGroup": "SoC", + "MetricName": "tma_info_system_uncore_frequency", + "Unit": "cpu_core" + }, { "BriefDescription": "Per-Logical Processor actual clocks when the = Logical Processor is active.", "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.THREAD@", @@ -2182,12 +2210,12 @@ "Unit": "cpu_core" }, { - "BriefDescription": "This metric([SKL+] roughly; [LNL]) estimates = fraction of cycles with demand load accesses that hit the L1D cache", - "MetricExpr": "4 * cpu_core@DEPENDENT_LOADS.ANY@ / tma_info_thread= _clks", + "BriefDescription": "This metric ([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache", + "MetricExpr": "4 * cpu_core@DEPENDENT_LOADS.ANY\\,cmask\\=3D1@ / t= ma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound= _group", "MetricName": "tma_l1_latency_dependency", "MetricThreshold": "tma_l1_latency_dependency > 0.1 & (tma_l1_boun= d > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache. The s= hort latency of the L1D cache may be exposed in pointer-chasing memory acce= ss patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", + "PublicDescription": "This metric ([SKL+] roughly; [LNL]) estimate= s fraction of cycles with demand load accesses that hit the L1D cache. The = short latency of the L1D cache may be exposed in pointer-chasing memory acc= ess patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -2203,7 +2231,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles wit= h demand load accesses that hit the L2 cache under unloaded scenarios (poss= ibly L2 latency limited)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "cpu_core@MEM_LOAD_RETIRED.L2_HIT@ * min(cpu_core@ME= M_LOAD_RETIRED.L2_HIT@R, 3 * tma_info_system_core_frequency) * (1 + cpu_cor= e@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2) / tma_= info_thread_clks", "MetricGroup": "MemoryLat;TopdownL4;tma_L4_group;tma_l2_bound_grou= p", "MetricName": "tma_l2_hit_latency", @@ -2224,12 +2251,11 @@ }, { "BriefDescription": "This metric estimates fraction of cycles with= demand load accesses that hit the L3 cache under unloaded scenarios (possi= bly L3 latency limited)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "cpu_core@MEM_LOAD_RETIRED.L3_HIT@ * min(cpu_core@ME= M_LOAD_RETIRED.L3_HIT@R, 9 * tma_info_system_core_frequency) * (1 + cpu_cor= e@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2) / tma_= info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat= ;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_cache_memory_latency, tma_mem_latency", + "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_data_cache_memory_latency, tma_mem_latency", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -2311,6 +2337,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU spent handling cache misses due to lock operations", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "cpu_core@MEM_INST_RETIRED.LOCK_LOADS@ * cpu_core@ME= M_INST_RETIRED.LOCK_LOADS@R / tma_info_thread_clks", "MetricGroup": "LockCont;Offcore;TopdownL4;tma_L4_group;tma_issueR= FO;tma_l1_bound_group", "MetricName": "tma_lock_latency", @@ -2321,7 +2348,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to LSD (Loop Stream Detector) unit", - "MetricExpr": "cpu_core@LSD.UOPS\\,cmask\\=3D0x8\\,inv\\=3D0x1@ / = tma_info_thread_clks", + "MetricExpr": "cpu_core@LSD.UOPS\\,cmask\\=3D0x8\\,inv\\=3D0x1@ / = tma_info_thread_clks / 2", "MetricGroup": "FetchBW;LSD;TopdownL3;tma_L3_group;tma_fetch_bandw= idth_group", "MetricName": "tma_lsd", "MetricThreshold": "tma_lsd > 0.15 & tma_fetch_bandwidth > 0.2", @@ -2346,7 +2373,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_d= ram_bound_group;tma_issueBW", "MetricName": "tma_mem_bandwidth", "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_cache_memory_bandwidth,= tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_data_cache_memory_bandw= idth, tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -2356,13 +2383,13 @@ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_= dram_bound_group;tma_issueLat", "MetricName": "tma_mem_latency", "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_cache_memory_latency, tma_l3_hit_latency= ", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_data_cache_memory_latency, tma_l3_hit_la= tency", "ScaleUnit": "100%", "Unit": "cpu_core" }, { "BriefDescription": "This metric represents fraction of slots the = Memory subsystem within the Backend was a bottleneck", - "MetricExpr": "cpu_core@topdown\\-mem\\-bound@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-mem\\-bound@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", @@ -2373,7 +2400,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to LFENCE Instructions.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * cpu_core@MISC2_RETIRED.LFENCE@ / tma_info_thre= ad_clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_serializing_operation_g= roup", "MetricName": "tma_memory_fence", @@ -2412,7 +2438,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to the MITE pipeline (the legacy deco= de pipeline)", - "MetricExpr": "(cpu_core@IDQ.MITE_UOPS\\,cmask\\=3D0x8\\,inv\\=3D0= x1@ / 2 + cpu_core@IDQ.MITE_UOPS@ / (cpu_core@IDQ.DSB_UOPS@ + cpu_core@IDQ.= MITE_UOPS@) * (cpu_core@IDQ_BUBBLES.CYCLES_0_UOPS_DELIV.CORE@ - cpu_core@ID= Q_BUBBLES.FETCH_LATENCY@)) / tma_info_thread_clks", + "MetricExpr": "(cpu_core@IDQ.MITE_UOPS\\,cmask\\=3D0x8\\,inv\\=3D0= x1@ / 2 + cpu_core@IDQ.MITE_UOPS@ / (cpu_core@IDQ.DSB_UOPS@ + cpu_core@IDQ.= MITE_UOPS@) * (cpu_core@IDQ_BUBBLES.STARVATION_CYCLES@ - cpu_core@IDQ_BUBBL= ES.FETCH_LATENCY@)) / tma_info_thread_clks", "MetricGroup": "DSBmiss;FetchBW;TopdownL3;tma_L3_group;tma_fetch_b= andwidth_group", "MetricName": "tma_mite", "MetricThreshold": "tma_mite > 0.1 & tma_fetch_bandwidth > 0.2", @@ -2432,7 +2458,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to the Microcode Sequencer (MS) unit = - see Microcode_Sequencer node for details.", - "MetricExpr": "cpu_core@IDQ.MS_CYCLES_ANY@ / tma_info_thread_clks", + "MetricExpr": "cpu_core@IDQ.MS_CYCLES_ANY@ / tma_info_thread_clks = / 1.8", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_fetch_bandwidt= h_group", "MetricName": "tma_ms", "MetricThreshold": "tma_ms > 0.05 & tma_fetch_bandwidth > 0.2", @@ -2471,7 +2497,8 @@ }, { "BriefDescription": "This metric represents the remaining light uo= ps fraction the CPU has executed - remaining means not covered by other sib= ling nodes", - "MetricExpr": "max(0, tma_light_operations - (tma_x87_use + (cpu_c= ore@FP_ARITH_INST_RETIRED.SCALAR@ + cpu_core@FP_ARITH_INST_RETIRED.VECTOR@)= / (tma_retiring * tma_info_thread_slots) + (cpu_core@INT_VEC_RETIRED.ADD_1= 28@ + cpu_core@INT_VEC_RETIRED.VNNI_128@ + cpu_core@INT_VEC_RETIRED.ADD_256= @ + cpu_core@INT_VEC_RETIRED.MUL_256@ + cpu_core@INT_VEC_RETIRED.VNNI_256@)= / (tma_retiring * tma_info_thread_slots) + tma_memory_operations + tma_fus= ed_instructions + tma_non_fused_branches))", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "max(0, tma_light_operations - (tma_x87_use + (cpu_c= ore@FP_ARITH_OPS_RETIRED.SCALAR@ + cpu_core@FP_ARITH_OPS_RETIRED.VECTOR@) /= (tma_retiring * tma_info_thread_slots) + (cpu_core@INT_VEC_RETIRED.ADD_128= @ + cpu_core@INT_VEC_RETIRED.VNNI_128@ + cpu_core@INT_VEC_RETIRED.ADD_256@ = + cpu_core@INT_VEC_RETIRED.MUL_256@ + cpu_core@INT_VEC_RETIRED.VNNI_256@) /= (tma_retiring * tma_info_thread_slots) + tma_memory_operations + tma_fused= _instructions + tma_non_fused_branches))", "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", "MetricName": "tma_other_light_ops", "MetricThreshold": "tma_other_light_ops > 0.3 & tma_light_operatio= ns > 0.6", @@ -2509,6 +2536,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU performance was potentially limited due to Core computation issues (non= divider-related)", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "((cpu_core@EXE_ACTIVITY.EXE_BOUND_0_PORTS@ + (cpu_c= ore@EXE_ACTIVITY.1_PORTS_UTIL@ + tma_retiring * cpu_core@EXE_ACTIVITY.2_3_P= ORTS_UTIL@)) / tma_info_thread_clks if cpu_core@ARITH.DIV_ACTIVE@ < cpu_cor= e@CYCLE_ACTIVITY.STALLS_TOTAL@ - cpu_core@EXE_ACTIVITY.BOUND_ON_LOADS@ else= (cpu_core@EXE_ACTIVITY.1_PORTS_UTIL@ + tma_retiring * cpu_core@EXE_ACTIVIT= Y.2_3_PORTS_UTIL@) / tma_info_thread_clks)", "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_gr= oup", "MetricName": "tma_ports_utilization", @@ -2519,6 +2547,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed no uops on any execution port (Logical Processor cycles since ICL= , Physical Core cycles otherwise)", + "MetricConstraint": "NO_THRESHOLD_AND_NMI", "MetricExpr": "cpu_core@EXE_ACTIVITY.EXE_BOUND_0_PORTS@ / tma_info= _thread_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_0", @@ -2529,6 +2558,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", + "MetricConstraint": "NO_THRESHOLD_AND_NMI", "MetricExpr": "cpu_core@EXE_ACTIVITY.1_PORTS_UTIL@ / tma_info_thre= ad_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_1", @@ -2539,7 +2569,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "cpu_core@EXE_ACTIVITY.2_PORTS_UTIL@ / tma_info_thre= ad_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", @@ -2550,7 +2579,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 3 or more uops per cycle on all execution ports (Logical= Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "cpu_core@UOPS_EXECUTED.CYCLES_GE_3@ / tma_info_thre= ad_clks", "MetricGroup": "BvCB;PortsUtil;TopdownL4;tma_L4_group;tma_ports_ut= ilization_group", "MetricName": "tma_ports_utilized_3m", @@ -2571,7 +2599,7 @@ { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "cpu_core@topdown\\-retiring@ / (cpu_core@topdown\\-= fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@= + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-retiring@ / (cpu_core@topdown\\-= fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@= + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "BvUW;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", @@ -2586,7 +2614,7 @@ "MetricGroup": "BvIO;PortsUtil;TopdownL3;tma_L3_group;tma_core_bou= nd_group;tma_issueSO", "MetricName": "tma_serializing_operation", "MetricThreshold": "tma_serializing_operation > 0.1 & (tma_core_bo= und > 0.1 & tma_backend_bound > 0.2)", - "PublicDescription": "This metric represents fraction of cycles th= e CPU issue-pipeline was stalled due to serializing operations. Instruction= s like CPUID; WRMSR or LFENCE serialize the out-of-order execution which ma= y limit performance. Sample with: RESOURCE_STALLS.SCOREBOARD. Related metri= cs: tma_ms_switches", + "PublicDescription": "This metric represents fraction of cycles th= e CPU issue-pipeline was stalled due to serializing operations. Instruction= s like CPUID; WRMSR or LFENCE serialize the out-of-order execution which ma= y limit performance. Sample with: PARTIAL_RAT_STALLS.SCOREBOARD. Related me= trics: tma_ms_switches", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -2602,7 +2630,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to PAUSE Instructions", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.PAUSE@ / tma_info_thread_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_serializing_operation_g= roup", "MetricName": "tma_slow_pause", @@ -2637,7 +2664,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_i= ssueBW;tma_l3_bound_group", "MetricName": "tma_sq_full", "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, tma_m= em_bandwidth", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_data_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, = tma_mem_bandwidth", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -2651,6 +2678,15 @@ "ScaleUnit": "100%", "Unit": "cpu_core" }, + { + "BriefDescription": "This metric estimates clocks wasted due to lo= ads blocked due to unknown store address (did not do memory disambiguation)= or due to unknown store data", + "MetricExpr": "7 * cpu_core@LD_BLOCKS.STORE_EARLY\\,cmask\\=3D1@ /= tma_info_thread_clks", + "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", + "MetricName": "tma_store_early_blk", + "MetricThreshold": "tma_store_early_blk > 0.2", + "ScaleUnit": "100%", + "Unit": "cpu_core" + }, { "BriefDescription": "This metric roughly estimates fraction of cyc= les when the memory subsystem had loads blocked since they could not forwar= d data from earlier (in program order) overlapping stores", "MetricExpr": "13 * cpu_core@LD_BLOCKS.STORE_FORWARD@ / tma_info_t= hread_clks", diff --git a/tools/perf/pmu-events/arch/x86/arrowlake/cache.json b/tools/pe= rf/pmu-events/arch/x86/arrowlake/cache.json index 91929d8bcf47..f5168b55a6f4 100644 --- a/tools/perf/pmu-events/arch/x86/arrowlake/cache.json +++ b/tools/perf/pmu-events/arch/x86/arrowlake/cache.json @@ -28,6 +28,16 @@ "UMask": "0x1", "Unit": "cpu_core" }, + { + "BriefDescription": "Cachelines replaced into the L1 d-cache. Succ= essful replacements only (not blocked) and exclude WB-miss case", + "Counter": "0,1,2,3,4,5,6,7,8,9", + "EventCode": "0x51", + "EventName": "L1D.L1_REPLACEMENT", + "PublicDescription": "Counts cachelines replaced into the L1 d-cac= he.", + "SampleAfterValue": "1000003", + "UMask": "0x4", + "Unit": "cpu_core" + }, { "BriefDescription": "Cachelines replaced into the L0 and L1 d-cach= e. Successful replacements only (not blocked) and exclude WB-miss case", "Counter": "0,1,2,3,4,5,6,7,8,9", @@ -540,7 +550,7 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.ALL_LOADS", - "PublicDescription": "Counts Instructions with at least one archit= ecturally visible load retired. Available PDIST counters: 0", + "PublicDescription": "Counts Instructions with at least one archit= ecturally visible load retired. Available PDIST counters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x81", "Unit": "cpu_core" @@ -551,7 +561,7 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.ALL_STORES", - "PublicDescription": "Counts all retired store instructions. Avail= able PDIST counters: 0", + "PublicDescription": "Counts all retired store instructions. Avail= able PDIST counters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x82", "Unit": "cpu_core" @@ -561,7 +571,7 @@ "Counter": "0,1,2,3", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.ALL_SWPF", - "PublicDescription": "Counts all retired software prefetch instruc= tions. Available PDIST counters: 0", + "PublicDescription": "Counts all retired software prefetch instruc= tions. Available PDIST counters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x84", "Unit": "cpu_core" @@ -572,7 +582,7 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.ANY", - "PublicDescription": "Counts all retired memory instructions - loa= ds and stores. Available PDIST counters: 0", + "PublicDescription": "Counts all retired memory instructions - loa= ds and stores. Available PDIST counters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x87", "Unit": "cpu_core" @@ -583,7 +593,7 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.LOCK_LOADS", - "PublicDescription": "Counts retired load instructions with locked= access. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions with locked= access. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x21", "Unit": "cpu_core" @@ -594,7 +604,7 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.SPLIT_LOADS", - "PublicDescription": "Counts retired load instructions that split = across a cacheline boundary. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions that split = across a cacheline boundary. Available PDIST counters: 0,1", "SampleAfterValue": "100003", "UMask": "0x41", "Unit": "cpu_core" @@ -605,18 +615,29 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.SPLIT_STORES", - "PublicDescription": "Counts retired store instructions that split= across a cacheline boundary. Available PDIST counters: 0", + "PublicDescription": "Counts retired store instructions that split= across a cacheline boundary. Available PDIST counters: 0,1", "SampleAfterValue": "100003", "UMask": "0x42", "Unit": "cpu_core" }, + { + "BriefDescription": "Retired instructions that hit the STLB.", + "Counter": "0,1,2,3", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_HIT_ANY", + "PublicDescription": "Number of retired instructions with a clean = hit in the 2nd-level TLB (STLB). Available PDIST counters: 0,1", + "SampleAfterValue": "100003", + "UMask": "0xf", + "Unit": "cpu_core" + }, { "BriefDescription": "Retired load instructions that hit the STLB.", "Counter": "0,1,2,3", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.STLB_HIT_LOADS", - "PublicDescription": "Number of retired load instructions with a c= lean hit in the 2nd-level TLB (STLB). Available PDIST counters: 0", + "PublicDescription": "Number of retired load instructions with a c= lean hit in the 2nd-level TLB (STLB). Available PDIST counters: 0,1", "SampleAfterValue": "100003", "UMask": "0x9", "Unit": "cpu_core" @@ -627,18 +648,39 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.STLB_HIT_STORES", - "PublicDescription": "Number of retired store instructions that hi= t in the 2nd-level TLB (STLB). Available PDIST counters: 0", + "PublicDescription": "Number of retired store instructions that hi= t in the 2nd-level TLB (STLB). Available PDIST counters: 0,1", "SampleAfterValue": "100003", "UMask": "0xa", "Unit": "cpu_core" }, + { + "BriefDescription": "Retired SWPF instructions that hit the STLB.", + "Counter": "0,1,2,3", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_HIT_SWPF", + "PublicDescription": "Number of retired SWPF instructions that hit= in the 2nd-level TLB (STLB). Available PDIST counters: 0,1", + "SampleAfterValue": "1000003", + "UMask": "0xc", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired instructions that miss the STLB.", + "Counter": "0,1,2,3", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_MISS_ANY", + "PublicDescription": "Retired instructions that miss the STLB. Ava= ilable PDIST counters: 0,1", + "SampleAfterValue": "100003", + "UMask": "0x17", + "Unit": "cpu_core" + }, { "BriefDescription": "Retired load instructions that miss the STLB.= ", "Counter": "0,1,2,3", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.STLB_MISS_LOADS", - "PublicDescription": "Number of retired load instructions that (st= art a) miss in the 2nd-level TLB (STLB). Available PDIST counters: 0", + "PublicDescription": "Number of retired load instructions that (st= art a) miss in the 2nd-level TLB (STLB). Available PDIST counters: 0,1", "SampleAfterValue": "100003", "UMask": "0x11", "Unit": "cpu_core" @@ -649,18 +691,28 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.STLB_MISS_STORES", - "PublicDescription": "Number of retired store instructions that (s= tart a) miss in the 2nd-level TLB (STLB). Available PDIST counters: 0", + "PublicDescription": "Number of retired store instructions that (s= tart a) miss in the 2nd-level TLB (STLB). Available PDIST counters: 0,1", "SampleAfterValue": "100003", "UMask": "0x12", "Unit": "cpu_core" }, + { + "BriefDescription": "Retired SWPF instructions that miss the STLB.= ", + "Counter": "0,1,2,3", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_MISS_SWPF", + "PublicDescription": "Number of retired SWPF instructions that (st= art a) miss in the 2nd-level TLB (STLB). Available PDIST counters: 0,1", + "SampleAfterValue": "1000003", + "UMask": "0x14", + "Unit": "cpu_core" + }, { "BriefDescription": "Retired load instructions whose data sources = were a cross-core Snoop hits and forwards data from an in on-package core c= ache (induced by NI$)", "Counter": "0,1,2,3", "Data_LA": "1", "EventCode": "0xd2", "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD", - "PublicDescription": "Counts retired load instructions whose data = sources were a cross-core Snoop hits and forwards data from an in on-packag= e core cache (induced by NI$) Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions whose data = sources were a cross-core Snoop hits and forwards data from an in on-packag= e core cache (induced by NI$) Available PDIST counters: 0,1", "SampleAfterValue": "20011", "UMask": "0x10", "Unit": "cpu_core" @@ -671,7 +723,7 @@ "Data_LA": "1", "EventCode": "0xd2", "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM", - "PublicDescription": "Counts retired load instructions whose data = sources were HitM responses from shared L3, Hit-with-FWD is normally exclud= ed. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions whose data = sources were HitM responses from shared L3, Hit-with-FWD is normally exclud= ed. Available PDIST counters: 0,1", "SampleAfterValue": "20011", "UMask": "0x4", "Unit": "cpu_core" @@ -682,7 +734,7 @@ "Data_LA": "1", "EventCode": "0xd2", "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS", - "PublicDescription": "Counts the retired load instructions whose d= ata sources were L3 hit and cross-core snoop missed in on-pkg core cache. A= vailable PDIST counters: 0", + "PublicDescription": "Counts the retired load instructions whose d= ata sources were L3 hit and cross-core snoop missed in on-pkg core cache. A= vailable PDIST counters: 0,1", "SampleAfterValue": "20011", "UMask": "0x1", "Unit": "cpu_core" @@ -693,7 +745,7 @@ "Data_LA": "1", "EventCode": "0xd2", "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD", - "PublicDescription": "Counts retired load instructions whose data = sources were L3 and cross-core snoop hits in on-pkg core cache. Available P= DIST counters: 0", + "PublicDescription": "Counts retired load instructions whose data = sources were L3 and cross-core snoop hits in on-pkg core cache. Available P= DIST counters: 0,1", "SampleAfterValue": "20011", "UMask": "0x2", "Unit": "cpu_core" @@ -704,7 +756,7 @@ "Data_LA": "1", "EventCode": "0xd4", "EventName": "MEM_LOAD_MISC_RETIRED.UC", - "PublicDescription": "Retired instructions with at least one load = to uncacheable memory-type, or at least one cache-line split locked access = (Bus Lock). Available PDIST counters: 0", + "PublicDescription": "Retired instructions with at least one load = to uncacheable memory-type, or at least one cache-line split locked access = (Bus Lock). Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x4", "Unit": "cpu_core" @@ -715,7 +767,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.FB_HIT", - "PublicDescription": "Counts retired load instructions with at lea= st one uop was load missed in L1 but hit FB (Fill Buffers) due to preceding= miss to the same cache line with data not ready. Available PDIST counters:= 0", + "PublicDescription": "Counts retired load instructions with at lea= st one uop was load missed in L1 but hit FB (Fill Buffers) due to preceding= miss to the same cache line with data not ready. Available PDIST counters:= 0,1", "SampleAfterValue": "100007", "UMask": "0x40", "Unit": "cpu_core" @@ -726,7 +778,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L1_HIT", - "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the L1 data cache. This event includes all SW prefet= ches and lock instructions regardless of the data source. Available PDIST c= ounters: 0", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the L1 data cache. This event includes all SW prefet= ches and lock instructions regardless of the data source. Available PDIST c= ounters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x101", "Unit": "cpu_core" @@ -737,7 +789,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L1_HIT_L0", - "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the Level 0 of the L1 data cache. This event include= s all SW prefetches and lock instructions regardless of the data source. Av= ailable PDIST counters: 0", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the Level 0 of the L1 data cache. This event include= s all SW prefetches and lock instructions regardless of the data source. Av= ailable PDIST counters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -747,7 +799,7 @@ "Counter": "0,1,2,3", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L1_HIT_L1", - "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the Level 1 of the L1 data cache. Available PDIST co= unters: 0", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the Level 1 of the L1 data cache. Available PDIST co= unters: 0,1", "SampleAfterValue": "1000003", "Unit": "cpu_core" }, @@ -757,7 +809,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L1_MISS", - "PublicDescription": "Counts retired load instructions with at lea= st one uop that missed in the L1 cache. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that missed in the L1 cache. Available PDIST counters: 0,1", "SampleAfterValue": "200003", "UMask": "0x8", "Unit": "cpu_core" @@ -768,7 +820,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L2_HIT", - "PublicDescription": "Counts retired load instructions with L2 cac= he hits as data sources. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions with L2 cac= he hits as data sources. Available PDIST counters: 0,1", "SampleAfterValue": "200003", "UMask": "0x2", "Unit": "cpu_core" @@ -779,7 +831,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L2_MISS", - "PublicDescription": "Counts retired load instructions missed L2 c= ache as data sources. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions missed L2 c= ache as data sources. Available PDIST counters: 0,1", "SampleAfterValue": "100021", "UMask": "0x10", "Unit": "cpu_core" @@ -790,7 +842,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L3_HIT", - "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the L3 cache. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the L3 cache. Available PDIST counters: 0,1", "SampleAfterValue": "100021", "UMask": "0x4", "Unit": "cpu_core" @@ -801,7 +853,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L3_MISS", - "PublicDescription": "Counts retired load instructions with at lea= st one uop that missed in the L3 cache. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that missed in the L3 cache. Available PDIST counters: 0,1", "SampleAfterValue": "50021", "UMask": "0x20", "Unit": "cpu_core" @@ -1029,7 +1081,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_1024", @@ -1053,7 +1105,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_128", @@ -1077,7 +1129,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_16", @@ -1089,7 +1141,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_2048", @@ -1113,7 +1165,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_256", @@ -1137,7 +1189,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_32", @@ -1161,7 +1213,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_4", @@ -1185,7 +1237,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_512", @@ -1209,7 +1261,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_64", @@ -1233,7 +1285,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_8", diff --git a/tools/perf/pmu-events/arch/x86/arrowlake/frontend.json b/tools= /perf/pmu-events/arch/x86/arrowlake/frontend.json index 56cf1ec63200..db2ef84ca041 100644 --- a/tools/perf/pmu-events/arch/x86/arrowlake/frontend.json +++ b/tools/perf/pmu-events/arch/x86/arrowlake/frontend.json @@ -81,7 +81,7 @@ "EventName": "FRONTEND_RETIRED.ANY_ANT", "MSRIndex": "0x3F7", "MSRValue": "0x9", - "PublicDescription": "Always Not Taken (ANT) conditional retired b= ranches (no BTB entry and not mispredicted) Available PDIST counters: 0", + "PublicDescription": "Always Not Taken (ANT) conditional retired b= ranches (no BTB entry and not mispredicted) Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -93,7 +93,7 @@ "EventName": "FRONTEND_RETIRED.ANY_DSB_MISS", "MSRIndex": "0x3F7", "MSRValue": "0x1", - "PublicDescription": "Counts retired Instructions that experienced= DSB (Decode stream buffer i.e. the decoded instruction-cache) miss. Availa= ble PDIST counters: 0", + "PublicDescription": "Counts retired Instructions that experienced= DSB (Decode stream buffer i.e. the decoded instruction-cache) miss. Availa= ble PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -179,7 +179,7 @@ "EventName": "FRONTEND_RETIRED.DSB_MISS", "MSRIndex": "0x3F7", "MSRValue": "0x11", - "PublicDescription": "Number of retired Instructions that experien= ced a critical DSB (Decode stream buffer i.e. the decoded instruction-cache= ) miss. Critical means stalls were exposed to the back-end as a result of t= he DSB miss. Available PDIST counters: 0", + "PublicDescription": "Number of retired Instructions that experien= ced a critical DSB (Decode stream buffer i.e. the decoded instruction-cache= ) miss. Critical means stalls were exposed to the back-end as a result of t= he DSB miss. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -218,7 +218,7 @@ "EventName": "FRONTEND_RETIRED.ITLB_MISS", "MSRIndex": "0x3F7", "MSRValue": "0x14", - "PublicDescription": "Counts retired Instructions that experienced= iTLB (Instruction TLB) true miss. Available PDIST counters: 0", + "PublicDescription": "Counts retired Instructions that experienced= iTLB (Instruction TLB) true miss. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -239,7 +239,7 @@ "EventName": "FRONTEND_RETIRED.L1I_MISS", "MSRIndex": "0x3F7", "MSRValue": "0x12", - "PublicDescription": "Counts retired Instructions who experienced = Instruction L1 Cache true miss. Available PDIST counters: 0", + "PublicDescription": "Counts retired Instructions who experienced = Instruction L1 Cache true miss. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -251,7 +251,7 @@ "EventName": "FRONTEND_RETIRED.L2_MISS", "MSRIndex": "0x3F7", "MSRValue": "0x13", - "PublicDescription": "Counts retired Instructions who experienced = Instruction L2 Cache true miss. Available PDIST counters: 0", + "PublicDescription": "Counts retired Instructions who experienced = Instruction L2 Cache true miss. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -263,7 +263,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_128", "MSRIndex": "0x3F7", "MSRValue": "0x608006", - "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 12= 8 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 12= 8 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -275,7 +275,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_16", "MSRIndex": "0x3F7", "MSRValue": "0x601006", - "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 16 cycles. During th= is period the front-end delivered no uops. Available PDIST counters: 0", + "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 16 cycles. During th= is period the front-end delivered no uops. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -287,7 +287,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_2", "MSRIndex": "0x3F7", "MSRValue": "0x600206", - "PublicDescription": "Retired instructions that are fetched after = an interval where the front-end delivered no uops for a period of at least = 2 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0", + "PublicDescription": "Retired instructions that are fetched after = an interval where the front-end delivered no uops for a period of at least = 2 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -299,7 +299,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_256", "MSRIndex": "0x3F7", "MSRValue": "0x610006", - "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 25= 6 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 25= 6 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -311,7 +311,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1", "MSRIndex": "0x3F7", "MSRValue": "0x100206", - "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after the front-end had at least 1 bubble-slot for a per= iod of 2 cycles. A bubble-slot is an empty issue-pipeline slot while there = was no RAT stall. Available PDIST counters: 0", + "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after the front-end had at least 1 bubble-slot for a per= iod of 2 cycles. A bubble-slot is an empty issue-pipeline slot while there = was no RAT stall. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -323,7 +323,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_32", "MSRIndex": "0x3F7", "MSRValue": "0x602006", - "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 32 cycles. During th= is period the front-end delivered no uops. Available PDIST counters: 0", + "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 32 cycles. During th= is period the front-end delivered no uops. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -335,7 +335,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_4", "MSRIndex": "0x3F7", "MSRValue": "0x600406", - "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 4 = cycles which was not interrupted by a back-end stall. Available PDIST count= ers: 0", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 4 = cycles which was not interrupted by a back-end stall. Available PDIST count= ers: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -347,7 +347,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_512", "MSRIndex": "0x3F7", "MSRValue": "0x620006", - "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 51= 2 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 51= 2 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -359,7 +359,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_64", "MSRIndex": "0x3F7", "MSRValue": "0x604006", - "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 64= cycles which was not interrupted by a back-end stall. Available PDIST coun= ters: 0", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 64= cycles which was not interrupted by a back-end stall. Available PDIST coun= ters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -371,7 +371,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_8", "MSRIndex": "0x3F7", "MSRValue": "0x600806", - "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 8 cycles. During thi= s period the front-end delivered no uops. Available PDIST counters: 0", + "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 8 cycles. During thi= s period the front-end delivered no uops. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -383,7 +383,7 @@ "EventName": "FRONTEND_RETIRED.MISP_ANT", "MSRIndex": "0x3F7", "MSRValue": "0x9", - "PublicDescription": "ANT retired branches that got just mispredic= ted Available PDIST counters: 0", + "PublicDescription": "ANT retired branches that got just mispredic= ted Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x2", "Unit": "cpu_core" @@ -395,7 +395,7 @@ "EventName": "FRONTEND_RETIRED.MS_FLOWS", "MSRIndex": "0x3F7", "MSRValue": "0x8", - "PublicDescription": "Counts flows delivered by the Microcode Sequ= encer Available PDIST counters: 0", + "PublicDescription": "Counts flows delivered by the Microcode Sequ= encer Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -443,7 +443,7 @@ "EventName": "FRONTEND_RETIRED.STLB_MISS", "MSRIndex": "0x3F7", "MSRValue": "0x15", - "PublicDescription": "Counts retired Instructions that experienced= STLB (2nd level TLB) true miss. Available PDIST counters: 0", + "PublicDescription": "Counts retired Instructions that experienced= STLB (2nd level TLB) true miss. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -455,7 +455,7 @@ "EventName": "FRONTEND_RETIRED.UNKNOWN_BRANCH", "MSRIndex": "0x3F7", "MSRValue": "0x17", - "PublicDescription": "Number retired branch instructions that caus= ed the front-end to be resteered when it finds the instruction in a fetch l= ine. This is called Unknown Branch which occurs for the first time a branch= instruction is fetched or when the branch is not tracked by the BPU (Branc= h Prediction Unit) anymore. Available PDIST counters: 0", + "PublicDescription": "Number retired branch instructions that caus= ed the front-end to be resteered when it finds the instruction in a fetch l= ine. This is called Unknown Branch which occurs for the first time a branch= instruction is fetched or when the branch is not tracked by the BPU (Branc= h Prediction Unit) anymore. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/arrowlake/memory.json b/tools/p= erf/pmu-events/arch/x86/arrowlake/memory.json index fb8d4ac69bda..1e6360347c0f 100644 --- a/tools/perf/pmu-events/arch/x86/arrowlake/memory.json +++ b/tools/perf/pmu-events/arch/x86/arrowlake/memory.json @@ -163,7 +163,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_1024", "MSRIndex": "0x3F6", "MSRValue": "0x400", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 1024 cycles. Reporte= d latency may be longer than just the memory latency. Available PDIST count= ers: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 1024 cycles. Reporte= d latency may be longer than just the memory latency.", "SampleAfterValue": "53", "UMask": "0x1", "Unit": "cpu_core" @@ -176,7 +176,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_128", "MSRIndex": "0x3F6", "MSRValue": "0x80", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 128 cycles. Reported= latency may be longer than just the memory latency. Available PDIST counte= rs: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 128 cycles. Reported= latency may be longer than just the memory latency.", "SampleAfterValue": "1009", "UMask": "0x1", "Unit": "cpu_core" @@ -189,7 +189,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_16", "MSRIndex": "0x3F6", "MSRValue": "0x10", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 16 cycles. Reported = latency may be longer than just the memory latency. Available PDIST counter= s: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 16 cycles. Reported = latency may be longer than just the memory latency.", "SampleAfterValue": "20011", "UMask": "0x1", "Unit": "cpu_core" @@ -202,7 +202,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_2048", "MSRIndex": "0x3F6", "MSRValue": "0x800", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 2048 cycles. Reporte= d latency may be longer than just the memory latency. Available PDIST count= ers: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 2048 cycles. Reporte= d latency may be longer than just the memory latency.", "SampleAfterValue": "23", "UMask": "0x1", "Unit": "cpu_core" @@ -215,7 +215,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_256", "MSRIndex": "0x3F6", "MSRValue": "0x100", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 256 cycles. Reported= latency may be longer than just the memory latency. Available PDIST counte= rs: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 256 cycles. Reported= latency may be longer than just the memory latency.", "SampleAfterValue": "503", "UMask": "0x1", "Unit": "cpu_core" @@ -228,7 +228,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_32", "MSRIndex": "0x3F6", "MSRValue": "0x20", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 32 cycles. Reported = latency may be longer than just the memory latency. Available PDIST counter= s: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 32 cycles. Reported = latency may be longer than just the memory latency.", "SampleAfterValue": "100007", "UMask": "0x1", "Unit": "cpu_core" @@ -241,7 +241,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_4", "MSRIndex": "0x3F6", "MSRValue": "0x4", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 4 cycles. Reported l= atency may be longer than just the memory latency. Available PDIST counters= : 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 4 cycles. Reported l= atency may be longer than just the memory latency.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -254,7 +254,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_512", "MSRIndex": "0x3F6", "MSRValue": "0x200", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 512 cycles. Reported= latency may be longer than just the memory latency. Available PDIST counte= rs: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 512 cycles. Reported= latency may be longer than just the memory latency.", "SampleAfterValue": "101", "UMask": "0x1", "Unit": "cpu_core" @@ -267,7 +267,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_64", "MSRIndex": "0x3F6", "MSRValue": "0x40", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 64 cycles. Reported = latency may be longer than just the memory latency. Available PDIST counter= s: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 64 cycles. Reported = latency may be longer than just the memory latency.", "SampleAfterValue": "2003", "UMask": "0x1", "Unit": "cpu_core" @@ -280,7 +280,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_8", "MSRIndex": "0x3F6", "MSRValue": "0x8", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 8 cycles. Reported l= atency may be longer than just the memory latency. Available PDIST counters= : 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 8 cycles. Reported l= atency may be longer than just the memory latency.", "SampleAfterValue": "50021", "UMask": "0x1", "Unit": "cpu_core" @@ -291,7 +291,7 @@ "Data_LA": "1", "EventCode": "0xcd", "EventName": "MEM_TRANS_RETIRED.STORE_SAMPLE", - "PublicDescription": "Counts Retired memory accesses with at least= 1 store operation. This PEBS event is the precisely-distributed (PDist) tr= igger covering all stores uops for sampling by the PEBS Store Latency Facil= ity. The facility is described in Intel SDM Volume 3 section 19.9.8 Availab= le PDIST counters: 0", + "PublicDescription": "Counts Retired memory accesses with at least= 1 store operation. This PEBS event is the precisely-distributed (PDist) tr= igger covering all stores uops for sampling by the PEBS Store Latency Facil= ity. The facility is described in Intel SDM Volume 3 section 19.9.8 Availab= le PDIST counters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/arrowlake/pipeline.json b/tools= /perf/pmu-events/arch/x86/arrowlake/pipeline.json index 18a22368b99b..0651e2c4561e 100644 --- a/tools/perf/pmu-events/arch/x86/arrowlake/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/arrowlake/pipeline.json @@ -74,13 +74,14 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.ALL_BRANCHES", - "PublicDescription": "Counts all branch instructions retired. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts all branch instructions retired. Avai= lable PDIST counters: 0,1", "SampleAfterValue": "400009", "Unit": "cpu_core" }, { "BriefDescription": "Counts the total number of branch instruction= s retired for all branch types.", "Counter": "0,1,2,3,4,5,6,7", + "Errata": "ARL010, ARL011", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.ALL_BRANCHES", "PublicDescription": "Counts the total number of instructions in w= hich the instruction pointer (IP) of the processor is resteered due to a br= anch instruction and the branch instruction successfully retires. All bran= ch type instructions are accounted for.", @@ -101,7 +102,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.COND", - "PublicDescription": "Counts conditional branch instructions retir= ed. Available PDIST counters: 0", + "PublicDescription": "Counts conditional branch instructions retir= ed. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x111", "Unit": "cpu_core" @@ -109,6 +110,7 @@ { "BriefDescription": "Counts the number of retired JCC (Jump on Con= ditional Code) branch instructions retired, includes both taken and not tak= en branches.", "Counter": "0,1,2,3,4,5,6,7", + "Errata": "ARL011", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.COND", "SampleAfterValue": "200003", @@ -120,7 +122,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.COND_NTAKEN", - "PublicDescription": "Counts not taken branch instructions retired= . Available PDIST counters: 0", + "PublicDescription": "Counts not taken branch instructions retired= . Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x10", "Unit": "cpu_core" @@ -139,7 +141,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.COND_TAKEN", - "PublicDescription": "Counts taken conditional branch instructions= retired. Available PDIST counters: 0", + "PublicDescription": "Counts taken conditional branch instructions= retired. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x101", "Unit": "cpu_core" @@ -158,7 +160,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.COND_TAKEN_BWD", - "PublicDescription": "Counts taken backward conditional branch ins= tructions retired. Available PDIST counters: 0", + "PublicDescription": "Counts taken backward conditional branch ins= tructions retired. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x1", "Unit": "cpu_core" @@ -168,7 +170,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.COND_TAKEN_FWD", - "PublicDescription": "Counts taken forward conditional branch inst= ructions retired. Available PDIST counters: 0", + "PublicDescription": "Counts taken forward conditional branch inst= ructions retired. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x102", "Unit": "cpu_core" @@ -187,7 +189,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.FAR_BRANCH", - "PublicDescription": "Counts far branch instructions retired. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts far branch instructions retired. Avai= lable PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x40", "Unit": "cpu_core" @@ -195,6 +197,7 @@ { "BriefDescription": "Counts the number of far branch instructions = retired, includes far jump, far call and return, and interrupt call and ret= urn.", "Counter": "0,1,2,3,4,5,6,7", + "Errata": "ARL011", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.FAR_BRANCH", "SampleAfterValue": "200003", @@ -215,7 +218,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.INDIRECT", - "PublicDescription": "Counts near indirect branch instructions ret= ired excluding returns. TSX abort is an indirect branch. Available PDIST co= unters: 0", + "PublicDescription": "Counts near indirect branch instructions ret= ired excluding returns. TSX abort is an indirect branch. Available PDIST co= unters: 0,1", "SampleAfterValue": "100003", "UMask": "0x80", "Unit": "cpu_core" @@ -223,6 +226,7 @@ { "BriefDescription": "Counts the number of near indirect JMP and ne= ar indirect CALL branch instructions retired.", "Counter": "0,1,2,3,4,5,6,7", + "Errata": "ARL011", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.INDIRECT", "SampleAfterValue": "200003", @@ -241,6 +245,7 @@ { "BriefDescription": "Counts the number of near indirect CALL branc= h instructions retired.", "Counter": "0,1,2,3,4,5,6,7", + "Errata": "ARL011", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.INDIRECT_CALL", "SampleAfterValue": "200003", @@ -270,7 +275,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.NEAR_CALL", - "PublicDescription": "Counts both direct and indirect near call in= structions retired. Available PDIST counters: 0", + "PublicDescription": "Counts both direct and indirect near call in= structions retired. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x2", "Unit": "cpu_core" @@ -278,6 +283,7 @@ { "BriefDescription": "Counts the number of near CALL branch instruc= tions retired.", "Counter": "0,1,2,3,4,5,6,7", + "Errata": "ARL010, ARL011", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.NEAR_CALL", "SampleAfterValue": "200003", @@ -298,7 +304,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.NEAR_RETURN", - "PublicDescription": "Counts return instructions retired. Availabl= e PDIST counters: 0", + "PublicDescription": "Counts return instructions retired. Availabl= e PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x8", "Unit": "cpu_core" @@ -317,7 +323,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.NEAR_TAKEN", - "PublicDescription": "Counts taken branch instructions retired. Av= ailable PDIST counters: 0", + "PublicDescription": "Counts taken branch instructions retired. Av= ailable PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x20", "Unit": "cpu_core" @@ -325,6 +331,7 @@ { "BriefDescription": "Counts the number of near taken branch instru= ctions retired.", "Counter": "0,1,2,3,4,5,6,7", + "Errata": "ARL011", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.NEAR_TAKEN", "SampleAfterValue": "200003", @@ -372,7 +379,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.ALL_BRANCHES", - "PublicDescription": "Counts all the retired branch instructions t= hat were mispredicted by the processor. A branch misprediction occurs when = the processor incorrectly predicts the destination of the branch. When the= misprediction is discovered at execution, all the instructions executed in= the wrong (speculative) path must be discarded, and the processor must sta= rt fetching from the correct path. Available PDIST counters: 0", + "PublicDescription": "Counts all the retired branch instructions t= hat were mispredicted by the processor. A branch misprediction occurs when = the processor incorrectly predicts the destination of the branch. When the= misprediction is discovered at execution, all the instructions executed in= the wrong (speculative) path must be discarded, and the processor must sta= rt fetching from the correct path. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "Unit": "cpu_core" }, @@ -390,7 +397,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.ALL_BRANCHES_COST", - "PublicDescription": "All mispredicted branch instructions retired= . This precise event may be used to get the misprediction cost via the Reti= re_Latency field of PEBS. It fires on the instruction that immediately foll= ows the mispredicted branch. Available PDIST counters: 0", + "PublicDescription": "All mispredicted branch instructions retired= . This precise event may be used to get the misprediction cost via the Reti= re_Latency field of PEBS. It fires on the instruction that immediately foll= ows the mispredicted branch. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x44", "Unit": "cpu_core" @@ -409,7 +416,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND", - "PublicDescription": "Counts mispredicted conditional branch instr= uctions retired. Available PDIST counters: 0", + "PublicDescription": "Counts mispredicted conditional branch instr= uctions retired. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x111", "Unit": "cpu_core" @@ -428,7 +435,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_COST", - "PublicDescription": "Mispredicted conditional branch instructions= retired. This precise event may be used to get the misprediction cost via = the Retire_Latency field of PEBS. It fires on the instruction that immediat= ely follows the mispredicted branch. Available PDIST counters: 0", + "PublicDescription": "Mispredicted conditional branch instructions= retired. This precise event may be used to get the misprediction cost via = the Retire_Latency field of PEBS. It fires on the instruction that immediat= ely follows the mispredicted branch. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x151", "Unit": "cpu_core" @@ -438,7 +445,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_NTAKEN", - "PublicDescription": "Counts the number of conditional branch inst= ructions retired that were mispredicted and the branch direction was not ta= ken. Available PDIST counters: 0", + "PublicDescription": "Counts the number of conditional branch inst= ructions retired that were mispredicted and the branch direction was not ta= ken. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x10", "Unit": "cpu_core" @@ -448,7 +455,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_NTAKEN_COST", - "PublicDescription": "Mispredicted non-taken conditional branch in= structions retired. This precise event may be used to get the misprediction= cost via the Retire_Latency field of PEBS. It fires on the instruction tha= t immediately follows the mispredicted branch. Available PDIST counters: 0", + "PublicDescription": "Mispredicted non-taken conditional branch in= structions retired. This precise event may be used to get the misprediction= cost via the Retire_Latency field of PEBS. It fires on the instruction tha= t immediately follows the mispredicted branch. Available PDIST counters: 0,= 1", "SampleAfterValue": "400009", "UMask": "0x50", "Unit": "cpu_core" @@ -467,7 +474,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_TAKEN", - "PublicDescription": "Counts taken conditional mispredicted branch= instructions retired. Available PDIST counters: 0", + "PublicDescription": "Counts taken conditional mispredicted branch= instructions retired. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x101", "Unit": "cpu_core" @@ -486,7 +493,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_TAKEN_BWD", - "PublicDescription": "Counts taken backward conditional mispredict= ed branch instructions retired. Available PDIST counters: 0", + "PublicDescription": "Counts taken backward conditional mispredict= ed branch instructions retired. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x1", "Unit": "cpu_core" @@ -496,7 +503,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_TAKEN_BWD_COST", - "PublicDescription": "number of branch instructions retired that w= ere mispredicted and taken backward. This precise event may be used to get = the misprediction cost via the Retire_Latency field of PEBS. It fires on th= e instruction that immediately follows the mispredicted branch. Available P= DIST counters: 0", + "PublicDescription": "number of branch instructions retired that w= ere mispredicted and taken backward. This precise event may be used to get = the misprediction cost via the Retire_Latency field of PEBS. It fires on th= e instruction that immediately follows the mispredicted branch. Available P= DIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x8001", "Unit": "cpu_core" @@ -506,7 +513,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_TAKEN_COST", - "PublicDescription": "Mispredicted taken conditional branch instru= ctions retired. This precise event may be used to get the misprediction cos= t via the Retire_Latency field of PEBS. It fires on the instruction that im= mediately follows the mispredicted branch. Available PDIST counters: 0", + "PublicDescription": "Mispredicted taken conditional branch instru= ctions retired. This precise event may be used to get the misprediction cos= t via the Retire_Latency field of PEBS. It fires on the instruction that im= mediately follows the mispredicted branch. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x141", "Unit": "cpu_core" @@ -516,7 +523,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_TAKEN_FWD", - "PublicDescription": "Counts taken forward conditional mispredicte= d branch instructions retired. Available PDIST counters: 0", + "PublicDescription": "Counts taken forward conditional mispredicte= d branch instructions retired. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "Unit": "cpu_core" }, @@ -525,7 +532,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_TAKEN_FWD_COST", - "PublicDescription": "number of branch instructions retired that w= ere mispredicted and taken forward. This precise event may be used to get t= he misprediction cost via the Retire_Latency field of PEBS. It fires on the= instruction that immediately follows the mispredicted branch. Available PD= IST counters: 0", + "PublicDescription": "number of branch instructions retired that w= ere mispredicted and taken forward. This precise event may be used to get t= he misprediction cost via the Retire_Latency field of PEBS. It fires on the= instruction that immediately follows the mispredicted branch. Available PD= IST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x8002", "Unit": "cpu_core" @@ -544,7 +551,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.INDIRECT", - "PublicDescription": "Counts miss-predicted near indirect branch i= nstructions retired excluding returns. TSX abort is an indirect branch. Ava= ilable PDIST counters: 0", + "PublicDescription": "Counts miss-predicted near indirect branch i= nstructions retired excluding returns. TSX abort is an indirect branch. Ava= ilable PDIST counters: 0,1", "SampleAfterValue": "100003", "UMask": "0x80", "Unit": "cpu_core" @@ -572,7 +579,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.INDIRECT_CALL", - "PublicDescription": "Counts retired mispredicted indirect (near t= aken) CALL instructions, including both register and memory indirect. Avail= able PDIST counters: 0", + "PublicDescription": "Counts retired mispredicted indirect (near t= aken) CALL instructions, including both register and memory indirect. Avail= able PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x2", "Unit": "cpu_core" @@ -591,7 +598,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.INDIRECT_CALL_COST", - "PublicDescription": "Mispredicted indirect CALL retired. This pre= cise event may be used to get the misprediction cost via the Retire_Latency= field of PEBS. It fires on the instruction that immediately follows the mi= spredicted branch. Available PDIST counters: 0", + "PublicDescription": "Mispredicted indirect CALL retired. This pre= cise event may be used to get the misprediction cost via the Retire_Latency= field of PEBS. It fires on the instruction that immediately follows the mi= spredicted branch. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x42", "Unit": "cpu_core" @@ -601,7 +608,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.INDIRECT_COST", - "PublicDescription": "Mispredicted near indirect branch instructio= ns retired (excluding returns). This precise event may be used to get the m= isprediction cost via the Retire_Latency field of PEBS. It fires on the ins= truction that immediately follows the mispredicted branch. Available PDIST = counters: 0", + "PublicDescription": "Mispredicted near indirect branch instructio= ns retired (excluding returns). This precise event may be used to get the m= isprediction cost via the Retire_Latency field of PEBS. It fires on the ins= truction that immediately follows the mispredicted branch. Available PDIST = counters: 0,1", "SampleAfterValue": "100003", "UMask": "0xc0", "Unit": "cpu_core" @@ -620,7 +627,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.NEAR_TAKEN", - "PublicDescription": "Counts number of near branch instructions re= tired that were mispredicted and taken. Available PDIST counters: 0", + "PublicDescription": "Counts number of near branch instructions re= tired that were mispredicted and taken. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x20", "Unit": "cpu_core" @@ -639,7 +646,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.NEAR_TAKEN_COST", - "PublicDescription": "Mispredicted taken near branch instructions = retired. This precise event may be used to get the misprediction cost via t= he Retire_Latency field of PEBS. It fires on the instruction that immediate= ly follows the mispredicted branch. Available PDIST counters: 0", + "PublicDescription": "Mispredicted taken near branch instructions = retired. This precise event may be used to get the misprediction cost via t= he Retire_Latency field of PEBS. It fires on the instruction that immediate= ly follows the mispredicted branch. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x60", "Unit": "cpu_core" @@ -649,7 +656,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.RET", - "PublicDescription": "This is a non-precise version (that is, does= not use PEBS) of the event that counts mispredicted return instructions re= tired. Available PDIST counters: 0", + "PublicDescription": "This is a non-precise version (that is, does= not use PEBS) of the event that counts mispredicted return instructions re= tired. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x8", "Unit": "cpu_core" @@ -677,7 +684,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.RET_COST", - "PublicDescription": "Mispredicted ret instructions retired. This = precise event may be used to get the misprediction cost via the Retire_Late= ncy field of PEBS. It fires on the instruction that immediately follows the= mispredicted branch. Available PDIST counters: 0", + "PublicDescription": "Mispredicted ret instructions retired. This = precise event may be used to get the misprediction cost via the Retire_Late= ncy field of PEBS. It fires on the instruction that immediately follows the= mispredicted branch. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x48", "Unit": "cpu_core" @@ -1046,7 +1053,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc0", "EventName": "INST_RETIRED.ANY_P", - "PublicDescription": "Counts the number of X86 instructions retire= d - an Architectural PerfMon event. Counting continues during hardware inte= rrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is co= unted by a designated fixed counter freeing up programmable counters to cou= nt other events. INST_RETIRED.ANY_P is counted by a programmable counter. A= vailable PDIST counters: 0", + "PublicDescription": "Counts the number of X86 instructions retire= d - an Architectural PerfMon event. Counting continues during hardware inte= rrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is co= unted by a designated fixed counter freeing up programmable counters to cou= nt other events. INST_RETIRED.ANY_P is counted by a programmable counter. A= vailable PDIST counters: 0,1", "SampleAfterValue": "2000003", "Unit": "cpu_core" }, @@ -1063,7 +1070,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc0", "EventName": "INST_RETIRED.BR_FUSED", - "PublicDescription": "retired macro-fused uops when there is a bra= nch in the macro-fused pair (the two instructions that got macro-fused coun= t once in this pmon) Available PDIST counters: 0", + "PublicDescription": "retired macro-fused uops when there is a bra= nch in the macro-fused pair (the two instructions that got macro-fused coun= t once in this pmon) Available PDIST counters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x10", "Unit": "cpu_core" @@ -1073,7 +1080,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc0", "EventName": "INST_RETIRED.MACRO_FUSED", - "PublicDescription": "INST_RETIRED.MACRO_FUSED Available PDIST cou= nters: 0", + "PublicDescription": "INST_RETIRED.MACRO_FUSED Available PDIST cou= nters: 0,1", "SampleAfterValue": "2000003", "UMask": "0x30", "Unit": "cpu_core" @@ -1083,7 +1090,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc0", "EventName": "INST_RETIRED.NOP", - "PublicDescription": "Counts all retired NOP or ENDBR32/64 or PREF= ETCHIT0/1 instructions Available PDIST counters: 0", + "PublicDescription": "Counts all retired NOP or ENDBR32/64 or PREF= ETCHIT0/1 instructions Available PDIST counters: 0,1", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1102,7 +1109,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc0", "EventName": "INST_RETIRED.REP_ITERATION", - "PublicDescription": "Number of iterations of Repeat (REP) string = retired instructions such as MOVS, CMPS, and SCAS. Each has a byte, word, a= nd doubleword version and string instructions can be repeated using a repet= ition prefix, REP, that allows their architectural execution to be repeated= a number of times as specified by the RCX register. Note the number of ite= rations is implementation-dependent. Available PDIST counters: 0", + "PublicDescription": "Number of iterations of Repeat (REP) string = retired instructions such as MOVS, CMPS, and SCAS. Each has a byte, word, a= nd doubleword version and string instructions can be repeated using a repet= ition prefix, REP, that allows their architectural execution to be repeated= a number of times as specified by the RCX register. Note the number of ite= rations is implementation-dependent. Available PDIST counters: 0,1", "SampleAfterValue": "2000003", "UMask": "0x8", "Unit": "cpu_core" @@ -1348,6 +1355,15 @@ "UMask": "0x88", "Unit": "cpu_core" }, + { + "BriefDescription": "Counts the number of times a load got early b= locked due to preceding store operation with unknown address or unknown dat= a. Excluding in-line (immediate) wakeups", + "Counter": "0,1,2,3,4,5,6,7,8,9", + "EventCode": "0x03", + "EventName": "LD_BLOCKS.STORE_EARLY", + "SampleAfterValue": "100003", + "UMask": "0xa1", + "Unit": "cpu_core" + }, { "BriefDescription": "Counts the number of occurrences a retired lo= ad gets blocked because its address partially overlaps with an older store = (size mismatch) - unknown_sta/bad_forward", "Counter": "0,1,2,3,4,5,6,7", @@ -1563,7 +1579,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xe4", "EventName": "MISC_RETIRED.LBR_INSERTS", - "PublicDescription": "LBR record is inserted Available PDIST count= ers: 0", + "PublicDescription": "LBR record is inserted Available PDIST count= ers: 0,1", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1929,7 +1945,7 @@ }, { "BriefDescription": "Fixed Counter: Counts the number of retiremen= t slots not consumed due to front end stalls.", - "Counter": "37", + "Counter": "Fixed counter 5", "EventName": "TOPDOWN_FE_BOUND.ALL", "SampleAfterValue": "1000003", "UMask": "0x6", @@ -2126,7 +2142,7 @@ }, { "BriefDescription": "Fixed Counter: Counts the number of consumed = retirement slots.", - "Counter": "38", + "Counter": "Fixed counter 6", "EventName": "TOPDOWN_RETIRING.ALL", "SampleAfterValue": "1000003", "UMask": "0x7", diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-ev= ents/arch/x86/mapfile.csv index d0a17905c74e..12017f568606 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -1,7 +1,7 @@ Family-model,Version,Filename,EventType GenuineIntel-6-(97|9A|B7|BA|BF),v1.33,alderlake,core GenuineIntel-6-BE,v1.33,alderlaken,core -GenuineIntel-6-C[56],v1.09,arrowlake,core +GenuineIntel-6-C[56],v1.12,arrowlake,core GenuineIntel-6-(1C|26|27|35|36),v5,bonnell,core GenuineIntel-6-(3D|47),v30,broadwell,core GenuineIntel-6-56,v12,broadwellde,core --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A6B2921ADAE for ; Sat, 19 Jul 2025 03:45:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896734; cv=none; b=YxNWLukDmK8/9AIZjx5BrAP3edj8yj8/Am15GCc4Dv3mRXznT8n43E0lVdfSckW1VMsU30LVb9aiJaxpmR0/P0ed2hyyw88LzEj+3yUXeN5MZSnejuQ0DoeZ2xUyVCScYCv2U407a5CjooAD1yIW8flHWqfPhPz/ViJOM938FyY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896734; c=relaxed/simple; bh=jmLkC5TZM2YjOSI+N40sX4WdopSUFI1vlbpj4HW04yY=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=uKv5H0QopY9ZTUVAKPdF69QEdB+fsoALY0i/PYum0KwfnidaN9V1Gd3tDetclQVd/krFwtTC7ruXn9t2bpB45eGl0cP4JEI3xDn4YlaiJ3KIsQX1oyNCbh+mEckev11N25yaS/aPuCsEmRRgxWxzHdSfe8xIqC/1oM/5QnS+MzI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=yH+KjjMW; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="yH+KjjMW" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-3141a9a6888so2471822a91.3 for ; Fri, 18 Jul 2025 20:45:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896731; x=1753501531; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=xKB/1q/ZKai1YtE4X+rn/QCKNZwFa0Jrgf8XK1OI1hE=; b=yH+KjjMWvo88YK7spN60u/6DsD44EXdqdrvoDTGgu2cAeZaqNWa1CBLQZwkLG7FCN2 ShZMVlMK+GYVcu9j7D8PaSd0KmojQXsIjnDK73OXKsxiX6EmUH+du45jNR/A9njZL1hB Y5ADzTT2L3cSqGIjS2ieqN2lW/CdnniYX0fh3mBTU/3FpzCt0+fenoklClbhr84LKAyi L6E1wh8fuQ/KtdFA0Rt3/y4/Fsot0WNPyZexMBCxOPNFjKF/zw5Zy2k8j0Z/xHMSDJ8I rfn5F+4FcGrHODTqtiUBboXC7bjP6Js7PZ/I4l8SFXS946a5XcPfcldE/GtyBcorQ1TQ Icjw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896731; x=1753501531; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=xKB/1q/ZKai1YtE4X+rn/QCKNZwFa0Jrgf8XK1OI1hE=; b=wabXU7+M2zsbCxavkZLpYtGwRia0M4zZ17AIX9mkAwa0Axn/0k3Y4vSuGE/u+MFYnd Zu55KXSK2kkZ++7bwY21W/awCoR4h5FNhtyUfAqFqGlsqjfAlQw1N01kGlLWdDVbtxJi N3b7CygB910bWoqGhKDtDgQxZLAzjLHZQuc7IG37Z1A9lRDAbFOWv5pEi3S9agClna0U 8Sx+s2ZHwNu+byNH0g+IsFoWjBnYUxR2hauDK8TUAvuGE5L2wSDSm8sGt4SrRT/ZI3ya w8PLyCp9x2+S4byq8Z/MwAp7WjleoETJsOYp/ln7C0UFsuXSMrLnmqZZEvDfz9dmY557 qoWQ== X-Forwarded-Encrypted: i=1; AJvYcCXKqbLAVXs1DmWochhFpOUGXsyeYiG28BgEsZgZIwCSi+Mecs7eJ18WvT9o9PhObcn65MKKSkA1WS9Hs9o=@vger.kernel.org X-Gm-Message-State: AOJu0YwDP8zm7/47eB2zEXsUbUM378gLuinCx1hSTxbUz6EUJAxQu7El ijlQJupCTiUHBQDoQuXV+WXQ6B9//tbT8ma8f3zahg425bJxYG322WNzXCq8NL+1s1P9n++E3JK odCkUMzGqsQ== X-Google-Smtp-Source: AGHT+IHN/0/5FktNBmF5BDcRSEkUtHTg38UqeInvYMqLGghUY80XMw5norVYO9bHrLaICW31QTqRsYNiL+vE X-Received: from pjee13.prod.google.com ([2002:a17:90b:578d:b0:311:f699:df0a]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90a:d44c:b0:2f8:34df:5652 with SMTP id 98e67ed59e1d1-31c9e761788mr18677584a91.21.1752896731013; Fri, 18 Jul 2025 20:45:31 -0700 (PDT) Date: Fri, 18 Jul 2025 20:44:59 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-4-irogers@google.com> Subject: [PATCH v1 03/19] perf vendor events: Update broadwell metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update metrics from TMA 5.0 to 5.1. Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/broadwell/bdw-metrics.json | 30 ++++++++--------- .../arch/x86/broadwellde/bdwde-metrics.json | 30 ++++++++--------- .../arch/x86/broadwellx/bdx-metrics.json | 33 +++++++++---------- 3 files changed, 42 insertions(+), 51 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json b/to= ols/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json index 89750117a7f6..1d8e910f5961 100644 --- a/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json +++ b/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json @@ -1,49 +1,49 @@ [ { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per core", - "MetricExpr": "cstate_core@c3\\-residency@ / TSC", + "MetricExpr": "cstate_core@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" @@ -80,7 +80,6 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution ports for ALU operations.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_0 + UOPS_DISPATCHED_PORT= .PORT_1 + UOPS_DISPATCHED_PORT.PORT_5 + UOPS_DISPATCHED_PORT.PORT_6) / tma_= info_thread_slots", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_alu_op_utilization", @@ -98,7 +97,6 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "1 - (tma_frontend_bound + tma_bad_speculation + tma= _retiring)", "MetricGroup": "BvOB;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", @@ -139,7 +137,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU retired uops originated from CISC (complex instruction set computer) in= struction", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)", "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_gro= up", "MetricName": "tma_cisc", @@ -640,7 +637,7 @@ "MetricThreshold": "tma_info_memory_tlb_page_walks_utilization > 0= .5" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (cpu@UOPS_EXECUTED.CORE\\,cm= ask\\=3D1@ / 2 if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -653,7 +650,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -665,7 +662,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, @@ -854,7 +851,6 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port for Load operations", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_2 + UOPS_DISPATCHED_PORT= .PORT_3 + UOPS_DISPATCHED_PORT.PORT_7 - UOPS_DISPATCHED_PORT.PORT_4) / (2 *= tma_info_core_core_clks)", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_load_op_utilization", @@ -1032,7 +1028,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed no uops on any execution port (Logical Processor cycles since ICL= , Physical Core cycles otherwise)", - "MetricExpr": "(cpu@UOPS_EXECUTED.CORE\\,inv\\,cmask\\=3D1@ / 2 if= #SMT_on else (CYCLE_ACTIVITY.STALLS_TOTAL - (RS_EVENTS.EMPTY_CYCLES if tma= _fetch_latency > 0.1 else 0)) / tma_info_core_core_clks)", + "MetricExpr": "(cpu@UOPS_EXECUTED.CORE\\,inv\\,cmask\\=3D1@ / 2 if= #SMT_on else CYCLE_ACTIVITY.STALLS_TOTAL - (RS_EVENTS.EMPTY_CYCLES if tma_= fetch_latency > 0.1 else 0)) / tma_info_core_core_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_0", "MetricThreshold": "tma_ports_utilized_0 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -1041,7 +1037,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D2@) / 2 if #SMT_on else (UOPS_EXECUTED.CYCLES_GE_= 1_UOP_EXEC - UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC) / tma_info_core_core_clks= )", + "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D2@) / 2 if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1= _UOP_EXEC - UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC) / tma_info_core_core_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_1", "MetricThreshold": "tma_ports_utilized_1 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -1050,7 +1046,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D3@) / 2 if #SMT_on else (UOPS_EXECUTED.CYCLES_GE_= 2_UOPS_EXEC - UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC) / tma_info_core_core_clk= s)", + "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D3@) / 2 if #SMT_on else UOPS_EXECUTED.CYCLES_GE_2= _UOPS_EXEC - UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC) / tma_info_core_core_clks= ", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", "MetricThreshold": "tma_ports_utilized_2 > 0.15 & (tma_ports_utili= zation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/bdwde-metrics.json = b/tools/perf/pmu-events/arch/x86/broadwellde/bdwde-metrics.json index 81175f0f2603..a5e408ca46a7 100644 --- a/tools/perf/pmu-events/arch/x86/broadwellde/bdwde-metrics.json +++ b/tools/perf/pmu-events/arch/x86/broadwellde/bdwde-metrics.json @@ -1,49 +1,49 @@ [ { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per core", - "MetricExpr": "cstate_core@c3\\-residency@ / TSC", + "MetricExpr": "cstate_core@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" @@ -80,7 +80,6 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution ports for ALU operations.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_0 + UOPS_DISPATCHED_PORT= .PORT_1 + UOPS_DISPATCHED_PORT.PORT_5 + UOPS_DISPATCHED_PORT.PORT_6) / tma_= info_thread_slots", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_alu_op_utilization", @@ -98,7 +97,6 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "1 - (tma_frontend_bound + tma_bad_speculation + tma= _retiring)", "MetricGroup": "BvOB;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", @@ -139,7 +137,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU retired uops originated from CISC (complex instruction set computer) in= struction", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)", "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_gro= up", "MetricName": "tma_cisc", @@ -632,7 +629,7 @@ "MetricThreshold": "tma_info_memory_tlb_page_walks_utilization > 0= .5" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (cpu@UOPS_EXECUTED.CORE\\,cm= ask\\=3D1@ / 2 if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -645,7 +642,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -657,7 +654,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, @@ -846,7 +843,6 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port for Load operations", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_2 + UOPS_DISPATCHED_PORT= .PORT_3 + UOPS_DISPATCHED_PORT.PORT_7 - UOPS_DISPATCHED_PORT.PORT_4) / (2 *= tma_info_core_core_clks)", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_load_op_utilization", @@ -1021,7 +1017,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed no uops on any execution port (Logical Processor cycles since ICL= , Physical Core cycles otherwise)", - "MetricExpr": "(cpu@UOPS_EXECUTED.CORE\\,inv\\,cmask\\=3D1@ / 2 if= #SMT_on else (CYCLE_ACTIVITY.STALLS_TOTAL - (RS_EVENTS.EMPTY_CYCLES if tma= _fetch_latency > 0.1 else 0)) / tma_info_core_core_clks)", + "MetricExpr": "(cpu@UOPS_EXECUTED.CORE\\,inv\\,cmask\\=3D1@ / 2 if= #SMT_on else CYCLE_ACTIVITY.STALLS_TOTAL - (RS_EVENTS.EMPTY_CYCLES if tma_= fetch_latency > 0.1 else 0)) / tma_info_core_core_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_0", "MetricThreshold": "tma_ports_utilized_0 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -1030,7 +1026,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D2@) / 2 if #SMT_on else (UOPS_EXECUTED.CYCLES_GE_= 1_UOP_EXEC - UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC) / tma_info_core_core_clks= )", + "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D2@) / 2 if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1= _UOP_EXEC - UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC) / tma_info_core_core_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_1", "MetricThreshold": "tma_ports_utilized_1 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -1039,7 +1035,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D3@) / 2 if #SMT_on else (UOPS_EXECUTED.CYCLES_GE_= 2_UOPS_EXEC - UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC) / tma_info_core_core_clk= s)", + "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D3@) / 2 if #SMT_on else UOPS_EXECUTED.CYCLES_GE_2= _UOPS_EXEC - UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC) / tma_info_core_core_clks= ", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", "MetricThreshold": "tma_ports_utilized_2 > 0.15 & (tma_ports_utili= zation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json b/t= ools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json index 5d06a3f72be2..5b83b040060c 100644 --- a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json @@ -1,49 +1,49 @@ [ { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per core", - "MetricExpr": "cstate_core@c3\\-residency@ / TSC", + "MetricExpr": "cstate_core@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" @@ -282,7 +282,6 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution ports for ALU operations.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_0 + UOPS_DISPATCHED_PORT= .PORT_1 + UOPS_DISPATCHED_PORT.PORT_5 + UOPS_DISPATCHED_PORT.PORT_6) / tma_= info_thread_slots", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_alu_op_utilization", @@ -300,7 +299,6 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "1 - (tma_frontend_bound + tma_bad_speculation + tma= _retiring)", "MetricGroup": "BvOB;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", @@ -341,7 +339,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU retired uops originated from CISC (complex instruction set computer) in= struction", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)", "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_gro= up", "MetricName": "tma_cisc", @@ -842,7 +839,7 @@ "MetricThreshold": "tma_info_memory_tlb_page_walks_utilization > 0= .5" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (cpu@UOPS_EXECUTED.CORE\\,cm= ask\\=3D1@ / 2 if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -855,7 +852,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -867,7 +864,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, @@ -907,6 +904,7 @@ }, { "BriefDescription": "Average number of parallel data read requests= to external memory", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "UNC_C_TOR_OCCUPANCY.MISS_OPCODE@filter_opc\\=3D0x18= 2@ / UNC_C_TOR_OCCUPANCY.MISS_OPCODE@filter_opc\\=3D0x182\\,thresh\\=3D1@", "MetricGroup": "Mem;MemoryBW;SoC", "MetricName": "tma_info_system_mem_parallel_reads", @@ -1076,7 +1074,6 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port for Load operations", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_2 + UOPS_DISPATCHED_PORT= .PORT_3 + UOPS_DISPATCHED_PORT.PORT_7 - UOPS_DISPATCHED_PORT.PORT_4) / (2 *= tma_info_core_core_clks)", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_load_op_utilization", @@ -1086,6 +1083,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling loads from local memory", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "200 * (MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM * (= 1 + MEM_LOAD_UOPS_RETIRED.HIT_LFB / (MEM_LOAD_UOPS_RETIRED.L2_HIT + MEM_LOA= D_UOPS_RETIRED.L3_HIT + MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT + MEM_LOAD_UO= PS_L3_HIT_RETIRED.XSNP_HITM + MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS + MEM_= LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM + MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOTE= _DRAM + MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOTE_HITM + MEM_LOAD_UOPS_L3_MISS_R= ETIRED.REMOTE_FWD))) / tma_info_thread_clks", "MetricGroup": "Server;TopdownL5;tma_L5_group;tma_mem_latency_grou= p", "MetricName": "tma_local_mem", @@ -1263,7 +1261,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed no uops on any execution port (Logical Processor cycles since ICL= , Physical Core cycles otherwise)", - "MetricExpr": "(cpu@UOPS_EXECUTED.CORE\\,inv\\,cmask\\=3D1@ / 2 if= #SMT_on else (CYCLE_ACTIVITY.STALLS_TOTAL - (RS_EVENTS.EMPTY_CYCLES if tma= _fetch_latency > 0.1 else 0)) / tma_info_core_core_clks)", + "MetricExpr": "(cpu@UOPS_EXECUTED.CORE\\,inv\\,cmask\\=3D1@ / 2 if= #SMT_on else CYCLE_ACTIVITY.STALLS_TOTAL - (RS_EVENTS.EMPTY_CYCLES if tma_= fetch_latency > 0.1 else 0)) / tma_info_core_core_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_0", "MetricThreshold": "tma_ports_utilized_0 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -1272,7 +1270,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D2@) / 2 if #SMT_on else (UOPS_EXECUTED.CYCLES_GE_= 1_UOP_EXEC - UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC) / tma_info_core_core_clks= )", + "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D2@) / 2 if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1= _UOP_EXEC - UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC) / tma_info_core_core_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_1", "MetricThreshold": "tma_ports_utilized_1 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -1281,7 +1279,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D3@) / 2 if #SMT_on else (UOPS_EXECUTED.CYCLES_GE_= 2_UOPS_EXEC - UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC) / tma_info_core_core_clk= s)", + "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D3@) / 2 if #SMT_on else UOPS_EXECUTED.CYCLES_GE_2= _UOPS_EXEC - UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC) / tma_info_core_core_clks= ", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", "MetricThreshold": "tma_ports_utilized_2 > 0.15 & (tma_ports_utili= zation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -1308,6 +1306,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling loads from remote memory", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "310 * (MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOTE_DRAM * = (1 + MEM_LOAD_UOPS_RETIRED.HIT_LFB / (MEM_LOAD_UOPS_RETIRED.L2_HIT + MEM_LO= AD_UOPS_RETIRED.L3_HIT + MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT + MEM_LOAD_U= OPS_L3_HIT_RETIRED.XSNP_HITM + MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS + MEM= _LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM + MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOT= E_DRAM + MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOTE_HITM + MEM_LOAD_UOPS_L3_MISS_= RETIRED.REMOTE_FWD))) / tma_info_thread_clks", "MetricGroup": "Server;Snoop;TopdownL5;tma_L5_group;tma_mem_latenc= y_group", "MetricName": "tma_remote_mem", --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-oa1-f73.google.com (mail-oa1-f73.google.com [209.85.160.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A0831217704 for ; Sat, 19 Jul 2025 03:45:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896737; cv=none; b=Dr3G8yJUGlCoMJ89ASCM7klDl4lzi4yYaBEeyzL5KaRtLr4sxskSYBxhQpk+85ujIW3EO0vzx4iv278YbyYeeEo6ciZs3DLc+SMouQt4GVYvt6d7YihX4yzxIu8wr1A3Wj+exfRAOmcl2n7XbhL3EaeVG4D6JdahicSO0BLJgRA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896737; c=relaxed/simple; bh=VKlDjylJylqoSBYOYrqiof5ny+N/VWevgR/t3ExwWpw=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=AvEyqpFMO3sirm/1wdizMbsDcaPo4itTwBNoP17iVhRmzx7v3RL7I3HG/yvCv+YPue7lMdULsxi4pJ8o46MgzUdN5dJlo5hyMkKHMIZcCK6XrgYjemUjIwFKqf303aXYMA48cLBTBg/Y+obAPEmbqi1sKXcVBrWOVoZS/64en5U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Y8J8yeHW; arc=none smtp.client-ip=209.85.160.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Y8J8yeHW" Received: by mail-oa1-f73.google.com with SMTP id 586e51a60fabf-2d9e7fbfedaso3244841fac.3 for ; Fri, 18 Jul 2025 20:45:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896733; x=1753501533; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=oNUh2QquWbLWrYJY8NRy06UCvH+kyDKsN2i3f1JdJjM=; b=Y8J8yeHWWCwf1fqoMiEVI4OMGESq1fYXByR4PXVW5BBJmi1ed4iRI5Em0aqv5GleLc H7gcbHq6RUidSE5xhtXbHO1Lnd9QGSBGcKs0sipT8XPn0FfvR3xVQJACu/M1toX2afw3 ISpRXetlkz58tZjc544zx79w1+Smu/YkY0ZyPrYP+aavKkeKOAPiouEc5mKsjiQLzhSG mMzwoDcb4vLksbBi7mVnLJ9+ZXHO1aMpvnY5Ux63jK0cYBSOm754vAn5CkPt30n56i5u rEt80nkt9KpzoCMCI/lrJfPeiDPFp57ltrnVVXtEQom/VcE8YX0EVLMujMkbIfjC5U0y AItw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896733; x=1753501533; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=oNUh2QquWbLWrYJY8NRy06UCvH+kyDKsN2i3f1JdJjM=; b=UT6sZgPPiS1iGZV/yMpcsdf/Udm+ESMZ167uUCM78PW3Om5jvWMm+Bu8oHgirXUppK G3GmS5YSioiIzNChkKX1Ubpv0k+rbeBAnwHk/qxkzPog2b1Uj2rYWQ+BBErJQUktdQ4t YZkOLlqrPXK3/HzXovwZGj9dGyEGO5Rs2G/KpzpJ89EIhhdqShfc1mRiRgj9N74SQjmP ij7JG49c9or6/hCHYRmX4jL4tIQvud/yopfAAqOzHaT0hve6Xb8MYkcLivXlH8D6w4p6 a3GiuO7k1YcjBwQ4NaJjB3gHNMnITKIu/m9sI/+qzJboo5ux/K2oq2IC83IBLRTlzCZb rv4Q== X-Forwarded-Encrypted: i=1; AJvYcCVkotR90Deya26qlUTkGbfmceN2GQay+tLXhreuqDdjeG3U0hf8aVOg1kFYNfXQ9Lxz6DHea6284lVcDPk=@vger.kernel.org X-Gm-Message-State: AOJu0YzYVLFOiRjFLVCIiJQburrdEphlsOl27WH0x+QH/+CZiN9EYN8/ a8ka67G/PgM/ITXy2bF3xh9h23Y30nOzHwBVTromkSbhA7lrKkTV6Gr9EqmyNNp/je0pRsfOi4m ClcW4KBp2IQ== X-Google-Smtp-Source: AGHT+IEPvE9DSMOkCWK1i+gxaiigSVX6UOwH6PfFh12zYCH3Wkzn8dXS2Hvk3BD5l/vGrXygSmD0pvIkRF37 X-Received: from oabqz6.prod.google.com ([2002:a05:6871:60c6:b0:297:2777:a7bb]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6871:d617:b0:2ff:a513:26dd with SMTP id 586e51a60fabf-2ffb21fe9admr11674143fac.6.1752896732768; Fri, 18 Jul 2025 20:45:32 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:00 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-5-irogers@google.com> Subject: [PATCH v1 04/19] perf vendor events: Update cascadelakex metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update metrics from TMA 5.0 to 5.1. Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/cascadelakex/clx-metrics.json | 139 +++++++++++++----- 1 file changed, 100 insertions(+), 39 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json b= /tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json index 6485b565acbc..2e50a91b6728 100644 --- a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json @@ -1,49 +1,49 @@ [ { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per core", - "MetricExpr": "cstate_core@c3\\-residency@ / TSC", + "MetricExpr": "cstate_core@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" @@ -319,6 +319,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution ports for ALU operations.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_0 + UOPS_DISPATCHED_PORT= .PORT_1 + UOPS_DISPATCHED_PORT.PORT_5 + UOPS_DISPATCHED_PORT.PORT_6) / tma_= info_thread_slots", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_alu_op_utilization", @@ -356,6 +357,7 @@ }, { "BriefDescription": "Total pipeline cost of instruction fetch rela= ted bottlenecks by large code footprint programs (i-side cache; TLB and BTB= misses)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_ic= ache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switch= es + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)", "MetricGroup": "BigFootprint;BvBC;Fed;Frontend;IcMiss;MemoryTLB", "MetricName": "tma_bottleneck_big_code", @@ -369,32 +371,36 @@ "MetricThreshold": "tma_bottleneck_branching_overhead > 5", "PublicDescription": "Total pipeline cost of instructions used for= program control-flow - a subset of the Retiring category in TMA. Examples = include function calls; loops and alignments. (A lower bound)" }, + { + "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", + "MetricGroup": "BvCB;Cor;tma_issueComp", + "MetricName": "tma_bottleneck_compute_bound_est", + "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", + "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " + }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Bandwidth related bottlenecks", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load + tma_= fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + = tma_store_fwd_blk)))", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_cx= l_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_late= ncy)) + tma_memory_bound * (tma_l3_bound / (tma_cxl_mem_bound + tma_dram_bo= und + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma= _sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency = + tma_sq_full)) + tma_memory_bound * (tma_l1_bound / (tma_cxl_mem_bound + t= ma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_boun= d)) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l= 1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_b= lk)))", "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", - "MetricName": "tma_bottleneck_cache_memory_bandwidth", - "MetricThreshold": "tma_bottleneck_cache_memory_bandwidth > 20", + "MetricName": "tma_bottleneck_data_cache_memory_bandwidth", + "MetricThreshold": "tma_bottleneck_data_cache_memory_bandwidth > 2= 0", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_= system_dram_bw_use, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Latency related bottlenecks", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l= 1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_b= lk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + = tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_lock_latency / (tma_= 4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma= _lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * = (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_boun= d + tma_store_bound)) * (tma_split_loads / (tma_4k_aliasing + tma_dtlb_load= + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_l= oads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * = (tma_split_stores / (tma_dtlb_store + tma_false_sharing + tma_split_stores = + tma_store_latency)) + tma_memory_bound * (tma_store_bound / (tma_dram_bou= nd + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_= store_latency / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tm= a_store_latency)))", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_cx= l_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latenc= y)) + tma_memory_bound * (tma_l3_bound / (tma_cxl_mem_bound + tma_dram_boun= d + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l= 3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_lat= ency + tma_sq_full)) + tma_memory_bound * tma_l2_bound / (tma_cxl_mem_bound= + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_= bound) + tma_memory_bound * (tma_l1_bound / (tma_cxl_mem_bound + tma_dram_b= ound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tm= a_l1_latency_dependency / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + = tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_= fwd_blk)) + tma_memory_bound * (tma_l1_bound / (tma_cxl_mem_bound + tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * = (tma_lock_latency / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1= _latency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_bl= k)) + tma_memory_bound * (tma_l1_bound / (tma_cxl_mem_bound + tma_dram_boun= d + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_s= plit_loads / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_latenc= y_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + t= ma_memory_bound * (tma_store_bound / (tma_cxl_mem_bound + tma_dram_bound + = tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_split= _stores / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_stor= e_latency)) + tma_memory_bound * (tma_store_bound / (tma_cxl_mem_bound + tm= a_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound= )) * (tma_store_latency / (tma_dtlb_store + tma_false_sharing + tma_split_s= tores + tma_store_latency)))", "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", - "MetricName": "tma_bottleneck_cache_memory_latency", - "MetricThreshold": "tma_bottleneck_cache_memory_latency > 20", + "MetricName": "tma_bottleneck_data_cache_memory_latency", + "MetricThreshold": "tma_bottleneck_data_cache_memory_latency > 20", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_= mem_latency" }, - { - "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", - "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", - "MetricGroup": "BvCB;Cor;tma_issueComp", - "MetricName": "tma_bottleneck_compute_bound_est", - "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", - "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " - }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks (when the front-end could not sustain operations = delivery to the back-end)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - tma_mi= crocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) *= (tma_assists / tma_microcode_sequencer) * tma_fetch_latency * (tma_ms_swit= ches + tma_branch_resteers * (tma_clears_resteers + tma_mispredicts_resteer= s * (10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_misp= redicts)) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_b= ranches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + t= ma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_bottleneck_big_code", "MetricGroup": "BvFB;Fed;FetchBW;Frontend", "MetricName": "tma_bottleneck_instruction_fetch_bw", @@ -402,6 +408,7 @@ }, { "BriefDescription": "Total pipeline cost of irregular execution (e= .g", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_microcode_sequencer / (tma_few_uops_inst= ructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequence= r) * tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_clea= rs_resteers + tma_mispredicts_resteers * (10 * tma_microcode_sequencer * tm= a_other_mispredicts / tma_branch_mispredicts)) / (tma_clears_resteers + tma= _mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma= _dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_swit= ches) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_m= ispredicts * tma_branch_mispredicts + tma_machine_clears * tma_other_nukes = / tma_other_nukes + tma_core_bound * (tma_serializing_operation + tma_core_= bound * RS_EVENTS.EMPTY_CYCLES / tma_info_thread_clks * tma_ports_utilized_= 0) / (tma_divider + tma_ports_utilization + tma_serializing_operation) + tm= a_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequence= r) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", "MetricName": "tma_bottleneck_irregular_overhead", @@ -410,7 +417,8 @@ }, { "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", - "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_m= emory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + = tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tm= a_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + = tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound= / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store= _bound)) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_spli= t_stores + tma_store_latency)))", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_m= emory_bound, tma_cxl_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bou= nd + tma_l3_bound + tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, = tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency += tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_boun= d * (tma_store_bound / (tma_cxl_mem_bound + tma_dram_bound + tma_l1_bound += tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_dtlb_store / (tma_d= tlb_store + tma_false_sharing + tma_split_stores + tma_store_latency)))", "MetricGroup": "BvMT;Mem;MemoryTLB;Offcore;tma_issueTLB", "MetricName": "tma_bottleneck_memory_data_tlbs", "MetricThreshold": "tma_bottleneck_memory_data_tlbs > 20", @@ -418,7 +426,8 @@ }, { "BriefDescription": "Total pipeline cost of Memory Synchronization= related bottlenecks (data transfers and coherency updates across processor= s)", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * = (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) * tma_remote_cach= e / (tma_local_mem + tma_remote_cache + tma_remote_mem) + tma_l3_bound / (t= ma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_boun= d) * (tma_contested_accesses + tma_data_sharing) / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full) + tma_store_bound / = (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bo= und) * tma_false_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_= stores + tma_store_latency - tma_store_latency)) + tma_machine_clears * (1 = - tma_other_nukes / tma_other_nukes))", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_cx= l_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency= )) * tma_remote_cache / (tma_local_mem + tma_remote_cache + tma_remote_mem)= + tma_l3_bound / (tma_cxl_mem_bound + tma_dram_bound + tma_l1_bound + tma_= l2_bound + tma_l3_bound + tma_store_bound) * (tma_contested_accesses + tma_= data_sharing) / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_lat= ency + tma_sq_full) + tma_store_bound / (tma_cxl_mem_bound + tma_dram_bound= + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * tma_fals= e_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_st= ore_latency - tma_store_latency)) + tma_machine_clears * (1 - tma_other_nuk= es / tma_other_nukes))", "MetricGroup": "BvMS;LockCont;Mem;Offcore;tma_issueSyncxn", "MetricName": "tma_bottleneck_memory_synchronization", "MetricThreshold": "tma_bottleneck_memory_synchronization > 10", @@ -426,6 +435,7 @@ }, { "BriefDescription": "Total pipeline cost of Branch Misprediction r= elated bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (1 - 10 * tma_microcode_sequencer * tma_other= _mispredicts / tma_branch_mispredicts) * (tma_branch_mispredicts + tma_fetc= h_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switc= hes + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))", "MetricGroup": "Bad;BadSpec;BrMispredicts;BvMP;tma_issueBM", "MetricName": "tma_bottleneck_mispredictions", @@ -434,7 +444,8 @@ }, { "BriefDescription": "Total pipeline cost of remaining bottlenecks = in the back-end", - "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_cache_me= mory_bandwidth + tma_bottleneck_cache_memory_latency + tma_bottleneck_memor= y_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottleneck_comput= e_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_branching_= overhead + tma_bottleneck_useful_work)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_data_cac= he_memory_bandwidth + tma_bottleneck_data_cache_memory_latency + tma_bottle= neck_memory_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottlen= eck_compute_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_= branching_overhead + tma_bottleneck_useful_work)", "MetricGroup": "BvOB;Cor;Offcore", "MetricName": "tma_bottleneck_other_bottlenecks", "MetricThreshold": "tma_bottleneck_other_bottlenecks > 20", @@ -442,6 +453,7 @@ }, { "BriefDescription": "Total pipeline cost of \"useful operations\" = - the portion of Retiring category not covered by Branching_Overhead nor Ir= regular_Overhead.", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_retiring - (BR_INST_RETIRED.ALL_BRANCHES= + 2 * BR_INST_RETIRED.NEAR_CALL + INST_RETIRED.NOP) / tma_info_thread_slot= s - tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_se= quencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "BvUW;Ret", "MetricName": "tma_bottleneck_useful_work", @@ -469,6 +481,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU retired uops originated from CISC (complex instruction set computer) in= struction", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)", "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_gro= up", "MetricName": "tma_cisc", @@ -538,6 +551,15 @@ "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, + { + "BriefDescription": "This metric roughly estimates (based on idle = latencies) how often the CPU was stalled on accesses to external CXL Memory= by loads (e.g", + "MetricExpr": "(((1 - ((19 * (MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM= * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS)) + 10 * (MEM_LO= AD_L3_MISS_RETIRED.LOCAL_DRAM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RET= IRED.L1_MISS) + MEM_LOAD_L3_MISS_RETIRED.REMOTE_FWD * (1 + MEM_LOAD_RETIRED= .FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + MEM_LOAD_L3_MISS_RETIRED.REMOTE_HITM = * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS))) / (19 * (MEM_L= OAD_L3_MISS_RETIRED.REMOTE_DRAM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_R= ETIRED.L1_MISS)) + 10 * (MEM_LOAD_L3_MISS_RETIRED.LOCAL_DRAM * (1 + MEM_LOA= D_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + MEM_LOAD_L3_MISS_RETIRED.REM= OTE_FWD * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + MEM_LO= AD_L3_MISS_RETIRED.REMOTE_HITM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RE= TIRED.L1_MISS)) + (25 * (MEM_LOAD_RETIRED.LOCAL_PMM * (1 + MEM_LOAD_RETIRED= .FB_HIT / MEM_LOAD_RETIRED.L1_MISS) if #has_pmem > 0 else 0) + 33 * (MEM_LO= AD_L3_MISS_RETIRED.REMOTE_PMM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RET= IRED.L1_MISS) if #has_pmem > 0 else 0))) if #has_pmem > 0 else 1)) * (CYCLE= _ACTIVITY.STALLS_L3_MISS / tma_info_thread_clks + (CYCLE_ACTIVITY.STALLS_L1= D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma_info_thread_clks - tma_l2_bou= nd) if 1e6 * (MEM_LOAD_L3_MISS_RETIRED.REMOTE_PMM + MEM_LOAD_RETIRED.LOCAL_= PMM) > MEM_LOAD_RETIRED.L1_MISS else 0) if #has_pmem > 0 else 0)", + "MetricGroup": "MemoryBound;Server;TmaL3mem;TopdownL3;tma_L3_group= ;tma_memory_bound_group", + "MetricName": "tma_cxl_mem_bound", + "MetricThreshold": "tma_cxl_mem_bound > 0.1 & (tma_memory_bound > = 0.2 & tma_backend_bound > 0.2)", + "PublicDescription": "This metric roughly estimates (based on idle= latencies) how often the CPU was stalled on accesses to external CXL Memor= y by loads (e.g. 3D-Xpoint (Crystal Ridge, a.k.a. IXP) memory, PMM - Persis= tent Memory Module [from CLX to SPR] or any other CXL Type3 Memory [EMR onw= ards]).", + "ScaleUnit": "100%" + }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to data-sharing ac= cesses", "MetricConstraint": "NO_GROUP_EVENTS", @@ -569,7 +591,7 @@ { "BriefDescription": "This metric estimates how often the CPU was s= talled on accesses to external memory (DRAM) by loads", "MetricConstraint": "NO_GROUP_EVENTS", - "MetricExpr": "CYCLE_ACTIVITY.STALLS_L3_MISS / tma_info_thread_clk= s + (CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma_= info_thread_clks - tma_l2_bound", + "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L3_MISS / tma_info_thread_cl= ks + (CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma= _info_thread_clks - tma_l2_bound - tma_cxl_mem_bound if #has_pmem > 0 else = CYCLE_ACTIVITY.STALLS_L3_MISS / tma_info_thread_clks + (CYCLE_ACTIVITY.STAL= LS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma_info_thread_clks - tma_l= 2_bound)", "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_me= mory_bound_group", "MetricName": "tma_dram_bound", "MetricThreshold": "tma_dram_bound > 0.1 & (tma_memory_bound > 0.2= & tma_backend_bound > 0.2)", @@ -630,7 +652,7 @@ "MetricGroup": "BvMB;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;t= ma_issueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", - "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_cache_memory_bandwidth, = tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_late= ncy, tma_streaming_stores", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_data_cache_memory_bandwi= dth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store= _latency, tma_streaming_stores", "ScaleUnit": "100%" }, { @@ -693,7 +715,6 @@ }, { "BriefDescription": "This metric approximates arithmetic floating-= point (FP) vector uops fraction the CPU has retired aggregated across all v= ector widths", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umas= k\\=3D0xfc@ / UOPS_RETIRED.RETIRE_SLOTS", "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_= group;tma_issue2P", "MetricName": "tma_fp_vector", @@ -768,6 +789,7 @@ }, { "BriefDescription": "Branch Misprediction Cost: Cycles representin= g fraction of TMA slots wasted per non-speculative branch misprediction (re= tired JEClear)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "tma_bottleneck_mispredictions * tma_info_thread_slo= ts / 4 / BR_MISP_RETIRED.ALL_BRANCHES / 100", "MetricGroup": "Bad;BrMispredicts;tma_issueBM", "MetricName": "tma_info_bad_spec_branch_misprediction_cost", @@ -803,6 +825,7 @@ }, { "BriefDescription": "Total pipeline cost of DSB (uop cache) hits -= subset of the Instruction_Fetch_BW Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_frontend_bound * (tma_fetch_bandwidth / = (tma_fetch_bandwidth + tma_fetch_latency)) * (tma_dsb / (tma_dsb + tma_mite= )))", "MetricGroup": "DSB;Fed;FetchBW;tma_issueFB", "MetricName": "tma_info_botlnk_l2_dsb_bandwidth", @@ -820,6 +843,7 @@ }, { "BriefDescription": "Total pipeline cost of Instruction Cache miss= es - subset of the Big_Code Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / (tma= _branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses += tma_lcp + tma_ms_switches))", "MetricGroup": "Fed;FetchLat;IcMiss;tma_issueFL", "MetricName": "tma_info_botlnk_l2_ic_misses", @@ -961,7 +985,6 @@ }, { "BriefDescription": "Instructions per FP Arithmetic instruction (l= ower number means higher occurrence rate)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.SCALAR + = cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=3D0xfc@)", "MetricGroup": "Flops;InsType", "MetricName": "tma_info_inst_mix_iparith", @@ -1249,7 +1272,7 @@ "MetricName": "tma_info_memory_tlb_store_stlb_mpki" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_G= E_1 / 2 if #SMT_on else cpu@UOPS_EXECUTED.THREAD\\,cmask\\=3D1@)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -1266,6 +1289,12 @@ "MetricGroup": "Fed;FetchBW", "MetricName": "tma_info_pipeline_fetch_mite" }, + { + "BriefDescription": "Average number of uops fetched from MS per cy= cle", + "MetricExpr": "IDQ.MS_UOPS / cpu@IDQ.MS_UOPS\\,cmask\\=3D1@", + "MetricGroup": "Fed;FetchLat;MicroSeq", + "MetricName": "tma_info_pipeline_fetch_ms" + }, { "BriefDescription": "Instructions per a microcode Assist invocatio= n", "MetricExpr": "INST_RETIRED.ANY / (FP_ASSIST.ANY + OTHER_ASSISTS.A= NY)", @@ -1282,7 +1311,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -1294,16 +1323,28 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, + { + "BriefDescription": "Average 3DXP Memory Bandwidth Use for reads [= GB / sec]", + "MetricExpr": "(64 * UNC_M_PMM_RPQ_INSERTS / 1e9 / tma_info_system= _time if #has_pmem > 0 else 0)", + "MetricGroup": "MemOffcore;MemoryBW;Server;SoC", + "MetricName": "tma_info_system_cxl_mem_read_bw" + }, + { + "BriefDescription": "Average 3DXP Memory Bandwidth Use for Writes = [GB / sec]", + "MetricExpr": "(64 * UNC_M_PMM_WPQ_INSERTS / 1e9 / tma_info_system= _time if #has_pmem > 0 else 0)", + "MetricGroup": "MemOffcore;MemoryBW;Server;SoC", + "MetricName": "tma_info_system_cxl_mem_write_bw" + }, { "BriefDescription": "Average external Memory Bandwidth Use for rea= ds and writes [GB / sec]", "MetricExpr": "64 * (UNC_M_CAS_COUNT.RD + UNC_M_CAS_COUNT.WR) / 1e= 9 / tma_info_system_time", "MetricGroup": "HPC;MemOffcore;MemoryBW;SoC;tma_issueBW", "MetricName": "tma_info_system_dram_bw_use", - "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_cache_memory_ban= dwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_data_cache_memor= y_bandwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Giga Floating Point Operations Per Second", @@ -1361,6 +1402,13 @@ "MetricName": "tma_info_system_mem_parallel_reads", "PublicDescription": "Average number of parallel data read request= s to external memory. Accounts for demand loads and L1/L2 prefetches" }, + { + "BriefDescription": "Average latency of data read request to exter= nal 3D X-Point memory [in nanoseconds]", + "MetricExpr": "(1e9 * (UNC_M_PMM_RPQ_OCCUPANCY.ALL / UNC_M_PMM_RPQ= _INSERTS) / imc_0@event\\=3D0x0@ if #has_pmem > 0 else 0)", + "MetricGroup": "MemOffcore;MemoryLat;Server;SoC", + "MetricName": "tma_info_system_mem_pmm_read_latency", + "PublicDescription": "Average latency of data read request to exte= rnal 3D X-Point memory [in nanoseconds]. Accounts for demand loads and L1/L= 2 data-read prefetches" + }, { "BriefDescription": "Average latency of data read request to exter= nal memory (in nanoseconds)", "MetricExpr": "1e9 * (UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / UNC_CHA_= TOR_INSERTS.IA_MISS_DRD) / (tma_info_system_socket_clks / tma_info_system_t= ime)", @@ -1499,12 +1547,13 @@ "ScaleUnit": "100%" }, { - "BriefDescription": "This metric([SKL+] roughly; [LNL]) estimates = fraction of cycles with demand load accesses that hit the L1D cache", + "BriefDescription": "This metric ([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "min(2 * (MEM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETI= RED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CYCLE_ACTIVITY.CYCLE= S_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound= _group", "MetricName": "tma_l1_latency_dependency", "MetricThreshold": "tma_l1_latency_dependency > 0.1 & (tma_l1_boun= d > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache. The s= hort latency of the L1D cache may be exposed in pointer-chasing memory acce= ss patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", + "PublicDescription": "This metric ([SKL+] roughly; [LNL]) estimate= s fraction of cycles with demand load accesses that hit the L1D cache. The = short latency of the L1D cache may be exposed in pointer-chasing memory acc= ess patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", "ScaleUnit": "100%" }, { @@ -1541,7 +1590,7 @@ "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat= ;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_cache_memory_latency, tma_mem_latency", + "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_data_cache_memory_latency, tma_mem_latency", "ScaleUnit": "100%" }, { @@ -1565,6 +1614,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port for Load operations", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_2 + UOPS_DISPATCHED_PORT= .PORT_3 + UOPS_DISPATCHED_PORT.PORT_7 - UOPS_DISPATCHED_PORT.PORT_4) / (2 *= tma_info_core_core_clks)", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_load_op_utilization", @@ -1591,6 +1641,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 1 GB pages for= data load accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_load_stlb_miss * DTLB_LOAD_MISSES.WALK_COMPLETE= D_1G / (DTLB_LOAD_MISSES.WALK_COMPLETED_4K + DTLB_LOAD_MISSES.WALK_COMPLETE= D_2M_4M + DTLB_LOAD_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_load_stlb_mis= s_group", "MetricName": "tma_load_stlb_miss_1g", @@ -1599,6 +1650,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 2 or 4 MB page= s for data load accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_load_stlb_miss * DTLB_LOAD_MISSES.WALK_COMPLETE= D_2M_4M / (DTLB_LOAD_MISSES.WALK_COMPLETED_4K + DTLB_LOAD_MISSES.WALK_COMPL= ETED_2M_4M + DTLB_LOAD_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_load_stlb_mis= s_group", "MetricName": "tma_load_stlb_miss_2m", @@ -1607,6 +1659,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 4 KB pages for= data load accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_load_stlb_miss * DTLB_LOAD_MISSES.WALK_COMPLETE= D_4K / (DTLB_LOAD_MISSES.WALK_COMPLETED_4K + DTLB_LOAD_MISSES.WALK_COMPLETE= D_2M_4M + DTLB_LOAD_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_load_stlb_mis= s_group", "MetricName": "tma_load_stlb_miss_4k", @@ -1624,6 +1677,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU spent handling cache misses due to lock operations", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(12 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS= .ALL_RFO) + MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES * (11= * L2_RQSTS.RFO_HIT + min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTAN= DING.CYCLES_WITH_DEMAND_RFO))) / tma_info_thread_clks", "MetricGroup": "LockCont;Offcore;TopdownL4;tma_L4_group;tma_issueR= FO;tma_l1_bound_group", "MetricName": "tma_lock_latency", @@ -1648,7 +1702,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_d= ram_bound_group;tma_issueBW", "MetricName": "tma_mem_bandwidth", "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_cache_memory_bandwidth,= tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_data_cache_memory_bandw= idth, tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", "ScaleUnit": "100%" }, { @@ -1657,7 +1711,7 @@ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_= dram_bound_group;tma_issueLat", "MetricName": "tma_mem_latency", "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_cache_memory_latency, tma_l3_hit_latency= ", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_data_cache_memory_latency, tma_l3_hit_la= tency", "ScaleUnit": "100%" }, { @@ -1681,7 +1735,6 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU was retiring uops fetched by the Microcode Sequencer (MS) unit", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / UOPS_ISSUED.ANY * IDQ.M= S_UOPS / tma_info_thread_slots", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_heavy_operatio= ns_group;tma_issueMC;tma_issueMS", "MetricName": "tma_microcode_sequencer", @@ -1691,6 +1744,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to Branch Resteers as a result of Branch Misprediction= at execution stage", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL= _BRANCHES + MACHINE_CLEARS.COUNT) * INT_MISC.CLEAR_RESTEER_CYCLES / tma_inf= o_thread_clks", "MetricGroup": "BadSpec;BrMispredicts;BvMP;TopdownL4;tma_L4_group;= tma_branch_resteers_group;tma_issueBM", "MetricName": "tma_mispredicts_resteers", @@ -1745,6 +1799,7 @@ }, { "BriefDescription": "This metric represents the remaining light uo= ps fraction the CPU has executed - remaining means not covered by other sib= ling nodes", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_m= emory_operations + tma_fused_instructions + tma_non_fused_branches))", "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", "MetricName": "tma_other_light_ops", @@ -1754,6 +1809,7 @@ }, { "BriefDescription": "This metric estimates fraction of slots the C= PU was stalled due to other cases of misprediction (non-retired x86 branche= s or other types).", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(tma_branch_mispredicts * (1 - BR_MISP_RETIRED.A= LL_BRANCHES / (INT_MISC.CLEARS_COUNT - MACHINE_CLEARS.COUNT)), 0.0001)", "MetricGroup": "BrMispredicts;BvIO;TopdownL3;tma_L3_group;tma_bran= ch_mispredicts_group", "MetricName": "tma_other_mispredicts", @@ -1762,6 +1818,7 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Nukes (Machine Clears) not related to memory ordering= .", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(tma_machine_clears * (1 - MACHINE_CLEARS.MEMORY= _ORDERING / MACHINE_CLEARS.COUNT), 0.0001)", "MetricGroup": "BvIO;Machine_Clears;TopdownL3;tma_L3_group;tma_mac= hine_clears_group", "MetricName": "tma_other_nukes", @@ -1842,6 +1899,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU performance was potentially limited due to Core computation issues (non= divider-related)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "((tma_ports_utilized_0 * tma_info_thread_clks + (EX= E_ACTIVITY.1_PORTS_UTIL + tma_retiring * EXE_ACTIVITY.2_PORTS_UTIL)) / tma_= info_thread_clks if ARITH.DIVIDER_ACTIVE < CYCLE_ACTIVITY.STALLS_TOTAL - CY= CLE_ACTIVITY.STALLS_MEM_ANY else (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring = * EXE_ACTIVITY.2_PORTS_UTIL) / tma_info_thread_clks)", "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_gr= oup", "MetricName": "tma_ports_utilization", @@ -1956,7 +2014,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_i= ssueBW;tma_l3_bound_group", "MetricName": "tma_sq_full", "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, tma_m= em_bandwidth", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_data_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, = tma_mem_bandwidth", "ScaleUnit": "100%" }, { @@ -2013,6 +2071,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 1 GB pages for= data store accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_store_stlb_miss * DTLB_STORE_MISSES.WALK_COMPLE= TED_1G / (DTLB_STORE_MISSES.WALK_COMPLETED_4K + DTLB_STORE_MISSES.WALK_COMP= LETED_2M_4M + DTLB_STORE_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_store_stlb_mi= ss_group", "MetricName": "tma_store_stlb_miss_1g", @@ -2021,6 +2080,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 2 or 4 MB page= s for data store accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_store_stlb_miss * DTLB_STORE_MISSES.WALK_COMPLE= TED_2M_4M / (DTLB_STORE_MISSES.WALK_COMPLETED_4K + DTLB_STORE_MISSES.WALK_C= OMPLETED_2M_4M + DTLB_STORE_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_store_stlb_mi= ss_group", "MetricName": "tma_store_stlb_miss_2m", @@ -2029,6 +2089,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 4 KB pages for= data store accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_store_stlb_miss * DTLB_STORE_MISSES.WALK_COMPLE= TED_4K / (DTLB_STORE_MISSES.WALK_COMPLETED_4K + DTLB_STORE_MISSES.WALK_COMP= LETED_2M_4M + DTLB_STORE_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_store_stlb_mi= ss_group", "MetricName": "tma_store_stlb_miss_4k", --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4F9591DFD8F for ; Sat, 19 Jul 2025 03:45:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896749; cv=none; b=ACTDp8ZclGU9PnlvMAQLV4KghrrW13bu+dtKvXfYOCzxv6/+OVzYGADks50DfcG7fMXOfCLsEUmd96Vzc/WfvwxZfzKEyDWXkeuMsupdrTdZOZQcKuvKAZZzTKWlfcK5K5WNu4wTaZi5c0TreRlEwp6Icxj8cSXOg5RJeHiTsX8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896749; c=relaxed/simple; bh=2ugETdmoTGWuhDfvHrq4hWMBatZ2WjF566K63lIG0jc=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=ifYHU4a/BfKGf/JkiuU+dIEeHiL8PxY2hN3CLegaKAo+7BBOUoBzcP8GBn2wijRdlXMqzKS+N5N0kYnUTfWR9fGOv7LD5+DecM6K1vcJDSzuQRcJ0531qoDTXA+Cas2ICKSwmetD3hBN8tto4QLaqqNHtCp3PLbkdwN1+K0QdDQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=VtZEC/+n; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="VtZEC/+n" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-3141f9ce4e2so3985132a91.1 for ; Fri, 18 Jul 2025 20:45:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896735; x=1753501535; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=fI/s2K4vL5PP6+xuepiTMMS6LZf6OBMfzqt8uLyDIxs=; b=VtZEC/+ndR0j/TOhRNsfpIZLJj0dVS1amzD926pc67d1LOluczY7usVoouMomL5/5R aH2dhi6I7oUykvoOslZr10sjQwE9qDMLYHD/UgVB9zfS6k0igP5Gk609MKGnUeBZA0Rf SLRIrbuKRB0yY+Oi2zql/NjcnzH01oqwOdoIHR30y89w/RBr7OUgIIIRziYw7liDtWH7 917FrG6Z3UiQHqYDidfZqaQl/Q57jjyLkXR72fP3Ls4rD7LrImkf710fvCOEFMfE0owB QGcXwa3U8GK+p6Kn64yeT/PwfBiYVYvHRuaUelVj7KS5YVEuRGCqdegmKi9PJvaXtRn3 vXjQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896735; x=1753501535; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=fI/s2K4vL5PP6+xuepiTMMS6LZf6OBMfzqt8uLyDIxs=; b=GXkd/cUIZz9OpV/gXdCAtjz9h3PFNYE8+Di2u4XXliWNi5vwfgjEv8mNFwKBfw0YaU PKxTEM9H6RzQ/ubGbtCJsN4Koq3LsDJuEvQbVTBhE9mc1A7BQQu3j4PKM9oyQQee26S4 exIGI2U7XNI8y/LY3DxsdT7xNpyPiqjloj0WUrRaIGhquF0s8F2iSHlvN4GG9gRIEmO9 QFlLGDFWkT01mYvDwpLPGCZDA6MYT0FYkUXqnRdoqnJqKagdMlq1/x5TdxkKfMeI1omA wYv1UVTrlEoJcC62Oc/eEbYkJoYhI8+JWZ/MQvQksqXWk1ss4y6gbLwaC9n4uJEWUwSf Bpxw== X-Forwarded-Encrypted: i=1; AJvYcCU54lvywR/uc3ocGivnY/PwycCJYv4wEjCTzfUW93idRyDaHmGTQLvsnb9ufu2zaaNb2ZuJJyvNQNb6NjY=@vger.kernel.org X-Gm-Message-State: AOJu0YzmjNBJDW72T7ahYx4X11KuBW3XjSvS6cskOiUQzhow4fWIFD/+ X+D1Ak6TGSgjlARGBpYal4a4tdJwCw5aPlUxYQ5rwRhDpz4WAyBlM8/xyexDBjClTGYUJSApEuT wT/KcNHhB6g== X-Google-Smtp-Source: AGHT+IGUU/eRRQ3/zYkipYuq5msRtezt4I9RkMFamcX6fmn1zwSnAKhrQ19M6Ll5ameuxIz2Nwz7Mp10bVGl X-Received: from pjbrs12.prod.google.com ([2002:a17:90b:2b8c:b0:312:187d:382d]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:4cc4:b0:313:f9f6:309f with SMTP id 98e67ed59e1d1-31c9f44dfc3mr16997190a91.34.1752896734667; Fri, 18 Jul 2025 20:45:34 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:01 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-6-irogers@google.com> Subject: [PATCH v1 05/19] perf vendor events: Update emeraldrapids events/metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update events from v1.14 to v1.16. Update metrics from TMA 5.0 to 5.1. The event updates come from: https://github.com/intel/perfmon/commit/9020e49e790349f86141a1f6ef6f009f189= a0f48 https://github.com/intel/perfmon/commit/a0567b56185fea4b968391aae344e9d8ed9= 782d8 Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/emeraldrapids/cache.json | 100 +++++------ .../arch/x86/emeraldrapids/emr-metrics.json | 131 ++++++++------ .../x86/emeraldrapids/floating-point.json | 43 ++--- .../arch/x86/emeraldrapids/frontend.json | 42 +++-- .../arch/x86/emeraldrapids/memory.json | 30 ++-- .../arch/x86/emeraldrapids/other.json | 28 ++- .../arch/x86/emeraldrapids/pipeline.json | 167 +++++++----------- .../arch/x86/emeraldrapids/uncore-memory.json | 82 +++++++++ .../x86/emeraldrapids/virtual-memory.json | 40 ++--- tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +- 10 files changed, 363 insertions(+), 302 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/emeraldrapids/cache.json b/tool= s/perf/pmu-events/arch/x86/emeraldrapids/cache.json index 10bdb193c16f..e96f938587bb 100644 --- a/tools/perf/pmu-events/arch/x86/emeraldrapids/cache.json +++ b/tools/perf/pmu-events/arch/x86/emeraldrapids/cache.json @@ -4,7 +4,6 @@ "Counter": "0,1,2,3", "EventCode": "0x51", "EventName": "L1D.HWPF_MISS", - "PublicDescription": "L1D.HWPF_MISS Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x20" }, @@ -13,7 +12,7 @@ "Counter": "0,1,2,3", "EventCode": "0x51", "EventName": "L1D.REPLACEMENT", - "PublicDescription": "Counts L1D data line replacements including = opportunistic replacements, and replacements that require stall-for-replace= or block-for-replace. Available PDIST counters: 0", + "PublicDescription": "Counts L1D data line replacements including = opportunistic replacements, and replacements that require stall-for-replace= or block-for-replace.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -22,7 +21,7 @@ "Counter": "0,1,2,3", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.FB_FULL", - "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses. Av= ailable PDIST counters: 0", + "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -33,7 +32,7 @@ "EdgeDetect": "1", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.FB_FULL_PERIODS", - "PublicDescription": "Counts number of phases a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses. Av= ailable PDIST counters: 0", + "PublicDescription": "Counts number of phases a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -43,7 +42,6 @@ "Deprecated": "1", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.L2_STALL", - "PublicDescription": "This event is deprecated. Refer to new event= L1D_PEND_MISS.L2_STALLS Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x4" }, @@ -52,7 +50,7 @@ "Counter": "0,1,2,3", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.L2_STALLS", - "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D due to lack of L2 resources. Demand requests include cac= heable/uncacheable demand load, store, lock or SW prefetch accesses. Availa= ble PDIST counters: 0", + "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D due to lack of L2 resources. Demand requests include cac= heable/uncacheable demand load, store, lock or SW prefetch accesses.", "SampleAfterValue": "1000003", "UMask": "0x4" }, @@ -61,7 +59,7 @@ "Counter": "0,1,2,3", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.PENDING", - "PublicDescription": "Counts number of L1D misses that are outstan= ding in each cycle, that is each cycle the number of Fill Buffers (FB) outs= tanding required by Demand Reads. FB either is held by demand loads, or it = is held by non-demand loads and gets hit at least once by demand. The valid= outstanding interval is defined until the FB deallocation by one of the fo= llowing ways: from FB allocation, if FB is allocated by demand from the dem= and Hit FB, if it is allocated by hardware or software prefetch. Note: In t= he L1D, a Demand Read contains cacheable or noncacheable demand loads, incl= uding ones causing cache-line splits and reads due to page walks resulted f= rom any request type. Available PDIST counters: 0", + "PublicDescription": "Counts number of L1D misses that are outstan= ding in each cycle, that is each cycle the number of Fill Buffers (FB) outs= tanding required by Demand Reads. FB either is held by demand loads, or it = is held by non-demand loads and gets hit at least once by demand. The valid= outstanding interval is defined until the FB deallocation by one of the fo= llowing ways: from FB allocation, if FB is allocated by demand from the dem= and Hit FB, if it is allocated by hardware or software prefetch. Note: In t= he L1D, a Demand Read contains cacheable or noncacheable demand loads, incl= uding ones causing cache-line splits and reads due to page walks resulted f= rom any request type.", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -71,7 +69,7 @@ "CounterMask": "1", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.PENDING_CYCLES", - "PublicDescription": "Counts duration of L1D miss outstanding in c= ycles. Available PDIST counters: 0", + "PublicDescription": "Counts duration of L1D miss outstanding in c= ycles.", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -80,7 +78,7 @@ "Counter": "0,1,2,3", "EventCode": "0x25", "EventName": "L2_LINES_IN.ALL", - "PublicDescription": "Counts the number of L2 cache lines filling = the L2. Counting does not cover rejects. Available PDIST counters: 0", + "PublicDescription": "Counts the number of L2 cache lines filling = the L2. Counting does not cover rejects.", "SampleAfterValue": "100003", "UMask": "0x1f" }, @@ -89,7 +87,7 @@ "Counter": "0,1,2,3", "EventCode": "0x26", "EventName": "L2_LINES_OUT.NON_SILENT", - "PublicDescription": "Counts the number of lines that are evicted = by L2 cache when triggered by an L2 cache fill. Those lines are in Modified= state. Modified lines are written back to L3 Available PDIST counters: 0", + "PublicDescription": "Counts the number of lines that are evicted = by L2 cache when triggered by an L2 cache fill. Those lines are in Modified= state. Modified lines are written back to L3", "SampleAfterValue": "200003", "UMask": "0x2" }, @@ -98,7 +96,7 @@ "Counter": "0,1,2,3", "EventCode": "0x26", "EventName": "L2_LINES_OUT.SILENT", - "PublicDescription": "Counts the number of lines that are silently= dropped by L2 cache. These lines are typically in Shared or Exclusive stat= e. A non-threaded event. Available PDIST counters: 0", + "PublicDescription": "Counts the number of lines that are silently= dropped by L2 cache. These lines are typically in Shared or Exclusive stat= e. A non-threaded event.", "SampleAfterValue": "200003", "UMask": "0x1" }, @@ -107,7 +105,7 @@ "Counter": "0,1,2,3", "EventCode": "0x26", "EventName": "L2_LINES_OUT.USELESS_HWPF", - "PublicDescription": "Counts the number of cache lines that have b= een prefetched by the L2 hardware prefetcher but not used by demand access = when evicted from the L2 cache Available PDIST counters: 0", + "PublicDescription": "Counts the number of cache lines that have b= een prefetched by the L2 hardware prefetcher but not used by demand access = when evicted from the L2 cache", "SampleAfterValue": "200003", "UMask": "0x4" }, @@ -116,7 +114,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_REQUEST.ALL", - "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_RQSTS.REFERENCES] Available PDIST coun= ters: 0", + "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_RQSTS.REFERENCES]", "SampleAfterValue": "200003", "UMask": "0xff" }, @@ -125,7 +123,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_REQUEST.MISS", - "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_RQSTS.MISS] Available PDIST coun= ters: 0", + "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_RQSTS.MISS]", "SampleAfterValue": "200003", "UMask": "0x3f" }, @@ -134,7 +132,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_CODE_RD", - "PublicDescription": "Counts the total number of L2 code requests.= Available PDIST counters: 0", + "PublicDescription": "Counts the total number of L2 code requests.= ", "SampleAfterValue": "200003", "UMask": "0xe4" }, @@ -143,7 +141,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_DEMAND_DATA_RD", - "PublicDescription": "Counts Demand Data Read requests accessing t= he L2 cache. These requests may hit or miss L2 cache. True-miss exclude mis= ses that were merged with ongoing L2 misses. An access is counted once. Ava= ilable PDIST counters: 0", + "PublicDescription": "Counts Demand Data Read requests accessing t= he L2 cache. These requests may hit or miss L2 cache. True-miss exclude mis= ses that were merged with ongoing L2 misses. An access is counted once.", "SampleAfterValue": "200003", "UMask": "0xe1" }, @@ -152,7 +150,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_DEMAND_MISS", - "PublicDescription": "Counts demand requests that miss L2 cache. A= vailable PDIST counters: 0", + "PublicDescription": "Counts demand requests that miss L2 cache.", "SampleAfterValue": "200003", "UMask": "0x27" }, @@ -161,7 +159,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_DEMAND_REFERENCES", - "PublicDescription": "Counts demand requests to L2 cache. Availabl= e PDIST counters: 0", + "PublicDescription": "Counts demand requests to L2 cache.", "SampleAfterValue": "200003", "UMask": "0xe7" }, @@ -170,7 +168,6 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_HWPF", - "PublicDescription": "L2_RQSTS.ALL_HWPF Available PDIST counters: = 0", "SampleAfterValue": "200003", "UMask": "0xf0" }, @@ -179,7 +176,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_RFO", - "PublicDescription": "Counts the total number of RFO (read for own= ership) requests to L2 cache. L2 RFO requests include both L1D demand RFO m= isses as well as L1D RFO prefetches. Available PDIST counters: 0", + "PublicDescription": "Counts the total number of RFO (read for own= ership) requests to L2 cache. L2 RFO requests include both L1D demand RFO m= isses as well as L1D RFO prefetches.", "SampleAfterValue": "200003", "UMask": "0xe2" }, @@ -188,7 +185,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.CODE_RD_HIT", - "PublicDescription": "Counts L2 cache hits when fetching instructi= ons, code reads. Available PDIST counters: 0", + "PublicDescription": "Counts L2 cache hits when fetching instructi= ons, code reads.", "SampleAfterValue": "200003", "UMask": "0xc4" }, @@ -197,7 +194,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.CODE_RD_MISS", - "PublicDescription": "Counts L2 cache misses when fetching instruc= tions. Available PDIST counters: 0", + "PublicDescription": "Counts L2 cache misses when fetching instruc= tions.", "SampleAfterValue": "200003", "UMask": "0x24" }, @@ -206,7 +203,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.DEMAND_DATA_RD_HIT", - "PublicDescription": "Counts the number of demand Data Read reques= ts initiated by load instructions that hit L2 cache. Available PDIST counte= rs: 0", + "PublicDescription": "Counts the number of demand Data Read reques= ts initiated by load instructions that hit L2 cache.", "SampleAfterValue": "200003", "UMask": "0xc1" }, @@ -215,7 +212,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.DEMAND_DATA_RD_MISS", - "PublicDescription": "Counts demand Data Read requests with true-m= iss in the L2 cache. True-miss excludes misses that were merged with ongoin= g L2 misses. An access is counted once. Available PDIST counters: 0", + "PublicDescription": "Counts demand Data Read requests with true-m= iss in the L2 cache. True-miss excludes misses that were merged with ongoin= g L2 misses. An access is counted once.", "SampleAfterValue": "200003", "UMask": "0x21" }, @@ -224,7 +221,6 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.HWPF_MISS", - "PublicDescription": "L2_RQSTS.HWPF_MISS Available PDIST counters:= 0", "SampleAfterValue": "200003", "UMask": "0x30" }, @@ -233,7 +229,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.MISS", - "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_REQUEST.MISS] Available PDIST co= unters: 0", + "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_REQUEST.MISS]", "SampleAfterValue": "200003", "UMask": "0x3f" }, @@ -242,7 +238,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.REFERENCES", - "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_REQUEST.ALL] Available PDIST counters:= 0", + "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_REQUEST.ALL]", "SampleAfterValue": "200003", "UMask": "0xff" }, @@ -251,7 +247,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.RFO_HIT", - "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that hit L2 cache. Available PDIST counters: 0", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that hit L2 cache.", "SampleAfterValue": "200003", "UMask": "0xc2" }, @@ -260,7 +256,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.RFO_MISS", - "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that miss L2 cache. Available PDIST counters: 0", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that miss L2 cache.", "SampleAfterValue": "200003", "UMask": "0x22" }, @@ -269,7 +265,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.SWPF_HIT", - "PublicDescription": "Counts Software prefetch requests that hit t= he L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when = FB is not full. Available PDIST counters: 0", + "PublicDescription": "Counts Software prefetch requests that hit t= he L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when = FB is not full.", "SampleAfterValue": "200003", "UMask": "0xc8" }, @@ -278,7 +274,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.SWPF_MISS", - "PublicDescription": "Counts Software prefetch requests that miss = the L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when= FB is not full. Available PDIST counters: 0", + "PublicDescription": "Counts Software prefetch requests that miss = the L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when= FB is not full.", "SampleAfterValue": "200003", "UMask": "0x28" }, @@ -287,7 +283,7 @@ "Counter": "0,1,2,3", "EventCode": "0x23", "EventName": "L2_TRANS.L2_WB", - "PublicDescription": "Counts L2 writebacks that access L2 cache. A= vailable PDIST counters: 0", + "PublicDescription": "Counts L2 writebacks that access L2 cache.", "SampleAfterValue": "200003", "UMask": "0x40" }, @@ -296,7 +292,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x2e", "EventName": "LONGEST_LAT_CACHE.MISS", - "PublicDescription": "Counts core-originated cacheable requests th= at miss the L3 cache (Longest Latency cache). Requests include data and cod= e reads, Reads-for-Ownership (RFOs), speculative accesses and hardware pref= etches to the L1 and L2. It does not include hardware prefetches to the L3= , and may not count other types of requests to the L3. Available PDIST coun= ters: 0", + "PublicDescription": "Counts core-originated cacheable requests th= at miss the L3 cache (Longest Latency cache). Requests include data and cod= e reads, Reads-for-Ownership (RFOs), speculative accesses and hardware pref= etches to the L1 and L2. It does not include hardware prefetches to the L3= , and may not count other types of requests to the L3.", "SampleAfterValue": "100003", "UMask": "0x41" }, @@ -305,7 +301,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x2e", "EventName": "LONGEST_LAT_CACHE.REFERENCE", - "PublicDescription": "Counts core-originated cacheable requests to= the L3 cache (Longest Latency cache). Requests include data and code reads= , Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches = to the L1 and L2. It does not include hardware prefetches to the L3, and m= ay not count other types of requests to the L3. Available PDIST counters: 0= ", + "PublicDescription": "Counts core-originated cacheable requests to= the L3 cache (Longest Latency cache). Requests include data and code reads= , Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches = to the L1 and L2. It does not include hardware prefetches to the L3, and m= ay not count other types of requests to the L3.", "SampleAfterValue": "100003", "UMask": "0x4f" }, @@ -394,7 +390,7 @@ "Counter": "0,1,2,3", "EventCode": "0x43", "EventName": "MEM_LOAD_COMPLETED.L1_MISS_ANY", - "PublicDescription": "Number of completed demand load requests tha= t missed the L1 data cache including shadow misses (FB hits, merge to an on= going L1D miss) Available PDIST counters: 0", + "PublicDescription": "Number of completed demand load requests tha= t missed the L1 data cache including shadow misses (FB hits, merge to an on= going L1D miss)", "SampleAfterValue": "1000003", "UMask": "0xfd" }, @@ -563,7 +559,6 @@ "Counter": "0,1,2,3", "EventCode": "0x44", "EventName": "MEM_STORE_RETIRED.L2_HIT", - "PublicDescription": "MEM_STORE_RETIRED.L2_HIT Available PDIST cou= nters: 0", "SampleAfterValue": "200003", "UMask": "0x1" }, @@ -572,7 +567,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe5", "EventName": "MEM_UOP_RETIRED.ANY", - "PublicDescription": "Number of retired micro-operations (uops) fo= r load or store memory accesses Available PDIST counters: 0", + "PublicDescription": "Number of retired micro-operations (uops) fo= r load or store memory accesses", "SampleAfterValue": "1000003", "UMask": "0x3" }, @@ -999,7 +994,6 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.ALL_REQUESTS", - "PublicDescription": "OFFCORE_REQUESTS.ALL_REQUESTS Available PDIS= T counters: 0", "SampleAfterValue": "100003", "UMask": "0x80" }, @@ -1008,7 +1002,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DATA_RD", - "PublicDescription": "Counts the demand and prefetch data reads. A= ll Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 p= refetchers). Counting also covers reads due to page walks resulted from any= request type. Available PDIST counters: 0", + "PublicDescription": "Counts the demand and prefetch data reads. A= ll Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 p= refetchers). Counting also covers reads due to page walks resulted from any= request type.", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -1017,7 +1011,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DEMAND_CODE_RD", - "PublicDescription": "Counts both cacheable and non-cacheable code= read requests. Available PDIST counters: 0", + "PublicDescription": "Counts both cacheable and non-cacheable code= read requests.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -1026,7 +1020,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DEMAND_DATA_RD", - "PublicDescription": "Counts the Demand Data Read requests sent to= uncore. Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determi= ne average latency in the uncore. Available PDIST counters: 0", + "PublicDescription": "Counts the Demand Data Read requests sent to= uncore. Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determi= ne average latency in the uncore.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -1035,7 +1029,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DEMAND_RFO", - "PublicDescription": "Counts the demand RFO (read for ownership) r= equests including regular RFOs, locks, ItoM. Available PDIST counters: 0", + "PublicDescription": "Counts the demand RFO (read for ownership) r= equests including regular RFOs, locks, ItoM.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -1045,7 +1039,6 @@ "Deprecated": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD", - "PublicDescription": "This event is deprecated. Refer to new event= OFFCORE_REQUESTS_OUTSTANDING.DATA_RD Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -1055,7 +1048,6 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD", - "PublicDescription": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DAT= A_RD Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -1065,7 +1057,7 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE= _RD", - "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS. Available PDIST counters: 0", + "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -1075,7 +1067,6 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_DATA= _RD", - "PublicDescription": "Cycles where at least 1 outstanding demand d= ata read request is pending. Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1085,7 +1076,6 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO", - "PublicDescription": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEM= AND_RFO Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x4" }, @@ -1094,7 +1084,6 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DATA_RD", - "PublicDescription": "OFFCORE_REQUESTS_OUTSTANDING.DATA_RD Availab= le PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -1103,7 +1092,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD", - "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS. Available PDIST counters: 0", + "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -1112,7 +1101,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD", - "PublicDescription": "For every cycle, increments by the number of= outstanding demand data read requests pending. Requests are considered o= utstanding from the time they miss the core's L2 cache until the transactio= n completion message is sent to the requestor. Available PDIST counters: 0", + "PublicDescription": "For every cycle, increments by the number of= outstanding demand data read requests pending. Requests are considered o= utstanding from the time they miss the core's L2 cache until the transactio= n completion message is sent to the requestor.", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -1121,7 +1110,7 @@ "Counter": "0,1,2,3", "EventCode": "0x2c", "EventName": "SQ_MISC.BUS_LOCK", - "PublicDescription": "Counts the more expensive bus lock needed to= enforce cache coherency for certain memory accesses that need to be done a= tomically. Can be created by issuing an atomic instruction (via the LOCK p= refix) which causes a cache line split or accesses uncacheable memory. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts the more expensive bus lock needed to= enforce cache coherency for certain memory accesses that need to be done a= tomically. Can be created by issuing an atomic instruction (via the LOCK p= refix) which causes a cache line split or accesses uncacheable memory.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -1130,7 +1119,6 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.ANY", - "PublicDescription": "Counts the number of PREFETCHNTA, PREFETCHW,= PREFETCHT0, PREFETCHT1 or PREFETCHT2 instructions executed. Available PDIS= T counters: 0", "SampleAfterValue": "100003", "UMask": "0xf" }, @@ -1139,7 +1127,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.NTA", - "PublicDescription": "Counts the number of PREFETCHNTA instruction= s executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHNTA instruction= s executed.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -1148,7 +1136,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.PREFETCHW", - "PublicDescription": "Counts the number of PREFETCHW instructions = executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHW instructions = executed.", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -1157,7 +1145,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.T0", - "PublicDescription": "Counts the number of PREFETCHT0 instructions= executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHT0 instructions= executed.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -1166,7 +1154,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.T1_T2", - "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT= 2 instructions executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT= 2 instructions executed.", "SampleAfterValue": "100003", "UMask": "0x4" } diff --git a/tools/perf/pmu-events/arch/x86/emeraldrapids/emr-metrics.json = b/tools/perf/pmu-events/arch/x86/emeraldrapids/emr-metrics.json index 34e1cbcd722c..af0a7dd81e93 100644 --- a/tools/perf/pmu-events/arch/x86/emeraldrapids/emr-metrics.json +++ b/tools/perf/pmu-events/arch/x86/emeraldrapids/emr-metrics.json @@ -1,28 +1,28 @@ [ { "BriefDescription": "C1 residency percent per core", - "MetricExpr": "cstate_core@c1\\-residency@ / TSC", + "MetricExpr": "cstate_core@c1\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C1_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" @@ -78,6 +78,12 @@ "MetricName": "io_bandwidth_read", "ScaleUnit": "1MB/s" }, + { + "BriefDescription": "Bandwidth of inbound IO reads that are initia= ted by end device controllers that are requesting memory from the CPU and m= iss the L3 cache", + "MetricExpr": "UNC_CHA_TOR_INSERTS.IO_MISS_PCIRDCUR * 64 / 1e6 / d= uration_time", + "MetricName": "io_bandwidth_read_l3_miss", + "ScaleUnit": "1MB/s" + }, { "BriefDescription": "Bandwidth of IO reads that are initiated by e= nd device controllers that are requesting memory from the local CPU socket", "MetricExpr": "UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_LOCAL * 64 / 1e6 / = duration_time", @@ -96,6 +102,12 @@ "MetricName": "io_bandwidth_write", "ScaleUnit": "1MB/s" }, + { + "BriefDescription": "Bandwidth of inbound IO writes that are initi= ated by end device controllers that are writing memory to the CPU", + "MetricExpr": "(UNC_CHA_TOR_INSERTS.IO_MISS_ITOM + UNC_CHA_TOR_INS= ERTS.IO_MISS_ITOMCACHENEAR) * 64 / 1e6 / duration_time", + "MetricName": "io_bandwidth_write_l3_miss", + "ScaleUnit": "1MB/s" + }, { "BriefDescription": "Bandwidth of IO writes that are initiated by = end device controllers that are writing memory to the local CPU socket", "MetricExpr": "(UNC_CHA_TOR_INSERTS.IO_ITOM_LOCAL + UNC_CHA_TOR_IN= SERTS.IO_ITOMCACHENEAR_LOCAL) * 64 / 1e6 / duration_time", @@ -111,19 +123,19 @@ { "BriefDescription": "Percentage of inbound full cacheline writes i= nitiated by end device controllers that miss the L3 cache", "MetricExpr": "UNC_CHA_TOR_INSERTS.IO_MISS_ITOM / UNC_CHA_TOR_INSE= RTS.IO_ITOM", - "MetricName": "io_percent_of_inbound_full_writes_that_miss_l3", + "MetricName": "io_full_write_l3_miss", "ScaleUnit": "100%" }, { "BriefDescription": "Percentage of inbound partial cacheline write= s initiated by end device controllers that miss the L3 cache", "MetricExpr": "(UNC_CHA_TOR_INSERTS.IO_MISS_ITOMCACHENEAR + UNC_CH= A_TOR_INSERTS.IO_MISS_RFO) / (UNC_CHA_TOR_INSERTS.IO_ITOMCACHENEAR + UNC_CH= A_TOR_INSERTS.IO_RFO)", - "MetricName": "io_percent_of_inbound_partial_writes_that_miss_l3", + "MetricName": "io_partial_write_l3_miss", "ScaleUnit": "100%" }, { "BriefDescription": "Percentage of inbound reads initiated by end = device controllers that miss the L3 cache", "MetricExpr": "UNC_CHA_TOR_INSERTS.IO_MISS_PCIRDCUR / UNC_CHA_TOR_= INSERTS.IO_PCIRDCUR", - "MetricName": "io_percent_of_inbound_reads_that_miss_l3", + "MetricName": "io_read_l3_miss", "ScaleUnit": "100%" }, { @@ -335,7 +347,7 @@ { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_inf= o_thread_slots", + "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "BvOB;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", @@ -369,40 +381,40 @@ "MetricThreshold": "tma_bottleneck_branching_overhead > 5", "PublicDescription": "Total pipeline cost of instructions used for= program control-flow - a subset of the Retiring category in TMA. Examples = include function calls; loops and alignments. (A lower bound)" }, + { + "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", + "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_amx_busy= + tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_c= ore_bound * tma_amx_busy / (tma_amx_busy + tma_divider + tma_ports_utilizat= ion + tma_serializing_operation) + tma_core_bound * (tma_ports_utilization = / (tma_amx_busy + tma_divider + tma_ports_utilization + tma_serializing_ope= ration)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_utili= zed_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", + "MetricGroup": "BvCB;Cor;tma_issueComp", + "MetricName": "tma_bottleneck_compute_bound_est", + "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", + "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " + }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Bandwidth related bottlenecks", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma_l1_l= atency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)= ))", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + 0 / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * t= ma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency) + tma_memory_bound= * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_b= ound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma_dat= a_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tma_l1= _bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma= _store_bound)) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma_l1_laten= cy_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)))", "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", - "MetricName": "tma_bottleneck_cache_memory_bandwidth", - "MetricThreshold": "tma_bottleneck_cache_memory_bandwidth > 20", + "MetricName": "tma_bottleneck_data_cache_memory_bandwidth", + "MetricThreshold": "tma_bottleneck_data_cache_memory_bandwidth > 2= 0", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_= system_dram_bw_use, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Latency related bottlenecks", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependen= cy + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_= bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma= _l3_bound + tma_store_bound)) * (tma_lock_latency / (tma_dtlb_load + tma_fb= _full + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + tm= a_store_fwd_blk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tm= a_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_split_l= oads / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_= latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_s= tore_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_split_stores / (tma_dtlb_store + tma_false_sharin= g + tma_split_stores + tma_store_latency + tma_streaming_stores)) + tma_mem= ory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_boun= d + tma_l3_bound + tma_store_bound)) * (tma_store_latency / (tma_dtlb_store= + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming= _stores)))", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + 0 / (tma_dram_= bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * tma= _mem_latency / (tma_mem_bandwidth + tma_mem_latency) + tma_memory_bound * (= tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound= + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses + tma_= data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * tma_= l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + t= ma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_= l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_latenc= y_dependency / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + t= ma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound = * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bo= und + tma_store_bound)) * (tma_lock_latency / (tma_dtlb_load + tma_fb_full = + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + tma_stor= e_fwd_blk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_b= ound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_split_loads /= (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_latenc= y + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_b= ound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_s= tore_bound)) * (tma_split_stores / (tma_dtlb_store + tma_false_sharing + tm= a_split_stores + tma_store_latency + tma_streaming_stores)) + tma_memory_bo= und * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tm= a_l3_bound + tma_store_bound)) * (tma_store_latency / (tma_dtlb_store + tma= _false_sharing + tma_split_stores + tma_store_latency + tma_streaming_store= s)))", "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", - "MetricName": "tma_bottleneck_cache_memory_latency", - "MetricThreshold": "tma_bottleneck_cache_memory_latency > 20", + "MetricName": "tma_bottleneck_data_cache_memory_latency", + "MetricThreshold": "tma_bottleneck_data_cache_memory_latency > 20", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_= mem_latency" }, - { - "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", - "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_amx_busy= + tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_c= ore_bound * tma_amx_busy / (tma_amx_busy + tma_divider + tma_ports_utilizat= ion + tma_serializing_operation) + tma_core_bound * (tma_ports_utilization = / (tma_amx_busy + tma_divider + tma_ports_utilization + tma_serializing_ope= ration)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_utili= zed_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", - "MetricGroup": "BvCB;Cor;tma_issueComp", - "MetricName": "tma_bottleneck_compute_bound_est", - "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", - "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " - }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks (when the front-end could not sustain operations = delivery to the back-end)", - "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - (1 - I= NST_RETIRED.REP_ITERATION / cpu@UOPS_RETIRED.MS\\,cmask\\=3D1@) * (tma_fetc= h_latency * (tma_ms_switches + tma_branch_resteers * (tma_clears_resteers += tma_mispredicts_resteers * tma_other_mispredicts / tma_branch_mispredicts)= / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_branches))= / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_m= isses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_ms / (tma_ds= b + tma_mite + tma_ms))) - tma_bottleneck_big_code", + "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - (1 - I= NST_RETIRED.REP_ITERATION / cpu@UOPS_RETIRED.MS\\,cmask\\=3D1@) * (tma_fetc= h_latency * (tma_ms_switches + tma_branch_resteers * (tma_clears_resteers += tma_mispredicts_resteers * tma_other_mispredicts / tma_branch_mispredicts)= / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_branches))= / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_m= isses + tma_lcp + tma_ms_switches) + tma_ms)) - tma_bottleneck_big_code", "MetricGroup": "BvFB;Fed;FetchBW;Frontend", "MetricName": "tma_bottleneck_instruction_fetch_bw", "MetricThreshold": "tma_bottleneck_instruction_fetch_bw > 20" }, { "BriefDescription": "Total pipeline cost of irregular execution (e= .g", - "MetricExpr": "100 * ((1 - INST_RETIRED.REP_ITERATION / cpu@UOPS_R= ETIRED.MS\\,cmask\\=3D1@) * (tma_fetch_latency * (tma_ms_switches + tma_bra= nch_resteers * (tma_clears_resteers + tma_mispredicts_resteers * tma_other_= mispredicts / tma_branch_mispredicts) / (tma_clears_resteers + tma_mispredi= cts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_dsb_swit= ches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + t= ma_fetch_bandwidth * tma_ms / (tma_dsb + tma_mite + tma_ms)) + 10 * tma_mic= rocode_sequencer * tma_other_mispredicts / tma_branch_mispredicts * tma_bra= nch_mispredicts + tma_machine_clears * tma_other_nukes / tma_other_nukes + = tma_core_bound * (tma_serializing_operation + RS.EMPTY_RESOURCE / tma_info_= thread_clks * tma_ports_utilized_0) / (tma_amx_busy + tma_divider + tma_por= ts_utilization + tma_serializing_operation) + tma_microcode_sequencer / (tm= a_few_uops_instructions + tma_microcode_sequencer) * (tma_assists / tma_mic= rocode_sequencer) * tma_heavy_operations)", + "MetricExpr": "100 * ((1 - INST_RETIRED.REP_ITERATION / cpu@UOPS_R= ETIRED.MS\\,cmask\\=3D1@) * (tma_fetch_latency * (tma_ms_switches + tma_bra= nch_resteers * (tma_clears_resteers + tma_mispredicts_resteers * tma_other_= mispredicts / tma_branch_mispredicts) / (tma_clears_resteers + tma_mispredi= cts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_dsb_swit= ches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + t= ma_ms) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_= mispredicts * tma_branch_mispredicts + tma_machine_clears * tma_other_nukes= / tma_other_nukes + tma_core_bound * (tma_serializing_operation + RS.EMPTY= _RESOURCE / tma_info_thread_clks * tma_ports_utilized_0) / (tma_amx_busy + = tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_micr= ocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (= tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", "MetricName": "tma_bottleneck_irregular_overhead", "MetricThreshold": "tma_bottleneck_irregular_overhead > 10", @@ -434,7 +446,7 @@ }, { "BriefDescription": "Total pipeline cost of remaining bottlenecks = in the back-end", - "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_cache_me= mory_bandwidth + tma_bottleneck_cache_memory_latency + tma_bottleneck_memor= y_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottleneck_comput= e_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_branching_= overhead + tma_bottleneck_useful_work)", + "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_data_cac= he_memory_bandwidth + tma_bottleneck_data_cache_memory_latency + tma_bottle= neck_memory_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottlen= eck_compute_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_= branching_overhead + tma_bottleneck_useful_work)", "MetricGroup": "BvOB;Cor;Offcore", "MetricName": "tma_bottleneck_other_bottlenecks", "MetricThreshold": "tma_bottleneck_other_bottlenecks > 20", @@ -450,7 +462,7 @@ { "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Branch Misprediction", "DefaultMetricgroupName": "TopdownL2", - "MetricExpr": "topdown\\-br\\-mispredict / (topdown\\-fe\\-bound += topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tm= a_info_thread_slots", + "MetricExpr": "topdown\\-br\\-mispredict / (topdown\\-fe\\-bound += topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "BadSpec;BrMispredicts;BvMP;Default;TmaL2;TopdownL2= ;tma_L2_group;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", @@ -551,7 +563,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to contested acces= ses", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(76.6 * tma_info_system_core_frequency * (MEM_LOAD_= L3_HIT_RETIRED.XSNP_FWD * (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM / (OCR.DEMA= ND_DATA_RD.L3_HIT.SNOOP_HITM + OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD= ))) + 74.6 * tma_info_system_core_frequency * MEM_LOAD_L3_HIT_RETIRED.XSNP_= MISS) * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_= info_thread_clks", "MetricGroup": "BvMS;DataSharing;LockCont;Offcore;Snoop;TopdownL4;= tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", "MetricName": "tma_contested_accesses", @@ -658,7 +669,7 @@ "MetricGroup": "BvMB;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;t= ma_issueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", - "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_cache_memory_bandwidth, = tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_late= ncy, tma_streaming_stores", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_data_cache_memory_bandwi= dth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store= _latency, tma_streaming_stores", "ScaleUnit": "100%" }, { @@ -786,7 +797,7 @@ { "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring heavy-weight operations -- instructions that require= two or more uops or micro-coded sequences", "DefaultMetricgroupName": "TopdownL2", - "MetricExpr": "topdown\\-heavy\\-ops / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_in= fo_thread_slots", + "MetricExpr": "topdown\\-heavy\\-ops / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "Default;Retire;TmaL2;TopdownL2;tma_L2_group;tma_re= tiring_group", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", @@ -1297,19 +1308,19 @@ { "BriefDescription": "Off-core accesses per kilo instruction for mo= dified write requests", "MetricExpr": "1e3 * OCR.MODIFIED_WRITE.ANY_RESPONSE / tma_info_in= st_mix_instructions", - "MetricGroup": "Offcore", + "MetricGroup": "Offcore;Server", "MetricName": "tma_info_memory_mix_offcore_mwrite_any_pki" }, { "BriefDescription": "Off-core accesses per kilo instruction for re= ads-to-core requests (speculative; including in-core HW prefetches)", "MetricExpr": "1e3 * OCR.READS_TO_CORE.ANY_RESPONSE / tma_info_ins= t_mix_instructions", - "MetricGroup": "CacheHits;Offcore", + "MetricGroup": "CacheHits;Offcore;Server", "MetricName": "tma_info_memory_mix_offcore_read_any_pki" }, { "BriefDescription": "L3 cache misses per kilo instruction for read= s-to-core requests (speculative; including in-core HW prefetches)", "MetricExpr": "1e3 * OCR.READS_TO_CORE.L3_MISS / tma_info_inst_mix= _instructions", - "MetricGroup": "Offcore", + "MetricGroup": "Offcore;Server", "MetricName": "tma_info_memory_mix_offcore_read_l3m_pki" }, { @@ -1335,21 +1346,21 @@ { "BriefDescription": "Average DRAM BW for Reads-to-Core (R2C) cover= ing for memory attached to local- and remote-socket", "MetricExpr": "64 * OCR.READS_TO_CORE.DRAM / 1e9 / tma_info_system= _time", - "MetricGroup": "HPC;Mem;MemoryBW;SoC", + "MetricGroup": "HPC;Mem;MemoryBW;Offcore;Server", "MetricName": "tma_info_memory_soc_r2c_dram_bw", "PublicDescription": "Average DRAM BW for Reads-to-Core (R2C) cove= ring for memory attached to local- and remote-socket. See R2C_Offcore_BW." }, { "BriefDescription": "Average L3-cache miss BW for Reads-to-Core (R= 2C)", "MetricExpr": "64 * OCR.READS_TO_CORE.L3_MISS / 1e9 / tma_info_sys= tem_time", - "MetricGroup": "HPC;Mem;MemoryBW;SoC", + "MetricGroup": "HPC;Mem;MemoryBW;Offcore;Server", "MetricName": "tma_info_memory_soc_r2c_l3m_bw", "PublicDescription": "Average L3-cache miss BW for Reads-to-Core (= R2C). This covering going to DRAM or other memory off-chip memory tears. Se= e R2C_Offcore_BW." }, { "BriefDescription": "Average Off-core access BW for Reads-to-Core = (R2C)", "MetricExpr": "64 * OCR.READS_TO_CORE.ANY_RESPONSE / 1e9 / tma_inf= o_system_time", - "MetricGroup": "HPC;Mem;MemoryBW;SoC", + "MetricGroup": "HPC;Mem;MemoryBW;Offcore;Server", "MetricName": "tma_info_memory_soc_r2c_offcore_bw", "PublicDescription": "Average Off-core access BW for Reads-to-Core= (R2C). R2C account for demand or prefetch load/RFO/code access that fill d= ata into the Core caches." }, @@ -1379,7 +1390,7 @@ "MetricName": "tma_info_memory_tlb_store_stlb_mpki" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_G= E_1 / 2 if #SMT_on else cpu@UOPS_EXECUTED.THREAD\\,cmask\\=3D1@)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -1426,7 +1437,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -1438,16 +1449,28 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, + { + "BriefDescription": "Average 3DXP Memory Bandwidth Use for reads [= GB / sec]", + "MetricExpr": "(64 * UNC_M_PMM_RPQ_INSERTS / 1e9 / tma_info_system= _time if #has_pmem > 0 else 0)", + "MetricGroup": "MemOffcore;MemoryBW;Server;SoC", + "MetricName": "tma_info_system_cxl_mem_read_bw" + }, + { + "BriefDescription": "Average 3DXP Memory Bandwidth Use for Writes = [GB / sec]", + "MetricExpr": "(64 * UNC_M_PMM_WPQ_INSERTS / 1e9 / tma_info_system= _time if #has_pmem > 0 else 0)", + "MetricGroup": "MemOffcore;MemoryBW;Server;SoC", + "MetricName": "tma_info_system_cxl_mem_write_bw" + }, { "BriefDescription": "Average external Memory Bandwidth Use for rea= ds and writes [GB / sec]", "MetricExpr": "64 * (UNC_M_CAS_COUNT.RD + UNC_M_CAS_COUNT.WR) / 1e= 9 / tma_info_system_time", "MetricGroup": "HPC;MemOffcore;MemoryBW;SoC;tma_issueBW", "MetricName": "tma_info_system_dram_bw_use", - "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_cache_memory_ban= dwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_data_cache_memor= y_bandwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Giga Floating Point Operations Per Second", @@ -1513,7 +1536,6 @@ }, { "BriefDescription": "Average latency of data read request to exter= nal memory (in nanoseconds)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "1e9 * (UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / UNC_CHA_= TOR_INSERTS.IA_MISS_DRD) / (tma_info_system_socket_clks / tma_info_system_t= ime)", "MetricGroup": "Mem;MemoryLat;SoC", "MetricName": "tma_info_system_mem_read_latency", @@ -1674,12 +1696,12 @@ "ScaleUnit": "100%" }, { - "BriefDescription": "This metric([SKL+] roughly; [LNL]) estimates = fraction of cycles with demand load accesses that hit the L1D cache", + "BriefDescription": "This metric ([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache", "MetricExpr": "min(2 * (MEM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETI= RED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CYCLE_ACTIVITY.CYCLE= S_MEM_ANY - MEMORY_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound= _group", "MetricName": "tma_l1_latency_dependency", "MetricThreshold": "tma_l1_latency_dependency > 0.1 & (tma_l1_boun= d > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache. The s= hort latency of the L1D cache may be exposed in pointer-chasing memory acce= ss patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", + "PublicDescription": "This metric ([SKL+] roughly; [LNL]) estimate= s fraction of cycles with demand load accesses that hit the L1D cache. The = short latency of the L1D cache may be exposed in pointer-chasing memory acc= ess patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", "ScaleUnit": "100%" }, { @@ -1693,7 +1715,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles wit= h demand load accesses that hit the L2 cache under unloaded scenarios (poss= ibly L2 latency limited)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "4.4 * tma_info_system_core_frequency * MEM_LOAD_RET= IRED.L2_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) = / tma_info_thread_clks", "MetricGroup": "MemoryLat;TopdownL4;tma_L4_group;tma_l2_bound_grou= p", "MetricName": "tma_l2_hit_latency", @@ -1712,12 +1733,11 @@ }, { "BriefDescription": "This metric estimates fraction of cycles with= demand load accesses that hit the L3 cache under unloaded scenarios (possi= bly L3 latency limited)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "32.6 * tma_info_system_core_frequency * (MEM_LOAD_R= ETIRED.L3_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2= )) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat= ;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_cache_memory_latency, tma_mem_latency", + "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_data_cache_memory_latency, tma_mem_latency", "ScaleUnit": "100%" }, { @@ -1800,6 +1820,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU spent handling cache misses due to lock operations", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(16 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS= .ALL_RFO) + MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES * (10= * L2_RQSTS.RFO_HIT + min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTAN= DING.CYCLES_WITH_DEMAND_RFO))) / tma_info_thread_clks", "MetricGroup": "LockCont;Offcore;TopdownL4;tma_L4_group;tma_issueR= FO;tma_l1_bound_group", "MetricName": "tma_lock_latency", @@ -1832,7 +1853,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_d= ram_bound_group;tma_issueBW", "MetricName": "tma_mem_bandwidth", "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_cache_memory_bandwidth,= tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_data_cache_memory_bandw= idth, tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", "ScaleUnit": "100%" }, { @@ -1841,13 +1862,13 @@ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_= dram_bound_group;tma_issueLat", "MetricName": "tma_mem_latency", "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_cache_memory_latency, tma_l3_hit_latency= ", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_data_cache_memory_latency, tma_l3_hit_la= tency", "ScaleUnit": "100%" }, { "BriefDescription": "This metric represents fraction of slots the = Memory subsystem within the Backend was a bottleneck", "DefaultMetricgroupName": "TopdownL2", - "MetricExpr": "topdown\\-mem\\-bound / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_in= fo_thread_slots", + "MetricExpr": "topdown\\-mem\\-bound / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "Backend;Default;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", @@ -1857,7 +1878,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to LFENCE Instructions.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * MISC2_RETIRED.LFENCE / tma_info_thread_clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_serializing_operation_g= roup", "MetricName": "tma_memory_fence", @@ -1910,7 +1930,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to the Microcode Sequencer (MS) unit = - see Microcode_Sequencer node for details.", - "MetricExpr": "max(IDQ.MS_CYCLES_ANY, cpu@UOPS_RETIRED.MS\\,cmask\= \=3D1@ / (UOPS_RETIRED.SLOTS / UOPS_ISSUED.ANY)) / tma_info_core_core_clks = / 2", + "MetricExpr": "max(IDQ.MS_CYCLES_ANY, cpu@UOPS_RETIRED.MS\\,cmask\= \=3D1@ / (UOPS_RETIRED.SLOTS / UOPS_ISSUED.ANY)) / tma_info_core_core_clks = / 2.4", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_fetch_bandwidt= h_group", "MetricName": "tma_ms", "MetricThreshold": "tma_ms > 0.05 & tma_fetch_bandwidth > 0.2", @@ -1945,6 +1965,7 @@ }, { "BriefDescription": "This metric represents the remaining light uo= ps fraction the CPU has executed - remaining means not covered by other sib= ling nodes", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_i= nt_operations + tma_memory_operations + tma_fused_instructions + tma_non_fu= sed_branches))", "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", "MetricName": "tma_other_light_ops", @@ -2006,6 +2027,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU performance was potentially limited due to Core computation issues (non= divider-related)", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "((tma_ports_utilized_0 * tma_info_thread_clks + (EX= E_ACTIVITY.1_PORTS_UTIL + tma_retiring * EXE_ACTIVITY.2_3_PORTS_UTIL)) / tm= a_info_thread_clks if ARITH.DIV_ACTIVE < CYCLE_ACTIVITY.STALLS_TOTAL - EXE_= ACTIVITY.BOUND_ON_LOADS else (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring * EX= E_ACTIVITY.2_3_PORTS_UTIL) / tma_info_thread_clks)", "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_gr= oup", "MetricName": "tma_ports_utilization", @@ -2015,6 +2037,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed no uops on any execution port (Logical Processor cycles since ICL= , Physical Core cycles otherwise)", + "MetricConstraint": "NO_THRESHOLD_AND_NMI", "MetricExpr": "(EXE_ACTIVITY.EXE_BOUND_0_PORTS + max(RS.EMPTY_RESO= URCE - RESOURCE_STALLS.SCOREBOARD, 0)) / tma_info_thread_clks * (CYCLE_ACTI= VITY.STALLS_TOTAL - EXE_ACTIVITY.BOUND_ON_LOADS) / tma_info_thread_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_0", @@ -2024,6 +2047,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", + "MetricConstraint": "NO_THRESHOLD_AND_NMI", "MetricExpr": "EXE_ACTIVITY.1_PORTS_UTIL / tma_info_thread_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_1", @@ -2033,7 +2057,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "EXE_ACTIVITY.2_PORTS_UTIL / tma_info_thread_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", @@ -2043,7 +2066,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 3 or more uops per cycle on all execution ports (Logical= Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "UOPS_EXECUTED.CYCLES_GE_3 / tma_info_thread_clks", "MetricGroup": "BvCB;PortsUtil;TopdownL4;tma_L4_group;tma_ports_ut= ilization_group", "MetricName": "tma_ports_utilized_3m", @@ -2072,7 +2094,7 @@ { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_= thread_slots", + "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "BvUW;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", @@ -2100,7 +2122,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to PAUSE Instructions", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "CPU_CLK_UNHALTED.PAUSE / tma_info_thread_clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_serializing_operation_g= roup", "MetricName": "tma_slow_pause", @@ -2132,7 +2153,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_i= ssueBW;tma_l3_bound_group", "MetricName": "tma_sq_full", "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, tma_m= em_bandwidth", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_data_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, = tma_mem_bandwidth", "ScaleUnit": "100%" }, { diff --git a/tools/perf/pmu-events/arch/x86/emeraldrapids/floating-point.js= on b/tools/perf/pmu-events/arch/x86/emeraldrapids/floating-point.json index 8c9207750c82..bc475e163227 100644 --- a/tools/perf/pmu-events/arch/x86/emeraldrapids/floating-point.json +++ b/tools/perf/pmu-events/arch/x86/emeraldrapids/floating-point.json @@ -5,7 +5,6 @@ "CounterMask": "1", "EventCode": "0xb0", "EventName": "ARITH.FPDIV_ACTIVE", - "PublicDescription": "ARITH.FPDIV_ACTIVE Available PDIST counters:= 0", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -14,7 +13,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.FP", - "PublicDescription": "Counts all microcode Floating Point assists.= Available PDIST counters: 0", + "PublicDescription": "Counts all microcode Floating Point assists.= ", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -23,7 +22,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.SSE_AVX_MIX", - "PublicDescription": "ASSISTS.SSE_AVX_MIX Available PDIST counters= : 0", "SampleAfterValue": "1000003", "UMask": "0x10" }, @@ -32,7 +30,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.PORT_0", - "PublicDescription": "FP_ARITH_DISPATCHED.PORT_0 [This event is al= ias to FP_ARITH_DISPATCHED.V0] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -41,7 +38,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.PORT_1", - "PublicDescription": "FP_ARITH_DISPATCHED.PORT_1 [This event is al= ias to FP_ARITH_DISPATCHED.V1] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -50,7 +46,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.PORT_5", - "PublicDescription": "FP_ARITH_DISPATCHED.PORT_5 [This event is al= ias to FP_ARITH_DISPATCHED.V2] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -59,7 +54,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.V0", - "PublicDescription": "FP_ARITH_DISPATCHED.V0 [This event is alias = to FP_ARITH_DISPATCHED.PORT_0] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -68,7 +62,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.V1", - "PublicDescription": "FP_ARITH_DISPATCHED.V1 [This event is alias = to FP_ARITH_DISPATCHED.PORT_1] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -77,7 +70,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.V2", - "PublicDescription": "FP_ARITH_DISPATCHED.V2 [This event is alias = to FP_ARITH_DISPATCHED.PORT_5] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -86,7 +78,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 2 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as the= y perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR re= gister need to be set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 2 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as the= y perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR re= gister need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -95,7 +87,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE", - "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events. Available PDIST co= unters: 0", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -104,7 +96,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 = calculations per element. The DAZ and FTZ flags in the MXCSR register need = to be set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 = calculations per element. The DAZ and FTZ flags in the MXCSR register need = to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -113,7 +105,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE", - "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events. Available PDIST co= unters: 0", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -122,7 +114,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.4_FLOPS", - "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision and 256-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 2 or/and 4 computation operations, one for each element. = Applies to SSE* and AVX* packed single precision floating-point and packed= double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL= DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB ins= tructions count twice as they perform 2 calculations per element. The DAZ a= nd FTZ flags in the MXCSR register need to be set when using these events. = Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision and 256-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 2 or/and 4 computation operations, one for each element. = Applies to SSE* and AVX* packed single precision floating-point and packed= double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL= DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB ins= tructions count twice as they perform 2 calculations per element. The DAZ a= nd FTZ flags in the MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x18" }, @@ -131,7 +123,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14= FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calc= ulations per element. The DAZ and FTZ flags in the MXCSR register need to b= e set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14= FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calc= ulations per element. The DAZ and FTZ flags in the MXCSR register need to b= e set when using these events.", "SampleAfterValue": "100003", "UMask": "0x40" }, @@ -140,7 +132,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE", - "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 16 computation oper= ations, one for each element. Applies to SSE* and AVX* packed single preci= sion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP1= 4 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 cal= culations per element. The DAZ and FTZ flags in the MXCSR register need to = be set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 16 computation oper= ations, one for each element. Applies to SSE* and AVX* packed single preci= sion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP1= 4 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 cal= culations per element. The DAZ and FTZ flags in the MXCSR register need to = be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x80" }, @@ -149,7 +141,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.8_FLOPS", - "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision and 512-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 8 computation operations, one for each element. Applies = to SSE* and AVX* packed single precision and double precision floating-poin= t instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RSQRT14= RCP RCP14 DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice= as they perform 2 calculations per element. The DAZ and FTZ flags in the M= XCSR register need to be set when using these events. Available PDIST count= ers: 0", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision and 512-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 8 computation operations, one for each element. Applies = to SSE* and AVX* packed single precision and double precision floating-poin= t instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RSQRT14= RCP RCP14 DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice= as they perform 2 calculations per element. The DAZ and FTZ flags in the M= XCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x60" }, @@ -158,7 +150,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR", - "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision and double precision floating-point instructions retired; some = instructions will count twice as noted below. Each count represents 1 comp= utational operation. Applies to SSE* and AVX* scalar single precision float= ing-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB= . FM(N)ADD/SUB instructions count twice as they perform 2 calculations per= element. The DAZ and FTZ flags in the MXCSR register need to be set when u= sing these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision and double precision floating-point instructions retired; some = instructions will count twice as noted below. Each count represents 1 comp= utational operation. Applies to SSE* and AVX* scalar single precision float= ing-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB= . FM(N)ADD/SUB instructions count twice as they perform 2 calculations per= element. The DAZ and FTZ flags in the MXCSR register need to be set when u= sing these events.", "SampleAfterValue": "1000003", "UMask": "0x3" }, @@ -167,7 +159,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational scalar doubl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar double precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions co= unt twice as they perform 2 calculations per element. The DAZ and FTZ flags= in the MXCSR register need to be set when using these events. Available PD= IST counters: 0", + "PublicDescription": "Number of SSE/AVX computational scalar doubl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar double precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions co= unt twice as they perform 2 calculations per element. The DAZ and FTZ flags= in the MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -176,7 +168,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE", - "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar single precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instr= uctions count twice as they perform 2 calculations per element. The DAZ and= FTZ flags in the MXCSR register need to be set when using these events. Av= ailable PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar single precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instr= uctions count twice as they perform 2 calculations per element. The DAZ and= FTZ flags in the MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -185,7 +177,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.VECTOR", - "PublicDescription": "Number of any Vector retired FP arithmetic i= nstructions. The DAZ and FTZ flags in the MXCSR register need to be set wh= en using these events. Available PDIST counters: 0", + "PublicDescription": "Number of any Vector retired FP arithmetic i= nstructions. The DAZ and FTZ flags in the MXCSR register need to be set wh= en using these events.", "SampleAfterValue": "1000003", "UMask": "0xfc" }, @@ -194,7 +186,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.128B_PACKED_HALF", - "PublicDescription": "FP_ARITH_INST_RETIRED2.128B_PACKED_HALF Avai= lable PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -203,7 +194,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.256B_PACKED_HALF", - "PublicDescription": "FP_ARITH_INST_RETIRED2.256B_PACKED_HALF Avai= lable PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -212,7 +202,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.512B_PACKED_HALF", - "PublicDescription": "FP_ARITH_INST_RETIRED2.512B_PACKED_HALF Avai= lable PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -221,7 +210,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.COMPLEX_SCALAR_HALF", - "PublicDescription": "FP_ARITH_INST_RETIRED2.COMPLEX_SCALAR_HALF A= vailable PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -230,7 +218,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.SCALAR", - "PublicDescription": "FP_ARITH_INST_RETIRED2.SCALAR Available PDIS= T counters: 0", + "PublicDescription": "FP_ARITH_INST_RETIRED2.SCALAR", "SampleAfterValue": "100003", "UMask": "0x3" }, @@ -239,7 +227,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.SCALAR_HALF", - "PublicDescription": "FP_ARITH_INST_RETIRED2.SCALAR_HALF Available= PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -248,7 +235,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.VECTOR", - "PublicDescription": "FP_ARITH_INST_RETIRED2.VECTOR Available PDIS= T counters: 0", + "PublicDescription": "FP_ARITH_INST_RETIRED2.VECTOR", "SampleAfterValue": "100003", "UMask": "0x1c" } diff --git a/tools/perf/pmu-events/arch/x86/emeraldrapids/frontend.json b/t= ools/perf/pmu-events/arch/x86/emeraldrapids/frontend.json index 9fe9d62b867a..793c486ffabe 100644 --- a/tools/perf/pmu-events/arch/x86/emeraldrapids/frontend.json +++ b/tools/perf/pmu-events/arch/x86/emeraldrapids/frontend.json @@ -4,7 +4,7 @@ "Counter": "0,1,2,3", "EventCode": "0x60", "EventName": "BACLEARS.ANY", - "PublicDescription": "Number of times the front-end is resteered w= hen it finds a branch instruction in a fetch line. This is called Unknown B= ranch which occurs for the first time a branch instruction is fetched or wh= en the branch is not tracked by the BPU (Branch Prediction Unit) anymore. A= vailable PDIST counters: 0", + "PublicDescription": "Number of times the front-end is resteered w= hen it finds a branch instruction in a fetch line. This is called Unknown B= ranch which occurs for the first time a branch instruction is fetched or wh= en the branch is not tracked by the BPU (Branch Prediction Unit) anymore.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -13,7 +13,7 @@ "Counter": "0,1,2,3", "EventCode": "0x87", "EventName": "DECODE.LCP", - "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk. Available PDIS= T counters: 0", + "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk.", "SampleAfterValue": "500009", "UMask": "0x1" }, @@ -22,7 +22,6 @@ "Counter": "0,1,2,3", "EventCode": "0x87", "EventName": "DECODE.MS_BUSY", - "PublicDescription": "Cycles the Microcode Sequencer is busy. Avai= lable PDIST counters: 0", "SampleAfterValue": "500009", "UMask": "0x2" }, @@ -31,7 +30,7 @@ "Counter": "0,1,2,3", "EventCode": "0x61", "EventName": "DSB2MITE_SWITCHES.PENALTY_CYCLES", - "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache th= at holds translations of previously fetched instructions that were decoded = by the legacy x86 decode pipeline (MITE). This event counts fetch penalty c= ycles when a transition occurs from DSB to MITE. Available PDIST counters: = 0", + "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache th= at holds translations of previously fetched instructions that were decoded = by the legacy x86 decode pipeline (MITE). This event counts fetch penalty c= ycles when a transition occurs from DSB to MITE.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -249,7 +248,7 @@ "Counter": "0,1,2,3", "EventCode": "0x80", "EventName": "ICACHE_DATA.STALLS", - "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The decode pipeline works at a 32= Byte granularity. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The decode pipeline works at a 32= Byte granularity.", "SampleAfterValue": "500009", "UMask": "0x4" }, @@ -260,7 +259,6 @@ "EdgeDetect": "1", "EventCode": "0x80", "EventName": "ICACHE_DATA.STALL_PERIODS", - "PublicDescription": "ICACHE_DATA.STALL_PERIODS Available PDIST co= unters: 0", "SampleAfterValue": "500009", "UMask": "0x4" }, @@ -269,7 +267,7 @@ "Counter": "0,1,2,3", "EventCode": "0x83", "EventName": "ICACHE_TAG.STALLS", - "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss.", "SampleAfterValue": "200003", "UMask": "0x4" }, @@ -279,7 +277,7 @@ "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.DSB_CYCLES_ANY", - "PublicDescription": "Counts the number of cycles uops were delive= red to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) p= ath. Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles uops were delive= red to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) p= ath.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -289,7 +287,7 @@ "CounterMask": "6", "EventCode": "0x79", "EventName": "IDQ.DSB_CYCLES_OK", - "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the D= SB (Decode Stream Buffer) path. Count includes uops that may 'bypass' the I= DQ. Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the D= SB (Decode Stream Buffer) path. Count includes uops that may 'bypass' the I= DQ.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -298,7 +296,7 @@ "Counter": "0,1,2,3", "EventCode": "0x79", "EventName": "IDQ.DSB_UOPS", - "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Availab= le PDIST counters: 0", + "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -308,7 +306,7 @@ "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.MITE_CYCLES_ANY", - "PublicDescription": "Counts the number of cycles uops were delive= red to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipe= line) path. During these cycles uops are not being delivered from the Decod= e Stream Buffer (DSB). Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles uops were delive= red to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipe= line) path. During these cycles uops are not being delivered from the Decod= e Stream Buffer (DSB).", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -318,7 +316,7 @@ "CounterMask": "6", "EventCode": "0x79", "EventName": "IDQ.MITE_CYCLES_OK", - "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the M= ITE (legacy decode pipeline) path. During these cycles uops are not being d= elivered from the Decode Stream Buffer (DSB). Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the M= ITE (legacy decode pipeline) path. During these cycles uops are not being d= elivered from the Decode Stream Buffer (DSB).", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -327,7 +325,7 @@ "Counter": "0,1,2,3", "EventCode": "0x79", "EventName": "IDQ.MITE_UOPS", - "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the MITE path. This also means that uops are= not being delivered from the Decode Stream Buffer (DSB). Available PDIST c= ounters: 0", + "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the MITE path. This also means that uops are= not being delivered from the Decode Stream Buffer (DSB).", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -337,7 +335,7 @@ "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.MS_CYCLES_ANY", - "PublicDescription": "Counts cycles during which uops are being de= livered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS= ) is busy. Uops maybe initiated by Decode Stream Buffer (DSB) or MITE. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts cycles during which uops are being de= livered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS= ) is busy. Uops maybe initiated by Decode Stream Buffer (DSB) or MITE.", "SampleAfterValue": "2000003", "UMask": "0x20" }, @@ -348,7 +346,7 @@ "EdgeDetect": "1", "EventCode": "0x79", "EventName": "IDQ.MS_SWITCHES", - "PublicDescription": "Number of switches from DSB (Decode Stream B= uffer) or MITE (legacy decode pipeline) to the Microcode Sequencer. Availab= le PDIST counters: 0", + "PublicDescription": "Number of switches from DSB (Decode Stream B= uffer) or MITE (legacy decode pipeline) to the Microcode Sequencer.", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -357,7 +355,7 @@ "Counter": "0,1,2,3", "EventCode": "0x79", "EventName": "IDQ.MS_UOPS", - "PublicDescription": "Counts the total number of uops delivered by= the Microcode Sequencer (MS). Available PDIST counters: 0", + "PublicDescription": "Counts the total number of uops delivered by= the Microcode Sequencer (MS).", "SampleAfterValue": "1000003", "UMask": "0x20" }, @@ -366,7 +364,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x9c", "EventName": "IDQ_BUBBLES.CORE", - "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CORE] Available PDI= ST counters: 0", + "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -376,7 +374,7 @@ "CounterMask": "6", "EventCode": "0x9c", "EventName": "IDQ_BUBBLES.CYCLES_0_UOPS_DELIV.CORE", - "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES= _0_UOPS_DELIV.CORE] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES= _0_UOPS_DELIV.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -387,7 +385,7 @@ "EventCode": "0x9c", "EventName": "IDQ_BUBBLES.CYCLES_FE_WAS_OK", "Invert": "1", - "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_UOPS_N= OT_DELIVERED.CYCLES_FE_WAS_OK] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_UOPS_N= OT_DELIVERED.CYCLES_FE_WAS_OK]", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -396,7 +394,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x9c", "EventName": "IDQ_UOPS_NOT_DELIVERED.CORE", - "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle. [This event is alias to IDQ_BUBBLES.CORE] Available PDIST counters= : 0", + "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle. [This event is alias to IDQ_BUBBLES.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -406,7 +404,7 @@ "CounterMask": "6", "EventCode": "0x9c", "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE", - "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_BUBBLES.CYCLES_0_UOPS_DEL= IV.CORE] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_BUBBLES.CYCLES_0_UOPS_DEL= IV.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -417,7 +415,7 @@ "EventCode": "0x9c", "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK", "Invert": "1", - "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_BUBBLE= S.CYCLES_FE_WAS_OK] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_BUBBLE= S.CYCLES_FE_WAS_OK]", "SampleAfterValue": "1000003", "UMask": "0x1" } diff --git a/tools/perf/pmu-events/arch/x86/emeraldrapids/memory.json b/too= ls/perf/pmu-events/arch/x86/emeraldrapids/memory.json index 7c3f9b76d367..5e6c1f05c981 100644 --- a/tools/perf/pmu-events/arch/x86/emeraldrapids/memory.json +++ b/tools/perf/pmu-events/arch/x86/emeraldrapids/memory.json @@ -5,7 +5,6 @@ "CounterMask": "6", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_L3_MISS", - "PublicDescription": "Execution stalls while L3 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x6" }, @@ -14,7 +13,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc3", "EventName": "MACHINE_CLEARS.MEMORY_ORDERING", - "PublicDescription": "Counts the number of Machine Clears detected= dye to memory ordering. Memory Ordering Machine Clears may apply when a me= mory read may not conform to the memory ordering rules of the x86 architect= ure Available PDIST counters: 0", + "PublicDescription": "Counts the number of Machine Clears detected= dye to memory ordering. Memory Ordering Machine Clears may apply when a me= mory read may not conform to the memory ordering rules of the x86 architect= ure", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -24,7 +23,6 @@ "CounterMask": "2", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.CYCLES_L1D_MISS", - "PublicDescription": "Cycles while L1 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -34,7 +32,6 @@ "CounterMask": "3", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.STALLS_L1D_MISS", - "PublicDescription": "Execution stalls while L1 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x3" }, @@ -44,7 +41,7 @@ "CounterMask": "5", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.STALLS_L2_MISS", - "PublicDescription": "Execution stalls while L2 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock). Available PDIST counters: 0", + "PublicDescription": "Execution stalls while L2 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock).", "SampleAfterValue": "1000003", "UMask": "0x5" }, @@ -54,7 +51,7 @@ "CounterMask": "9", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.STALLS_L3_MISS", - "PublicDescription": "Execution stalls while L3 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock). Available PDIST counters: 0", + "PublicDescription": "Execution stalls while L3 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock).", "SampleAfterValue": "1000003", "UMask": "0x9" }, @@ -478,7 +475,6 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD", - "PublicDescription": "Counts demand data read requests that miss t= he L3 cache. Available PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -487,7 +483,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.L3_MISS_DEMAND_DATA_RD", - "PublicDescription": "For every cycle, increments by the number of= demand data read requests pending that are known to have missed the L3 cac= he. Note that this does not capture all elapsed cycles while requests are = outstanding - only cycles from when the requests were known by the requesti= ng core to have missed the L3 cache. Available PDIST counters: 0", + "PublicDescription": "For every cycle, increments by the number of= demand data read requests pending that are known to have missed the L3 cac= he. Note that this does not capture all elapsed cycles while requests are = outstanding - only cycles from when the requests were known by the requesti= ng core to have missed the L3 cache.", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -505,7 +501,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.ABORTED_EVENTS", - "PublicDescription": "Counts the number of times an RTM execution = aborted due to none of the previous 3 categories (e.g. interrupt). Availabl= e PDIST counters: 0", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to none of the previous 3 categories (e.g. interrupt).", "SampleAfterValue": "100003", "UMask": "0x80" }, @@ -514,7 +510,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.ABORTED_MEM", - "PublicDescription": "Counts the number of times an RTM execution = aborted due to various memory events (e.g. read/write capacity and conflict= s). Available PDIST counters: 0", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to various memory events (e.g. read/write capacity and conflict= s).", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -523,7 +519,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.ABORTED_MEMTYPE", - "PublicDescription": "Counts the number of times an RTM execution = aborted due to incompatible memory type. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to incompatible memory type.", "SampleAfterValue": "100003", "UMask": "0x40" }, @@ -532,7 +528,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.ABORTED_UNFRIENDLY", - "PublicDescription": "Counts the number of times an RTM execution = aborted due to HLE-unfriendly instructions. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to HLE-unfriendly instructions.", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -541,7 +537,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.COMMIT", - "PublicDescription": "Counts the number of times RTM commit succee= ded. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times RTM commit succee= ded.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -550,7 +546,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.START", - "PublicDescription": "Counts the number of times we entered an RTM= region. Does not count nested transactions. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times we entered an RTM= region. Does not count nested transactions.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -559,7 +555,7 @@ "Counter": "0,1,2,3", "EventCode": "0x54", "EventName": "TX_MEM.ABORT_CAPACITY_READ", - "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional reads Available PDIST counters: 0", + "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional reads", "SampleAfterValue": "100003", "UMask": "0x80" }, @@ -568,7 +564,7 @@ "Counter": "0,1,2,3", "EventCode": "0x54", "EventName": "TX_MEM.ABORT_CAPACITY_WRITE", - "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional writes. Available PDIST counters: 0", + "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional writes.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -577,7 +573,7 @@ "Counter": "0,1,2,3", "EventCode": "0x54", "EventName": "TX_MEM.ABORT_CONFLICT", - "PublicDescription": "Counts the number of times a TSX line had a = cache conflict. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times a TSX line had a = cache conflict.", "SampleAfterValue": "100003", "UMask": "0x1" } diff --git a/tools/perf/pmu-events/arch/x86/emeraldrapids/other.json b/tool= s/perf/pmu-events/arch/x86/emeraldrapids/other.json index a58d65556609..21f49f609ed4 100644 --- a/tools/perf/pmu-events/arch/x86/emeraldrapids/other.json +++ b/tools/perf/pmu-events/arch/x86/emeraldrapids/other.json @@ -4,10 +4,34 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.PAGE_FAULT", - "PublicDescription": "ASSISTS.PAGE_FAULT Available PDIST counters:= 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, + { + "BriefDescription": "HW_INTERRUPTS.MASKED", + "Counter": "0,1,2,3,4,5,6,7", + "EventCode": "0xcb", + "EventName": "HW_INTERRUPTS.MASKED", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "HW_INTERRUPTS.PENDING_AND_MASKED", + "Counter": "0,1,2,3,4,5,6,7", + "EventCode": "0xcb", + "EventName": "HW_INTERRUPTS.PENDING_AND_MASKED", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Number of hardware interrupts received by the= processor.", + "Counter": "0,1,2,3,4,5,6,7", + "EventCode": "0xcb", + "EventName": "HW_INTERRUPTS.RECEIVED", + "PublicDescription": "Counts the number of hardware interruptions = received by the processor.", + "SampleAfterValue": "203", + "UMask": "0x1" + }, { "BriefDescription": "Counts streaming stores that have any type of= response.", "Counter": "0,1,2,3", @@ -25,7 +49,7 @@ "CounterMask": "1", "EventCode": "0x2d", "EventName": "XQ.FULL_CYCLES", - "PublicDescription": "number of cycles when the thread is active a= nd the uncore cannot take any further requests (for example prefetches, loa= ds or stores initiated by the Core that miss the L2 cache). Available PDIST= counters: 0", + "PublicDescription": "number of cycles when the thread is active a= nd the uncore cannot take any further requests (for example prefetches, loa= ds or stores initiated by the Core that miss the L2 cache).", "SampleAfterValue": "1000003", "UMask": "0x1" } diff --git a/tools/perf/pmu-events/arch/x86/emeraldrapids/pipeline.json b/t= ools/perf/pmu-events/arch/x86/emeraldrapids/pipeline.json index 48bec483b49a..1fa7957956df 100644 --- a/tools/perf/pmu-events/arch/x86/emeraldrapids/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/emeraldrapids/pipeline.json @@ -6,7 +6,6 @@ "Deprecated": "1", "EventCode": "0xb0", "EventName": "ARITH.DIVIDER_ACTIVE", - "PublicDescription": "This event is deprecated. Refer to new event= ARITH.DIV_ACTIVE Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x9" }, @@ -16,7 +15,7 @@ "CounterMask": "1", "EventCode": "0xb0", "EventName": "ARITH.DIV_ACTIVE", - "PublicDescription": "Counts cycles when divide unit is busy execu= ting divide or square root operations. Accounts for integer and floating-po= int operations. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when divide unit is busy execu= ting divide or square root operations. Accounts for integer and floating-po= int operations.", "SampleAfterValue": "1000003", "UMask": "0x9" }, @@ -27,7 +26,6 @@ "Deprecated": "1", "EventCode": "0xb0", "EventName": "ARITH.FP_DIVIDER_ACTIVE", - "PublicDescription": "This event is deprecated. Refer to new event= ARITH.FPDIV_ACTIVE Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -37,7 +35,6 @@ "CounterMask": "1", "EventCode": "0xb0", "EventName": "ARITH.IDIV_ACTIVE", - "PublicDescription": "This event counts the cycles the integer div= ider is busy. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -48,7 +45,6 @@ "Deprecated": "1", "EventCode": "0xb0", "EventName": "ARITH.INT_DIVIDER_ACTIVE", - "PublicDescription": "This event is deprecated. Refer to new event= ARITH.IDIV_ACTIVE Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -57,7 +53,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.ANY", - "PublicDescription": "Counts the number of occurrences where a mic= rocode assist is invoked by hardware. Examples include AD (page Access Dirt= y), FP and AVX related assists. Available PDIST counters: 0", + "PublicDescription": "Counts the number of occurrences where a mic= rocode assist is invoked by hardware. Examples include AD (page Access Dirt= y), FP and AVX related assists.", "SampleAfterValue": "100003", "UMask": "0x1b" }, @@ -217,7 +213,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.C01", - "PublicDescription": "Counts core clocks when the thread is in the= C0.1 light-weight slower wakeup time but more power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions. Availab= le PDIST counters: 0", + "PublicDescription": "Counts core clocks when the thread is in the= C0.1 light-weight slower wakeup time but more power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions.", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -226,7 +222,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.C02", - "PublicDescription": "Counts core clocks when the thread is in the= C0.2 light-weight faster wakeup time but less power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions. Availab= le PDIST counters: 0", + "PublicDescription": "Counts core clocks when the thread is in the= C0.2 light-weight faster wakeup time but less power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions.", "SampleAfterValue": "2000003", "UMask": "0x20" }, @@ -235,7 +231,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.C0_WAIT", - "PublicDescription": "Counts core clocks when the thread is in the= C0.1 or C0.2 power saving optimized states (TPAUSE or UMWAIT instructions)= or running the PAUSE instruction. Available PDIST counters: 0", + "PublicDescription": "Counts core clocks when the thread is in the= C0.1 or C0.2 power saving optimized states (TPAUSE or UMWAIT instructions)= or running the PAUSE instruction.", "SampleAfterValue": "2000003", "UMask": "0x70" }, @@ -244,7 +240,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.DISTRIBUTED", - "PublicDescription": "This event distributes cycle counts between = active hyperthreads, i.e., those in C0. A hyperthread becomes inactive whe= n it executes the HLT or MWAIT instructions. If all other hyperthreads are= inactive (or disabled or do not exist), all counts are attributed to this = hyperthread. To obtain the full count when the Core is active, sum the coun= ts from each hyperthread. Available PDIST counters: 0", + "PublicDescription": "This event distributes cycle counts between = active hyperthreads, i.e., those in C0. A hyperthread becomes inactive whe= n it executes the HLT or MWAIT instructions. If all other hyperthreads are= inactive (or disabled or do not exist), all counts are attributed to this = hyperthread. To obtain the full count when the Core is active, sum the coun= ts from each hyperthread.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -253,7 +249,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE", - "PublicDescription": "Counts Core crystal clock cycles when curren= t thread is unhalted and the other thread is halted. Available PDIST counte= rs: 0", + "PublicDescription": "Counts Core crystal clock cycles when curren= t thread is unhalted and the other thread is halted.", "SampleAfterValue": "25003", "UMask": "0x2" }, @@ -262,7 +258,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.PAUSE", - "PublicDescription": "CPU_CLK_UNHALTED.PAUSE Available PDIST count= ers: 0", "SampleAfterValue": "2000003", "UMask": "0x40" }, @@ -273,7 +268,6 @@ "EdgeDetect": "1", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.PAUSE_INST", - "PublicDescription": "CPU_CLK_UNHALTED.PAUSE_INST Available PDIST = counters: 0", "SampleAfterValue": "2000003", "UMask": "0x40" }, @@ -282,7 +276,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.REF_DISTRIBUTED", - "PublicDescription": "This event distributes Core crystal clock cy= cle counts between active hyperthreads, i.e., those in C0 sleep-state. A hy= perthread becomes inactive when it executes the HLT or MWAIT instructions. = If one thread is active in a core, all counts are attributed to this hypert= hread. To obtain the full count when the Core is active, sum the counts fro= m each hyperthread. Available PDIST counters: 0", + "PublicDescription": "This event distributes Core crystal clock cy= cle counts between active hyperthreads, i.e., those in C0 sleep-state. A hy= perthread becomes inactive when it executes the HLT or MWAIT instructions. = If one thread is active in a core, all counts are attributed to this hypert= hread. To obtain the full count when the Core is active, sum the counts fro= m each hyperthread.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -299,7 +293,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.REF_TSC_P", - "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the four (eight when Hyperthr= eading is disabled) programmable counters available for other events. Note:= On all current platforms this event stops counting during 'throttling (TM)= ' states duty off periods the processor is 'halted'. The counter update is= done at a lower clock rate then the core clock the overflow status bit for= this counter may appear 'sticky'. After the counter has overflowed and so= ftware clears the overflow status bit and resets the counter to less than M= AX. The reset value to the counter is not clocked immediately so the overfl= ow status bit will flip 'high (1)' and generate another PMI (if enabled) af= ter which the reset value gets clocked into the counter. Therefore, softwar= e will get the interrupt, read the overflow status bit '1 for bit 34 while = the counter value is less than MAX. Software should ignore this case. Avail= able PDIST counters: 0", + "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the four (eight when Hyperthr= eading is disabled) programmable counters available for other events. Note:= On all current platforms this event stops counting during 'throttling (TM)= ' states duty off periods the processor is 'halted'. The counter update is= done at a lower clock rate then the core clock the overflow status bit for= this counter may appear 'sticky'. After the counter has overflowed and so= ftware clears the overflow status bit and resets the counter to less than M= AX. The reset value to the counter is not clocked immediately so the overfl= ow status bit will flip 'high (1)' and generate another PMI (if enabled) af= ter which the reset value gets clocked into the counter. Therefore, softwar= e will get the interrupt, read the overflow status bit '1 for bit 34 while = the counter value is less than MAX. Software should ignore this case.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -316,7 +310,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.THREAD_P", - "PublicDescription": "This is an architectural event that counts t= he number of thread cycles while the thread is not in a halt state. The thr= ead enters the halt state when it is running the HLT instruction. The core = frequency may change from time to time due to power or thermal throttling. = For this reason, this event may have a changing ratio with regards to wall = clock time. Available PDIST counters: 0", + "PublicDescription": "This is an architectural event that counts t= he number of thread cycles while the thread is not in a halt state. The thr= ead enters the halt state when it is running the HLT instruction. The core = frequency may change from time to time due to power or thermal throttling. = For this reason, this event may have a changing ratio with regards to wall = clock time.", "SampleAfterValue": "2000003" }, { @@ -325,7 +319,6 @@ "CounterMask": "8", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_L1D_MISS", - "PublicDescription": "Cycles while L1 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -335,7 +328,6 @@ "CounterMask": "1", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_L2_MISS", - "PublicDescription": "Cycles while L2 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -345,7 +337,6 @@ "CounterMask": "16", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_MEM_ANY", - "PublicDescription": "Cycles while memory subsystem has an outstan= ding load. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x10" }, @@ -355,7 +346,6 @@ "CounterMask": "12", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_L1D_MISS", - "PublicDescription": "Execution stalls while L1 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0xc" }, @@ -365,7 +355,6 @@ "CounterMask": "5", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_L2_MISS", - "PublicDescription": "Execution stalls while L2 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x5" }, @@ -375,7 +364,6 @@ "CounterMask": "4", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_TOTAL", - "PublicDescription": "Total execution stalls. Available PDIST coun= ters: 0", "SampleAfterValue": "1000003", "UMask": "0x4" }, @@ -384,7 +372,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb7", "EventName": "EXE.AMX_BUSY", - "PublicDescription": "Counts the cycles where the AMX (Advance Mat= rix Extension) unit is busy performing an operation. Available PDIST counte= rs: 0", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -393,7 +380,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.1_PORTS_UTIL", - "PublicDescription": "Counts cycles during which a total of 1 uop = was executed on all ports and Reservation Station (RS) was not empty. Avail= able PDIST counters: 0", + "PublicDescription": "Counts cycles during which a total of 1 uop = was executed on all ports and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -402,7 +389,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.2_3_PORTS_UTIL", - "PublicDescription": "Cycles total of 2 or 3 uops are executed on = all ports and Reservation Station (RS) was not empty. Available PDIST count= ers: 0", "SampleAfterValue": "2000003", "UMask": "0xc" }, @@ -411,7 +397,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.2_PORTS_UTIL", - "PublicDescription": "Counts cycles during which a total of 2 uops= were executed on all ports and Reservation Station (RS) was not empty. Ava= ilable PDIST counters: 0", + "PublicDescription": "Counts cycles during which a total of 2 uops= were executed on all ports and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -420,7 +406,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.3_PORTS_UTIL", - "PublicDescription": "Cycles total of 3 uops are executed on all p= orts and Reservation Station (RS) was not empty. Available PDIST counters: = 0", + "PublicDescription": "Cycles total of 3 uops are executed on all p= orts and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -429,7 +415,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.4_PORTS_UTIL", - "PublicDescription": "Cycles total of 4 uops are executed on all p= orts and Reservation Station (RS) was not empty. Available PDIST counters: = 0", + "PublicDescription": "Cycles total of 4 uops are executed on all p= orts and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -439,7 +425,6 @@ "CounterMask": "5", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.BOUND_ON_LOADS", - "PublicDescription": "Execution stalls while memory subsystem has = an outstanding load. Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x21" }, @@ -449,7 +434,7 @@ "CounterMask": "2", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.BOUND_ON_STORES", - "PublicDescription": "Counts cycles where the Store Buffer was ful= l and no loads caused an execution stall. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where the Store Buffer was ful= l and no loads caused an execution stall.", "SampleAfterValue": "1000003", "UMask": "0x40" }, @@ -458,7 +443,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.EXE_BOUND_0_PORTS", - "PublicDescription": "Number of cycles total of 0 uops executed on= all ports, Reservation Station (RS) was not empty, the Store Buffer (SB) w= as not full and there was no outstanding load. Available PDIST counters: 0", + "PublicDescription": "Number of cycles total of 0 uops executed on= all ports, Reservation Station (RS) was not empty, the Store Buffer (SB) w= as not full and there was no outstanding load.", "SampleAfterValue": "1000003", "UMask": "0x80" }, @@ -467,7 +452,7 @@ "Counter": "0,1,2,3", "EventCode": "0x75", "EventName": "INST_DECODED.DECODERS", - "PublicDescription": "Number of decoders utilized in a cycle when = the MITE (legacy decode pipeline) fetches instructions. Available PDIST cou= nters: 0", + "PublicDescription": "Number of decoders utilized in a cycle when = the MITE (legacy decode pipeline) fetches instructions.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -492,7 +477,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc0", "EventName": "INST_RETIRED.MACRO_FUSED", - "PublicDescription": "INST_RETIRED.MACRO_FUSED Available PDIST cou= nters: 0", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -501,7 +485,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc0", "EventName": "INST_RETIRED.NOP", - "PublicDescription": "Counts all retired NOP or ENDBR32/64 instruc= tions Available PDIST counters: 0", + "PublicDescription": "Counts all retired NOP or ENDBR32/64 instruc= tions", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -518,7 +502,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc0", "EventName": "INST_RETIRED.REP_ITERATION", - "PublicDescription": "Number of iterations of Repeat (REP) string = retired instructions such as MOVS, CMPS, and SCAS. Each has a byte, word, a= nd doubleword version and string instructions can be repeated using a repet= ition prefix, REP, that allows their architectural execution to be repeated= a number of times as specified by the RCX register. Note the number of ite= rations is implementation-dependent. Available PDIST counters: 0", + "PublicDescription": "Number of iterations of Repeat (REP) string = retired instructions such as MOVS, CMPS, and SCAS. Each has a byte, word, a= nd doubleword version and string instructions can be repeated using a repet= ition prefix, REP, that allows their architectural execution to be repeated= a number of times as specified by the RCX register. Note the number of ite= rations is implementation-dependent.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -529,7 +513,7 @@ "EdgeDetect": "1", "EventCode": "0xad", "EventName": "INT_MISC.CLEARS_COUNT", - "PublicDescription": "Counts the number of speculative clears due = to any type of branch misprediction or machine clears Available PDIST count= ers: 0", + "PublicDescription": "Counts the number of speculative clears due = to any type of branch misprediction or machine clears", "SampleAfterValue": "500009", "UMask": "0x1" }, @@ -538,7 +522,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.CLEAR_RESTEER_CYCLES", - "PublicDescription": "Cycles after recovery from a branch mispredi= ction or machine clear till the first uop is issued from the resteered path= . Available PDIST counters: 0", + "PublicDescription": "Cycles after recovery from a branch mispredi= ction or machine clear till the first uop is issued from the resteered path= .", "SampleAfterValue": "500009", "UMask": "0x80" }, @@ -547,7 +531,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.MBA_STALLS", - "PublicDescription": "INT_MISC.MBA_STALLS Available PDIST counters= : 0", "SampleAfterValue": "1000003", "UMask": "0x20" }, @@ -556,7 +539,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.RECOVERY_CYCLES", - "PublicDescription": "Counts core cycles when the Resource allocat= or was stalled due to recovery from an earlier branch misprediction or mach= ine clear event. Available PDIST counters: 0", + "PublicDescription": "Counts core cycles when the Resource allocat= or was stalled due to recovery from an earlier branch misprediction or mach= ine clear event.", "SampleAfterValue": "500009", "UMask": "0x1" }, @@ -567,7 +550,6 @@ "EventName": "INT_MISC.UNKNOWN_BRANCH_CYCLES", "MSRIndex": "0x3F7", "MSRValue": "0x7", - "PublicDescription": "Bubble cycles of BAClear (Unknown Branch). A= vailable PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x40" }, @@ -576,7 +558,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.UOP_DROPPING", - "PublicDescription": "Estimated number of Top-down Microarchitectu= re Analysis slots that got dropped due to non front-end reasons Available P= DIST counters: 0", + "PublicDescription": "Estimated number of Top-down Microarchitectu= re Analysis slots that got dropped due to non front-end reasons", "SampleAfterValue": "1000003", "UMask": "0x10" }, @@ -585,7 +567,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.128BIT", - "PublicDescription": "INT_VEC_RETIRED.128BIT Available PDIST count= ers: 0", "SampleAfterValue": "1000003", "UMask": "0x13" }, @@ -594,7 +575,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.256BIT", - "PublicDescription": "INT_VEC_RETIRED.256BIT Available PDIST count= ers: 0", "SampleAfterValue": "1000003", "UMask": "0xac" }, @@ -603,7 +583,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.ADD_128", - "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 128-bit vector instructions. Available PDIST counters: 0= ", + "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 128-bit vector instructions.", "SampleAfterValue": "1000003", "UMask": "0x3" }, @@ -612,7 +592,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.ADD_256", - "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 256-bit vector instructions. Available PDIST counters: 0= ", + "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 256-bit vector instructions.", "SampleAfterValue": "1000003", "UMask": "0xc" }, @@ -621,7 +601,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.MUL_256", - "PublicDescription": "INT_VEC_RETIRED.MUL_256 Available PDIST coun= ters: 0", "SampleAfterValue": "1000003", "UMask": "0x80" }, @@ -630,7 +609,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.SHUFFLES", - "PublicDescription": "INT_VEC_RETIRED.SHUFFLES Available PDIST cou= nters: 0", "SampleAfterValue": "1000003", "UMask": "0x40" }, @@ -639,7 +617,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.VNNI_128", - "PublicDescription": "INT_VEC_RETIRED.VNNI_128 Available PDIST cou= nters: 0", "SampleAfterValue": "1000003", "UMask": "0x10" }, @@ -648,7 +625,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.VNNI_256", - "PublicDescription": "INT_VEC_RETIRED.VNNI_256 Available PDIST cou= nters: 0", "SampleAfterValue": "1000003", "UMask": "0x20" }, @@ -657,7 +633,7 @@ "Counter": "0,1,2,3", "EventCode": "0x03", "EventName": "LD_BLOCKS.ADDRESS_ALIAS", - "PublicDescription": "Counts the number of times a load got blocke= d due to false dependencies in MOB due to partial compare on address. Avail= able PDIST counters: 0", + "PublicDescription": "Counts the number of times a load got blocke= d due to false dependencies in MOB due to partial compare on address.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -666,7 +642,7 @@ "Counter": "0,1,2,3", "EventCode": "0x03", "EventName": "LD_BLOCKS.NO_SR", - "PublicDescription": "Counts the number of times that split load o= perations are temporarily blocked because all resources for handling the sp= lit accesses are in use. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times that split load o= perations are temporarily blocked because all resources for handling the sp= lit accesses are in use.", "SampleAfterValue": "100003", "UMask": "0x88" }, @@ -675,7 +651,7 @@ "Counter": "0,1,2,3", "EventCode": "0x03", "EventName": "LD_BLOCKS.STORE_FORWARD", - "PublicDescription": "Counts the number of times where store forwa= rding was prevented for a load operation. The most common case is a load bl= ocked due to the address of memory access (partially) overlapping with a pr= eceding uncompleted store. Note: See the table of not supported store forwa= rds in the Optimization Guide. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times where store forwa= rding was prevented for a load operation. The most common case is a load bl= ocked due to the address of memory access (partially) overlapping with a pr= eceding uncompleted store. Note: See the table of not supported store forwa= rds in the Optimization Guide.", "SampleAfterValue": "100003", "UMask": "0x82" }, @@ -684,7 +660,7 @@ "Counter": "0,1,2,3", "EventCode": "0x4c", "EventName": "LOAD_HIT_PREFETCH.SWPF", - "PublicDescription": "Counts all software-prefetch load dispatches= that hit the fill buffer (FB) allocated for the software prefetch. It can = also be incremented by some lock instructions. So it should only be used wi= th profiling so that the locks can be excluded by ASM (Assembly File) inspe= ction of the nearby instructions. Available PDIST counters: 0", + "PublicDescription": "Counts all software-prefetch load dispatches= that hit the fill buffer (FB) allocated for the software prefetch. It can = also be incremented by some lock instructions. So it should only be used wi= th profiling so that the locks can be excluded by ASM (Assembly File) inspe= ction of the nearby instructions.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -694,7 +670,7 @@ "CounterMask": "1", "EventCode": "0xa8", "EventName": "LSD.CYCLES_ACTIVE", - "PublicDescription": "Counts the cycles when at least one uop is d= elivered by the LSD (Loop-stream detector). Available PDIST counters: 0", + "PublicDescription": "Counts the cycles when at least one uop is d= elivered by the LSD (Loop-stream detector).", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -704,7 +680,7 @@ "CounterMask": "6", "EventCode": "0xa8", "EventName": "LSD.CYCLES_OK", - "PublicDescription": "Counts the cycles when optimal number of uop= s is delivered by the LSD (Loop-stream detector). Available PDIST counters:= 0", + "PublicDescription": "Counts the cycles when optimal number of uop= s is delivered by the LSD (Loop-stream detector).", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -713,7 +689,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa8", "EventName": "LSD.UOPS", - "PublicDescription": "Counts the number of uops delivered to the b= ack-end by the LSD(Loop Stream Detector). Available PDIST counters: 0", + "PublicDescription": "Counts the number of uops delivered to the b= ack-end by the LSD(Loop Stream Detector).", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -724,7 +700,7 @@ "EdgeDetect": "1", "EventCode": "0xc3", "EventName": "MACHINE_CLEARS.COUNT", - "PublicDescription": "Counts the number of machine clears (nukes) = of any type. Available PDIST counters: 0", + "PublicDescription": "Counts the number of machine clears (nukes) = of any type.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -733,7 +709,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc3", "EventName": "MACHINE_CLEARS.SMC", - "PublicDescription": "Counts self-modifying code (SMC) detected, w= hich causes a machine clear. Available PDIST counters: 0", + "PublicDescription": "Counts self-modifying code (SMC) detected, w= hich causes a machine clear.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -742,7 +718,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe0", "EventName": "MISC2_RETIRED.LFENCE", - "PublicDescription": "number of LFENCE retired instructions Availa= ble PDIST counters: 0", + "PublicDescription": "number of LFENCE retired instructions", "SampleAfterValue": "400009", "UMask": "0x20" }, @@ -751,7 +727,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcc", "EventName": "MISC_RETIRED.LBR_INSERTS", - "PublicDescription": "Increments when an entry is added to the Las= t Branch Record (LBR) array (or removed from the array in case of RETURNs i= n call stack mode). The event requires LBR enable via IA32_DEBUGCTL MSR and= branch type selection via MSR_LBR_SELECT. Available PDIST counters: 0", + "PublicDescription": "Increments when an entry is added to the Las= t Branch Record (LBR) array (or removed from the array in case of RETURNs i= n call stack mode). The event requires LBR enable via IA32_DEBUGCTL MSR and= branch type selection via MSR_LBR_SELECT.", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -760,7 +736,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa2", "EventName": "RESOURCE_STALLS.SB", - "PublicDescription": "Counts allocation stall cycles caused by the= store buffer (SB) being full. This counts cycles that the pipeline back-en= d blocked uop delivery from the front-end. Available PDIST counters: 0", + "PublicDescription": "Counts allocation stall cycles caused by the= store buffer (SB) being full. This counts cycles that the pipeline back-en= d blocked uop delivery from the front-end.", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -769,7 +745,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa2", "EventName": "RESOURCE_STALLS.SCOREBOARD", - "PublicDescription": "Counts cycles where the pipeline is stalled = due to serializing operations. Available PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -778,7 +753,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa5", "EventName": "RS.EMPTY", - "PublicDescription": "Counts cycles during which the reservation s= tation (RS) is empty for this logical processor. This is usually caused whe= n the front-end pipeline runs into starvation periods (e.g. branch mispredi= ctions or i-cache misses) Available PDIST counters: 0", + "PublicDescription": "Counts cycles during which the reservation s= tation (RS) is empty for this logical processor. This is usually caused whe= n the front-end pipeline runs into starvation periods (e.g. branch mispredi= ctions or i-cache misses)", "SampleAfterValue": "1000003", "UMask": "0x7" }, @@ -790,7 +765,7 @@ "EventCode": "0xa5", "EventName": "RS.EMPTY_COUNT", "Invert": "1", - "PublicDescription": "Counts end of periods where the Reservation = Station (RS) was empty. Could be useful to closely sample on front-end late= ncy issues (see the FRONTEND_RETIRED event of designated precise events) Av= ailable PDIST counters: 0", + "PublicDescription": "Counts end of periods where the Reservation = Station (RS) was empty. Could be useful to closely sample on front-end late= ncy issues (see the FRONTEND_RETIRED event of designated precise events)", "SampleAfterValue": "100003", "UMask": "0x7" }, @@ -799,7 +774,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa5", "EventName": "RS.EMPTY_RESOURCE", - "PublicDescription": "Cycles when Reservation Station (RS) is empt= y due to a resource in the back-end Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -812,7 +786,6 @@ "EventCode": "0xa5", "EventName": "RS_EMPTY.COUNT", "Invert": "1", - "PublicDescription": "This event is deprecated. Refer to new event= RS.EMPTY_COUNT Available PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x7" }, @@ -822,7 +795,6 @@ "Deprecated": "1", "EventCode": "0xa5", "EventName": "RS_EMPTY.CYCLES", - "PublicDescription": "This event is deprecated. Refer to new event= RS.EMPTY Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x7" }, @@ -831,7 +803,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa4", "EventName": "TOPDOWN.BACKEND_BOUND_SLOTS", - "PublicDescription": "Number of slots in TMA method where no micro= -operations were being issued from front-end to back-end of the machine due= to lack of back-end resources. Available PDIST counters: 0", + "PublicDescription": "Number of slots in TMA method where no micro= -operations were being issued from front-end to back-end of the machine due= to lack of back-end resources.", "SampleAfterValue": "10000003", "UMask": "0x2" }, @@ -840,7 +812,7 @@ "Counter": "0", "EventCode": "0xa4", "EventName": "TOPDOWN.BAD_SPEC_SLOTS", - "PublicDescription": "Number of slots of TMA method that were wast= ed due to incorrect speculation. It covers all types of control-flow or dat= a-related mis-speculations. Available PDIST counters: 0", + "PublicDescription": "Number of slots of TMA method that were wast= ed due to incorrect speculation. It covers all types of control-flow or dat= a-related mis-speculations.", "SampleAfterValue": "10000003", "UMask": "0x4" }, @@ -849,7 +821,7 @@ "Counter": "0", "EventCode": "0xa4", "EventName": "TOPDOWN.BR_MISPREDICT_SLOTS", - "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by (any type of) branch mispredictions. This event es= timates number of speculative operations that were issued but not retired a= s well as the out-of-order engine recovery past a branch misprediction. Ava= ilable PDIST counters: 0", + "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by (any type of) branch mispredictions. This event es= timates number of speculative operations that were issued but not retired a= s well as the out-of-order engine recovery past a branch misprediction.", "SampleAfterValue": "10000003", "UMask": "0x8" }, @@ -858,7 +830,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa4", "EventName": "TOPDOWN.MEMORY_BOUND_SLOTS", - "PublicDescription": "TOPDOWN.MEMORY_BOUND_SLOTS Available PDIST c= ounters: 0", "SampleAfterValue": "10000003", "UMask": "0x10" }, @@ -875,7 +846,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa4", "EventName": "TOPDOWN.SLOTS_P", - "PublicDescription": "Counts the number of available slots for an = unhalted logical processor. The event increments by machine-width of the na= rrowest pipeline as employed by the Top-down Microarchitecture Analysis met= hod. The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core. Available PDIST counters: 0", + "PublicDescription": "Counts the number of available slots for an = unhalted logical processor. The event increments by machine-width of the na= rrowest pipeline as employed by the Top-down Microarchitecture Analysis met= hod. The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core.", "SampleAfterValue": "10000003", "UMask": "0x1" }, @@ -884,7 +855,6 @@ "Counter": "0,1,2,3", "EventCode": "0x76", "EventName": "UOPS_DECODED.DEC0_UOPS", - "PublicDescription": "UOPS_DECODED.DEC0_UOPS Available PDIST count= ers: 0", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -893,7 +863,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_0", - "PublicDescription": "Number of uops dispatch to execution port 0= . Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution port 0= .", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -902,7 +872,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_1", - "PublicDescription": "Number of uops dispatch to execution port 1= . Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution port 1= .", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -911,7 +881,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_2_3_10", - "PublicDescription": "Number of uops dispatch to execution ports 2= , 3 and 10 Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports 2= , 3 and 10", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -920,7 +890,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_4_9", - "PublicDescription": "Number of uops dispatch to execution ports 4= and 9 Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports 4= and 9", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -929,7 +899,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_5_11", - "PublicDescription": "Number of uops dispatch to execution ports 5= and 11 Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports 5= and 11", "SampleAfterValue": "2000003", "UMask": "0x20" }, @@ -938,7 +908,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_6", - "PublicDescription": "Number of uops dispatch to execution port 6= . Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution port 6= .", "SampleAfterValue": "2000003", "UMask": "0x40" }, @@ -947,7 +917,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_7_8", - "PublicDescription": "Number of uops dispatch to execution ports = 7 and 8. Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports = 7 and 8.", "SampleAfterValue": "2000003", "UMask": "0x80" }, @@ -956,7 +926,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE", - "PublicDescription": "Counts the number of uops executed from any = thread. Available PDIST counters: 0", + "PublicDescription": "Counts the number of uops executed from any = thread.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -966,7 +936,7 @@ "CounterMask": "1", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_1", - "PublicDescription": "Counts cycles when at least 1 micro-op is ex= ecuted from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 1 micro-op is ex= ecuted from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -976,7 +946,7 @@ "CounterMask": "2", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_2", - "PublicDescription": "Counts cycles when at least 2 micro-ops are = executed from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 2 micro-ops are = executed from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -986,7 +956,7 @@ "CounterMask": "3", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_3", - "PublicDescription": "Counts cycles when at least 3 micro-ops are = executed from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 3 micro-ops are = executed from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -996,7 +966,7 @@ "CounterMask": "4", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_4", - "PublicDescription": "Counts cycles when at least 4 micro-ops are = executed from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 4 micro-ops are = executed from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -1006,7 +976,7 @@ "CounterMask": "1", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_1", - "PublicDescription": "Cycles where at least 1 uop was executed per= -thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 1 uop was executed per= -thread.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1016,7 +986,7 @@ "CounterMask": "2", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_2", - "PublicDescription": "Cycles where at least 2 uops were executed p= er-thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 2 uops were executed p= er-thread.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1026,7 +996,7 @@ "CounterMask": "3", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_3", - "PublicDescription": "Cycles where at least 3 uops were executed p= er-thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 3 uops were executed p= er-thread.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1036,7 +1006,7 @@ "CounterMask": "4", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_4", - "PublicDescription": "Cycles where at least 4 uops were executed p= er-thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 4 uops were executed p= er-thread.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1047,7 +1017,7 @@ "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.STALLS", "Invert": "1", - "PublicDescription": "Counts cycles during which no uops were disp= atched from the Reservation Station (RS) per thread. Available PDIST counte= rs: 0", + "PublicDescription": "Counts cycles during which no uops were disp= atched from the Reservation Station (RS) per thread.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1059,7 +1029,6 @@ "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.STALL_CYCLES", "Invert": "1", - "PublicDescription": "This event is deprecated. Refer to new event= UOPS_EXECUTED.STALLS Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1068,7 +1037,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.THREAD", - "PublicDescription": "Counts the number of uops to be executed per= -thread each cycle. Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1077,7 +1045,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.X87", - "PublicDescription": "Counts the number of x87 uops executed. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts the number of x87 uops executed.", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -1086,7 +1054,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xae", "EventName": "UOPS_ISSUED.ANY", - "PublicDescription": "Counts the number of uops that the Resource = Allocation Table (RAT) issues to the Reservation Station (RS). Available PD= IST counters: 0", + "PublicDescription": "Counts the number of uops that the Resource = Allocation Table (RAT) issues to the Reservation Station (RS).", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1096,7 +1064,6 @@ "CounterMask": "1", "EventCode": "0xae", "EventName": "UOPS_ISSUED.CYCLES", - "PublicDescription": "UOPS_ISSUED.CYCLES Available PDIST counters:= 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1106,7 +1073,7 @@ "CounterMask": "1", "EventCode": "0xc2", "EventName": "UOPS_RETIRED.CYCLES", - "PublicDescription": "Counts cycles where at least one uop has ret= ired. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where at least one uop has ret= ired.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -1115,7 +1082,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc2", "EventName": "UOPS_RETIRED.HEAVY", - "PublicDescription": "Counts the number of retired micro-operation= s (uops) except the last uop of each instruction. An instruction that is de= coded into less than two uops does not contribute to the count. Available P= DIST counters: 0", + "PublicDescription": "Counts the number of retired micro-operation= s (uops) except the last uop of each instruction. An instruction that is de= coded into less than two uops does not contribute to the count.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1126,7 +1093,6 @@ "EventName": "UOPS_RETIRED.MS", "MSRIndex": "0x3F7", "MSRValue": "0x8", - "PublicDescription": "UOPS_RETIRED.MS Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -1135,7 +1101,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc2", "EventName": "UOPS_RETIRED.SLOTS", - "PublicDescription": "Counts the retirement slots used each cycle.= Available PDIST counters: 0", + "PublicDescription": "Counts the retirement slots used each cycle.= ", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -1146,7 +1112,7 @@ "EventCode": "0xc2", "EventName": "UOPS_RETIRED.STALLS", "Invert": "1", - "PublicDescription": "This event counts cycles without actually re= tired uops. Available PDIST counters: 0", + "PublicDescription": "This event counts cycles without actually re= tired uops.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -1158,7 +1124,6 @@ "EventCode": "0xc2", "EventName": "UOPS_RETIRED.STALL_CYCLES", "Invert": "1", - "PublicDescription": "This event is deprecated. Refer to new event= UOPS_RETIRED.STALLS Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x2" } diff --git a/tools/perf/pmu-events/arch/x86/emeraldrapids/uncore-memory.jso= n b/tools/perf/pmu-events/arch/x86/emeraldrapids/uncore-memory.json index 68be01dad7c9..90f61c9511fc 100644 --- a/tools/perf/pmu-events/arch/x86/emeraldrapids/uncore-memory.json +++ b/tools/perf/pmu-events/arch/x86/emeraldrapids/uncore-memory.json @@ -2769,6 +2769,88 @@ "UMask": "0x3", "Unit": "iMC" }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.HIGH", + "Experimental": "1", + "PerPkg": "1", + "PublicDescription": "Number of DRAM Refreshes Issued : Counts the= number of refreshes issued.", + "UMask": "0x24", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.HIGH_ALL", + "Experimental": "1", + "PerPkg": "1", + "UMask": "0x24", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.HIGH_PCH0", + "Experimental": "1", + "PerPkg": "1", + "UMask": "0x4", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.HIGH_PCH1", + "Experimental": "1", + "PerPkg": "1", + "UMask": "0x20", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.PANIC", + "Experimental": "1", + "PerPkg": "1", + "PublicDescription": "Number of DRAM Refreshes Issued : Counts the= number of refreshes issued.", + "UMask": "0x12", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.PANIC_ALL", + "Experimental": "1", + "PerPkg": "1", + "UMask": "0x12", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.PANIC_PCH0", + "Experimental": "1", + "PerPkg": "1", + "UMask": "0x2", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.PANIC_PCH1", + "Experimental": "1", + "PerPkg": "1", + "UMask": "0x10", + "Unit": "iMC" + }, { "BriefDescription": "ECC Correctable Errors", "Counter": "0,1,2,3", diff --git a/tools/perf/pmu-events/arch/x86/emeraldrapids/virtual-memory.js= on b/tools/perf/pmu-events/arch/x86/emeraldrapids/virtual-memory.json index 3d3f88600e26..609a9549cbf3 100644 --- a/tools/perf/pmu-events/arch/x86/emeraldrapids/virtual-memory.json +++ b/tools/perf/pmu-events/arch/x86/emeraldrapids/virtual-memory.json @@ -4,7 +4,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.STLB_HIT", - "PublicDescription": "Counts loads that miss the DTLB (Data TLB) a= nd hit the STLB (Second level TLB). Available PDIST counters: 0", + "PublicDescription": "Counts loads that miss the DTLB (Data TLB) a= nd hit the STLB (Second level TLB).", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -14,7 +14,7 @@ "CounterMask": "1", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_ACTIVE", - "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a demand load. Available PDIST cou= nters: 0", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a demand load.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -23,7 +23,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED", - "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data loads. This implies it missed in the DTLB and furth= er levels of TLB. The page walk can end with or without a fault. Available = PDIST counters: 0", + "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data loads. This implies it missed in the DTLB and furth= er levels of TLB. The page walk can end with or without a fault.", "SampleAfterValue": "100003", "UMask": "0xe" }, @@ -32,7 +32,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_1G", - "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= . Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= .", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -41,7 +41,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_2M_4M", - "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data loads. This implies address translations missed in the= DTLB and further levels of TLB. The page walk can end with or without a fa= ult. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data loads. This implies address translations missed in the= DTLB and further levels of TLB. The page walk can end with or without a fa= ult.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -50,7 +50,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_4K", - "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= . Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= .", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -59,7 +59,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_PENDING", - "PublicDescription": "Counts the number of page walks outstanding = for a demand load in the PMH (Page Miss Handler) each cycle. Available PDIS= T counters: 0", + "PublicDescription": "Counts the number of page walks outstanding = for a demand load in the PMH (Page Miss Handler) each cycle.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -68,7 +68,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.STLB_HIT", - "PublicDescription": "Counts stores that miss the DTLB (Data TLB) = and hit the STLB (2nd Level TLB). Available PDIST counters: 0", + "PublicDescription": "Counts stores that miss the DTLB (Data TLB) = and hit the STLB (2nd Level TLB).", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -78,7 +78,7 @@ "CounterMask": "1", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_ACTIVE", - "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a store. Available PDIST counters:= 0", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a store.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -87,7 +87,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED", - "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data stores. This implies it missed in the DTLB and furt= her levels of TLB. The page walk can end with or without a fault. Available= PDIST counters: 0", + "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data stores. This implies it missed in the DTLB and furt= her levels of TLB. The page walk can end with or without a fault.", "SampleAfterValue": "100003", "UMask": "0xe" }, @@ -96,7 +96,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_1G", - "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t.", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -105,7 +105,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M", - "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data stores. This implies address translations missed in th= e DTLB and further levels of TLB. The page walk can end with or without a f= ault. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data stores. This implies address translations missed in th= e DTLB and further levels of TLB. The page walk can end with or without a f= ault.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -114,7 +114,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_4K", - "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -123,7 +123,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_PENDING", - "PublicDescription": "Counts the number of page walks outstanding = for a store in the PMH (Page Miss Handler) each cycle. Available PDIST coun= ters: 0", + "PublicDescription": "Counts the number of page walks outstanding = for a store in the PMH (Page Miss Handler) each cycle.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -132,7 +132,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.STLB_HIT", - "PublicDescription": "Counts instruction fetch requests that miss = the ITLB (Instruction TLB) and hit the STLB (Second-level TLB). Available P= DIST counters: 0", + "PublicDescription": "Counts instruction fetch requests that miss = the ITLB (Instruction TLB) and hit the STLB (Second-level TLB).", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -142,7 +142,7 @@ "CounterMask": "1", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_ACTIVE", - "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a code (instruction fetch) request= . Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a code (instruction fetch) request= .", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -151,7 +151,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_COMPLETED", - "PublicDescription": "Counts completed page walks (all page sizes)= caused by a code fetch. This implies it missed in the ITLB (Instruction TL= B) and further levels of TLB. The page walk can end with or without a fault= . Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (all page sizes)= caused by a code fetch. This implies it missed in the ITLB (Instruction TL= B) and further levels of TLB. The page walk can end with or without a fault= .", "SampleAfterValue": "100003", "UMask": "0xe" }, @@ -160,7 +160,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_COMPLETED_2M_4M", - "PublicDescription": "Counts completed page walks (2M/4M page size= s) caused by a code fetch. This implies it missed in the ITLB (Instruction = TLB) and further levels of TLB. The page walk can end with or without a fau= lt. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (2M/4M page size= s) caused by a code fetch. This implies it missed in the ITLB (Instruction = TLB) and further levels of TLB. The page walk can end with or without a fau= lt.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -169,7 +169,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_COMPLETED_4K", - "PublicDescription": "Counts completed page walks (4K page sizes) = caused by a code fetch. This implies it missed in the ITLB (Instruction TLB= ) and further levels of TLB. The page walk can end with or without a fault.= Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (4K page sizes) = caused by a code fetch. This implies it missed in the ITLB (Instruction TLB= ) and further levels of TLB. The page walk can end with or without a fault.= ", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -178,7 +178,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_PENDING", - "PublicDescription": "Counts the number of page walks outstanding = for an outstanding code (instruction fetch) request in the PMH (Page Miss H= andler) each cycle. Available PDIST counters: 0", + "PublicDescription": "Counts the number of page walks outstanding = for an outstanding code (instruction fetch) request in the PMH (Page Miss H= andler) each cycle.", "SampleAfterValue": "100003", "UMask": "0x10" } diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-ev= ents/arch/x86/mapfile.csv index 12017f568606..e3ac2dede20a 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -9,7 +9,7 @@ GenuineIntel-6-4F,v23,broadwellx,core GenuineIntel-6-55-[56789ABCDEF],v1.25,cascadelakex,core GenuineIntel-6-DD,v1.00,clearwaterforest,core GenuineIntel-6-9[6C],v1.05,elkhartlake,core -GenuineIntel-6-CF,v1.14,emeraldrapids,core +GenuineIntel-6-CF,v1.16,emeraldrapids,core GenuineIntel-6-5[CF],v13,goldmont,core GenuineIntel-6-7A,v1.01,goldmontplus,core GenuineIntel-6-B6,v1.09,grandridge,core --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1C87C21CC4F for ; Sat, 19 Jul 2025 03:45:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896738; cv=none; b=tSWk/fKNm0aJfmvX2AvZc3FyjkOlB6OKuTmm8sd+PNC/HeRfRhoAyhWROPcJa7KSmGg4QxQWccayC+sHZe763nWhAdr4iuzZ0DdDsWtcV1vR2wxLT/dBeoHAkzdsMjFpC/YzT6hMU3H/bFy3YoydL3C9Gh8OZJubfJYPyNw6yC8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896738; c=relaxed/simple; bh=DDZ1q3vFuDdF98BUvmYY3doV3qc9WHq9aGRyD8HdLzc=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=Kndf0VgIurZRmbqrDkaPl+CWuXMOS5t9vsoZvVgNP1G85Kv0gPWcnT3EzgjfS/ArQtCoiJR/NW9pKGHAxR+fqCQ3CSUXQeWt6uputujmKFP9TpljvZVOpGBhpygFsKYZ45R114V0UjTZloJsllEx9HBo45xRH2053aDMn19SR8M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=PXHD6Rfd; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="PXHD6Rfd" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-31220ecc586so2511678a91.2 for ; Fri, 18 Jul 2025 20:45:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896736; x=1753501536; darn=vger.kernel.org; h=to:from:subject:message-id:references:mime-version:in-reply-to:date :from:to:cc:subject:date:message-id:reply-to; bh=EBMFXrG1xOekg4HmZGvQhV+SxD43VvAIrVyKbopoMYE=; b=PXHD6Rfd+QMQogs4e3nU1IGVTlLqCGWRCx9ezwR51qsxCFv9zoJBrMElgH2W4CuzIq 8a/kgQFhbKwKv7fL5Jl+b8ldxt7wlp4m8cfDGF5isXVLzKGS+PwHYpxoy7eO7pDhTUh/ GoaP1aWdE0bGUxHBnhtkqFjGuQyrPqxM6ND15nqdCpdU+94a/9MEurgMb2/s+WY7n5l7 ERkPZWIiI9f3PZiOjfK8UbLH01TavsFsKmE88BNQ41i8fEJLXI+LlKYTz0Xhq85XWFgF mOnHPpoASWeDXaoHWi7IMAqXlI3Fn5sn6IPahM2E4GqUq3z9bKGqfTJgXfCXNUuDpgL9 Bsww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896736; x=1753501536; h=to:from:subject:message-id:references:mime-version:in-reply-to:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=EBMFXrG1xOekg4HmZGvQhV+SxD43VvAIrVyKbopoMYE=; b=L7KP0O1J+T+Hp2I8esMEZciEgHHHfSw/yX/tnCGc2BfM2SQcfBxPkFx8AG6FuEKLbF 2q3rfs1eVMpFIjK11EgwHljZLI7eIznhzJRbQDm8BqnLuqGiADtCnpPdht7t0BwGpriG oJamhHwC/vH3DUDi1U9cFJbk+ikR63JTdn9nnPvxYqs7Rq8wfp9HBAaH8OaJalbiWQty X3Gih1T38b6FKfOyt8PMlSF3ZrLPIAyr3khdilUlvfgPRJTmM3fSKPjroZmo2Z9r/dhX qNN+6xlONbpC7gzQTbuVBAkWGwyjFcuo0wa/gDHfXC//V1AwdlwNPgEbr0gXib7p7l78 Y6ng== X-Forwarded-Encrypted: i=1; AJvYcCXz0d5e703BtgPSXfy1NkOmuBy2/GaTrqC7CAbZuTNzsh6KzUV/E2mOhS4B/1hds3dlQZgEWu9nZebLHQM=@vger.kernel.org X-Gm-Message-State: AOJu0YyyDsFAWxKGfTKNlSsvL57Nx4qdTTXjGuH5YSmlsLzkYEsnZQmo 5An4tdeK7eKjFUmNJjOUUYU4viP6W27gUIg2D43/tn/pEyibcsp9sImlHq+FR2cv4C5eX1tYLIL Ky9tXfegTqw== X-Google-Smtp-Source: AGHT+IEo18BE4QLvAw1/92EVXaYrWdaNlYSyBsOVRiEX/tG+V2krErJPgtF5SmI3YJNMFye5tO/kkwnXwrG7 X-Received: from pjvv4.prod.google.com ([2002:a17:90b:5884:b0:311:b3fb:9f74]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:2dc4:b0:311:e8cc:425d with SMTP id 98e67ed59e1d1-31c9e6f7177mr20111314a91.10.1752896736426; Fri, 18 Jul 2025 20:45:36 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:02 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-7-irogers@google.com> Subject: [PATCH v1 06/19] perf vendor events: Update grandridge metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update metrics from TMA 5.0 to 5.1. Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/grandridge/grr-metrics.json | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/grandridge/grr-metrics.json b/t= ools/perf/pmu-events/arch/x86/grandridge/grr-metrics.json index 878b1caf12de..a0d637a24c1b 100644 --- a/tools/perf/pmu-events/arch/x86/grandridge/grr-metrics.json +++ b/tools/perf/pmu-events/arch/x86/grandridge/grr-metrics.json @@ -1,56 +1,56 @@ [ { "BriefDescription": "C10 residency percent per package", - "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c10\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C10_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C1 residency percent per core", - "MetricExpr": "cstate_core@c1\\-residency@ / TSC", + "MetricExpr": "cstate_core@c1\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C1_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C8 residency percent per package", - "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c8\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C8_Pkg_Residency", "ScaleUnit": "100%" @@ -633,7 +633,7 @@ }, { "BriefDescription": "Average CPU Utilization", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricName": "tma_info_system_cpu_utilization" }, { @@ -645,7 +645,7 @@ }, { "BriefDescription": "Fraction of cycles spent in Kernel mode", - "MetricExpr": "cpu@CPU_CLK_UNHALTED.CORE_P@k / CPU_CLK_UNHALTED.CO= RE", + "MetricExpr": "CPU_CLK_UNHALTED.CORE_P:k / CPU_CLK_UNHALTED.CORE", "MetricGroup": "Summary", "MetricName": "tma_info_system_kernel_utilization" }, --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 356E221D3C6 for ; Sat, 19 Jul 2025 03:45:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896756; cv=none; b=cN9Fm+a6B5PsPgZ1vWJJiEAxflVZS33j1gHPUC7wyVHTUDH7MJJERnD3N6sBC7txHnX9eVR8f3ycooU+jMlr+lYThOuBGmrFY/cp75Nv79fk/3uzeUtTrx+olVIjaKE+srNdqWLVMnqnNEGNlY3HHV2BffvmoH3iDOFYW34ewPo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896756; c=relaxed/simple; bh=wOLNO3yx1wa9mFJHI30jg+6HyZQ/bPvE74N+F7szJfw=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=Y682piKNFNeNgWAw2CFVJP/P/g1yRkt+ZFL7O5eiPT4P2JIFVmx1n/QIx+9uZr3HcAl342bSoiDXCoCWEmnQBlizUwpxeviryfeBmpOHOnwCagXDfyedTmghfJX+5CXfxGDNc13BHrhb2U+wvQcCUkmJSvAc8+ZVVMEOSUITpcs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=xKYv0fuk; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="xKYv0fuk" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-315af08594fso2671825a91.2 for ; Fri, 18 Jul 2025 20:45:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896738; x=1753501538; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=MZ3KS+t0ffyWQ3hr8cOOwI0ImjvojyOWhW9dMKFdN3s=; b=xKYv0fukpuH8eZ/s+wr1mUWHQIUzBpP3iH5ksdy0zPxGoTupfZCCz9BTft9yI9mmXG Ql2lZDmfJH6Cw1ChPOnEXp4LMcYLg+nTqh3eaZFN3qMY8edGwfPEZJKkUuwEBmz/kuMY nvcdKZV3Y07ZSwtqUUxfrKq0jQqCHGCMziwQqZpxEv3/aBeShA+PcfK+Sd4S5B47p+6Q jqcu0lg2E+Et1tVYN44AtMU4ujQccnMVoHjlsBjj0qAuCWhyD1JUF3uRb+m44y5OxQTE nbfiCFwCoE526YOKzg5YjqRsVoWwMue4IM5+/Ex49zFk5McvPyB42hKoX8zPLdDS21R3 BSbA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896738; x=1753501538; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=MZ3KS+t0ffyWQ3hr8cOOwI0ImjvojyOWhW9dMKFdN3s=; b=ivKIuG8bawYYa9+KlxImQnL3cNFCfRazDxM76UOgIld0QO+YRYFetC6CtqFsHaTRfa WEyhdbkYN6ynK4LeAmlN/HHn3EK7NLfGSgYOvrvDHeLf8EF78soVWj2nZcC2SQHxUxa2 e94Sb2uSYWveuMTByujBeBfrJgLtlqRS912ZtHNl8J8jwMXR6WunM2QiqV+clwHxApJ/ mHqxgfvnRc9gc2vjrkketKic7PCYYBw6tGVdFeILWSb4JDMbKdzTQGFkZaZU53ZufIcw B/ckh5J+gNDQshaYZPZmBqL5GFf+OJRAFBitgXebY0CaV3Z4VmBLEmlQlAzJZxmqBSYM jLfg== X-Forwarded-Encrypted: i=1; AJvYcCWMupi7hpj8vRTpaUlZts8bR3xiK+3Tqkn1iCRzgoBvRvrPmhc8CtdvjKoAbwf/B7ZXu2W9mxFAA1JRCyo=@vger.kernel.org X-Gm-Message-State: AOJu0YzL2EoylodHqTXBE3irjBiTYWypZB/wfCM49gfgBa1m9YH/pceP 4iNVnjh9jr4jjEE8EkENPGw2QMWB4fOx29nSrK2Z/sxUWL1C3EIH0i196frSUDvSl1aQUSRe6wq ro2Jh7wgyoQ== X-Google-Smtp-Source: AGHT+IGqDv4Qh2dpOW/KIyM3fXGpUku1wnBnqRhVbiel3Mutgkw0zT0K02XAAVXeCyxbZ4waAZuvTL7uGMy/ X-Received: from pjbqd16.prod.google.com ([2002:a17:90b:3cd0:b0:31c:15e1:d04]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:3c10:b0:311:ff02:3fcc with SMTP id 98e67ed59e1d1-31c9f44bcd8mr19985067a91.14.1752896738434; Fri, 18 Jul 2025 20:45:38 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:03 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-8-irogers@google.com> Subject: [PATCH v1 07/19] perf vendor events: Update graniterapids events/metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update events from v1.10 to v1.12. Update metrics from TMA 5.0 to 5.1. The event updates come from: https://github.com/intel/perfmon/commit/1684fa543fd45970759bb72dc3fc00c2ef8= 7c0e8 Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/graniterapids/cache.json | 227 +++++++++++++----- .../x86/graniterapids/floating-point.json | 43 ++-- .../arch/x86/graniterapids/frontend.json | 42 ++-- .../arch/x86/graniterapids/gnr-metrics.json | 131 +++++----- .../arch/x86/graniterapids/memory.json | 33 ++- .../arch/x86/graniterapids/other.json | 30 ++- .../arch/x86/graniterapids/pipeline.json | 167 ++++++------- .../arch/x86/graniterapids/uncore-io.json | 1 - .../arch/x86/graniterapids/uncore-memory.json | 31 --- .../x86/graniterapids/virtual-memory.json | 40 +-- tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +- 11 files changed, 411 insertions(+), 336 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/cache.json b/tool= s/perf/pmu-events/arch/x86/graniterapids/cache.json index dbdeade6fe6f..7edb73583b07 100644 --- a/tools/perf/pmu-events/arch/x86/graniterapids/cache.json +++ b/tools/perf/pmu-events/arch/x86/graniterapids/cache.json @@ -4,7 +4,6 @@ "Counter": "0,1,2,3", "EventCode": "0x51", "EventName": "L1D.HWPF_MISS", - "PublicDescription": "L1D.HWPF_MISS Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x20" }, @@ -13,7 +12,7 @@ "Counter": "0,1,2,3", "EventCode": "0x51", "EventName": "L1D.REPLACEMENT", - "PublicDescription": "Counts L1D data line replacements including = opportunistic replacements, and replacements that require stall-for-replace= or block-for-replace. Available PDIST counters: 0", + "PublicDescription": "Counts L1D data line replacements including = opportunistic replacements, and replacements that require stall-for-replace= or block-for-replace.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -22,7 +21,7 @@ "Counter": "0,1,2,3", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.FB_FULL", - "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses. Av= ailable PDIST counters: 0", + "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -33,7 +32,7 @@ "EdgeDetect": "1", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.FB_FULL_PERIODS", - "PublicDescription": "Counts number of phases a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses. Av= ailable PDIST counters: 0", + "PublicDescription": "Counts number of phases a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -42,7 +41,7 @@ "Counter": "0,1,2,3", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.L2_STALLS", - "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D due to lack of L2 resources. Demand requests include cac= heable/uncacheable demand load, store, lock or SW prefetch accesses. Availa= ble PDIST counters: 0", + "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D due to lack of L2 resources. Demand requests include cac= heable/uncacheable demand load, store, lock or SW prefetch accesses.", "SampleAfterValue": "1000003", "UMask": "0x4" }, @@ -51,7 +50,7 @@ "Counter": "0,1,2,3", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.PENDING", - "PublicDescription": "Counts number of L1D misses that are outstan= ding in each cycle, that is each cycle the number of Fill Buffers (FB) outs= tanding required by Demand Reads. FB either is held by demand loads, or it = is held by non-demand loads and gets hit at least once by demand. The valid= outstanding interval is defined until the FB deallocation by one of the fo= llowing ways: from FB allocation, if FB is allocated by demand from the dem= and Hit FB, if it is allocated by hardware or software prefetch. Note: In t= he L1D, a Demand Read contains cacheable or noncacheable demand loads, incl= uding ones causing cache-line splits and reads due to page walks resulted f= rom any request type. Available PDIST counters: 0", + "PublicDescription": "Counts number of L1D misses that are outstan= ding in each cycle, that is each cycle the number of Fill Buffers (FB) outs= tanding required by Demand Reads. FB either is held by demand loads, or it = is held by non-demand loads and gets hit at least once by demand. The valid= outstanding interval is defined until the FB deallocation by one of the fo= llowing ways: from FB allocation, if FB is allocated by demand from the dem= and Hit FB, if it is allocated by hardware or software prefetch. Note: In t= he L1D, a Demand Read contains cacheable or noncacheable demand loads, incl= uding ones causing cache-line splits and reads due to page walks resulted f= rom any request type.", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -61,7 +60,7 @@ "CounterMask": "1", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.PENDING_CYCLES", - "PublicDescription": "Counts duration of L1D miss outstanding in c= ycles. Available PDIST counters: 0", + "PublicDescription": "Counts duration of L1D miss outstanding in c= ycles.", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -70,7 +69,7 @@ "Counter": "0,1,2,3", "EventCode": "0x25", "EventName": "L2_LINES_IN.ALL", - "PublicDescription": "Counts the number of L2 cache lines filling = the L2. Counting does not cover rejects. Available PDIST counters: 0", + "PublicDescription": "Counts the number of L2 cache lines filling = the L2. Counting does not cover rejects.", "SampleAfterValue": "100003", "UMask": "0x1f" }, @@ -79,7 +78,7 @@ "Counter": "0,1,2,3", "EventCode": "0x26", "EventName": "L2_LINES_OUT.NON_SILENT", - "PublicDescription": "Counts the number of lines that are evicted = by L2 cache when triggered by an L2 cache fill. Those lines are in Modified= state. Modified lines are written back to L3 Available PDIST counters: 0", + "PublicDescription": "Counts the number of lines that are evicted = by L2 cache when triggered by an L2 cache fill. Those lines are in Modified= state. Modified lines are written back to L3", "SampleAfterValue": "200003", "UMask": "0x2" }, @@ -88,7 +87,7 @@ "Counter": "0,1,2,3", "EventCode": "0x26", "EventName": "L2_LINES_OUT.SILENT", - "PublicDescription": "Counts the number of lines that are silently= dropped by L2 cache. These lines are typically in Shared or Exclusive stat= e. A non-threaded event. Available PDIST counters: 0", + "PublicDescription": "Counts the number of lines that are silently= dropped by L2 cache. These lines are typically in Shared or Exclusive stat= e. A non-threaded event.", "SampleAfterValue": "200003", "UMask": "0x1" }, @@ -97,7 +96,7 @@ "Counter": "0,1,2,3", "EventCode": "0x26", "EventName": "L2_LINES_OUT.USELESS_HWPF", - "PublicDescription": "Counts the number of cache lines that have b= een prefetched by the L2 hardware prefetcher but not used by demand access = when evicted from the L2 cache Available PDIST counters: 0", + "PublicDescription": "Counts the number of cache lines that have b= een prefetched by the L2 hardware prefetcher but not used by demand access = when evicted from the L2 cache", "SampleAfterValue": "200003", "UMask": "0x4" }, @@ -106,7 +105,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_REQUEST.ALL", - "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_RQSTS.REFERENCES] Available PDIST coun= ters: 0", + "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_RQSTS.REFERENCES]", "SampleAfterValue": "200003", "UMask": "0xff" }, @@ -115,7 +114,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_REQUEST.HIT", - "PublicDescription": "Counts all requests that hit L2 cache. [This= event is alias to L2_RQSTS.HIT] Available PDIST counters: 0", + "PublicDescription": "Counts all requests that hit L2 cache. [This= event is alias to L2_RQSTS.HIT]", "SampleAfterValue": "200003", "UMask": "0xdf" }, @@ -124,7 +123,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_REQUEST.MISS", - "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_RQSTS.MISS] Available PDIST coun= ters: 0", + "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_RQSTS.MISS]", "SampleAfterValue": "200003", "UMask": "0x3f" }, @@ -133,7 +132,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_CODE_RD", - "PublicDescription": "Counts the total number of L2 code requests.= Available PDIST counters: 0", + "PublicDescription": "Counts the total number of L2 code requests.= ", "SampleAfterValue": "200003", "UMask": "0xe4" }, @@ -142,7 +141,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_DEMAND_DATA_RD", - "PublicDescription": "Counts Demand Data Read requests accessing t= he L2 cache. These requests may hit or miss L2 cache. True-miss exclude mis= ses that were merged with ongoing L2 misses. An access is counted once. Ava= ilable PDIST counters: 0", + "PublicDescription": "Counts Demand Data Read requests accessing t= he L2 cache. These requests may hit or miss L2 cache. True-miss exclude mis= ses that were merged with ongoing L2 misses. An access is counted once.", "SampleAfterValue": "200003", "UMask": "0xe1" }, @@ -151,7 +150,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_DEMAND_MISS", - "PublicDescription": "Counts demand requests that miss L2 cache. A= vailable PDIST counters: 0", + "PublicDescription": "Counts demand requests that miss L2 cache.", "SampleAfterValue": "200003", "UMask": "0x27" }, @@ -160,7 +159,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_DEMAND_REFERENCES", - "PublicDescription": "Counts demand requests to L2 cache. Availabl= e PDIST counters: 0", + "PublicDescription": "Counts demand requests to L2 cache.", "SampleAfterValue": "200003", "UMask": "0xe7" }, @@ -169,7 +168,6 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_HWPF", - "PublicDescription": "L2_RQSTS.ALL_HWPF Available PDIST counters: = 0", "SampleAfterValue": "200003", "UMask": "0xf0" }, @@ -178,7 +176,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_RFO", - "PublicDescription": "Counts the total number of RFO (read for own= ership) requests to L2 cache. L2 RFO requests include both L1D demand RFO m= isses as well as L1D RFO prefetches. Available PDIST counters: 0", + "PublicDescription": "Counts the total number of RFO (read for own= ership) requests to L2 cache. L2 RFO requests include both L1D demand RFO m= isses as well as L1D RFO prefetches.", "SampleAfterValue": "200003", "UMask": "0xe2" }, @@ -187,7 +185,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.CODE_RD_HIT", - "PublicDescription": "Counts L2 cache hits when fetching instructi= ons, code reads. Available PDIST counters: 0", + "PublicDescription": "Counts L2 cache hits when fetching instructi= ons, code reads.", "SampleAfterValue": "200003", "UMask": "0xc4" }, @@ -196,7 +194,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.CODE_RD_MISS", - "PublicDescription": "Counts L2 cache misses when fetching instruc= tions. Available PDIST counters: 0", + "PublicDescription": "Counts L2 cache misses when fetching instruc= tions.", "SampleAfterValue": "200003", "UMask": "0x24" }, @@ -205,7 +203,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.DEMAND_DATA_RD_HIT", - "PublicDescription": "Counts the number of demand Data Read reques= ts initiated by load instructions that hit L2 cache. Available PDIST counte= rs: 0", + "PublicDescription": "Counts the number of demand Data Read reques= ts initiated by load instructions that hit L2 cache.", "SampleAfterValue": "200003", "UMask": "0xc1" }, @@ -214,7 +212,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.DEMAND_DATA_RD_MISS", - "PublicDescription": "Counts demand Data Read requests with true-m= iss in the L2 cache. True-miss excludes misses that were merged with ongoin= g L2 misses. An access is counted once. Available PDIST counters: 0", + "PublicDescription": "Counts demand Data Read requests with true-m= iss in the L2 cache. True-miss excludes misses that were merged with ongoin= g L2 misses. An access is counted once.", "SampleAfterValue": "200003", "UMask": "0x21" }, @@ -223,7 +221,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.HIT", - "PublicDescription": "Counts all requests that hit L2 cache. [This= event is alias to L2_REQUEST.HIT] Available PDIST counters: 0", + "PublicDescription": "Counts all requests that hit L2 cache. [This= event is alias to L2_REQUEST.HIT]", "SampleAfterValue": "200003", "UMask": "0xdf" }, @@ -232,7 +230,6 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.HWPF_MISS", - "PublicDescription": "L2_RQSTS.HWPF_MISS Available PDIST counters:= 0", "SampleAfterValue": "200003", "UMask": "0x30" }, @@ -241,7 +238,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.MISS", - "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_REQUEST.MISS] Available PDIST co= unters: 0", + "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_REQUEST.MISS]", "SampleAfterValue": "200003", "UMask": "0x3f" }, @@ -250,7 +247,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.REFERENCES", - "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_REQUEST.ALL] Available PDIST counters:= 0", + "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_REQUEST.ALL]", "SampleAfterValue": "200003", "UMask": "0xff" }, @@ -259,7 +256,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.RFO_HIT", - "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that hit L2 cache. Available PDIST counters: 0", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that hit L2 cache.", "SampleAfterValue": "200003", "UMask": "0xc2" }, @@ -268,7 +265,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.RFO_MISS", - "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that miss L2 cache. Available PDIST counters: 0", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that miss L2 cache.", "SampleAfterValue": "200003", "UMask": "0x22" }, @@ -277,7 +274,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.SWPF_HIT", - "PublicDescription": "Counts Software prefetch requests that hit t= he L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when = FB is not full. Available PDIST counters: 0", + "PublicDescription": "Counts Software prefetch requests that hit t= he L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when = FB is not full.", "SampleAfterValue": "200003", "UMask": "0xc8" }, @@ -286,7 +283,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.SWPF_MISS", - "PublicDescription": "Counts Software prefetch requests that miss = the L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when= FB is not full. Available PDIST counters: 0", + "PublicDescription": "Counts Software prefetch requests that miss = the L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when= FB is not full.", "SampleAfterValue": "200003", "UMask": "0x28" }, @@ -295,7 +292,7 @@ "Counter": "0,1,2,3", "EventCode": "0x23", "EventName": "L2_TRANS.L2_WB", - "PublicDescription": "Counts L2 writebacks that access L2 cache. A= vailable PDIST counters: 0", + "PublicDescription": "Counts L2 writebacks that access L2 cache.", "SampleAfterValue": "200003", "UMask": "0x40" }, @@ -304,7 +301,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x2e", "EventName": "LONGEST_LAT_CACHE.MISS", - "PublicDescription": "Counts core-originated cacheable requests th= at miss the L3 cache (Longest Latency cache). Requests include data and cod= e reads, Reads-for-Ownership (RFOs), speculative accesses and hardware pref= etches to the L1 and L2. It does not include hardware prefetches to the L3= , and may not count other types of requests to the L3. Available PDIST coun= ters: 0", + "PublicDescription": "Counts core-originated cacheable requests th= at miss the L3 cache (Longest Latency cache). Requests include data and cod= e reads, Reads-for-Ownership (RFOs), speculative accesses and hardware pref= etches to the L1 and L2. It does not include hardware prefetches to the L3= , and may not count other types of requests to the L3.", "SampleAfterValue": "100003", "UMask": "0x41" }, @@ -313,7 +310,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x2e", "EventName": "LONGEST_LAT_CACHE.REFERENCE", - "PublicDescription": "Counts core-originated cacheable requests to= the L3 cache (Longest Latency cache). Requests include data and code reads= , Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches = to the L1 and L2. It does not include hardware prefetches to the L3, and m= ay not count other types of requests to the L3. Available PDIST counters: 0= ", + "PublicDescription": "Counts core-originated cacheable requests to= the L3 cache (Longest Latency cache). Requests include data and code reads= , Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches = to the L1 and L2. It does not include hardware prefetches to the L3, and m= ay not count other types of requests to the L3.", "SampleAfterValue": "100003", "UMask": "0x4f" }, @@ -437,7 +434,7 @@ "Counter": "0,1,2,3", "EventCode": "0x43", "EventName": "MEM_LOAD_COMPLETED.L1_MISS_ANY", - "PublicDescription": "Number of completed demand load requests tha= t missed the L1 data cache including shadow misses (FB hits, merge to an on= going L1D miss) Available PDIST counters: 0", + "PublicDescription": "Number of completed demand load requests tha= t missed the L1 data cache including shadow misses (FB hits, merge to an on= going L1D miss)", "SampleAfterValue": "1000003", "UMask": "0xfd" }, @@ -503,6 +500,15 @@ "SampleAfterValue": "100007", "UMask": "0x1" }, + { + "BriefDescription": "Retired load instructions with remote cxl mem= as the data source where the data request missed all caches.", + "Counter": "0,1,2,3", + "EventCode": "0xd3", + "EventName": "MEM_LOAD_L3_MISS_RETIRED.REMOTE_CXL_MEM", + "PublicDescription": "Counts retired load instructions with remote= cxl mem as the data source and the data request missed L3. Available PDIST= counters: 0", + "SampleAfterValue": "100007", + "UMask": "0x10" + }, { "BriefDescription": "MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM", "Counter": "0,1,2,3", @@ -628,12 +634,21 @@ "SampleAfterValue": "50021", "UMask": "0x20" }, + { + "BriefDescription": "Retired load instructions with local cxl mem = as the data source where the data request missed all caches.", + "Counter": "0,1,2,3", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.LOCAL_CXL_MEM", + "PublicDescription": "Counts retired load instructions with local = cxl mem as the data source and the data request missed L3. Available PDIST = counters: 0", + "SampleAfterValue": "1000003", + "UMask": "0x80" + }, { "BriefDescription": "MEM_STORE_RETIRED.L2_HIT", "Counter": "0,1,2,3", "EventCode": "0x44", "EventName": "MEM_STORE_RETIRED.L2_HIT", - "PublicDescription": "MEM_STORE_RETIRED.L2_HIT Available PDIST cou= nters: 0", "SampleAfterValue": "200003", "UMask": "0x1" }, @@ -642,7 +657,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe5", "EventName": "MEM_UOP_RETIRED.ANY", - "PublicDescription": "Number of retired micro-operations (uops) fo= r load or store memory accesses Available PDIST counters: 0", + "PublicDescription": "Number of retired micro-operations (uops) fo= r load or store memory accesses", "SampleAfterValue": "1000003", "UMask": "0x3" }, @@ -690,6 +705,17 @@ "SampleAfterValue": "100003", "UMask": "0x1" }, + { + "BriefDescription": "Counts demand data reads that were supplied b= y CXL MEM (Type 2 or Type 3).", + "Counter": "0,1,2,3", + "EventCode": "0x2A,0x2B", + "EventName": "OCR.DEMAND_DATA_RD.CXL_MEM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x703C00001", + "PublicDescription": "Counts demand data reads that were supplied = by CXL MEM (Type 2 or Type 3). Available PDIST counters: 0", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, { "BriefDescription": "Counts demand data reads that hit in the L3 o= r were snooped from another core's caches on the same socket.", "Counter": "0,1,2,3", @@ -734,6 +760,17 @@ "SampleAfterValue": "100003", "UMask": "0x1" }, + { + "BriefDescription": "Counts demand data reads that were supplied b= y CXL MEM (Type 2 and Type 3) attached to local socket.", + "Counter": "0,1,2,3", + "EventCode": "0x2A,0x2B", + "EventName": "OCR.DEMAND_DATA_RD.LOCAL_CXL_MEM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x700C00001", + "PublicDescription": "Counts demand data reads that were supplied = by CXL MEM (Type 2 and Type 3) attached to local socket. Available PDIST co= unters: 0", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, { "BriefDescription": "Counts demand data reads that were supplied b= y a cache on a remote socket where a snoop hit a modified line in another c= ore's caches which forwarded the data.", "Counter": "0,1,2,3", @@ -756,6 +793,17 @@ "SampleAfterValue": "100003", "UMask": "0x1" }, + { + "BriefDescription": "Counts demand data reads that were supplied b= y CXL MEM (Type 2 or Type 3) attached to another socket.", + "Counter": "0,1,2,3", + "EventCode": "0x2A,0x2B", + "EventName": "OCR.DEMAND_DATA_RD.REMOTE_CXL_MEM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x703000001", + "PublicDescription": "Counts demand data reads that were supplied = by CXL MEM (Type 2 or Type 3) attached to another socket. Available PDIST c= ounters: 0", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, { "BriefDescription": "Counts demand data reads that hit a modified = line in a distant L3 Cache or were snooped from a distant core's L1/L2 cach= es on this socket when the system is in SNC (sub-NUMA cluster) mode.", "Counter": "0,1,2,3", @@ -789,6 +837,17 @@ "SampleAfterValue": "100003", "UMask": "0x1" }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that were s= upplied by CXL MEM (Type 2 or Type 3).", + "Counter": "0,1,2,3", + "EventCode": "0x2A,0x2B", + "EventName": "OCR.DEMAND_RFO.CXL_MEM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x703C00002", + "PublicDescription": "Counts demand reads for ownership (RFO) requ= ests and software prefetches for exclusive ownership (PREFETCHW) that were = supplied by CXL MEM (Type 2 or Type 3). Available PDIST counters: 0", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, { "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that hit in= the L3 or were snooped from another core's caches on the same socket.", "Counter": "0,1,2,3", @@ -811,6 +870,28 @@ "SampleAfterValue": "100003", "UMask": "0x1" }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that were s= upplied by CXL MEM (Type 2 and Type 3) attached to local socket.", + "Counter": "0,1,2,3", + "EventCode": "0x2A,0x2B", + "EventName": "OCR.DEMAND_RFO.LOCAL_CXL_MEM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x700C00002", + "PublicDescription": "Counts demand reads for ownership (RFO) requ= ests and software prefetches for exclusive ownership (PREFETCHW) that were = supplied by CXL MEM (Type 2 and Type 3) attached to local socket. Available= PDIST counters: 0", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that were s= upplied by CXL MEM (Type 2 or Type 3) attached to another socket.", + "Counter": "0,1,2,3", + "EventCode": "0x2A,0x2B", + "EventName": "OCR.DEMAND_RFO.REMOTE_CXL_MEM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x703000002", + "PublicDescription": "Counts demand reads for ownership (RFO) requ= ests and software prefetches for exclusive ownership (PREFETCHW) that were = supplied by CXL MEM (Type 2 or Type 3) attached to another socket. Availabl= e PDIST counters: 0", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, { "BriefDescription": "Counts writebacks of modified cachelines and = streaming stores that have any type of response.", "Counter": "0,1,2,3", @@ -833,6 +914,17 @@ "SampleAfterValue": "100003", "UMask": "0x1" }, + { + "BriefDescription": "Counts all (cacheable) data read, code read a= nd RFO requests including demands and prefetches to the core caches (L1 or = L2) that were supplied by CXL MEM (Type 2 or Type 3).", + "Counter": "0,1,2,3", + "EventCode": "0x2A,0x2B", + "EventName": "OCR.READS_TO_CORE.CXL_MEM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x703C04477", + "PublicDescription": "Counts all (cacheable) data read, code read = and RFO requests including demands and prefetches to the core caches (L1 or= L2) that were supplied by CXL MEM (Type 2 or Type 3). Available PDIST coun= ters: 0", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, { "BriefDescription": "Counts all (cacheable) data read, code read a= nd RFO requests including demands and prefetches to the core caches (L1 or = L2) that hit in the L3 or were snooped from another core's caches on the sa= me socket.", "Counter": "0,1,2,3", @@ -855,6 +947,17 @@ "SampleAfterValue": "100003", "UMask": "0x1" }, + { + "BriefDescription": "Counts all (cacheable) data read, code read a= nd RFO requests including demands and prefetches to the core caches (L1 or = L2) that were supplied by CXL MEM (Type 2 and Type 3) attached to local soc= ket.", + "Counter": "0,1,2,3", + "EventCode": "0x2A,0x2B", + "EventName": "OCR.READS_TO_CORE.LOCAL_CXL_MEM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x700C04477", + "PublicDescription": "Counts all (cacheable) data read, code read = and RFO requests including demands and prefetches to the core caches (L1 or= L2) that were supplied by CXL MEM (Type 2 and Type 3) attached to local so= cket. Available PDIST counters: 0", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, { "BriefDescription": "Counts all (cacheable) data read, code read a= nd RFO requests including demands and prefetches to the core caches (L1 or = L2) that were not supplied by the local socket's L1, L2, or L3 caches and w= ere supplied by a remote socket.", "Counter": "0,1,2,3", @@ -899,6 +1002,17 @@ "SampleAfterValue": "100003", "UMask": "0x1" }, + { + "BriefDescription": "Counts all (cacheable) data read, code read a= nd RFO requests including demands and prefetches to the core caches (L1 or = L2) that were supplied by CXL MEM (Type 2 or Type 3) attached to another so= cket.", + "Counter": "0,1,2,3", + "EventCode": "0x2A,0x2B", + "EventName": "OCR.READS_TO_CORE.REMOTE_CXL_MEM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x703004477", + "PublicDescription": "Counts all (cacheable) data read, code read = and RFO requests including demands and prefetches to the core caches (L1 or= L2) that were supplied by CXL MEM (Type 2 or Type 3) attached to another s= ocket. Available PDIST counters: 0", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, { "BriefDescription": "Counts all (cacheable) data read, code read a= nd RFO requests including demands and prefetches to the core caches (L1 or = L2) that hit a modified line in a distant L3 Cache or were snooped from a d= istant core's L1/L2 caches on this socket when the system is in SNC (sub-NU= MA cluster) mode.", "Counter": "0,1,2,3", @@ -937,7 +1051,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.ALL_REQUESTS", - "PublicDescription": "Counts memory transactions reached the super= queue including requests initiated by the core, all L3 prefetches, page wa= lks, etc.. Available PDIST counters: 0", + "PublicDescription": "Counts memory transactions reached the super= queue including requests initiated by the core, all L3 prefetches, page wa= lks, etc..", "SampleAfterValue": "100003", "UMask": "0x80" }, @@ -946,7 +1060,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DATA_RD", - "PublicDescription": "Counts the demand and prefetch data reads. A= ll Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 p= refetchers). Counting also covers reads due to page walks resulted from any= request type. Available PDIST counters: 0", + "PublicDescription": "Counts the demand and prefetch data reads. A= ll Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 p= refetchers). Counting also covers reads due to page walks resulted from any= request type.", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -955,7 +1069,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DEMAND_CODE_RD", - "PublicDescription": "Counts both cacheable and Non-Cacheable code= read requests. Available PDIST counters: 0", + "PublicDescription": "Counts both cacheable and Non-Cacheable code= read requests.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -964,7 +1078,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DEMAND_DATA_RD", - "PublicDescription": "Counts the Demand Data Read requests sent to= uncore. Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determi= ne average latency in the uncore. Available PDIST counters: 0", + "PublicDescription": "Counts the Demand Data Read requests sent to= uncore. Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determi= ne average latency in the uncore.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -973,7 +1087,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DEMAND_RFO", - "PublicDescription": "Counts the demand RFO (read for ownership) r= equests including regular RFOs, locks, ItoM. Available PDIST counters: 0", + "PublicDescription": "Counts the demand RFO (read for ownership) r= equests including regular RFOs, locks, ItoM.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -982,7 +1096,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.MEM_UC", - "PublicDescription": "This event counts noncacheable memory data r= ead transactions. Available PDIST counters: 0", + "PublicDescription": "This event counts noncacheable memory data r= ead transactions.", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -992,7 +1106,7 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD", - "PublicDescription": "Counts cycles when offcore outstanding cache= able Core Data Read transactions are present in the super queue. A transact= ion is considered to be in the Offcore outstanding state between L2 miss an= d transaction completion sent to requestor (SQ de-allocation). See correspo= nding Umask under OFFCORE_REQUESTS. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when offcore outstanding cache= able Core Data Read transactions are present in the super queue. A transact= ion is considered to be in the Offcore outstanding state between L2 miss an= d transaction completion sent to requestor (SQ de-allocation). See correspo= nding Umask under OFFCORE_REQUESTS.", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -1002,7 +1116,7 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE= _RD", - "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS. Available PDIST counters: 0", + "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -1012,7 +1126,6 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_DATA= _RD", - "PublicDescription": "Cycles where at least 1 outstanding demand d= ata read request is pending. Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1022,7 +1135,7 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO", - "PublicDescription": "Counts the number of offcore outstanding dem= and rfo Reads transactions in the super queue every cycle. The 'Offcore out= standing' state of the transaction lasts from the L2 miss until the sending= transaction completion to requestor (SQ deallocation). See the correspondi= ng Umask under OFFCORE_REQUESTS. Available PDIST counters: 0", + "PublicDescription": "Counts the number of offcore outstanding dem= and rfo Reads transactions in the super queue every cycle. The 'Offcore out= standing' state of the transaction lasts from the L2 miss until the sending= transaction completion to requestor (SQ deallocation). See the correspondi= ng Umask under OFFCORE_REQUESTS.", "SampleAfterValue": "1000003", "UMask": "0x4" }, @@ -1031,7 +1144,6 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DATA_RD", - "PublicDescription": "OFFCORE_REQUESTS_OUTSTANDING.DATA_RD Availab= le PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -1040,7 +1152,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD", - "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS. Available PDIST counters: 0", + "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -1049,7 +1161,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD", - "PublicDescription": "For every cycle, increments by the number of= outstanding demand data read requests pending. Requests are considered o= utstanding from the time they miss the core's L2 cache until the transactio= n completion message is sent to the requestor. Available PDIST counters: 0", + "PublicDescription": "For every cycle, increments by the number of= outstanding demand data read requests pending. Requests are considered o= utstanding from the time they miss the core's L2 cache until the transactio= n completion message is sent to the requestor.", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -1058,7 +1170,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_RFO", - "PublicDescription": "Counts the number of off-core outstanding re= ad-for-ownership (RFO) store transactions every cycle. An RFO transaction i= s considered to be in the Off-core outstanding state between L2 cache miss = and transaction completion. Available PDIST counters: 0", + "PublicDescription": "Counts the number of off-core outstanding re= ad-for-ownership (RFO) store transactions every cycle. An RFO transaction i= s considered to be in the Off-core outstanding state between L2 cache miss = and transaction completion.", "SampleAfterValue": "1000003", "UMask": "0x4" }, @@ -1067,7 +1179,7 @@ "Counter": "0,1,2,3", "EventCode": "0x2c", "EventName": "SQ_MISC.BUS_LOCK", - "PublicDescription": "Counts the more expensive bus lock needed to= enforce cache coherency for certain memory accesses that need to be done a= tomically. Can be created by issuing an atomic instruction (via the LOCK p= refix) which causes a cache line split or accesses uncacheable memory. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts the more expensive bus lock needed to= enforce cache coherency for certain memory accesses that need to be done a= tomically. Can be created by issuing an atomic instruction (via the LOCK p= refix) which causes a cache line split or accesses uncacheable memory.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -1076,7 +1188,6 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.ANY", - "PublicDescription": "Counts the number of PREFETCHNTA, PREFETCHW,= PREFETCHT0, PREFETCHT1 or PREFETCHT2 instructions executed. Available PDIS= T counters: 0", "SampleAfterValue": "100003", "UMask": "0xf" }, @@ -1085,7 +1196,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.NTA", - "PublicDescription": "Counts the number of PREFETCHNTA instruction= s executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHNTA instruction= s executed.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -1094,7 +1205,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.PREFETCHW", - "PublicDescription": "Counts the number of PREFETCHW instructions = executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHW instructions = executed.", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -1103,7 +1214,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.T0", - "PublicDescription": "Counts the number of PREFETCHT0 instructions= executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHT0 instructions= executed.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -1112,7 +1223,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.T1_T2", - "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT= 2 instructions executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT= 2 instructions executed.", "SampleAfterValue": "100003", "UMask": "0x4" } diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/floating-point.js= on b/tools/perf/pmu-events/arch/x86/graniterapids/floating-point.json index 1832dd952f66..59789eee060c 100644 --- a/tools/perf/pmu-events/arch/x86/graniterapids/floating-point.json +++ b/tools/perf/pmu-events/arch/x86/graniterapids/floating-point.json @@ -5,7 +5,6 @@ "CounterMask": "1", "EventCode": "0xb0", "EventName": "ARITH.FPDIV_ACTIVE", - "PublicDescription": "This event counts the cycles the floating po= int divider is busy. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -14,7 +13,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.FP", - "PublicDescription": "Counts all microcode Floating Point assists.= Available PDIST counters: 0", + "PublicDescription": "Counts all microcode Floating Point assists.= ", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -23,7 +22,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.SSE_AVX_MIX", - "PublicDescription": "ASSISTS.SSE_AVX_MIX Available PDIST counters= : 0", "SampleAfterValue": "1000003", "UMask": "0x10" }, @@ -32,7 +30,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.PORT_0", - "PublicDescription": "FP_ARITH_DISPATCHED.PORT_0 [This event is al= ias to FP_ARITH_DISPATCHED.V0] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -41,7 +38,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.PORT_1", - "PublicDescription": "FP_ARITH_DISPATCHED.PORT_1 [This event is al= ias to FP_ARITH_DISPATCHED.V1] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -50,7 +46,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.PORT_5", - "PublicDescription": "FP_ARITH_DISPATCHED.PORT_5 [This event is al= ias to FP_ARITH_DISPATCHED.V2] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -59,7 +54,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.V0", - "PublicDescription": "FP_ARITH_DISPATCHED.V0 [This event is alias = to FP_ARITH_DISPATCHED.PORT_0] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -68,7 +62,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.V1", - "PublicDescription": "FP_ARITH_DISPATCHED.V1 [This event is alias = to FP_ARITH_DISPATCHED.PORT_1] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -77,7 +70,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.V2", - "PublicDescription": "FP_ARITH_DISPATCHED.V2 [This event is alias = to FP_ARITH_DISPATCHED.PORT_5] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -86,7 +78,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 2 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as the= y perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR re= gister need to be set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 2 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as the= y perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR re= gister need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -95,7 +87,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE", - "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events. Available PDIST co= unters: 0", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -104,7 +96,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 = calculations per element. The DAZ and FTZ flags in the MXCSR register need = to be set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 = calculations per element. The DAZ and FTZ flags in the MXCSR register need = to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -113,7 +105,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE", - "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events. Available PDIST co= unters: 0", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -122,7 +114,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.4_FLOPS", - "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision and 256-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 2 or/and 4 computation operations, one for each element. = Applies to SSE* and AVX* packed single precision floating-point and packed= double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL= DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB ins= tructions count twice as they perform 2 calculations per element. The DAZ a= nd FTZ flags in the MXCSR register need to be set when using these events. = Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision and 256-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 2 or/and 4 computation operations, one for each element. = Applies to SSE* and AVX* packed single precision floating-point and packed= double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL= DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB ins= tructions count twice as they perform 2 calculations per element. The DAZ a= nd FTZ flags in the MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x18" }, @@ -131,7 +123,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14= FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calc= ulations per element. The DAZ and FTZ flags in the MXCSR register need to b= e set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14= FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calc= ulations per element. The DAZ and FTZ flags in the MXCSR register need to b= e set when using these events.", "SampleAfterValue": "100003", "UMask": "0x40" }, @@ -140,7 +132,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE", - "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 16 computation oper= ations, one for each element. Applies to SSE* and AVX* packed single preci= sion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP1= 4 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 cal= culations per element. The DAZ and FTZ flags in the MXCSR register need to = be set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 16 computation oper= ations, one for each element. Applies to SSE* and AVX* packed single preci= sion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP1= 4 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 cal= culations per element. The DAZ and FTZ flags in the MXCSR register need to = be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x80" }, @@ -149,7 +141,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.8_FLOPS", - "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision and 512-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 8 computation operations, one for each element. Applies = to SSE* and AVX* packed single precision and double precision floating-poin= t instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RSQRT14= RCP RCP14 DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice= as they perform 2 calculations per element. The DAZ and FTZ flags in the M= XCSR register need to be set when using these events. Available PDIST count= ers: 0", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision and 512-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 8 computation operations, one for each element. Applies = to SSE* and AVX* packed single precision and double precision floating-poin= t instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RSQRT14= RCP RCP14 DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice= as they perform 2 calculations per element. The DAZ and FTZ flags in the M= XCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x60" }, @@ -158,7 +150,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR", - "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision and double precision floating-point instructions retired; some = instructions will count twice as noted below. Each count represents 1 comp= utational operation. Applies to SSE* and AVX* scalar single precision float= ing-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB= . FM(N)ADD/SUB instructions count twice as they perform 2 calculations per= element. The DAZ and FTZ flags in the MXCSR register need to be set when u= sing these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision and double precision floating-point instructions retired; some = instructions will count twice as noted below. Each count represents 1 comp= utational operation. Applies to SSE* and AVX* scalar single precision float= ing-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB= . FM(N)ADD/SUB instructions count twice as they perform 2 calculations per= element. The DAZ and FTZ flags in the MXCSR register need to be set when u= sing these events.", "SampleAfterValue": "1000003", "UMask": "0x3" }, @@ -167,7 +159,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational scalar doubl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar double precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions co= unt twice as they perform 2 calculations per element. The DAZ and FTZ flags= in the MXCSR register need to be set when using these events. Available PD= IST counters: 0", + "PublicDescription": "Number of SSE/AVX computational scalar doubl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar double precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions co= unt twice as they perform 2 calculations per element. The DAZ and FTZ flags= in the MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -176,7 +168,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE", - "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar single precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instr= uctions count twice as they perform 2 calculations per element. The DAZ and= FTZ flags in the MXCSR register need to be set when using these events. Av= ailable PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar single precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instr= uctions count twice as they perform 2 calculations per element. The DAZ and= FTZ flags in the MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -185,7 +177,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.VECTOR", - "PublicDescription": "Number of any Vector retired FP arithmetic i= nstructions. The DAZ and FTZ flags in the MXCSR register need to be set wh= en using these events. Available PDIST counters: 0", + "PublicDescription": "Number of any Vector retired FP arithmetic i= nstructions. The DAZ and FTZ flags in the MXCSR register need to be set wh= en using these events.", "SampleAfterValue": "1000003", "UMask": "0xfc" }, @@ -194,7 +186,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.128B_PACKED_HALF", - "PublicDescription": "FP_ARITH_INST_RETIRED2.128B_PACKED_HALF Avai= lable PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -203,7 +194,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.256B_PACKED_HALF", - "PublicDescription": "FP_ARITH_INST_RETIRED2.256B_PACKED_HALF Avai= lable PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -212,7 +202,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.512B_PACKED_HALF", - "PublicDescription": "FP_ARITH_INST_RETIRED2.512B_PACKED_HALF Avai= lable PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -221,7 +210,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.COMPLEX_SCALAR_HALF", - "PublicDescription": "FP_ARITH_INST_RETIRED2.COMPLEX_SCALAR_HALF A= vailable PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -230,7 +218,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.SCALAR", - "PublicDescription": "FP_ARITH_INST_RETIRED2.SCALAR Available PDIS= T counters: 0", + "PublicDescription": "FP_ARITH_INST_RETIRED2.SCALAR", "SampleAfterValue": "100003", "UMask": "0x3" }, @@ -239,7 +227,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.SCALAR_HALF", - "PublicDescription": "FP_ARITH_INST_RETIRED2.SCALAR_HALF Available= PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -248,7 +235,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.VECTOR", - "PublicDescription": "FP_ARITH_INST_RETIRED2.VECTOR Available PDIS= T counters: 0", + "PublicDescription": "FP_ARITH_INST_RETIRED2.VECTOR", "SampleAfterValue": "100003", "UMask": "0x1c" } diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/frontend.json b/t= ools/perf/pmu-events/arch/x86/graniterapids/frontend.json index b7cd92fbecd5..d580d305c926 100644 --- a/tools/perf/pmu-events/arch/x86/graniterapids/frontend.json +++ b/tools/perf/pmu-events/arch/x86/graniterapids/frontend.json @@ -4,7 +4,7 @@ "Counter": "0,1,2,3", "EventCode": "0x60", "EventName": "BACLEARS.ANY", - "PublicDescription": "Number of times the front-end is resteered w= hen it finds a branch instruction in a fetch line. This is called Unknown B= ranch which occurs for the first time a branch instruction is fetched or wh= en the branch is not tracked by the BPU (Branch Prediction Unit) anymore. A= vailable PDIST counters: 0", + "PublicDescription": "Number of times the front-end is resteered w= hen it finds a branch instruction in a fetch line. This is called Unknown B= ranch which occurs for the first time a branch instruction is fetched or wh= en the branch is not tracked by the BPU (Branch Prediction Unit) anymore.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -13,7 +13,7 @@ "Counter": "0,1,2,3", "EventCode": "0x87", "EventName": "DECODE.LCP", - "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk. Available PDIS= T counters: 0", + "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk.", "SampleAfterValue": "500009", "UMask": "0x1" }, @@ -22,7 +22,6 @@ "Counter": "0,1,2,3", "EventCode": "0x87", "EventName": "DECODE.MS_BUSY", - "PublicDescription": "Cycles the Microcode Sequencer is busy. Avai= lable PDIST counters: 0", "SampleAfterValue": "500009", "UMask": "0x2" }, @@ -31,7 +30,7 @@ "Counter": "0,1,2,3", "EventCode": "0x61", "EventName": "DSB2MITE_SWITCHES.PENALTY_CYCLES", - "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache th= at holds translations of previously fetched instructions that were decoded = by the legacy x86 decode pipeline (MITE). This event counts fetch penalty c= ycles when a transition occurs from DSB to MITE. Available PDIST counters: = 0", + "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache th= at holds translations of previously fetched instructions that were decoded = by the legacy x86 decode pipeline (MITE). This event counts fetch penalty c= ycles when a transition occurs from DSB to MITE.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -303,7 +302,7 @@ "Counter": "0,1,2,3", "EventCode": "0x80", "EventName": "ICACHE_DATA.STALLS", - "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The decode pipeline works at a 32= Byte granularity. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The decode pipeline works at a 32= Byte granularity.", "SampleAfterValue": "500009", "UMask": "0x4" }, @@ -314,7 +313,6 @@ "EdgeDetect": "1", "EventCode": "0x80", "EventName": "ICACHE_DATA.STALL_PERIODS", - "PublicDescription": "ICACHE_DATA.STALL_PERIODS Available PDIST co= unters: 0", "SampleAfterValue": "500009", "UMask": "0x4" }, @@ -323,7 +321,7 @@ "Counter": "0,1,2,3", "EventCode": "0x83", "EventName": "ICACHE_TAG.STALLS", - "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss.", "SampleAfterValue": "200003", "UMask": "0x4" }, @@ -333,7 +331,7 @@ "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.DSB_CYCLES_ANY", - "PublicDescription": "Counts the number of cycles uops were delive= red to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) p= ath. Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles uops were delive= red to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) p= ath.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -343,7 +341,7 @@ "CounterMask": "6", "EventCode": "0x79", "EventName": "IDQ.DSB_CYCLES_OK", - "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the D= SB (Decode Stream Buffer) path. Count includes uops that may 'bypass' the I= DQ. Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the D= SB (Decode Stream Buffer) path. Count includes uops that may 'bypass' the I= DQ.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -352,7 +350,7 @@ "Counter": "0,1,2,3", "EventCode": "0x79", "EventName": "IDQ.DSB_UOPS", - "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Availab= le PDIST counters: 0", + "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -362,7 +360,7 @@ "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.MITE_CYCLES_ANY", - "PublicDescription": "Counts the number of cycles uops were delive= red to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipe= line) path. During these cycles uops are not being delivered from the Decod= e Stream Buffer (DSB). Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles uops were delive= red to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipe= line) path. During these cycles uops are not being delivered from the Decod= e Stream Buffer (DSB).", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -372,7 +370,7 @@ "CounterMask": "6", "EventCode": "0x79", "EventName": "IDQ.MITE_CYCLES_OK", - "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the M= ITE (legacy decode pipeline) path. During these cycles uops are not being d= elivered from the Decode Stream Buffer (DSB). Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the M= ITE (legacy decode pipeline) path. During these cycles uops are not being d= elivered from the Decode Stream Buffer (DSB).", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -381,7 +379,7 @@ "Counter": "0,1,2,3", "EventCode": "0x79", "EventName": "IDQ.MITE_UOPS", - "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the MITE path. This also means that uops are= not being delivered from the Decode Stream Buffer (DSB). Available PDIST c= ounters: 0", + "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the MITE path. This also means that uops are= not being delivered from the Decode Stream Buffer (DSB).", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -391,7 +389,7 @@ "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.MS_CYCLES_ANY", - "PublicDescription": "Counts cycles during which uops are being de= livered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS= ) is busy. Uops maybe initiated by Decode Stream Buffer (DSB) or MITE. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts cycles during which uops are being de= livered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS= ) is busy. Uops maybe initiated by Decode Stream Buffer (DSB) or MITE.", "SampleAfterValue": "2000003", "UMask": "0x20" }, @@ -402,7 +400,7 @@ "EdgeDetect": "1", "EventCode": "0x79", "EventName": "IDQ.MS_SWITCHES", - "PublicDescription": "Number of switches from DSB (Decode Stream B= uffer) or MITE (legacy decode pipeline) to the Microcode Sequencer. Availab= le PDIST counters: 0", + "PublicDescription": "Number of switches from DSB (Decode Stream B= uffer) or MITE (legacy decode pipeline) to the Microcode Sequencer.", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -411,7 +409,7 @@ "Counter": "0,1,2,3", "EventCode": "0x79", "EventName": "IDQ.MS_UOPS", - "PublicDescription": "Counts the number of uops initiated by MITE = or Decode Stream Buffer (DSB) and delivered to Instruction Decode Queue (ID= Q) while the Microcode Sequencer (MS) is busy. Counting includes uops that = may 'bypass' the IDQ. Available PDIST counters: 0", + "PublicDescription": "Counts the number of uops initiated by MITE = or Decode Stream Buffer (DSB) and delivered to Instruction Decode Queue (ID= Q) while the Microcode Sequencer (MS) is busy. Counting includes uops that = may 'bypass' the IDQ.", "SampleAfterValue": "1000003", "UMask": "0x20" }, @@ -420,7 +418,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x9c", "EventName": "IDQ_BUBBLES.CORE", - "PublicDescription": "This event counts a subset of the Topdown Sl= ots event that when no operation was delivered to the back-end pipeline due= to instruction fetch limitations when the back-end could have accepted mor= e operations. Common examples include instruction cache misses or x86 instr= uction decode limitations. The count may be distributed among unhalted logi= cal processors (hyper-threads) who share the same physical core, in process= ors that support Intel Hyper-Threading Technology. Software can use this ev= ent as the numerator for the Frontend Bound metric (or top-level category) = of the Top-down Microarchitecture Analysis method. Available PDIST counters= : 0", + "PublicDescription": "This event counts a subset of the Topdown Sl= ots event that when no operation was delivered to the back-end pipeline due= to instruction fetch limitations when the back-end could have accepted mor= e operations. Common examples include instruction cache misses or x86 instr= uction decode limitations. The count may be distributed among unhalted logi= cal processors (hyper-threads) who share the same physical core, in process= ors that support Intel Hyper-Threading Technology. Software can use this ev= ent as the numerator for the Frontend Bound metric (or top-level category) = of the Top-down Microarchitecture Analysis method.", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -430,7 +428,7 @@ "CounterMask": "6", "EventCode": "0x9c", "EventName": "IDQ_BUBBLES.CYCLES_0_UOPS_DELIV.CORE", - "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES= _0_UOPS_DELIV.CORE] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES= _0_UOPS_DELIV.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -441,7 +439,7 @@ "EventCode": "0x9c", "EventName": "IDQ_BUBBLES.CYCLES_FE_WAS_OK", "Invert": "1", - "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_UOPS_N= OT_DELIVERED.CYCLES_FE_WAS_OK] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_UOPS_N= OT_DELIVERED.CYCLES_FE_WAS_OK]", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -450,7 +448,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x9c", "EventName": "IDQ_UOPS_NOT_DELIVERED.CORE", - "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle. Available PDIST counters: 0", + "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle.", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -460,7 +458,7 @@ "CounterMask": "6", "EventCode": "0x9c", "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE", - "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_BUBBLES.CYCLES_0_UOPS_DEL= IV.CORE] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_BUBBLES.CYCLES_0_UOPS_DEL= IV.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -471,7 +469,7 @@ "EventCode": "0x9c", "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK", "Invert": "1", - "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_BUBBLE= S.CYCLES_FE_WAS_OK] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_BUBBLE= S.CYCLES_FE_WAS_OK]", "SampleAfterValue": "1000003", "UMask": "0x1" } diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/gnr-metrics.json = b/tools/perf/pmu-events/arch/x86/graniterapids/gnr-metrics.json index 9a620e1b8de8..cc3c834ca286 100644 --- a/tools/perf/pmu-events/arch/x86/graniterapids/gnr-metrics.json +++ b/tools/perf/pmu-events/arch/x86/graniterapids/gnr-metrics.json @@ -1,28 +1,28 @@ [ { "BriefDescription": "C1 residency percent per core", - "MetricExpr": "cstate_core@c1\\-residency@ / TSC", + "MetricExpr": "cstate_core@c1\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C1_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" @@ -381,7 +381,7 @@ { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_inf= o_thread_slots", + "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "BvOB;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", @@ -415,40 +415,40 @@ "MetricThreshold": "tma_bottleneck_branching_overhead > 5", "PublicDescription": "Total pipeline cost of instructions used for= program control-flow - a subset of the Retiring category in TMA. Examples = include function calls; loops and alignments. (A lower bound)" }, + { + "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", + "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_amx_busy= + tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_c= ore_bound * tma_amx_busy / (tma_amx_busy + tma_divider + tma_ports_utilizat= ion + tma_serializing_operation) + tma_core_bound * (tma_ports_utilization = / (tma_amx_busy + tma_divider + tma_ports_utilization + tma_serializing_ope= ration)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_utili= zed_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", + "MetricGroup": "BvCB;Cor;tma_issueComp", + "MetricName": "tma_bottleneck_compute_bound_est", + "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", + "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " + }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Bandwidth related bottlenecks", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma_l1_l= atency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)= ))", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_cx= l_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_late= ncy)) + tma_memory_bound * (tma_l3_bound / (tma_cxl_mem_bound + tma_dram_bo= und + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma= _sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency = + tma_sq_full)) + tma_memory_bound * (tma_l1_bound / (tma_cxl_mem_bound + t= ma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_boun= d)) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependen= cy + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)))", "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", - "MetricName": "tma_bottleneck_cache_memory_bandwidth", - "MetricThreshold": "tma_bottleneck_cache_memory_bandwidth > 20", + "MetricName": "tma_bottleneck_data_cache_memory_bandwidth", + "MetricThreshold": "tma_bottleneck_data_cache_memory_bandwidth > 2= 0", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_= system_dram_bw_use, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Latency related bottlenecks", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependen= cy + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_= bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma= _l3_bound + tma_store_bound)) * (tma_lock_latency / (tma_dtlb_load + tma_fb= _full + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + tm= a_store_fwd_blk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tm= a_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_split_l= oads / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_= latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_s= tore_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_split_stores / (tma_dtlb_store + tma_false_sharin= g + tma_split_stores + tma_store_latency + tma_streaming_stores)) + tma_mem= ory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_boun= d + tma_l3_bound + tma_store_bound)) * (tma_store_latency / (tma_dtlb_store= + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming= _stores)))", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_cx= l_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latenc= y)) + 0 / (tma_cxl_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound= + tma_l3_bound + tma_store_bound) * tma_mem_latency / (tma_mem_bandwidth += tma_mem_latency) + tma_memory_bound * (tma_l3_bound / (tma_cxl_mem_bound += tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bo= und)) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + = tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * tma_l2_bound / (tma= _cxl_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_boun= d + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_cxl_mem_boun= d + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store= _bound)) * (tma_l1_latency_dependency / (tma_dtlb_load + tma_fb_full + tma_= l1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_= blk)) + tma_memory_bound * (tma_l1_bound / (tma_cxl_mem_bound + tma_dram_bo= und + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma= _lock_latency / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + = tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound= * (tma_l1_bound / (tma_cxl_mem_bound + tma_dram_bound + tma_l1_bound + tma= _l2_bound + tma_l3_bound + tma_store_bound)) * (tma_split_loads / (tma_dtlb= _load + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_sp= lit_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tm= a_cxl_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound)) * (tma_split_stores / (tma_dtlb_store + tma_false_sh= aring + tma_split_stores + tma_store_latency + tma_streaming_stores)) + tma= _memory_bound * (tma_store_bound / (tma_cxl_mem_bound + tma_dram_bound + tm= a_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_store_l= atency / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store= _latency + tma_streaming_stores)))", "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", - "MetricName": "tma_bottleneck_cache_memory_latency", - "MetricThreshold": "tma_bottleneck_cache_memory_latency > 20", + "MetricName": "tma_bottleneck_data_cache_memory_latency", + "MetricThreshold": "tma_bottleneck_data_cache_memory_latency > 20", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_= mem_latency" }, - { - "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", - "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_amx_busy= + tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_c= ore_bound * tma_amx_busy / (tma_amx_busy + tma_divider + tma_ports_utilizat= ion + tma_serializing_operation) + tma_core_bound * (tma_ports_utilization = / (tma_amx_busy + tma_divider + tma_ports_utilization + tma_serializing_ope= ration)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_utili= zed_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", - "MetricGroup": "BvCB;Cor;tma_issueComp", - "MetricName": "tma_bottleneck_compute_bound_est", - "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", - "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " - }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks (when the front-end could not sustain operations = delivery to the back-end)", - "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - (1 - I= NST_RETIRED.REP_ITERATION / cpu@UOPS_RETIRED.MS\\,cmask\\=3D1@) * (tma_fetc= h_latency * (tma_ms_switches + tma_branch_resteers * (tma_clears_resteers += tma_mispredicts_resteers * tma_other_mispredicts / tma_branch_mispredicts)= / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_branches))= / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_m= isses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_ms / (tma_ds= b + tma_mite + tma_ms))) - tma_bottleneck_big_code", + "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - (1 - I= NST_RETIRED.REP_ITERATION / cpu@UOPS_RETIRED.MS\\,cmask\\=3D1@) * (tma_fetc= h_latency * (tma_ms_switches + tma_branch_resteers * (tma_clears_resteers += tma_mispredicts_resteers * tma_other_mispredicts / tma_branch_mispredicts)= / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_branches))= / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_m= isses + tma_lcp + tma_ms_switches) + tma_ms)) - tma_bottleneck_big_code", "MetricGroup": "BvFB;Fed;FetchBW;Frontend", "MetricName": "tma_bottleneck_instruction_fetch_bw", "MetricThreshold": "tma_bottleneck_instruction_fetch_bw > 20" }, { "BriefDescription": "Total pipeline cost of irregular execution (e= .g", - "MetricExpr": "100 * ((1 - INST_RETIRED.REP_ITERATION / cpu@UOPS_R= ETIRED.MS\\,cmask\\=3D1@) * (tma_fetch_latency * (tma_ms_switches + tma_bra= nch_resteers * (tma_clears_resteers + tma_mispredicts_resteers * tma_other_= mispredicts / tma_branch_mispredicts) / (tma_clears_resteers + tma_mispredi= cts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_dsb_swit= ches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + t= ma_fetch_bandwidth * tma_ms / (tma_dsb + tma_mite + tma_ms)) + 10 * tma_mic= rocode_sequencer * tma_other_mispredicts / tma_branch_mispredicts * tma_bra= nch_mispredicts + tma_machine_clears * tma_other_nukes / tma_other_nukes + = tma_core_bound * (tma_serializing_operation + RS.EMPTY_RESOURCE / tma_info_= thread_clks * tma_ports_utilized_0) / (tma_amx_busy + tma_divider + tma_por= ts_utilization + tma_serializing_operation) + tma_microcode_sequencer / (tm= a_few_uops_instructions + tma_microcode_sequencer) * (tma_assists / tma_mic= rocode_sequencer) * tma_heavy_operations)", + "MetricExpr": "100 * ((1 - INST_RETIRED.REP_ITERATION / cpu@UOPS_R= ETIRED.MS\\,cmask\\=3D1@) * (tma_fetch_latency * (tma_ms_switches + tma_bra= nch_resteers * (tma_clears_resteers + tma_mispredicts_resteers * tma_other_= mispredicts / tma_branch_mispredicts) / (tma_clears_resteers + tma_mispredi= cts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_dsb_swit= ches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + t= ma_ms) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_= mispredicts * tma_branch_mispredicts + tma_machine_clears * tma_other_nukes= / tma_other_nukes + tma_core_bound * (tma_serializing_operation + RS.EMPTY= _RESOURCE / tma_info_thread_clks * tma_ports_utilized_0) / (tma_amx_busy + = tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_micr= ocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (= tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", "MetricName": "tma_bottleneck_irregular_overhead", "MetricThreshold": "tma_bottleneck_irregular_overhead > 10", @@ -456,7 +456,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", - "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / (tma_dram= _bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (= tma_dtlb_load / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + = tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound= * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l= 3_bound + tma_store_bound)) * (tma_dtlb_store / (tma_dtlb_store + tma_false= _sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))", + "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / (tma_cxl_= mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + t= ma_store_bound)) * (tma_dtlb_load / (tma_dtlb_load + tma_fb_full + tma_l1_l= atency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)= ) + tma_memory_bound * (tma_store_bound / (tma_cxl_mem_bound + tma_dram_bou= nd + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_= dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_s= tore_latency + tma_streaming_stores)))", "MetricGroup": "BvMT;Mem;MemoryTLB;Offcore;tma_issueTLB", "MetricName": "tma_bottleneck_memory_data_tlbs", "MetricThreshold": "tma_bottleneck_memory_data_tlbs > 20", @@ -464,7 +464,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Synchronization= related bottlenecks (data transfers and coherency updates across processor= s)", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * = (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) * tma_remote_cach= e / (tma_local_mem + tma_remote_cache + tma_remote_mem) + tma_l3_bound / (t= ma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_boun= d) * (tma_contested_accesses + tma_data_sharing) / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full) + tma_store_bound / = (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bo= und) * tma_false_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_= stores + tma_store_latency + tma_streaming_stores - tma_store_latency)) + t= ma_machine_clears * (1 - tma_other_nukes / tma_other_nukes))", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_cx= l_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency= )) * tma_remote_cache / (tma_local_mem + tma_remote_cache + tma_remote_mem)= + tma_l3_bound / (tma_cxl_mem_bound + tma_dram_bound + tma_l1_bound + tma_= l2_bound + tma_l3_bound + tma_store_bound) * (tma_contested_accesses + tma_= data_sharing) / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_lat= ency + tma_sq_full) + tma_store_bound / (tma_cxl_mem_bound + tma_dram_bound= + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * tma_fals= e_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_st= ore_latency + tma_streaming_stores - tma_store_latency)) + tma_machine_clea= rs * (1 - tma_other_nukes / tma_other_nukes))", "MetricGroup": "BvMS;LockCont;Mem;Offcore;tma_issueSyncxn", "MetricName": "tma_bottleneck_memory_synchronization", "MetricThreshold": "tma_bottleneck_memory_synchronization > 10", @@ -480,7 +480,7 @@ }, { "BriefDescription": "Total pipeline cost of remaining bottlenecks = in the back-end", - "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_cache_me= mory_bandwidth + tma_bottleneck_cache_memory_latency + tma_bottleneck_memor= y_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottleneck_comput= e_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_branching_= overhead + tma_bottleneck_useful_work)", + "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_data_cac= he_memory_bandwidth + tma_bottleneck_data_cache_memory_latency + tma_bottle= neck_memory_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottlen= eck_compute_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_= branching_overhead + tma_bottleneck_useful_work)", "MetricGroup": "BvOB;Cor;Offcore", "MetricName": "tma_bottleneck_other_bottlenecks", "MetricThreshold": "tma_bottleneck_other_bottlenecks > 20", @@ -496,7 +496,7 @@ { "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Branch Misprediction", "DefaultMetricgroupName": "TopdownL2", - "MetricExpr": "topdown\\-br\\-mispredict / (topdown\\-fe\\-bound += topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tm= a_info_thread_slots", + "MetricExpr": "topdown\\-br\\-mispredict / (topdown\\-fe\\-bound += topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "BadSpec;BrMispredicts;BvMP;Default;TmaL2;TopdownL2= ;tma_L2_group;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", @@ -613,7 +613,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to contested acces= ses", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS * min(MEM_LOAD_L= 3_HIT_RETIRED.XSNP_MISS:R, 74.6 * tma_info_system_core_frequency) + MEM_LOA= D_L3_HIT_RETIRED.XSNP_FWD * min(MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD:R, 76.6 * = tma_info_system_core_frequency) * (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM / (= OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM + OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_= WITH_FWD))) * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) = / tma_info_thread_clks", "MetricGroup": "BvMS;DataSharing;LockCont;Offcore;Snoop;TopdownL4;= tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", "MetricName": "tma_contested_accesses", @@ -632,6 +631,15 @@ "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, + { + "BriefDescription": "This metric roughly estimates (based on idle = latencies) how often the CPU was stalled on accesses to external CXL Memory= by loads (e.g", + "MetricExpr": "(((1 - ((19 * (MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM= * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS)) + 10 * (MEM_LO= AD_L3_MISS_RETIRED.LOCAL_DRAM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RET= IRED.L1_MISS) + MEM_LOAD_L3_MISS_RETIRED.REMOTE_FWD * (1 + MEM_LOAD_RETIRED= .FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + MEM_LOAD_L3_MISS_RETIRED.REMOTE_HITM = * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS))) / (19 * (MEM_L= OAD_L3_MISS_RETIRED.REMOTE_DRAM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_R= ETIRED.L1_MISS)) + 10 * (MEM_LOAD_L3_MISS_RETIRED.LOCAL_DRAM * (1 + MEM_LOA= D_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + MEM_LOAD_L3_MISS_RETIRED.REM= OTE_FWD * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + MEM_LO= AD_L3_MISS_RETIRED.REMOTE_HITM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RE= TIRED.L1_MISS)) + (25 * (MEM_LOAD_RETIRED.LOCAL_CXL_MEM * (1 + MEM_LOAD_RET= IRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) if #has_pmem > 0 else 0) + 33 * (ME= M_LOAD_L3_MISS_RETIRED.REMOTE_CXL_MEM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_= LOAD_RETIRED.L1_MISS) if #has_pmem > 0 else 0))) if #has_pmem > 0 else 1)) = * (MEMORY_ACTIVITY.STALLS_L3_MISS / tma_info_thread_clks) if 1e6 * (MEM_LOA= D_L3_MISS_RETIRED.REMOTE_CXL_MEM + MEM_LOAD_RETIRED.LOCAL_CXL_MEM) > MEM_LO= AD_RETIRED.L1_MISS else 0) if #has_pmem > 0 else 0)", + "MetricGroup": "MemoryBound;Server;TmaL3mem;TopdownL3;tma_L3_group= ;tma_memory_bound_group", + "MetricName": "tma_cxl_mem_bound", + "MetricThreshold": "tma_cxl_mem_bound > 0.1 & (tma_memory_bound > = 0.2 & tma_backend_bound > 0.2)", + "PublicDescription": "This metric roughly estimates (based on idle= latencies) how often the CPU was stalled on accesses to external CXL Memor= y by loads (e.g. 3D-Xpoint (Crystal Ridge, a.k.a. IXP) memory, PMM - Persis= tent Memory Module [from CLX to SPR] or any other CXL Type3 Memory [EMR onw= ards]).", + "ScaleUnit": "100%" + }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to data-sharing ac= cesses", "MetricConstraint": "NO_GROUP_EVENTS", @@ -662,7 +670,7 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled on accesses to external memory (DRAM) by loads", - "MetricExpr": "MEMORY_ACTIVITY.STALLS_L3_MISS / tma_info_thread_cl= ks", + "MetricExpr": "(MEMORY_ACTIVITY.STALLS_L3_MISS / tma_info_thread_c= lks - tma_cxl_mem_bound if #has_pmem > 0 else MEMORY_ACTIVITY.STALLS_L3_MIS= S / tma_info_thread_clks)", "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_me= mory_bound_group", "MetricName": "tma_dram_bound", "MetricThreshold": "tma_dram_bound > 0.1 & (tma_memory_bound > 0.2= & tma_backend_bound > 0.2)", @@ -720,7 +728,7 @@ "MetricGroup": "BvMB;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;t= ma_issueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", - "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_cache_memory_bandwidth, = tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_late= ncy, tma_streaming_stores", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_data_cache_memory_bandwi= dth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store= _latency, tma_streaming_stores", "ScaleUnit": "100%" }, { @@ -848,7 +856,7 @@ { "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring heavy-weight operations -- instructions that require= two or more uops or micro-coded sequences", "DefaultMetricgroupName": "TopdownL2", - "MetricExpr": "topdown\\-heavy\\-ops / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_in= fo_thread_slots", + "MetricExpr": "topdown\\-heavy\\-ops / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "Default;Retire;TmaL2;TopdownL2;tma_L2_group;tma_re= tiring_group", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", @@ -1395,19 +1403,19 @@ { "BriefDescription": "Off-core accesses per kilo instruction for mo= dified write requests", "MetricExpr": "1e3 * OCR.MODIFIED_WRITE.ANY_RESPONSE / tma_info_in= st_mix_instructions", - "MetricGroup": "Offcore", + "MetricGroup": "Offcore;Server", "MetricName": "tma_info_memory_mix_offcore_mwrite_any_pki" }, { "BriefDescription": "Off-core accesses per kilo instruction for re= ads-to-core requests (speculative; including in-core HW prefetches)", "MetricExpr": "1e3 * OCR.READS_TO_CORE.ANY_RESPONSE / tma_info_ins= t_mix_instructions", - "MetricGroup": "CacheHits;Offcore", + "MetricGroup": "CacheHits;Offcore;Server", "MetricName": "tma_info_memory_mix_offcore_read_any_pki" }, { "BriefDescription": "L3 cache misses per kilo instruction for read= s-to-core requests (speculative; including in-core HW prefetches)", "MetricExpr": "1e3 * OCR.READS_TO_CORE.L3_MISS / tma_info_inst_mix= _instructions", - "MetricGroup": "Offcore", + "MetricGroup": "Offcore;Server", "MetricName": "tma_info_memory_mix_offcore_read_l3m_pki" }, { @@ -1433,21 +1441,21 @@ { "BriefDescription": "Average DRAM BW for Reads-to-Core (R2C) cover= ing for memory attached to local- and remote-socket", "MetricExpr": "64 * OCR.READS_TO_CORE.DRAM / 1e9 / tma_info_system= _time", - "MetricGroup": "HPC;Mem;MemoryBW;SoC", + "MetricGroup": "HPC;Mem;MemoryBW;Offcore;Server", "MetricName": "tma_info_memory_soc_r2c_dram_bw", "PublicDescription": "Average DRAM BW for Reads-to-Core (R2C) cove= ring for memory attached to local- and remote-socket. See R2C_Offcore_BW." }, { "BriefDescription": "Average L3-cache miss BW for Reads-to-Core (R= 2C)", "MetricExpr": "64 * OCR.READS_TO_CORE.L3_MISS / 1e9 / tma_info_sys= tem_time", - "MetricGroup": "HPC;Mem;MemoryBW;SoC", + "MetricGroup": "HPC;Mem;MemoryBW;Offcore;Server", "MetricName": "tma_info_memory_soc_r2c_l3m_bw", "PublicDescription": "Average L3-cache miss BW for Reads-to-Core (= R2C). This covering going to DRAM or other memory off-chip memory tears. Se= e R2C_Offcore_BW." }, { "BriefDescription": "Average Off-core access BW for Reads-to-Core = (R2C)", "MetricExpr": "64 * OCR.READS_TO_CORE.ANY_RESPONSE / 1e9 / tma_inf= o_system_time", - "MetricGroup": "HPC;Mem;MemoryBW;SoC", + "MetricGroup": "HPC;Mem;MemoryBW;Offcore;Server", "MetricName": "tma_info_memory_soc_r2c_offcore_bw", "PublicDescription": "Average Off-core access BW for Reads-to-Core= (R2C). R2C account for demand or prefetch load/RFO/code access that fill d= ata into the Core caches." }, @@ -1491,7 +1499,7 @@ "MetricName": "tma_info_memory_tlb_store_stlb_mpki" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_G= E_1 / 2 if #SMT_on else cpu@UOPS_EXECUTED.THREAD\\,cmask\\=3D1@)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -1538,7 +1546,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -1550,16 +1558,28 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, + { + "BriefDescription": "Average 3DXP Memory Bandwidth Use for reads [= GB / sec]", + "MetricExpr": "(64 * UNC_CXLCM_RxC_PACK_BUF_INSERTS.MEM_DATA / 1e9= / tma_info_system_time if #has_pmem > 0 else 0)", + "MetricGroup": "MemOffcore;MemoryBW;Server;SoC", + "MetricName": "tma_info_system_cxl_mem_read_bw" + }, + { + "BriefDescription": "Average 3DXP Memory Bandwidth Use for Writes = [GB / sec]", + "MetricExpr": "(64 * UNC_CXLDP_TxC_AGF_INSERTS.M2S_DATA / 1e9 / tm= a_info_system_time if #has_pmem > 0 else 0)", + "MetricGroup": "MemOffcore;MemoryBW;Server;SoC", + "MetricName": "tma_info_system_cxl_mem_write_bw" + }, { "BriefDescription": "Average external Memory Bandwidth Use for rea= ds and writes [GB / sec]", "MetricExpr": "64 * (UNC_M_CAS_COUNT_SCH0.RD + UNC_M_CAS_COUNT_SCH= 1.RD + UNC_M_CAS_COUNT_SCH0.WR + UNC_M_CAS_COUNT_SCH1.WR) / 1e9 / tma_info_= system_time", "MetricGroup": "HPC;MemOffcore;MemoryBW;SoC;tma_issueBW", "MetricName": "tma_info_system_dram_bw_use", - "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_cache_memory_ban= dwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_data_cache_memor= y_bandwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Giga Floating Point Operations Per Second", @@ -1771,12 +1791,12 @@ "ScaleUnit": "100%" }, { - "BriefDescription": "This metric([SKL+] roughly; [LNL]) estimates = fraction of cycles with demand load accesses that hit the L1D cache", + "BriefDescription": "This metric ([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache", "MetricExpr": "min(2 * (MEM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETI= RED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CYCLE_ACTIVITY.CYCLE= S_MEM_ANY - MEMORY_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound= _group", "MetricName": "tma_l1_latency_dependency", "MetricThreshold": "tma_l1_latency_dependency > 0.1 & (tma_l1_boun= d > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache. The s= hort latency of the L1D cache may be exposed in pointer-chasing memory acce= ss patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", + "PublicDescription": "This metric ([SKL+] roughly; [LNL]) estimate= s fraction of cycles with demand load accesses that hit the L1D cache. The = short latency of the L1D cache may be exposed in pointer-chasing memory acc= ess patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", "ScaleUnit": "100%" }, { @@ -1790,7 +1810,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles wit= h demand load accesses that hit the L2 cache under unloaded scenarios (poss= ibly L2 latency limited)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "MEM_LOAD_RETIRED.L2_HIT * min(MEM_LOAD_RETIRED.L2_H= IT:R, 4.4 * tma_info_system_core_frequency) * (1 + MEM_LOAD_RETIRED.FB_HIT = / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_thread_clks", "MetricGroup": "MemoryLat;TopdownL4;tma_L4_group;tma_l2_bound_grou= p", "MetricName": "tma_l2_hit_latency", @@ -1809,12 +1828,11 @@ }, { "BriefDescription": "This metric estimates fraction of cycles with= demand load accesses that hit the L3 cache under unloaded scenarios (possi= bly L3 latency limited)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "MEM_LOAD_RETIRED.L3_HIT * min(MEM_LOAD_RETIRED.L3_H= IT:R, 32.6 * tma_info_system_core_frequency) * (1 + MEM_LOAD_RETIRED.FB_HIT= / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat= ;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_cache_memory_latency, tma_mem_latency", + "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_data_cache_memory_latency, tma_mem_latency", "ScaleUnit": "100%" }, { @@ -1897,6 +1915,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU spent handling cache misses due to lock operations", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "MEM_INST_RETIRED.LOCK_LOADS * MEM_INST_RETIRED.LOCK= _LOADS:R / tma_info_thread_clks", "MetricGroup": "LockCont;Offcore;TopdownL4;tma_L4_group;tma_issueR= FO;tma_l1_bound_group", "MetricName": "tma_lock_latency", @@ -1929,7 +1948,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_d= ram_bound_group;tma_issueBW", "MetricName": "tma_mem_bandwidth", "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_cache_memory_bandwidth,= tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_data_cache_memory_bandw= idth, tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", "ScaleUnit": "100%" }, { @@ -1938,13 +1957,13 @@ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_= dram_bound_group;tma_issueLat", "MetricName": "tma_mem_latency", "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_cache_memory_latency, tma_l3_hit_latency= ", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_data_cache_memory_latency, tma_l3_hit_la= tency", "ScaleUnit": "100%" }, { "BriefDescription": "This metric represents fraction of slots the = Memory subsystem within the Backend was a bottleneck", "DefaultMetricgroupName": "TopdownL2", - "MetricExpr": "topdown\\-mem\\-bound / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_in= fo_thread_slots", + "MetricExpr": "topdown\\-mem\\-bound / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "Backend;Default;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", @@ -1954,7 +1973,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to LFENCE Instructions.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * MISC2_RETIRED.LFENCE / tma_info_thread_clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_serializing_operation_g= roup", "MetricName": "tma_memory_fence", @@ -2007,7 +2025,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to the Microcode Sequencer (MS) unit = - see Microcode_Sequencer node for details.", - "MetricExpr": "max(IDQ.MS_CYCLES_ANY, cpu@UOPS_RETIRED.MS\\,cmask\= \=3D1@ / (UOPS_RETIRED.SLOTS / UOPS_ISSUED.ANY)) / tma_info_core_core_clks = / 2", + "MetricExpr": "max(IDQ.MS_CYCLES_ANY, cpu@UOPS_RETIRED.MS\\,cmask\= \=3D1@ / (UOPS_RETIRED.SLOTS / UOPS_ISSUED.ANY)) / tma_info_core_core_clks = / 2.4", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_fetch_bandwidt= h_group", "MetricName": "tma_ms", "MetricThreshold": "tma_ms > 0.05 & tma_fetch_bandwidth > 0.2", @@ -2042,6 +2060,7 @@ }, { "BriefDescription": "This metric represents the remaining light uo= ps fraction the CPU has executed - remaining means not covered by other sib= ling nodes", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_i= nt_operations + tma_memory_operations + tma_fused_instructions + tma_non_fu= sed_branches))", "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", "MetricName": "tma_other_light_ops", @@ -2103,6 +2122,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU performance was potentially limited due to Core computation issues (non= divider-related)", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "((tma_ports_utilized_0 * tma_info_thread_clks + (EX= E_ACTIVITY.1_PORTS_UTIL + tma_retiring * EXE_ACTIVITY.2_3_PORTS_UTIL)) / tm= a_info_thread_clks if ARITH.DIV_ACTIVE < CYCLE_ACTIVITY.STALLS_TOTAL - EXE_= ACTIVITY.BOUND_ON_LOADS else (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring * EX= E_ACTIVITY.2_3_PORTS_UTIL) / tma_info_thread_clks)", "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_gr= oup", "MetricName": "tma_ports_utilization", @@ -2112,6 +2132,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed no uops on any execution port (Logical Processor cycles since ICL= , Physical Core cycles otherwise)", + "MetricConstraint": "NO_THRESHOLD_AND_NMI", "MetricExpr": "max(EXE_ACTIVITY.EXE_BOUND_0_PORTS - RESOURCE_STALL= S.SCOREBOARD, 0) / tma_info_thread_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_0", @@ -2121,6 +2142,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", + "MetricConstraint": "NO_THRESHOLD_AND_NMI", "MetricExpr": "EXE_ACTIVITY.1_PORTS_UTIL / tma_info_thread_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_1", @@ -2130,7 +2152,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "EXE_ACTIVITY.2_PORTS_UTIL / tma_info_thread_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", @@ -2140,7 +2161,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 3 or more uops per cycle on all execution ports (Logical= Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "UOPS_EXECUTED.CYCLES_GE_3 / tma_info_thread_clks", "MetricGroup": "BvCB;PortsUtil;TopdownL4;tma_L4_group;tma_ports_ut= ilization_group", "MetricName": "tma_ports_utilized_3m", @@ -2150,7 +2170,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling loads from remote cache in other socket= s including synchronizations issues", - "MetricExpr": "(MEM_LOAD_L3_MISS_RETIRED.REMOTE_HITM * MEM_LOAD_L3= _MISS_RETIRED.REMOTE_HITM:R + MEM_LOAD_L3_MISS_RETIRED.REMOTE_FWD * MEM_LOA= D_L3_MISS_RETIRED.REMOTE_FWD:R) * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_R= ETIRED.L1_MISS / 2) / tma_info_thread_clks", + "MetricExpr": "(MEM_LOAD_L3_MISS_RETIRED.REMOTE_HITM * PEBS + MEM_= LOAD_L3_MISS_RETIRED.REMOTE_FWD * PEBS) * (1 + MEM_LOAD_RETIRED.FB_HIT / ME= M_LOAD_RETIRED.L1_MISS / 2) / tma_info_thread_clks", "MetricGroup": "Offcore;Server;Snoop;TopdownL5;tma_L5_group;tma_is= sueSyncxn;tma_mem_latency_group", "MetricName": "tma_remote_cache", "MetricThreshold": "tma_remote_cache > 0.05 & (tma_mem_latency > 0= .1 & (tma_dram_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > = 0.2)))", @@ -2159,7 +2179,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling loads from remote memory", - "MetricExpr": "MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM * MEM_LOAD_L3_= MISS_RETIRED.REMOTE_DRAM:R * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRE= D.L1_MISS / 2) / tma_info_thread_clks", + "MetricExpr": "MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM * PEBS * (1 + = MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_thread_c= lks", "MetricGroup": "Server;Snoop;TopdownL5;tma_L5_group;tma_mem_latenc= y_group", "MetricName": "tma_remote_mem", "MetricThreshold": "tma_remote_mem > 0.1 & (tma_mem_latency > 0.1 = & (tma_dram_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2= )))", @@ -2177,7 +2197,7 @@ { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_= thread_slots", + "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "BvUW;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", @@ -2205,7 +2225,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to PAUSE Instructions", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "CPU_CLK_UNHALTED.PAUSE / tma_info_thread_clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_serializing_operation_g= roup", "MetricName": "tma_slow_pause", @@ -2237,7 +2256,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_i= ssueBW;tma_l3_bound_group", "MetricName": "tma_sq_full", "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, tma_m= em_bandwidth", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_data_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, = tma_mem_bandwidth", "ScaleUnit": "100%" }, { diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/memory.json b/too= ls/perf/pmu-events/arch/x86/graniterapids/memory.json index 4db39f304c2c..96f40390becf 100644 --- a/tools/perf/pmu-events/arch/x86/graniterapids/memory.json +++ b/tools/perf/pmu-events/arch/x86/graniterapids/memory.json @@ -5,7 +5,6 @@ "CounterMask": "2", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_L3_MISS", - "PublicDescription": "Cycles while L3 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -15,7 +14,6 @@ "CounterMask": "6", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_L3_MISS", - "PublicDescription": "Execution stalls while L3 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x6" }, @@ -24,7 +22,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc3", "EventName": "MACHINE_CLEARS.MEMORY_ORDERING", - "PublicDescription": "Counts the number of Machine Clears detected= dye to memory ordering. Memory Ordering Machine Clears may apply when a me= mory read may not conform to the memory ordering rules of the x86 architect= ure Available PDIST counters: 0", + "PublicDescription": "Counts the number of Machine Clears detected= dye to memory ordering. Memory Ordering Machine Clears may apply when a me= mory read may not conform to the memory ordering rules of the x86 architect= ure", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -34,7 +32,6 @@ "CounterMask": "2", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.CYCLES_L1D_MISS", - "PublicDescription": "Cycles while L1 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -44,7 +41,6 @@ "CounterMask": "3", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.STALLS_L1D_MISS", - "PublicDescription": "Execution stalls while L1 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x3" }, @@ -54,7 +50,7 @@ "CounterMask": "5", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.STALLS_L2_MISS", - "PublicDescription": "Execution stalls while L2 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock). Available PDIST counters: 0", + "PublicDescription": "Execution stalls while L2 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock).", "SampleAfterValue": "1000003", "UMask": "0x5" }, @@ -64,7 +60,7 @@ "CounterMask": "9", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.STALLS_L3_MISS", - "PublicDescription": "Execution stalls while L3 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock). Available PDIST counters: 0", + "PublicDescription": "Execution stalls while L3 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock).", "SampleAfterValue": "1000003", "UMask": "0x9" }, @@ -412,7 +408,6 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD", - "PublicDescription": "Counts demand data read requests that miss t= he L3 cache. Available PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -422,7 +417,7 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_L3_MISS_DEM= AND_DATA_RD", - "PublicDescription": "Cycles with at least 1 Demand Data Read requ= ests who miss L3 cache in the superQ. Available PDIST counters: 0", + "PublicDescription": "Cycles with at least 1 Demand Data Read requ= ests who miss L3 cache in the superQ.", "SampleAfterValue": "1000003", "UMask": "0x10" }, @@ -431,7 +426,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.L3_MISS_DEMAND_DATA_RD", - "PublicDescription": "For every cycle, increments by the number of= demand data read requests pending that are known to have missed the L3 cac= he. Note that this does not capture all elapsed cycles while requests are = outstanding - only cycles from when the requests were known by the requesti= ng core to have missed the L3 cache. Available PDIST counters: 0", + "PublicDescription": "For every cycle, increments by the number of= demand data read requests pending that are known to have missed the L3 cac= he. Note that this does not capture all elapsed cycles while requests are = outstanding - only cycles from when the requests were known by the requesti= ng core to have missed the L3 cache.", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -449,7 +444,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.ABORTED_EVENTS", - "PublicDescription": "Counts the number of times an RTM execution = aborted due to none of the previous 3 categories (e.g. interrupt). Availabl= e PDIST counters: 0", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to none of the previous 3 categories (e.g. interrupt).", "SampleAfterValue": "100003", "UMask": "0x80" }, @@ -458,7 +453,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.ABORTED_MEM", - "PublicDescription": "Counts the number of times an RTM execution = aborted due to various memory events (e.g. read/write capacity and conflict= s). Available PDIST counters: 0", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to various memory events (e.g. read/write capacity and conflict= s).", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -467,7 +462,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.ABORTED_MEMTYPE", - "PublicDescription": "Counts the number of times an RTM execution = aborted due to incompatible memory type. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to incompatible memory type.", "SampleAfterValue": "100003", "UMask": "0x40" }, @@ -476,7 +471,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.ABORTED_UNFRIENDLY", - "PublicDescription": "Counts the number of times an RTM execution = aborted due to HLE-unfriendly instructions. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to HLE-unfriendly instructions.", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -485,7 +480,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.COMMIT", - "PublicDescription": "Counts the number of times RTM commit succee= ded. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times RTM commit succee= ded.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -494,7 +489,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.START", - "PublicDescription": "Counts the number of times we entered an RTM= region. Does not count nested transactions. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times we entered an RTM= region. Does not count nested transactions.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -503,7 +498,7 @@ "Counter": "0,1,2,3", "EventCode": "0x54", "EventName": "TX_MEM.ABORT_CAPACITY_READ", - "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional reads Available PDIST counters: 0", + "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional reads", "SampleAfterValue": "100003", "UMask": "0x80" }, @@ -512,7 +507,7 @@ "Counter": "0,1,2,3", "EventCode": "0x54", "EventName": "TX_MEM.ABORT_CAPACITY_WRITE", - "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional writes. Available PDIST counters: 0", + "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional writes.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -521,7 +516,7 @@ "Counter": "0,1,2,3", "EventCode": "0x54", "EventName": "TX_MEM.ABORT_CONFLICT", - "PublicDescription": "Counts the number of times a TSX line had a = cache conflict. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times a TSX line had a = cache conflict.", "SampleAfterValue": "100003", "UMask": "0x1" } diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/other.json b/tool= s/perf/pmu-events/arch/x86/graniterapids/other.json index 8b7aa4caec46..c0747750b1a8 100644 --- a/tools/perf/pmu-events/arch/x86/graniterapids/other.json +++ b/tools/perf/pmu-events/arch/x86/graniterapids/other.json @@ -4,7 +4,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.HARDWARE", - "PublicDescription": "Count all other hardware assists or traps th= at are not necessarily architecturally exposed (through a software handler)= beyond FP; SSE-AVX mix and A/D assists who are counted by dedicated sub-ev= ents. This includes, but not limited to, assists at EXE or MEM uop writeba= ck like AVX* load/store/gather/scatter (non-FP GSSE-assist ) , assists gene= rated by ROB like PEBS and RTIT, Uncore trap, RAR (Remote Action Request) a= nd CET (Control flow Enforcement Technology) assists. the event also counts= for Machine Ordering count. Available PDIST counters: 0", + "PublicDescription": "Count all other hardware assists or traps th= at are not necessarily architecturally exposed (through a software handler)= beyond FP; SSE-AVX mix and A/D assists who are counted by dedicated sub-ev= ents. This includes, but not limited to, assists at EXE or MEM uop writeba= ck like AVX* load/store/gather/scatter (non-FP GSSE-assist ) , assists gene= rated by ROB like PEBS and RTIT, Uncore trap, RAR (Remote Action Request) a= nd CET (Control flow Enforcement Technology) assists. the event also counts= for Machine Ordering count.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -13,10 +13,34 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.PAGE_FAULT", - "PublicDescription": "ASSISTS.PAGE_FAULT Available PDIST counters:= 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, + { + "BriefDescription": "HW_INTERRUPTS.MASKED", + "Counter": "0,1,2,3,4,5,6,7", + "EventCode": "0xcb", + "EventName": "HW_INTERRUPTS.MASKED", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "HW_INTERRUPTS.PENDING_AND_MASKED", + "Counter": "0,1,2,3,4,5,6,7", + "EventCode": "0xcb", + "EventName": "HW_INTERRUPTS.PENDING_AND_MASKED", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Number of hardware interrupts received by the= processor.", + "Counter": "0,1,2,3,4,5,6,7", + "EventCode": "0xcb", + "EventName": "HW_INTERRUPTS.RECEIVED", + "PublicDescription": "Counts the number of hardware interruptions = received by the processor.", + "SampleAfterValue": "203", + "UMask": "0x1" + }, { "BriefDescription": "Counts streaming stores that have any type of= response.", "Counter": "0,1,2,3", @@ -34,7 +58,7 @@ "CounterMask": "1", "EventCode": "0x2d", "EventName": "XQ.FULL_CYCLES", - "PublicDescription": "number of cycles when the thread is active a= nd the uncore cannot take any further requests (for example prefetches, loa= ds or stores initiated by the Core that miss the L2 cache). Available PDIST= counters: 0", + "PublicDescription": "number of cycles when the thread is active a= nd the uncore cannot take any further requests (for example prefetches, loa= ds or stores initiated by the Core that miss the L2 cache).", "SampleAfterValue": "1000003", "UMask": "0x1" } diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json b/t= ools/perf/pmu-events/arch/x86/graniterapids/pipeline.json index 27af3bd6bacf..0fef8fd61974 100644 --- a/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json @@ -5,7 +5,7 @@ "CounterMask": "1", "EventCode": "0xb0", "EventName": "ARITH.DIV_ACTIVE", - "PublicDescription": "Counts cycles when divide unit is busy execu= ting divide or square root operations. Accounts for integer and floating-po= int operations. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when divide unit is busy execu= ting divide or square root operations. Accounts for integer and floating-po= int operations.", "SampleAfterValue": "1000003", "UMask": "0x9" }, @@ -15,7 +15,6 @@ "CounterMask": "1", "EventCode": "0xb0", "EventName": "ARITH.IDIV_ACTIVE", - "PublicDescription": "This event counts the cycles the integer div= ider is busy. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -24,7 +23,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.ANY", - "PublicDescription": "Counts the number of occurrences where a mic= rocode assist is invoked by hardware. Examples include AD (page Access Dirt= y), FP and AVX related assists. Available PDIST counters: 0", + "PublicDescription": "Counts the number of occurrences where a mic= rocode assist is invoked by hardware. Examples include AD (page Access Dirt= y), FP and AVX related assists.", "SampleAfterValue": "100003", "UMask": "0x1b" }, @@ -271,7 +270,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.C01", - "PublicDescription": "Counts core clocks when the thread is in the= C0.1 light-weight slower wakeup time but more power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions. Availab= le PDIST counters: 0", + "PublicDescription": "Counts core clocks when the thread is in the= C0.1 light-weight slower wakeup time but more power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions.", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -280,7 +279,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.C02", - "PublicDescription": "Counts core clocks when the thread is in the= C0.2 light-weight faster wakeup time but less power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions. Availab= le PDIST counters: 0", + "PublicDescription": "Counts core clocks when the thread is in the= C0.2 light-weight faster wakeup time but less power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions.", "SampleAfterValue": "2000003", "UMask": "0x20" }, @@ -289,7 +288,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.C0_WAIT", - "PublicDescription": "Counts core clocks when the thread is in the= C0.1 or C0.2 power saving optimized states (TPAUSE or UMWAIT instructions)= or running the PAUSE instruction. Available PDIST counters: 0", + "PublicDescription": "Counts core clocks when the thread is in the= C0.1 or C0.2 power saving optimized states (TPAUSE or UMWAIT instructions)= or running the PAUSE instruction.", "SampleAfterValue": "2000003", "UMask": "0x70" }, @@ -298,7 +297,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.DISTRIBUTED", - "PublicDescription": "This event distributes cycle counts between = active hyperthreads, i.e., those in C0. A hyperthread becomes inactive whe= n it executes the HLT or MWAIT instructions. If all other hyperthreads are= inactive (or disabled or do not exist), all counts are attributed to this = hyperthread. To obtain the full count when the Core is active, sum the coun= ts from each hyperthread. Available PDIST counters: 0", + "PublicDescription": "This event distributes cycle counts between = active hyperthreads, i.e., those in C0. A hyperthread becomes inactive whe= n it executes the HLT or MWAIT instructions. If all other hyperthreads are= inactive (or disabled or do not exist), all counts are attributed to this = hyperthread. To obtain the full count when the Core is active, sum the coun= ts from each hyperthread.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -307,7 +306,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE", - "PublicDescription": "Counts Core crystal clock cycles when curren= t thread is unhalted and the other thread is halted. Available PDIST counte= rs: 0", + "PublicDescription": "Counts Core crystal clock cycles when curren= t thread is unhalted and the other thread is halted.", "SampleAfterValue": "25003", "UMask": "0x2" }, @@ -316,7 +315,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.PAUSE", - "PublicDescription": "CPU_CLK_UNHALTED.PAUSE Available PDIST count= ers: 0", "SampleAfterValue": "2000003", "UMask": "0x40" }, @@ -327,7 +325,6 @@ "EdgeDetect": "1", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.PAUSE_INST", - "PublicDescription": "CPU_CLK_UNHALTED.PAUSE_INST Available PDIST = counters: 0", "SampleAfterValue": "2000003", "UMask": "0x40" }, @@ -336,7 +333,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.REF_DISTRIBUTED", - "PublicDescription": "This event distributes Core crystal clock cy= cle counts between active hyperthreads, i.e., those in C0 sleep-state. A hy= perthread becomes inactive when it executes the HLT or MWAIT instructions. = If one thread is active in a core, all counts are attributed to this hypert= hread. To obtain the full count when the Core is active, sum the counts fro= m each hyperthread. Available PDIST counters: 0", + "PublicDescription": "This event distributes Core crystal clock cy= cle counts between active hyperthreads, i.e., those in C0 sleep-state. A hy= perthread becomes inactive when it executes the HLT or MWAIT instructions. = If one thread is active in a core, all counts are attributed to this hypert= hread. To obtain the full count when the Core is active, sum the counts fro= m each hyperthread.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -344,7 +341,7 @@ "BriefDescription": "Reference cycles when the core is not in halt= state.", "Counter": "Fixed counter 2", "EventName": "CPU_CLK_UNHALTED.REF_TSC", - "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the eight programmable counte= rs available for other events. Note: On all current platforms this event st= ops counting during 'throttling (TM)' states duty off periods the processor= is 'halted'. The counter update is done at a lower clock rate then the co= re clock the overflow status bit for this counter may appear 'sticky'. Aft= er the counter has overflowed and software clears the overflow status bit a= nd resets the counter to less than MAX. The reset value to the counter is n= ot clocked immediately so the overflow status bit will flip 'high (1)' and = generate another PMI (if enabled) after which the reset value gets clocked = into the counter. Therefore, software will get the interrupt, read the over= flow status bit '1 for bit 34 while the counter value is less than MAX. Sof= tware should ignore this case. Available PDIST counters: 0", + "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the eight programmable counte= rs available for other events. Note: On all current platforms this event st= ops counting during 'throttling (TM)' states duty off periods the processor= is 'halted'. The counter update is done at a lower clock rate then the co= re clock the overflow status bit for this counter may appear 'sticky'. Aft= er the counter has overflowed and software clears the overflow status bit a= nd resets the counter to less than MAX. The reset value to the counter is n= ot clocked immediately so the overflow status bit will flip 'high (1)' and = generate another PMI (if enabled) after which the reset value gets clocked = into the counter. Therefore, software will get the interrupt, read the over= flow status bit '1 for bit 34 while the counter value is less than MAX. Sof= tware should ignore this case.", "SampleAfterValue": "2000003", "UMask": "0x3" }, @@ -353,7 +350,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.REF_TSC_P", - "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the four (eight when Hyperthr= eading is disabled) programmable counters available for other events. Note:= On all current platforms this event stops counting during 'throttling (TM)= ' states duty off periods the processor is 'halted'. The counter update is= done at a lower clock rate then the core clock the overflow status bit for= this counter may appear 'sticky'. After the counter has overflowed and so= ftware clears the overflow status bit and resets the counter to less than M= AX. The reset value to the counter is not clocked immediately so the overfl= ow status bit will flip 'high (1)' and generate another PMI (if enabled) af= ter which the reset value gets clocked into the counter. Therefore, softwar= e will get the interrupt, read the overflow status bit '1 for bit 34 while = the counter value is less than MAX. Software should ignore this case. Avail= able PDIST counters: 0", + "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the four (eight when Hyperthr= eading is disabled) programmable counters available for other events. Note:= On all current platforms this event stops counting during 'throttling (TM)= ' states duty off periods the processor is 'halted'. The counter update is= done at a lower clock rate then the core clock the overflow status bit for= this counter may appear 'sticky'. After the counter has overflowed and so= ftware clears the overflow status bit and resets the counter to less than M= AX. The reset value to the counter is not clocked immediately so the overfl= ow status bit will flip 'high (1)' and generate another PMI (if enabled) af= ter which the reset value gets clocked into the counter. Therefore, softwar= e will get the interrupt, read the overflow status bit '1 for bit 34 while = the counter value is less than MAX. Software should ignore this case.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -361,7 +358,7 @@ "BriefDescription": "Core cycles when the thread is not in halt st= ate", "Counter": "Fixed counter 1", "EventName": "CPU_CLK_UNHALTED.THREAD", - "PublicDescription": "Counts the number of core cycles while the t= hread is not in a halt state. The thread enters the halt state when it is r= unning the HLT instruction. This event is a component in many key event rat= ios. The core frequency may change from time to time due to transitions ass= ociated with Enhanced Intel SpeedStep Technology or TM2. For this reason th= is event may have a changing ratio with regards to time. When the core freq= uency is constant, this event can approximate elapsed time while the core w= as not in the halt state. It is counted on a dedicated fixed counter, leavi= ng the eight programmable counters available for other events. Available PD= IST counters: 0", + "PublicDescription": "Counts the number of core cycles while the t= hread is not in a halt state. The thread enters the halt state when it is r= unning the HLT instruction. This event is a component in many key event rat= ios. The core frequency may change from time to time due to transitions ass= ociated with Enhanced Intel SpeedStep Technology or TM2. For this reason th= is event may have a changing ratio with regards to time. When the core freq= uency is constant, this event can approximate elapsed time while the core w= as not in the halt state. It is counted on a dedicated fixed counter, leavi= ng the eight programmable counters available for other events.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -370,7 +367,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.THREAD_P", - "PublicDescription": "This is an architectural event that counts t= he number of thread cycles while the thread is not in a halt state. The thr= ead enters the halt state when it is running the HLT instruction. The core = frequency may change from time to time due to power or thermal throttling. = For this reason, this event may have a changing ratio with regards to wall = clock time. Available PDIST counters: 0", + "PublicDescription": "This is an architectural event that counts t= he number of thread cycles while the thread is not in a halt state. The thr= ead enters the halt state when it is running the HLT instruction. The core = frequency may change from time to time due to power or thermal throttling. = For this reason, this event may have a changing ratio with regards to wall = clock time.", "SampleAfterValue": "2000003" }, { @@ -379,7 +376,6 @@ "CounterMask": "8", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_L1D_MISS", - "PublicDescription": "Cycles while L1 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -389,7 +385,6 @@ "CounterMask": "1", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_L2_MISS", - "PublicDescription": "Cycles while L2 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -399,7 +394,6 @@ "CounterMask": "16", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_MEM_ANY", - "PublicDescription": "Cycles while memory subsystem has an outstan= ding load. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x10" }, @@ -409,7 +403,6 @@ "CounterMask": "12", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_L1D_MISS", - "PublicDescription": "Execution stalls while L1 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0xc" }, @@ -419,7 +412,6 @@ "CounterMask": "5", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_L2_MISS", - "PublicDescription": "Execution stalls while L2 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x5" }, @@ -429,7 +421,6 @@ "CounterMask": "4", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_TOTAL", - "PublicDescription": "Total execution stalls. Available PDIST coun= ters: 0", "SampleAfterValue": "1000003", "UMask": "0x4" }, @@ -438,7 +429,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb7", "EventName": "EXE.AMX_BUSY", - "PublicDescription": "Counts the cycles where the AMX (Advance Mat= rix Extension) unit is busy performing an operation. Available PDIST counte= rs: 0", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -447,7 +437,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.1_PORTS_UTIL", - "PublicDescription": "Counts cycles during which a total of 1 uop = was executed on all ports and Reservation Station (RS) was not empty. Avail= able PDIST counters: 0", + "PublicDescription": "Counts cycles during which a total of 1 uop = was executed on all ports and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -456,7 +446,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.2_3_PORTS_UTIL", - "PublicDescription": "Cycles total of 2 or 3 uops are executed on = all ports and Reservation Station (RS) was not empty. Available PDIST count= ers: 0", "SampleAfterValue": "2000003", "UMask": "0xc" }, @@ -465,7 +454,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.2_PORTS_UTIL", - "PublicDescription": "Counts cycles during which a total of 2 uops= were executed on all ports and Reservation Station (RS) was not empty. Ava= ilable PDIST counters: 0", + "PublicDescription": "Counts cycles during which a total of 2 uops= were executed on all ports and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -474,7 +463,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.3_PORTS_UTIL", - "PublicDescription": "Cycles total of 3 uops are executed on all p= orts and Reservation Station (RS) was not empty. Available PDIST counters: = 0", + "PublicDescription": "Cycles total of 3 uops are executed on all p= orts and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -483,7 +472,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.4_PORTS_UTIL", - "PublicDescription": "Cycles total of 4 uops are executed on all p= orts and Reservation Station (RS) was not empty. Available PDIST counters: = 0", + "PublicDescription": "Cycles total of 4 uops are executed on all p= orts and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -493,7 +482,6 @@ "CounterMask": "5", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.BOUND_ON_LOADS", - "PublicDescription": "Execution stalls while memory subsystem has = an outstanding load. Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x21" }, @@ -503,7 +491,7 @@ "CounterMask": "2", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.BOUND_ON_STORES", - "PublicDescription": "Counts cycles where the Store Buffer was ful= l and no loads caused an execution stall. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where the Store Buffer was ful= l and no loads caused an execution stall.", "SampleAfterValue": "1000003", "UMask": "0x40" }, @@ -512,7 +500,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.EXE_BOUND_0_PORTS", - "PublicDescription": "Number of cycles total of 0 uops executed on= all ports, Reservation Station (RS) was not empty, the Store Buffer (SB) w= as not full and there was no outstanding load. Available PDIST counters: 0", + "PublicDescription": "Number of cycles total of 0 uops executed on= all ports, Reservation Station (RS) was not empty, the Store Buffer (SB) w= as not full and there was no outstanding load.", "SampleAfterValue": "1000003", "UMask": "0x80" }, @@ -521,7 +509,7 @@ "Counter": "0,1,2,3", "EventCode": "0x75", "EventName": "INST_DECODED.DECODERS", - "PublicDescription": "Number of decoders utilized in a cycle when = the MITE (legacy decode pipeline) fetches instructions. Available PDIST cou= nters: 0", + "PublicDescription": "Number of decoders utilized in a cycle when = the MITE (legacy decode pipeline) fetches instructions.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -546,7 +534,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc0", "EventName": "INST_RETIRED.MACRO_FUSED", - "PublicDescription": "INST_RETIRED.MACRO_FUSED Available PDIST cou= nters: 0", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -555,7 +542,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc0", "EventName": "INST_RETIRED.NOP", - "PublicDescription": "Counts all retired NOP or ENDBR32/64 or PREF= ETCHIT0/1 instructions Available PDIST counters: 0", + "PublicDescription": "Counts all retired NOP or ENDBR32/64 or PREF= ETCHIT0/1 instructions", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -572,7 +559,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc0", "EventName": "INST_RETIRED.REP_ITERATION", - "PublicDescription": "Number of iterations of Repeat (REP) string = retired instructions such as MOVS, CMPS, and SCAS. Each has a byte, word, a= nd doubleword version and string instructions can be repeated using a repet= ition prefix, REP, that allows their architectural execution to be repeated= a number of times as specified by the RCX register. Note the number of ite= rations is implementation-dependent. Available PDIST counters: 0", + "PublicDescription": "Number of iterations of Repeat (REP) string = retired instructions such as MOVS, CMPS, and SCAS. Each has a byte, word, a= nd doubleword version and string instructions can be repeated using a repet= ition prefix, REP, that allows their architectural execution to be repeated= a number of times as specified by the RCX register. Note the number of ite= rations is implementation-dependent.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -583,7 +570,7 @@ "EdgeDetect": "1", "EventCode": "0xad", "EventName": "INT_MISC.CLEARS_COUNT", - "PublicDescription": "Counts the number of speculative clears due = to any type of branch misprediction or machine clears Available PDIST count= ers: 0", + "PublicDescription": "Counts the number of speculative clears due = to any type of branch misprediction or machine clears", "SampleAfterValue": "500009", "UMask": "0x1" }, @@ -592,7 +579,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.CLEAR_RESTEER_CYCLES", - "PublicDescription": "Cycles after recovery from a branch mispredi= ction or machine clear till the first uop is issued from the resteered path= . Available PDIST counters: 0", + "PublicDescription": "Cycles after recovery from a branch mispredi= ction or machine clear till the first uop is issued from the resteered path= .", "SampleAfterValue": "500009", "UMask": "0x80" }, @@ -601,7 +588,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.MBA_STALLS", - "PublicDescription": "INT_MISC.MBA_STALLS Available PDIST counters= : 0", "SampleAfterValue": "1000003", "UMask": "0x20" }, @@ -610,7 +596,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.RECOVERY_CYCLES", - "PublicDescription": "Counts core cycles when the Resource allocat= or was stalled due to recovery from an earlier branch misprediction or mach= ine clear event. Available PDIST counters: 0", + "PublicDescription": "Counts core cycles when the Resource allocat= or was stalled due to recovery from an earlier branch misprediction or mach= ine clear event.", "SampleAfterValue": "500009", "UMask": "0x1" }, @@ -621,7 +607,6 @@ "EventName": "INT_MISC.UNKNOWN_BRANCH_CYCLES", "MSRIndex": "0x3F7", "MSRValue": "0x7", - "PublicDescription": "Bubble cycles of BAClear (Unknown Branch). A= vailable PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x40" }, @@ -630,7 +615,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.UOP_DROPPING", - "PublicDescription": "Estimated number of Top-down Microarchitectu= re Analysis slots that got dropped due to non front-end reasons Available P= DIST counters: 0", + "PublicDescription": "Estimated number of Top-down Microarchitectu= re Analysis slots that got dropped due to non front-end reasons", "SampleAfterValue": "1000003", "UMask": "0x10" }, @@ -639,7 +624,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.128BIT", - "PublicDescription": "INT_VEC_RETIRED.128BIT Available PDIST count= ers: 0", "SampleAfterValue": "1000003", "UMask": "0x13" }, @@ -648,7 +632,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.256BIT", - "PublicDescription": "INT_VEC_RETIRED.256BIT Available PDIST count= ers: 0", "SampleAfterValue": "1000003", "UMask": "0xac" }, @@ -657,7 +640,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.ADD_128", - "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 128-bit vector instructions. Available PDIST counters: 0= ", + "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 128-bit vector instructions.", "SampleAfterValue": "1000003", "UMask": "0x3" }, @@ -666,7 +649,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.ADD_256", - "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 256-bit vector instructions. Available PDIST counters: 0= ", + "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 256-bit vector instructions.", "SampleAfterValue": "1000003", "UMask": "0xc" }, @@ -675,7 +658,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.MUL_256", - "PublicDescription": "INT_VEC_RETIRED.MUL_256 Available PDIST coun= ters: 0", "SampleAfterValue": "1000003", "UMask": "0x80" }, @@ -684,7 +666,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.SHUFFLES", - "PublicDescription": "INT_VEC_RETIRED.SHUFFLES Available PDIST cou= nters: 0", "SampleAfterValue": "1000003", "UMask": "0x40" }, @@ -693,7 +674,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.VNNI_128", - "PublicDescription": "INT_VEC_RETIRED.VNNI_128 Available PDIST cou= nters: 0", "SampleAfterValue": "1000003", "UMask": "0x10" }, @@ -702,7 +682,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.VNNI_256", - "PublicDescription": "INT_VEC_RETIRED.VNNI_256 Available PDIST cou= nters: 0", "SampleAfterValue": "1000003", "UMask": "0x20" }, @@ -711,7 +690,7 @@ "Counter": "0,1,2,3", "EventCode": "0x03", "EventName": "LD_BLOCKS.ADDRESS_ALIAS", - "PublicDescription": "Counts the number of times a load got blocke= d due to false dependencies in MOB due to partial compare on address. Avail= able PDIST counters: 0", + "PublicDescription": "Counts the number of times a load got blocke= d due to false dependencies in MOB due to partial compare on address.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -720,7 +699,7 @@ "Counter": "0,1,2,3", "EventCode": "0x03", "EventName": "LD_BLOCKS.NO_SR", - "PublicDescription": "Counts the number of times that split load o= perations are temporarily blocked because all resources for handling the sp= lit accesses are in use. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times that split load o= perations are temporarily blocked because all resources for handling the sp= lit accesses are in use.", "SampleAfterValue": "100003", "UMask": "0x88" }, @@ -729,7 +708,7 @@ "Counter": "0,1,2,3", "EventCode": "0x03", "EventName": "LD_BLOCKS.STORE_FORWARD", - "PublicDescription": "Counts the number of times where store forwa= rding was prevented for a load operation. The most common case is a load bl= ocked due to the address of memory access (partially) overlapping with a pr= eceding uncompleted store. Note: See the table of not supported store forwa= rds in the Optimization Guide. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times where store forwa= rding was prevented for a load operation. The most common case is a load bl= ocked due to the address of memory access (partially) overlapping with a pr= eceding uncompleted store. Note: See the table of not supported store forwa= rds in the Optimization Guide.", "SampleAfterValue": "100003", "UMask": "0x82" }, @@ -738,7 +717,7 @@ "Counter": "0,1,2,3", "EventCode": "0x4c", "EventName": "LOAD_HIT_PREFETCH.SWPF", - "PublicDescription": "Counts all software-prefetch load dispatches= that hit the fill buffer (FB) allocated for the software prefetch. It can = also be incremented by some lock instructions. So it should only be used wi= th profiling so that the locks can be excluded by ASM (Assembly File) inspe= ction of the nearby instructions. Available PDIST counters: 0", + "PublicDescription": "Counts all software-prefetch load dispatches= that hit the fill buffer (FB) allocated for the software prefetch. It can = also be incremented by some lock instructions. So it should only be used wi= th profiling so that the locks can be excluded by ASM (Assembly File) inspe= ction of the nearby instructions.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -748,7 +727,7 @@ "CounterMask": "1", "EventCode": "0xa8", "EventName": "LSD.CYCLES_ACTIVE", - "PublicDescription": "Counts the cycles when at least one uop is d= elivered by the LSD (Loop-stream detector). Available PDIST counters: 0", + "PublicDescription": "Counts the cycles when at least one uop is d= elivered by the LSD (Loop-stream detector).", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -758,7 +737,7 @@ "CounterMask": "6", "EventCode": "0xa8", "EventName": "LSD.CYCLES_OK", - "PublicDescription": "Counts the cycles when optimal number of uop= s is delivered by the LSD (Loop-stream detector). Available PDIST counters:= 0", + "PublicDescription": "Counts the cycles when optimal number of uop= s is delivered by the LSD (Loop-stream detector).", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -767,7 +746,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa8", "EventName": "LSD.UOPS", - "PublicDescription": "Counts the number of uops delivered to the b= ack-end by the LSD(Loop Stream Detector). Available PDIST counters: 0", + "PublicDescription": "Counts the number of uops delivered to the b= ack-end by the LSD(Loop Stream Detector).", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -778,7 +757,7 @@ "EdgeDetect": "1", "EventCode": "0xc3", "EventName": "MACHINE_CLEARS.COUNT", - "PublicDescription": "Counts the number of machine clears (nukes) = of any type. Available PDIST counters: 0", + "PublicDescription": "Counts the number of machine clears (nukes) = of any type.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -787,7 +766,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc3", "EventName": "MACHINE_CLEARS.SMC", - "PublicDescription": "Counts self-modifying code (SMC) detected, w= hich causes a machine clear. Available PDIST counters: 0", + "PublicDescription": "Counts self-modifying code (SMC) detected, w= hich causes a machine clear.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -796,7 +775,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe0", "EventName": "MISC2_RETIRED.LFENCE", - "PublicDescription": "number of LFENCE retired instructions Availa= ble PDIST counters: 0", + "PublicDescription": "number of LFENCE retired instructions", "SampleAfterValue": "400009", "UMask": "0x20" }, @@ -805,7 +784,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcc", "EventName": "MISC_RETIRED.LBR_INSERTS", - "PublicDescription": "Increments when an entry is added to the Las= t Branch Record (LBR) array (or removed from the array in case of RETURNs i= n call stack mode). The event requires LBR enable via IA32_DEBUGCTL MSR and= branch type selection via MSR_LBR_SELECT. Available PDIST counters: 0", + "PublicDescription": "Increments when an entry is added to the Las= t Branch Record (LBR) array (or removed from the array in case of RETURNs i= n call stack mode). The event requires LBR enable via IA32_DEBUGCTL MSR and= branch type selection via MSR_LBR_SELECT.", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -814,7 +793,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa2", "EventName": "RESOURCE_STALLS.SB", - "PublicDescription": "Counts allocation stall cycles caused by the= store buffer (SB) being full. This counts cycles that the pipeline back-en= d blocked uop delivery from the front-end. Available PDIST counters: 0", + "PublicDescription": "Counts allocation stall cycles caused by the= store buffer (SB) being full. This counts cycles that the pipeline back-en= d blocked uop delivery from the front-end.", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -823,7 +802,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa2", "EventName": "RESOURCE_STALLS.SCOREBOARD", - "PublicDescription": "Counts cycles where the pipeline is stalled = due to serializing operations. Available PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -832,7 +810,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa5", "EventName": "RS.EMPTY", - "PublicDescription": "Counts cycles during which the reservation s= tation (RS) is empty for this logical processor. This is usually caused whe= n the front-end pipeline runs into starvation periods (e.g. branch mispredi= ctions or i-cache misses) Available PDIST counters: 0", + "PublicDescription": "Counts cycles during which the reservation s= tation (RS) is empty for this logical processor. This is usually caused whe= n the front-end pipeline runs into starvation periods (e.g. branch mispredi= ctions or i-cache misses)", "SampleAfterValue": "1000003", "UMask": "0x7" }, @@ -844,7 +822,7 @@ "EventCode": "0xa5", "EventName": "RS.EMPTY_COUNT", "Invert": "1", - "PublicDescription": "Counts end of periods where the Reservation = Station (RS) was empty. Could be useful to closely sample on front-end late= ncy issues (see the FRONTEND_RETIRED event of designated precise events) Av= ailable PDIST counters: 0", + "PublicDescription": "Counts end of periods where the Reservation = Station (RS) was empty. Could be useful to closely sample on front-end late= ncy issues (see the FRONTEND_RETIRED event of designated precise events)", "SampleAfterValue": "100003", "UMask": "0x7" }, @@ -853,7 +831,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa5", "EventName": "RS.EMPTY_RESOURCE", - "PublicDescription": "Cycles when RS was empty and a resource allo= cation stall is asserted Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -862,7 +839,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa4", "EventName": "TOPDOWN.BACKEND_BOUND_SLOTS", - "PublicDescription": "This event counts a subset of the Topdown Sl= ots event that were not consumed by the back-end pipeline due to lack of ba= ck-end resources, as a result of memory subsystem delays, execution units l= imitations, or other conditions. The count is distributed among unhalted lo= gical processors (hyper-threads) who share the same physical core, in proce= ssors that support Intel Hyper-Threading Technology. Software can use this = event as the numerator for the Backend Bound metric (or top-level category)= of the Top-down Microarchitecture Analysis method. Available PDIST counter= s: 0", + "PublicDescription": "This event counts a subset of the Topdown Sl= ots event that were not consumed by the back-end pipeline due to lack of ba= ck-end resources, as a result of memory subsystem delays, execution units l= imitations, or other conditions. The count is distributed among unhalted lo= gical processors (hyper-threads) who share the same physical core, in proce= ssors that support Intel Hyper-Threading Technology. Software can use this = event as the numerator for the Backend Bound metric (or top-level category)= of the Top-down Microarchitecture Analysis method.", "SampleAfterValue": "10000003", "UMask": "0x2" }, @@ -871,7 +848,7 @@ "Counter": "0", "EventCode": "0xa4", "EventName": "TOPDOWN.BAD_SPEC_SLOTS", - "PublicDescription": "Number of slots of TMA method that were wast= ed due to incorrect speculation. It covers all types of control-flow or dat= a-related mis-speculations. Available PDIST counters: 0", + "PublicDescription": "Number of slots of TMA method that were wast= ed due to incorrect speculation. It covers all types of control-flow or dat= a-related mis-speculations.", "SampleAfterValue": "10000003", "UMask": "0x4" }, @@ -880,7 +857,7 @@ "Counter": "0", "EventCode": "0xa4", "EventName": "TOPDOWN.BR_MISPREDICT_SLOTS", - "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by (any type of) branch mispredictions. This event es= timates number of speculative operations that were issued but not retired a= s well as the out-of-order engine recovery past a branch misprediction. Ava= ilable PDIST counters: 0", + "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by (any type of) branch mispredictions. This event es= timates number of speculative operations that were issued but not retired a= s well as the out-of-order engine recovery past a branch misprediction.", "SampleAfterValue": "10000003", "UMask": "0x8" }, @@ -889,7 +866,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa4", "EventName": "TOPDOWN.MEMORY_BOUND_SLOTS", - "PublicDescription": "TOPDOWN.MEMORY_BOUND_SLOTS Available PDIST c= ounters: 0", "SampleAfterValue": "10000003", "UMask": "0x10" }, @@ -897,7 +873,7 @@ "BriefDescription": "TMA slots available for an unhalted logical p= rocessor. Fixed counter - architectural event", "Counter": "Fixed counter 3", "EventName": "TOPDOWN.SLOTS", - "PublicDescription": "Number of available slots for an unhalted lo= gical processor. The event increments by machine-width of the narrowest pip= eline as employed by the Top-down Microarchitecture Analysis method (TMA). = The count is distributed among unhalted logical processors (hyper-threads) = who share the same physical core. Software can use this event as the denomi= nator for the top-level metrics of the TMA method. This architectural event= is counted on a designated fixed counter (Fixed Counter 3). Available PDIS= T counters: 0", + "PublicDescription": "Number of available slots for an unhalted lo= gical processor. The event increments by machine-width of the narrowest pip= eline as employed by the Top-down Microarchitecture Analysis method (TMA). = The count is distributed among unhalted logical processors (hyper-threads) = who share the same physical core. Software can use this event as the denomi= nator for the top-level metrics of the TMA method. This architectural event= is counted on a designated fixed counter (Fixed Counter 3).", "SampleAfterValue": "10000003", "UMask": "0x4" }, @@ -906,7 +882,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa4", "EventName": "TOPDOWN.SLOTS_P", - "PublicDescription": "Counts the number of available slots for an = unhalted logical processor. The event increments by machine-width of the na= rrowest pipeline as employed by the Top-down Microarchitecture Analysis met= hod. The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core. Available PDIST counters: 0", + "PublicDescription": "Counts the number of available slots for an = unhalted logical processor. The event increments by machine-width of the na= rrowest pipeline as employed by the Top-down Microarchitecture Analysis met= hod. The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core.", "SampleAfterValue": "10000003", "UMask": "0x1" }, @@ -915,7 +891,7 @@ "Counter": "0,1,2,3", "EventCode": "0x76", "EventName": "UOPS_DECODED.DEC0_UOPS", - "PublicDescription": "This event counts the number of not dec-by-a= ll uops decoded by decoder 0. Available PDIST counters: 0", + "PublicDescription": "This event counts the number of not dec-by-a= ll uops decoded by decoder 0.", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -924,7 +900,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_0", - "PublicDescription": "Number of uops dispatch to execution port 0= . Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution port 0= .", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -933,7 +909,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_1", - "PublicDescription": "Number of uops dispatch to execution port 1= . Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution port 1= .", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -942,7 +918,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_2_3_10", - "PublicDescription": "Number of uops dispatch to execution ports 2= , 3 and 10 Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports 2= , 3 and 10", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -951,7 +927,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_4_9", - "PublicDescription": "Number of uops dispatch to execution ports 4= and 9 Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports 4= and 9", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -960,7 +936,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_5_11", - "PublicDescription": "Number of uops dispatch to execution ports 5= and 11 Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports 5= and 11", "SampleAfterValue": "2000003", "UMask": "0x20" }, @@ -969,7 +945,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_6", - "PublicDescription": "Number of uops dispatch to execution port 6= . Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution port 6= .", "SampleAfterValue": "2000003", "UMask": "0x40" }, @@ -978,7 +954,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_7_8", - "PublicDescription": "Number of uops dispatch to execution ports = 7 and 8. Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports = 7 and 8.", "SampleAfterValue": "2000003", "UMask": "0x80" }, @@ -987,7 +963,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE", - "PublicDescription": "Counts the number of uops executed from any = thread. Available PDIST counters: 0", + "PublicDescription": "Counts the number of uops executed from any = thread.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -997,7 +973,7 @@ "CounterMask": "1", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_1", - "PublicDescription": "Counts cycles when at least 1 micro-op is ex= ecuted from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 1 micro-op is ex= ecuted from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -1007,7 +983,7 @@ "CounterMask": "2", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_2", - "PublicDescription": "Counts cycles when at least 2 micro-ops are = executed from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 2 micro-ops are = executed from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -1017,7 +993,7 @@ "CounterMask": "3", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_3", - "PublicDescription": "Counts cycles when at least 3 micro-ops are = executed from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 3 micro-ops are = executed from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -1027,7 +1003,7 @@ "CounterMask": "4", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_4", - "PublicDescription": "Counts cycles when at least 4 micro-ops are = executed from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 4 micro-ops are = executed from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -1037,7 +1013,7 @@ "CounterMask": "1", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_1", - "PublicDescription": "Cycles where at least 1 uop was executed per= -thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 1 uop was executed per= -thread.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1047,7 +1023,7 @@ "CounterMask": "2", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_2", - "PublicDescription": "Cycles where at least 2 uops were executed p= er-thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 2 uops were executed p= er-thread.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1057,7 +1033,7 @@ "CounterMask": "3", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_3", - "PublicDescription": "Cycles where at least 3 uops were executed p= er-thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 3 uops were executed p= er-thread.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1067,7 +1043,7 @@ "CounterMask": "4", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_4", - "PublicDescription": "Cycles where at least 4 uops were executed p= er-thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 4 uops were executed p= er-thread.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1078,7 +1054,7 @@ "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.STALLS", "Invert": "1", - "PublicDescription": "Counts cycles during which no uops were disp= atched from the Reservation Station (RS) per thread. Available PDIST counte= rs: 0", + "PublicDescription": "Counts cycles during which no uops were disp= atched from the Reservation Station (RS) per thread.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1087,7 +1063,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.THREAD", - "PublicDescription": "Counts the number of uops to be executed per= -thread each cycle. Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1096,7 +1071,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.X87", - "PublicDescription": "Counts the number of x87 uops executed. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts the number of x87 uops executed.", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -1105,7 +1080,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xae", "EventName": "UOPS_ISSUED.ANY", - "PublicDescription": "Counts the number of uops that the Resource = Allocation Table (RAT) issues to the Reservation Station (RS). Available PD= IST counters: 0", + "PublicDescription": "Counts the number of uops that the Resource = Allocation Table (RAT) issues to the Reservation Station (RS).", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1115,7 +1090,6 @@ "CounterMask": "1", "EventCode": "0xae", "EventName": "UOPS_ISSUED.CYCLES", - "PublicDescription": "UOPS_ISSUED.CYCLES Available PDIST counters:= 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1125,7 +1099,7 @@ "CounterMask": "1", "EventCode": "0xc2", "EventName": "UOPS_RETIRED.CYCLES", - "PublicDescription": "Counts cycles where at least one uop has ret= ired. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where at least one uop has ret= ired.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -1134,7 +1108,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc2", "EventName": "UOPS_RETIRED.HEAVY", - "PublicDescription": "Counts the number of retired micro-operation= s (uops) except the last uop of each instruction. An instruction that is de= coded into less than two uops does not contribute to the count. Available P= DIST counters: 0", + "PublicDescription": "Counts the number of retired micro-operation= s (uops) except the last uop of each instruction. An instruction that is de= coded into less than two uops does not contribute to the count.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1145,7 +1119,6 @@ "EventName": "UOPS_RETIRED.MS", "MSRIndex": "0x3F7", "MSRValue": "0x8", - "PublicDescription": "UOPS_RETIRED.MS Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -1154,7 +1127,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc2", "EventName": "UOPS_RETIRED.SLOTS", - "PublicDescription": "This event counts a subset of the Topdown Sl= ots event that are utilized by operations that eventually get retired (comm= itted) by the processor pipeline. Usually, this event positively correlates= with higher performance for example, as measured by the instructions-per-= cycle metric. Software can use this event as the numerator for the Retiring= metric (or top-level category) of the Top-down Microarchitecture Analysis = method. Available PDIST counters: 0", + "PublicDescription": "This event counts a subset of the Topdown Sl= ots event that are utilized by operations that eventually get retired (comm= itted) by the processor pipeline. Usually, this event positively correlates= with higher performance for example, as measured by the instructions-per-= cycle metric. Software can use this event as the numerator for the Retiring= metric (or top-level category) of the Top-down Microarchitecture Analysis = method.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -1165,7 +1138,7 @@ "EventCode": "0xc2", "EventName": "UOPS_RETIRED.STALLS", "Invert": "1", - "PublicDescription": "This event counts cycles without actually re= tired uops. Available PDIST counters: 0", + "PublicDescription": "This event counts cycles without actually re= tired uops.", "SampleAfterValue": "1000003", "UMask": "0x2" } diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/uncore-io.json b/= tools/perf/pmu-events/arch/x86/graniterapids/uncore-io.json index f4f956966e16..2ea2637df3fb 100644 --- a/tools/perf/pmu-events/arch/x86/graniterapids/uncore-io.json +++ b/tools/perf/pmu-events/arch/x86/graniterapids/uncore-io.json @@ -1321,7 +1321,6 @@ "FCMask": "0x01", "PerPkg": "1", "PortMask": "0x0FF", - "PublicDescription": "-", "UMask": "0x4", "Unit": "IIO" }, diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/uncore-memory.jso= n b/tools/perf/pmu-events/arch/x86/graniterapids/uncore-memory.json index b991f6e1afbe..9d385be59e3d 100644 --- a/tools/perf/pmu-events/arch/x86/graniterapids/uncore-memory.json +++ b/tools/perf/pmu-events/arch/x86/graniterapids/uncore-memory.json @@ -195,7 +195,6 @@ "EventName": "UNC_M_MR4_2XREF_CYCLES.SCH0_DIMM0", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x1", "Unit": "IMC" }, @@ -206,7 +205,6 @@ "EventName": "UNC_M_MR4_2XREF_CYCLES.SCH0_DIMM1", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x2", "Unit": "IMC" }, @@ -217,7 +215,6 @@ "EventName": "UNC_M_MR4_2XREF_CYCLES.SCH1_DIMM0", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x4", "Unit": "IMC" }, @@ -228,7 +225,6 @@ "EventName": "UNC_M_MR4_2XREF_CYCLES.SCH1_DIMM1", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x8", "Unit": "IMC" }, @@ -239,7 +235,6 @@ "EventName": "UNC_M_PDC_MR4ACTIVE_CYCLES.SCH0_DIMM0", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x1", "Unit": "IMC" }, @@ -250,7 +245,6 @@ "EventName": "UNC_M_PDC_MR4ACTIVE_CYCLES.SCH0_DIMM1", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x2", "Unit": "IMC" }, @@ -261,7 +255,6 @@ "EventName": "UNC_M_PDC_MR4ACTIVE_CYCLES.SCH1_DIMM0", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x4", "Unit": "IMC" }, @@ -272,7 +265,6 @@ "EventName": "UNC_M_PDC_MR4ACTIVE_CYCLES.SCH1_DIMM1", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x8", "Unit": "IMC" }, @@ -283,7 +275,6 @@ "EventName": "UNC_M_POWERDOWN_CYCLES.SCH0_RANK0", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x1", "Unit": "IMC" }, @@ -294,7 +285,6 @@ "EventName": "UNC_M_POWERDOWN_CYCLES.SCH0_RANK1", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x2", "Unit": "IMC" }, @@ -305,7 +295,6 @@ "EventName": "UNC_M_POWERDOWN_CYCLES.SCH0_RANK2", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x4", "Unit": "IMC" }, @@ -316,7 +305,6 @@ "EventName": "UNC_M_POWERDOWN_CYCLES.SCH0_RANK3", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x8", "Unit": "IMC" }, @@ -327,7 +315,6 @@ "EventName": "UNC_M_POWERDOWN_CYCLES.SCH1_RANK0", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x10", "Unit": "IMC" }, @@ -338,7 +325,6 @@ "EventName": "UNC_M_POWERDOWN_CYCLES.SCH1_RANK1", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x20", "Unit": "IMC" }, @@ -349,7 +335,6 @@ "EventName": "UNC_M_POWERDOWN_CYCLES.SCH1_RANK2", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x40", "Unit": "IMC" }, @@ -360,7 +345,6 @@ "EventName": "UNC_M_POWERDOWN_CYCLES.SCH1_RANK3", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x80", "Unit": "IMC" }, @@ -371,7 +355,6 @@ "EventName": "UNC_M_POWER_CHANNEL_PPD_CYCLES", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "Unit": "IMC" }, { @@ -381,7 +364,6 @@ "EventName": "UNC_M_POWER_CRITICAL_THROTTLE_CYCLES.SLOT0", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x1", "Unit": "IMC" }, @@ -392,7 +374,6 @@ "EventName": "UNC_M_POWER_CRITICAL_THROTTLE_CYCLES.SLOT1", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x2", "Unit": "IMC" }, @@ -423,7 +404,6 @@ "EventName": "UNC_M_POWER_THROTTLE_CYCLES.MR4BLKEN", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x8", "Unit": "IMC" }, @@ -434,7 +414,6 @@ "EventName": "UNC_M_POWER_THROTTLE_CYCLES.RAPLBLK", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x4", "Unit": "IMC" }, @@ -615,7 +594,6 @@ "EventName": "UNC_M_SELF_REFRESH.ENTER_SUCCESS", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "UNC_M_SELF_REFRESH.ENTER_SUCCESS", "UMask": "0x2", "Unit": "IMC" }, @@ -626,7 +604,6 @@ "EventName": "UNC_M_SELF_REFRESH.ENTER_SUCCESS_CYCLES", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x1", "Unit": "IMC" }, @@ -637,7 +614,6 @@ "EventName": "UNC_M_THROTTLE_CRIT_CYCLES.SLOT0", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x1", "Unit": "IMC" }, @@ -648,7 +624,6 @@ "EventName": "UNC_M_THROTTLE_CRIT_CYCLES.SLOT1", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x2", "Unit": "IMC" }, @@ -659,7 +634,6 @@ "EventName": "UNC_M_THROTTLE_HIGH_CYCLES.SLOT0", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x1", "Unit": "IMC" }, @@ -670,7 +644,6 @@ "EventName": "UNC_M_THROTTLE_HIGH_CYCLES.SLOT1", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x2", "Unit": "IMC" }, @@ -681,7 +654,6 @@ "EventName": "UNC_M_THROTTLE_LOW_CYCLES.SLOT0", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x1", "Unit": "IMC" }, @@ -692,7 +664,6 @@ "EventName": "UNC_M_THROTTLE_LOW_CYCLES.SLOT1", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x2", "Unit": "IMC" }, @@ -703,7 +674,6 @@ "EventName": "UNC_M_THROTTLE_MID_CYCLES.SLOT0", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x1", "Unit": "IMC" }, @@ -714,7 +684,6 @@ "EventName": "UNC_M_THROTTLE_MID_CYCLES.SLOT1", "Experimental": "1", "PerPkg": "1", - "PublicDescription": "-", "UMask": "0x2", "Unit": "IMC" }, diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/virtual-memory.js= on b/tools/perf/pmu-events/arch/x86/graniterapids/virtual-memory.json index 3d3f88600e26..609a9549cbf3 100644 --- a/tools/perf/pmu-events/arch/x86/graniterapids/virtual-memory.json +++ b/tools/perf/pmu-events/arch/x86/graniterapids/virtual-memory.json @@ -4,7 +4,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.STLB_HIT", - "PublicDescription": "Counts loads that miss the DTLB (Data TLB) a= nd hit the STLB (Second level TLB). Available PDIST counters: 0", + "PublicDescription": "Counts loads that miss the DTLB (Data TLB) a= nd hit the STLB (Second level TLB).", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -14,7 +14,7 @@ "CounterMask": "1", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_ACTIVE", - "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a demand load. Available PDIST cou= nters: 0", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a demand load.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -23,7 +23,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED", - "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data loads. This implies it missed in the DTLB and furth= er levels of TLB. The page walk can end with or without a fault. Available = PDIST counters: 0", + "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data loads. This implies it missed in the DTLB and furth= er levels of TLB. The page walk can end with or without a fault.", "SampleAfterValue": "100003", "UMask": "0xe" }, @@ -32,7 +32,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_1G", - "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= . Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= .", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -41,7 +41,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_2M_4M", - "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data loads. This implies address translations missed in the= DTLB and further levels of TLB. The page walk can end with or without a fa= ult. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data loads. This implies address translations missed in the= DTLB and further levels of TLB. The page walk can end with or without a fa= ult.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -50,7 +50,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_4K", - "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= . Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= .", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -59,7 +59,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_PENDING", - "PublicDescription": "Counts the number of page walks outstanding = for a demand load in the PMH (Page Miss Handler) each cycle. Available PDIS= T counters: 0", + "PublicDescription": "Counts the number of page walks outstanding = for a demand load in the PMH (Page Miss Handler) each cycle.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -68,7 +68,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.STLB_HIT", - "PublicDescription": "Counts stores that miss the DTLB (Data TLB) = and hit the STLB (2nd Level TLB). Available PDIST counters: 0", + "PublicDescription": "Counts stores that miss the DTLB (Data TLB) = and hit the STLB (2nd Level TLB).", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -78,7 +78,7 @@ "CounterMask": "1", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_ACTIVE", - "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a store. Available PDIST counters:= 0", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a store.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -87,7 +87,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED", - "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data stores. This implies it missed in the DTLB and furt= her levels of TLB. The page walk can end with or without a fault. Available= PDIST counters: 0", + "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data stores. This implies it missed in the DTLB and furt= her levels of TLB. The page walk can end with or without a fault.", "SampleAfterValue": "100003", "UMask": "0xe" }, @@ -96,7 +96,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_1G", - "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t.", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -105,7 +105,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M", - "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data stores. This implies address translations missed in th= e DTLB and further levels of TLB. The page walk can end with or without a f= ault. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data stores. This implies address translations missed in th= e DTLB and further levels of TLB. The page walk can end with or without a f= ault.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -114,7 +114,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_4K", - "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -123,7 +123,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_PENDING", - "PublicDescription": "Counts the number of page walks outstanding = for a store in the PMH (Page Miss Handler) each cycle. Available PDIST coun= ters: 0", + "PublicDescription": "Counts the number of page walks outstanding = for a store in the PMH (Page Miss Handler) each cycle.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -132,7 +132,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.STLB_HIT", - "PublicDescription": "Counts instruction fetch requests that miss = the ITLB (Instruction TLB) and hit the STLB (Second-level TLB). Available P= DIST counters: 0", + "PublicDescription": "Counts instruction fetch requests that miss = the ITLB (Instruction TLB) and hit the STLB (Second-level TLB).", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -142,7 +142,7 @@ "CounterMask": "1", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_ACTIVE", - "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a code (instruction fetch) request= . Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a code (instruction fetch) request= .", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -151,7 +151,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_COMPLETED", - "PublicDescription": "Counts completed page walks (all page sizes)= caused by a code fetch. This implies it missed in the ITLB (Instruction TL= B) and further levels of TLB. The page walk can end with or without a fault= . Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (all page sizes)= caused by a code fetch. This implies it missed in the ITLB (Instruction TL= B) and further levels of TLB. The page walk can end with or without a fault= .", "SampleAfterValue": "100003", "UMask": "0xe" }, @@ -160,7 +160,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_COMPLETED_2M_4M", - "PublicDescription": "Counts completed page walks (2M/4M page size= s) caused by a code fetch. This implies it missed in the ITLB (Instruction = TLB) and further levels of TLB. The page walk can end with or without a fau= lt. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (2M/4M page size= s) caused by a code fetch. This implies it missed in the ITLB (Instruction = TLB) and further levels of TLB. The page walk can end with or without a fau= lt.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -169,7 +169,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_COMPLETED_4K", - "PublicDescription": "Counts completed page walks (4K page sizes) = caused by a code fetch. This implies it missed in the ITLB (Instruction TLB= ) and further levels of TLB. The page walk can end with or without a fault.= Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (4K page sizes) = caused by a code fetch. This implies it missed in the ITLB (Instruction TLB= ) and further levels of TLB. The page walk can end with or without a fault.= ", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -178,7 +178,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_PENDING", - "PublicDescription": "Counts the number of page walks outstanding = for an outstanding code (instruction fetch) request in the PMH (Page Miss H= andler) each cycle. Available PDIST counters: 0", + "PublicDescription": "Counts the number of page walks outstanding = for an outstanding code (instruction fetch) request in the PMH (Page Miss H= andler) each cycle.", "SampleAfterValue": "100003", "UMask": "0x10" } diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-ev= ents/arch/x86/mapfile.csv index e3ac2dede20a..2d7a9fa055dd 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -13,7 +13,7 @@ GenuineIntel-6-CF,v1.16,emeraldrapids,core GenuineIntel-6-5[CF],v13,goldmont,core GenuineIntel-6-7A,v1.01,goldmontplus,core GenuineIntel-6-B6,v1.09,grandridge,core -GenuineIntel-6-A[DE],v1.10,graniterapids,core +GenuineIntel-6-A[DE],v1.12,graniterapids,core GenuineIntel-6-(3C|45|46),v36,haswell,core GenuineIntel-6-3F,v29,haswellx,core GenuineIntel-6-7[DE],v1.24,icelake,core --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pf1-f201.google.com (mail-pf1-f201.google.com [209.85.210.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0DEE221FF23 for ; Sat, 19 Jul 2025 03:45:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896744; cv=none; b=mwMViwq0NhPelpW85f0DsCMynk54e6G9Y35B1nscBavFNZnZWjKS2zHsxVk/pWzm0q7nhaJQ5Ct9ZPcQr9DvZbjU6GhcSjwY6AC7LBP23Ggh1gtA6vZnTHvOb16s/56WWi2uN4e9iYMNUwEFvhEs/bLk5g6T3dMABPcFgHjIUVQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896744; c=relaxed/simple; bh=0b/Fdqc+5M9kJN0ZjO1K7BwBSncmP1POxHV7l/bUl0A=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=lrAf97oA5ow+Bpb2f45cIbchVfMWtrKkOp3ndv/HPv+B687tNnuBGIQHus8HbrmC1dBd9os3uBjbMGiYuGgpDmOUjv+D327G0OSwKWcOTNhnWjp6j9Lk+mwBvVvaeWCLY1OGLBWLQbic4vJN0NAKwkFkMKeHAh43Fp+d3adHNgs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=tarH1CJq; arc=none smtp.client-ip=209.85.210.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="tarH1CJq" Received: by mail-pf1-f201.google.com with SMTP id d2e1a72fcca58-74913385dd8so3708822b3a.0 for ; Fri, 18 Jul 2025 20:45:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896740; x=1753501540; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=Oq/QPsklHyhVb1UgtWbIgHrWCqKuRIA4cR3xOVT8HWY=; b=tarH1CJqIwb/eltDdfPP+eaKuaU8htV9BrxLkAU+iPTQnUbcpcFBFkQQOa25Jr4EWU A6rkXW2qDvmO1VRz7UMJCdfRuvLHQBuyoDdXvp6VGwgt+oRvw5BEFC2/ETYoe+Q1zPOk fLiJoI6QljLlfZkAMAtCQdwE6LT20/n98hIyIvlHKxUCsSWHzsDicec9M9vFt7nakSjo vRP3Cu6lOC47wI1aBYsVGT+iv4cU5yXi20pAXIzqP+wLBcCjjTVukHzaplTW2WiTlClx PoTJnQd/GFs2GEakvB/npQKe5zYQ2FZNRBzQAvGLpNiC9tRi2CsK/5ivwkm1JMr83rOa Cv5w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896740; x=1753501540; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=Oq/QPsklHyhVb1UgtWbIgHrWCqKuRIA4cR3xOVT8HWY=; b=IvyS5PmbjvZhvnxIjW/3SlRrNJ3FGgwtVgBim+ea36acmS56hoGDFCHgEGOV/rz0wF +fwdGY7/ekXcSTgjQ6RbGntupCghow8t45usaNxG0U48oVsU5rS/cQAJBGIMuLPsBhK1 LQmTGZkaQ78cUJyyOLUI93i5OH/kymf1sC4FFF3SfZ/6eBXPX/0YFhJ/v+jCG8yMS/ra zfPqqHtTITRzc4ui5WY9Q+QedyIkvK4gHRPiWvAmHsNIiRtAD4G+ZsUimz5d701TdVjD Uvb3Z8IJrHwmIJ6fn1bHUPABMdrbi1EvfzuYSWRDLsZ8RkCL3/BQ5bdOZAl5/uz/Z/X2 pEcA== X-Forwarded-Encrypted: i=1; AJvYcCXbagROzkT3clKFlWUE+ksVGCV5+mLvl+23y3RC9FfZy2OZD8KNVcWAi98VGwixziDbL0aTmAqYWOSpQ/g=@vger.kernel.org X-Gm-Message-State: AOJu0YywAZVfV09Z6JdSQWghNZQP0EXbzJr5bcYiA7YkFjgKGWSQ23+o ialYlNkYJDLTxo3gsFibG2g2LdtIEdsshklVsnCedj7gFbROrxyhWMCukWRDxbBqSA0e8X1cmq2 B9czGasYZww== X-Google-Smtp-Source: AGHT+IEv2l18L0ZhkdFtfo16oi6E8iLas+G2yXT8Dwzbcv0gVoVu1uRvcB9htUL9TcDscGlYg0VQVqB5YjE0 X-Received: from pgcn11.prod.google.com ([2002:a63:720b:0:b0:b2f:5d02:68e9]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a20:1594:b0:21f:4f34:6b1 with SMTP id adf61e73a8af0-23811055c32mr21777199637.14.1752896740246; Fri, 18 Jul 2025 20:45:40 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:04 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-9-irogers@google.com> Subject: [PATCH v1 08/19] perf vendor events: Update haswell metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update metrics from TMA 5.0 to 5.1. Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/haswell/hsw-metrics.json | 32 ++++++++--------- .../arch/x86/haswellx/hsx-metrics.json | 35 +++++++++---------- 2 files changed, 31 insertions(+), 36 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json b/tool= s/perf/pmu-events/arch/x86/haswell/hsw-metrics.json index b26ea70a3628..aebd82ced1cf 100644 --- a/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json +++ b/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json @@ -1,49 +1,49 @@ [ { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per core", - "MetricExpr": "cstate_core@c3\\-residency@ / TSC", + "MetricExpr": "cstate_core@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" @@ -80,7 +80,6 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution ports for ALU operations.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_0 + UOPS_DISPATCHED_PORT= .PORT_1 + UOPS_DISPATCHED_PORT.PORT_5 + UOPS_DISPATCHED_PORT.PORT_6) / tma_= info_thread_slots", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_alu_op_utilization", @@ -98,7 +97,6 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "1 - (tma_frontend_bound + tma_bad_speculation + tma= _retiring)", "MetricGroup": "BvOB;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", @@ -139,7 +137,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU retired uops originated from CISC (complex instruction set computer) in= struction", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)", "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_gro= up", "MetricName": "tma_cisc", @@ -509,7 +506,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -521,7 +518,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, @@ -696,7 +693,6 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port for Load operations", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_2 + UOPS_DISPATCHED_PORT= .PORT_3 + UOPS_DISPATCHED_PORT.PORT_7 - UOPS_DISPATCHED_PORT.PORT_4) / (2 *= tma_info_core_core_clks)", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_load_op_utilization", @@ -746,7 +742,7 @@ { "BriefDescription": "This metric represents fraction of slots the = Memory subsystem within the Backend was a bottleneck", "MetricConstraint": "NO_GROUP_EVENTS", - "MetricExpr": "((min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.STALL= S_LDM_PENDING) + RESOURCE_STALLS.SB) / (min(CPU_CLK_UNHALTED.THREAD, CYCLE_= ACTIVITY.CYCLES_NO_EXECUTE) + (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - (cpu= @UOPS_EXECUTED.CORE\\,cmask\\=3D3@ if tma_info_thread_ipc > 1.8 else cpu@UO= PS_EXECUTED.CORE\\,cmask\\=3D2@)) / 2 - (RS_EVENTS.EMPTY_CYCLES if tma_fetc= h_latency > 0.1 else 0) + RESOURCE_STALLS.SB) if #SMT_on else min(CPU_CLK_U= NHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_EXECUTE) + cpu@UOPS_EXECUTED.CORE\= \,cmask\\=3D1@ - (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D3@ if tma_info_thread_= ipc > 1.8 else cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@) - (RS_EVENTS.EMPTY_CY= CLES if tma_fetch_latency > 0.1 else 0) + RESOURCE_STALLS.SB) * tma_backend= _bound", + "MetricExpr": "(min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.STALLS= _LDM_PENDING) + RESOURCE_STALLS.SB) / (min(CPU_CLK_UNHALTED.THREAD, CYCLE_A= CTIVITY.CYCLES_NO_EXECUTE) + (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - (cpu@= UOPS_EXECUTED.CORE\\,cmask\\=3D3@ if tma_info_thread_ipc > 1.8 else cpu@UOP= S_EXECUTED.CORE\\,cmask\\=3D2@)) / 2 - (RS_EVENTS.EMPTY_CYCLES if tma_fetch= _latency > 0.1 else 0) + RESOURCE_STALLS.SB if #SMT_on else min(CPU_CLK_UNH= ALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_EXECUTE) + cpu@UOPS_EXECUTED.CORE\\,= cmask\\=3D1@ - (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D3@ if tma_info_thread_ip= c > 1.8 else cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@) - (RS_EVENTS.EMPTY_CYCL= ES if tma_fetch_latency > 0.1 else 0) + RESOURCE_STALLS.SB) * tma_backend_b= ound", "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", @@ -856,7 +852,7 @@ { "BriefDescription": "This metric estimates fraction of cycles the = CPU performance was potentially limited due to Core computation issues (non= divider-related)", "MetricConstraint": "NO_GROUP_EVENTS", - "MetricExpr": "(min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES= _NO_EXECUTE) + (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - (cpu@UOPS_EXECUTED.= CORE\\,cmask\\=3D3@ if tma_info_thread_ipc > 1.8 else cpu@UOPS_EXECUTED.COR= E\\,cmask\\=3D2@)) / 2 - (RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.1= else 0) + RESOURCE_STALLS.SB if #SMT_on else min(CPU_CLK_UNHALTED.THREAD, = CYCLE_ACTIVITY.CYCLES_NO_EXECUTE) + cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ -= (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D3@ if tma_info_thread_ipc > 1.8 else c= pu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@) - (RS_EVENTS.EMPTY_CYCLES if tma_fetc= h_latency > 0.1 else 0) + RESOURCE_STALLS.SB - RESOURCE_STALLS.SB - min(CPU= _CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.STALLS_LDM_PENDING)) / tma_info_thread= _clks", + "MetricExpr": "((min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLE= S_NO_EXECUTE) + (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - (cpu@UOPS_EXECUTED= .CORE\\,cmask\\=3D3@ if tma_info_thread_ipc > 1.8 else cpu@UOPS_EXECUTED.CO= RE\\,cmask\\=3D2@)) / 2 - (RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.= 1 else 0) + RESOURCE_STALLS.SB if #SMT_on else min(CPU_CLK_UNHALTED.THREAD,= CYCLE_ACTIVITY.CYCLES_NO_EXECUTE) + cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ = - (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D3@ if tma_info_thread_ipc > 1.8 else = cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@) - (RS_EVENTS.EMPTY_CYCLES if tma_fet= ch_latency > 0.1 else 0) + RESOURCE_STALLS.SB) - RESOURCE_STALLS.SB - min(C= PU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.STALLS_LDM_PENDING)) / tma_info_thre= ad_clks", "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_gr= oup", "MetricName": "tma_ports_utilization", "MetricThreshold": "tma_ports_utilization > 0.15 & (tma_core_bound= > 0.1 & tma_backend_bound > 0.2)", @@ -865,7 +861,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed no uops on any execution port (Logical Processor cycles since ICL= , Physical Core cycles otherwise)", - "MetricExpr": "(cpu@UOPS_EXECUTED.CORE\\,inv\\,cmask\\=3D1@ / 2 if= #SMT_on else (min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_EXECUT= E) - (RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.1 else 0)) / tma_info= _core_core_clks)", + "MetricExpr": "(cpu@UOPS_EXECUTED.CORE\\,inv\\,cmask\\=3D1@ / 2 if= #SMT_on else min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_EXECUTE= ) - (RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.1 else 0)) / tma_info_= core_core_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_0", "MetricThreshold": "tma_ports_utilized_0 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -874,7 +870,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D2@) / 2 if #SMT_on else (cpu@UOPS_EXECUTED.CORE\\= ,cmask\\=3D1@ - cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@) / tma_info_core_core= _clks)", + "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D2@) / 2 if #SMT_on else cpu@UOPS_EXECUTED.CORE\\,= cmask\\=3D1@ - cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@) / tma_info_core_core_= clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_1", "MetricThreshold": "tma_ports_utilized_1 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -883,7 +879,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D3@) / 2 if #SMT_on else (cpu@UOPS_EXECUTED.CORE\\= ,cmask\\=3D2@ - cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D3@) / tma_info_core_core= _clks)", + "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D3@) / 2 if #SMT_on else cpu@UOPS_EXECUTED.CORE\\,= cmask\\=3D2@ - cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D3@) / tma_info_core_core_= clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", "MetricThreshold": "tma_ports_utilized_2 > 0.15 & (tma_ports_utili= zation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", diff --git a/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json b/too= ls/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json index 8245a98ad4b9..b8845f8a28b9 100644 --- a/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json @@ -1,49 +1,49 @@ [ { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per core", - "MetricExpr": "cstate_core@c3\\-residency@ / TSC", + "MetricExpr": "cstate_core@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" @@ -282,7 +282,6 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution ports for ALU operations.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_0 + UOPS_DISPATCHED_PORT= .PORT_1 + UOPS_DISPATCHED_PORT.PORT_5 + UOPS_DISPATCHED_PORT.PORT_6) / tma_= info_thread_slots", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_alu_op_utilization", @@ -300,7 +299,6 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "1 - (tma_frontend_bound + tma_bad_speculation + tma= _retiring)", "MetricGroup": "BvOB;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", @@ -341,7 +339,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU retired uops originated from CISC (complex instruction set computer) in= struction", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)", "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_gro= up", "MetricName": "tma_cisc", @@ -711,7 +708,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -723,7 +720,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, @@ -756,6 +753,7 @@ }, { "BriefDescription": "Average number of parallel data read requests= to external memory", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "UNC_C_TOR_OCCUPANCY.MISS_OPCODE@filter_opc\\=3D0x18= 2@ / UNC_C_TOR_OCCUPANCY.MISS_OPCODE@filter_opc\\=3D0x182\\,thresh\\=3D1@", "MetricGroup": "Mem;MemoryBW;SoC", "MetricName": "tma_info_system_mem_parallel_reads", @@ -918,7 +916,6 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port for Load operations", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_2 + UOPS_DISPATCHED_PORT= .PORT_3 + UOPS_DISPATCHED_PORT.PORT_7 - UOPS_DISPATCHED_PORT.PORT_4) / (2 *= tma_info_core_core_clks)", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_load_op_utilization", @@ -928,6 +925,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling loads from local memory", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "200 * (MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM * (= 1 + MEM_LOAD_UOPS_RETIRED.HIT_LFB / (MEM_LOAD_UOPS_RETIRED.L2_HIT + MEM_LOA= D_UOPS_RETIRED.L3_HIT + MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT + MEM_LOAD_UO= PS_L3_HIT_RETIRED.XSNP_HITM + MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS + MEM_= LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM + MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOTE= _DRAM + MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOTE_HITM + MEM_LOAD_UOPS_L3_MISS_R= ETIRED.REMOTE_FWD))) / tma_info_thread_clks", "MetricGroup": "Server;TopdownL5;tma_L5_group;tma_mem_latency_grou= p", "MetricName": "tma_local_mem", @@ -977,7 +975,7 @@ { "BriefDescription": "This metric represents fraction of slots the = Memory subsystem within the Backend was a bottleneck", "MetricConstraint": "NO_GROUP_EVENTS", - "MetricExpr": "((min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.STALL= S_LDM_PENDING) + RESOURCE_STALLS.SB) / (min(CPU_CLK_UNHALTED.THREAD, CYCLE_= ACTIVITY.CYCLES_NO_EXECUTE) + (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - (cpu= @UOPS_EXECUTED.CORE\\,cmask\\=3D3@ if tma_info_thread_ipc > 1.8 else cpu@UO= PS_EXECUTED.CORE\\,cmask\\=3D2@)) / 2 - (RS_EVENTS.EMPTY_CYCLES if tma_fetc= h_latency > 0.1 else 0) + RESOURCE_STALLS.SB) if #SMT_on else min(CPU_CLK_U= NHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_EXECUTE) + cpu@UOPS_EXECUTED.CORE\= \,cmask\\=3D1@ - (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D3@ if tma_info_thread_= ipc > 1.8 else cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@) - (RS_EVENTS.EMPTY_CY= CLES if tma_fetch_latency > 0.1 else 0) + RESOURCE_STALLS.SB) * tma_backend= _bound", + "MetricExpr": "(min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.STALLS= _LDM_PENDING) + RESOURCE_STALLS.SB) / (min(CPU_CLK_UNHALTED.THREAD, CYCLE_A= CTIVITY.CYCLES_NO_EXECUTE) + (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - (cpu@= UOPS_EXECUTED.CORE\\,cmask\\=3D3@ if tma_info_thread_ipc > 1.8 else cpu@UOP= S_EXECUTED.CORE\\,cmask\\=3D2@)) / 2 - (RS_EVENTS.EMPTY_CYCLES if tma_fetch= _latency > 0.1 else 0) + RESOURCE_STALLS.SB if #SMT_on else min(CPU_CLK_UNH= ALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_EXECUTE) + cpu@UOPS_EXECUTED.CORE\\,= cmask\\=3D1@ - (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D3@ if tma_info_thread_ip= c > 1.8 else cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@) - (RS_EVENTS.EMPTY_CYCL= ES if tma_fetch_latency > 0.1 else 0) + RESOURCE_STALLS.SB) * tma_backend_b= ound", "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", @@ -1087,7 +1085,7 @@ { "BriefDescription": "This metric estimates fraction of cycles the = CPU performance was potentially limited due to Core computation issues (non= divider-related)", "MetricConstraint": "NO_GROUP_EVENTS", - "MetricExpr": "(min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES= _NO_EXECUTE) + (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - (cpu@UOPS_EXECUTED.= CORE\\,cmask\\=3D3@ if tma_info_thread_ipc > 1.8 else cpu@UOPS_EXECUTED.COR= E\\,cmask\\=3D2@)) / 2 - (RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.1= else 0) + RESOURCE_STALLS.SB if #SMT_on else min(CPU_CLK_UNHALTED.THREAD, = CYCLE_ACTIVITY.CYCLES_NO_EXECUTE) + cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ -= (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D3@ if tma_info_thread_ipc > 1.8 else c= pu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@) - (RS_EVENTS.EMPTY_CYCLES if tma_fetc= h_latency > 0.1 else 0) + RESOURCE_STALLS.SB - RESOURCE_STALLS.SB - min(CPU= _CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.STALLS_LDM_PENDING)) / tma_info_thread= _clks", + "MetricExpr": "((min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLE= S_NO_EXECUTE) + (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - (cpu@UOPS_EXECUTED= .CORE\\,cmask\\=3D3@ if tma_info_thread_ipc > 1.8 else cpu@UOPS_EXECUTED.CO= RE\\,cmask\\=3D2@)) / 2 - (RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.= 1 else 0) + RESOURCE_STALLS.SB if #SMT_on else min(CPU_CLK_UNHALTED.THREAD,= CYCLE_ACTIVITY.CYCLES_NO_EXECUTE) + cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ = - (cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D3@ if tma_info_thread_ipc > 1.8 else = cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@) - (RS_EVENTS.EMPTY_CYCLES if tma_fet= ch_latency > 0.1 else 0) + RESOURCE_STALLS.SB) - RESOURCE_STALLS.SB - min(C= PU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.STALLS_LDM_PENDING)) / tma_info_thre= ad_clks", "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_gr= oup", "MetricName": "tma_ports_utilization", "MetricThreshold": "tma_ports_utilization > 0.15 & (tma_core_bound= > 0.1 & tma_backend_bound > 0.2)", @@ -1096,7 +1094,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed no uops on any execution port (Logical Processor cycles since ICL= , Physical Core cycles otherwise)", - "MetricExpr": "(cpu@UOPS_EXECUTED.CORE\\,inv\\,cmask\\=3D1@ / 2 if= #SMT_on else (min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_EXECUT= E) - (RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.1 else 0)) / tma_info= _core_core_clks)", + "MetricExpr": "(cpu@UOPS_EXECUTED.CORE\\,inv\\,cmask\\=3D1@ / 2 if= #SMT_on else min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_EXECUTE= ) - (RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.1 else 0)) / tma_info_= core_core_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_0", "MetricThreshold": "tma_ports_utilized_0 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -1105,7 +1103,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D2@) / 2 if #SMT_on else (cpu@UOPS_EXECUTED.CORE\\= ,cmask\\=3D1@ - cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@) / tma_info_core_core= _clks)", + "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D2@) / 2 if #SMT_on else cpu@UOPS_EXECUTED.CORE\\,= cmask\\=3D1@ - cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@) / tma_info_core_core_= clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_1", "MetricThreshold": "tma_ports_utilized_1 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -1114,7 +1112,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D3@) / 2 if #SMT_on else (cpu@UOPS_EXECUTED.CORE\\= ,cmask\\=3D2@ - cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D3@) / tma_info_core_core= _clks)", + "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D3@) / 2 if #SMT_on else cpu@UOPS_EXECUTED.CORE\\,= cmask\\=3D2@ - cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D3@) / tma_info_core_core_= clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", "MetricThreshold": "tma_ports_utilized_2 > 0.15 & (tma_ports_utili= zation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -1141,6 +1139,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling loads from remote memory", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "310 * (MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOTE_DRAM * = (1 + MEM_LOAD_UOPS_RETIRED.HIT_LFB / (MEM_LOAD_UOPS_RETIRED.L2_HIT + MEM_LO= AD_UOPS_RETIRED.L3_HIT + MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT + MEM_LOAD_U= OPS_L3_HIT_RETIRED.XSNP_HITM + MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS + MEM= _LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM + MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOT= E_DRAM + MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOTE_HITM + MEM_LOAD_UOPS_L3_MISS_= RETIRED.REMOTE_FWD))) / tma_info_thread_clks", "MetricGroup": "Server;Snoop;TopdownL5;tma_L5_group;tma_mem_latenc= y_group", "MetricName": "tma_remote_mem", --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9CB402206B1 for ; Sat, 19 Jul 2025 03:45:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896751; cv=none; b=tz5TEhTs16Hn9TjilWVuKwTmPIte0cQJJF72s47mnR1SmwxucCs48svb/kV3wFfsIcAQRE9d6W2Y9EDociax3z4aExFHUiecmjsvroh8kd84N5Tj2mvbT/FnfgOSZwO3kqCEqz0cyoCxMkI4+86gGlnVWk8/nTquv7L5ow9W8jc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896751; c=relaxed/simple; bh=d5BwQBh5RrPwBwW4M6xuU5V+qMleCUpU/D8/s8WniYk=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=LxV99Juafh1BKO9ap0Jz4st8pz48y6svwdXjA2jBdLTx0m9hK9zL0escwNpYjXeSmmagJBfjcVuZfgRZDGp28Nlphe4FDnyCptMVFvgHVDtlmVzRpQN4MlmeC2Fo2Ib55PAjWL1fko6UW8/21WA1BMrEAtOyKZQeImLeijbG2J0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=4RoE1zT+; arc=none smtp.client-ip=209.85.214.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="4RoE1zT+" Received: by mail-pl1-f201.google.com with SMTP id d9443c01a7336-234906c5e29so30355475ad.0 for ; Fri, 18 Jul 2025 20:45:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896742; x=1753501542; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=A6OWqCPyDQhTYuHIAQYRxDOhw9LFBlHskAcktUX3uLY=; b=4RoE1zT+x/0bUtCdjd88JX4M+HKJjBqrFoDKpkmGHuvqbmcvZoklLGBQcjvnKx6LBc uCKDb3u9AqP2pImDs/PH9pm3j3bFXmFe9J8MPmXuINCGoTBVJnN1neJZrj7rXLkMdsK3 Fed6fDFW2iuifN6c9a6AA7EboJELfGDCIdDZEXxaKZtwM20//dluTlcGkTD244tX/eYU 2Im0SZhSo6US9TTQLUaHaYC0MBjfuXg1aycswC2nbJRqU+YCluvVFMLCEaQUlKTh3qSt qJ8Cu8f6+f8RMrmZfrZ3Is8/UIm/BZhPWqKjEp49p6OX0WvItyJVcpt/jov2EM0wFzfz gNpQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896742; x=1753501542; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=A6OWqCPyDQhTYuHIAQYRxDOhw9LFBlHskAcktUX3uLY=; b=aih7WhgrMvkj/iQ1dJThm69WcitzM6HIjhZ8QW3QHyWv2zLjnANzogfBsnAZ47eLYm CMedj01xcr+2GEPb37Hxd1H2rMyQJu56r0MCGFc1kvtwdRc0NVaGaTxh4oxAvuHDEove lpERdUtY+Rk9/QdU3MIeLsSErOmPmIEsjWiU8wp85yIQjNEdovmkZcHyKia5m9XIjQNj JfW2NP5esztMuLhXsHLn3GNrKJOjDk3Fu2e9tnGgdZJUCCnuCuacK/dxZDCRuc1tAN2T HqKABMqqX56rkJNmCtfdV5PbZ2i9zi0IQWbxdOjuOwhn+0WTdhUxOxy14KNHWku2SMYN pIrQ== X-Forwarded-Encrypted: i=1; AJvYcCUB5hG5npIovvvGLifErwN6HKPrKaRJniByn4xaiWIekZFjMBtqiEyAkuXo+Ym8pVByi0h+QmxpYpbA54o=@vger.kernel.org X-Gm-Message-State: AOJu0Yz2F6NmuLjMV8kmz4WRdhCfDyXUZi9KxYd/0zbbe/O62kpUarTz pyb7Sc9SaOzCkOLQJnToVaCxFOpblDG4m2j6shKuBPVwPI7Ghcj1Uwby0YWOI9t0/TkYMrYlpXl AFx6YiytDmA== X-Google-Smtp-Source: AGHT+IHamJ0m7J112hO2GysbNd/iUJmM9ocfd4dTYDGzYVC2Kw0OUN4+a9DpQoTDV1jBoq8Ky5IyBLcdK0we X-Received: from pjbsm16.prod.google.com ([2002:a17:90b:2e50:b0:31a:a0b3:5f6]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:903:faf:b0:235:799:eca5 with SMTP id d9443c01a7336-23e2576e67cmr195351945ad.44.1752896741950; Fri, 18 Jul 2025 20:45:41 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:05 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-10-irogers@google.com> Subject: [PATCH v1 09/19] perf vendor events: Update icelake metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update metrics from TMA 5.0 to 5.1. Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/icelake/icl-metrics.json | 96 ++++++----- .../arch/x86/icelakex/icx-metrics.json | 155 +++++++++++++----- 2 files changed, 169 insertions(+), 82 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json b/tool= s/perf/pmu-events/arch/x86/icelake/icl-metrics.json index c5bfdb2f288b..cf9ed3edb694 100644 --- a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json @@ -1,63 +1,63 @@ [ { "BriefDescription": "C10 residency percent per package", - "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c10\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C10_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C8 residency percent per package", - "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c8\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C8_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C9 residency percent per package", - "MetricExpr": "cstate_pkg@c9\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c9\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C9_Pkg_Residency", "ScaleUnit": "100%" @@ -85,7 +85,6 @@ }, { "BriefDescription": "This metric estimates how often memory load a= ccesses were aliased by preceding stores (in program order) with a 4K addre= ss offset", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS / tma_info_thread_c= lks", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_4k_aliasing", @@ -134,6 +133,7 @@ }, { "BriefDescription": "Total pipeline cost of instruction fetch rela= ted bottlenecks by large code footprint programs (i-side cache; TLB and BTB= misses)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_ic= ache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switch= es + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)", "MetricGroup": "BigFootprint;BvBC;Fed;Frontend;IcMiss;MemoryTLB", "MetricName": "tma_bottleneck_big_code", @@ -147,40 +147,45 @@ "MetricThreshold": "tma_bottleneck_branching_overhead > 5", "PublicDescription": "Total pipeline cost of instructions used for= program control-flow - a subset of the Retiring category in TMA. Examples = include function calls; loops and alignments. (A lower bound)" }, + { + "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", + "MetricGroup": "BvCB;Cor;tma_issueComp", + "MetricName": "tma_bottleneck_compute_bound_est", + "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", + "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " + }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Bandwidth related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load + tma_= fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + = tma_store_fwd_blk)))", "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", - "MetricName": "tma_bottleneck_cache_memory_bandwidth", - "MetricThreshold": "tma_bottleneck_cache_memory_bandwidth > 20", + "MetricName": "tma_bottleneck_data_cache_memory_bandwidth", + "MetricThreshold": "tma_bottleneck_data_cache_memory_bandwidth > 2= 0", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_= system_dram_bw_use, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Latency related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l= 1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_b= lk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + = tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_lock_latency / (tma_= 4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma= _lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * = (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_boun= d + tma_store_bound)) * (tma_split_loads / (tma_4k_aliasing + tma_dtlb_load= + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_l= oads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * = (tma_split_stores / (tma_dtlb_store + tma_false_sharing + tma_split_stores = + tma_store_latency + tma_streaming_stores)) + tma_memory_bound * (tma_stor= e_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tm= a_store_bound)) * (tma_store_latency / (tma_dtlb_store + tma_false_sharing = + tma_split_stores + tma_store_latency + tma_streaming_stores)))", "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", - "MetricName": "tma_bottleneck_cache_memory_latency", - "MetricThreshold": "tma_bottleneck_cache_memory_latency > 20", + "MetricName": "tma_bottleneck_data_cache_memory_latency", + "MetricThreshold": "tma_bottleneck_data_cache_memory_latency > 20", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_= mem_latency" }, - { - "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", - "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", - "MetricGroup": "BvCB;Cor;tma_issueComp", - "MetricName": "tma_bottleneck_compute_bound_est", - "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", - "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " - }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks (when the front-end could not sustain operations = delivery to the back-end)", - "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - tma_mi= crocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) *= (tma_assists / tma_microcode_sequencer) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + 10 * tma_microcode_seq= uencer * tma_other_mispredicts / tma_branch_mispredicts * tma_mispredicts_r= esteers) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_br= anches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tm= a_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_ms /= (tma_dsb + tma_lsd + tma_mite + tma_ms))) - tma_bottleneck_big_code", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - tma_mi= crocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) *= (tma_assists / tma_microcode_sequencer) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + 10 * tma_microcode_seq= uencer * tma_other_mispredicts / tma_branch_mispredicts * tma_mispredicts_r= esteers) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_br= anches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tm= a_itlb_misses + tma_lcp + tma_ms_switches) + tma_ms)) - tma_bottleneck_big_= code", "MetricGroup": "BvFB;Fed;FetchBW;Frontend", "MetricName": "tma_bottleneck_instruction_fetch_bw", "MetricThreshold": "tma_bottleneck_instruction_fetch_bw > 20" }, { "BriefDescription": "Total pipeline cost of irregular execution (e= .g", - "MetricExpr": "100 * (tma_microcode_sequencer / (tma_few_uops_inst= ructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequence= r) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cle= ars_resteers + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_b= ranch_mispredicts * tma_mispredicts_resteers) / (tma_clears_resteers + tma_= mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_= dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switc= hes) + tma_fetch_bandwidth * tma_ms / (tma_dsb + tma_lsd + tma_mite + tma_m= s)) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mis= predicts * tma_branch_mispredicts + tma_machine_clears * tma_other_nukes / = tma_other_nukes + tma_core_bound * (tma_serializing_operation + tma_core_bo= und * RS_EVENTS.EMPTY_CYCLES / tma_info_thread_clks * tma_ports_utilized_0)= / (tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_= microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer)= * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_microcode_sequencer / (tma_few_uops_inst= ructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequence= r) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cle= ars_resteers + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_b= ranch_mispredicts * tma_mispredicts_resteers) / (tma_clears_resteers + tma_= mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_= dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switc= hes) + tma_ms) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma= _branch_mispredicts * tma_branch_mispredicts + tma_machine_clears * tma_oth= er_nukes / tma_other_nukes + tma_core_bound * (tma_serializing_operation + = tma_core_bound * RS_EVENTS.EMPTY_CYCLES / tma_info_thread_clks * tma_ports_= utilized_0) / (tma_divider + tma_ports_utilization + tma_serializing_operat= ion) + tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode= _sequencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operation= s)", "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", "MetricName": "tma_bottleneck_irregular_overhead", "MetricThreshold": "tma_bottleneck_irregular_overhead > 10", @@ -188,6 +193,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_m= emory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + = tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tm= a_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + = tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound= / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store= _bound)) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_spli= t_stores + tma_store_latency + tma_streaming_stores)))", "MetricGroup": "BvMT;Mem;MemoryTLB;Offcore;tma_issueTLB", "MetricName": "tma_bottleneck_memory_data_tlbs", @@ -196,6 +202,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Synchronization= related bottlenecks (data transfers and coherency updates across processor= s)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_l3_bound / (tma_dram= _bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (t= ma_contested_accesses + tma_data_sharing) / (tma_contested_accesses + tma_d= ata_sharing + tma_l3_hit_latency + tma_sq_full) + tma_store_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * = tma_false_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_stores = + tma_store_latency + tma_streaming_stores - tma_store_latency)) + tma_mach= ine_clears * (1 - tma_other_nukes / tma_other_nukes))", "MetricGroup": "BvMS;LockCont;Mem;Offcore;tma_issueSyncxn", "MetricName": "tma_bottleneck_memory_synchronization", @@ -204,6 +211,7 @@ }, { "BriefDescription": "Total pipeline cost of Branch Misprediction r= elated bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (1 - 10 * tma_microcode_sequencer * tma_other= _mispredicts / tma_branch_mispredicts) * (tma_branch_mispredicts + tma_fetc= h_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switc= hes + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))", "MetricGroup": "Bad;BadSpec;BrMispredicts;BvMP;tma_issueBM", "MetricName": "tma_bottleneck_mispredictions", @@ -212,7 +220,8 @@ }, { "BriefDescription": "Total pipeline cost of remaining bottlenecks = in the back-end", - "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_cache_me= mory_bandwidth + tma_bottleneck_cache_memory_latency + tma_bottleneck_memor= y_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottleneck_comput= e_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_branching_= overhead + tma_bottleneck_useful_work)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_data_cac= he_memory_bandwidth + tma_bottleneck_data_cache_memory_latency + tma_bottle= neck_memory_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottlen= eck_compute_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_= branching_overhead + tma_bottleneck_useful_work)", "MetricGroup": "BvOB;Cor;Offcore", "MetricName": "tma_bottleneck_other_bottlenecks", "MetricThreshold": "tma_bottleneck_other_bottlenecks > 20", @@ -220,6 +229,7 @@ }, { "BriefDescription": "Total pipeline cost of \"useful operations\" = - the portion of Retiring category not covered by Branching_Overhead nor Ir= regular_Overhead.", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_retiring - (BR_INST_RETIRED.ALL_BRANCHES= + 2 * BR_INST_RETIRED.NEAR_CALL + INST_RETIRED.NOP) / tma_info_thread_slot= s - tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_se= quencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "BvUW;Ret", "MetricName": "tma_bottleneck_useful_work", @@ -427,7 +437,7 @@ "MetricGroup": "BvMB;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;t= ma_issueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", - "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_cache_memory_bandwidth, = tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_late= ncy, tma_streaming_stores", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_data_cache_memory_bandwi= dth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store= _latency, tma_streaming_stores", "ScaleUnit": "100%" }, { @@ -619,6 +629,7 @@ }, { "BriefDescription": "Total pipeline cost of DSB (uop cache) hits -= subset of the Instruction_Fetch_BW Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_frontend_bound * (tma_fetch_bandwidth / = (tma_fetch_bandwidth + tma_fetch_latency)) * (tma_dsb / (tma_dsb + tma_lsd = + tma_mite + tma_ms)))", "MetricGroup": "DSB;Fed;FetchBW;tma_issueFB", "MetricName": "tma_info_botlnk_l2_dsb_bandwidth", @@ -1068,7 +1079,7 @@ "MetricName": "tma_info_memory_tlb_store_stlb_mpki" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_G= E_1 / 2 if #SMT_on else cpu@UOPS_EXECUTED.THREAD\\,cmask\\=3D1@)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -1091,6 +1102,12 @@ "MetricGroup": "Fed;FetchBW", "MetricName": "tma_info_pipeline_fetch_mite" }, + { + "BriefDescription": "Average number of uops fetched from MS per cy= cle", + "MetricExpr": "IDQ.MS_UOPS / cpu@IDQ.MS_UOPS\\,cmask\\=3D1@", + "MetricGroup": "Fed;FetchLat;MicroSeq", + "MetricName": "tma_info_pipeline_fetch_ms" + }, { "BriefDescription": "Instructions per a microcode Assist invocatio= n", "MetricExpr": "INST_RETIRED.ANY / ASSISTS.ANY", @@ -1107,7 +1124,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -1119,7 +1136,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, @@ -1128,7 +1145,7 @@ "MetricExpr": "64 * (UNC_ARB_TRK_REQUESTS.ALL + UNC_ARB_COH_TRK_RE= QUESTS.ALL) / 1e6 / tma_info_system_time / 1e3", "MetricGroup": "HPC;MemOffcore;MemoryBW;SoC;tma_issueBW", "MetricName": "tma_info_system_dram_bw_use", - "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_cache_memory_ban= dwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_data_cache_memor= y_bandwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Giga Floating Point Operations Per Second", @@ -1296,12 +1313,12 @@ "ScaleUnit": "100%" }, { - "BriefDescription": "This metric([SKL+] roughly; [LNL]) estimates = fraction of cycles with demand load accesses that hit the L1D cache", + "BriefDescription": "This metric ([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache", "MetricExpr": "min(2 * (MEM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETI= RED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CYCLE_ACTIVITY.CYCLE= S_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound= _group", "MetricName": "tma_l1_latency_dependency", "MetricThreshold": "tma_l1_latency_dependency > 0.1 & (tma_l1_boun= d > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache. The s= hort latency of the L1D cache may be exposed in pointer-chasing memory acce= ss patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", + "PublicDescription": "This metric ([SKL+] roughly; [LNL]) estimate= s fraction of cycles with demand load accesses that hit the L1D cache. The = short latency of the L1D cache may be exposed in pointer-chasing memory acc= ess patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", "ScaleUnit": "100%" }, { @@ -1325,7 +1342,6 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled due to loads accesses to L3 cache or contended with a sibling Core", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STA= LLS_L3_MISS) / tma_info_thread_clks", "MetricGroup": "CacheHits;MemoryBound;TmaL3mem;TopdownL3;tma_L3_gr= oup;tma_memory_bound_group", "MetricName": "tma_l3_bound", @@ -1339,7 +1355,7 @@ "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat= ;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_cache_memory_latency, tma_mem_latency", + "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_data_cache_memory_latency, tma_mem_latency", "ScaleUnit": "100%" }, { @@ -1445,7 +1461,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_d= ram_bound_group;tma_issueBW", "MetricName": "tma_mem_bandwidth", "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_cache_memory_bandwidth,= tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_data_cache_memory_bandw= idth, tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", "ScaleUnit": "100%" }, { @@ -1454,7 +1470,7 @@ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_= dram_bound_group;tma_issueLat", "MetricName": "tma_mem_latency", "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_cache_memory_latency, tma_l3_hit_latency= ", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_data_cache_memory_latency, tma_l3_hit_la= tency", "ScaleUnit": "100%" }, { @@ -1522,7 +1538,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to the Microcode Sequencer (MS) unit = - see Microcode_Sequencer node for details.", - "MetricExpr": "cpu@IDQ.MS_UOPS\\,cmask\\=3D1@ / tma_info_core_core= _clks / 2", + "MetricExpr": "cpu@IDQ.MS_UOPS\\,cmask\\=3D1@ / tma_info_core_core= _clks / 3.3", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_fetch_bandwidt= h_group", "MetricName": "tma_ms", "MetricThreshold": "tma_ms > 0.05 & tma_fetch_bandwidth > 0.2", @@ -1656,7 +1672,7 @@ { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_= thread_slots", + "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "BvUW;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", @@ -1693,7 +1709,6 @@ }, { "BriefDescription": "This metric represents rate of split store ac= cesses", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bou= nd_group", "MetricName": "tma_split_stores", @@ -1707,7 +1722,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_i= ssueBW;tma_l3_bound_group", "MetricName": "tma_sq_full", "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, tma_m= em_bandwidth", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_data_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, = tma_mem_bandwidth", "ScaleUnit": "100%" }, { @@ -1721,7 +1736,6 @@ }, { "BriefDescription": "This metric roughly estimates fraction of cyc= les when the memory subsystem had loads blocked since they could not forwar= d data from earlier (in program order) overlapping stores", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * LD_BLOCKS.STORE_FORWARD / tma_info_thread_clks= ", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_store_fwd_blk", diff --git a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json b/too= ls/perf/pmu-events/arch/x86/icelakex/icx-metrics.json index a886a0cfee07..f58eec2a1788 100644 --- a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json @@ -1,28 +1,28 @@ [ { "BriefDescription": "C1 residency percent per core", - "MetricExpr": "cstate_core@c1\\-residency@ / TSC", + "MetricExpr": "cstate_core@c1\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C1_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" @@ -78,6 +78,12 @@ "MetricName": "io_bandwidth_read", "ScaleUnit": "1MB/s" }, + { + "BriefDescription": "Bandwidth of inbound IO reads that are initia= ted by end device controllers that are requesting memory from the CPU and m= iss the L3 cache", + "MetricExpr": "UNC_CHA_TOR_INSERTS.IO_MISS_PCIRDCUR * 64 / 1e6 / d= uration_time", + "MetricName": "io_bandwidth_read_l3_miss", + "ScaleUnit": "1MB/s" + }, { "BriefDescription": "Bandwidth of IO reads that are initiated by e= nd device controllers that are requesting memory from the local CPU socket", "MetricExpr": "UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_LOCAL * 64 / 1e6 / = duration_time", @@ -96,6 +102,12 @@ "MetricName": "io_bandwidth_write", "ScaleUnit": "1MB/s" }, + { + "BriefDescription": "Bandwidth of inbound IO writes that are initi= ated by end device controllers that are writing memory to the CPU", + "MetricExpr": "(UNC_CHA_TOR_INSERTS.IO_MISS_ITOM + UNC_CHA_TOR_INS= ERTS.IO_MISS_ITOMCACHENEAR) * 64 / 1e6 / duration_time", + "MetricName": "io_bandwidth_write_l3_miss", + "ScaleUnit": "1MB/s" + }, { "BriefDescription": "Bandwidth of IO writes that are initiated by = end device controllers that are writing memory to the local CPU socket", "MetricExpr": "(UNC_CHA_TOR_INSERTS.IO_ITOM_LOCAL + UNC_CHA_TOR_IN= SERTS.IO_ITOMCACHENEAR_LOCAL) * 64 / 1e6 / duration_time", @@ -108,6 +120,24 @@ "MetricName": "io_bandwidth_write_remote", "ScaleUnit": "1MB/s" }, + { + "BriefDescription": "The percent of inbound full cache line writes= initiated by IO that miss the L3 cache", + "MetricExpr": "UNC_CHA_TOR_INSERTS.IO_MISS_ITOM / UNC_CHA_TOR_INSE= RTS.IO_ITOM", + "MetricName": "io_full_write_l3_miss", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "The percent of inbound partial writes initiat= ed by IO that miss the L3 cache", + "MetricExpr": "(UNC_CHA_TOR_INSERTS.IO_MISS_ITOMCACHENEAR + UNC_CH= A_TOR_INSERTS.IO_MISS_RFO) / (UNC_CHA_TOR_INSERTS.IO_ITOMCACHENEAR + UNC_CH= A_TOR_INSERTS.IO_RFO)", + "MetricName": "io_partial_write_l3_miss", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "The percent of inbound reads initiated by IO = that miss the L3 cache", + "MetricExpr": "UNC_CHA_TOR_INSERTS.IO_MISS_PCIRDCUR / UNC_CHA_TOR_= INSERTS.IO_PCIRDCUR", + "MetricName": "io_read_l3_miss", + "ScaleUnit": "100%" + }, { "BriefDescription": "Ratio of number of completed page walks (for = 2 megabyte and 4 megabyte page sizes) caused by a code fetch to the total n= umber of completed instructions", "MetricExpr": "ITLB_MISSES.WALK_COMPLETED_2M_4M / INST_RETIRED.ANY= ", @@ -331,7 +361,6 @@ }, { "BriefDescription": "This metric estimates how often memory load a= ccesses were aliased by preceding stores (in program order) with a 4K addre= ss offset", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS / tma_info_thread_c= lks", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_4k_aliasing", @@ -380,6 +409,7 @@ }, { "BriefDescription": "Total pipeline cost of instruction fetch rela= ted bottlenecks by large code footprint programs (i-side cache; TLB and BTB= misses)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_ic= ache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switch= es + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)", "MetricGroup": "BigFootprint;BvBC;Fed;Frontend;IcMiss;MemoryTLB", "MetricName": "tma_bottleneck_big_code", @@ -393,40 +423,45 @@ "MetricThreshold": "tma_bottleneck_branching_overhead > 5", "PublicDescription": "Total pipeline cost of instructions used for= program control-flow - a subset of the Retiring category in TMA. Examples = include function calls; loops and alignments. (A lower bound)" }, + { + "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", + "MetricGroup": "BvCB;Cor;tma_issueComp", + "MetricName": "tma_bottleneck_compute_bound_est", + "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", + "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " + }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Bandwidth related bottlenecks", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load + tma_= fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + = tma_store_fwd_blk)))", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_cx= l_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_late= ncy)) + tma_memory_bound * (tma_l3_bound / (tma_cxl_mem_bound + tma_dram_bo= und + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma= _sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency = + tma_sq_full)) + tma_memory_bound * (tma_l1_bound / (tma_cxl_mem_bound + t= ma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_boun= d)) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l= 1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_b= lk)))", "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", - "MetricName": "tma_bottleneck_cache_memory_bandwidth", - "MetricThreshold": "tma_bottleneck_cache_memory_bandwidth > 20", + "MetricName": "tma_bottleneck_data_cache_memory_bandwidth", + "MetricThreshold": "tma_bottleneck_data_cache_memory_bandwidth > 2= 0", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_= system_dram_bw_use, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Latency related bottlenecks", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l= 1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_b= lk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + = tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_lock_latency / (tma_= 4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma= _lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * = (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_boun= d + tma_store_bound)) * (tma_split_loads / (tma_4k_aliasing + tma_dtlb_load= + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_l= oads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * = (tma_split_stores / (tma_dtlb_store + tma_false_sharing + tma_split_stores = + tma_store_latency + tma_streaming_stores)) + tma_memory_bound * (tma_stor= e_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tm= a_store_bound)) * (tma_store_latency / (tma_dtlb_store + tma_false_sharing = + tma_split_stores + tma_store_latency + tma_streaming_stores)))", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_cx= l_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latenc= y)) + tma_memory_bound * (tma_l3_bound / (tma_cxl_mem_bound + tma_dram_boun= d + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l= 3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_lat= ency + tma_sq_full)) + tma_memory_bound * tma_l2_bound / (tma_cxl_mem_bound= + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_= bound) + tma_memory_bound * (tma_l1_bound / (tma_cxl_mem_bound + tma_dram_b= ound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tm= a_l1_latency_dependency / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + = tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_= fwd_blk)) + tma_memory_bound * (tma_l1_bound / (tma_cxl_mem_bound + tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * = (tma_lock_latency / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1= _latency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_bl= k)) + tma_memory_bound * (tma_l1_bound / (tma_cxl_mem_bound + tma_dram_boun= d + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_s= plit_loads / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_latenc= y_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + t= ma_memory_bound * (tma_store_bound / (tma_cxl_mem_bound + tma_dram_bound + = tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_split= _stores / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_stor= e_latency + tma_streaming_stores)) + tma_memory_bound * (tma_store_bound / = (tma_cxl_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_= bound + tma_store_bound)) * (tma_store_latency / (tma_dtlb_store + tma_fals= e_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))", "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", - "MetricName": "tma_bottleneck_cache_memory_latency", - "MetricThreshold": "tma_bottleneck_cache_memory_latency > 20", + "MetricName": "tma_bottleneck_data_cache_memory_latency", + "MetricThreshold": "tma_bottleneck_data_cache_memory_latency > 20", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_= mem_latency" }, - { - "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", - "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", - "MetricGroup": "BvCB;Cor;tma_issueComp", - "MetricName": "tma_bottleneck_compute_bound_est", - "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", - "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " - }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks (when the front-end could not sustain operations = delivery to the back-end)", - "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - tma_mi= crocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) *= (tma_assists / tma_microcode_sequencer) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + 10 * tma_microcode_seq= uencer * tma_other_mispredicts / tma_branch_mispredicts * tma_mispredicts_r= esteers) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_br= anches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tm= a_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_ms /= (tma_dsb + tma_mite + tma_ms))) - tma_bottleneck_big_code", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - tma_mi= crocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) *= (tma_assists / tma_microcode_sequencer) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + 10 * tma_microcode_seq= uencer * tma_other_mispredicts / tma_branch_mispredicts * tma_mispredicts_r= esteers) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_br= anches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tm= a_itlb_misses + tma_lcp + tma_ms_switches) + tma_ms)) - tma_bottleneck_big_= code", "MetricGroup": "BvFB;Fed;FetchBW;Frontend", "MetricName": "tma_bottleneck_instruction_fetch_bw", "MetricThreshold": "tma_bottleneck_instruction_fetch_bw > 20" }, { "BriefDescription": "Total pipeline cost of irregular execution (e= .g", - "MetricExpr": "100 * (tma_microcode_sequencer / (tma_few_uops_inst= ructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequence= r) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cle= ars_resteers + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_b= ranch_mispredicts * tma_mispredicts_resteers) / (tma_clears_resteers + tma_= mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_= dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switc= hes) + tma_fetch_bandwidth * tma_ms / (tma_dsb + tma_mite + tma_ms)) + 10 *= tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispredicts *= tma_branch_mispredicts + tma_machine_clears * tma_other_nukes / tma_other_= nukes + tma_core_bound * (tma_serializing_operation + tma_core_bound * RS_E= VENTS.EMPTY_CYCLES / tma_info_thread_clks * tma_ports_utilized_0) / (tma_di= vider + tma_ports_utilization + tma_serializing_operation) + tma_microcode_= sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (tma_as= sists / tma_microcode_sequencer) * tma_heavy_operations)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_microcode_sequencer / (tma_few_uops_inst= ructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequence= r) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cle= ars_resteers + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_b= ranch_mispredicts * tma_mispredicts_resteers) / (tma_clears_resteers + tma_= mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_= dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switc= hes) + tma_ms) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma= _branch_mispredicts * tma_branch_mispredicts + tma_machine_clears * tma_oth= er_nukes / tma_other_nukes + tma_core_bound * (tma_serializing_operation + = tma_core_bound * RS_EVENTS.EMPTY_CYCLES / tma_info_thread_clks * tma_ports_= utilized_0) / (tma_divider + tma_ports_utilization + tma_serializing_operat= ion) + tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode= _sequencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operation= s)", "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", "MetricName": "tma_bottleneck_irregular_overhead", "MetricThreshold": "tma_bottleneck_irregular_overhead > 10", @@ -434,7 +469,8 @@ }, { "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", - "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_m= emory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + = tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tm= a_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + = tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound= / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store= _bound)) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_spli= t_stores + tma_store_latency + tma_streaming_stores)))", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_m= emory_bound, tma_cxl_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bou= nd + tma_l3_bound + tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, = tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency += tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_boun= d * (tma_store_bound / (tma_cxl_mem_bound + tma_dram_bound + tma_l1_bound += tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_dtlb_store / (tma_d= tlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_= streaming_stores)))", "MetricGroup": "BvMT;Mem;MemoryTLB;Offcore;tma_issueTLB", "MetricName": "tma_bottleneck_memory_data_tlbs", "MetricThreshold": "tma_bottleneck_memory_data_tlbs > 20", @@ -442,7 +478,8 @@ }, { "BriefDescription": "Total pipeline cost of Memory Synchronization= related bottlenecks (data transfers and coherency updates across processor= s)", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * = (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) * tma_remote_cach= e / (tma_local_mem + tma_remote_cache + tma_remote_mem) + tma_l3_bound / (t= ma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_boun= d) * (tma_contested_accesses + tma_data_sharing) / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full) + tma_store_bound / = (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bo= und) * tma_false_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_= stores + tma_store_latency + tma_streaming_stores - tma_store_latency)) + t= ma_machine_clears * (1 - tma_other_nukes / tma_other_nukes))", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_cx= l_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency= )) * tma_remote_cache / (tma_local_mem + tma_remote_cache + tma_remote_mem)= + tma_l3_bound / (tma_cxl_mem_bound + tma_dram_bound + tma_l1_bound + tma_= l2_bound + tma_l3_bound + tma_store_bound) * (tma_contested_accesses + tma_= data_sharing) / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_lat= ency + tma_sq_full) + tma_store_bound / (tma_cxl_mem_bound + tma_dram_bound= + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * tma_fals= e_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_st= ore_latency + tma_streaming_stores - tma_store_latency)) + tma_machine_clea= rs * (1 - tma_other_nukes / tma_other_nukes))", "MetricGroup": "BvMS;LockCont;Mem;Offcore;tma_issueSyncxn", "MetricName": "tma_bottleneck_memory_synchronization", "MetricThreshold": "tma_bottleneck_memory_synchronization > 10", @@ -450,6 +487,7 @@ }, { "BriefDescription": "Total pipeline cost of Branch Misprediction r= elated bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (1 - 10 * tma_microcode_sequencer * tma_other= _mispredicts / tma_branch_mispredicts) * (tma_branch_mispredicts + tma_fetc= h_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switc= hes + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))", "MetricGroup": "Bad;BadSpec;BrMispredicts;BvMP;tma_issueBM", "MetricName": "tma_bottleneck_mispredictions", @@ -458,7 +496,8 @@ }, { "BriefDescription": "Total pipeline cost of remaining bottlenecks = in the back-end", - "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_cache_me= mory_bandwidth + tma_bottleneck_cache_memory_latency + tma_bottleneck_memor= y_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottleneck_comput= e_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_branching_= overhead + tma_bottleneck_useful_work)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_data_cac= he_memory_bandwidth + tma_bottleneck_data_cache_memory_latency + tma_bottle= neck_memory_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottlen= eck_compute_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_= branching_overhead + tma_bottleneck_useful_work)", "MetricGroup": "BvOB;Cor;Offcore", "MetricName": "tma_bottleneck_other_bottlenecks", "MetricThreshold": "tma_bottleneck_other_bottlenecks > 20", @@ -466,6 +505,7 @@ }, { "BriefDescription": "Total pipeline cost of \"useful operations\" = - the portion of Retiring category not covered by Branching_Overhead nor Ir= regular_Overhead.", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_retiring - (BR_INST_RETIRED.ALL_BRANCHES= + 2 * BR_INST_RETIRED.NEAR_CALL + INST_RETIRED.NOP) / tma_info_thread_slot= s - tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_se= quencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "BvUW;Ret", "MetricName": "tma_bottleneck_useful_work", @@ -584,6 +624,15 @@ "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, + { + "BriefDescription": "This metric roughly estimates (based on idle = latencies) how often the CPU was stalled on accesses to external CXL Memory= by loads (e.g", + "MetricExpr": "(((1 - ((19 * (MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM= * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS)) + 10 * (MEM_LO= AD_L3_MISS_RETIRED.LOCAL_DRAM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RET= IRED.L1_MISS) + MEM_LOAD_L3_MISS_RETIRED.REMOTE_FWD * (1 + MEM_LOAD_RETIRED= .FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + MEM_LOAD_L3_MISS_RETIRED.REMOTE_HITM = * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS))) / (19 * (MEM_L= OAD_L3_MISS_RETIRED.REMOTE_DRAM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_R= ETIRED.L1_MISS)) + 10 * (MEM_LOAD_L3_MISS_RETIRED.LOCAL_DRAM * (1 + MEM_LOA= D_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + MEM_LOAD_L3_MISS_RETIRED.REM= OTE_FWD * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + MEM_LO= AD_L3_MISS_RETIRED.REMOTE_HITM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RE= TIRED.L1_MISS)) + (25 * (MEM_LOAD_RETIRED.LOCAL_PMM * (1 + MEM_LOAD_RETIRED= .FB_HIT / MEM_LOAD_RETIRED.L1_MISS) if #has_pmem > 0 else 0) + 33 * (MEM_LO= AD_L3_MISS_RETIRED.REMOTE_PMM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RET= IRED.L1_MISS) if #has_pmem > 0 else 0))) if #has_pmem > 0 else 1)) * (CYCLE= _ACTIVITY.STALLS_L3_MISS / tma_info_thread_clks + (CYCLE_ACTIVITY.STALLS_L1= D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma_info_thread_clks - tma_l2_bou= nd) if 1e6 * (MEM_LOAD_L3_MISS_RETIRED.REMOTE_PMM + MEM_LOAD_RETIRED.LOCAL_= PMM) > MEM_LOAD_RETIRED.L1_MISS else 0) if #has_pmem > 0 else 0)", + "MetricGroup": "MemoryBound;Server;TmaL3mem;TopdownL3;tma_L3_group= ;tma_memory_bound_group", + "MetricName": "tma_cxl_mem_bound", + "MetricThreshold": "tma_cxl_mem_bound > 0.1 & (tma_memory_bound > = 0.2 & tma_backend_bound > 0.2)", + "PublicDescription": "This metric roughly estimates (based on idle= latencies) how often the CPU was stalled on accesses to external CXL Memor= y by loads (e.g. 3D-Xpoint (Crystal Ridge, a.k.a. IXP) memory, PMM - Persis= tent Memory Module [from CLX to SPR] or any other CXL Type3 Memory [EMR onw= ards]).", + "ScaleUnit": "100%" + }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to data-sharing ac= cesses", "MetricConstraint": "NO_GROUP_EVENTS", @@ -615,7 +664,7 @@ { "BriefDescription": "This metric estimates how often the CPU was s= talled on accesses to external memory (DRAM) by loads", "MetricConstraint": "NO_GROUP_EVENTS", - "MetricExpr": "CYCLE_ACTIVITY.STALLS_L3_MISS / tma_info_thread_clk= s + (CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma_= info_thread_clks - tma_l2_bound", + "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L3_MISS / tma_info_thread_cl= ks + (CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma= _info_thread_clks - tma_l2_bound - tma_cxl_mem_bound if #has_pmem > 0 else = CYCLE_ACTIVITY.STALLS_L3_MISS / tma_info_thread_clks + (CYCLE_ACTIVITY.STAL= LS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma_info_thread_clks - tma_l= 2_bound)", "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_me= mory_bound_group", "MetricName": "tma_dram_bound", "MetricThreshold": "tma_dram_bound > 0.1 & (tma_memory_bound > 0.2= & tma_backend_bound > 0.2)", @@ -673,7 +722,7 @@ "MetricGroup": "BvMB;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;t= ma_issueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", - "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_cache_memory_bandwidth, = tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_late= ncy, tma_streaming_stores", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_data_cache_memory_bandwi= dth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store= _latency, tma_streaming_stores", "ScaleUnit": "100%" }, { @@ -865,6 +914,7 @@ }, { "BriefDescription": "Total pipeline cost of DSB (uop cache) hits -= subset of the Instruction_Fetch_BW Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_frontend_bound * (tma_fetch_bandwidth / = (tma_fetch_bandwidth + tma_fetch_latency)) * (tma_dsb / (tma_dsb + tma_mite= + tma_ms)))", "MetricGroup": "DSB;Fed;FetchBW;tma_issueFB", "MetricName": "tma_info_botlnk_l2_dsb_bandwidth", @@ -1320,7 +1370,7 @@ "MetricName": "tma_info_memory_tlb_store_stlb_mpki" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_G= E_1 / 2 if #SMT_on else cpu@UOPS_EXECUTED.THREAD\\,cmask\\=3D1@)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -1337,6 +1387,12 @@ "MetricGroup": "Fed;FetchBW", "MetricName": "tma_info_pipeline_fetch_mite" }, + { + "BriefDescription": "Average number of uops fetched from MS per cy= cle", + "MetricExpr": "IDQ.MS_UOPS / cpu@IDQ.MS_UOPS\\,cmask\\=3D1@", + "MetricGroup": "Fed;FetchLat;MicroSeq", + "MetricName": "tma_info_pipeline_fetch_ms" + }, { "BriefDescription": "Instructions per a microcode Assist invocatio= n", "MetricExpr": "INST_RETIRED.ANY / ASSISTS.ANY", @@ -1353,7 +1409,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -1365,16 +1421,28 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, + { + "BriefDescription": "Average 3DXP Memory Bandwidth Use for reads [= GB / sec]", + "MetricExpr": "(64 * UNC_M_PMM_RPQ_INSERTS / 1e9 / tma_info_system= _time if #has_pmem > 0 else 0)", + "MetricGroup": "MemOffcore;MemoryBW;Server;SoC", + "MetricName": "tma_info_system_cxl_mem_read_bw" + }, + { + "BriefDescription": "Average 3DXP Memory Bandwidth Use for Writes = [GB / sec]", + "MetricExpr": "(64 * UNC_M_PMM_WPQ_INSERTS / 1e9 / tma_info_system= _time if #has_pmem > 0 else 0)", + "MetricGroup": "MemOffcore;MemoryBW;Server;SoC", + "MetricName": "tma_info_system_cxl_mem_write_bw" + }, { "BriefDescription": "Average external Memory Bandwidth Use for rea= ds and writes [GB / sec]", "MetricExpr": "64 * (UNC_M_CAS_COUNT.RD + UNC_M_CAS_COUNT.WR) / 1e= 9 / tma_info_system_time", "MetricGroup": "HPC;MemOffcore;MemoryBW;SoC;tma_issueBW", "MetricName": "tma_info_system_dram_bw_use", - "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_cache_memory_ban= dwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_data_cache_memor= y_bandwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Giga Floating Point Operations Per Second", @@ -1433,11 +1501,19 @@ }, { "BriefDescription": "Average number of parallel data read requests= to external memory", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / UNC_CHA_TOR_OCC= UPANCY.IA_MISS_DRD@thresh\\=3D1@", "MetricGroup": "Mem;MemoryBW;SoC", "MetricName": "tma_info_system_mem_parallel_reads", "PublicDescription": "Average number of parallel data read request= s to external memory. Accounts for demand loads and L1/L2 prefetches" }, + { + "BriefDescription": "Average latency of data read request to exter= nal 3D X-Point memory [in nanoseconds]", + "MetricExpr": "(1e9 * (UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_PMM / UNC= _CHA_TOR_INSERTS.IA_MISS_DRD_PMM) / cha_0@event\\=3D0x0@ if #has_pmem > 0 e= lse 0)", + "MetricGroup": "MemOffcore;MemoryLat;Server;SoC", + "MetricName": "tma_info_system_mem_pmm_read_latency", + "PublicDescription": "Average latency of data read request to exte= rnal 3D X-Point memory [in nanoseconds]. Accounts for demand loads and L1/L= 2 data-read prefetches" + }, { "BriefDescription": "Average latency of data read request to exter= nal memory (in nanoseconds)", "MetricExpr": "1e9 * (UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / UNC_CHA_= TOR_INSERTS.IA_MISS_DRD) / (tma_info_system_socket_clks / tma_info_system_t= ime)", @@ -1590,12 +1666,12 @@ "ScaleUnit": "100%" }, { - "BriefDescription": "This metric([SKL+] roughly; [LNL]) estimates = fraction of cycles with demand load accesses that hit the L1D cache", + "BriefDescription": "This metric ([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache", "MetricExpr": "min(2 * (MEM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETI= RED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CYCLE_ACTIVITY.CYCLE= S_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound= _group", "MetricName": "tma_l1_latency_dependency", "MetricThreshold": "tma_l1_latency_dependency > 0.1 & (tma_l1_boun= d > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache. The s= hort latency of the L1D cache may be exposed in pointer-chasing memory acce= ss patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", + "PublicDescription": "This metric ([SKL+] roughly; [LNL]) estimate= s fraction of cycles with demand load accesses that hit the L1D cache. The = short latency of the L1D cache may be exposed in pointer-chasing memory acc= ess patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", "ScaleUnit": "100%" }, { @@ -1619,7 +1695,6 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled due to loads accesses to L3 cache or contended with a sibling Core", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STA= LLS_L3_MISS) / tma_info_thread_clks", "MetricGroup": "CacheHits;MemoryBound;TmaL3mem;TopdownL3;tma_L3_gr= oup;tma_memory_bound_group", "MetricName": "tma_l3_bound", @@ -1633,7 +1708,7 @@ "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat= ;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_cache_memory_latency, tma_mem_latency", + "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_data_cache_memory_latency, tma_mem_latency", "ScaleUnit": "100%" }, { @@ -1739,7 +1814,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_d= ram_bound_group;tma_issueBW", "MetricName": "tma_mem_bandwidth", "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_cache_memory_bandwidth,= tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_data_cache_memory_bandw= idth, tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", "ScaleUnit": "100%" }, { @@ -1748,7 +1823,7 @@ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_= dram_bound_group;tma_issueLat", "MetricName": "tma_mem_latency", "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_cache_memory_latency, tma_l3_hit_latency= ", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_data_cache_memory_latency, tma_l3_hit_la= tency", "ScaleUnit": "100%" }, { @@ -1816,7 +1891,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to the Microcode Sequencer (MS) unit = - see Microcode_Sequencer node for details.", - "MetricExpr": "cpu@IDQ.MS_UOPS\\,cmask\\=3D1@ / tma_info_core_core= _clks / 2", + "MetricExpr": "cpu@IDQ.MS_UOPS\\,cmask\\=3D1@ / tma_info_core_core= _clks / 3.3", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_fetch_bandwidt= h_group", "MetricName": "tma_ms", "MetricThreshold": "tma_ms > 0.05 & tma_fetch_bandwidth > 0.2", @@ -1968,7 +2043,7 @@ { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_= thread_slots", + "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "BvUW;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", @@ -2005,7 +2080,6 @@ }, { "BriefDescription": "This metric represents rate of split store ac= cesses", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bou= nd_group", "MetricName": "tma_split_stores", @@ -2019,7 +2093,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_i= ssueBW;tma_l3_bound_group", "MetricName": "tma_sq_full", "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, tma_m= em_bandwidth", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_data_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, = tma_mem_bandwidth", "ScaleUnit": "100%" }, { @@ -2033,7 +2107,6 @@ }, { "BriefDescription": "This metric roughly estimates fraction of cyc= les when the memory subsystem had loads blocked since they could not forwar= d data from earlier (in program order) overlapping stores", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * LD_BLOCKS.STORE_FORWARD / tma_info_thread_clks= ", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_store_fwd_blk", --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9C9C3221DB1 for ; Sat, 19 Jul 2025 03:45:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896748; cv=none; b=addTjYr4I0OAWSmhsNDNyQb4XXJbGWXgmDftpENBTRKPDDRLBnpQ/T7u/RF/OAdUfC4/gRzwOjD92WTbNh6Su12VhEcTqP5NKKLpHlepVXLlav3s9TX5wm4i+pWHmBjraa4BlkbnaW01Y4ujC4hYB3p4ULfpq9Pyubp8tvSaeV8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896748; c=relaxed/simple; bh=d7IJ/wsOkNu5sSHZzR2prXyO50/Ref/jxlV0ql+z+NI=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=Kb4i3e2asR7qLSszUBsA93iKS9hPAfi7/OFtD3VyXwd0aOS0HtzE8oBxzicoFnc4U5Wyz6agTz0T3DxdChKB4lFEoc7sb2toJ6p6Ea8x8sxoIblowkXsZuYQMvZVvj0A90yL2zV+VuM+l7RMRS/JK4xzOzeJJvJNgLS1GMLXDHU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=VntBegkC; arc=none smtp.client-ip=209.85.214.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="VntBegkC" Received: by mail-pl1-f201.google.com with SMTP id d9443c01a7336-235e3f93687so39051675ad.2 for ; Fri, 18 Jul 2025 20:45:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896744; x=1753501544; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=e6RYOfo1e+dPbu+hizmy1GF6v7IsXdR4N/AmArIYOVY=; b=VntBegkC2ievT5F7PNCF6dhtf+hAjqexYUxBKaeBFz5F78UDAwDHBBohetMGfBVdaS Je5J9i0/yu5Td4n/t7/bTbdsLqayShetiES+Rg5UTYHQ8piku96l7NNF3iN7SpceQJDU 7w3Z91wXq3si7vOrcI0mxFtWgbAj9+eAwDOz39rALd+V9t+7ultCzGx4G1nXxxKnCLYF vF5CAQG0eN6tRxzmjog3HnNWpAzixwpdaCRtnyaYnWr9oHXyxrBwD4Rt/RVMzlHjHY3v jD1XbV0xE7rZoJeW8oZosoGfPkD2rs1bWnNu1e1xzHwsbW6zmAkT9SMwg8ST6/IRHMFw Jn6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896744; x=1753501544; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=e6RYOfo1e+dPbu+hizmy1GF6v7IsXdR4N/AmArIYOVY=; b=dHq6N3mKJvtEWtl2r4tScju6aO5IUunwOQ8GgVqIuH0dd+JiyYlklzqRC/MVYq9EhZ IwD31mGze1GxYp9ko2fuZMUTJrgyxPUKQ7THmq4VahH1GyrJ6kLVNKNPAnbeA4vT2Dqr 5NhwNR9gZ4mWVn8LAesgccS+EaKO0gvAzD6KuAYE+IO54QWu8HvD3lwQe+qUZqIdwsl9 uaw8/pOzuMlWfPPEIiAFFamnN4k2DXUUmzm3SpwCmkHevWnaNA2fFzmoaOCRRvAqkLWP Y72zj7wBK7xdoENFTxEBWSX1rbAzOepShSlajdJUuTxdb+UOxtDSHKmPQyPuuCq7++7K TTag== X-Forwarded-Encrypted: i=1; AJvYcCUy7jelX1IuAIhAUy8l0JN1+/8NkKOIfp9TRuk9p3SQGkfXqH/9lfKrLi9WbiRHq1Au+7jK9V9Aq267hmA=@vger.kernel.org X-Gm-Message-State: AOJu0YyhSqYtf5zpFh1AeNdqS7oHF13GsrEeu2bx3RgaM6wzz0/U1LoA CzVq28PlXE0kb9H/nS0qwtj0ZLyHWI9OTrCKe5/9S3GI5Et31AWURZiobKKMF+LWayt6igTnh8K omNRKqg0Dhw== X-Google-Smtp-Source: AGHT+IEmJ05wRNcEOFielnUom5fSKxLwns2GRmqAaFvTgIxLuFUzeTNvIf+wRzW6R0ncoPw5VOrWm5FsAUBr X-Received: from plhv12.prod.google.com ([2002:a17:903:238c:b0:234:8c63:ac2b]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:902:ecd0:b0:235:711:f810 with SMTP id d9443c01a7336-23e256be344mr197508685ad.23.1752896743769; Fri, 18 Jul 2025 20:45:43 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:06 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-11-irogers@google.com> Subject: [PATCH v1 10/19] perf vendor events: Update ivybridge/ivytown metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update metrics from TMA 5.0 to 5.1. Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/ivybridge/ivb-metrics.json | 30 ++++++++--------- .../arch/x86/ivytown/ivt-metrics.json | 33 +++++++++---------- 2 files changed, 29 insertions(+), 34 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json b/to= ols/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json index de651ff9f846..969cb519eec1 100644 --- a/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json +++ b/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json @@ -1,49 +1,49 @@ [ { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per core", - "MetricExpr": "cstate_core@c3\\-residency@ / TSC", + "MetricExpr": "cstate_core@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" @@ -80,7 +80,6 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution ports for ALU operations.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_0 + UOPS_DISPATCHED_PORT= .PORT_1 + UOPS_DISPATCHED_PORT.PORT_5) / (3 * tma_info_core_core_clks)", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_alu_op_utilization", @@ -98,7 +97,6 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "1 - (tma_frontend_bound + tma_bad_speculation + tma= _retiring)", "MetricGroup": "BvOB;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", @@ -139,7 +137,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU retired uops originated from CISC (complex instruction set computer) in= struction", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)", "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_gro= up", "MetricName": "tma_cisc", @@ -561,7 +558,7 @@ "MetricThreshold": "tma_info_memory_tlb_page_walks_utilization > 0= .5" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (cpu@UOPS_EXECUTED.CORE\\,cm= ask\\=3D1@ / 2 if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -574,7 +571,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -586,7 +583,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, @@ -775,7 +772,6 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port for Load operations", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_2 + UOPS_DISPATCHED_PORT= .PORT_3 - UOPS_DISPATCHED_PORT.PORT_4) / (2 * tma_info_core_core_clks)", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_load_op_utilization", @@ -926,7 +922,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed no uops on any execution port (Logical Processor cycles since ICL= , Physical Core cycles otherwise)", - "MetricExpr": "(cpu@UOPS_EXECUTED.CORE\\,inv\\,cmask\\=3D1@ / 2 if= #SMT_on else (min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_EXECUT= E) - (RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.1 else 0)) / tma_info= _core_core_clks)", + "MetricExpr": "(cpu@UOPS_EXECUTED.CORE\\,inv\\,cmask\\=3D1@ / 2 if= #SMT_on else min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_EXECUTE= ) - (RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.1 else 0)) / tma_info_= core_core_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_0", "MetricThreshold": "tma_ports_utilized_0 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -935,7 +931,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D2@) / 2 if #SMT_on else (UOPS_EXECUTED.CYCLES_GE_= 1_UOP_EXEC - UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC) / tma_info_core_core_clks= )", + "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D2@) / 2 if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1= _UOP_EXEC - UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC) / tma_info_core_core_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_1", "MetricThreshold": "tma_ports_utilized_1 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -944,7 +940,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D3@) / 2 if #SMT_on else (UOPS_EXECUTED.CYCLES_GE_= 2_UOPS_EXEC - UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC) / tma_info_core_core_clk= s)", + "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D3@) / 2 if #SMT_on else UOPS_EXECUTED.CYCLES_GE_2= _UOPS_EXEC - UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC) / tma_info_core_core_clks= ", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", "MetricThreshold": "tma_ports_utilized_2 > 0.15 & (tma_ports_utili= zation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", diff --git a/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json b/tool= s/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json index 714d5e6d21e7..1cdd197ac883 100644 --- a/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json +++ b/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json @@ -1,49 +1,49 @@ [ { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per core", - "MetricExpr": "cstate_core@c3\\-residency@ / TSC", + "MetricExpr": "cstate_core@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" @@ -80,7 +80,6 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution ports for ALU operations.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_0 + UOPS_DISPATCHED_PORT= .PORT_1 + UOPS_DISPATCHED_PORT.PORT_5) / (3 * tma_info_core_core_clks)", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_alu_op_utilization", @@ -98,7 +97,6 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "1 - (tma_frontend_bound + tma_bad_speculation + tma= _retiring)", "MetricGroup": "BvOB;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", @@ -139,7 +137,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU retired uops originated from CISC (complex instruction set computer) in= struction", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)", "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_gro= up", "MetricName": "tma_cisc", @@ -561,7 +558,7 @@ "MetricThreshold": "tma_info_memory_tlb_page_walks_utilization > 0= .5" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (cpu@UOPS_EXECUTED.CORE\\,cm= ask\\=3D1@ / 2 if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -574,7 +571,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -586,7 +583,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, @@ -626,6 +623,7 @@ }, { "BriefDescription": "Average number of parallel data read requests= to external memory", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "UNC_C_TOR_OCCUPANCY.MISS_OPCODE@filter_opc\\=3D0x18= 2@ / UNC_C_TOR_OCCUPANCY.MISS_OPCODE@filter_opc\\=3D0x182\\,thresh\\=3D1@", "MetricGroup": "Mem;MemoryBW;SoC", "MetricName": "tma_info_system_mem_parallel_reads", @@ -795,7 +793,6 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port for Load operations", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_2 + UOPS_DISPATCHED_PORT= .PORT_3 - UOPS_DISPATCHED_PORT.PORT_4) / (2 * tma_info_core_core_clks)", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_load_op_utilization", @@ -805,6 +802,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling loads from local memory", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "200 * (MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM * = (1 + MEM_LOAD_UOPS_RETIRED.HIT_LFB / (MEM_LOAD_UOPS_RETIRED.L2_HIT + MEM_LO= AD_UOPS_RETIRED.LLC_HIT + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT + MEM_LOAD= _UOPS_LLC_HIT_RETIRED.XSNP_HITM + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_MISS += MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM + MEM_LOAD_UOPS_LLC_MISS_RETIRED= .REMOTE_DRAM + MEM_LOAD_UOPS_LLC_MISS_RETIRED.REMOTE_HITM + MEM_LOAD_UOPS_L= LC_MISS_RETIRED.REMOTE_FWD))) / tma_info_thread_clks", "MetricGroup": "Server;TopdownL5;tma_L5_group;tma_mem_latency_grou= p", "MetricName": "tma_local_mem", @@ -955,7 +953,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed no uops on any execution port (Logical Processor cycles since ICL= , Physical Core cycles otherwise)", - "MetricExpr": "(cpu@UOPS_EXECUTED.CORE\\,inv\\,cmask\\=3D1@ / 2 if= #SMT_on else (min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_EXECUT= E) - (RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.1 else 0)) / tma_info= _core_core_clks)", + "MetricExpr": "(cpu@UOPS_EXECUTED.CORE\\,inv\\,cmask\\=3D1@ / 2 if= #SMT_on else min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_EXECUTE= ) - (RS_EVENTS.EMPTY_CYCLES if tma_fetch_latency > 0.1 else 0)) / tma_info_= core_core_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_0", "MetricThreshold": "tma_ports_utilized_0 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -964,7 +962,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D2@) / 2 if #SMT_on else (UOPS_EXECUTED.CYCLES_GE_= 1_UOP_EXEC - UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC) / tma_info_core_core_clks= )", + "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D1@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D2@) / 2 if #SMT_on else UOPS_EXECUTED.CYCLES_GE_1= _UOP_EXEC - UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC) / tma_info_core_core_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_1", "MetricThreshold": "tma_ports_utilized_1 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -973,7 +971,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D3@) / 2 if #SMT_on else (UOPS_EXECUTED.CYCLES_GE_= 2_UOPS_EXEC - UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC) / tma_info_core_core_clk= s)", + "MetricExpr": "((cpu@UOPS_EXECUTED.CORE\\,cmask\\=3D2@ - cpu@UOPS_= EXECUTED.CORE\\,cmask\\=3D3@) / 2 if #SMT_on else UOPS_EXECUTED.CYCLES_GE_2= _UOPS_EXEC - UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC) / tma_info_core_core_clks= ", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", "MetricThreshold": "tma_ports_utilized_2 > 0.15 & (tma_ports_utili= zation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -1000,6 +998,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling loads from remote memory", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "310 * (MEM_LOAD_UOPS_LLC_MISS_RETIRED.REMOTE_DRAM *= (1 + MEM_LOAD_UOPS_RETIRED.HIT_LFB / (MEM_LOAD_UOPS_RETIRED.L2_HIT + MEM_L= OAD_UOPS_RETIRED.LLC_HIT + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT + MEM_LOA= D_UOPS_LLC_HIT_RETIRED.XSNP_HITM + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_MISS = + MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM + MEM_LOAD_UOPS_LLC_MISS_RETIRE= D.REMOTE_DRAM + MEM_LOAD_UOPS_LLC_MISS_RETIRED.REMOTE_HITM + MEM_LOAD_UOPS_= LLC_MISS_RETIRED.REMOTE_FWD))) / tma_info_thread_clks", "MetricGroup": "Server;Snoop;TopdownL5;tma_L5_group;tma_mem_latenc= y_group", "MetricName": "tma_remote_mem", --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0F6CD221F26 for ; Sat, 19 Jul 2025 03:45:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896750; cv=none; b=W5rT7xJqyaiclEjRbt/owT2QNfz/j51JxRvBvEqisImtd4aym40/l6VnUmSpxTJYXdVDY47i+/QwKiC+V3trybTkmn1bQCU39ArG7YoFsQEgZ1Yj0VReukQNfHLPE3w0e8LrQVcOSpJpVoFY+OMjxoqIoApzDYpB6n1FTBxd5qE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896750; c=relaxed/simple; bh=/8ru69dmq9Cq2iRfp0MBXeJovmB8oB+19cJs9x6jK08=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=iO4I2pI2c7IJ1nPEr+NWYJT+hzmTtZKdMc1ScOpIsJz+OHE5hLXCAbFlUG6oS1D4k9YgnPU6MAOPJqnjAowqV29kI3SAPDHVlfFHHnXiN28TdRh0Boi9cBwcNct0MCxhlgqpSXtWh2eMZ2P118QoVMSGsUvytyiRUMW/48SfHJY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=kmFjf0Qi; arc=none smtp.client-ip=209.85.214.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="kmFjf0Qi" Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-234f1acc707so25853755ad.3 for ; Fri, 18 Jul 2025 20:45:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896746; x=1753501546; darn=vger.kernel.org; h=to:from:subject:message-id:references:mime-version:in-reply-to:date :from:to:cc:subject:date:message-id:reply-to; bh=ahStHSrVpuKQhFyGWBebgfTyDgnCYq3IKD5gtRO66co=; b=kmFjf0Qirtq0hFyIEtNdmGZtdxBnI8RU3PjQxJToq6//QskipRna8C3kpMBbcyyOF0 f7bdlZdTbmO7mYS9DVYmCJT+DGVJOU3dIafMp434UjLHlPBcGxK0+tRZLoPl6SvrGuVJ ddlGEOX/wy2pBj+sKszTlrYEYStdRWbWWDba5h5lk4kTAsSBpBe3P6ZQwPhEC4Pt99Z3 iYhp/K9/6QOsd5PHutx/g2/kileXq94DNme64gnBU0I+4SBEjHu+gQTQ63gQ6rj/UKII gJXXNUJd5jyA/gnh7NWnqRwtFv8E+RebYzb0PqdhIVx9be5XF2yfNj+kLr+qHxgi5rg0 54tQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896746; x=1753501546; h=to:from:subject:message-id:references:mime-version:in-reply-to:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ahStHSrVpuKQhFyGWBebgfTyDgnCYq3IKD5gtRO66co=; b=iomPhmvwpf3tnBJCVi8R9wWKwVZ6+xqf5gr8x0BMeNtoWhCLCjuofp5Cm7D2c+xtCg EyCyY4BRAKYikKN1EMz+QF7yZFFJamQjTNK4W1lZsCz9tJM1lSSE/c0IfEWkPAxxABWu YOuQ47f5rT3cZUUQ9phu3mVQ3qiupFQREpqBoTRkGE8X55yt3yol8X3FSi4py89S9Alb BwBlAddQQBELQM7RbMKpk/2sAPtggRqUyumKgh0iuVBRZ/aP9M8SQs29YHEp6Rf6FvKl RqMh2vV9DbbBf1v2ptGGU3SIbBWQFau2AdrvwOSJipKD+CmOAAp+wBHUWfODSSLKfJyn Ac8w== X-Forwarded-Encrypted: i=1; AJvYcCUEpV6hF0UVkuWMkCM82o61aQKxqlPcpjiSgHHGEnGptJdfhRl2hDXesc5X9XSXkcpSydhv55GUGVoI0D4=@vger.kernel.org X-Gm-Message-State: AOJu0YyVtSxweIT6B2F4y8szIKaOJB/4PFr/HS3HOQ9uXjIQgDT5ACoX mNUXTOIVVq/dwEMPViqKH5HQW3lwsHwkLYBfXosXU4VRZcBNHSJDWu6qslBprzEyZeZ01PztC/m nTHNvEtHZdQ== X-Google-Smtp-Source: AGHT+IGv0sffouYqm1gOpRs6Jr1vqe1c28k9q/lC5/fr4WrVy4M79yj4O+gHrSlforNmYAwEPqNVcHwwUXyz X-Received: from pjbsj2.prod.google.com ([2002:a17:90b:2d82:b0:31c:15c8:4c80]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:902:d48f:b0:234:a734:4ab1 with SMTP id d9443c01a7336-23e2566b004mr175340085ad.3.1752896745919; Fri, 18 Jul 2025 20:45:45 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:07 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-12-irogers@google.com> Subject: [PATCH v1 11/19] perf vendor events: Update jaketown metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update metrics from TMA 5.0 to 5.1. Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/jaketown/jkt-metrics.json | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json b/too= ls/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json index 6f636ea0f216..250c73b21385 100644 --- a/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json +++ b/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json @@ -1,49 +1,49 @@ [ { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per core", - "MetricExpr": "cstate_core@c3\\-residency@ / TSC", + "MetricExpr": "cstate_core@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" @@ -71,7 +71,6 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "1 - (tma_frontend_bound + tma_bad_speculation + tma= _retiring)", "MetricGroup": "BvOB;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", @@ -296,7 +295,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -308,7 +307,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, @@ -348,6 +347,7 @@ }, { "BriefDescription": "Average number of parallel data read requests= to external memory", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "UNC_C_TOR_OCCUPANCY.MISS_OPCODE@filter_opc\\=3D0x18= 2@ / UNC_C_TOR_OCCUPANCY.MISS_OPCODE@filter_opc\\=3D0x182\\,thresh\\=3D1@", "MetricGroup": "Mem;MemoryBW;SoC", "MetricName": "tma_info_system_mem_parallel_reads", --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BF87622331B for ; Sat, 19 Jul 2025 03:45:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896760; cv=none; b=TnJeIGAhCgrVZ2+DmIEyiUCY9ugQTeX6AngXDVJWWDXmFou7GdPVFtPqpfg3MthSzRdC7Z9B/jN8NDK4/hrKBp/wuC/deg8fOBvl5bF5+orDoKIu5zIMDpeaMSIr/vlFBeu2hLUnsO/lojPzacpEJBYFU8sC2C9SkGvdqzKFfGo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896760; c=relaxed/simple; bh=i1eXlk9xFyXviyW1qsHLWNOJo3ODH6cbAkz32Aicg1Y=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=u8FUEO0WRi0z7nV5E/SOs5iKOVAMn24OKOpVuFuHTcC4cNv0GbGflSL54xMtf/mA1nPDeehvzABHjBpvYEtkrkCI8RZCtjIPALC5L5kyDEIABTW8iX0/OuGoLNIMWr5kFIQO1KNbRdyDVnKhC58mZhwZBeRqDmLoHUHkET1x2pY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=bJ989k5w; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="bJ989k5w" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-3141f9ce4e2so3985400a91.1 for ; Fri, 18 Jul 2025 20:45:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896748; x=1753501548; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=NFA6U8SoJ0t5tXDC1ICU5ArTwjjIXnavgGFPNSriYDw=; b=bJ989k5wB8ygJsILZqWahSY228bezP9sEHJPUe2RP3724f63ELOJSpgXXftiHwl9y4 Xps+t8Ly+13jp5O4gAuqe9hns5UCkWfSx7Hqzo+ojPwi2vysk3F5sAu9T52k3Y7rh+2P Un6H1FwX7wqO4D2HYJueIOKQQZRnRBjdAb8Si5PtZbDC1W2tmskM20gd8Lzoi/nzvdEQ w4Dij63238UMUU5kUXaOReZeatko87YQBYMozHIHbWhcYCClKCp6BVZXKlJ++BAzea6P +2luGhMsit19T36cYlGQYO4Sc8e/7HnEcxm9yzYZ2dTMBc20MasX0ReOPIf6K5zQtZ4D R8iw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896748; x=1753501548; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=NFA6U8SoJ0t5tXDC1ICU5ArTwjjIXnavgGFPNSriYDw=; b=DKiNujtAgqQpwF3j8eB0RFBBl8z7cmtipFH7aEnKZpCt21kCiQj6Mia5wf0kgg6sTB 62xKf/BUVRroqF2ipqZ32j/deVEavIt6Ho7w39g9Fn7ERiQxdjL8SG79ogtr3Fm8EMvI SiggL6FHSw8u4IO8eM2XH1QD7UBfxifXJh6C3wKTVtxXEqWF3nfMbMILxLO03YS2QUhz IUw20brLhU62zz2dhv4Zhs6KwUCFyy3N9rpu3bVtpwU/Sxuo+dMt9aRcr2DMSU7wpria gGGY62gPU1aFs0WJuf+TSGw/L3SMcR+ib3XYTRaphEsk0eM9yrb+1+BHBYPxIC7/MASJ wNcw== X-Forwarded-Encrypted: i=1; AJvYcCXiLJHAf7L6K94GvJI5cTBKJLx1PzJKE0We966tjUxinaUQY2WpQZthVIwIT5pzcTrVYc2i1MEK7aE+7io=@vger.kernel.org X-Gm-Message-State: AOJu0YyXeuR6yuX7bUozRWK5qvt5re6PYYc/UwfbanXBcB8Df2F4eqbS tEeBm7GQ9dU107w1W1N6BJ4j4O9dKG4DeU9PcNBl9IN29sMucSwmXloQjv+bwIgyi7ncOPgaY+v RUR6mspWUhg== X-Google-Smtp-Source: AGHT+IFCWO94RxMMX+CVdYfCiYZ+xwJpIVw3hNxhJCZa6K2boNzWfwYBuEdHMgoZT22tQI1BblZP8OZ5xtyp X-Received: from pjvv12.prod.google.com ([2002:a17:90b:588c:b0:311:c20d:676d]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:384d:b0:311:cc4e:516f with SMTP id 98e67ed59e1d1-31c9f44dd0fmr18764645a91.31.1752896747652; Fri, 18 Jul 2025 20:45:47 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:08 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-13-irogers@google.com> Subject: [PATCH v1 12/19] perf vendor events: Update lunarlake events/metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update events from v1.14 to v1.17. Update metrics from TMA 5.0 to 5.1. The event updates come from: https://github.com/intel/perfmon/commit/6bdcbce3e9df30ae02bd0ea51fd73bf51ee= 8aff4 https://github.com/intel/perfmon/commit/1684fa543fd45970759bb72dc3fc00c2ef8= 7c0e8 Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../pmu-events/arch/x86/lunarlake/cache.json | 104 ++++++--- .../arch/x86/lunarlake/frontend.json | 40 ++-- .../arch/x86/lunarlake/lnl-metrics.json | 216 ++++++++++-------- .../pmu-events/arch/x86/lunarlake/memory.json | 22 +- .../arch/x86/lunarlake/pipeline.json | 85 ++++--- .../x86/lunarlake/uncore-interconnect.json | 10 + .../arch/x86/lunarlake/uncore-memory.json | 8 + tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +- 8 files changed, 294 insertions(+), 193 deletions(-) create mode 100644 tools/perf/pmu-events/arch/x86/lunarlake/uncore-interco= nnect.json diff --git a/tools/perf/pmu-events/arch/x86/lunarlake/cache.json b/tools/pe= rf/pmu-events/arch/x86/lunarlake/cache.json index ff37d49611c3..29bcb847178f 100644 --- a/tools/perf/pmu-events/arch/x86/lunarlake/cache.json +++ b/tools/perf/pmu-events/arch/x86/lunarlake/cache.json @@ -28,6 +28,16 @@ "UMask": "0x1", "Unit": "cpu_core" }, + { + "BriefDescription": "Cachelines replaced into the L1 d-cache. Succ= essful replacements only (not blocked) and exclude WB-miss case", + "Counter": "0,1,2,3,4,5,6,7,8,9", + "EventCode": "0x51", + "EventName": "L1D.L1_REPLACEMENT", + "PublicDescription": "Counts cachelines replaced into the L1 d-cac= he.", + "SampleAfterValue": "1000003", + "UMask": "0x4", + "Unit": "cpu_core" + }, { "BriefDescription": "Cachelines replaced into the L0 and L1 d-cach= e. Successful replacements only (not blocked) and exclude WB-miss case", "Counter": "0,1,2,3,4,5,6,7,8,9", @@ -592,7 +602,7 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.ALL_LOADS", - "PublicDescription": "Counts Instructions with at least one archit= ecturally visible load retired. Available PDIST counters: 0", + "PublicDescription": "Counts Instructions with at least one archit= ecturally visible load retired. Available PDIST counters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x81", "Unit": "cpu_core" @@ -603,7 +613,7 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.ALL_STORES", - "PublicDescription": "Counts all retired store instructions. Avail= able PDIST counters: 0", + "PublicDescription": "Counts all retired store instructions. Avail= able PDIST counters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x82", "Unit": "cpu_core" @@ -613,7 +623,7 @@ "Counter": "0,1,2,3", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.ALL_SWPF", - "PublicDescription": "Counts all retired software prefetch instruc= tions. Available PDIST counters: 0", + "PublicDescription": "Counts all retired software prefetch instruc= tions. Available PDIST counters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x84", "Unit": "cpu_core" @@ -624,7 +634,7 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.ANY", - "PublicDescription": "Counts all retired memory instructions - loa= ds and stores. Available PDIST counters: 0", + "PublicDescription": "Counts all retired memory instructions - loa= ds and stores. Available PDIST counters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x87", "Unit": "cpu_core" @@ -635,7 +645,7 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.LOCK_LOADS", - "PublicDescription": "Counts retired load instructions with locked= access. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions with locked= access. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x21", "Unit": "cpu_core" @@ -646,7 +656,7 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.SPLIT_LOADS", - "PublicDescription": "Counts retired load instructions that split = across a cacheline boundary. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions that split = across a cacheline boundary. Available PDIST counters: 0,1", "SampleAfterValue": "100003", "UMask": "0x41", "Unit": "cpu_core" @@ -657,18 +667,29 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.SPLIT_STORES", - "PublicDescription": "Counts retired store instructions that split= across a cacheline boundary. Available PDIST counters: 0", + "PublicDescription": "Counts retired store instructions that split= across a cacheline boundary. Available PDIST counters: 0,1", "SampleAfterValue": "100003", "UMask": "0x42", "Unit": "cpu_core" }, + { + "BriefDescription": "Retired instructions that hit the STLB.", + "Counter": "0,1,2,3", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_HIT_ANY", + "PublicDescription": "Number of retired instructions with a clean = hit in the 2nd-level TLB (STLB). Available PDIST counters: 0,1", + "SampleAfterValue": "100003", + "UMask": "0xf", + "Unit": "cpu_core" + }, { "BriefDescription": "Retired load instructions that hit the STLB.", "Counter": "0,1,2,3", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.STLB_HIT_LOADS", - "PublicDescription": "Number of retired load instructions with a c= lean hit in the 2nd-level TLB (STLB). Available PDIST counters: 0", + "PublicDescription": "Number of retired load instructions with a c= lean hit in the 2nd-level TLB (STLB). Available PDIST counters: 0,1", "SampleAfterValue": "100003", "UMask": "0x9", "Unit": "cpu_core" @@ -679,18 +700,39 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.STLB_HIT_STORES", - "PublicDescription": "Number of retired store instructions that hi= t in the 2nd-level TLB (STLB). Available PDIST counters: 0", + "PublicDescription": "Number of retired store instructions that hi= t in the 2nd-level TLB (STLB). Available PDIST counters: 0,1", "SampleAfterValue": "100003", "UMask": "0xa", "Unit": "cpu_core" }, + { + "BriefDescription": "Retired SWPF instructions that hit the STLB.", + "Counter": "0,1,2,3", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_HIT_SWPF", + "PublicDescription": "Number of retired SWPF instructions that hit= in the 2nd-level TLB (STLB). Available PDIST counters: 0,1", + "SampleAfterValue": "1000003", + "UMask": "0xc", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired instructions that miss the STLB.", + "Counter": "0,1,2,3", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_MISS_ANY", + "PublicDescription": "Retired instructions that miss the STLB. Ava= ilable PDIST counters: 0,1", + "SampleAfterValue": "100003", + "UMask": "0x17", + "Unit": "cpu_core" + }, { "BriefDescription": "Retired load instructions that miss the STLB.= ", "Counter": "0,1,2,3", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.STLB_MISS_LOADS", - "PublicDescription": "Number of retired load instructions that (st= art a) miss in the 2nd-level TLB (STLB). Available PDIST counters: 0", + "PublicDescription": "Number of retired load instructions that (st= art a) miss in the 2nd-level TLB (STLB). Available PDIST counters: 0,1", "SampleAfterValue": "100003", "UMask": "0x11", "Unit": "cpu_core" @@ -701,18 +743,28 @@ "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_INST_RETIRED.STLB_MISS_STORES", - "PublicDescription": "Number of retired store instructions that (s= tart a) miss in the 2nd-level TLB (STLB). Available PDIST counters: 0", + "PublicDescription": "Number of retired store instructions that (s= tart a) miss in the 2nd-level TLB (STLB). Available PDIST counters: 0,1", "SampleAfterValue": "100003", "UMask": "0x12", "Unit": "cpu_core" }, + { + "BriefDescription": "Retired SWPF instructions that miss the STLB.= ", + "Counter": "0,1,2,3", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_MISS_SWPF", + "PublicDescription": "Number of retired SWPF instructions that (st= art a) miss in the 2nd-level TLB (STLB). Available PDIST counters: 0,1", + "SampleAfterValue": "1000003", + "UMask": "0x14", + "Unit": "cpu_core" + }, { "BriefDescription": "Retired load instructions whose data sources = were a cross-core Snoop hits and forwards data from an in on-package core c= ache (induced by NI$)", "Counter": "0,1,2,3", "Data_LA": "1", "EventCode": "0xd2", "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD", - "PublicDescription": "Counts retired load instructions whose data = sources were a cross-core Snoop hits and forwards data from an in on-packag= e core cache (induced by NI$) Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions whose data = sources were a cross-core Snoop hits and forwards data from an in on-packag= e core cache (induced by NI$) Available PDIST counters: 0,1", "SampleAfterValue": "20011", "UMask": "0x10", "Unit": "cpu_core" @@ -723,7 +775,7 @@ "Data_LA": "1", "EventCode": "0xd2", "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM", - "PublicDescription": "Counts retired load instructions whose data = sources were HitM responses from shared L3, Hit-with-FWD is normally exclud= ed. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions whose data = sources were HitM responses from shared L3, Hit-with-FWD is normally exclud= ed. Available PDIST counters: 0,1", "SampleAfterValue": "20011", "UMask": "0x4", "Unit": "cpu_core" @@ -734,7 +786,7 @@ "Data_LA": "1", "EventCode": "0xd2", "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS", - "PublicDescription": "Counts the retired load instructions whose d= ata sources were L3 hit and cross-core snoop missed in on-pkg core cache. A= vailable PDIST counters: 0", + "PublicDescription": "Counts the retired load instructions whose d= ata sources were L3 hit and cross-core snoop missed in on-pkg core cache. A= vailable PDIST counters: 0,1", "SampleAfterValue": "20011", "UMask": "0x1", "Unit": "cpu_core" @@ -745,7 +797,7 @@ "Data_LA": "1", "EventCode": "0xd2", "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD", - "PublicDescription": "Counts retired load instructions whose data = sources were L3 and cross-core snoop hits in on-pkg core cache. Available P= DIST counters: 0", + "PublicDescription": "Counts retired load instructions whose data = sources were L3 and cross-core snoop hits in on-pkg core cache. Available P= DIST counters: 0,1", "SampleAfterValue": "20011", "UMask": "0x2", "Unit": "cpu_core" @@ -756,7 +808,7 @@ "Data_LA": "1", "EventCode": "0xd3", "EventName": "MEM_LOAD_L3_MISS_RETIRED.MEMSIDE_CACHE", - "PublicDescription": "Retired load instructions which data source = is memory side cache. Available PDIST counters: 0", + "PublicDescription": "Retired load instructions which data source = is memory side cache. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "Unit": "cpu_core" }, @@ -766,7 +818,7 @@ "Data_LA": "1", "EventCode": "0xd4", "EventName": "MEM_LOAD_MISC_RETIRED.UC", - "PublicDescription": "Retired instructions with at least one load = to uncacheable memory-type, or at least one cache-line split locked access = (Bus Lock). Available PDIST counters: 0", + "PublicDescription": "Retired instructions with at least one load = to uncacheable memory-type, or at least one cache-line split locked access = (Bus Lock). Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x4", "Unit": "cpu_core" @@ -777,7 +829,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.FB_HIT", - "PublicDescription": "Counts retired load instructions with at lea= st one uop was load missed in L1 but hit FB (Fill Buffers) due to preceding= miss to the same cache line with data not ready. Available PDIST counters:= 0", + "PublicDescription": "Counts retired load instructions with at lea= st one uop was load missed in L1 but hit FB (Fill Buffers) due to preceding= miss to the same cache line with data not ready. Available PDIST counters:= 0,1", "SampleAfterValue": "100007", "UMask": "0x40", "Unit": "cpu_core" @@ -788,7 +840,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L1_HIT", - "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the L1 data cache. This event includes all SW prefet= ches and lock instructions regardless of the data source. Available PDIST c= ounters: 0", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the L1 data cache. This event includes all SW prefet= ches and lock instructions regardless of the data source. Available PDIST c= ounters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x101", "Unit": "cpu_core" @@ -799,7 +851,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L1_HIT_L0", - "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the Level 0 of the L1 data cache. This event include= s all SW prefetches and lock instructions regardless of the data source. Av= ailable PDIST counters: 0", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the Level 0 of the L1 data cache. This event include= s all SW prefetches and lock instructions regardless of the data source. Av= ailable PDIST counters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -809,7 +861,7 @@ "Counter": "0,1,2,3", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L1_HIT_L1", - "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the Level 1 of the L1 data cache. Available PDIST co= unters: 0", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the Level 1 of the L1 data cache. Available PDIST co= unters: 0,1", "SampleAfterValue": "1000003", "Unit": "cpu_core" }, @@ -819,7 +871,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L1_MISS", - "PublicDescription": "Counts retired load instructions with at lea= st one uop that missed in the L1 cache. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that missed in the L1 cache. Available PDIST counters: 0,1", "SampleAfterValue": "200003", "UMask": "0x8", "Unit": "cpu_core" @@ -830,7 +882,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L2_HIT", - "PublicDescription": "Counts retired load instructions with L2 cac= he hits as data sources. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions with L2 cac= he hits as data sources. Available PDIST counters: 0,1", "SampleAfterValue": "200003", "UMask": "0x2", "Unit": "cpu_core" @@ -841,7 +893,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L2_MISS", - "PublicDescription": "Counts retired load instructions missed L2 c= ache as data sources. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions missed L2 c= ache as data sources. Available PDIST counters: 0,1", "SampleAfterValue": "100021", "UMask": "0x10", "Unit": "cpu_core" @@ -852,7 +904,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L3_HIT", - "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the L3 cache. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the L3 cache. Available PDIST counters: 0,1", "SampleAfterValue": "100021", "UMask": "0x4", "Unit": "cpu_core" @@ -863,7 +915,7 @@ "Data_LA": "1", "EventCode": "0xd1", "EventName": "MEM_LOAD_RETIRED.L3_MISS", - "PublicDescription": "Counts retired load instructions with at lea= st one uop that missed in the L3 cache. Available PDIST counters: 0", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that missed in the L3 cache. Available PDIST counters: 0,1", "SampleAfterValue": "50021", "UMask": "0x20", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/lunarlake/frontend.json b/tools= /perf/pmu-events/arch/x86/lunarlake/frontend.json index e2facc4086e9..b21d602e9f1a 100644 --- a/tools/perf/pmu-events/arch/x86/lunarlake/frontend.json +++ b/tools/perf/pmu-events/arch/x86/lunarlake/frontend.json @@ -108,7 +108,7 @@ "EventName": "FRONTEND_RETIRED.ANY_ANT", "MSRIndex": "0x3F7", "MSRValue": "0x9", - "PublicDescription": "Always Not Taken (ANT) conditional retired b= ranches (no BTB entry and not mispredicted) Available PDIST counters: 0", + "PublicDescription": "Always Not Taken (ANT) conditional retired b= ranches (no BTB entry and not mispredicted) Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -120,7 +120,7 @@ "EventName": "FRONTEND_RETIRED.ANY_DSB_MISS", "MSRIndex": "0x3F7", "MSRValue": "0x1", - "PublicDescription": "Counts retired Instructions that experienced= DSB (Decode stream buffer i.e. the decoded instruction-cache) miss. Availa= ble PDIST counters: 0", + "PublicDescription": "Counts retired Instructions that experienced= DSB (Decode stream buffer i.e. the decoded instruction-cache) miss. Availa= ble PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -169,7 +169,7 @@ "EventName": "FRONTEND_RETIRED.DSB_MISS", "MSRIndex": "0x3F7", "MSRValue": "0x11", - "PublicDescription": "Number of retired Instructions that experien= ced a critical DSB (Decode stream buffer i.e. the decoded instruction-cache= ) miss. Critical means stalls were exposed to the back-end as a result of t= he DSB miss. Available PDIST counters: 0", + "PublicDescription": "Number of retired Instructions that experien= ced a critical DSB (Decode stream buffer i.e. the decoded instruction-cache= ) miss. Critical means stalls were exposed to the back-end as a result of t= he DSB miss. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -199,7 +199,7 @@ "EventName": "FRONTEND_RETIRED.ITLB_MISS", "MSRIndex": "0x3F7", "MSRValue": "0x14", - "PublicDescription": "Counts retired Instructions that experienced= iTLB (Instruction TLB) true miss. Available PDIST counters: 0", + "PublicDescription": "Counts retired Instructions that experienced= iTLB (Instruction TLB) true miss. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -211,7 +211,7 @@ "EventName": "FRONTEND_RETIRED.L1I_MISS", "MSRIndex": "0x3F7", "MSRValue": "0x12", - "PublicDescription": "Counts retired Instructions who experienced = Instruction L1 Cache true miss. Available PDIST counters: 0", + "PublicDescription": "Counts retired Instructions who experienced = Instruction L1 Cache true miss. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -223,7 +223,7 @@ "EventName": "FRONTEND_RETIRED.L2_MISS", "MSRIndex": "0x3F7", "MSRValue": "0x13", - "PublicDescription": "Counts retired Instructions who experienced = Instruction L2 Cache true miss. Available PDIST counters: 0", + "PublicDescription": "Counts retired Instructions who experienced = Instruction L2 Cache true miss. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -235,7 +235,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_128", "MSRIndex": "0x3F7", "MSRValue": "0x608006", - "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 12= 8 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 12= 8 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -247,7 +247,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_16", "MSRIndex": "0x3F7", "MSRValue": "0x601006", - "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 16 cycles. During th= is period the front-end delivered no uops. Available PDIST counters: 0", + "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 16 cycles. During th= is period the front-end delivered no uops. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -259,7 +259,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_2", "MSRIndex": "0x3F7", "MSRValue": "0x600206", - "PublicDescription": "Retired instructions that are fetched after = an interval where the front-end delivered no uops for a period of at least = 2 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0", + "PublicDescription": "Retired instructions that are fetched after = an interval where the front-end delivered no uops for a period of at least = 2 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -271,7 +271,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_256", "MSRIndex": "0x3F7", "MSRValue": "0x610006", - "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 25= 6 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 25= 6 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -283,7 +283,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1", "MSRIndex": "0x3F7", "MSRValue": "0x100206", - "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after the front-end had at least 1 bubble-slot for a per= iod of 2 cycles. A bubble-slot is an empty issue-pipeline slot while there = was no RAT stall. Available PDIST counters: 0", + "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after the front-end had at least 1 bubble-slot for a per= iod of 2 cycles. A bubble-slot is an empty issue-pipeline slot while there = was no RAT stall. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -295,7 +295,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_32", "MSRIndex": "0x3F7", "MSRValue": "0x602006", - "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 32 cycles. During th= is period the front-end delivered no uops. Available PDIST counters: 0", + "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 32 cycles. During th= is period the front-end delivered no uops. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -307,7 +307,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_4", "MSRIndex": "0x3F7", "MSRValue": "0x600406", - "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 4 = cycles which was not interrupted by a back-end stall. Available PDIST count= ers: 0", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 4 = cycles which was not interrupted by a back-end stall. Available PDIST count= ers: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -319,7 +319,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_512", "MSRIndex": "0x3F7", "MSRValue": "0x620006", - "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 51= 2 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 51= 2 cycles which was not interrupted by a back-end stall. Available PDIST cou= nters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -331,7 +331,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_64", "MSRIndex": "0x3F7", "MSRValue": "0x604006", - "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 64= cycles which was not interrupted by a back-end stall. Available PDIST coun= ters: 0", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 64= cycles which was not interrupted by a back-end stall. Available PDIST coun= ters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -343,7 +343,7 @@ "EventName": "FRONTEND_RETIRED.LATENCY_GE_8", "MSRIndex": "0x3F7", "MSRValue": "0x600806", - "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 8 cycles. During thi= s period the front-end delivered no uops. Available PDIST counters: 0", + "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 8 cycles. During thi= s period the front-end delivered no uops. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -355,7 +355,7 @@ "EventName": "FRONTEND_RETIRED.MISP_ANT", "MSRIndex": "0x3F7", "MSRValue": "0x9", - "PublicDescription": "ANT retired branches that got just mispredic= ted Available PDIST counters: 0", + "PublicDescription": "ANT retired branches that got just mispredic= ted Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x2", "Unit": "cpu_core" @@ -367,7 +367,7 @@ "EventName": "FRONTEND_RETIRED.MS_FLOWS", "MSRIndex": "0x3F7", "MSRValue": "0x8", - "PublicDescription": "Counts flows delivered by the Microcode Sequ= encer Available PDIST counters: 0", + "PublicDescription": "Counts flows delivered by the Microcode Sequ= encer Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -397,7 +397,7 @@ "EventName": "FRONTEND_RETIRED.STLB_MISS", "MSRIndex": "0x3F7", "MSRValue": "0x15", - "PublicDescription": "Counts retired Instructions that experienced= STLB (2nd level TLB) true miss. Available PDIST counters: 0", + "PublicDescription": "Counts retired Instructions that experienced= STLB (2nd level TLB) true miss. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" @@ -409,7 +409,7 @@ "EventName": "FRONTEND_RETIRED.UNKNOWN_BRANCH", "MSRIndex": "0x3F7", "MSRValue": "0x17", - "PublicDescription": "Number retired branch instructions that caus= ed the front-end to be resteered when it finds the instruction in a fetch l= ine. This is called Unknown Branch which occurs for the first time a branch= instruction is fetched or when the branch is not tracked by the BPU (Branc= h Prediction Unit) anymore. Available PDIST counters: 0", + "PublicDescription": "Number retired branch instructions that caus= ed the front-end to be resteered when it finds the instruction in a fetch l= ine. This is called Unknown Branch which occurs for the first time a branch= instruction is fetched or when the branch is not tracked by the BPU (Branc= h Prediction Unit) anymore. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x3", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/lunarlake/lnl-metrics.json b/to= ols/perf/pmu-events/arch/x86/lunarlake/lnl-metrics.json index 3c740962e63e..06390a72110d 100644 --- a/tools/perf/pmu-events/arch/x86/lunarlake/lnl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/lunarlake/lnl-metrics.json @@ -1,74 +1,46 @@ [ { "BriefDescription": "C10 residency percent per package", - "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c10\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C10_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C1 residency percent per core", - "MetricExpr": "cstate_core@c1\\-residency@ / TSC", + "MetricExpr": "cstate_core@c1\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C1_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, - { - "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", - "MetricGroup": "Power", - "MetricName": "C3_Pkg_Residency", - "ScaleUnit": "100%" - }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, - { - "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", - "MetricGroup": "Power", - "MetricName": "C7_Pkg_Residency", - "ScaleUnit": "100%" - }, - { - "BriefDescription": "C8 residency percent per package", - "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC", - "MetricGroup": "Power", - "MetricName": "C8_Pkg_Residency", - "ScaleUnit": "100%" - }, - { - "BriefDescription": "C9 residency percent per package", - "MetricExpr": "cstate_pkg@c9\\-residency@ / TSC", - "MetricGroup": "Power", - "MetricName": "C9_Pkg_Residency", - "ScaleUnit": "100%" - }, { "BriefDescription": "Percentage of cycles spent in System Manageme= nt Interrupts.", "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0= else 0)", @@ -555,7 +527,7 @@ }, { "BriefDescription": "Average CPU Utilization", - "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.REF_TSC@ / TSC", + "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.REF_TSC@ / msr@tsc\\,cpu= =3Dcpu_atom@", "MetricName": "tma_info_system_cpu_utilization", "Unit": "cpu_atom" }, @@ -724,6 +696,13 @@ "ScaleUnit": "100%", "Unit": "cpu_atom" }, + { + "BriefDescription": "Uncore frequency per die [GHZ]", + "MetricExpr": "tma_info_system_socket_clks / #num_dies / duration_= time / 1e9", + "MetricGroup": "SoC", + "MetricName": "UNCORE_FREQ", + "Unit": "cpu_core" + }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution ports for ALU operations.", "MetricExpr": "cpu_core@UOPS_DISPATCHED.ALU@ / (6 * tma_info_threa= d_clks)", @@ -755,7 +734,7 @@ { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "cpu_core@topdown\\-be\\-bound@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-be\\-bound@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "BvOB;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", @@ -767,7 +746,7 @@ { "BriefDescription": "This category represents fraction of slots wa= sted due to incorrect speculations", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "cpu_core@topdown\\-bad\\-spec@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-bad\\-spec@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", @@ -793,36 +772,36 @@ "PublicDescription": "Total pipeline cost of instructions used for= program control-flow - a subset of the Retiring category in TMA. Examples = include function calls; loops and alignments. (A lower bound)", "Unit": "cpu_core" }, + { + "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", + "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", + "MetricGroup": "BvCB;Cor;tma_issueComp", + "MetricName": "tma_bottleneck_compute_bound_est", + "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", + "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: ", + "Unit": "cpu_core" + }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Bandwidth related bottlenecks", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma_l1_l= atency_capacity + tma_l1_latency_dependency + tma_lock_latency + tma_split_= loads + tma_store_fwd_blk)))", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma_l1_l= atency_capacity + tma_l1_latency_dependency + tma_lock_latency + tma_split_= loads + tma_store_early_blk + tma_store_fwd_blk)))", "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", - "MetricName": "tma_bottleneck_cache_memory_bandwidth", - "MetricThreshold": "tma_bottleneck_cache_memory_bandwidth > 20", + "MetricName": "tma_bottleneck_data_cache_memory_bandwidth", + "MetricThreshold": "tma_bottleneck_data_cache_memory_bandwidth > 2= 0", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_= system_dram_bw_use, tma_mem_bandwidth, tma_sq_full", "Unit": "cpu_core" }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Latency related bottlenecks", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_dtlb_load + tma_fb_full + tma_l1_latency_capacity= + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + tma_sto= re_fwd_blk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_= bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_latency_c= apacity / (tma_dtlb_load + tma_fb_full + tma_l1_latency_capacity + tma_l1_l= atency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)= ) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma= _l2_bound + tma_l3_bound + tma_store_bound)) * (tma_lock_latency / (tma_dtl= b_load + tma_fb_full + tma_l1_latency_capacity + tma_l1_latency_dependency = + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bou= nd * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_split_loads / (tma_dtlb_load + tma_fb_ful= l + tma_l1_latency_capacity + tma_l1_latency_dependency + tma_lock_latency = + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bou= nd / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_sto= re_bound)) * (tma_split_stores / (tma_dtlb_store + tma_false_sharing + tma_= split_stores + tma_store_latency + tma_streaming_stores)) + tma_memory_boun= d * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_store_latency / (tma_dtlb_store + tma_f= alse_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)= ))", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_dtlb_load + tma_fb_full + tma_l1_latency_capacity= + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + tma_sto= re_early_blk + tma_store_fwd_blk)) + tma_memory_bound * (tma_l1_bound / (tm= a_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound= )) * (tma_l1_latency_capacity / (tma_dtlb_load + tma_fb_full + tma_l1_laten= cy_capacity + tma_l1_latency_dependency + tma_lock_latency + tma_split_load= s + tma_store_early_blk + tma_store_fwd_blk)) + tma_memory_bound * (tma_l1_= bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_= store_bound)) * (tma_lock_latency / (tma_dtlb_load + tma_fb_full + tma_l1_l= atency_capacity + tma_l1_latency_dependency + tma_lock_latency + tma_split_= loads + tma_store_early_blk + tma_store_fwd_blk)) + tma_memory_bound * (tma= _l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + = tma_store_bound)) * (tma_split_loads / (tma_dtlb_load + tma_fb_full + tma_l= 1_latency_capacity + tma_l1_latency_dependency + tma_lock_latency + tma_spl= it_loads + tma_store_early_blk + tma_store_fwd_blk)) + tma_memory_bound * (= tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bo= und + tma_store_bound)) * (tma_split_stores / (tma_dtlb_store + tma_false_s= haring + tma_split_stores + tma_store_latency + tma_streaming_stores)) + tm= a_memory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2= _bound + tma_l3_bound + tma_store_bound)) * (tma_store_latency / (tma_dtlb_= store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_stre= aming_stores)))", "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", - "MetricName": "tma_bottleneck_cache_memory_latency", - "MetricThreshold": "tma_bottleneck_cache_memory_latency > 20", + "MetricName": "tma_bottleneck_data_cache_memory_latency", + "MetricThreshold": "tma_bottleneck_data_cache_memory_latency > 20", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_= mem_latency", "Unit": "cpu_core" }, - { - "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", - "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", - "MetricGroup": "BvCB;Cor;tma_issueComp", - "MetricName": "tma_bottleneck_compute_bound_est", - "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", - "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: ", - "Unit": "cpu_core" - }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks (when the front-end could not sustain operations = delivery to the back-end)", - "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - (1 - c= pu_core@INST_RETIRED.REP_ITERATION@ / cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D= 1@) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cl= ears_resteers + tma_mispredicts_resteers * tma_other_mispredicts / tma_bran= ch_mispredicts) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unk= nown_branches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_miss= es + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * t= ma_ms / (tma_dsb + tma_lsd + tma_mite + tma_ms))) - tma_bottleneck_big_code= ", + "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - (1 - c= pu_core@INST_RETIRED.REP_ITERATION@ / cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D= 1@) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cl= ears_resteers + tma_mispredicts_resteers * tma_other_mispredicts / tma_bran= ch_mispredicts) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unk= nown_branches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_miss= es + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_ms)) - tma_bottlene= ck_big_code", "MetricGroup": "BvFB;Fed;FetchBW;Frontend", "MetricName": "tma_bottleneck_instruction_fetch_bw", "MetricThreshold": "tma_bottleneck_instruction_fetch_bw > 20", @@ -830,7 +809,7 @@ }, { "BriefDescription": "Total pipeline cost of irregular execution (e= .g", - "MetricExpr": "100 * ((1 - cpu_core@INST_RETIRED.REP_ITERATION@ / = cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D1@) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + tma_mispredicts_restee= rs * tma_other_mispredicts / tma_branch_mispredicts) / (tma_clears_resteers= + tma_mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers= + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_m= s_switches) + tma_fetch_bandwidth * tma_ms / (tma_dsb + tma_lsd + tma_mite = + tma_ms)) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_bra= nch_mispredicts * tma_branch_mispredicts + tma_machine_clears * tma_other_n= ukes / tma_other_nukes + tma_core_bound * (tma_serializing_operation + cpu_= core@RS.EMPTY_RESOURCE@ / tma_info_thread_clks * tma_ports_utilized_0) / (t= ma_divider + tma_ports_utilization + tma_serializing_operation) + tma_micro= code_sequencer / (tma_microcode_sequencer + tma_few_uops_instructions) * (t= ma_assists / tma_microcode_sequencer) * tma_heavy_operations)", + "MetricExpr": "100 * ((1 - cpu_core@INST_RETIRED.REP_ITERATION@ / = cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D1@) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + tma_mispredicts_restee= rs * tma_other_mispredicts / tma_branch_mispredicts) / (tma_clears_resteers= + tma_mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers= + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_m= s_switches) + tma_ms) + 10 * tma_microcode_sequencer * tma_other_mispredict= s / tma_branch_mispredicts * tma_branch_mispredicts + tma_machine_clears * = tma_other_nukes / tma_other_nukes + tma_core_bound * (tma_serializing_opera= tion + cpu_core@RS.EMPTY_RESOURCE@ / tma_info_thread_clks * tma_ports_utili= zed_0) / (tma_divider + tma_ports_utilization + tma_serializing_operation) = + tma_microcode_sequencer / (tma_microcode_sequencer + tma_few_uops_instruc= tions) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", "MetricName": "tma_bottleneck_irregular_overhead", "MetricThreshold": "tma_bottleneck_irregular_overhead > 10", @@ -839,7 +818,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", - "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / (tma_dram= _bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (= tma_dtlb_load / (tma_dtlb_load + tma_fb_full + tma_l1_latency_capacity + tm= a_l1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_fw= d_blk)) + tma_memory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bo= und + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_dtlb_store / (= tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency += tma_streaming_stores)))", + "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / (tma_dram= _bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (= tma_dtlb_load / (tma_dtlb_load + tma_fb_full + tma_l1_latency_capacity + tm= a_l1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_ea= rly_blk + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_= dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound))= * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores= + tma_store_latency + tma_streaming_stores)))", "MetricGroup": "BvMT;Mem;MemoryTLB;Offcore;tma_issueTLB", "MetricName": "tma_bottleneck_memory_data_tlbs", "MetricThreshold": "tma_bottleneck_memory_data_tlbs > 20", @@ -866,7 +845,7 @@ }, { "BriefDescription": "Total pipeline cost of remaining bottlenecks = in the back-end", - "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_cache_me= mory_bandwidth + tma_bottleneck_cache_memory_latency + tma_bottleneck_memor= y_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottleneck_comput= e_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_branching_= overhead + tma_bottleneck_useful_work)", + "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_data_cac= he_memory_bandwidth + tma_bottleneck_data_cache_memory_latency + tma_bottle= neck_memory_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottlen= eck_compute_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_= branching_overhead + tma_bottleneck_useful_work)", "MetricGroup": "BvOB;Cor;Offcore", "MetricName": "tma_bottleneck_other_bottlenecks", "MetricThreshold": "tma_bottleneck_other_bottlenecks > 20", @@ -883,7 +862,7 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Branch Misprediction", - "MetricExpr": "cpu_core@topdown\\-br\\-mispredict@ / (cpu_core@top= down\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-re= tiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-br\\-mispredict@ / (cpu_core@top= down\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-re= tiring@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "BadSpec;BrMispredicts;BvMP;TmaL2;TopdownL2;tma_L2_= group;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", @@ -1023,7 +1002,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to contested acces= ses", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS@ * min(= cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS@R, 24 * tma_info_system_core_fre= quency) + cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM@ * min(cpu_core@MEM_LO= AD_L3_HIT_RETIRED.XSNP_HITM@R, 25 * tma_info_system_core_frequency)) * (1 += cpu_core@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2= ) / tma_info_thread_clks", "MetricGroup": "BvMS;DataSharing;LockCont;Offcore;Snoop;TopdownL4;= tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", "MetricName": "tma_contested_accesses", @@ -1076,7 +1054,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to DSB (decoded uop cache) fetch pipe= line", - "MetricExpr": "(cpu_core@IDQ.DSB_UOPS\\,cmask\\=3D0x8\\,inv\\=3D0x= 1@ + cpu_core@IDQ.DSB_UOPS@ / (cpu_core@IDQ.DSB_UOPS@ + cpu_core@IDQ.MITE_U= OPS@) * (cpu_core@IDQ_BUBBLES.CYCLES_0_UOPS_DELIV.CORE@ - cpu_core@IDQ_BUBB= LES.FETCH_LATENCY@)) / tma_info_thread_clks", + "MetricExpr": "(cpu_core@IDQ.DSB_UOPS\\,cmask\\=3D0x8\\,inv\\=3D0x= 1@ / 2 + cpu_core@IDQ.DSB_UOPS@ / (cpu_core@IDQ.DSB_UOPS@ + cpu_core@IDQ.MI= TE_UOPS@) * (cpu_core@IDQ_BUBBLES.STARVATION_CYCLES@ - cpu_core@IDQ_BUBBLES= .FETCH_LATENCY@)) / tma_info_thread_clks", "MetricGroup": "DSB;FetchBW;TopdownL3;tma_L3_group;tma_fetch_bandw= idth_group", "MetricName": "tma_dsb", "MetricThreshold": "tma_dsb > 0.15 & tma_fetch_bandwidth > 0.2", @@ -1130,7 +1108,7 @@ "MetricGroup": "BvMB;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;t= ma_issueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", - "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_cache_memory_bandwidth, = tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_late= ncy, tma_streaming_stores", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_data_cache_memory_bandwi= dth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store= _latency, tma_streaming_stores", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -1147,7 +1125,7 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU was stalled due to Frontend latency issues", - "MetricExpr": "cpu_core@topdown\\-fetch\\-lat@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-fetch\\-lat@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", @@ -1197,7 +1175,7 @@ }, { "BriefDescription": "This metric approximates arithmetic floating-= point (FP) scalar uops fraction the CPU has retired", - "MetricExpr": "cpu_core@FP_ARITH_INST_RETIRED.SCALAR@ / (tma_retir= ing * tma_info_thread_slots)", + "MetricExpr": "cpu_core@FP_ARITH_OPS_RETIRED.SCALAR@ / (tma_retiri= ng * tma_info_thread_slots)", "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_= group;tma_issue2P", "MetricName": "tma_fp_scalar", "MetricThreshold": "tma_fp_scalar > 0.1 & (tma_fp_arith > 0.2 & tm= a_light_operations > 0.6)", @@ -1207,7 +1185,7 @@ }, { "BriefDescription": "This metric approximates arithmetic floating-= point (FP) vector uops fraction the CPU has retired aggregated across all v= ector widths", - "MetricExpr": "cpu_core@FP_ARITH_INST_RETIRED.VECTOR@ / (tma_retir= ing * tma_info_thread_slots)", + "MetricExpr": "cpu_core@FP_ARITH_OPS_RETIRED.VECTOR@ / (tma_retiri= ng * tma_info_thread_slots)", "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_= group;tma_issue2P", "MetricName": "tma_fp_vector", "MetricThreshold": "tma_fp_vector > 0.1 & (tma_fp_arith > 0.2 & tm= a_light_operations > 0.6)", @@ -1217,7 +1195,7 @@ }, { "BriefDescription": "This metric approximates arithmetic FP vector= uops fraction the CPU has retired for 128-bit wide vectors", - "MetricExpr": "(cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@= + cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE@) / (tma_retiring * tm= a_info_thread_slots)", + "MetricExpr": "(cpu_core@FP_ARITH_OPS_RETIRED.128B_PACKED_DOUBLE@ = + cpu_core@FP_ARITH_OPS_RETIRED.128B_PACKED_SINGLE@) / (tma_retiring * tma_= info_thread_slots)", "MetricGroup": "Compute;Flops;TopdownL5;tma_L5_group;tma_fp_vector= _group;tma_issue2P", "MetricName": "tma_fp_vector_128b", "MetricThreshold": "tma_fp_vector_128b > 0.1 & (tma_fp_vector > 0.= 1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6))", @@ -1227,7 +1205,7 @@ }, { "BriefDescription": "This metric approximates arithmetic FP vector= uops fraction the CPU has retired for 256-bit wide vectors", - "MetricExpr": "cpu_core@FP_ARITH_INST_RETIRED.VECTOR\\,umask\\=3D0= x30@ / (tma_retiring * tma_info_thread_slots)", + "MetricExpr": "cpu_core@FP_ARITH_OPS_RETIRED.VECTOR\\,umask\\=3D0x= 30@ / (tma_retiring * tma_info_thread_slots)", "MetricGroup": "Compute;Flops;TopdownL5;tma_L5_group;tma_fp_vector= _group;tma_issue2P", "MetricName": "tma_fp_vector_256b", "MetricThreshold": "tma_fp_vector_256b > 0.1 & (tma_fp_vector > 0.= 1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6))", @@ -1238,7 +1216,7 @@ { "BriefDescription": "This category represents fraction of slots wh= ere the processor's Frontend undersupplies its Backend", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "cpu_core@topdown\\-fe\\-bound@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-fe\\-bound@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "BvFB;BvIO;Default;PGO;TmaL1;TopdownL1;tma_L1_group= ", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", @@ -1259,7 +1237,7 @@ }, { "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring heavy-weight operations -- instructions that require= two or more uops or micro-coded sequences", - "MetricExpr": "cpu_core@topdown\\-heavy\\-ops@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-heavy\\-ops@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", @@ -1437,7 +1415,7 @@ }, { "BriefDescription": "Floating Point Operations Per Cycle", - "MetricExpr": "(cpu_core@FP_ARITH_INST_RETIRED.SCALAR@ + 2 * cpu_c= ore@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + 4 * cpu_core@FP_ARITH_INST_= RETIRED.4_FLOPS@ + 8 * cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@) = / tma_info_thread_clks", + "MetricExpr": "(cpu_core@FP_ARITH_OPS_RETIRED.SCALAR@ + 2 * cpu_co= re@FP_ARITH_OPS_RETIRED.128B_PACKED_DOUBLE@ + 4 * cpu_core@FP_ARITH_OPS_RET= IRED.4_FLOPS@ + 8 * cpu_core@FP_ARITH_OPS_RETIRED.256B_PACKED_SINGLE@) / tm= a_info_thread_clks", "MetricGroup": "Flops;Ret", "MetricName": "tma_info_core_flopc", "Unit": "cpu_core" @@ -1578,7 +1556,7 @@ }, { "BriefDescription": "Instructions per FP Arithmetic instruction (l= ower number means higher occurrence rate)", - "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INS= T_RETIRED.SCALAR@ + cpu_core@FP_ARITH_INST_RETIRED.VECTOR@)", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_OPS= _RETIRED.SCALAR@ + cpu_core@FP_ARITH_OPS_RETIRED.VECTOR@)", "MetricGroup": "Flops;InsType", "MetricName": "tma_info_inst_mix_iparith", "MetricThreshold": "tma_info_inst_mix_iparith < 10", @@ -1587,7 +1565,7 @@ }, { "BriefDescription": "Instructions per FP Arithmetic AVX/SSE 128-bi= t instruction (lower number means higher occurrence rate)", - "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INS= T_RETIRED.128B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_= SINGLE@)", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_OPS= _RETIRED.128B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_OPS_RETIRED.128B_PACKED_SI= NGLE@)", "MetricGroup": "Flops;FpVector;InsType", "MetricName": "tma_info_inst_mix_iparith_avx128", "MetricThreshold": "tma_info_inst_mix_iparith_avx128 < 10", @@ -1596,7 +1574,7 @@ }, { "BriefDescription": "Instructions per FP Arithmetic AVX* 256-bit i= nstruction (lower number means higher occurrence rate)", - "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INS= T_RETIRED.256B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_= SINGLE@)", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_OPS= _RETIRED.256B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_OPS_RETIRED.256B_PACKED_SI= NGLE@)", "MetricGroup": "Flops;FpVector;InsType", "MetricName": "tma_info_inst_mix_iparith_avx256", "MetricThreshold": "tma_info_inst_mix_iparith_avx256 < 10", @@ -1605,7 +1583,7 @@ }, { "BriefDescription": "Instructions per FP Arithmetic Scalar Double-= Precision instruction (lower number means higher occurrence rate)", - "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@FP_ARITH_INST= _RETIRED.SCALAR_DOUBLE@", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@FP_ARITH_OPS_= RETIRED.SCALAR_DOUBLE@", "MetricGroup": "Flops;FpScalar;InsType", "MetricName": "tma_info_inst_mix_iparith_scalar_dp", "MetricThreshold": "tma_info_inst_mix_iparith_scalar_dp < 10", @@ -1614,7 +1592,7 @@ }, { "BriefDescription": "Instructions per FP Arithmetic Scalar Single-= Precision instruction (lower number means higher occurrence rate)", - "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@FP_ARITH_INST= _RETIRED.SCALAR_SINGLE@", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@FP_ARITH_OPS_= RETIRED.SCALAR_SINGLE@", "MetricGroup": "Flops;FpScalar;InsType", "MetricName": "tma_info_inst_mix_iparith_scalar_sp", "MetricThreshold": "tma_info_inst_mix_iparith_scalar_sp < 10", @@ -1639,7 +1617,7 @@ }, { "BriefDescription": "Instructions per Floating Point (FP) Operatio= n (lower number means higher occurrence rate)", - "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INS= T_RETIRED.SCALAR@ + 2 * cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ = + 4 * cpu_core@FP_ARITH_INST_RETIRED.4_FLOPS@ + 8 * cpu_core@FP_ARITH_INST_= RETIRED.256B_PACKED_SINGLE@)", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_OPS= _RETIRED.SCALAR@ + 2 * cpu_core@FP_ARITH_OPS_RETIRED.128B_PACKED_DOUBLE@ + = 4 * cpu_core@FP_ARITH_OPS_RETIRED.4_FLOPS@ + 8 * cpu_core@FP_ARITH_OPS_RETI= RED.256B_PACKED_SINGLE@)", "MetricGroup": "Flops;InsType", "MetricName": "tma_info_inst_mix_ipflop", "MetricThreshold": "tma_info_inst_mix_ipflop < 10", @@ -1694,7 +1672,7 @@ }, { "BriefDescription": "Average per-thread data fill bandwidth to the= L1 data cache [GB / sec]", - "MetricExpr": "64 * cpu_core@L1D.REPLACEMENT@ / 1e9 / tma_info_sys= tem_time", + "MetricExpr": "64 * cpu_core@L1D.L1_REPLACEMENT@ / 1e9 / tma_info_= system_time", "MetricGroup": "Mem;MemoryBW", "MetricName": "tma_info_memory_l1d_cache_fill_bw", "Unit": "cpu_core" @@ -1706,6 +1684,13 @@ "MetricName": "tma_info_memory_l1dl0_cache_fill_bw", "Unit": "cpu_core" }, + { + "BriefDescription": "L0 cache true misses per kilo instruction for= retired demand loads", + "MetricExpr": "1e3 * (cpu_core@MEM_LOAD_RETIRED.L1_MISS@ + cpu_cor= e@MEM_LOAD_RETIRED.L1_HIT_L1@) / cpu_core@INST_RETIRED.ANY@", + "MetricGroup": "CacheHits;Mem", + "MetricName": "tma_info_memory_l1dl0_mpki", + "Unit": "cpu_core" + }, { "BriefDescription": "L1 cache true misses per kilo instruction for= retired demand loads", "MetricExpr": "1e3 * cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / cpu_core= @INST_RETIRED.ANY@", @@ -1921,6 +1906,13 @@ "MetricName": "tma_info_pipeline_fetch_mite", "Unit": "cpu_core" }, + { + "BriefDescription": "Average number of uops fetched from MS per cy= cle", + "MetricExpr": "cpu_core@IDQ.MS_UOPS@ / cpu_core@IDQ.MS_UOPS\\,cmas= k\\=3D1@", + "MetricGroup": "Fed;FetchLat;MicroSeq", + "MetricName": "tma_info_pipeline_fetch_ms", + "Unit": "cpu_core" + }, { "BriefDescription": "Instructions per a microcode Assist invocatio= n", "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@ASSISTS.ANY@", @@ -1955,7 +1947,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc\\,cpu= =3Dcpu_core@ / 1e9 / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency", "Unit": "cpu_core" @@ -1969,14 +1961,22 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.REF_TSC@ / TSC", + "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.REF_TSC@ / msr@tsc\\,cpu= =3Dcpu_core@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized", "Unit": "cpu_core" }, + { + "BriefDescription": "Average external Memory Bandwidth Use for rea= ds and writes [GB / sec]", + "MetricExpr": "32 * UNC_M_TOTAL_DATA / 1e9 / tma_info_system_time", + "MetricGroup": "HPC;MemOffcore;MemoryBW;SoC;tma_issueBW", + "MetricName": "tma_info_system_dram_bw_use", + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_data_cache_memor= y_bandwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full", + "Unit": "cpu_core" + }, { "BriefDescription": "Giga Floating Point Operations Per Second", - "MetricExpr": "(cpu_core@FP_ARITH_INST_RETIRED.SCALAR@ + 2 * cpu_c= ore@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + 4 * cpu_core@FP_ARITH_INST_= RETIRED.4_FLOPS@ + 8 * cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@) = / 1e9 / tma_info_system_time", + "MetricExpr": "(cpu_core@FP_ARITH_OPS_RETIRED.SCALAR@ + 2 * cpu_co= re@FP_ARITH_OPS_RETIRED.128B_PACKED_DOUBLE@ + 4 * cpu_core@FP_ARITH_OPS_RET= IRED.4_FLOPS@ + 8 * cpu_core@FP_ARITH_OPS_RETIRED.256B_PACKED_SINGLE@) / 1e= 9 / tma_info_system_time", "MetricGroup": "Cor;Flops;HPC", "MetricName": "tma_info_system_gflops", "PublicDescription": "Giga Floating Point Operations Per Second. A= ggregate across all supported options of: FP precisions, scalar and vector = instructions, vector-width", @@ -2020,6 +2020,13 @@ "MetricName": "tma_info_system_power", "Unit": "cpu_core" }, + { + "BriefDescription": "Socket actual clocks when any core is active = on that socket", + "MetricExpr": "UNC_CLOCK.SOCKET", + "MetricGroup": "SoC", + "MetricName": "tma_info_system_socket_clks", + "Unit": "cpu_core" + }, { "BriefDescription": "Run duration time in seconds", "MetricExpr": "duration_time", @@ -2035,6 +2042,13 @@ "MetricName": "tma_info_system_turbo_utilization", "Unit": "cpu_core" }, + { + "BriefDescription": "Measured Average Uncore Frequency for the SoC= [GHz]", + "MetricExpr": "tma_info_system_socket_clks / 1e9 / tma_info_system= _time", + "MetricGroup": "SoC", + "MetricName": "tma_info_system_uncore_frequency", + "Unit": "cpu_core" + }, { "BriefDescription": "Per-Logical Processor actual clocks when the = Logical Processor is active.", "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.THREAD@", @@ -2156,12 +2170,12 @@ "Unit": "cpu_core" }, { - "BriefDescription": "This metric([SKL+] roughly; [LNL]) estimates = fraction of cycles with demand load accesses that hit the L1D cache", - "MetricExpr": "4 * cpu_core@DEPENDENT_LOADS.ANY@ / tma_info_thread= _clks", + "BriefDescription": "This metric ([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache", + "MetricExpr": "4 * cpu_core@DEPENDENT_LOADS.ANY\\,cmask\\=3D1@ / t= ma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound= _group", "MetricName": "tma_l1_latency_dependency", "MetricThreshold": "tma_l1_latency_dependency > 0.1 & (tma_l1_boun= d > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache. The s= hort latency of the L1D cache may be exposed in pointer-chasing memory acce= ss patterns as an example. Sample with: MEM_LOAD_UOPS_RETIRED.L1_HIT_PS", + "PublicDescription": "This metric ([SKL+] roughly; [LNL]) estimate= s fraction of cycles with demand load accesses that hit the L1D cache. The = short latency of the L1D cache may be exposed in pointer-chasing memory acc= ess patterns as an example. Sample with: MEM_LOAD_UOPS_RETIRED.L1_HIT_PS", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -2177,7 +2191,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles wit= h demand load accesses that hit the L2 cache under unloaded scenarios (poss= ibly L2 latency limited)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "cpu_core@MEM_LOAD_RETIRED.L2_HIT@ * min(cpu_core@ME= M_LOAD_RETIRED.L2_HIT@R, 3 * tma_info_system_core_frequency) * (1 + cpu_cor= e@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2) / tma_= info_thread_clks", "MetricGroup": "MemoryLat;TopdownL4;tma_L4_group;tma_l2_bound_grou= p", "MetricName": "tma_l2_hit_latency", @@ -2198,12 +2211,11 @@ }, { "BriefDescription": "This metric estimates fraction of cycles with= demand load accesses that hit the L3 cache under unloaded scenarios (possi= bly L3 latency limited)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "cpu_core@MEM_LOAD_RETIRED.L3_HIT@ * min(cpu_core@ME= M_LOAD_RETIRED.L3_HIT@R, 9 * tma_info_system_core_frequency) * (1 + cpu_cor= e@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2) / tma_= info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat= ;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_cache_memory_latency, tma_mem_latency", + "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_data_cache_memory_latency, tma_mem_latency", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -2285,6 +2297,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU spent handling cache misses due to lock operations", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "cpu_core@MEM_INST_RETIRED.LOCK_LOADS@ * cpu_core@ME= M_INST_RETIRED.LOCK_LOADS@R / tma_info_thread_clks", "MetricGroup": "LockCont;Offcore;TopdownL4;tma_L4_group;tma_issueR= FO;tma_l1_bound_group", "MetricName": "tma_lock_latency", @@ -2295,7 +2308,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to LSD (Loop Stream Detector) unit", - "MetricExpr": "cpu_core@LSD.UOPS\\,cmask\\=3D0x8\\,inv\\=3D0x1@ / = tma_info_thread_clks", + "MetricExpr": "cpu_core@LSD.UOPS\\,cmask\\=3D0x8\\,inv\\=3D0x1@ / = tma_info_thread_clks / 2", "MetricGroup": "FetchBW;LSD;TopdownL3;tma_L3_group;tma_fetch_bandw= idth_group", "MetricName": "tma_lsd", "MetricThreshold": "tma_lsd > 0.15 & tma_fetch_bandwidth > 0.2", @@ -2320,7 +2333,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_d= ram_bound_group;tma_issueBW", "MetricName": "tma_mem_bandwidth", "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_cache_memory_bandwidth,= tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_data_cache_memory_bandw= idth, tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -2330,13 +2343,13 @@ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_= dram_bound_group;tma_issueLat", "MetricName": "tma_mem_latency", "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_cache_memory_latency, tma_l3_hit_latency= ", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_data_cache_memory_latency, tma_l3_hit_la= tency", "ScaleUnit": "100%", "Unit": "cpu_core" }, { "BriefDescription": "This metric represents fraction of slots the = Memory subsystem within the Backend was a bottleneck", - "MetricExpr": "cpu_core@topdown\\-mem\\-bound@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-mem\\-bound@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", @@ -2347,7 +2360,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to LFENCE Instructions.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * cpu_core@MISC2_RETIRED.LFENCE@ / tma_info_thre= ad_clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_serializing_operation_g= roup", "MetricName": "tma_memory_fence", @@ -2386,7 +2398,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to the MITE pipeline (the legacy deco= de pipeline)", - "MetricExpr": "(cpu_core@IDQ.MITE_UOPS\\,cmask\\=3D0x8\\,inv\\=3D0= x1@ / 2 + cpu_core@IDQ.MITE_UOPS@ / (cpu_core@IDQ.DSB_UOPS@ + cpu_core@IDQ.= MITE_UOPS@) * (cpu_core@IDQ_BUBBLES.CYCLES_0_UOPS_DELIV.CORE@ - cpu_core@ID= Q_BUBBLES.FETCH_LATENCY@)) / tma_info_thread_clks", + "MetricExpr": "(cpu_core@IDQ.MITE_UOPS\\,cmask\\=3D0x8\\,inv\\=3D0= x1@ / 2 + cpu_core@IDQ.MITE_UOPS@ / (cpu_core@IDQ.DSB_UOPS@ + cpu_core@IDQ.= MITE_UOPS@) * (cpu_core@IDQ_BUBBLES.STARVATION_CYCLES@ - cpu_core@IDQ_BUBBL= ES.FETCH_LATENCY@)) / tma_info_thread_clks", "MetricGroup": "DSBmiss;FetchBW;TopdownL3;tma_L3_group;tma_fetch_b= andwidth_group", "MetricName": "tma_mite", "MetricThreshold": "tma_mite > 0.1 & tma_fetch_bandwidth > 0.2", @@ -2406,7 +2418,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to the Microcode Sequencer (MS) unit = - see Microcode_Sequencer node for details.", - "MetricExpr": "cpu_core@IDQ.MS_CYCLES_ANY@ / tma_info_thread_clks", + "MetricExpr": "cpu_core@IDQ.MS_CYCLES_ANY@ / tma_info_thread_clks = / 1.8", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_fetch_bandwidt= h_group", "MetricName": "tma_ms", "MetricThreshold": "tma_ms > 0.05 & tma_fetch_bandwidth > 0.2", @@ -2445,7 +2457,8 @@ }, { "BriefDescription": "This metric represents the remaining light uo= ps fraction the CPU has executed - remaining means not covered by other sib= ling nodes", - "MetricExpr": "max(0, tma_light_operations - (tma_x87_use + (cpu_c= ore@FP_ARITH_INST_RETIRED.SCALAR@ + cpu_core@FP_ARITH_INST_RETIRED.VECTOR@)= / (tma_retiring * tma_info_thread_slots) + (cpu_core@INT_VEC_RETIRED.ADD_1= 28@ + cpu_core@INT_VEC_RETIRED.VNNI_128@ + cpu_core@INT_VEC_RETIRED.ADD_256= @ + cpu_core@INT_VEC_RETIRED.MUL_256@ + cpu_core@INT_VEC_RETIRED.VNNI_256@)= / (tma_retiring * tma_info_thread_slots) + tma_memory_operations + tma_fus= ed_instructions + tma_non_fused_branches))", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "max(0, tma_light_operations - (tma_x87_use + (cpu_c= ore@FP_ARITH_OPS_RETIRED.SCALAR@ + cpu_core@FP_ARITH_OPS_RETIRED.VECTOR@) /= (tma_retiring * tma_info_thread_slots) + (cpu_core@INT_VEC_RETIRED.ADD_128= @ + cpu_core@INT_VEC_RETIRED.VNNI_128@ + cpu_core@INT_VEC_RETIRED.ADD_256@ = + cpu_core@INT_VEC_RETIRED.MUL_256@ + cpu_core@INT_VEC_RETIRED.VNNI_256@) /= (tma_retiring * tma_info_thread_slots) + tma_memory_operations + tma_fused= _instructions + tma_non_fused_branches))", "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", "MetricName": "tma_other_light_ops", "MetricThreshold": "tma_other_light_ops > 0.3 & tma_light_operatio= ns > 0.6", @@ -2483,6 +2496,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU performance was potentially limited due to Core computation issues (non= divider-related)", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "((cpu_core@EXE_ACTIVITY.EXE_BOUND_0_PORTS@ + (cpu_c= ore@EXE_ACTIVITY.1_PORTS_UTIL@ + tma_retiring * cpu_core@EXE_ACTIVITY.2_3_P= ORTS_UTIL@)) / tma_info_thread_clks if cpu_core@ARITH.DIV_ACTIVE@ < cpu_cor= e@CYCLE_ACTIVITY.STALLS_TOTAL@ - cpu_core@EXE_ACTIVITY.BOUND_ON_LOADS@ else= (cpu_core@EXE_ACTIVITY.1_PORTS_UTIL@ + tma_retiring * cpu_core@EXE_ACTIVIT= Y.2_3_PORTS_UTIL@) / tma_info_thread_clks)", "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_gr= oup", "MetricName": "tma_ports_utilization", @@ -2493,6 +2507,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed no uops on any execution port (Logical Processor cycles since ICL= , Physical Core cycles otherwise)", + "MetricConstraint": "NO_THRESHOLD_AND_NMI", "MetricExpr": "cpu_core@EXE_ACTIVITY.EXE_BOUND_0_PORTS@ / tma_info= _thread_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_0", @@ -2503,6 +2518,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", + "MetricConstraint": "NO_THRESHOLD_AND_NMI", "MetricExpr": "cpu_core@EXE_ACTIVITY.1_PORTS_UTIL@ / tma_info_thre= ad_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_1", @@ -2513,7 +2529,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "cpu_core@EXE_ACTIVITY.2_PORTS_UTIL@ / tma_info_thre= ad_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", @@ -2524,7 +2539,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 3 or more uops per cycle on all execution ports (Logical= Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "cpu_core@UOPS_EXECUTED.CYCLES_GE_3@ / tma_info_thre= ad_clks", "MetricGroup": "BvCB;PortsUtil;TopdownL4;tma_L4_group;tma_ports_ut= ilization_group", "MetricName": "tma_ports_utilized_3m", @@ -2545,7 +2559,7 @@ { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "cpu_core@topdown\\-retiring@ / (cpu_core@topdown\\-= fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@= + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-retiring@ / (cpu_core@topdown\\-= fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@= + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "BvUW;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", @@ -2576,7 +2590,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to PAUSE Instructions", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.PAUSE@ / tma_info_thread_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_serializing_operation_g= roup", "MetricName": "tma_slow_pause", @@ -2611,7 +2624,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_i= ssueBW;tma_l3_bound_group", "MetricName": "tma_sq_full", "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, tma_m= em_bandwidth", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_data_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, = tma_mem_bandwidth", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -2625,6 +2638,15 @@ "ScaleUnit": "100%", "Unit": "cpu_core" }, + { + "BriefDescription": "This metric estimates clocks wasted due to lo= ads blocked due to unknown store address (did not do memory disambiguation)= or due to unknown store data", + "MetricExpr": "7 * cpu_core@LD_BLOCKS.STORE_EARLY\\,cmask\\=3D1@ /= tma_info_thread_clks", + "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", + "MetricName": "tma_store_early_blk", + "MetricThreshold": "tma_store_early_blk > 0.2", + "ScaleUnit": "100%", + "Unit": "cpu_core" + }, { "BriefDescription": "This metric roughly estimates fraction of cyc= les when the memory subsystem had loads blocked since they could not forwar= d data from earlier (in program order) overlapping stores", "MetricExpr": "13 * cpu_core@LD_BLOCKS.STORE_FORWARD@ / tma_info_t= hread_clks", diff --git a/tools/perf/pmu-events/arch/x86/lunarlake/memory.json b/tools/p= erf/pmu-events/arch/x86/lunarlake/memory.json index 8021a1c7dd3b..25021cb76f61 100644 --- a/tools/perf/pmu-events/arch/x86/lunarlake/memory.json +++ b/tools/perf/pmu-events/arch/x86/lunarlake/memory.json @@ -163,7 +163,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_1024", "MSRIndex": "0x3F6", "MSRValue": "0x400", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 1024 cycles. Reporte= d latency may be longer than just the memory latency. Available PDIST count= ers: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 1024 cycles. Reporte= d latency may be longer than just the memory latency.", "SampleAfterValue": "53", "UMask": "0x1", "Unit": "cpu_core" @@ -176,7 +176,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_128", "MSRIndex": "0x3F6", "MSRValue": "0x80", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 128 cycles. Reported= latency may be longer than just the memory latency. Available PDIST counte= rs: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 128 cycles. Reported= latency may be longer than just the memory latency.", "SampleAfterValue": "1009", "UMask": "0x1", "Unit": "cpu_core" @@ -189,7 +189,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_16", "MSRIndex": "0x3F6", "MSRValue": "0x10", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 16 cycles. Reported = latency may be longer than just the memory latency. Available PDIST counter= s: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 16 cycles. Reported = latency may be longer than just the memory latency.", "SampleAfterValue": "20011", "UMask": "0x1", "Unit": "cpu_core" @@ -202,7 +202,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_2048", "MSRIndex": "0x3F6", "MSRValue": "0x800", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 2048 cycles. Reporte= d latency may be longer than just the memory latency. Available PDIST count= ers: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 2048 cycles. Reporte= d latency may be longer than just the memory latency.", "SampleAfterValue": "23", "UMask": "0x1", "Unit": "cpu_core" @@ -215,7 +215,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_256", "MSRIndex": "0x3F6", "MSRValue": "0x100", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 256 cycles. Reported= latency may be longer than just the memory latency. Available PDIST counte= rs: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 256 cycles. Reported= latency may be longer than just the memory latency.", "SampleAfterValue": "503", "UMask": "0x1", "Unit": "cpu_core" @@ -228,7 +228,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_32", "MSRIndex": "0x3F6", "MSRValue": "0x20", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 32 cycles. Reported = latency may be longer than just the memory latency. Available PDIST counter= s: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 32 cycles. Reported = latency may be longer than just the memory latency.", "SampleAfterValue": "100007", "UMask": "0x1", "Unit": "cpu_core" @@ -241,7 +241,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_4", "MSRIndex": "0x3F6", "MSRValue": "0x4", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 4 cycles. Reported l= atency may be longer than just the memory latency. Available PDIST counters= : 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 4 cycles. Reported l= atency may be longer than just the memory latency.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -254,7 +254,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_512", "MSRIndex": "0x3F6", "MSRValue": "0x200", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 512 cycles. Reported= latency may be longer than just the memory latency. Available PDIST counte= rs: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 512 cycles. Reported= latency may be longer than just the memory latency.", "SampleAfterValue": "101", "UMask": "0x1", "Unit": "cpu_core" @@ -267,7 +267,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_64", "MSRIndex": "0x3F6", "MSRValue": "0x40", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 64 cycles. Reported = latency may be longer than just the memory latency. Available PDIST counter= s: 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 64 cycles. Reported = latency may be longer than just the memory latency.", "SampleAfterValue": "2003", "UMask": "0x1", "Unit": "cpu_core" @@ -280,7 +280,7 @@ "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_8", "MSRIndex": "0x3F6", "MSRValue": "0x8", - "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 8 cycles. Reported l= atency may be longer than just the memory latency. Available PDIST counters= : 0", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 8 cycles. Reported l= atency may be longer than just the memory latency.", "SampleAfterValue": "50021", "UMask": "0x1", "Unit": "cpu_core" @@ -291,7 +291,7 @@ "Data_LA": "1", "EventCode": "0xcd", "EventName": "MEM_TRANS_RETIRED.STORE_SAMPLE", - "PublicDescription": "Counts Retired memory accesses with at least= 1 store operation. This PEBS event is the precisely-distributed (PDist) tr= igger covering all stores uops for sampling by the PEBS Store Latency Facil= ity. The facility is described in Intel SDM Volume 3 section 19.9.8 Availab= le PDIST counters: 0", + "PublicDescription": "Counts Retired memory accesses with at least= 1 store operation. This PEBS event is the precisely-distributed (PDist) tr= igger covering all stores uops for sampling by the PEBS Store Latency Facil= ity. The facility is described in Intel SDM Volume 3 section 19.9.8 Availab= le PDIST counters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/lunarlake/pipeline.json b/tools= /perf/pmu-events/arch/x86/lunarlake/pipeline.json index 6ac410510628..cdaa01e9a57d 100644 --- a/tools/perf/pmu-events/arch/x86/lunarlake/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/lunarlake/pipeline.json @@ -110,7 +110,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.ALL_BRANCHES", - "PublicDescription": "Counts all branch instructions retired. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts all branch instructions retired. Avai= lable PDIST counters: 0,1", "SampleAfterValue": "400009", "Unit": "cpu_core" }, @@ -128,7 +128,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.COND", - "PublicDescription": "Counts conditional branch instructions retir= ed. Available PDIST counters: 0", + "PublicDescription": "Counts conditional branch instructions retir= ed. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x111", "Unit": "cpu_core" @@ -147,7 +147,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.COND_NTAKEN", - "PublicDescription": "Counts not taken branch instructions retired= . Available PDIST counters: 0", + "PublicDescription": "Counts not taken branch instructions retired= . Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x10", "Unit": "cpu_core" @@ -166,7 +166,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.COND_TAKEN", - "PublicDescription": "Counts taken conditional branch instructions= retired. Available PDIST counters: 0", + "PublicDescription": "Counts taken conditional branch instructions= retired. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x101", "Unit": "cpu_core" @@ -176,7 +176,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.COND_TAKEN_BWD", - "PublicDescription": "Counts taken backward conditional branch ins= tructions retired. Available PDIST counters: 0", + "PublicDescription": "Counts taken backward conditional branch ins= tructions retired. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x1", "Unit": "cpu_core" @@ -186,7 +186,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.COND_TAKEN_FWD", - "PublicDescription": "Counts taken forward conditional branch inst= ructions retired. Available PDIST counters: 0", + "PublicDescription": "Counts taken forward conditional branch inst= ructions retired. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x102", "Unit": "cpu_core" @@ -205,7 +205,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.FAR_BRANCH", - "PublicDescription": "Counts far branch instructions retired. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts far branch instructions retired. Avai= lable PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x40", "Unit": "cpu_core" @@ -224,7 +224,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.INDIRECT", - "PublicDescription": "Counts near indirect branch instructions ret= ired excluding returns. TSX abort is an indirect branch. Available PDIST co= unters: 0", + "PublicDescription": "Counts near indirect branch instructions ret= ired excluding returns. TSX abort is an indirect branch. Available PDIST co= unters: 0,1", "SampleAfterValue": "100003", "UMask": "0x80", "Unit": "cpu_core" @@ -261,7 +261,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.NEAR_CALL", - "PublicDescription": "Counts both direct and indirect near call in= structions retired. Available PDIST counters: 0", + "PublicDescription": "Counts both direct and indirect near call in= structions retired. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x2", "Unit": "cpu_core" @@ -280,13 +280,13 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.NEAR_RETURN", - "PublicDescription": "Counts return instructions retired. Availabl= e PDIST counters: 0", + "PublicDescription": "Counts return instructions retired. Availabl= e PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x8", "Unit": "cpu_core" }, { - "BriefDescription": "Counts the number of taken branch instruction= s retired", + "BriefDescription": "Counts the number of near taken branch instru= ctions retired", "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.NEAR_TAKEN", @@ -299,7 +299,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.NEAR_TAKEN", - "PublicDescription": "Counts taken branch instructions retired. Av= ailable PDIST counters: 0", + "PublicDescription": "Counts taken branch instructions retired. Av= ailable PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x20", "Unit": "cpu_core" @@ -327,7 +327,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.ALL_BRANCHES", - "PublicDescription": "Counts all the retired branch instructions t= hat were mispredicted by the processor. A branch misprediction occurs when = the processor incorrectly predicts the destination of the branch. When the= misprediction is discovered at execution, all the instructions executed in= the wrong (speculative) path must be discarded, and the processor must sta= rt fetching from the correct path. Available PDIST counters: 0", + "PublicDescription": "Counts all the retired branch instructions t= hat were mispredicted by the processor. A branch misprediction occurs when = the processor incorrectly predicts the destination of the branch. When the= misprediction is discovered at execution, all the instructions executed in= the wrong (speculative) path must be discarded, and the processor must sta= rt fetching from the correct path. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "Unit": "cpu_core" }, @@ -336,7 +336,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.ALL_BRANCHES_COST", - "PublicDescription": "All mispredicted branch instructions retired= . This precise event may be used to get the misprediction cost via the Reti= re_Latency field of PEBS. It fires on the instruction that immediately foll= ows the mispredicted branch. Available PDIST counters: 0", + "PublicDescription": "All mispredicted branch instructions retired= . This precise event may be used to get the misprediction cost via the Reti= re_Latency field of PEBS. It fires on the instruction that immediately foll= ows the mispredicted branch. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x44", "Unit": "cpu_core" @@ -355,7 +355,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND", - "PublicDescription": "Counts mispredicted conditional branch instr= uctions retired. Available PDIST counters: 0", + "PublicDescription": "Counts mispredicted conditional branch instr= uctions retired. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x111", "Unit": "cpu_core" @@ -365,7 +365,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_COST", - "PublicDescription": "Mispredicted conditional branch instructions= retired. This precise event may be used to get the misprediction cost via = the Retire_Latency field of PEBS. It fires on the instruction that immediat= ely follows the mispredicted branch. Available PDIST counters: 0", + "PublicDescription": "Mispredicted conditional branch instructions= retired. This precise event may be used to get the misprediction cost via = the Retire_Latency field of PEBS. It fires on the instruction that immediat= ely follows the mispredicted branch. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x151", "Unit": "cpu_core" @@ -384,7 +384,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_NTAKEN", - "PublicDescription": "Counts the number of conditional branch inst= ructions retired that were mispredicted and the branch direction was not ta= ken. Available PDIST counters: 0", + "PublicDescription": "Counts the number of conditional branch inst= ructions retired that were mispredicted and the branch direction was not ta= ken. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x10", "Unit": "cpu_core" @@ -394,7 +394,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_NTAKEN_COST", - "PublicDescription": "Mispredicted non-taken conditional branch in= structions retired. This precise event may be used to get the misprediction= cost via the Retire_Latency field of PEBS. It fires on the instruction tha= t immediately follows the mispredicted branch. Available PDIST counters: 0", + "PublicDescription": "Mispredicted non-taken conditional branch in= structions retired. This precise event may be used to get the misprediction= cost via the Retire_Latency field of PEBS. It fires on the instruction tha= t immediately follows the mispredicted branch. Available PDIST counters: 0,= 1", "SampleAfterValue": "400009", "UMask": "0x50", "Unit": "cpu_core" @@ -413,7 +413,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_TAKEN", - "PublicDescription": "Counts taken conditional mispredicted branch= instructions retired. Available PDIST counters: 0", + "PublicDescription": "Counts taken conditional mispredicted branch= instructions retired. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x101", "Unit": "cpu_core" @@ -423,7 +423,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_TAKEN_BWD", - "PublicDescription": "Counts taken backward conditional mispredict= ed branch instructions retired. Available PDIST counters: 0", + "PublicDescription": "Counts taken backward conditional mispredict= ed branch instructions retired. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x1", "Unit": "cpu_core" @@ -433,7 +433,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_TAKEN_BWD_COST", - "PublicDescription": "number of branch instructions retired that w= ere mispredicted and taken backward. This precise event may be used to get = the misprediction cost via the Retire_Latency field of PEBS. It fires on th= e instruction that immediately follows the mispredicted branch. Available P= DIST counters: 0", + "PublicDescription": "number of branch instructions retired that w= ere mispredicted and taken backward. This precise event may be used to get = the misprediction cost via the Retire_Latency field of PEBS. It fires on th= e instruction that immediately follows the mispredicted branch. Available P= DIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x8001", "Unit": "cpu_core" @@ -443,7 +443,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_TAKEN_COST", - "PublicDescription": "Mispredicted taken conditional branch instru= ctions retired. This precise event may be used to get the misprediction cos= t via the Retire_Latency field of PEBS. It fires on the instruction that im= mediately follows the mispredicted branch. Available PDIST counters: 0", + "PublicDescription": "Mispredicted taken conditional branch instru= ctions retired. This precise event may be used to get the misprediction cos= t via the Retire_Latency field of PEBS. It fires on the instruction that im= mediately follows the mispredicted branch. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x141", "Unit": "cpu_core" @@ -453,7 +453,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_TAKEN_FWD", - "PublicDescription": "Counts taken forward conditional mispredicte= d branch instructions retired. Available PDIST counters: 0", + "PublicDescription": "Counts taken forward conditional mispredicte= d branch instructions retired. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "Unit": "cpu_core" }, @@ -462,7 +462,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.COND_TAKEN_FWD_COST", - "PublicDescription": "number of branch instructions retired that w= ere mispredicted and taken forward. This precise event may be used to get t= he misprediction cost via the Retire_Latency field of PEBS. It fires on the= instruction that immediately follows the mispredicted branch. Available PD= IST counters: 0", + "PublicDescription": "number of branch instructions retired that w= ere mispredicted and taken forward. This precise event may be used to get t= he misprediction cost via the Retire_Latency field of PEBS. It fires on the= instruction that immediately follows the mispredicted branch. Available PD= IST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x8002", "Unit": "cpu_core" @@ -481,7 +481,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.INDIRECT", - "PublicDescription": "Counts miss-predicted near indirect branch i= nstructions retired excluding returns. TSX abort is an indirect branch. Ava= ilable PDIST counters: 0", + "PublicDescription": "Counts miss-predicted near indirect branch i= nstructions retired excluding returns. TSX abort is an indirect branch. Ava= ilable PDIST counters: 0,1", "SampleAfterValue": "100003", "UMask": "0x80", "Unit": "cpu_core" @@ -500,7 +500,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.INDIRECT_CALL", - "PublicDescription": "Counts retired mispredicted indirect (near t= aken) CALL instructions, including both register and memory indirect. Avail= able PDIST counters: 0", + "PublicDescription": "Counts retired mispredicted indirect (near t= aken) CALL instructions, including both register and memory indirect. Avail= able PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x2", "Unit": "cpu_core" @@ -510,7 +510,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.INDIRECT_CALL_COST", - "PublicDescription": "Mispredicted indirect CALL retired. This pre= cise event may be used to get the misprediction cost via the Retire_Latency= field of PEBS. It fires on the instruction that immediately follows the mi= spredicted branch. Available PDIST counters: 0", + "PublicDescription": "Mispredicted indirect CALL retired. This pre= cise event may be used to get the misprediction cost via the Retire_Latency= field of PEBS. It fires on the instruction that immediately follows the mi= spredicted branch. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x42", "Unit": "cpu_core" @@ -520,7 +520,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.INDIRECT_COST", - "PublicDescription": "Mispredicted near indirect branch instructio= ns retired (excluding returns). This precise event may be used to get the m= isprediction cost via the Retire_Latency field of PEBS. It fires on the ins= truction that immediately follows the mispredicted branch. Available PDIST = counters: 0", + "PublicDescription": "Mispredicted near indirect branch instructio= ns retired (excluding returns). This precise event may be used to get the m= isprediction cost via the Retire_Latency field of PEBS. It fires on the ins= truction that immediately follows the mispredicted branch. Available PDIST = counters: 0,1", "SampleAfterValue": "100003", "UMask": "0xc0", "Unit": "cpu_core" @@ -548,7 +548,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.NEAR_TAKEN", - "PublicDescription": "Counts number of near branch instructions re= tired that were mispredicted and taken. Available PDIST counters: 0", + "PublicDescription": "Counts number of near branch instructions re= tired that were mispredicted and taken. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x20", "Unit": "cpu_core" @@ -558,7 +558,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.NEAR_TAKEN_COST", - "PublicDescription": "Mispredicted taken near branch instructions = retired. This precise event may be used to get the misprediction cost via t= he Retire_Latency field of PEBS. It fires on the instruction that immediate= ly follows the mispredicted branch. Available PDIST counters: 0", + "PublicDescription": "Mispredicted taken near branch instructions = retired. This precise event may be used to get the misprediction cost via t= he Retire_Latency field of PEBS. It fires on the instruction that immediate= ly follows the mispredicted branch. Available PDIST counters: 0,1", "SampleAfterValue": "400009", "UMask": "0x60", "Unit": "cpu_core" @@ -568,7 +568,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.RET", - "PublicDescription": "This is a non-precise version (that is, does= not use PEBS) of the event that counts mispredicted return instructions re= tired. Available PDIST counters: 0", + "PublicDescription": "This is a non-precise version (that is, does= not use PEBS) of the event that counts mispredicted return instructions re= tired. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x8", "Unit": "cpu_core" @@ -587,7 +587,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc5", "EventName": "BR_MISP_RETIRED.RET_COST", - "PublicDescription": "Mispredicted ret instructions retired. This = precise event may be used to get the misprediction cost via the Retire_Late= ncy field of PEBS. It fires on the instruction that immediately follows the= mispredicted branch. Available PDIST counters: 0", + "PublicDescription": "Mispredicted ret instructions retired. This = precise event may be used to get the misprediction cost via the Retire_Late= ncy field of PEBS. It fires on the instruction that immediately follows the= mispredicted branch. Available PDIST counters: 0,1", "SampleAfterValue": "100007", "UMask": "0x48", "Unit": "cpu_core" @@ -906,7 +906,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc0", "EventName": "INST_RETIRED.ANY_P", - "PublicDescription": "Counts the number of X86 instructions retire= d - an Architectural PerfMon event. Counting continues during hardware inte= rrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is co= unted by a designated fixed counter freeing up programmable counters to cou= nt other events. INST_RETIRED.ANY_P is counted by a programmable counter. A= vailable PDIST counters: 0", + "PublicDescription": "Counts the number of X86 instructions retire= d - an Architectural PerfMon event. Counting continues during hardware inte= rrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is co= unted by a designated fixed counter freeing up programmable counters to cou= nt other events. INST_RETIRED.ANY_P is counted by a programmable counter. A= vailable PDIST counters: 0,1", "SampleAfterValue": "2000003", "Unit": "cpu_core" }, @@ -915,7 +915,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc0", "EventName": "INST_RETIRED.BR_FUSED", - "PublicDescription": "retired macro-fused uops when there is a bra= nch in the macro-fused pair (the two instructions that got macro-fused coun= t once in this pmon) Available PDIST counters: 0", + "PublicDescription": "retired macro-fused uops when there is a bra= nch in the macro-fused pair (the two instructions that got macro-fused coun= t once in this pmon) Available PDIST counters: 0,1", "SampleAfterValue": "1000003", "UMask": "0x10", "Unit": "cpu_core" @@ -925,7 +925,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc0", "EventName": "INST_RETIRED.MACRO_FUSED", - "PublicDescription": "INST_RETIRED.MACRO_FUSED Available PDIST cou= nters: 0", + "PublicDescription": "INST_RETIRED.MACRO_FUSED Available PDIST cou= nters: 0,1", "SampleAfterValue": "2000003", "UMask": "0x30", "Unit": "cpu_core" @@ -935,7 +935,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc0", "EventName": "INST_RETIRED.NOP", - "PublicDescription": "Counts all retired NOP or ENDBR32/64 or PREF= ETCHIT0/1 instructions Available PDIST counters: 0", + "PublicDescription": "Counts all retired NOP or ENDBR32/64 or PREF= ETCHIT0/1 instructions Available PDIST counters: 0,1", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -954,7 +954,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xc0", "EventName": "INST_RETIRED.REP_ITERATION", - "PublicDescription": "Number of iterations of Repeat (REP) string = retired instructions such as MOVS, CMPS, and SCAS. Each has a byte, word, a= nd doubleword version and string instructions can be repeated using a repet= ition prefix, REP, that allows their architectural execution to be repeated= a number of times as specified by the RCX register. Note the number of ite= rations is implementation-dependent. Available PDIST counters: 0", + "PublicDescription": "Number of iterations of Repeat (REP) string = retired instructions such as MOVS, CMPS, and SCAS. Each has a byte, word, a= nd doubleword version and string instructions can be repeated using a repet= ition prefix, REP, that allows their architectural execution to be repeated= a number of times as specified by the RCX register. Note the number of ite= rations is implementation-dependent. Available PDIST counters: 0,1", "SampleAfterValue": "2000003", "UMask": "0x8", "Unit": "cpu_core" @@ -1227,6 +1227,15 @@ "UMask": "0x88", "Unit": "cpu_core" }, + { + "BriefDescription": "Counts the number of times a load got early b= locked due to preceding store operation with unknown address or unknown dat= a. Excluding in-line (immediate) wakeups", + "Counter": "0,1,2,3,4,5,6,7,8,9", + "EventCode": "0x03", + "EventName": "LD_BLOCKS.STORE_EARLY", + "SampleAfterValue": "100003", + "UMask": "0xa1", + "Unit": "cpu_core" + }, { "BriefDescription": "Counts the number of occurrences a retired lo= ad gets blocked because its address partially overlaps with an older store = (size mismatch) - unknown_sta/bad_forward", "Counter": "0,1,2,3,4,5,6,7", @@ -1451,7 +1460,7 @@ "Counter": "0,1,2,3,4,5,6,7,8,9", "EventCode": "0xe4", "EventName": "MISC_RETIRED.LBR_INSERTS", - "PublicDescription": "LBR record is inserted Available PDIST count= ers: 0", + "PublicDescription": "LBR record is inserted Available PDIST count= ers: 0,1", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/lunarlake/uncore-interconnect.j= son b/tools/perf/pmu-events/arch/x86/lunarlake/uncore-interconnect.json new file mode 100644 index 000000000000..69ef928d57f6 --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/lunarlake/uncore-interconnect.json @@ -0,0 +1,10 @@ +[ + { + "BriefDescription": "This 48-bit fixed counter counts the UCLK cyc= les.", + "Counter": "FIXED", + "EventCode": "0xff", + "EventName": "UNC_CLOCK.SOCKET", + "PerPkg": "1", + "Unit": "SANTA" + } +] diff --git a/tools/perf/pmu-events/arch/x86/lunarlake/uncore-memory.json b/= tools/perf/pmu-events/arch/x86/lunarlake/uncore-memory.json index 7d63580302de..63c4aa2791e4 100644 --- a/tools/perf/pmu-events/arch/x86/lunarlake/uncore-memory.json +++ b/tools/perf/pmu-events/arch/x86/lunarlake/uncore-memory.json @@ -32,5 +32,13 @@ "Experimental": "1", "PerPkg": "1", "Unit": "iMC" + }, + { + "BriefDescription": "Total number of read and write byte transfers= to/from DRAM, in 32B chunk, per DDR channel. Counter increments by 1 after= sending or receiving 32B chunk data.", + "Counter": "0,1,2,3,4", + "EventCode": "0x3C", + "EventName": "UNC_M_TOTAL_DATA", + "PerPkg": "1", + "Unit": "iMC" } ] diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-ev= ents/arch/x86/mapfile.csv index 2d7a9fa055dd..894a3f485a4a 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -22,7 +22,7 @@ GenuineIntel-6-3A,v24,ivybridge,core GenuineIntel-6-3E,v24,ivytown,core GenuineIntel-6-2D,v24,jaketown,core GenuineIntel-6-(57|85),v16,knightslanding,core -GenuineIntel-6-BD,v1.14,lunarlake,core +GenuineIntel-6-BD,v1.17,lunarlake,core GenuineIntel-6-(AA|AC|B5),v1.14,meteorlake,core GenuineIntel-6-1[AEF],v4,nehalemep,core GenuineIntel-6-2E,v4,nehalemex,core --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 59F98221FB8 for ; Sat, 19 Jul 2025 03:45:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896768; cv=none; b=G6c1f8z3KWjOhD/EwijNmonVz3j1y8QuArqzs4KRyaNMj9ruuGb8sz1orKXWkVofsM92TtN4po73O+5S2cb+3hIrb7zViC0NP0n+Y9P8DSc6bT1QqMG/zFyXMu/CL12p2akxSVmJp9N0BGGa73l3ka2OI40YvkhiO1gcan1VCX4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896768; c=relaxed/simple; bh=bX9WJxECh4lW+Kc+1mUf+PEswG1V3kb1wMTtrI8/nEs=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=thieACDt9Vcg+YiPGkottCeclShTaF45MkCoREsuQvstMLYfuout4vYc0KIsn8UhGdMI3xJ8lY86L+czTJ/87MjfvMwsDTnh9Hny//UsfPpdCCcyM/nokZGPUKjqnjYlTrr20rk/soZakX9/6htgi7VF6Riwu0kIgtrTBTLvvSQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=xPhuBDDt; arc=none smtp.client-ip=209.85.214.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="xPhuBDDt" Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-23827190886so27945575ad.3 for ; Fri, 18 Jul 2025 20:45:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896749; x=1753501549; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=8v3g6nLKVOLXeMC5F9qvchK0AWcwjg2MAn8iz73McHU=; b=xPhuBDDtm33x5tb9jxQl9SQMpuq/CsrEglQK5M1+hYN64xzYhvw+E7nlz+jQyh2SJD 3GbENUK6qKQ3/vWfIFrlHlXjUnOMugrNETMEma7F7BJk+3ShGRra5AnsXf9hokjD62IP +e2ZVk1jgozb/FVNPN8wpLNpBF4b/1ADdDXeEsjYhIqBg+uMGgmgR0UiAX6CpUlvbW1H PUsMqrRRlW6kXdnEQKQPbufNzWqA8tvogNN5a5T1XNxxpJ5PqAh3YLb0bztSNRcc8joO yj9WUnCAAW/fB1352+Exuf0j6T05An+eMO7ox53nT7DW+P2g3A0nDwCJlgEhauraRNcv kpHw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896749; x=1753501549; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=8v3g6nLKVOLXeMC5F9qvchK0AWcwjg2MAn8iz73McHU=; b=EfDdq0TW4DA5eq28fuPvK3qYvGAWxMGPd60PyGMCgMCASPOJMCY7jkkDV25iaRw3wG jZj11FDZOIQ9nOt4FZ/tDKzode6qrgx8s/Hub3Js2sfVdzVRwh6oBIWhWD01cwhGrbd5 skXglyRtZSSsp/KbdGVHXUzerIDfz79zh1wfZmkNQ1MVOKi1CWRwrh1RsjNFQNboR3c/ vEU+BL8/xyi5y3tSXorExvjGhb2KDMrdgkIUAzME9lh8YK27A4IPljvqRmUv3VusnsvF eNmF1X68l1AQbSdrTUUMYHQ2gpDMvmE0Yna7kQlaKaix18ZcmmsdQT7h/ABg8kxzHpQU uD+w== X-Forwarded-Encrypted: i=1; AJvYcCUNvtiyp/WuD+dnXORs/BNBvZmJ2gJHd6w2bn6F4pRX4GToraLzgM/as3wdc2RLAWe4w+Dbq21gFlArh1s=@vger.kernel.org X-Gm-Message-State: AOJu0Yw4+52QjYV0gFgoy3sDwTGNBoiu6kppPAWo5cLET2yDw4pSCfyZ x4qYgt5YFHJIVqEy4YC8Wv4PYTzD2Sp91waIEXzGnRW6G8a0EKNZWywj8Np9OtP7ob5RTQYIRT7 wgPwWevpXug== X-Google-Smtp-Source: AGHT+IEH1luCzsvYKVBNzjshBk1VqzkBMbB2Re1IlFDenp6G9KG6uOKiyrXKvdNlp8wavqgkUZZdhkAXCma1 X-Received: from plbba4.prod.google.com ([2002:a17:902:7204:b0:23d:dd69:dd0a]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:902:d2cc:b0:234:a139:11fa with SMTP id d9443c01a7336-23e2566ae85mr206471615ad.3.1752896749335; Fri, 18 Jul 2025 20:45:49 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:09 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-14-irogers@google.com> Subject: [PATCH v1 13/19] perf vendor events: Update meteorlake events/metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update events from v1.14 to v1.16. Update metrics from TMA 5.0 to 5.1. The event updates come from: https://github.com/intel/perfmon/commit/c3e91c6e6b39429c57001d4942667f380ef= e8ea9 Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +- .../pmu-events/arch/x86/meteorlake/cache.json | 129 ++++++------- .../arch/x86/meteorlake/floating-point.json | 28 +-- .../arch/x86/meteorlake/frontend.json | 42 ++--- .../arch/x86/meteorlake/memory.json | 15 +- .../arch/x86/meteorlake/mtl-metrics.json | 103 ++++++----- .../pmu-events/arch/x86/meteorlake/other.json | 5 +- .../arch/x86/meteorlake/pipeline.json | 173 ++++++++---------- .../arch/x86/meteorlake/virtual-memory.json | 40 ++-- 9 files changed, 251 insertions(+), 286 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-ev= ents/arch/x86/mapfile.csv index 894a3f485a4a..f8bf16d30602 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -23,7 +23,7 @@ GenuineIntel-6-3E,v24,ivytown,core GenuineIntel-6-2D,v24,jaketown,core GenuineIntel-6-(57|85),v16,knightslanding,core GenuineIntel-6-BD,v1.17,lunarlake,core -GenuineIntel-6-(AA|AC|B5),v1.14,meteorlake,core +GenuineIntel-6-(AA|AC|B5),v1.16,meteorlake,core GenuineIntel-6-1[AEF],v4,nehalemep,core GenuineIntel-6-2E,v4,nehalemex,core GenuineIntel-6-CC,v1.00,pantherlake,core diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/cache.json b/tools/p= erf/pmu-events/arch/x86/meteorlake/cache.json index 82b115183924..f1d8db284a02 100644 --- a/tools/perf/pmu-events/arch/x86/meteorlake/cache.json +++ b/tools/perf/pmu-events/arch/x86/meteorlake/cache.json @@ -14,7 +14,6 @@ "Counter": "0,1,2,3", "EventCode": "0x51", "EventName": "L1D.HWPF_MISS", - "PublicDescription": "L1D.HWPF_MISS Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x20", "Unit": "cpu_core" @@ -24,7 +23,7 @@ "Counter": "0,1,2,3", "EventCode": "0x51", "EventName": "L1D.REPLACEMENT", - "PublicDescription": "Counts L1D data line replacements including = opportunistic replacements, and replacements that require stall-for-replace= or block-for-replace. Available PDIST counters: 0", + "PublicDescription": "Counts L1D data line replacements including = opportunistic replacements, and replacements that require stall-for-replace= or block-for-replace.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -34,7 +33,7 @@ "Counter": "0,1,2,3", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.FB_FULL", - "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses. Av= ailable PDIST counters: 0", + "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" @@ -46,7 +45,7 @@ "EdgeDetect": "1", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.FB_FULL_PERIODS", - "PublicDescription": "Counts number of phases a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses. Av= ailable PDIST counters: 0", + "PublicDescription": "Counts number of phases a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" @@ -56,7 +55,7 @@ "Counter": "0,1,2,3", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.L2_STALLS", - "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D due to lack of L2 resources. Demand requests include cac= heable/uncacheable demand load, store, lock or SW prefetch accesses. Availa= ble PDIST counters: 0", + "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D due to lack of L2 resources. Demand requests include cac= heable/uncacheable demand load, store, lock or SW prefetch accesses.", "SampleAfterValue": "1000003", "UMask": "0x4", "Unit": "cpu_core" @@ -66,7 +65,7 @@ "Counter": "0,1,2,3", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.PENDING", - "PublicDescription": "Counts number of L1D misses that are outstan= ding in each cycle, that is each cycle the number of Fill Buffers (FB) outs= tanding required by Demand Reads. FB either is held by demand loads, or it = is held by non-demand loads and gets hit at least once by demand. The valid= outstanding interval is defined until the FB deallocation by one of the fo= llowing ways: from FB allocation, if FB is allocated by demand from the dem= and Hit FB, if it is allocated by hardware or software prefetch. Note: In t= he L1D, a Demand Read contains cacheable or noncacheable demand loads, incl= uding ones causing cache-line splits and reads due to page walks resulted f= rom any request type. Available PDIST counters: 0", + "PublicDescription": "Counts number of L1D misses that are outstan= ding in each cycle, that is each cycle the number of Fill Buffers (FB) outs= tanding required by Demand Reads. FB either is held by demand loads, or it = is held by non-demand loads and gets hit at least once by demand. The valid= outstanding interval is defined until the FB deallocation by one of the fo= llowing ways: from FB allocation, if FB is allocated by demand from the dem= and Hit FB, if it is allocated by hardware or software prefetch. Note: In t= he L1D, a Demand Read contains cacheable or noncacheable demand loads, incl= uding ones causing cache-line splits and reads due to page walks resulted f= rom any request type.", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -77,7 +76,7 @@ "CounterMask": "1", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.PENDING_CYCLES", - "PublicDescription": "Counts duration of L1D miss outstanding in c= ycles. Available PDIST counters: 0", + "PublicDescription": "Counts duration of L1D miss outstanding in c= ycles.", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -87,7 +86,7 @@ "Counter": "0,1,2,3", "EventCode": "0x25", "EventName": "L2_LINES_IN.ALL", - "PublicDescription": "Counts the number of L2 cache lines filling = the L2. Counting does not cover rejects. Available PDIST counters: 0", + "PublicDescription": "Counts the number of L2 cache lines filling = the L2. Counting does not cover rejects.", "SampleAfterValue": "100003", "UMask": "0x1f", "Unit": "cpu_core" @@ -147,7 +146,7 @@ "Counter": "0,1,2,3", "EventCode": "0x26", "EventName": "L2_LINES_OUT.NON_SILENT", - "PublicDescription": "Counts the number of lines that are evicted = by L2 cache when triggered by an L2 cache fill. Those lines are in Modified= state. Modified lines are written back to L3 Available PDIST counters: 0", + "PublicDescription": "Counts the number of lines that are evicted = by L2 cache when triggered by an L2 cache fill. Those lines are in Modified= state. Modified lines are written back to L3", "SampleAfterValue": "200003", "UMask": "0x2", "Unit": "cpu_core" @@ -167,7 +166,7 @@ "Counter": "0,1,2,3", "EventCode": "0x26", "EventName": "L2_LINES_OUT.SILENT", - "PublicDescription": "Counts the number of lines that are silently= dropped by L2 cache. These lines are typically in Shared or Exclusive stat= e. A non-threaded event. Available PDIST counters: 0", + "PublicDescription": "Counts the number of lines that are silently= dropped by L2 cache. These lines are typically in Shared or Exclusive stat= e. A non-threaded event.", "SampleAfterValue": "200003", "UMask": "0x1", "Unit": "cpu_core" @@ -177,7 +176,7 @@ "Counter": "0,1,2,3", "EventCode": "0x26", "EventName": "L2_LINES_OUT.USELESS_HWPF", - "PublicDescription": "Counts the number of cache lines that have b= een prefetched by the L2 hardware prefetcher but not used by demand access = when evicted from the L2 cache Available PDIST counters: 0", + "PublicDescription": "Counts the number of cache lines that have b= een prefetched by the L2 hardware prefetcher but not used by demand access = when evicted from the L2 cache", "SampleAfterValue": "200003", "UMask": "0x4", "Unit": "cpu_core" @@ -187,7 +186,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_REQUEST.ALL", - "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_RQSTS.REFERENCES] Available PDIST coun= ters: 0", + "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_RQSTS.REFERENCES]", "SampleAfterValue": "200003", "UMask": "0xff", "Unit": "cpu_core" @@ -206,7 +205,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_REQUEST.HIT", - "PublicDescription": "Counts all requests that hit L2 cache. [This= event is alias to L2_RQSTS.HIT] Available PDIST counters: 0", + "PublicDescription": "Counts all requests that hit L2 cache. [This= event is alias to L2_RQSTS.HIT]", "SampleAfterValue": "200003", "UMask": "0xdf", "Unit": "cpu_core" @@ -225,7 +224,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_REQUEST.MISS", - "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_RQSTS.MISS] Available PDIST coun= ters: 0", + "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_RQSTS.MISS]", "SampleAfterValue": "200003", "UMask": "0x3f", "Unit": "cpu_core" @@ -244,7 +243,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_CODE_RD", - "PublicDescription": "Counts the total number of L2 code requests.= Available PDIST counters: 0", + "PublicDescription": "Counts the total number of L2 code requests.= ", "SampleAfterValue": "200003", "UMask": "0xe4", "Unit": "cpu_core" @@ -254,7 +253,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_DEMAND_DATA_RD", - "PublicDescription": "Counts Demand Data Read requests accessing t= he L2 cache. These requests may hit or miss L2 cache. True-miss exclude mis= ses that were merged with ongoing L2 misses. An access is counted once. Ava= ilable PDIST counters: 0", + "PublicDescription": "Counts Demand Data Read requests accessing t= he L2 cache. These requests may hit or miss L2 cache. True-miss exclude mis= ses that were merged with ongoing L2 misses. An access is counted once.", "SampleAfterValue": "200003", "UMask": "0xe1", "Unit": "cpu_core" @@ -264,7 +263,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_DEMAND_MISS", - "PublicDescription": "Counts demand requests that miss L2 cache. A= vailable PDIST counters: 0", + "PublicDescription": "Counts demand requests that miss L2 cache.", "SampleAfterValue": "200003", "UMask": "0x27", "Unit": "cpu_core" @@ -274,7 +273,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_DEMAND_REFERENCES", - "PublicDescription": "Counts demand requests to L2 cache. Availabl= e PDIST counters: 0", + "PublicDescription": "Counts demand requests to L2 cache.", "SampleAfterValue": "200003", "UMask": "0xe7", "Unit": "cpu_core" @@ -284,7 +283,6 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_HWPF", - "PublicDescription": "L2_RQSTS.ALL_HWPF Available PDIST counters: = 0", "SampleAfterValue": "200003", "UMask": "0xf0", "Unit": "cpu_core" @@ -294,7 +292,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_RFO", - "PublicDescription": "Counts the total number of RFO (read for own= ership) requests to L2 cache. L2 RFO requests include both L1D demand RFO m= isses as well as L1D RFO prefetches. Available PDIST counters: 0", + "PublicDescription": "Counts the total number of RFO (read for own= ership) requests to L2 cache. L2 RFO requests include both L1D demand RFO m= isses as well as L1D RFO prefetches.", "SampleAfterValue": "200003", "UMask": "0xe2", "Unit": "cpu_core" @@ -304,7 +302,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.CODE_RD_HIT", - "PublicDescription": "Counts L2 cache hits when fetching instructi= ons, code reads. Available PDIST counters: 0", + "PublicDescription": "Counts L2 cache hits when fetching instructi= ons, code reads.", "SampleAfterValue": "200003", "UMask": "0xc4", "Unit": "cpu_core" @@ -314,7 +312,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.CODE_RD_MISS", - "PublicDescription": "Counts L2 cache misses when fetching instruc= tions. Available PDIST counters: 0", + "PublicDescription": "Counts L2 cache misses when fetching instruc= tions.", "SampleAfterValue": "200003", "UMask": "0x24", "Unit": "cpu_core" @@ -324,7 +322,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.DEMAND_DATA_RD_HIT", - "PublicDescription": "Counts the number of demand Data Read reques= ts initiated by load instructions that hit L2 cache. Available PDIST counte= rs: 0", + "PublicDescription": "Counts the number of demand Data Read reques= ts initiated by load instructions that hit L2 cache.", "SampleAfterValue": "200003", "UMask": "0xc1", "Unit": "cpu_core" @@ -334,7 +332,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.DEMAND_DATA_RD_MISS", - "PublicDescription": "Counts demand Data Read requests with true-m= iss in the L2 cache. True-miss excludes misses that were merged with ongoin= g L2 misses. An access is counted once. Available PDIST counters: 0", + "PublicDescription": "Counts demand Data Read requests with true-m= iss in the L2 cache. True-miss excludes misses that were merged with ongoin= g L2 misses. An access is counted once.", "SampleAfterValue": "200003", "UMask": "0x21", "Unit": "cpu_core" @@ -344,7 +342,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.HIT", - "PublicDescription": "Counts all requests that hit L2 cache. [This= event is alias to L2_REQUEST.HIT] Available PDIST counters: 0", + "PublicDescription": "Counts all requests that hit L2 cache. [This= event is alias to L2_REQUEST.HIT]", "SampleAfterValue": "200003", "UMask": "0xdf", "Unit": "cpu_core" @@ -354,7 +352,6 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.HWPF_MISS", - "PublicDescription": "L2_RQSTS.HWPF_MISS Available PDIST counters:= 0", "SampleAfterValue": "200003", "UMask": "0x30", "Unit": "cpu_core" @@ -364,7 +361,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.MISS", - "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_REQUEST.MISS] Available PDIST co= unters: 0", + "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_REQUEST.MISS]", "SampleAfterValue": "200003", "UMask": "0x3f", "Unit": "cpu_core" @@ -374,7 +371,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.REFERENCES", - "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_REQUEST.ALL] Available PDIST counters:= 0", + "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_REQUEST.ALL]", "SampleAfterValue": "200003", "UMask": "0xff", "Unit": "cpu_core" @@ -384,7 +381,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.RFO_HIT", - "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that hit L2 cache. Available PDIST counters: 0", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that hit L2 cache.", "SampleAfterValue": "200003", "UMask": "0xc2", "Unit": "cpu_core" @@ -394,7 +391,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.RFO_MISS", - "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that miss L2 cache. Available PDIST counters: 0", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that miss L2 cache.", "SampleAfterValue": "200003", "UMask": "0x22", "Unit": "cpu_core" @@ -404,7 +401,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.SWPF_HIT", - "PublicDescription": "Counts Software prefetch requests that hit t= he L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when = FB is not full. Available PDIST counters: 0", + "PublicDescription": "Counts Software prefetch requests that hit t= he L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when = FB is not full.", "SampleAfterValue": "200003", "UMask": "0xc8", "Unit": "cpu_core" @@ -414,7 +411,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.SWPF_MISS", - "PublicDescription": "Counts Software prefetch requests that miss = the L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when= FB is not full. Available PDIST counters: 0", + "PublicDescription": "Counts Software prefetch requests that miss = the L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when= FB is not full.", "SampleAfterValue": "200003", "UMask": "0x28", "Unit": "cpu_core" @@ -424,7 +421,7 @@ "Counter": "0,1,2,3", "EventCode": "0x23", "EventName": "L2_TRANS.L2_WB", - "PublicDescription": "Counts L2 writebacks that access L2 cache. A= vailable PDIST counters: 0", + "PublicDescription": "Counts L2 writebacks that access L2 cache.", "SampleAfterValue": "200003", "UMask": "0x40", "Unit": "cpu_core" @@ -434,7 +431,7 @@ "Counter": "0,1,2,3", "EventCode": "0x42", "EventName": "LOCK_CYCLES.CACHE_LOCK_DURATION", - "PublicDescription": "This event counts the number of cycles when = the L1D is locked. It is a superset of the 0x1 mask (BUS_LOCK_CLOCKS.BUS_LO= CK_DURATION). Available PDIST counters: 0", + "PublicDescription": "This event counts the number of cycles when = the L1D is locked. It is a superset of the 0x1 mask (BUS_LOCK_CLOCKS.BUS_LO= CK_DURATION).", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -454,7 +451,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x2e", "EventName": "LONGEST_LAT_CACHE.MISS", - "PublicDescription": "Counts core-originated cacheable requests th= at miss the L3 cache (Longest Latency cache). Requests include data and cod= e reads, Reads-for-Ownership (RFOs), speculative accesses and hardware pref= etches to the L1 and L2. It does not include hardware prefetches to the L3= , and may not count other types of requests to the L3. Available PDIST coun= ters: 0", + "PublicDescription": "Counts core-originated cacheable requests th= at miss the L3 cache (Longest Latency cache). Requests include data and cod= e reads, Reads-for-Ownership (RFOs), speculative accesses and hardware pref= etches to the L1 and L2. It does not include hardware prefetches to the L3= , and may not count other types of requests to the L3.", "SampleAfterValue": "100003", "UMask": "0x41", "Unit": "cpu_core" @@ -474,7 +471,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x2e", "EventName": "LONGEST_LAT_CACHE.REFERENCE", - "PublicDescription": "Counts core-originated cacheable requests to= the L3 cache (Longest Latency cache). Requests include data and code reads= , Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches = to the L1 and L2. It does not include hardware prefetches to the L3, and m= ay not count other types of requests to the L3. Available PDIST counters: 0= ", + "PublicDescription": "Counts core-originated cacheable requests to= the L3 cache (Longest Latency cache). Requests include data and code reads= , Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches = to the L1 and L2. It does not include hardware prefetches to the L3, and m= ay not count other types of requests to the L3.", "SampleAfterValue": "100003", "UMask": "0x4f", "Unit": "cpu_core" @@ -695,7 +692,7 @@ "Counter": "0,1,2,3", "EventCode": "0x43", "EventName": "MEM_LOAD_COMPLETED.L1_MISS_ANY", - "PublicDescription": "Number of completed demand load requests tha= t missed the L1 data cache including shadow misses (FB hits, merge to an on= going L1D miss) Available PDIST counters: 0", + "PublicDescription": "Number of completed demand load requests tha= t missed the L1 data cache including shadow misses (FB hits, merge to an on= going L1D miss)", "SampleAfterValue": "1000003", "UMask": "0xfd", "Unit": "cpu_core" @@ -947,7 +944,6 @@ "Counter": "0,1,2,3", "EventCode": "0x44", "EventName": "MEM_STORE_RETIRED.L2_HIT", - "PublicDescription": "MEM_STORE_RETIRED.L2_HIT Available PDIST cou= nters: 0", "SampleAfterValue": "200003", "UMask": "0x1", "Unit": "cpu_core" @@ -974,7 +970,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_1024", @@ -986,7 +982,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_128", @@ -998,7 +994,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_16", @@ -1010,7 +1006,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_2048", @@ -1022,7 +1018,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_256", @@ -1034,7 +1030,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_32", @@ -1046,7 +1042,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_4", @@ -1058,7 +1054,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_512", @@ -1070,7 +1066,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_64", @@ -1082,7 +1078,7 @@ }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", - "Counter": "0,1", + "Counter": "0,1,2,3,4,5,6,7", "Data_LA": "1", "EventCode": "0xd0", "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_8", @@ -1177,7 +1173,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe5", "EventName": "MEM_UOP_RETIRED.ANY", - "PublicDescription": "Number of retired micro-operations (uops) fo= r load or store memory accesses Available PDIST counters: 0", + "PublicDescription": "Number of retired micro-operations (uops) fo= r load or store memory accesses", "SampleAfterValue": "1000003", "UMask": "0x3", "Unit": "cpu_core" @@ -1403,7 +1399,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.ALL_REQUESTS", - "PublicDescription": "Counts memory transactions reached the super= queue including requests initiated by the core, all L3 prefetches, page wa= lks, etc.. Available PDIST counters: 0", + "PublicDescription": "Counts memory transactions reached the super= queue including requests initiated by the core, all L3 prefetches, page wa= lks, etc..", "SampleAfterValue": "100003", "UMask": "0x80", "Unit": "cpu_core" @@ -1413,7 +1409,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DATA_RD", - "PublicDescription": "Counts the demand and prefetch data reads. A= ll Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 p= refetchers). Counting also covers reads due to page walks resulted from any= request type. Available PDIST counters: 0", + "PublicDescription": "Counts the demand and prefetch data reads. A= ll Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 p= refetchers). Counting also covers reads due to page walks resulted from any= request type.", "SampleAfterValue": "100003", "UMask": "0x8", "Unit": "cpu_core" @@ -1423,7 +1419,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DEMAND_CODE_RD", - "PublicDescription": "Counts both cacheable and Non-Cacheable code= read requests. Available PDIST counters: 0", + "PublicDescription": "Counts both cacheable and Non-Cacheable code= read requests.", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -1433,7 +1429,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DEMAND_DATA_RD", - "PublicDescription": "Counts the Demand Data Read requests sent to= uncore. Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determi= ne average latency in the uncore. Available PDIST counters: 0", + "PublicDescription": "Counts the Demand Data Read requests sent to= uncore. Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determi= ne average latency in the uncore.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -1443,7 +1439,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DEMAND_RFO", - "PublicDescription": "Counts the demand RFO (read for ownership) r= equests including regular RFOs, locks, ItoM. Available PDIST counters: 0", + "PublicDescription": "Counts the demand RFO (read for ownership) r= equests including regular RFOs, locks, ItoM.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -1454,7 +1450,7 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD", - "PublicDescription": "Counts cycles when offcore outstanding cache= able Core Data Read transactions are present in the super queue. A transact= ion is considered to be in the Offcore outstanding state between L2 miss an= d transaction completion sent to requestor (SQ de-allocation). See correspo= nding Umask under OFFCORE_REQUESTS. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when offcore outstanding cache= able Core Data Read transactions are present in the super queue. A transact= ion is considered to be in the Offcore outstanding state between L2 miss an= d transaction completion sent to requestor (SQ de-allocation). See correspo= nding Umask under OFFCORE_REQUESTS.", "SampleAfterValue": "1000003", "UMask": "0x8", "Unit": "cpu_core" @@ -1465,7 +1461,7 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE= _RD", - "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS. Available PDIST counters: 0", + "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS.", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1476,7 +1472,6 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_DATA= _RD", - "PublicDescription": "Cycles where at least 1 outstanding demand d= ata read request is pending. Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1487,7 +1482,7 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO", - "PublicDescription": "Counts the number of offcore outstanding dem= and rfo Reads transactions in the super queue every cycle. The 'Offcore out= standing' state of the transaction lasts from the L2 miss until the sending= transaction completion to requestor (SQ deallocation). See the correspondi= ng Umask under OFFCORE_REQUESTS. Available PDIST counters: 0", + "PublicDescription": "Counts the number of offcore outstanding dem= and rfo Reads transactions in the super queue every cycle. The 'Offcore out= standing' state of the transaction lasts from the L2 miss until the sending= transaction completion to requestor (SQ deallocation). See the correspondi= ng Umask under OFFCORE_REQUESTS.", "SampleAfterValue": "1000003", "UMask": "0x4", "Unit": "cpu_core" @@ -1497,7 +1492,6 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DATA_RD", - "PublicDescription": "OFFCORE_REQUESTS_OUTSTANDING.DATA_RD Availab= le PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8", "Unit": "cpu_core" @@ -1507,7 +1501,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD", - "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS. Available PDIST counters: 0", + "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS.", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1517,7 +1511,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD", - "PublicDescription": "For every cycle, increments by the number of= outstanding demand data read requests pending. Requests are considered o= utstanding from the time they miss the core's L2 cache until the transactio= n completion message is sent to the requestor. Available PDIST counters: 0", + "PublicDescription": "For every cycle, increments by the number of= outstanding demand data read requests pending. Requests are considered o= utstanding from the time they miss the core's L2 cache until the transactio= n completion message is sent to the requestor.", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1527,7 +1521,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_RFO", - "PublicDescription": "Counts the number of off-core outstanding re= ad-for-ownership (RFO) store transactions every cycle. An RFO transaction i= s considered to be in the Off-core outstanding state between L2 cache miss = and transaction completion. Available PDIST counters: 0", + "PublicDescription": "Counts the number of off-core outstanding re= ad-for-ownership (RFO) store transactions every cycle. An RFO transaction i= s considered to be in the Off-core outstanding state between L2 cache miss = and transaction completion.", "SampleAfterValue": "1000003", "UMask": "0x4", "Unit": "cpu_core" @@ -1537,7 +1531,7 @@ "Counter": "0,1,2,3", "EventCode": "0x2c", "EventName": "SQ_MISC.BUS_LOCK", - "PublicDescription": "Counts the more expensive bus lock needed to= enforce cache coherency for certain memory accesses that need to be done a= tomically. Can be created by issuing an atomic instruction (via the LOCK p= refix) which causes a cache line split or accesses uncacheable memory. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts the more expensive bus lock needed to= enforce cache coherency for certain memory accesses that need to be done a= tomically. Can be created by issuing an atomic instruction (via the LOCK p= refix) which causes a cache line split or accesses uncacheable memory.", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -1547,7 +1541,6 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.ANY", - "PublicDescription": "Counts the number of PREFETCHNTA, PREFETCHW,= PREFETCHT0, PREFETCHT1 or PREFETCHT2 instructions executed. Available PDIS= T counters: 0", "SampleAfterValue": "100003", "UMask": "0xf", "Unit": "cpu_core" @@ -1557,7 +1550,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.NTA", - "PublicDescription": "Counts the number of PREFETCHNTA instruction= s executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHNTA instruction= s executed.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -1567,7 +1560,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.PREFETCHW", - "PublicDescription": "Counts the number of PREFETCHW instructions = executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHW instructions = executed.", "SampleAfterValue": "100003", "UMask": "0x8", "Unit": "cpu_core" @@ -1577,7 +1570,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.T0", - "PublicDescription": "Counts the number of PREFETCHT0 instructions= executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHT0 instructions= executed.", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -1587,7 +1580,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.T1_T2", - "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT= 2 instructions executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT= 2 instructions executed.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/floating-point.json = b/tools/perf/pmu-events/arch/x86/meteorlake/floating-point.json index ae9778aa755b..28dc5e06ee31 100644 --- a/tools/perf/pmu-events/arch/x86/meteorlake/floating-point.json +++ b/tools/perf/pmu-events/arch/x86/meteorlake/floating-point.json @@ -15,7 +15,6 @@ "CounterMask": "1", "EventCode": "0xb0", "EventName": "ARITH.FPDIV_ACTIVE", - "PublicDescription": "This event counts the cycles the floating po= int divider is busy. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -25,7 +24,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.FP", - "PublicDescription": "Counts all microcode Floating Point assists.= Available PDIST counters: 0", + "PublicDescription": "Counts all microcode Floating Point assists.= ", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -35,7 +34,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.SSE_AVX_MIX", - "PublicDescription": "ASSISTS.SSE_AVX_MIX Available PDIST counters= : 0", "SampleAfterValue": "1000003", "UMask": "0x10", "Unit": "cpu_core" @@ -45,7 +43,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.PORT_0", - "PublicDescription": "FP_ARITH_DISPATCHED.PORT_0 [This event is al= ias to FP_ARITH_DISPATCHED.V0] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -55,7 +52,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.PORT_1", - "PublicDescription": "FP_ARITH_DISPATCHED.PORT_1 [This event is al= ias to FP_ARITH_DISPATCHED.V1] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -65,7 +61,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.PORT_5", - "PublicDescription": "FP_ARITH_DISPATCHED.PORT_5 [This event is al= ias to FP_ARITH_DISPATCHED.V2] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -75,7 +70,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.V0", - "PublicDescription": "FP_ARITH_DISPATCHED.V0 [This event is alias = to FP_ARITH_DISPATCHED.PORT_0] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -85,7 +79,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.V1", - "PublicDescription": "FP_ARITH_DISPATCHED.V1 [This event is alias = to FP_ARITH_DISPATCHED.PORT_1] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -95,7 +88,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.V2", - "PublicDescription": "FP_ARITH_DISPATCHED.V2 [This event is alias = to FP_ARITH_DISPATCHED.PORT_5] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -105,7 +97,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 2 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as the= y perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR re= gister need to be set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 2 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as the= y perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR re= gister need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -115,7 +107,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE", - "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events. Available PDIST co= unters: 0", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x8", "Unit": "cpu_core" @@ -125,7 +117,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 = calculations per element. The DAZ and FTZ flags in the MXCSR register need = to be set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 = calculations per element. The DAZ and FTZ flags in the MXCSR register need = to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -135,7 +127,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE", - "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events. Available PDIST co= unters: 0", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x20", "Unit": "cpu_core" @@ -145,7 +137,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.4_FLOPS", - "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision and 256-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 2 or/and 4 computation operations, one for each element. = Applies to SSE* and AVX* packed single precision floating-point and packed= double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL= DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB ins= tructions count twice as they perform 2 calculations per element. The DAZ a= nd FTZ flags in the MXCSR register need to be set when using these events. = Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision and 256-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 2 or/and 4 computation operations, one for each element. = Applies to SSE* and AVX* packed single precision floating-point and packed= double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL= DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB ins= tructions count twice as they perform 2 calculations per element. The DAZ a= nd FTZ flags in the MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x18", "Unit": "cpu_core" @@ -155,7 +147,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR", - "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision and double precision floating-point instructions retired; some = instructions will count twice as noted below. Each count represents 1 comp= utational operation. Applies to SSE* and AVX* scalar single precision float= ing-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB= . FM(N)ADD/SUB instructions count twice as they perform 2 calculations per= element. The DAZ and FTZ flags in the MXCSR register need to be set when u= sing these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision and double precision floating-point instructions retired; some = instructions will count twice as noted below. Each count represents 1 comp= utational operation. Applies to SSE* and AVX* scalar single precision float= ing-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB= . FM(N)ADD/SUB instructions count twice as they perform 2 calculations per= element. The DAZ and FTZ flags in the MXCSR register need to be set when u= sing these events.", "SampleAfterValue": "1000003", "UMask": "0x3", "Unit": "cpu_core" @@ -165,7 +157,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational scalar doubl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar double precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions co= unt twice as they perform 2 calculations per element. The DAZ and FTZ flags= in the MXCSR register need to be set when using these events. Available PD= IST counters: 0", + "PublicDescription": "Number of SSE/AVX computational scalar doubl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar double precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions co= unt twice as they perform 2 calculations per element. The DAZ and FTZ flags= in the MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -175,7 +167,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE", - "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar single precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instr= uctions count twice as they perform 2 calculations per element. The DAZ and= FTZ flags in the MXCSR register need to be set when using these events. Av= ailable PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar single precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instr= uctions count twice as they perform 2 calculations per element. The DAZ and= FTZ flags in the MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -185,7 +177,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.VECTOR", - "PublicDescription": "Number of any Vector retired FP arithmetic i= nstructions. The DAZ and FTZ flags in the MXCSR register need to be set wh= en using these events. Available PDIST counters: 0", + "PublicDescription": "Number of any Vector retired FP arithmetic i= nstructions. The DAZ and FTZ flags in the MXCSR register need to be set wh= en using these events.", "SampleAfterValue": "1000003", "UMask": "0xfc", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json b/tool= s/perf/pmu-events/arch/x86/meteorlake/frontend.json index 82727022efb6..6484834b1127 100644 --- a/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json +++ b/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json @@ -14,7 +14,7 @@ "Counter": "0,1,2,3", "EventCode": "0x60", "EventName": "BACLEARS.ANY", - "PublicDescription": "Number of times the front-end is resteered w= hen it finds a branch instruction in a fetch line. This is called Unknown B= ranch which occurs for the first time a branch instruction is fetched or wh= en the branch is not tracked by the BPU (Branch Prediction Unit) anymore. A= vailable PDIST counters: 0", + "PublicDescription": "Number of times the front-end is resteered w= hen it finds a branch instruction in a fetch line. This is called Unknown B= ranch which occurs for the first time a branch instruction is fetched or wh= en the branch is not tracked by the BPU (Branch Prediction Unit) anymore.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -24,7 +24,7 @@ "Counter": "0,1,2,3", "EventCode": "0x87", "EventName": "DECODE.LCP", - "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk. Available PDIS= T counters: 0", + "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk.", "SampleAfterValue": "500009", "UMask": "0x1", "Unit": "cpu_core" @@ -34,7 +34,6 @@ "Counter": "0,1,2,3", "EventCode": "0x87", "EventName": "DECODE.MS_BUSY", - "PublicDescription": "Cycles the Microcode Sequencer is busy. Avai= lable PDIST counters: 0", "SampleAfterValue": "500009", "UMask": "0x2", "Unit": "cpu_core" @@ -44,7 +43,7 @@ "Counter": "0,1,2,3", "EventCode": "0x61", "EventName": "DSB2MITE_SWITCHES.PENALTY_CYCLES", - "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache th= at holds translations of previously fetched instructions that were decoded = by the legacy x86 decode pipeline (MITE). This event counts fetch penalty c= ycles when a transition occurs from DSB to MITE. Available PDIST counters: = 0", + "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache th= at holds translations of previously fetched instructions that were decoded = by the legacy x86 decode pipeline (MITE). This event counts fetch penalty c= ycles when a transition occurs from DSB to MITE.", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -405,7 +404,7 @@ "Counter": "0,1,2,3", "EventCode": "0x80", "EventName": "ICACHE_DATA.STALLS", - "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The decode pipeline works at a 32= Byte granularity. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The decode pipeline works at a 32= Byte granularity.", "SampleAfterValue": "500009", "UMask": "0x4", "Unit": "cpu_core" @@ -417,7 +416,6 @@ "EdgeDetect": "1", "EventCode": "0x80", "EventName": "ICACHE_DATA.STALL_PERIODS", - "PublicDescription": "ICACHE_DATA.STALL_PERIODS Available PDIST co= unters: 0", "SampleAfterValue": "500009", "UMask": "0x4", "Unit": "cpu_core" @@ -427,7 +425,7 @@ "Counter": "0,1,2,3", "EventCode": "0x83", "EventName": "ICACHE_TAG.STALLS", - "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss.", "SampleAfterValue": "200003", "UMask": "0x4", "Unit": "cpu_core" @@ -438,7 +436,7 @@ "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.DSB_CYCLES_ANY", - "PublicDescription": "Counts the number of cycles uops were delive= red to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) p= ath. Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles uops were delive= red to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) p= ath.", "SampleAfterValue": "2000003", "UMask": "0x8", "Unit": "cpu_core" @@ -449,7 +447,7 @@ "CounterMask": "6", "EventCode": "0x79", "EventName": "IDQ.DSB_CYCLES_OK", - "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the D= SB (Decode Stream Buffer) path. Count includes uops that may 'bypass' the I= DQ. Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the D= SB (Decode Stream Buffer) path. Count includes uops that may 'bypass' the I= DQ.", "SampleAfterValue": "2000003", "UMask": "0x8", "Unit": "cpu_core" @@ -459,7 +457,7 @@ "Counter": "0,1,2,3", "EventCode": "0x79", "EventName": "IDQ.DSB_UOPS", - "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Availab= le PDIST counters: 0", + "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path.", "SampleAfterValue": "2000003", "UMask": "0x8", "Unit": "cpu_core" @@ -470,7 +468,7 @@ "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.MITE_CYCLES_ANY", - "PublicDescription": "Counts the number of cycles uops were delive= red to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipe= line) path. During these cycles uops are not being delivered from the Decod= e Stream Buffer (DSB). Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles uops were delive= red to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipe= line) path. During these cycles uops are not being delivered from the Decod= e Stream Buffer (DSB).", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -481,7 +479,7 @@ "CounterMask": "6", "EventCode": "0x79", "EventName": "IDQ.MITE_CYCLES_OK", - "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the M= ITE (legacy decode pipeline) path. During these cycles uops are not being d= elivered from the Decode Stream Buffer (DSB). Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the M= ITE (legacy decode pipeline) path. During these cycles uops are not being d= elivered from the Decode Stream Buffer (DSB).", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -491,7 +489,7 @@ "Counter": "0,1,2,3", "EventCode": "0x79", "EventName": "IDQ.MITE_UOPS", - "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the MITE path. This also means that uops are= not being delivered from the Decode Stream Buffer (DSB). Available PDIST c= ounters: 0", + "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the MITE path. This also means that uops are= not being delivered from the Decode Stream Buffer (DSB).", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -502,7 +500,7 @@ "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.MS_CYCLES_ANY", - "PublicDescription": "Counts cycles during which uops are being de= livered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS= ) is busy. Uops maybe initiated by Decode Stream Buffer (DSB) or MITE. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts cycles during which uops are being de= livered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS= ) is busy. Uops maybe initiated by Decode Stream Buffer (DSB) or MITE.", "SampleAfterValue": "2000003", "UMask": "0x20", "Unit": "cpu_core" @@ -514,7 +512,7 @@ "EdgeDetect": "1", "EventCode": "0x79", "EventName": "IDQ.MS_SWITCHES", - "PublicDescription": "Number of switches from DSB (Decode Stream B= uffer) or MITE (legacy decode pipeline) to the Microcode Sequencer. Availab= le PDIST counters: 0", + "PublicDescription": "Number of switches from DSB (Decode Stream B= uffer) or MITE (legacy decode pipeline) to the Microcode Sequencer.", "SampleAfterValue": "100003", "UMask": "0x20", "Unit": "cpu_core" @@ -524,7 +522,7 @@ "Counter": "0,1,2,3", "EventCode": "0x79", "EventName": "IDQ.MS_UOPS", - "PublicDescription": "Counts the number of uops initiated by MITE = or Decode Stream Buffer (DSB) and delivered to Instruction Decode Queue (ID= Q) while the Microcode Sequencer (MS) is busy. Counting includes uops that = may 'bypass' the IDQ. Available PDIST counters: 0", + "PublicDescription": "Counts the number of uops initiated by MITE = or Decode Stream Buffer (DSB) and delivered to Instruction Decode Queue (ID= Q) while the Microcode Sequencer (MS) is busy. Counting includes uops that = may 'bypass' the IDQ.", "SampleAfterValue": "1000003", "UMask": "0x20", "Unit": "cpu_core" @@ -534,7 +532,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x9c", "EventName": "IDQ_BUBBLES.CORE", - "PublicDescription": "This event counts a subset of the Topdown Sl= ots event that when no operation was delivered to the back-end pipeline due= to instruction fetch limitations when the back-end could have accepted mor= e operations. Common examples include instruction cache misses or x86 instr= uction decode limitations. The count may be distributed among unhalted logi= cal processors (hyper-threads) who share the same physical core, in process= ors that support Intel Hyper-Threading Technology. Software can use this ev= ent as the numerator for the Frontend Bound metric (or top-level category) = of the Top-down Microarchitecture Analysis method. Available PDIST counters= : 0", + "PublicDescription": "This event counts a subset of the Topdown Sl= ots event that when no operation was delivered to the back-end pipeline due= to instruction fetch limitations when the back-end could have accepted mor= e operations. Common examples include instruction cache misses or x86 instr= uction decode limitations. The count may be distributed among unhalted logi= cal processors (hyper-threads) who share the same physical core, in process= ors that support Intel Hyper-Threading Technology. Software can use this ev= ent as the numerator for the Frontend Bound metric (or top-level category) = of the Top-down Microarchitecture Analysis method.", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -545,7 +543,7 @@ "CounterMask": "6", "EventCode": "0x9c", "EventName": "IDQ_BUBBLES.CYCLES_0_UOPS_DELIV.CORE", - "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES= _0_UOPS_DELIV.CORE] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES= _0_UOPS_DELIV.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -557,7 +555,7 @@ "EventCode": "0x9c", "EventName": "IDQ_BUBBLES.CYCLES_FE_WAS_OK", "Invert": "1", - "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_UOPS_N= OT_DELIVERED.CYCLES_FE_WAS_OK] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_UOPS_N= OT_DELIVERED.CYCLES_FE_WAS_OK]", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -567,7 +565,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x9c", "EventName": "IDQ_UOPS_NOT_DELIVERED.CORE", - "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle. Available PDIST counters: 0", + "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle.", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -578,7 +576,7 @@ "CounterMask": "6", "EventCode": "0x9c", "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE", - "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_BUBBLES.CYCLES_0_UOPS_DEL= IV.CORE] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_BUBBLES.CYCLES_0_UOPS_DEL= IV.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -590,7 +588,7 @@ "EventCode": "0x9c", "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK", "Invert": "1", - "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_BUBBLE= S.CYCLES_FE_WAS_OK] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_BUBBLE= S.CYCLES_FE_WAS_OK]", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/memory.json b/tools/= perf/pmu-events/arch/x86/meteorlake/memory.json index 17b94f810d5a..f0cbeda4d5ca 100644 --- a/tools/perf/pmu-events/arch/x86/meteorlake/memory.json +++ b/tools/perf/pmu-events/arch/x86/meteorlake/memory.json @@ -5,7 +5,6 @@ "CounterMask": "2", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_L3_MISS", - "PublicDescription": "Cycles while L3 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" @@ -16,7 +15,6 @@ "CounterMask": "6", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_L3_MISS", - "PublicDescription": "Execution stalls while L3 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x6", "Unit": "cpu_core" @@ -90,7 +88,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc3", "EventName": "MACHINE_CLEARS.MEMORY_ORDERING", - "PublicDescription": "Counts the number of Machine Clears detected= dye to memory ordering. Memory Ordering Machine Clears may apply when a me= mory read may not conform to the memory ordering rules of the x86 architect= ure Available PDIST counters: 0", + "PublicDescription": "Counts the number of Machine Clears detected= dye to memory ordering. Memory Ordering Machine Clears may apply when a me= mory read may not conform to the memory ordering rules of the x86 architect= ure", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -101,7 +99,6 @@ "CounterMask": "2", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.CYCLES_L1D_MISS", - "PublicDescription": "Cycles while L1 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" @@ -112,7 +109,6 @@ "CounterMask": "3", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.STALLS_L1D_MISS", - "PublicDescription": "Execution stalls while L1 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x3", "Unit": "cpu_core" @@ -123,7 +119,7 @@ "CounterMask": "5", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.STALLS_L2_MISS", - "PublicDescription": "Execution stalls while L2 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock). Available PDIST counters: 0", + "PublicDescription": "Execution stalls while L2 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock).", "SampleAfterValue": "1000003", "UMask": "0x5", "Unit": "cpu_core" @@ -134,7 +130,7 @@ "CounterMask": "9", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.STALLS_L3_MISS", - "PublicDescription": "Execution stalls while L3 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock). Available PDIST counters: 0", + "PublicDescription": "Execution stalls while L3 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock).", "SampleAfterValue": "1000003", "UMask": "0x9", "Unit": "cpu_core" @@ -411,7 +407,6 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD", - "PublicDescription": "Counts demand data read requests that miss t= he L3 cache. Available PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -422,7 +417,7 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_L3_MISS_DEM= AND_DATA_RD", - "PublicDescription": "Cycles with at least 1 Demand Data Read requ= ests who miss L3 cache in the superQ. Available PDIST counters: 0", + "PublicDescription": "Cycles with at least 1 Demand Data Read requ= ests who miss L3 cache in the superQ.", "SampleAfterValue": "1000003", "UMask": "0x10", "Unit": "cpu_core" @@ -432,7 +427,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.L3_MISS_DEMAND_DATA_RD", - "PublicDescription": "For every cycle, increments by the number of= demand data read requests pending that are known to have missed the L3 cac= he. Note that this does not capture all elapsed cycles while requests are = outstanding - only cycles from when the requests were known by the requesti= ng core to have missed the L3 cache. Available PDIST counters: 0", + "PublicDescription": "For every cycle, increments by the number of= demand data read requests pending that are known to have missed the L3 cac= he. Note that this does not capture all elapsed cycles while requests are = outstanding - only cycles from when the requests were known by the requesti= ng core to have missed the L3 cache.", "SampleAfterValue": "2000003", "UMask": "0x10", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/mtl-metrics.json b/t= ools/perf/pmu-events/arch/x86/meteorlake/mtl-metrics.json index 0088be169f9b..948c16a1f95b 100644 --- a/tools/perf/pmu-events/arch/x86/meteorlake/mtl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/meteorlake/mtl-metrics.json @@ -1,56 +1,56 @@ [ { "BriefDescription": "C10 residency percent per package", - "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c10\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C10_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C1 residency percent per core", - "MetricExpr": "cstate_core@c1\\-residency@ / TSC", + "MetricExpr": "cstate_core@c1\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C1_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C8 residency percent per package", - "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c8\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C8_Pkg_Residency", "ScaleUnit": "100%" @@ -541,7 +541,7 @@ }, { "BriefDescription": "Average CPU Utilization", - "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.REF_TSC@ / TSC", + "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.REF_TSC@ / msr@tsc\\,cpu= =3Dcpu_atom@", "MetricName": "tma_info_system_cpu_utilization", "Unit": "cpu_atom" }, @@ -748,7 +748,7 @@ { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "cpu_core@topdown\\-be\\-bound@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-be\\-bound@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "BvOB;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", @@ -786,12 +786,21 @@ "PublicDescription": "Total pipeline cost of instructions used for= program control-flow - a subset of the Retiring category in TMA. Examples = include function calls; loops and alignments. (A lower bound)", "Unit": "cpu_core" }, + { + "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", + "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", + "MetricGroup": "BvCB;Cor;tma_issueComp", + "MetricName": "tma_bottleneck_compute_bound_est", + "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", + "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: ", + "Unit": "cpu_core" + }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Bandwidth related bottlenecks", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma_l1_l= atency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)= ))", "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", - "MetricName": "tma_bottleneck_cache_memory_bandwidth", - "MetricThreshold": "tma_bottleneck_cache_memory_bandwidth > 20", + "MetricName": "tma_bottleneck_data_cache_memory_bandwidth", + "MetricThreshold": "tma_bottleneck_data_cache_memory_bandwidth > 2= 0", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_= system_dram_bw_use, tma_mem_bandwidth, tma_sq_full", "Unit": "cpu_core" }, @@ -799,23 +808,14 @@ "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Latency related bottlenecks", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependen= cy + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_= bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma= _l3_bound + tma_store_bound)) * (tma_lock_latency / (tma_dtlb_load + tma_fb= _full + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + tm= a_store_fwd_blk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tm= a_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_split_l= oads / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_= latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_s= tore_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_split_stores / (tma_dtlb_store + tma_false_sharin= g + tma_split_stores + tma_store_latency + tma_streaming_stores)) + tma_mem= ory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_boun= d + tma_l3_bound + tma_store_bound)) * (tma_store_latency / (tma_dtlb_store= + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming= _stores)))", "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", - "MetricName": "tma_bottleneck_cache_memory_latency", - "MetricThreshold": "tma_bottleneck_cache_memory_latency > 20", + "MetricName": "tma_bottleneck_data_cache_memory_latency", + "MetricThreshold": "tma_bottleneck_data_cache_memory_latency > 20", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_= mem_latency", "Unit": "cpu_core" }, - { - "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", - "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", - "MetricGroup": "BvCB;Cor;tma_issueComp", - "MetricName": "tma_bottleneck_compute_bound_est", - "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", - "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: ", - "Unit": "cpu_core" - }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks (when the front-end could not sustain operations = delivery to the back-end)", - "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - (1 - c= pu_core@INST_RETIRED.REP_ITERATION@ / cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D= 1@) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cl= ears_resteers + tma_mispredicts_resteers * tma_other_mispredicts / tma_bran= ch_mispredicts) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unk= nown_branches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_miss= es + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * t= ma_ms / (tma_dsb + tma_lsd + tma_mite + tma_ms))) - tma_bottleneck_big_code= ", + "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - (1 - c= pu_core@INST_RETIRED.REP_ITERATION@ / cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D= 1@) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cl= ears_resteers + tma_mispredicts_resteers * tma_other_mispredicts / tma_bran= ch_mispredicts) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unk= nown_branches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_miss= es + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_ms)) - tma_bottlene= ck_big_code", "MetricGroup": "BvFB;Fed;FetchBW;Frontend", "MetricName": "tma_bottleneck_instruction_fetch_bw", "MetricThreshold": "tma_bottleneck_instruction_fetch_bw > 20", @@ -823,7 +823,7 @@ }, { "BriefDescription": "Total pipeline cost of irregular execution (e= .g", - "MetricExpr": "100 * ((1 - cpu_core@INST_RETIRED.REP_ITERATION@ / = cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D1@) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + tma_mispredicts_restee= rs * tma_other_mispredicts / tma_branch_mispredicts) / (tma_clears_resteers= + tma_mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers= + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_m= s_switches) + tma_fetch_bandwidth * tma_ms / (tma_dsb + tma_lsd + tma_mite = + tma_ms)) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_bra= nch_mispredicts * tma_branch_mispredicts + tma_machine_clears * tma_other_n= ukes / tma_other_nukes + tma_core_bound * (tma_serializing_operation + cpu_= core@RS.EMPTY_RESOURCE@ / tma_info_thread_clks * tma_ports_utilized_0) / (t= ma_divider + tma_ports_utilization + tma_serializing_operation) + tma_micro= code_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (t= ma_assists / tma_microcode_sequencer) * tma_heavy_operations)", + "MetricExpr": "100 * ((1 - cpu_core@INST_RETIRED.REP_ITERATION@ / = cpu_core@UOPS_RETIRED.MS\\,cmask\\=3D1@) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + tma_mispredicts_restee= rs * tma_other_mispredicts / tma_branch_mispredicts) / (tma_clears_resteers= + tma_mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers= + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_m= s_switches) + tma_ms) + 10 * tma_microcode_sequencer * tma_other_mispredict= s / tma_branch_mispredicts * tma_branch_mispredicts + tma_machine_clears * = tma_other_nukes / tma_other_nukes + tma_core_bound * (tma_serializing_opera= tion + cpu_core@RS.EMPTY_RESOURCE@ / tma_info_thread_clks * tma_ports_utili= zed_0) / (tma_divider + tma_ports_utilization + tma_serializing_operation) = + tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequ= encer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", "MetricName": "tma_bottleneck_irregular_overhead", "MetricThreshold": "tma_bottleneck_irregular_overhead > 10", @@ -859,7 +859,7 @@ }, { "BriefDescription": "Total pipeline cost of remaining bottlenecks = in the back-end", - "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_cache_me= mory_bandwidth + tma_bottleneck_cache_memory_latency + tma_bottleneck_memor= y_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottleneck_comput= e_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_branching_= overhead + tma_bottleneck_useful_work)", + "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_data_cac= he_memory_bandwidth + tma_bottleneck_data_cache_memory_latency + tma_bottle= neck_memory_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottlen= eck_compute_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_= branching_overhead + tma_bottleneck_useful_work)", "MetricGroup": "BvOB;Cor;Offcore", "MetricName": "tma_bottleneck_other_bottlenecks", "MetricThreshold": "tma_bottleneck_other_bottlenecks > 20", @@ -876,7 +876,7 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Branch Misprediction", - "MetricExpr": "cpu_core@topdown\\-br\\-mispredict@ / (cpu_core@top= down\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-re= tiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-br\\-mispredict@ / (cpu_core@top= down\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-re= tiring@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "BadSpec;BrMispredicts;BvMP;TmaL2;TopdownL2;tma_L2_= group;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", @@ -1007,7 +1007,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to contested acces= ses", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS@ * min(= cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS@R, 24 * tma_info_system_core_fre= quency) + cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD@ * min(cpu_core@MEM_LOA= D_L3_HIT_RETIRED.XSNP_FWD@R, 25 * tma_info_system_core_frequency) * (cpu_co= re@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM@ / (cpu_core@OCR.DEMAND_DATA_RD.L3_= HIT.SNOOP_HITM@ + cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD@)))= * (1 + cpu_core@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MI= SS@ / 2) / tma_info_thread_clks", "MetricGroup": "BvMS;DataSharing;LockCont;Offcore;Snoop;TopdownL4;= tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", "MetricName": "tma_contested_accesses", @@ -1124,7 +1123,7 @@ "MetricGroup": "BvMB;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;t= ma_issueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", - "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_cache_memory_bandwidth, = tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_late= ncy, tma_streaming_stores", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_data_cache_memory_bandwi= dth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store= _latency, tma_streaming_stores", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -1253,7 +1252,7 @@ }, { "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring heavy-weight operations -- instructions that require= two or more uops or micro-coded sequences", - "MetricExpr": "cpu_core@topdown\\-heavy\\-ops@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-heavy\\-ops@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", @@ -1923,7 +1922,7 @@ "Unit": "cpu_core" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "cpu_core@UOPS_EXECUTED.THREAD@ / (cpu_core@UOPS_EXE= CUTED.CORE_CYCLES_GE_1@ / 2 if #SMT_on else cpu_core@UOPS_EXECUTED.THREAD\\= ,cmask\\=3D1@)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute", @@ -1984,7 +1983,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc\\,cpu= =3Dcpu_core@ / 1e9 / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency", "Unit": "cpu_core" @@ -1998,7 +1997,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.REF_TSC@ / TSC", + "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.REF_TSC@ / msr@tsc\\,cpu= =3Dcpu_core@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized", "Unit": "cpu_core" @@ -2008,7 +2007,7 @@ "MetricExpr": "64 * (UNC_HAC_ARB_TRK_REQUESTS.ALL + UNC_HAC_ARB_CO= H_TRK_REQUESTS.ALL) / 1e9 / tma_info_system_time", "MetricGroup": "HPC;MemOffcore;MemoryBW;SoC;tma_issueBW", "MetricName": "tma_info_system_dram_bw_use", - "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_cache_memory_ban= dwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full", + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_data_cache_memor= y_bandwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full", "Unit": "cpu_core" }, { @@ -2094,6 +2093,13 @@ "MetricName": "tma_info_system_turbo_utilization", "Unit": "cpu_core" }, + { + "BriefDescription": "Measured Average Uncore Frequency for the SoC= [GHz]", + "MetricExpr": "tma_info_system_socket_clks / 1e9 / tma_info_system= _time", + "MetricGroup": "SoC", + "MetricName": "tma_info_system_uncore_frequency", + "Unit": "cpu_core" + }, { "BriefDescription": "Per-Logical Processor actual clocks when the = Logical Processor is active.", "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.THREAD@", @@ -2213,12 +2219,12 @@ "Unit": "cpu_core" }, { - "BriefDescription": "This metric([SKL+] roughly; [LNL]) estimates = fraction of cycles with demand load accesses that hit the L1D cache", + "BriefDescription": "This metric ([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache", "MetricExpr": "min(2 * (cpu_core@MEM_INST_RETIRED.ALL_LOADS@ - cpu= _core@MEM_LOAD_RETIRED.FB_HIT@ - cpu_core@MEM_LOAD_RETIRED.L1_MISS@) * 20 /= 100, max(cpu_core@CYCLE_ACTIVITY.CYCLES_MEM_ANY@ - cpu_core@MEMORY_ACTIVIT= Y.CYCLES_L1D_MISS@, 0)) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound= _group", "MetricName": "tma_l1_latency_dependency", "MetricThreshold": "tma_l1_latency_dependency > 0.1 & (tma_l1_boun= d > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache. The s= hort latency of the L1D cache may be exposed in pointer-chasing memory acce= ss patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", + "PublicDescription": "This metric ([SKL+] roughly; [LNL]) estimate= s fraction of cycles with demand load accesses that hit the L1D cache. The = short latency of the L1D cache may be exposed in pointer-chasing memory acc= ess patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -2234,7 +2240,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles wit= h demand load accesses that hit the L2 cache under unloaded scenarios (poss= ibly L2 latency limited)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "cpu_core@MEM_LOAD_RETIRED.L2_HIT@ * min(cpu_core@ME= M_LOAD_RETIRED.L2_HIT@R, 3 * tma_info_system_core_frequency) * (1 + cpu_cor= e@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2) / tma_= info_thread_clks", "MetricGroup": "MemoryLat;TopdownL4;tma_L4_group;tma_l2_bound_grou= p", "MetricName": "tma_l2_hit_latency", @@ -2255,12 +2260,11 @@ }, { "BriefDescription": "This metric estimates fraction of cycles with= demand load accesses that hit the L3 cache under unloaded scenarios (possi= bly L3 latency limited)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "cpu_core@MEM_LOAD_RETIRED.L3_HIT@ * min(cpu_core@ME= M_LOAD_RETIRED.L3_HIT@R, 9 * tma_info_system_core_frequency) * (1 + cpu_cor= e@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2) / tma_= info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat= ;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_cache_memory_latency, tma_mem_latency", + "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_data_cache_memory_latency, tma_mem_latency", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -2342,6 +2346,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU spent handling cache misses due to lock operations", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "cpu_core@MEM_INST_RETIRED.LOCK_LOADS@ * cpu_core@ME= M_INST_RETIRED.LOCK_LOADS@R / tma_info_thread_clks", "MetricGroup": "LockCont;Offcore;TopdownL4;tma_L4_group;tma_issueR= FO;tma_l1_bound_group", "MetricName": "tma_lock_latency", @@ -2377,7 +2382,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_d= ram_bound_group;tma_issueBW", "MetricName": "tma_mem_bandwidth", "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_cache_memory_bandwidth,= tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_data_cache_memory_bandw= idth, tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", "ScaleUnit": "100%", "Unit": "cpu_core" }, @@ -2387,13 +2392,13 @@ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_= dram_bound_group;tma_issueLat", "MetricName": "tma_mem_latency", "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_cache_memory_latency, tma_l3_hit_latency= ", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_data_cache_memory_latency, tma_l3_hit_la= tency", "ScaleUnit": "100%", "Unit": "cpu_core" }, { "BriefDescription": "This metric represents fraction of slots the = Memory subsystem within the Backend was a bottleneck", - "MetricExpr": "cpu_core@topdown\\-mem\\-bound@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-mem\\-bound@ / (cpu_core@topdown= \\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiri= ng@ + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", @@ -2404,7 +2409,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to LFENCE Instructions.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * cpu_core@MISC2_RETIRED.LFENCE@ / tma_info_thre= ad_clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_serializing_operation_g= roup", "MetricName": "tma_memory_fence", @@ -2463,7 +2467,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to the Microcode Sequencer (MS) unit = - see Microcode_Sequencer node for details.", - "MetricExpr": "max(cpu_core@IDQ.MS_CYCLES_ANY@, cpu_core@UOPS_RETI= RED.MS\\,cmask\\=3D1@ / (cpu_core@UOPS_RETIRED.SLOTS@ / cpu_core@UOPS_ISSUE= D.ANY@)) / tma_info_core_core_clks / 2", + "MetricExpr": "max(cpu_core@IDQ.MS_CYCLES_ANY@, cpu_core@UOPS_RETI= RED.MS\\,cmask\\=3D1@ / (cpu_core@UOPS_RETIRED.SLOTS@ / cpu_core@UOPS_ISSUE= D.ANY@)) / tma_info_core_core_clks / 2.4", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_fetch_bandwidt= h_group", "MetricName": "tma_ms", "MetricThreshold": "tma_ms > 0.05 & tma_fetch_bandwidth > 0.2", @@ -2502,6 +2506,7 @@ }, { "BriefDescription": "This metric represents the remaining light uo= ps fraction the CPU has executed - remaining means not covered by other sib= ling nodes", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_i= nt_operations + tma_memory_operations + tma_fused_instructions + tma_non_fu= sed_branches))", "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", "MetricName": "tma_other_light_ops", @@ -2570,6 +2575,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU performance was potentially limited due to Core computation issues (non= divider-related)", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "((tma_ports_utilized_0 * tma_info_thread_clks + (cp= u_core@EXE_ACTIVITY.1_PORTS_UTIL@ + tma_retiring * cpu_core@EXE_ACTIVITY.2_= 3_PORTS_UTIL@)) / tma_info_thread_clks if cpu_core@ARITH.DIV_ACTIVE@ < cpu_= core@CYCLE_ACTIVITY.STALLS_TOTAL@ - cpu_core@EXE_ACTIVITY.BOUND_ON_LOADS@ e= lse (cpu_core@EXE_ACTIVITY.1_PORTS_UTIL@ + tma_retiring * cpu_core@EXE_ACTI= VITY.2_3_PORTS_UTIL@) / tma_info_thread_clks)", "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_gr= oup", "MetricName": "tma_ports_utilization", @@ -2580,6 +2586,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed no uops on any execution port (Logical Processor cycles since ICL= , Physical Core cycles otherwise)", + "MetricConstraint": "NO_THRESHOLD_AND_NMI", "MetricExpr": "max(cpu_core@EXE_ACTIVITY.EXE_BOUND_0_PORTS@ - cpu_= core@RESOURCE_STALLS.SCOREBOARD@, 0) / tma_info_thread_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_0", @@ -2590,6 +2597,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", + "MetricConstraint": "NO_THRESHOLD_AND_NMI", "MetricExpr": "cpu_core@EXE_ACTIVITY.1_PORTS_UTIL@ / tma_info_thre= ad_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_1", @@ -2600,7 +2608,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "cpu_core@EXE_ACTIVITY.2_PORTS_UTIL@ / tma_info_thre= ad_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", @@ -2611,7 +2618,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 3 or more uops per cycle on all execution ports (Logical= Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "cpu_core@UOPS_EXECUTED.CYCLES_GE_3@ / tma_info_thre= ad_clks", "MetricGroup": "BvCB;PortsUtil;TopdownL4;tma_L4_group;tma_ports_ut= ilization_group", "MetricName": "tma_ports_utilized_3m", @@ -2632,7 +2638,7 @@ { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "cpu_core@topdown\\-retiring@ / (cpu_core@topdown\\-= fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@= + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", + "MetricExpr": "cpu_core@topdown\\-retiring@ / (cpu_core@topdown\\-= fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@= + cpu_core@topdown\\-be\\-bound@)", "MetricGroup": "BvUW;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", @@ -2663,7 +2669,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to PAUSE Instructions", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.PAUSE@ / tma_info_thread_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_serializing_operation_g= roup", "MetricName": "tma_slow_pause", @@ -2698,7 +2703,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_i= ssueBW;tma_l3_bound_group", "MetricName": "tma_sq_full", "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, tma_m= em_bandwidth", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_data_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, = tma_mem_bandwidth", "ScaleUnit": "100%", "Unit": "cpu_core" }, diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/other.json b/tools/p= erf/pmu-events/arch/x86/meteorlake/other.json index cb21bb933617..8320ffd83c51 100644 --- a/tools/perf/pmu-events/arch/x86/meteorlake/other.json +++ b/tools/perf/pmu-events/arch/x86/meteorlake/other.json @@ -4,7 +4,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.HARDWARE", - "PublicDescription": "Count all other hardware assists or traps th= at are not necessarily architecturally exposed (through a software handler)= beyond FP; SSE-AVX mix and A/D assists who are counted by dedicated sub-ev= ents. This includes, but not limited to, assists at EXE or MEM uop writeba= ck like AVX* load/store/gather/scatter (non-FP GSSE-assist ) , assists gene= rated by ROB like PEBS and RTIT, Uncore trap, RAR (Remote Action Request) a= nd CET (Control flow Enforcement Technology) assists. the event also counts= for Machine Ordering count. Available PDIST counters: 0", + "PublicDescription": "Count all other hardware assists or traps th= at are not necessarily architecturally exposed (through a software handler)= beyond FP; SSE-AVX mix and A/D assists who are counted by dedicated sub-ev= ents. This includes, but not limited to, assists at EXE or MEM uop writeba= ck like AVX* load/store/gather/scatter (non-FP GSSE-assist ) , assists gene= rated by ROB like PEBS and RTIT, Uncore trap, RAR (Remote Action Request) a= nd CET (Control flow Enforcement Technology) assists. the event also counts= for Machine Ordering count.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -14,7 +14,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.PAGE_FAULT", - "PublicDescription": "ASSISTS.PAGE_FAULT Available PDIST counters:= 0", "SampleAfterValue": "1000003", "UMask": "0x8", "Unit": "cpu_core" @@ -83,7 +82,7 @@ "CounterMask": "1", "EventCode": "0x2d", "EventName": "XQ.FULL_CYCLES", - "PublicDescription": "number of cycles when the thread is active a= nd the uncore cannot take any further requests (for example prefetches, loa= ds or stores initiated by the Core that miss the L2 cache). Available PDIST= counters: 0", + "PublicDescription": "number of cycles when the thread is active a= nd the uncore cannot take any further requests (for example prefetches, loa= ds or stores initiated by the Core that miss the L2 cache).", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json b/tool= s/perf/pmu-events/arch/x86/meteorlake/pipeline.json index 22b25708e799..bfdaabe9377d 100644 --- a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json @@ -15,7 +15,7 @@ "CounterMask": "1", "EventCode": "0xb0", "EventName": "ARITH.DIV_ACTIVE", - "PublicDescription": "Counts cycles when divide unit is busy execu= ting divide or square root operations. Accounts for integer and floating-po= int operations. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when divide unit is busy execu= ting divide or square root operations. Accounts for integer and floating-po= int operations.", "SampleAfterValue": "1000003", "UMask": "0x9", "Unit": "cpu_core" @@ -26,7 +26,6 @@ "CounterMask": "1", "EventCode": "0xb0", "EventName": "ARITH.IDIV_ACTIVE", - "PublicDescription": "This event counts the cycles the integer div= ider is busy. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8", "Unit": "cpu_core" @@ -36,7 +35,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.ANY", - "PublicDescription": "Counts the number of occurrences where a mic= rocode assist is invoked by hardware. Examples include AD (page Access Dirt= y), FP and AVX related assists. Available PDIST counters: 0", + "PublicDescription": "Counts the number of occurrences where a mic= rocode assist is invoked by hardware. Examples include AD (page Access Dirt= y), FP and AVX related assists.", "SampleAfterValue": "100003", "UMask": "0x1b", "Unit": "cpu_core" @@ -44,6 +43,7 @@ { "BriefDescription": "Counts the total number of branch instruction= s retired for all branch types.", "Counter": "0,1,2,3,4,5,6,7", + "Errata": "MTL012, MTL013", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.ALL_BRANCHES", "PublicDescription": "Counts the total number of instructions in w= hich the instruction pointer (IP) of the processor is resteered due to a br= anch instruction and the branch instruction successfully retires. All bran= ch type instructions are accounted for.", @@ -62,6 +62,7 @@ { "BriefDescription": "Counts the number of retired JCC (Jump on Con= ditional Code) branch instructions retired, includes both taken and not tak= en branches.", "Counter": "0,1,2,3,4,5,6,7", + "Errata": "MTL013", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.COND", "SampleAfterValue": "200003", @@ -110,6 +111,7 @@ { "BriefDescription": "Counts the number of far branch instructions = retired, includes far jump, far call and return, and interrupt call and ret= urn.", "Counter": "0,1,2,3,4,5,6,7", + "Errata": "MTL013", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.FAR_BRANCH", "SampleAfterValue": "200003", @@ -129,6 +131,7 @@ { "BriefDescription": "Counts the number of near indirect JMP and ne= ar indirect CALL branch instructions retired.", "Counter": "0,1,2,3,4,5,6,7", + "Errata": "MTL013", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.INDIRECT", "SampleAfterValue": "200003", @@ -148,6 +151,7 @@ { "BriefDescription": "Counts the number of near indirect CALL branc= h instructions retired.", "Counter": "0,1,2,3,4,5,6,7", + "Errata": "MTL013", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.INDIRECT_CALL", "SampleAfterValue": "200003", @@ -167,6 +171,7 @@ "BriefDescription": "This event is deprecated. Refer to new event = BR_INST_RETIRED.INDIRECT_CALL", "Counter": "0,1,2,3,4,5,6,7", "Deprecated": "1", + "Errata": "MTL013", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.IND_CALL", "SampleAfterValue": "200003", @@ -176,6 +181,7 @@ { "BriefDescription": "Counts the number of near CALL branch instruc= tions retired.", "Counter": "0,1,2,3,4,5,6,7", + "Errata": "MTL012, MTL013", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.NEAR_CALL", "SampleAfterValue": "200003", @@ -214,6 +220,7 @@ { "BriefDescription": "Counts the number of near taken branch instru= ctions retired.", "Counter": "0,1,2,3,4,5,6,7", + "Errata": "MTL013", "EventCode": "0xc4", "EventName": "BR_INST_RETIRED.NEAR_TAKEN", "SampleAfterValue": "200003", @@ -484,7 +491,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.C01", - "PublicDescription": "Counts core clocks when the thread is in the= C0.1 light-weight slower wakeup time but more power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions. Availab= le PDIST counters: 0", + "PublicDescription": "Counts core clocks when the thread is in the= C0.1 light-weight slower wakeup time but more power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions.", "SampleAfterValue": "2000003", "UMask": "0x10", "Unit": "cpu_core" @@ -494,7 +501,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.C02", - "PublicDescription": "Counts core clocks when the thread is in the= C0.2 light-weight faster wakeup time but less power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions. Availab= le PDIST counters: 0", + "PublicDescription": "Counts core clocks when the thread is in the= C0.2 light-weight faster wakeup time but less power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions.", "SampleAfterValue": "2000003", "UMask": "0x20", "Unit": "cpu_core" @@ -504,7 +511,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.C0_WAIT", - "PublicDescription": "Counts core clocks when the thread is in the= C0.1 or C0.2 power saving optimized states (TPAUSE or UMWAIT instructions)= or running the PAUSE instruction. Available PDIST counters: 0", + "PublicDescription": "Counts core clocks when the thread is in the= C0.1 or C0.2 power saving optimized states (TPAUSE or UMWAIT instructions)= or running the PAUSE instruction.", "SampleAfterValue": "2000003", "UMask": "0x70", "Unit": "cpu_core" @@ -530,7 +537,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.DISTRIBUTED", - "PublicDescription": "This event distributes cycle counts between = active hyperthreads, i.e., those in C0. A hyperthread becomes inactive whe= n it executes the HLT or MWAIT instructions. If all other hyperthreads are= inactive (or disabled or do not exist), all counts are attributed to this = hyperthread. To obtain the full count when the Core is active, sum the coun= ts from each hyperthread. Available PDIST counters: 0", + "PublicDescription": "This event distributes cycle counts between = active hyperthreads, i.e., those in C0. A hyperthread becomes inactive whe= n it executes the HLT or MWAIT instructions. If all other hyperthreads are= inactive (or disabled or do not exist), all counts are attributed to this = hyperthread. To obtain the full count when the Core is active, sum the coun= ts from each hyperthread.", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -540,7 +547,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE", - "PublicDescription": "Counts Core crystal clock cycles when curren= t thread is unhalted and the other thread is halted. Available PDIST counte= rs: 0", + "PublicDescription": "Counts Core crystal clock cycles when curren= t thread is unhalted and the other thread is halted.", "SampleAfterValue": "25003", "UMask": "0x2", "Unit": "cpu_core" @@ -550,7 +557,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.PAUSE", - "PublicDescription": "CPU_CLK_UNHALTED.PAUSE Available PDIST count= ers: 0", "SampleAfterValue": "2000003", "UMask": "0x40", "Unit": "cpu_core" @@ -562,7 +568,6 @@ "EdgeDetect": "1", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.PAUSE_INST", - "PublicDescription": "CPU_CLK_UNHALTED.PAUSE_INST Available PDIST = counters: 0", "SampleAfterValue": "2000003", "UMask": "0x40", "Unit": "cpu_core" @@ -572,7 +577,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.REF_DISTRIBUTED", - "PublicDescription": "This event distributes Core crystal clock cy= cle counts between active hyperthreads, i.e., those in C0 sleep-state. A hy= perthread becomes inactive when it executes the HLT or MWAIT instructions. = If one thread is active in a core, all counts are attributed to this hypert= hread. To obtain the full count when the Core is active, sum the counts fro= m each hyperthread. Available PDIST counters: 0", + "PublicDescription": "This event distributes Core crystal clock cy= cle counts between active hyperthreads, i.e., those in C0 sleep-state. A hy= perthread becomes inactive when it executes the HLT or MWAIT instructions. = If one thread is active in a core, all counts are attributed to this hypert= hread. To obtain the full count when the Core is active, sum the counts fro= m each hyperthread.", "SampleAfterValue": "2000003", "UMask": "0x8", "Unit": "cpu_core" @@ -589,7 +594,7 @@ "BriefDescription": "Reference cycles when the core is not in halt= state.", "Counter": "Fixed counter 2", "EventName": "CPU_CLK_UNHALTED.REF_TSC", - "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the eight programmable counte= rs available for other events. Note: On all current platforms this event st= ops counting during 'throttling (TM)' states duty off periods the processor= is 'halted'. The counter update is done at a lower clock rate then the co= re clock the overflow status bit for this counter may appear 'sticky'. Aft= er the counter has overflowed and software clears the overflow status bit a= nd resets the counter to less than MAX. The reset value to the counter is n= ot clocked immediately so the overflow status bit will flip 'high (1)' and = generate another PMI (if enabled) after which the reset value gets clocked = into the counter. Therefore, software will get the interrupt, read the over= flow status bit '1 for bit 34 while the counter value is less than MAX. Sof= tware should ignore this case. Available PDIST counters: 0", + "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the eight programmable counte= rs available for other events. Note: On all current platforms this event st= ops counting during 'throttling (TM)' states duty off periods the processor= is 'halted'. The counter update is done at a lower clock rate then the co= re clock the overflow status bit for this counter may appear 'sticky'. Aft= er the counter has overflowed and software clears the overflow status bit a= nd resets the counter to less than MAX. The reset value to the counter is n= ot clocked immediately so the overflow status bit will flip 'high (1)' and = generate another PMI (if enabled) after which the reset value gets clocked = into the counter. Therefore, software will get the interrupt, read the over= flow status bit '1 for bit 34 while the counter value is less than MAX. Sof= tware should ignore this case.", "SampleAfterValue": "2000003", "UMask": "0x3", "Unit": "cpu_core" @@ -609,7 +614,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.REF_TSC_P", - "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the four (eight when Hyperthr= eading is disabled) programmable counters available for other events. Note:= On all current platforms this event stops counting during 'throttling (TM)= ' states duty off periods the processor is 'halted'. The counter update is= done at a lower clock rate then the core clock the overflow status bit for= this counter may appear 'sticky'. After the counter has overflowed and so= ftware clears the overflow status bit and resets the counter to less than M= AX. The reset value to the counter is not clocked immediately so the overfl= ow status bit will flip 'high (1)' and generate another PMI (if enabled) af= ter which the reset value gets clocked into the counter. Therefore, softwar= e will get the interrupt, read the overflow status bit '1 for bit 34 while = the counter value is less than MAX. Software should ignore this case. Avail= able PDIST counters: 0", + "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the four (eight when Hyperthr= eading is disabled) programmable counters available for other events. Note:= On all current platforms this event stops counting during 'throttling (TM)= ' states duty off periods the processor is 'halted'. The counter update is= done at a lower clock rate then the core clock the overflow status bit for= this counter may appear 'sticky'. After the counter has overflowed and so= ftware clears the overflow status bit and resets the counter to less than M= AX. The reset value to the counter is not clocked immediately so the overfl= ow status bit will flip 'high (1)' and generate another PMI (if enabled) af= ter which the reset value gets clocked into the counter. Therefore, softwar= e will get the interrupt, read the overflow status bit '1 for bit 34 while = the counter value is less than MAX. Software should ignore this case.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -626,7 +631,7 @@ "BriefDescription": "Core cycles when the thread is not in halt st= ate", "Counter": "Fixed counter 1", "EventName": "CPU_CLK_UNHALTED.THREAD", - "PublicDescription": "Counts the number of core cycles while the t= hread is not in a halt state. The thread enters the halt state when it is r= unning the HLT instruction. This event is a component in many key event rat= ios. The core frequency may change from time to time due to transitions ass= ociated with Enhanced Intel SpeedStep Technology or TM2. For this reason th= is event may have a changing ratio with regards to time. When the core freq= uency is constant, this event can approximate elapsed time while the core w= as not in the halt state. It is counted on a dedicated fixed counter, leavi= ng the eight programmable counters available for other events. Available PD= IST counters: 0", + "PublicDescription": "Counts the number of core cycles while the t= hread is not in a halt state. The thread enters the halt state when it is r= unning the HLT instruction. This event is a component in many key event rat= ios. The core frequency may change from time to time due to transitions ass= ociated with Enhanced Intel SpeedStep Technology or TM2. For this reason th= is event may have a changing ratio with regards to time. When the core freq= uency is constant, this event can approximate elapsed time while the core w= as not in the halt state. It is counted on a dedicated fixed counter, leavi= ng the eight programmable counters available for other events.", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -644,7 +649,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.THREAD_P", - "PublicDescription": "This is an architectural event that counts t= he number of thread cycles while the thread is not in a halt state. The thr= ead enters the halt state when it is running the HLT instruction. The core = frequency may change from time to time due to power or thermal throttling. = For this reason, this event may have a changing ratio with regards to wall = clock time. Available PDIST counters: 0", + "PublicDescription": "This is an architectural event that counts t= he number of thread cycles while the thread is not in a halt state. The thr= ead enters the halt state when it is running the HLT instruction. The core = frequency may change from time to time due to power or thermal throttling. = For this reason, this event may have a changing ratio with regards to wall = clock time.", "SampleAfterValue": "2000003", "Unit": "cpu_core" }, @@ -654,7 +659,6 @@ "CounterMask": "8", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_L1D_MISS", - "PublicDescription": "Cycles while L1 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8", "Unit": "cpu_core" @@ -665,7 +669,6 @@ "CounterMask": "1", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_L2_MISS", - "PublicDescription": "Cycles while L2 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -676,7 +679,6 @@ "CounterMask": "16", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_MEM_ANY", - "PublicDescription": "Cycles while memory subsystem has an outstan= ding load. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x10", "Unit": "cpu_core" @@ -687,7 +689,6 @@ "CounterMask": "12", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_L1D_MISS", - "PublicDescription": "Execution stalls while L1 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0xc", "Unit": "cpu_core" @@ -698,7 +699,6 @@ "CounterMask": "5", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_L2_MISS", - "PublicDescription": "Execution stalls while L2 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x5", "Unit": "cpu_core" @@ -709,7 +709,6 @@ "CounterMask": "4", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_TOTAL", - "PublicDescription": "Total execution stalls. Available PDIST coun= ters: 0", "SampleAfterValue": "1000003", "UMask": "0x4", "Unit": "cpu_core" @@ -719,7 +718,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.1_PORTS_UTIL", - "PublicDescription": "Counts cycles during which a total of 1 uop = was executed on all ports and Reservation Station (RS) was not empty. Avail= able PDIST counters: 0", + "PublicDescription": "Counts cycles during which a total of 1 uop = was executed on all ports and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -729,7 +728,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.2_3_PORTS_UTIL", - "PublicDescription": "Cycles total of 2 or 3 uops are executed on = all ports and Reservation Station (RS) was not empty. Available PDIST count= ers: 0", "SampleAfterValue": "2000003", "UMask": "0xc", "Unit": "cpu_core" @@ -739,7 +737,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.2_PORTS_UTIL", - "PublicDescription": "Counts cycles during which a total of 2 uops= were executed on all ports and Reservation Station (RS) was not empty. Ava= ilable PDIST counters: 0", + "PublicDescription": "Counts cycles during which a total of 2 uops= were executed on all ports and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -749,7 +747,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.3_PORTS_UTIL", - "PublicDescription": "Cycles total of 3 uops are executed on all p= orts and Reservation Station (RS) was not empty. Available PDIST counters: = 0", + "PublicDescription": "Cycles total of 3 uops are executed on all p= orts and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x8", "Unit": "cpu_core" @@ -759,7 +757,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.4_PORTS_UTIL", - "PublicDescription": "Cycles total of 4 uops are executed on all p= orts and Reservation Station (RS) was not empty. Available PDIST counters: = 0", + "PublicDescription": "Cycles total of 4 uops are executed on all p= orts and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x10", "Unit": "cpu_core" @@ -770,7 +768,6 @@ "CounterMask": "5", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.BOUND_ON_LOADS", - "PublicDescription": "Execution stalls while memory subsystem has = an outstanding load. Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x21", "Unit": "cpu_core" @@ -781,7 +778,7 @@ "CounterMask": "2", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.BOUND_ON_STORES", - "PublicDescription": "Counts cycles where the Store Buffer was ful= l and no loads caused an execution stall. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where the Store Buffer was ful= l and no loads caused an execution stall.", "SampleAfterValue": "1000003", "UMask": "0x40", "Unit": "cpu_core" @@ -791,7 +788,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.EXE_BOUND_0_PORTS", - "PublicDescription": "Number of cycles total of 0 uops executed on= all ports, Reservation Station (RS) was not empty, the Store Buffer (SB) w= as not full and there was no outstanding load. Available PDIST counters: 0", + "PublicDescription": "Number of cycles total of 0 uops executed on= all ports, Reservation Station (RS) was not empty, the Store Buffer (SB) w= as not full and there was no outstanding load.", "SampleAfterValue": "1000003", "UMask": "0x80", "Unit": "cpu_core" @@ -801,7 +798,7 @@ "Counter": "0,1,2,3", "EventCode": "0x75", "EventName": "INST_DECODED.DECODERS", - "PublicDescription": "Number of decoders utilized in a cycle when = the MITE (legacy decode pipeline) fetches instructions. Available PDIST cou= nters: 0", + "PublicDescription": "Number of decoders utilized in a cycle when = the MITE (legacy decode pipeline) fetches instructions.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -846,7 +843,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc0", "EventName": "INST_RETIRED.MACRO_FUSED", - "PublicDescription": "INST_RETIRED.MACRO_FUSED Available PDIST cou= nters: 0", "SampleAfterValue": "2000003", "UMask": "0x10", "Unit": "cpu_core" @@ -856,7 +852,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc0", "EventName": "INST_RETIRED.NOP", - "PublicDescription": "Counts all retired NOP or ENDBR32/64 or PREF= ETCHIT0/1 instructions Available PDIST counters: 0", + "PublicDescription": "Counts all retired NOP or ENDBR32/64 or PREF= ETCHIT0/1 instructions", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -875,7 +871,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc0", "EventName": "INST_RETIRED.REP_ITERATION", - "PublicDescription": "Number of iterations of Repeat (REP) string = retired instructions such as MOVS, CMPS, and SCAS. Each has a byte, word, a= nd doubleword version and string instructions can be repeated using a repet= ition prefix, REP, that allows their architectural execution to be repeated= a number of times as specified by the RCX register. Note the number of ite= rations is implementation-dependent. Available PDIST counters: 0", + "PublicDescription": "Number of iterations of Repeat (REP) string = retired instructions such as MOVS, CMPS, and SCAS. Each has a byte, word, a= nd doubleword version and string instructions can be repeated using a repet= ition prefix, REP, that allows their architectural execution to be repeated= a number of times as specified by the RCX register. Note the number of ite= rations is implementation-dependent.", "SampleAfterValue": "2000003", "UMask": "0x8", "Unit": "cpu_core" @@ -887,7 +883,7 @@ "EdgeDetect": "1", "EventCode": "0xad", "EventName": "INT_MISC.CLEARS_COUNT", - "PublicDescription": "Counts the number of speculative clears due = to any type of branch misprediction or machine clears Available PDIST count= ers: 0", + "PublicDescription": "Counts the number of speculative clears due = to any type of branch misprediction or machine clears", "SampleAfterValue": "500009", "UMask": "0x1", "Unit": "cpu_core" @@ -897,7 +893,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.CLEAR_RESTEER_CYCLES", - "PublicDescription": "Cycles after recovery from a branch mispredi= ction or machine clear till the first uop is issued from the resteered path= . Available PDIST counters: 0", + "PublicDescription": "Cycles after recovery from a branch mispredi= ction or machine clear till the first uop is issued from the resteered path= .", "SampleAfterValue": "500009", "UMask": "0x80", "Unit": "cpu_core" @@ -907,7 +903,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.RECOVERY_CYCLES", - "PublicDescription": "Counts core cycles when the Resource allocat= or was stalled due to recovery from an earlier branch misprediction or mach= ine clear event. Available PDIST counters: 0", + "PublicDescription": "Counts core cycles when the Resource allocat= or was stalled due to recovery from an earlier branch misprediction or mach= ine clear event.", "SampleAfterValue": "500009", "UMask": "0x1", "Unit": "cpu_core" @@ -919,7 +915,6 @@ "EventName": "INT_MISC.UNKNOWN_BRANCH_CYCLES", "MSRIndex": "0x3F7", "MSRValue": "0x7", - "PublicDescription": "Bubble cycles of BAClear (Unknown Branch). A= vailable PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x40", "Unit": "cpu_core" @@ -929,7 +924,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.UOP_DROPPING", - "PublicDescription": "Estimated number of Top-down Microarchitectu= re Analysis slots that got dropped due to non front-end reasons Available P= DIST counters: 0", + "PublicDescription": "Estimated number of Top-down Microarchitectu= re Analysis slots that got dropped due to non front-end reasons", "SampleAfterValue": "1000003", "UMask": "0x10", "Unit": "cpu_core" @@ -939,7 +934,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.128BIT", - "PublicDescription": "INT_VEC_RETIRED.128BIT Available PDIST count= ers: 0", "SampleAfterValue": "1000003", "UMask": "0x13", "Unit": "cpu_core" @@ -949,7 +943,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.256BIT", - "PublicDescription": "INT_VEC_RETIRED.256BIT Available PDIST count= ers: 0", "SampleAfterValue": "1000003", "UMask": "0xac", "Unit": "cpu_core" @@ -959,7 +952,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.ADD_128", - "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 128-bit vector instructions. Available PDIST counters: 0= ", + "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 128-bit vector instructions.", "SampleAfterValue": "1000003", "UMask": "0x3", "Unit": "cpu_core" @@ -969,7 +962,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.ADD_256", - "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 256-bit vector instructions. Available PDIST counters: 0= ", + "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 256-bit vector instructions.", "SampleAfterValue": "1000003", "UMask": "0xc", "Unit": "cpu_core" @@ -979,7 +972,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.MUL_256", - "PublicDescription": "INT_VEC_RETIRED.MUL_256 Available PDIST coun= ters: 0", "SampleAfterValue": "1000003", "UMask": "0x80", "Unit": "cpu_core" @@ -989,7 +981,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.SHUFFLES", - "PublicDescription": "INT_VEC_RETIRED.SHUFFLES Available PDIST cou= nters: 0", "SampleAfterValue": "1000003", "UMask": "0x40", "Unit": "cpu_core" @@ -999,7 +990,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.VNNI_128", - "PublicDescription": "INT_VEC_RETIRED.VNNI_128 Available PDIST cou= nters: 0", "SampleAfterValue": "1000003", "UMask": "0x10", "Unit": "cpu_core" @@ -1009,7 +999,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.VNNI_256", - "PublicDescription": "INT_VEC_RETIRED.VNNI_256 Available PDIST cou= nters: 0", "SampleAfterValue": "1000003", "UMask": "0x20", "Unit": "cpu_core" @@ -1028,7 +1017,7 @@ "Counter": "0,1,2,3", "EventCode": "0x03", "EventName": "LD_BLOCKS.ADDRESS_ALIAS", - "PublicDescription": "Counts the number of times a load got blocke= d due to false dependencies in MOB due to partial compare on address. Avail= able PDIST counters: 0", + "PublicDescription": "Counts the number of times a load got blocke= d due to false dependencies in MOB due to partial compare on address.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -1047,7 +1036,7 @@ "Counter": "0,1,2,3", "EventCode": "0x03", "EventName": "LD_BLOCKS.NO_SR", - "PublicDescription": "Counts the number of times that split load o= perations are temporarily blocked because all resources for handling the sp= lit accesses are in use. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times that split load o= perations are temporarily blocked because all resources for handling the sp= lit accesses are in use.", "SampleAfterValue": "100003", "UMask": "0x88", "Unit": "cpu_core" @@ -1066,7 +1055,7 @@ "Counter": "0,1,2,3", "EventCode": "0x03", "EventName": "LD_BLOCKS.STORE_FORWARD", - "PublicDescription": "Counts the number of times where store forwa= rding was prevented for a load operation. The most common case is a load bl= ocked due to the address of memory access (partially) overlapping with a pr= eceding uncompleted store. Note: See the table of not supported store forwa= rds in the Optimization Guide. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times where store forwa= rding was prevented for a load operation. The most common case is a load bl= ocked due to the address of memory access (partially) overlapping with a pr= eceding uncompleted store. Note: See the table of not supported store forwa= rds in the Optimization Guide.", "SampleAfterValue": "100003", "UMask": "0x82", "Unit": "cpu_core" @@ -1076,7 +1065,7 @@ "Counter": "0,1,2,3", "EventCode": "0x4c", "EventName": "LOAD_HIT_PREFETCH.SWPF", - "PublicDescription": "Counts all software-prefetch load dispatches= that hit the fill buffer (FB) allocated for the software prefetch. It can = also be incremented by some lock instructions. So it should only be used wi= th profiling so that the locks can be excluded by ASM (Assembly File) inspe= ction of the nearby instructions. Available PDIST counters: 0", + "PublicDescription": "Counts all software-prefetch load dispatches= that hit the fill buffer (FB) allocated for the software prefetch. It can = also be incremented by some lock instructions. So it should only be used wi= th profiling so that the locks can be excluded by ASM (Assembly File) inspe= ction of the nearby instructions.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -1087,7 +1076,7 @@ "CounterMask": "1", "EventCode": "0xa8", "EventName": "LSD.CYCLES_ACTIVE", - "PublicDescription": "Counts the cycles when at least one uop is d= elivered by the LSD (Loop-stream detector). Available PDIST counters: 0", + "PublicDescription": "Counts the cycles when at least one uop is d= elivered by the LSD (Loop-stream detector).", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1098,7 +1087,7 @@ "CounterMask": "6", "EventCode": "0xa8", "EventName": "LSD.CYCLES_OK", - "PublicDescription": "Counts the cycles when optimal number of uop= s is delivered by the LSD (Loop-stream detector). Available PDIST counters:= 0", + "PublicDescription": "Counts the cycles when optimal number of uop= s is delivered by the LSD (Loop-stream detector).", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1108,7 +1097,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa8", "EventName": "LSD.UOPS", - "PublicDescription": "Counts the number of uops delivered to the b= ack-end by the LSD(Loop Stream Detector). Available PDIST counters: 0", + "PublicDescription": "Counts the number of uops delivered to the b= ack-end by the LSD(Loop Stream Detector).", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1128,7 +1117,7 @@ "EdgeDetect": "1", "EventCode": "0xc3", "EventName": "MACHINE_CLEARS.COUNT", - "PublicDescription": "Counts the number of machine clears (nukes) = of any type. Available PDIST counters: 0", + "PublicDescription": "Counts the number of machine clears (nukes) = of any type.", "SampleAfterValue": "100003", "UMask": "0x1", "Unit": "cpu_core" @@ -1184,7 +1173,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc3", "EventName": "MACHINE_CLEARS.SMC", - "PublicDescription": "Counts self-modifying code (SMC) detected, w= hich causes a machine clear. Available PDIST counters: 0", + "PublicDescription": "Counts self-modifying code (SMC) detected, w= hich causes a machine clear.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -1194,7 +1183,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe0", "EventName": "MISC2_RETIRED.LFENCE", - "PublicDescription": "number of LFENCE retired instructions Availa= ble PDIST counters: 0", + "PublicDescription": "number of LFENCE retired instructions", "SampleAfterValue": "400009", "UMask": "0x20", "Unit": "cpu_core" @@ -1213,7 +1202,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcc", "EventName": "MISC_RETIRED.LBR_INSERTS", - "PublicDescription": "Increments when an entry is added to the Las= t Branch Record (LBR) array (or removed from the array in case of RETURNs i= n call stack mode). The event requires LBR enable via IA32_DEBUGCTL MSR and= branch type selection via MSR_LBR_SELECT. Available PDIST counters: 0", + "PublicDescription": "Increments when an entry is added to the Las= t Branch Record (LBR) array (or removed from the array in case of RETURNs i= n call stack mode). The event requires LBR enable via IA32_DEBUGCTL MSR and= branch type selection via MSR_LBR_SELECT.", "SampleAfterValue": "100003", "UMask": "0x20", "Unit": "cpu_core" @@ -1223,7 +1212,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa2", "EventName": "RESOURCE_STALLS.SB", - "PublicDescription": "Counts allocation stall cycles caused by the= store buffer (SB) being full. This counts cycles that the pipeline back-en= d blocked uop delivery from the front-end. Available PDIST counters: 0", + "PublicDescription": "Counts allocation stall cycles caused by the= store buffer (SB) being full. This counts cycles that the pipeline back-en= d blocked uop delivery from the front-end.", "SampleAfterValue": "100003", "UMask": "0x8", "Unit": "cpu_core" @@ -1233,7 +1222,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa2", "EventName": "RESOURCE_STALLS.SCOREBOARD", - "PublicDescription": "Counts cycles where the pipeline is stalled = due to serializing operations. Available PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -1243,7 +1231,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa5", "EventName": "RS.EMPTY", - "PublicDescription": "Counts cycles during which the reservation s= tation (RS) is empty for this logical processor. This is usually caused whe= n the front-end pipeline runs into starvation periods (e.g. branch mispredi= ctions or i-cache misses) Available PDIST counters: 0", + "PublicDescription": "Counts cycles during which the reservation s= tation (RS) is empty for this logical processor. This is usually caused whe= n the front-end pipeline runs into starvation periods (e.g. branch mispredi= ctions or i-cache misses)", "SampleAfterValue": "1000003", "UMask": "0x7", "Unit": "cpu_core" @@ -1256,7 +1244,7 @@ "EventCode": "0xa5", "EventName": "RS.EMPTY_COUNT", "Invert": "1", - "PublicDescription": "Counts end of periods where the Reservation = Station (RS) was empty. Could be useful to closely sample on front-end late= ncy issues (see the FRONTEND_RETIRED event of designated precise events) Av= ailable PDIST counters: 0", + "PublicDescription": "Counts end of periods where the Reservation = Station (RS) was empty. Could be useful to closely sample on front-end late= ncy issues (see the FRONTEND_RETIRED event of designated precise events)", "SampleAfterValue": "100003", "UMask": "0x7", "Unit": "cpu_core" @@ -1266,7 +1254,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa5", "EventName": "RS.EMPTY_RESOURCE", - "PublicDescription": "Cycles when RS was empty and a resource allo= cation stall is asserted Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1285,7 +1272,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa4", "EventName": "TOPDOWN.BACKEND_BOUND_SLOTS", - "PublicDescription": "This event counts a subset of the Topdown Sl= ots event that were not consumed by the back-end pipeline due to lack of ba= ck-end resources, as a result of memory subsystem delays, execution units l= imitations, or other conditions. The count is distributed among unhalted lo= gical processors (hyper-threads) who share the same physical core, in proce= ssors that support Intel Hyper-Threading Technology. Software can use this = event as the numerator for the Backend Bound metric (or top-level category)= of the Top-down Microarchitecture Analysis method. Available PDIST counter= s: 0", + "PublicDescription": "This event counts a subset of the Topdown Sl= ots event that were not consumed by the back-end pipeline due to lack of ba= ck-end resources, as a result of memory subsystem delays, execution units l= imitations, or other conditions. The count is distributed among unhalted lo= gical processors (hyper-threads) who share the same physical core, in proce= ssors that support Intel Hyper-Threading Technology. Software can use this = event as the numerator for the Backend Bound metric (or top-level category)= of the Top-down Microarchitecture Analysis method.", "SampleAfterValue": "10000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1295,7 +1282,7 @@ "Counter": "0", "EventCode": "0xa4", "EventName": "TOPDOWN.BAD_SPEC_SLOTS", - "PublicDescription": "Number of slots of TMA method that were wast= ed due to incorrect speculation. It covers all types of control-flow or dat= a-related mis-speculations. Available PDIST counters: 0", + "PublicDescription": "Number of slots of TMA method that were wast= ed due to incorrect speculation. It covers all types of control-flow or dat= a-related mis-speculations.", "SampleAfterValue": "10000003", "UMask": "0x4", "Unit": "cpu_core" @@ -1305,7 +1292,7 @@ "Counter": "0", "EventCode": "0xa4", "EventName": "TOPDOWN.BR_MISPREDICT_SLOTS", - "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by (any type of) branch mispredictions. This event es= timates number of speculative operations that were issued but not retired a= s well as the out-of-order engine recovery past a branch misprediction. Ava= ilable PDIST counters: 0", + "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by (any type of) branch mispredictions. This event es= timates number of speculative operations that were issued but not retired a= s well as the out-of-order engine recovery past a branch misprediction.", "SampleAfterValue": "10000003", "UMask": "0x8", "Unit": "cpu_core" @@ -1315,7 +1302,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa4", "EventName": "TOPDOWN.MEMORY_BOUND_SLOTS", - "PublicDescription": "TOPDOWN.MEMORY_BOUND_SLOTS Available PDIST c= ounters: 0", "SampleAfterValue": "10000003", "UMask": "0x10", "Unit": "cpu_core" @@ -1324,7 +1310,7 @@ "BriefDescription": "TMA slots available for an unhalted logical p= rocessor. Fixed counter - architectural event", "Counter": "Fixed counter 3", "EventName": "TOPDOWN.SLOTS", - "PublicDescription": "Number of available slots for an unhalted lo= gical processor. The event increments by machine-width of the narrowest pip= eline as employed by the Top-down Microarchitecture Analysis method (TMA). = The count is distributed among unhalted logical processors (hyper-threads) = who share the same physical core. Software can use this event as the denomi= nator for the top-level metrics of the TMA method. This architectural event= is counted on a designated fixed counter (Fixed Counter 3). Available PDIS= T counters: 0", + "PublicDescription": "Number of available slots for an unhalted lo= gical processor. The event increments by machine-width of the narrowest pip= eline as employed by the Top-down Microarchitecture Analysis method (TMA). = The count is distributed among unhalted logical processors (hyper-threads) = who share the same physical core. Software can use this event as the denomi= nator for the top-level metrics of the TMA method. This architectural event= is counted on a designated fixed counter (Fixed Counter 3).", "SampleAfterValue": "10000003", "UMask": "0x4", "Unit": "cpu_core" @@ -1334,7 +1320,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa4", "EventName": "TOPDOWN.SLOTS_P", - "PublicDescription": "Counts the number of available slots for an = unhalted logical processor. The event increments by machine-width of the na= rrowest pipeline as employed by the Top-down Microarchitecture Analysis met= hod. The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core. Available PDIST counters: 0", + "PublicDescription": "Counts the number of available slots for an = unhalted logical processor. The event increments by machine-width of the na= rrowest pipeline as employed by the Top-down Microarchitecture Analysis met= hod. The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core.", "SampleAfterValue": "10000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1591,7 +1577,7 @@ "Counter": "0,1,2,3", "EventCode": "0x76", "EventName": "UOPS_DECODED.DEC0_UOPS", - "PublicDescription": "This event counts the number of not dec-by-a= ll uops decoded by decoder 0. Available PDIST counters: 0", + "PublicDescription": "This event counts the number of not dec-by-a= ll uops decoded by decoder 0.", "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1601,7 +1587,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_0", - "PublicDescription": "Number of uops dispatch to execution port 0= . Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution port 0= .", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1611,7 +1597,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_1", - "PublicDescription": "Number of uops dispatch to execution port 1= . Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution port 1= .", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1621,7 +1607,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_2_3_10", - "PublicDescription": "Number of uops dispatch to execution ports 2= , 3 and 10 Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports 2= , 3 and 10", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -1631,7 +1617,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_4_9", - "PublicDescription": "Number of uops dispatch to execution ports 4= and 9 Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports 4= and 9", "SampleAfterValue": "2000003", "UMask": "0x10", "Unit": "cpu_core" @@ -1641,7 +1627,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_5_11", - "PublicDescription": "Number of uops dispatch to execution ports 5= and 11 Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports 5= and 11", "SampleAfterValue": "2000003", "UMask": "0x20", "Unit": "cpu_core" @@ -1651,7 +1637,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_6", - "PublicDescription": "Number of uops dispatch to execution port 6= . Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution port 6= .", "SampleAfterValue": "2000003", "UMask": "0x40", "Unit": "cpu_core" @@ -1661,7 +1647,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_7_8", - "PublicDescription": "Number of uops dispatch to execution ports = 7 and 8. Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports = 7 and 8.", "SampleAfterValue": "2000003", "UMask": "0x80", "Unit": "cpu_core" @@ -1671,7 +1657,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE", - "PublicDescription": "Counts the number of uops executed from any = thread. Available PDIST counters: 0", + "PublicDescription": "Counts the number of uops executed from any = thread.", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1682,7 +1668,7 @@ "CounterMask": "1", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_1", - "PublicDescription": "Counts cycles when at least 1 micro-op is ex= ecuted from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 1 micro-op is ex= ecuted from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1693,7 +1679,7 @@ "CounterMask": "2", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_2", - "PublicDescription": "Counts cycles when at least 2 micro-ops are = executed from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 2 micro-ops are = executed from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1704,7 +1690,7 @@ "CounterMask": "3", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_3", - "PublicDescription": "Counts cycles when at least 3 micro-ops are = executed from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 3 micro-ops are = executed from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1715,7 +1701,7 @@ "CounterMask": "4", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_4", - "PublicDescription": "Counts cycles when at least 4 micro-ops are = executed from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 4 micro-ops are = executed from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1726,7 +1712,7 @@ "CounterMask": "1", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_1", - "PublicDescription": "Cycles where at least 1 uop was executed per= -thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 1 uop was executed per= -thread.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1737,7 +1723,7 @@ "CounterMask": "2", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_2", - "PublicDescription": "Cycles where at least 2 uops were executed p= er-thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 2 uops were executed p= er-thread.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1748,7 +1734,7 @@ "CounterMask": "3", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_3", - "PublicDescription": "Cycles where at least 3 uops were executed p= er-thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 3 uops were executed p= er-thread.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1759,7 +1745,7 @@ "CounterMask": "4", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_4", - "PublicDescription": "Cycles where at least 4 uops were executed p= er-thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 4 uops were executed p= er-thread.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1771,7 +1757,7 @@ "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.STALLS", "Invert": "1", - "PublicDescription": "Counts cycles during which no uops were disp= atched from the Reservation Station (RS) per thread. Available PDIST counte= rs: 0", + "PublicDescription": "Counts cycles during which no uops were disp= atched from the Reservation Station (RS) per thread.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1781,7 +1767,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.THREAD", - "PublicDescription": "Counts the number of uops to be executed per= -thread each cycle. Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1791,7 +1776,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.X87", - "PublicDescription": "Counts the number of x87 uops executed. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts the number of x87 uops executed.", "SampleAfterValue": "2000003", "UMask": "0x10", "Unit": "cpu_core" @@ -1810,7 +1795,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xae", "EventName": "UOPS_ISSUED.ANY", - "PublicDescription": "Counts the number of uops that the Resource = Allocation Table (RAT) issues to the Reservation Station (RS). Available PD= IST counters: 0", + "PublicDescription": "Counts the number of uops that the Resource = Allocation Table (RAT) issues to the Reservation Station (RS).", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1821,7 +1806,6 @@ "CounterMask": "1", "EventCode": "0xae", "EventName": "UOPS_ISSUED.CYCLES", - "PublicDescription": "UOPS_ISSUED.CYCLES Available PDIST counters:= 0", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1840,7 +1824,7 @@ "CounterMask": "1", "EventCode": "0xc2", "EventName": "UOPS_RETIRED.CYCLES", - "PublicDescription": "Counts cycles where at least one uop has ret= ired. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where at least one uop has ret= ired.", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1850,7 +1834,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc2", "EventName": "UOPS_RETIRED.HEAVY", - "PublicDescription": "Counts the number of retired micro-operation= s (uops) except the last uop of each instruction. An instruction that is de= coded into less than two uops does not contribute to the count. Available P= DIST counters: 0", + "PublicDescription": "Counts the number of retired micro-operation= s (uops) except the last uop of each instruction. An instruction that is de= coded into less than two uops does not contribute to the count.", "SampleAfterValue": "2000003", "UMask": "0x1", "Unit": "cpu_core" @@ -1880,7 +1864,6 @@ "EventName": "UOPS_RETIRED.MS", "MSRIndex": "0x3F7", "MSRValue": "0x8", - "PublicDescription": "UOPS_RETIRED.MS Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x4", "Unit": "cpu_core" @@ -1890,7 +1873,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc2", "EventName": "UOPS_RETIRED.SLOTS", - "PublicDescription": "This event counts a subset of the Topdown Sl= ots event that are utilized by operations that eventually get retired (comm= itted) by the processor pipeline. Usually, this event positively correlates= with higher performance for example, as measured by the instructions-per-= cycle metric. Software can use this event as the numerator for the Retiring= metric (or top-level category) of the Top-down Microarchitecture Analysis = method. Available PDIST counters: 0", + "PublicDescription": "This event counts a subset of the Topdown Sl= ots event that are utilized by operations that eventually get retired (comm= itted) by the processor pipeline. Usually, this event positively correlates= with higher performance for example, as measured by the instructions-per-= cycle metric. Software can use this event as the numerator for the Retiring= metric (or top-level category) of the Top-down Microarchitecture Analysis = method.", "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" @@ -1902,7 +1885,7 @@ "EventCode": "0xc2", "EventName": "UOPS_RETIRED.STALLS", "Invert": "1", - "PublicDescription": "This event counts cycles without actually re= tired uops. Available PDIST counters: 0", + "PublicDescription": "This event counts cycles without actually re= tired uops.", "SampleAfterValue": "1000003", "UMask": "0x2", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json = b/tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json index f300129e9e2d..305b96b26a4e 100644 --- a/tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json +++ b/tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json @@ -13,7 +13,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.STLB_HIT", - "PublicDescription": "Counts loads that miss the DTLB (Data TLB) a= nd hit the STLB (Second level TLB). Available PDIST counters: 0", + "PublicDescription": "Counts loads that miss the DTLB (Data TLB) a= nd hit the STLB (Second level TLB).", "SampleAfterValue": "100003", "UMask": "0x20", "Unit": "cpu_core" @@ -24,7 +24,7 @@ "CounterMask": "1", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_ACTIVE", - "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a demand load. Available PDIST cou= nters: 0", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a demand load.", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -43,7 +43,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED", - "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data loads. This implies it missed in the DTLB and furth= er levels of TLB. The page walk can end with or without a fault. Available = PDIST counters: 0", + "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data loads. This implies it missed in the DTLB and furth= er levels of TLB. The page walk can end with or without a fault.", "SampleAfterValue": "100003", "UMask": "0xe", "Unit": "cpu_core" @@ -53,7 +53,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_1G", - "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= . Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= .", "SampleAfterValue": "100003", "UMask": "0x8", "Unit": "cpu_core" @@ -73,7 +73,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_2M_4M", - "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data loads. This implies address translations missed in the= DTLB and further levels of TLB. The page walk can end with or without a fa= ult. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data loads. This implies address translations missed in the= DTLB and further levels of TLB. The page walk can end with or without a fa= ult.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -93,7 +93,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_4K", - "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= . Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= .", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -113,7 +113,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_PENDING", - "PublicDescription": "Counts the number of page walks outstanding = for a demand load in the PMH (Page Miss Handler) each cycle. Available PDIS= T counters: 0", + "PublicDescription": "Counts the number of page walks outstanding = for a demand load in the PMH (Page Miss Handler) each cycle.", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -132,7 +132,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.STLB_HIT", - "PublicDescription": "Counts stores that miss the DTLB (Data TLB) = and hit the STLB (2nd Level TLB). Available PDIST counters: 0", + "PublicDescription": "Counts stores that miss the DTLB (Data TLB) = and hit the STLB (2nd Level TLB).", "SampleAfterValue": "100003", "UMask": "0x20", "Unit": "cpu_core" @@ -143,7 +143,7 @@ "CounterMask": "1", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_ACTIVE", - "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a store. Available PDIST counters:= 0", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a store.", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -162,7 +162,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED", - "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data stores. This implies it missed in the DTLB and furt= her levels of TLB. The page walk can end with or without a fault. Available= PDIST counters: 0", + "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data stores. This implies it missed in the DTLB and furt= her levels of TLB. The page walk can end with or without a fault.", "SampleAfterValue": "100003", "UMask": "0xe", "Unit": "cpu_core" @@ -172,7 +172,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_1G", - "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t.", "SampleAfterValue": "100003", "UMask": "0x8", "Unit": "cpu_core" @@ -192,7 +192,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M", - "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data stores. This implies address translations missed in th= e DTLB and further levels of TLB. The page walk can end with or without a f= ault. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data stores. This implies address translations missed in th= e DTLB and further levels of TLB. The page walk can end with or without a f= ault.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -212,7 +212,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_4K", - "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t.", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -232,7 +232,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_PENDING", - "PublicDescription": "Counts the number of page walks outstanding = for a store in the PMH (Page Miss Handler) each cycle. Available PDIST coun= ters: 0", + "PublicDescription": "Counts the number of page walks outstanding = for a store in the PMH (Page Miss Handler) each cycle.", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -260,7 +260,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.STLB_HIT", - "PublicDescription": "Counts instruction fetch requests that miss = the ITLB (Instruction TLB) and hit the STLB (Second-level TLB). Available P= DIST counters: 0", + "PublicDescription": "Counts instruction fetch requests that miss = the ITLB (Instruction TLB) and hit the STLB (Second-level TLB).", "SampleAfterValue": "100003", "UMask": "0x20", "Unit": "cpu_core" @@ -271,7 +271,7 @@ "CounterMask": "1", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_ACTIVE", - "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a code (instruction fetch) request= . Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a code (instruction fetch) request= .", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" @@ -291,7 +291,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_COMPLETED", - "PublicDescription": "Counts completed page walks (all page sizes)= caused by a code fetch. This implies it missed in the ITLB (Instruction TL= B) and further levels of TLB. The page walk can end with or without a fault= . Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (all page sizes)= caused by a code fetch. This implies it missed in the ITLB (Instruction TL= B) and further levels of TLB. The page walk can end with or without a fault= .", "SampleAfterValue": "100003", "UMask": "0xe", "Unit": "cpu_core" @@ -311,7 +311,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_COMPLETED_2M_4M", - "PublicDescription": "Counts completed page walks (2M/4M page size= s) caused by a code fetch. This implies it missed in the ITLB (Instruction = TLB) and further levels of TLB. The page walk can end with or without a fau= lt. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (2M/4M page size= s) caused by a code fetch. This implies it missed in the ITLB (Instruction = TLB) and further levels of TLB. The page walk can end with or without a fau= lt.", "SampleAfterValue": "100003", "UMask": "0x4", "Unit": "cpu_core" @@ -331,7 +331,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_COMPLETED_4K", - "PublicDescription": "Counts completed page walks (4K page sizes) = caused by a code fetch. This implies it missed in the ITLB (Instruction TLB= ) and further levels of TLB. The page walk can end with or without a fault.= Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (4K page sizes) = caused by a code fetch. This implies it missed in the ITLB (Instruction TLB= ) and further levels of TLB. The page walk can end with or without a fault.= ", "SampleAfterValue": "100003", "UMask": "0x2", "Unit": "cpu_core" @@ -351,7 +351,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_PENDING", - "PublicDescription": "Counts the number of page walks outstanding = for an outstanding code (instruction fetch) request in the PMH (Page Miss H= andler) each cycle. Available PDIST counters: 0", + "PublicDescription": "Counts the number of page walks outstanding = for an outstanding code (instruction fetch) request in the PMH (Page Miss H= andler) each cycle.", "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9DC812253BA for ; Sat, 19 Jul 2025 03:45:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896757; cv=none; b=c8v25GGxmjWBWU3KqlgN8hJzpX1cuPyfH1I6kKWJOAJdpfVN8n41ZPABIwllm3Gp52EIsBaCp1NU/SJfBcwLMW3PNuRE+PU0r4iN/Ce5c2TXqwWHUPSKOQ7mKd7Aotql0XZRYOLx3CBP/JhOg/miZVDm0v5xWBKShJ3g9vuj4Q4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896757; c=relaxed/simple; bh=as7wWLHb0auacGI+HfGnUpXG3EcntIv+I5PV4s18FbY=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=H9KVYLWjTg4p8qUB7d/7Sbv2Tv8cM1B/4HMKmmbpuA9H4BBy1lCymDcdPTVJ+l0JSjLNm7ZmK3CI3q7UaSmArKUXmwxMrLgKbeWe0uA1OBozpwIdCyGfVWbd82QXrL5OT6v5Rrokaow0kzh5ISM1sdU1dc8LTCZD4teBEfYdcnE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=gXbCGS57; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="gXbCGS57" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-3138c50d2a0so3963114a91.2 for ; Fri, 18 Jul 2025 20:45:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896751; x=1753501551; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=Y3WYZgH3rNPUVeSy3/7+Re2XxOn5w18Tz+qgtFTnT9k=; b=gXbCGS57v7L9baPczJmDAClH0rkQqyIOc3IF9U3vivo0gcaS23JnJcXsCNRe7KSeLb SEI4cFXjIAqlQ8XikePe7JLkSg/MkL+UZskSD1kpkQT0vcrKoI7HdrzXfQ8C+dgfwo/p OHISjC/VRNjEdto4C1L8JQTkrsXRr25VT9UxfJPRAkMrToCEY1Vf83TGSGU4n84lPJa0 9me8lhazJDG9bImch3Agz4Xxkj0orHhoDKicc3G8vlI2XmvSsOmZFrcph/0oM1C5QU+S DsLC8onSt55S32HUPjP6ZSiGVxZpzL6rDKXToWHBLzFCsm+aSuu+wOYfjQKV39fhRZuT sFpQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896751; x=1753501551; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=Y3WYZgH3rNPUVeSy3/7+Re2XxOn5w18Tz+qgtFTnT9k=; b=B/ndLhWyiRiSOYL/iCv8yXDtDRhpOX9/VAKwNcoGtawnjH8G+6zmpLow9gOnkZVzlx zw0sX+X2CdMHR7NCeUvSkyFWmTGmGUbQDc38OPu05xV4t25B8OVQ2Xo5/mmGYk2HBrCq fSsNapv76iLeM7Z6jkkcPj5jJsUswjuO4LqmyLOqaA29ukWp6oyTj+VKVXeIH0Sn2CTa e1gQ1g/t3YBzzsSI13TMExoF74JPVyOhcUrQUBbkTHgGDOP1Ftr+98lkEnwiFIECQITA oFi4GNCN3Nh/W8UGwVglpTUq75sVesjM4ABikanBUof2ISCJj2PZtWzG4k7qqXYzh/J3 KiGg== X-Forwarded-Encrypted: i=1; AJvYcCU0rYBKcyoY/w0+8SGcU1K/3+bHQKx20nyTZc8LvTKvZWFCN55tgxWCG+l1RSIWuGb4HUOTIXBQ3RpjcC4=@vger.kernel.org X-Gm-Message-State: AOJu0YwRQwnsve0yTxi9Qo7UdVscAtnOTrFME3gv71lBFNPoc2PHK7GR QYs5HBhnhmFgKJqLr+HClCkSaJPMxCoHQkULLdowv7BooKGLxoQsclW2sIy+YVKmNLtABFVBjZl 4Z5QWThUktw== X-Google-Smtp-Source: AGHT+IGoNvDRSMVfKxfn1lzllj3fC5pVtz6ncuma11irfQgdhFbw1hJZlC/A/DEwswcqgJ8tyiL/nSiZqEiF X-Received: from pjbpt1.prod.google.com ([2002:a17:90b:3d01:b0:311:ff32:a85d]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90a:c884:b0:311:ad7f:329f with SMTP id 98e67ed59e1d1-31caf921df8mr13096835a91.31.1752896750897; Fri, 18 Jul 2025 20:45:50 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:10 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-15-irogers@google.com> Subject: [PATCH v1 14/19] perf vendor events: Update rocketlake metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update metrics from TMA 5.0 to 5.1. Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/rocketlake/rkl-metrics.json | 97 +++++++++++-------- 1 file changed, 56 insertions(+), 41 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json b/t= ools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json index 71737a1a1997..79dc34157481 100644 --- a/tools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json @@ -1,63 +1,63 @@ [ { "BriefDescription": "C10 residency percent per package", - "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c10\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C10_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C8 residency percent per package", - "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c8\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C8_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C9 residency percent per package", - "MetricExpr": "cstate_pkg@c9\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c9\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C9_Pkg_Residency", "ScaleUnit": "100%" @@ -85,7 +85,6 @@ }, { "BriefDescription": "This metric estimates how often memory load a= ccesses were aliased by preceding stores (in program order) with a 4K addre= ss offset", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS / tma_info_thread_c= lks", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_4k_aliasing", @@ -134,6 +133,7 @@ }, { "BriefDescription": "Total pipeline cost of instruction fetch rela= ted bottlenecks by large code footprint programs (i-side cache; TLB and BTB= misses)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_ic= ache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switch= es + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)", "MetricGroup": "BigFootprint;BvBC;Fed;Frontend;IcMiss;MemoryTLB", "MetricName": "tma_bottleneck_big_code", @@ -147,40 +147,45 @@ "MetricThreshold": "tma_bottleneck_branching_overhead > 5", "PublicDescription": "Total pipeline cost of instructions used for= program control-flow - a subset of the Retiring category in TMA. Examples = include function calls; loops and alignments. (A lower bound)" }, + { + "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", + "MetricGroup": "BvCB;Cor;tma_issueComp", + "MetricName": "tma_bottleneck_compute_bound_est", + "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", + "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " + }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Bandwidth related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load + tma_= fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + = tma_store_fwd_blk)))", "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", - "MetricName": "tma_bottleneck_cache_memory_bandwidth", - "MetricThreshold": "tma_bottleneck_cache_memory_bandwidth > 20", + "MetricName": "tma_bottleneck_data_cache_memory_bandwidth", + "MetricThreshold": "tma_bottleneck_data_cache_memory_bandwidth > 2= 0", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_= system_dram_bw_use, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Latency related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l= 1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_b= lk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + = tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_lock_latency / (tma_= 4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma= _lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * = (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_boun= d + tma_store_bound)) * (tma_split_loads / (tma_4k_aliasing + tma_dtlb_load= + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_l= oads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * = (tma_split_stores / (tma_dtlb_store + tma_false_sharing + tma_split_stores = + tma_store_latency + tma_streaming_stores)) + tma_memory_bound * (tma_stor= e_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tm= a_store_bound)) * (tma_store_latency / (tma_dtlb_store + tma_false_sharing = + tma_split_stores + tma_store_latency + tma_streaming_stores)))", "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", - "MetricName": "tma_bottleneck_cache_memory_latency", - "MetricThreshold": "tma_bottleneck_cache_memory_latency > 20", + "MetricName": "tma_bottleneck_data_cache_memory_latency", + "MetricThreshold": "tma_bottleneck_data_cache_memory_latency > 20", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_= mem_latency" }, - { - "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", - "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", - "MetricGroup": "BvCB;Cor;tma_issueComp", - "MetricName": "tma_bottleneck_compute_bound_est", - "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", - "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " - }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks (when the front-end could not sustain operations = delivery to the back-end)", - "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - tma_mi= crocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) *= (tma_assists / tma_microcode_sequencer) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + 10 * tma_microcode_seq= uencer * tma_other_mispredicts / tma_branch_mispredicts * tma_mispredicts_r= esteers) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_br= anches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tm= a_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_ms /= (tma_dsb + tma_lsd + tma_mite + tma_ms))) - tma_bottleneck_big_code", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - tma_mi= crocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) *= (tma_assists / tma_microcode_sequencer) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + 10 * tma_microcode_seq= uencer * tma_other_mispredicts / tma_branch_mispredicts * tma_mispredicts_r= esteers) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_br= anches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tm= a_itlb_misses + tma_lcp + tma_ms_switches) + tma_ms)) - tma_bottleneck_big_= code", "MetricGroup": "BvFB;Fed;FetchBW;Frontend", "MetricName": "tma_bottleneck_instruction_fetch_bw", "MetricThreshold": "tma_bottleneck_instruction_fetch_bw > 20" }, { "BriefDescription": "Total pipeline cost of irregular execution (e= .g", - "MetricExpr": "100 * (tma_microcode_sequencer / (tma_few_uops_inst= ructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequence= r) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cle= ars_resteers + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_b= ranch_mispredicts * tma_mispredicts_resteers) / (tma_clears_resteers + tma_= mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_= dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switc= hes) + tma_fetch_bandwidth * tma_ms / (tma_dsb + tma_lsd + tma_mite + tma_m= s)) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mis= predicts * tma_branch_mispredicts + tma_machine_clears * tma_other_nukes / = tma_other_nukes + tma_core_bound * (tma_serializing_operation + tma_core_bo= und * RS_EVENTS.EMPTY_CYCLES / tma_info_thread_clks * tma_ports_utilized_0)= / (tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_= microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer)= * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_microcode_sequencer / (tma_few_uops_inst= ructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequence= r) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cle= ars_resteers + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_b= ranch_mispredicts * tma_mispredicts_resteers) / (tma_clears_resteers + tma_= mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_= dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switc= hes) + tma_ms) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma= _branch_mispredicts * tma_branch_mispredicts + tma_machine_clears * tma_oth= er_nukes / tma_other_nukes + tma_core_bound * (tma_serializing_operation + = tma_core_bound * RS_EVENTS.EMPTY_CYCLES / tma_info_thread_clks * tma_ports_= utilized_0) / (tma_divider + tma_ports_utilization + tma_serializing_operat= ion) + tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode= _sequencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operation= s)", "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", "MetricName": "tma_bottleneck_irregular_overhead", "MetricThreshold": "tma_bottleneck_irregular_overhead > 10", @@ -188,6 +193,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_m= emory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + = tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tm= a_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + = tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound= / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store= _bound)) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_spli= t_stores + tma_store_latency + tma_streaming_stores)))", "MetricGroup": "BvMT;Mem;MemoryTLB;Offcore;tma_issueTLB", "MetricName": "tma_bottleneck_memory_data_tlbs", @@ -196,6 +202,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Synchronization= related bottlenecks (data transfers and coherency updates across processor= s)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_l3_bound / (tma_dram= _bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (t= ma_contested_accesses + tma_data_sharing) / (tma_contested_accesses + tma_d= ata_sharing + tma_l3_hit_latency + tma_sq_full) + tma_store_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * = tma_false_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_stores = + tma_store_latency + tma_streaming_stores - tma_store_latency)) + tma_mach= ine_clears * (1 - tma_other_nukes / tma_other_nukes))", "MetricGroup": "BvMS;LockCont;Mem;Offcore;tma_issueSyncxn", "MetricName": "tma_bottleneck_memory_synchronization", @@ -204,6 +211,7 @@ }, { "BriefDescription": "Total pipeline cost of Branch Misprediction r= elated bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (1 - 10 * tma_microcode_sequencer * tma_other= _mispredicts / tma_branch_mispredicts) * (tma_branch_mispredicts + tma_fetc= h_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switc= hes + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))", "MetricGroup": "Bad;BadSpec;BrMispredicts;BvMP;tma_issueBM", "MetricName": "tma_bottleneck_mispredictions", @@ -212,7 +220,8 @@ }, { "BriefDescription": "Total pipeline cost of remaining bottlenecks = in the back-end", - "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_cache_me= mory_bandwidth + tma_bottleneck_cache_memory_latency + tma_bottleneck_memor= y_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottleneck_comput= e_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_branching_= overhead + tma_bottleneck_useful_work)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_data_cac= he_memory_bandwidth + tma_bottleneck_data_cache_memory_latency + tma_bottle= neck_memory_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottlen= eck_compute_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_= branching_overhead + tma_bottleneck_useful_work)", "MetricGroup": "BvOB;Cor;Offcore", "MetricName": "tma_bottleneck_other_bottlenecks", "MetricThreshold": "tma_bottleneck_other_bottlenecks > 20", @@ -220,6 +229,7 @@ }, { "BriefDescription": "Total pipeline cost of \"useful operations\" = - the portion of Retiring category not covered by Branching_Overhead nor Ir= regular_Overhead.", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_retiring - (BR_INST_RETIRED.ALL_BRANCHES= + 2 * BR_INST_RETIRED.NEAR_CALL + INST_RETIRED.NOP) / tma_info_thread_slot= s - tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_se= quencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "BvUW;Ret", "MetricName": "tma_bottleneck_useful_work", @@ -427,7 +437,7 @@ "MetricGroup": "BvMB;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;t= ma_issueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", - "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_cache_memory_bandwidth, = tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_late= ncy, tma_streaming_stores", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_data_cache_memory_bandwi= dth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store= _latency, tma_streaming_stores", "ScaleUnit": "100%" }, { @@ -619,6 +629,7 @@ }, { "BriefDescription": "Total pipeline cost of DSB (uop cache) hits -= subset of the Instruction_Fetch_BW Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_frontend_bound * (tma_fetch_bandwidth / = (tma_fetch_bandwidth + tma_fetch_latency)) * (tma_dsb / (tma_dsb + tma_lsd = + tma_mite + tma_ms)))", "MetricGroup": "DSB;Fed;FetchBW;tma_issueFB", "MetricName": "tma_info_botlnk_l2_dsb_bandwidth", @@ -1074,7 +1085,7 @@ "MetricName": "tma_info_memory_tlb_store_stlb_mpki" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_G= E_1 / 2 if #SMT_on else cpu@UOPS_EXECUTED.THREAD\\,cmask\\=3D1@)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -1097,6 +1108,12 @@ "MetricGroup": "Fed;FetchBW", "MetricName": "tma_info_pipeline_fetch_mite" }, + { + "BriefDescription": "Average number of uops fetched from MS per cy= cle", + "MetricExpr": "IDQ.MS_UOPS / cpu@IDQ.MS_UOPS\\,cmask\\=3D1@", + "MetricGroup": "Fed;FetchLat;MicroSeq", + "MetricName": "tma_info_pipeline_fetch_ms" + }, { "BriefDescription": "Instructions per a microcode Assist invocatio= n", "MetricExpr": "INST_RETIRED.ANY / ASSISTS.ANY", @@ -1113,7 +1130,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -1125,7 +1142,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, @@ -1134,7 +1151,7 @@ "MetricExpr": "64 * (UNC_ARB_TRK_REQUESTS.ALL + UNC_ARB_COH_TRK_RE= QUESTS.ALL) / 1e6 / tma_info_system_time / 1e3", "MetricGroup": "HPC;MemOffcore;MemoryBW;SoC;tma_issueBW", "MetricName": "tma_info_system_dram_bw_use", - "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_cache_memory_ban= dwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_data_cache_memor= y_bandwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Giga Floating Point Operations Per Second", @@ -1165,6 +1182,7 @@ }, { "BriefDescription": "Average number of parallel data read requests= to external memory", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "UNC_ARB_DAT_OCCUPANCY.RD / UNC_ARB_DAT_OCCUPANCY.RD= @cmask\\=3D1@", "MetricGroup": "Mem;MemoryBW;SoC", "MetricName": "tma_info_system_mem_parallel_reads", @@ -1316,12 +1334,12 @@ "ScaleUnit": "100%" }, { - "BriefDescription": "This metric([SKL+] roughly; [LNL]) estimates = fraction of cycles with demand load accesses that hit the L1D cache", + "BriefDescription": "This metric ([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache", "MetricExpr": "min(2 * (MEM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETI= RED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CYCLE_ACTIVITY.CYCLE= S_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound= _group", "MetricName": "tma_l1_latency_dependency", "MetricThreshold": "tma_l1_latency_dependency > 0.1 & (tma_l1_boun= d > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache. The s= hort latency of the L1D cache may be exposed in pointer-chasing memory acce= ss patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", + "PublicDescription": "This metric ([SKL+] roughly; [LNL]) estimate= s fraction of cycles with demand load accesses that hit the L1D cache. The = short latency of the L1D cache may be exposed in pointer-chasing memory acc= ess patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", "ScaleUnit": "100%" }, { @@ -1345,7 +1363,6 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled due to loads accesses to L3 cache or contended with a sibling Core", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STA= LLS_L3_MISS) / tma_info_thread_clks", "MetricGroup": "CacheHits;MemoryBound;TmaL3mem;TopdownL3;tma_L3_gr= oup;tma_memory_bound_group", "MetricName": "tma_l3_bound", @@ -1359,7 +1376,7 @@ "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat= ;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_cache_memory_latency, tma_mem_latency", + "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_data_cache_memory_latency, tma_mem_latency", "ScaleUnit": "100%" }, { @@ -1465,7 +1482,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_d= ram_bound_group;tma_issueBW", "MetricName": "tma_mem_bandwidth", "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_cache_memory_bandwidth,= tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_data_cache_memory_bandw= idth, tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", "ScaleUnit": "100%" }, { @@ -1474,7 +1491,7 @@ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_= dram_bound_group;tma_issueLat", "MetricName": "tma_mem_latency", "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_cache_memory_latency, tma_l3_hit_latency= ", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_data_cache_memory_latency, tma_l3_hit_la= tency", "ScaleUnit": "100%" }, { @@ -1542,7 +1559,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to the Microcode Sequencer (MS) unit = - see Microcode_Sequencer node for details.", - "MetricExpr": "cpu@IDQ.MS_UOPS\\,cmask\\=3D1@ / tma_info_core_core= _clks / 2", + "MetricExpr": "cpu@IDQ.MS_UOPS\\,cmask\\=3D1@ / tma_info_core_core= _clks / 3.3", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_fetch_bandwidt= h_group", "MetricName": "tma_ms", "MetricThreshold": "tma_ms > 0.05 & tma_fetch_bandwidth > 0.2", @@ -1676,7 +1693,7 @@ { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_= thread_slots", + "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "BvUW;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", @@ -1713,7 +1730,6 @@ }, { "BriefDescription": "This metric represents rate of split store ac= cesses", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bou= nd_group", "MetricName": "tma_split_stores", @@ -1727,7 +1743,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_i= ssueBW;tma_l3_bound_group", "MetricName": "tma_sq_full", "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, tma_m= em_bandwidth", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_data_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, = tma_mem_bandwidth", "ScaleUnit": "100%" }, { @@ -1741,7 +1757,6 @@ }, { "BriefDescription": "This metric roughly estimates fraction of cyc= les when the memory subsystem had loads blocked since they could not forwar= d data from earlier (in program order) overlapping stores", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * LD_BLOCKS.STORE_FORWARD / tma_info_thread_clks= ", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_store_fwd_blk", --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pf1-f202.google.com (mail-pf1-f202.google.com [209.85.210.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BC6992264B6 for ; Sat, 19 Jul 2025 03:45:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896756; cv=none; b=V6imPitPFuIorMfo31Wk+8ySR0GfW28ex6x7saeh6pb2kOdC8t9OqeutCXagVGiSEg2z0WzgNypd9N1usTO8PPCsyaEA7tQsV72HT+RZgcBHD/P5d6Ez5zU+/MGhuREDazfnBz2MTWxZBt0c0ehyr6NuXc6tlrIVuO2FvmmFWlk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896756; c=relaxed/simple; bh=HTgDiHcfpmsB1kpb9qDWCy9sflejE8kAAdvG3nRSZ8g=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=PwxtN3dkUqY4bsx1t2eU490iLl1nNeZR0CnPg6jZYOKBSK4KMvYxIWxvfKbbNRl2HM/MXKT6sDLXcQ3fq+vq+DNkXmS4fU00mY4HhFdd9NGj8LeGcYtzlKM894A8V3g2CX8QI+M+/IdVm0c2EquI421auH2BHWy3Sj2CdbOkGEo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=3Tra2xnY; arc=none smtp.client-ip=209.85.210.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="3Tra2xnY" Received: by mail-pf1-f202.google.com with SMTP id d2e1a72fcca58-74ce2491c0fso3992890b3a.0 for ; Fri, 18 Jul 2025 20:45:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896753; x=1753501553; darn=vger.kernel.org; h=to:from:subject:message-id:references:mime-version:in-reply-to:date :from:to:cc:subject:date:message-id:reply-to; bh=wyoHqHO0xSjLFu8zTnBCBlfDgc3kXoP6K9OnCFzAIxc=; b=3Tra2xnYi/ja/ZC+Of47KeuGUD+7bdhSJK03xstpx5jYgbHSfELSoa9tbQiGMiHoBz 8eF6Lve40ji9wrEfQ62PUhP6D8W6CdseZ207U8F4EIfi1D44IZ3o2ssn+YZsV7hjJO4d 7HseyySxVy9Kk9dXUriCn6COORWmUifJI//toDxcVAVm/XCVscqzmTm4sJ5QzbC/IuXL 1FI5Dc33QQlV/pRrvP/68cpykNCN+uL88SsTZKE6VYTapVRnwdPywHM68fknrXQvpFRr XXt56yq30RN+OwA06Dl28SzDRqnPVEHdEXbeDWog8HgN5Aa3zgMhpnwC5b4rPxKLad7X Ze8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896753; x=1753501553; h=to:from:subject:message-id:references:mime-version:in-reply-to:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=wyoHqHO0xSjLFu8zTnBCBlfDgc3kXoP6K9OnCFzAIxc=; b=LN54GkJKMh8okqItkx4bzZ4OXRHNgZ7EUw64bavpZUme0uEMwFJaSyFWbrMNBsTK47 qMScUuRDtyF6THudCzNtfDq40HpAeqw7CYjTzPOPw1NCAeJCOnQd+y/5xZA66TVeWQ71 mMtlU4Rz3QjUZXNZ0SeFzFjoW56Xkp0SqNPxI/Ri5SI0TUPQWXUm9BZb69gmJ7MEUA0y sitNau7cuvFFZYkOC3OGj1lDndzCWFKX90VOK8NdoSt0FkuNzpDawNLhIHP7j6dwhCId tQmDVMXD4111/2zQSBG2GB9Hdpb7gQxhvUSeBM2EgJPwm3xaz9UWjCMzUgOhDUQQS0bm MaJA== X-Forwarded-Encrypted: i=1; AJvYcCUCRYQWtykHJ5SXNsrvVloNqq54cs3NhWa5Fy1zFZdW11X13W9gs2XJ6I09aM1eWyGwJm5max6/4EkIHf0=@vger.kernel.org X-Gm-Message-State: AOJu0Yw0g1SaAMuANpnue0hD42j8O9hjb7f3bSY20k2MFMzFPvzhmO70 VoKXPoJ9E/QEF8bDg/vsysuEcxAhc6E9bKXVpKZo7gWS7FqGzSTXjubFfJjiiqJ0ctzmmAmOBPK VlCLNAtvhDQ== X-Google-Smtp-Source: AGHT+IEvlwMx6OUGmBObZryfm2N5mH9c5xFv7x/kAdQ364gVL3wMAsr/DibD/iPCp5oqD1mn0gDliHeWG5du X-Received: from pfuv16.prod.google.com ([2002:a05:6a00:1490:b0:74a:cccf:5726]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a00:4b45:b0:746:26fe:8cdf with SMTP id d2e1a72fcca58-758492e296dmr13889782b3a.7.1752896752680; Fri, 18 Jul 2025 20:45:52 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:11 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-16-irogers@google.com> Subject: [PATCH v1 15/19] perf vendor events: Update sandybridge metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update metrics from TMA 5.0 to 5.1. Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/sandybridge/snb-metrics.json | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json b/= tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json index 823d8b7c4224..d40761903429 100644 --- a/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json +++ b/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json @@ -1,49 +1,49 @@ [ { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per core", - "MetricExpr": "cstate_core@c3\\-residency@ / TSC", + "MetricExpr": "cstate_core@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" @@ -71,7 +71,6 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "1 - (tma_frontend_bound + tma_bad_speculation + tma= _retiring)", "MetricGroup": "BvOB;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", @@ -296,7 +295,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -308,7 +307,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-oo1-f74.google.com (mail-oo1-f74.google.com [209.85.161.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 82338149C41 for ; Sat, 19 Jul 2025 03:45:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.161.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896774; cv=none; b=ZpZ6Na2ny5nK85dr7kX/9UeZkOB4gbWRPx8LcHXPl1KBnGkavDlGEPQrN82hYesrFRXn8gMO8qzqiYKR37GrN8kocurxwCHT53ubS9LX/uX1LbM0mZHijSr+UhRgOVq9XbQTyKORSNkrgUY9RdKmix0dg6tvLhod6OtItlvpxeI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896774; c=relaxed/simple; bh=TeS2A04pSWQEaGttIbczK4qDoHdbKCh2cbmQtU3U+0s=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=WRaHhmhfp85MA8LgraonczCzvq5CbbIk54FJL0P7oZX7JPyIv+h+PpMIetcIwZlUeoaunaggSN1v6WSOiPeUPGFcn3O5tsJLb3KeFqhLdBy3abDzE7rFud1u9bgcP9hxbJ4insymCSL9929doXttCIK1TXtjq3007RYStmqiNXU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=3ikT6EX8; arc=none smtp.client-ip=209.85.161.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="3ikT6EX8" Received: by mail-oo1-f74.google.com with SMTP id 006d021491bc7-615c6bacd06so318666eaf.1 for ; Fri, 18 Jul 2025 20:45:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896755; x=1753501555; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=jPDLnjWH0b1svPB0ZCFB8M7pcpBUSRL5ujPq1BX2Aq0=; b=3ikT6EX8Iic0+YCABHqIvN/YgPQ6gTshhizq6ZBj6kW6worSnNI5Sbx3HDazW3dkpT Fb++M7yS6mCrwtDiM7fSQ9qnPscnBRZ9ACDdj9pjzuoi3RibfpUumGAFibOkzPAfy1XS 8frJViJsku45Wuv1/2rRSXvtbs/WIOO1V9jxSUUzSSaE34CUqQUGzjJtVyZdl/MB5ecd PtSK8CBQ2jHivgmynNj/uS/nGrwRNSUjy1TWVgYfWvDOnUAqezcCs3BjsFZfHWO2Bya2 lUDk8HibCbeZ05qlPuzrzyJW6Z/+JWlyYlCMF7jmK+5vUvTM7CDgzZQ7eoYjarFgfgdO IDWQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896755; x=1753501555; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=jPDLnjWH0b1svPB0ZCFB8M7pcpBUSRL5ujPq1BX2Aq0=; b=SMSHNwUpMT5g9BmdcTErWBqJgfvUq1BhhQNFG5TjtNdIjL41qdGQpMW5+4tgV1Y/li mbJSz3z3jCJXASN2UJ6FHeUa0dy6Hco1JnIb52kzTgbki9YXQKnCXNSiJAlj4HbNmdf2 zwZysCN1gi9TwO/HUZBiX78R9ucG9wkguhs3WSRkatl2YLH82zhcQ9hsseVWCenj+cse VhomHoqoQLdB8tF8d7N6TvmftX+kpVd4JzqoqnNU1EKJnUrXKBISWaKB6ejyHpC9q0YN 4noQX1QCnL67htQm1/dlAR9RTx9gPr9+JSJ+6DjIhOn42zBoVR4tEiAUWVGoewwsRS8b 9x8g== X-Forwarded-Encrypted: i=1; AJvYcCXSEvF71fGKLIbuG4DmCG1jqhQqKR82V4+egcNDPXV1KBxT4yl8uGesddwdhjeXy2YPm20m7s4qqWTdkSI=@vger.kernel.org X-Gm-Message-State: AOJu0YxQ14ANxXjThntCZ9kKvwIsKk1Qw0enJh0MAL4+xv+0Sj/n3pFb 263p6mye7vu6pjt72aR/N2+QN8RplMnmjmYhqjShypp3UE+TKgH17XF6uZVJsjTvjVhATROSSdY XXT/vjG/5Tw== X-Google-Smtp-Source: AGHT+IHp8bxHq2Cv7WL1Y3yJRwSoTTubIbnSCy9JQ1M8hs2cpm1JuLFOjyeIiltZECsju6FkMX6TKbKmRMPB X-Received: from oabwh16.prod.google.com ([2002:a05:6871:a690:b0:2c1:5c70:acac]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6808:350b:b0:408:e711:a9d with SMTP id 5614622812f47-41d04d89e41mr9675423b6e.18.1752896754647; Fri, 18 Jul 2025 20:45:54 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:12 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-17-irogers@google.com> Subject: [PATCH v1 16/19] perf vendor events: Update sapphirerapids events/metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update events from v1.28 to v1.30. Update metrics from TMA 5.0 to 5.1. The event updates come from: https://github.com/intel/perfmon/commit/c6a01e651c7be0dbc7a0e92ea915bb3c7e5= 970da https://github.com/intel/perfmon/commit/8b3a5b3f8ebf3cc48e29e3b65ecccb37f6f= c3e81 Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +- .../arch/x86/sapphirerapids/cache.json | 100 +++++------ .../x86/sapphirerapids/floating-point.json | 43 ++--- .../arch/x86/sapphirerapids/frontend.json | 42 +++-- .../arch/x86/sapphirerapids/memory.json | 30 ++-- .../arch/x86/sapphirerapids/other.json | 28 ++- .../arch/x86/sapphirerapids/pipeline.json | 167 +++++++----------- .../arch/x86/sapphirerapids/spr-metrics.json | 153 ++++++++++------ .../x86/sapphirerapids/uncore-memory.json | 82 +++++++++ .../x86/sapphirerapids/virtual-memory.json | 40 ++--- 10 files changed, 382 insertions(+), 305 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-ev= ents/arch/x86/mapfile.csv index f8bf16d30602..d9daab4d8461 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -29,7 +29,7 @@ GenuineIntel-6-2E,v4,nehalemex,core GenuineIntel-6-CC,v1.00,pantherlake,core GenuineIntel-6-A7,v1.04,rocketlake,core GenuineIntel-6-2A,v19,sandybridge,core -GenuineIntel-6-8F,v1.28,sapphirerapids,core +GenuineIntel-6-8F,v1.30,sapphirerapids,core GenuineIntel-6-AF,v1.11,sierraforest,core GenuineIntel-6-(37|4A|4C|4D|5A),v15,silvermont,core GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v59,skylake,core diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/cache.json b/too= ls/perf/pmu-events/arch/x86/sapphirerapids/cache.json index 21db53f9e9d6..a30af5140e6c 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/cache.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/cache.json @@ -4,7 +4,6 @@ "Counter": "0,1,2,3", "EventCode": "0x51", "EventName": "L1D.HWPF_MISS", - "PublicDescription": "L1D.HWPF_MISS Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x20" }, @@ -13,7 +12,7 @@ "Counter": "0,1,2,3", "EventCode": "0x51", "EventName": "L1D.REPLACEMENT", - "PublicDescription": "Counts L1D data line replacements including = opportunistic replacements, and replacements that require stall-for-replace= or block-for-replace. Available PDIST counters: 0", + "PublicDescription": "Counts L1D data line replacements including = opportunistic replacements, and replacements that require stall-for-replace= or block-for-replace.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -22,7 +21,7 @@ "Counter": "0,1,2,3", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.FB_FULL", - "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses. Av= ailable PDIST counters: 0", + "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -33,7 +32,7 @@ "EdgeDetect": "1", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.FB_FULL_PERIODS", - "PublicDescription": "Counts number of phases a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses. Av= ailable PDIST counters: 0", + "PublicDescription": "Counts number of phases a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -43,7 +42,6 @@ "Deprecated": "1", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.L2_STALL", - "PublicDescription": "This event is deprecated. Refer to new event= L1D_PEND_MISS.L2_STALLS Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x4" }, @@ -52,7 +50,7 @@ "Counter": "0,1,2,3", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.L2_STALLS", - "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D due to lack of L2 resources. Demand requests include cac= heable/uncacheable demand load, store, lock or SW prefetch accesses. Availa= ble PDIST counters: 0", + "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D due to lack of L2 resources. Demand requests include cac= heable/uncacheable demand load, store, lock or SW prefetch accesses.", "SampleAfterValue": "1000003", "UMask": "0x4" }, @@ -61,7 +59,7 @@ "Counter": "0,1,2,3", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.PENDING", - "PublicDescription": "Counts number of L1D misses that are outstan= ding in each cycle, that is each cycle the number of Fill Buffers (FB) outs= tanding required by Demand Reads. FB either is held by demand loads, or it = is held by non-demand loads and gets hit at least once by demand. The valid= outstanding interval is defined until the FB deallocation by one of the fo= llowing ways: from FB allocation, if FB is allocated by demand from the dem= and Hit FB, if it is allocated by hardware or software prefetch. Note: In t= he L1D, a Demand Read contains cacheable or noncacheable demand loads, incl= uding ones causing cache-line splits and reads due to page walks resulted f= rom any request type. Available PDIST counters: 0", + "PublicDescription": "Counts number of L1D misses that are outstan= ding in each cycle, that is each cycle the number of Fill Buffers (FB) outs= tanding required by Demand Reads. FB either is held by demand loads, or it = is held by non-demand loads and gets hit at least once by demand. The valid= outstanding interval is defined until the FB deallocation by one of the fo= llowing ways: from FB allocation, if FB is allocated by demand from the dem= and Hit FB, if it is allocated by hardware or software prefetch. Note: In t= he L1D, a Demand Read contains cacheable or noncacheable demand loads, incl= uding ones causing cache-line splits and reads due to page walks resulted f= rom any request type.", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -71,7 +69,7 @@ "CounterMask": "1", "EventCode": "0x48", "EventName": "L1D_PEND_MISS.PENDING_CYCLES", - "PublicDescription": "Counts duration of L1D miss outstanding in c= ycles. Available PDIST counters: 0", + "PublicDescription": "Counts duration of L1D miss outstanding in c= ycles.", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -80,7 +78,7 @@ "Counter": "0,1,2,3", "EventCode": "0x25", "EventName": "L2_LINES_IN.ALL", - "PublicDescription": "Counts the number of L2 cache lines filling = the L2. Counting does not cover rejects. Available PDIST counters: 0", + "PublicDescription": "Counts the number of L2 cache lines filling = the L2. Counting does not cover rejects.", "SampleAfterValue": "100003", "UMask": "0x1f" }, @@ -89,7 +87,7 @@ "Counter": "0,1,2,3", "EventCode": "0x26", "EventName": "L2_LINES_OUT.NON_SILENT", - "PublicDescription": "Counts the number of lines that are evicted = by L2 cache when triggered by an L2 cache fill. Those lines are in Modified= state. Modified lines are written back to L3 Available PDIST counters: 0", + "PublicDescription": "Counts the number of lines that are evicted = by L2 cache when triggered by an L2 cache fill. Those lines are in Modified= state. Modified lines are written back to L3", "SampleAfterValue": "200003", "UMask": "0x2" }, @@ -98,7 +96,7 @@ "Counter": "0,1,2,3", "EventCode": "0x26", "EventName": "L2_LINES_OUT.SILENT", - "PublicDescription": "Counts the number of lines that are silently= dropped by L2 cache. These lines are typically in Shared or Exclusive stat= e. A non-threaded event. Available PDIST counters: 0", + "PublicDescription": "Counts the number of lines that are silently= dropped by L2 cache. These lines are typically in Shared or Exclusive stat= e. A non-threaded event.", "SampleAfterValue": "200003", "UMask": "0x1" }, @@ -107,7 +105,7 @@ "Counter": "0,1,2,3", "EventCode": "0x26", "EventName": "L2_LINES_OUT.USELESS_HWPF", - "PublicDescription": "Counts the number of cache lines that have b= een prefetched by the L2 hardware prefetcher but not used by demand access = when evicted from the L2 cache Available PDIST counters: 0", + "PublicDescription": "Counts the number of cache lines that have b= een prefetched by the L2 hardware prefetcher but not used by demand access = when evicted from the L2 cache", "SampleAfterValue": "200003", "UMask": "0x4" }, @@ -116,7 +114,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_REQUEST.ALL", - "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_RQSTS.REFERENCES] Available PDIST coun= ters: 0", + "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_RQSTS.REFERENCES]", "SampleAfterValue": "200003", "UMask": "0xff" }, @@ -125,7 +123,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_REQUEST.MISS", - "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_RQSTS.MISS] Available PDIST coun= ters: 0", + "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_RQSTS.MISS]", "SampleAfterValue": "200003", "UMask": "0x3f" }, @@ -134,7 +132,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_CODE_RD", - "PublicDescription": "Counts the total number of L2 code requests.= Available PDIST counters: 0", + "PublicDescription": "Counts the total number of L2 code requests.= ", "SampleAfterValue": "200003", "UMask": "0xe4" }, @@ -143,7 +141,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_DEMAND_DATA_RD", - "PublicDescription": "Counts Demand Data Read requests accessing t= he L2 cache. These requests may hit or miss L2 cache. True-miss exclude mis= ses that were merged with ongoing L2 misses. An access is counted once. Ava= ilable PDIST counters: 0", + "PublicDescription": "Counts Demand Data Read requests accessing t= he L2 cache. These requests may hit or miss L2 cache. True-miss exclude mis= ses that were merged with ongoing L2 misses. An access is counted once.", "SampleAfterValue": "200003", "UMask": "0xe1" }, @@ -152,7 +150,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_DEMAND_MISS", - "PublicDescription": "Counts demand requests that miss L2 cache. A= vailable PDIST counters: 0", + "PublicDescription": "Counts demand requests that miss L2 cache.", "SampleAfterValue": "200003", "UMask": "0x27" }, @@ -161,7 +159,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_DEMAND_REFERENCES", - "PublicDescription": "Counts demand requests to L2 cache. Availabl= e PDIST counters: 0", + "PublicDescription": "Counts demand requests to L2 cache.", "SampleAfterValue": "200003", "UMask": "0xe7" }, @@ -170,7 +168,6 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_HWPF", - "PublicDescription": "L2_RQSTS.ALL_HWPF Available PDIST counters: = 0", "SampleAfterValue": "200003", "UMask": "0xf0" }, @@ -179,7 +176,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.ALL_RFO", - "PublicDescription": "Counts the total number of RFO (read for own= ership) requests to L2 cache. L2 RFO requests include both L1D demand RFO m= isses as well as L1D RFO prefetches. Available PDIST counters: 0", + "PublicDescription": "Counts the total number of RFO (read for own= ership) requests to L2 cache. L2 RFO requests include both L1D demand RFO m= isses as well as L1D RFO prefetches.", "SampleAfterValue": "200003", "UMask": "0xe2" }, @@ -188,7 +185,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.CODE_RD_HIT", - "PublicDescription": "Counts L2 cache hits when fetching instructi= ons, code reads. Available PDIST counters: 0", + "PublicDescription": "Counts L2 cache hits when fetching instructi= ons, code reads.", "SampleAfterValue": "200003", "UMask": "0xc4" }, @@ -197,7 +194,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.CODE_RD_MISS", - "PublicDescription": "Counts L2 cache misses when fetching instruc= tions. Available PDIST counters: 0", + "PublicDescription": "Counts L2 cache misses when fetching instruc= tions.", "SampleAfterValue": "200003", "UMask": "0x24" }, @@ -206,7 +203,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.DEMAND_DATA_RD_HIT", - "PublicDescription": "Counts the number of demand Data Read reques= ts initiated by load instructions that hit L2 cache. Available PDIST counte= rs: 0", + "PublicDescription": "Counts the number of demand Data Read reques= ts initiated by load instructions that hit L2 cache.", "SampleAfterValue": "200003", "UMask": "0xc1" }, @@ -215,7 +212,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.DEMAND_DATA_RD_MISS", - "PublicDescription": "Counts demand Data Read requests with true-m= iss in the L2 cache. True-miss excludes misses that were merged with ongoin= g L2 misses. An access is counted once. Available PDIST counters: 0", + "PublicDescription": "Counts demand Data Read requests with true-m= iss in the L2 cache. True-miss excludes misses that were merged with ongoin= g L2 misses. An access is counted once.", "SampleAfterValue": "200003", "UMask": "0x21" }, @@ -224,7 +221,6 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.HWPF_MISS", - "PublicDescription": "L2_RQSTS.HWPF_MISS Available PDIST counters:= 0", "SampleAfterValue": "200003", "UMask": "0x30" }, @@ -233,7 +229,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.MISS", - "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_REQUEST.MISS] Available PDIST co= unters: 0", + "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_REQUEST.MISS]", "SampleAfterValue": "200003", "UMask": "0x3f" }, @@ -242,7 +238,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.REFERENCES", - "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_REQUEST.ALL] Available PDIST counters:= 0", + "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_REQUEST.ALL]", "SampleAfterValue": "200003", "UMask": "0xff" }, @@ -251,7 +247,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.RFO_HIT", - "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that hit L2 cache. Available PDIST counters: 0", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that hit L2 cache.", "SampleAfterValue": "200003", "UMask": "0xc2" }, @@ -260,7 +256,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.RFO_MISS", - "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that miss L2 cache. Available PDIST counters: 0", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that miss L2 cache.", "SampleAfterValue": "200003", "UMask": "0x22" }, @@ -269,7 +265,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.SWPF_HIT", - "PublicDescription": "Counts Software prefetch requests that hit t= he L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when = FB is not full. Available PDIST counters: 0", + "PublicDescription": "Counts Software prefetch requests that hit t= he L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when = FB is not full.", "SampleAfterValue": "200003", "UMask": "0xc8" }, @@ -278,7 +274,7 @@ "Counter": "0,1,2,3", "EventCode": "0x24", "EventName": "L2_RQSTS.SWPF_MISS", - "PublicDescription": "Counts Software prefetch requests that miss = the L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when= FB is not full. Available PDIST counters: 0", + "PublicDescription": "Counts Software prefetch requests that miss = the L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when= FB is not full.", "SampleAfterValue": "200003", "UMask": "0x28" }, @@ -287,7 +283,7 @@ "Counter": "0,1,2,3", "EventCode": "0x23", "EventName": "L2_TRANS.L2_WB", - "PublicDescription": "Counts L2 writebacks that access L2 cache. A= vailable PDIST counters: 0", + "PublicDescription": "Counts L2 writebacks that access L2 cache.", "SampleAfterValue": "200003", "UMask": "0x40" }, @@ -296,7 +292,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x2e", "EventName": "LONGEST_LAT_CACHE.MISS", - "PublicDescription": "Counts core-originated cacheable requests th= at miss the L3 cache (Longest Latency cache). Requests include data and cod= e reads, Reads-for-Ownership (RFOs), speculative accesses and hardware pref= etches to the L1 and L2. It does not include hardware prefetches to the L3= , and may not count other types of requests to the L3. Available PDIST coun= ters: 0", + "PublicDescription": "Counts core-originated cacheable requests th= at miss the L3 cache (Longest Latency cache). Requests include data and cod= e reads, Reads-for-Ownership (RFOs), speculative accesses and hardware pref= etches to the L1 and L2. It does not include hardware prefetches to the L3= , and may not count other types of requests to the L3.", "SampleAfterValue": "100003", "UMask": "0x41" }, @@ -305,7 +301,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x2e", "EventName": "LONGEST_LAT_CACHE.REFERENCE", - "PublicDescription": "Counts core-originated cacheable requests to= the L3 cache (Longest Latency cache). Requests include data and code reads= , Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches = to the L1 and L2. It does not include hardware prefetches to the L3, and m= ay not count other types of requests to the L3. Available PDIST counters: 0= ", + "PublicDescription": "Counts core-originated cacheable requests to= the L3 cache (Longest Latency cache). Requests include data and code reads= , Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches = to the L1 and L2. It does not include hardware prefetches to the L3, and m= ay not count other types of requests to the L3.", "SampleAfterValue": "100003", "UMask": "0x4f" }, @@ -394,7 +390,7 @@ "Counter": "0,1,2,3", "EventCode": "0x43", "EventName": "MEM_LOAD_COMPLETED.L1_MISS_ANY", - "PublicDescription": "Number of completed demand load requests tha= t missed the L1 data cache including shadow misses (FB hits, merge to an on= going L1D miss) Available PDIST counters: 0", + "PublicDescription": "Number of completed demand load requests tha= t missed the L1 data cache including shadow misses (FB hits, merge to an on= going L1D miss)", "SampleAfterValue": "1000003", "UMask": "0xfd" }, @@ -582,7 +578,6 @@ "Counter": "0,1,2,3", "EventCode": "0x44", "EventName": "MEM_STORE_RETIRED.L2_HIT", - "PublicDescription": "MEM_STORE_RETIRED.L2_HIT Available PDIST cou= nters: 0", "SampleAfterValue": "200003", "UMask": "0x1" }, @@ -591,7 +586,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe5", "EventName": "MEM_UOP_RETIRED.ANY", - "PublicDescription": "Number of retired micro-operations (uops) fo= r load or store memory accesses Available PDIST counters: 0", + "PublicDescription": "Number of retired micro-operations (uops) fo= r load or store memory accesses", "SampleAfterValue": "1000003", "UMask": "0x3" }, @@ -1073,7 +1068,6 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.ALL_REQUESTS", - "PublicDescription": "OFFCORE_REQUESTS.ALL_REQUESTS Available PDIS= T counters: 0", "SampleAfterValue": "100003", "UMask": "0x80" }, @@ -1082,7 +1076,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DATA_RD", - "PublicDescription": "Counts the demand and prefetch data reads. A= ll Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 p= refetchers). Counting also covers reads due to page walks resulted from any= request type. Available PDIST counters: 0", + "PublicDescription": "Counts the demand and prefetch data reads. A= ll Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 p= refetchers). Counting also covers reads due to page walks resulted from any= request type.", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -1091,7 +1085,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DEMAND_CODE_RD", - "PublicDescription": "Counts both cacheable and non-cacheable code= read requests. Available PDIST counters: 0", + "PublicDescription": "Counts both cacheable and non-cacheable code= read requests.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -1100,7 +1094,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DEMAND_DATA_RD", - "PublicDescription": "Counts the Demand Data Read requests sent to= uncore. Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determi= ne average latency in the uncore. Available PDIST counters: 0", + "PublicDescription": "Counts the Demand Data Read requests sent to= uncore. Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determi= ne average latency in the uncore.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -1109,7 +1103,7 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.DEMAND_RFO", - "PublicDescription": "Counts the demand RFO (read for ownership) r= equests including regular RFOs, locks, ItoM. Available PDIST counters: 0", + "PublicDescription": "Counts the demand RFO (read for ownership) r= equests including regular RFOs, locks, ItoM.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -1119,7 +1113,6 @@ "Deprecated": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD", - "PublicDescription": "This event is deprecated. Refer to new event= OFFCORE_REQUESTS_OUTSTANDING.DATA_RD Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -1129,7 +1122,6 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD", - "PublicDescription": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DAT= A_RD Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -1139,7 +1131,7 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE= _RD", - "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS. Available PDIST counters: 0", + "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -1149,7 +1141,6 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_DATA= _RD", - "PublicDescription": "Cycles where at least 1 outstanding demand d= ata read request is pending. Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1159,7 +1150,6 @@ "CounterMask": "1", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO", - "PublicDescription": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEM= AND_RFO Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x4" }, @@ -1168,7 +1158,6 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DATA_RD", - "PublicDescription": "OFFCORE_REQUESTS_OUTSTANDING.DATA_RD Availab= le PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -1177,7 +1166,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD", - "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS. Available PDIST counters: 0", + "PublicDescription": "Counts the number of offcore outstanding Cod= e Reads transactions in the super queue every cycle. The 'Offcore outstandi= ng' state of the transaction lasts from the L2 miss until the sending trans= action completion to requestor (SQ deallocation). See the corresponding Uma= sk under OFFCORE_REQUESTS.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -1186,7 +1175,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD", - "PublicDescription": "For every cycle, increments by the number of= outstanding demand data read requests pending. Requests are considered o= utstanding from the time they miss the core's L2 cache until the transactio= n completion message is sent to the requestor. Available PDIST counters: 0", + "PublicDescription": "For every cycle, increments by the number of= outstanding demand data read requests pending. Requests are considered o= utstanding from the time they miss the core's L2 cache until the transactio= n completion message is sent to the requestor.", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -1195,7 +1184,7 @@ "Counter": "0,1,2,3", "EventCode": "0x2c", "EventName": "SQ_MISC.BUS_LOCK", - "PublicDescription": "Counts the more expensive bus lock needed to= enforce cache coherency for certain memory accesses that need to be done a= tomically. Can be created by issuing an atomic instruction (via the LOCK p= refix) which causes a cache line split or accesses uncacheable memory. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts the more expensive bus lock needed to= enforce cache coherency for certain memory accesses that need to be done a= tomically. Can be created by issuing an atomic instruction (via the LOCK p= refix) which causes a cache line split or accesses uncacheable memory.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -1204,7 +1193,6 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.ANY", - "PublicDescription": "Counts the number of PREFETCHNTA, PREFETCHW,= PREFETCHT0, PREFETCHT1 or PREFETCHT2 instructions executed. Available PDIS= T counters: 0", "SampleAfterValue": "100003", "UMask": "0xf" }, @@ -1213,7 +1201,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.NTA", - "PublicDescription": "Counts the number of PREFETCHNTA instruction= s executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHNTA instruction= s executed.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -1222,7 +1210,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.PREFETCHW", - "PublicDescription": "Counts the number of PREFETCHW instructions = executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHW instructions = executed.", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -1231,7 +1219,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.T0", - "PublicDescription": "Counts the number of PREFETCHT0 instructions= executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHT0 instructions= executed.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -1240,7 +1228,7 @@ "Counter": "0,1,2,3", "EventCode": "0x40", "EventName": "SW_PREFETCH_ACCESS.T1_T2", - "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT= 2 instructions executed. Available PDIST counters: 0", + "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT= 2 instructions executed.", "SampleAfterValue": "100003", "UMask": "0x4" } diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/floating-point.j= son b/tools/perf/pmu-events/arch/x86/sapphirerapids/floating-point.json index 8c9207750c82..bc475e163227 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/floating-point.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/floating-point.json @@ -5,7 +5,6 @@ "CounterMask": "1", "EventCode": "0xb0", "EventName": "ARITH.FPDIV_ACTIVE", - "PublicDescription": "ARITH.FPDIV_ACTIVE Available PDIST counters:= 0", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -14,7 +13,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.FP", - "PublicDescription": "Counts all microcode Floating Point assists.= Available PDIST counters: 0", + "PublicDescription": "Counts all microcode Floating Point assists.= ", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -23,7 +22,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.SSE_AVX_MIX", - "PublicDescription": "ASSISTS.SSE_AVX_MIX Available PDIST counters= : 0", "SampleAfterValue": "1000003", "UMask": "0x10" }, @@ -32,7 +30,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.PORT_0", - "PublicDescription": "FP_ARITH_DISPATCHED.PORT_0 [This event is al= ias to FP_ARITH_DISPATCHED.V0] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -41,7 +38,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.PORT_1", - "PublicDescription": "FP_ARITH_DISPATCHED.PORT_1 [This event is al= ias to FP_ARITH_DISPATCHED.V1] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -50,7 +46,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.PORT_5", - "PublicDescription": "FP_ARITH_DISPATCHED.PORT_5 [This event is al= ias to FP_ARITH_DISPATCHED.V2] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -59,7 +54,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.V0", - "PublicDescription": "FP_ARITH_DISPATCHED.V0 [This event is alias = to FP_ARITH_DISPATCHED.PORT_0] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -68,7 +62,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.V1", - "PublicDescription": "FP_ARITH_DISPATCHED.V1 [This event is alias = to FP_ARITH_DISPATCHED.PORT_1] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -77,7 +70,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb3", "EventName": "FP_ARITH_DISPATCHED.V2", - "PublicDescription": "FP_ARITH_DISPATCHED.V2 [This event is alias = to FP_ARITH_DISPATCHED.PORT_5] Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -86,7 +78,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 2 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as the= y perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR re= gister need to be set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 2 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as the= y perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR re= gister need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -95,7 +87,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE", - "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events. Available PDIST co= unters: 0", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -104,7 +96,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 = calculations per element. The DAZ and FTZ flags in the MXCSR register need = to be set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 = calculations per element. The DAZ and FTZ flags in the MXCSR register need = to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -113,7 +105,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE", - "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events. Available PDIST co= unters: 0", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -122,7 +114,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.4_FLOPS", - "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision and 256-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 2 or/and 4 computation operations, one for each element. = Applies to SSE* and AVX* packed single precision floating-point and packed= double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL= DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB ins= tructions count twice as they perform 2 calculations per element. The DAZ a= nd FTZ flags in the MXCSR register need to be set when using these events. = Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision and 256-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 2 or/and 4 computation operations, one for each element. = Applies to SSE* and AVX* packed single precision floating-point and packed= double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL= DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB ins= tructions count twice as they perform 2 calculations per element. The DAZ a= nd FTZ flags in the MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x18" }, @@ -131,7 +123,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14= FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calc= ulations per element. The DAZ and FTZ flags in the MXCSR register need to b= e set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14= FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calc= ulations per element. The DAZ and FTZ flags in the MXCSR register need to b= e set when using these events.", "SampleAfterValue": "100003", "UMask": "0x40" }, @@ -140,7 +132,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE", - "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 16 computation oper= ations, one for each element. Applies to SSE* and AVX* packed single preci= sion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP1= 4 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 cal= culations per element. The DAZ and FTZ flags in the MXCSR register need to = be set when using these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 16 computation oper= ations, one for each element. Applies to SSE* and AVX* packed single preci= sion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP1= 4 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 cal= culations per element. The DAZ and FTZ flags in the MXCSR register need to = be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x80" }, @@ -149,7 +141,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.8_FLOPS", - "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision and 512-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 8 computation operations, one for each element. Applies = to SSE* and AVX* packed single precision and double precision floating-poin= t instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RSQRT14= RCP RCP14 DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice= as they perform 2 calculations per element. The DAZ and FTZ flags in the M= XCSR register need to be set when using these events. Available PDIST count= ers: 0", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision and 512-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 8 computation operations, one for each element. Applies = to SSE* and AVX* packed single precision and double precision floating-poin= t instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RSQRT14= RCP RCP14 DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice= as they perform 2 calculations per element. The DAZ and FTZ flags in the M= XCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x60" }, @@ -158,7 +150,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR", - "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision and double precision floating-point instructions retired; some = instructions will count twice as noted below. Each count represents 1 comp= utational operation. Applies to SSE* and AVX* scalar single precision float= ing-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB= . FM(N)ADD/SUB instructions count twice as they perform 2 calculations per= element. The DAZ and FTZ flags in the MXCSR register need to be set when u= sing these events. Available PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision and double precision floating-point instructions retired; some = instructions will count twice as noted below. Each count represents 1 comp= utational operation. Applies to SSE* and AVX* scalar single precision float= ing-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB= . FM(N)ADD/SUB instructions count twice as they perform 2 calculations per= element. The DAZ and FTZ flags in the MXCSR register need to be set when u= sing these events.", "SampleAfterValue": "1000003", "UMask": "0x3" }, @@ -167,7 +159,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE", - "PublicDescription": "Number of SSE/AVX computational scalar doubl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar double precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions co= unt twice as they perform 2 calculations per element. The DAZ and FTZ flags= in the MXCSR register need to be set when using these events. Available PD= IST counters: 0", + "PublicDescription": "Number of SSE/AVX computational scalar doubl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar double precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions co= unt twice as they perform 2 calculations per element. The DAZ and FTZ flags= in the MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -176,7 +168,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE", - "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar single precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instr= uctions count twice as they perform 2 calculations per element. The DAZ and= FTZ flags in the MXCSR register need to be set when using these events. Av= ailable PDIST counters: 0", + "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar single precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instr= uctions count twice as they perform 2 calculations per element. The DAZ and= FTZ flags in the MXCSR register need to be set when using these events.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -185,7 +177,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc7", "EventName": "FP_ARITH_INST_RETIRED.VECTOR", - "PublicDescription": "Number of any Vector retired FP arithmetic i= nstructions. The DAZ and FTZ flags in the MXCSR register need to be set wh= en using these events. Available PDIST counters: 0", + "PublicDescription": "Number of any Vector retired FP arithmetic i= nstructions. The DAZ and FTZ flags in the MXCSR register need to be set wh= en using these events.", "SampleAfterValue": "1000003", "UMask": "0xfc" }, @@ -194,7 +186,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.128B_PACKED_HALF", - "PublicDescription": "FP_ARITH_INST_RETIRED2.128B_PACKED_HALF Avai= lable PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -203,7 +194,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.256B_PACKED_HALF", - "PublicDescription": "FP_ARITH_INST_RETIRED2.256B_PACKED_HALF Avai= lable PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -212,7 +202,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.512B_PACKED_HALF", - "PublicDescription": "FP_ARITH_INST_RETIRED2.512B_PACKED_HALF Avai= lable PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -221,7 +210,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.COMPLEX_SCALAR_HALF", - "PublicDescription": "FP_ARITH_INST_RETIRED2.COMPLEX_SCALAR_HALF A= vailable PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -230,7 +218,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.SCALAR", - "PublicDescription": "FP_ARITH_INST_RETIRED2.SCALAR Available PDIS= T counters: 0", + "PublicDescription": "FP_ARITH_INST_RETIRED2.SCALAR", "SampleAfterValue": "100003", "UMask": "0x3" }, @@ -239,7 +227,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.SCALAR_HALF", - "PublicDescription": "FP_ARITH_INST_RETIRED2.SCALAR_HALF Available= PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -248,7 +235,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcf", "EventName": "FP_ARITH_INST_RETIRED2.VECTOR", - "PublicDescription": "FP_ARITH_INST_RETIRED2.VECTOR Available PDIS= T counters: 0", + "PublicDescription": "FP_ARITH_INST_RETIRED2.VECTOR", "SampleAfterValue": "100003", "UMask": "0x1c" } diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/frontend.json b/= tools/perf/pmu-events/arch/x86/sapphirerapids/frontend.json index 9fe9d62b867a..793c486ffabe 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/frontend.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/frontend.json @@ -4,7 +4,7 @@ "Counter": "0,1,2,3", "EventCode": "0x60", "EventName": "BACLEARS.ANY", - "PublicDescription": "Number of times the front-end is resteered w= hen it finds a branch instruction in a fetch line. This is called Unknown B= ranch which occurs for the first time a branch instruction is fetched or wh= en the branch is not tracked by the BPU (Branch Prediction Unit) anymore. A= vailable PDIST counters: 0", + "PublicDescription": "Number of times the front-end is resteered w= hen it finds a branch instruction in a fetch line. This is called Unknown B= ranch which occurs for the first time a branch instruction is fetched or wh= en the branch is not tracked by the BPU (Branch Prediction Unit) anymore.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -13,7 +13,7 @@ "Counter": "0,1,2,3", "EventCode": "0x87", "EventName": "DECODE.LCP", - "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk. Available PDIS= T counters: 0", + "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk.", "SampleAfterValue": "500009", "UMask": "0x1" }, @@ -22,7 +22,6 @@ "Counter": "0,1,2,3", "EventCode": "0x87", "EventName": "DECODE.MS_BUSY", - "PublicDescription": "Cycles the Microcode Sequencer is busy. Avai= lable PDIST counters: 0", "SampleAfterValue": "500009", "UMask": "0x2" }, @@ -31,7 +30,7 @@ "Counter": "0,1,2,3", "EventCode": "0x61", "EventName": "DSB2MITE_SWITCHES.PENALTY_CYCLES", - "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache th= at holds translations of previously fetched instructions that were decoded = by the legacy x86 decode pipeline (MITE). This event counts fetch penalty c= ycles when a transition occurs from DSB to MITE. Available PDIST counters: = 0", + "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache th= at holds translations of previously fetched instructions that were decoded = by the legacy x86 decode pipeline (MITE). This event counts fetch penalty c= ycles when a transition occurs from DSB to MITE.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -249,7 +248,7 @@ "Counter": "0,1,2,3", "EventCode": "0x80", "EventName": "ICACHE_DATA.STALLS", - "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The decode pipeline works at a 32= Byte granularity. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The decode pipeline works at a 32= Byte granularity.", "SampleAfterValue": "500009", "UMask": "0x4" }, @@ -260,7 +259,6 @@ "EdgeDetect": "1", "EventCode": "0x80", "EventName": "ICACHE_DATA.STALL_PERIODS", - "PublicDescription": "ICACHE_DATA.STALL_PERIODS Available PDIST co= unters: 0", "SampleAfterValue": "500009", "UMask": "0x4" }, @@ -269,7 +267,7 @@ "Counter": "0,1,2,3", "EventCode": "0x83", "EventName": "ICACHE_TAG.STALLS", - "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss.", "SampleAfterValue": "200003", "UMask": "0x4" }, @@ -279,7 +277,7 @@ "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.DSB_CYCLES_ANY", - "PublicDescription": "Counts the number of cycles uops were delive= red to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) p= ath. Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles uops were delive= red to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) p= ath.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -289,7 +287,7 @@ "CounterMask": "6", "EventCode": "0x79", "EventName": "IDQ.DSB_CYCLES_OK", - "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the D= SB (Decode Stream Buffer) path. Count includes uops that may 'bypass' the I= DQ. Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the D= SB (Decode Stream Buffer) path. Count includes uops that may 'bypass' the I= DQ.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -298,7 +296,7 @@ "Counter": "0,1,2,3", "EventCode": "0x79", "EventName": "IDQ.DSB_UOPS", - "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Availab= le PDIST counters: 0", + "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -308,7 +306,7 @@ "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.MITE_CYCLES_ANY", - "PublicDescription": "Counts the number of cycles uops were delive= red to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipe= line) path. During these cycles uops are not being delivered from the Decod= e Stream Buffer (DSB). Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles uops were delive= red to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipe= line) path. During these cycles uops are not being delivered from the Decod= e Stream Buffer (DSB).", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -318,7 +316,7 @@ "CounterMask": "6", "EventCode": "0x79", "EventName": "IDQ.MITE_CYCLES_OK", - "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the M= ITE (legacy decode pipeline) path. During these cycles uops are not being d= elivered from the Decode Stream Buffer (DSB). Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the M= ITE (legacy decode pipeline) path. During these cycles uops are not being d= elivered from the Decode Stream Buffer (DSB).", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -327,7 +325,7 @@ "Counter": "0,1,2,3", "EventCode": "0x79", "EventName": "IDQ.MITE_UOPS", - "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the MITE path. This also means that uops are= not being delivered from the Decode Stream Buffer (DSB). Available PDIST c= ounters: 0", + "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the MITE path. This also means that uops are= not being delivered from the Decode Stream Buffer (DSB).", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -337,7 +335,7 @@ "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.MS_CYCLES_ANY", - "PublicDescription": "Counts cycles during which uops are being de= livered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS= ) is busy. Uops maybe initiated by Decode Stream Buffer (DSB) or MITE. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts cycles during which uops are being de= livered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS= ) is busy. Uops maybe initiated by Decode Stream Buffer (DSB) or MITE.", "SampleAfterValue": "2000003", "UMask": "0x20" }, @@ -348,7 +346,7 @@ "EdgeDetect": "1", "EventCode": "0x79", "EventName": "IDQ.MS_SWITCHES", - "PublicDescription": "Number of switches from DSB (Decode Stream B= uffer) or MITE (legacy decode pipeline) to the Microcode Sequencer. Availab= le PDIST counters: 0", + "PublicDescription": "Number of switches from DSB (Decode Stream B= uffer) or MITE (legacy decode pipeline) to the Microcode Sequencer.", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -357,7 +355,7 @@ "Counter": "0,1,2,3", "EventCode": "0x79", "EventName": "IDQ.MS_UOPS", - "PublicDescription": "Counts the total number of uops delivered by= the Microcode Sequencer (MS). Available PDIST counters: 0", + "PublicDescription": "Counts the total number of uops delivered by= the Microcode Sequencer (MS).", "SampleAfterValue": "1000003", "UMask": "0x20" }, @@ -366,7 +364,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x9c", "EventName": "IDQ_BUBBLES.CORE", - "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CORE] Available PDI= ST counters: 0", + "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -376,7 +374,7 @@ "CounterMask": "6", "EventCode": "0x9c", "EventName": "IDQ_BUBBLES.CYCLES_0_UOPS_DELIV.CORE", - "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES= _0_UOPS_DELIV.CORE] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES= _0_UOPS_DELIV.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -387,7 +385,7 @@ "EventCode": "0x9c", "EventName": "IDQ_BUBBLES.CYCLES_FE_WAS_OK", "Invert": "1", - "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_UOPS_N= OT_DELIVERED.CYCLES_FE_WAS_OK] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_UOPS_N= OT_DELIVERED.CYCLES_FE_WAS_OK]", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -396,7 +394,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x9c", "EventName": "IDQ_UOPS_NOT_DELIVERED.CORE", - "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle. [This event is alias to IDQ_BUBBLES.CORE] Available PDIST counters= : 0", + "PublicDescription": "Counts the number of uops not delivered to b= y the Instruction Decode Queue (IDQ) to the back-end of the pipeline when t= here was no back-end stalls. This event counts for one SMT thread in a give= n cycle. [This event is alias to IDQ_BUBBLES.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -406,7 +404,7 @@ "CounterMask": "6", "EventCode": "0x9c", "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE", - "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_BUBBLES.CYCLES_0_UOPS_DEL= IV.CORE] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when no uops wer= e delivered by the Instruction Decode Queue (IDQ) to the back-end of the pi= peline when there was no back-end stalls. This event counts for one SMT thr= ead in a given cycle. [This event is alias to IDQ_BUBBLES.CYCLES_0_UOPS_DEL= IV.CORE]", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -417,7 +415,7 @@ "EventCode": "0x9c", "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK", "Invert": "1", - "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_BUBBLE= S.CYCLES_FE_WAS_OK] Available PDIST counters: 0", + "PublicDescription": "Counts the number of cycles when the optimal= number of uops were delivered by the Instruction Decode Queue (IDQ) to the= back-end of the pipeline when there was no back-end stalls. This event cou= nts for one SMT thread in a given cycle. [This event is alias to IDQ_BUBBLE= S.CYCLES_FE_WAS_OK]", "SampleAfterValue": "1000003", "UMask": "0x1" } diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/memory.json b/to= ols/perf/pmu-events/arch/x86/sapphirerapids/memory.json index 7c3f9b76d367..5e6c1f05c981 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/memory.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/memory.json @@ -5,7 +5,6 @@ "CounterMask": "6", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_L3_MISS", - "PublicDescription": "Execution stalls while L3 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x6" }, @@ -14,7 +13,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc3", "EventName": "MACHINE_CLEARS.MEMORY_ORDERING", - "PublicDescription": "Counts the number of Machine Clears detected= dye to memory ordering. Memory Ordering Machine Clears may apply when a me= mory read may not conform to the memory ordering rules of the x86 architect= ure Available PDIST counters: 0", + "PublicDescription": "Counts the number of Machine Clears detected= dye to memory ordering. Memory Ordering Machine Clears may apply when a me= mory read may not conform to the memory ordering rules of the x86 architect= ure", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -24,7 +23,6 @@ "CounterMask": "2", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.CYCLES_L1D_MISS", - "PublicDescription": "Cycles while L1 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -34,7 +32,6 @@ "CounterMask": "3", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.STALLS_L1D_MISS", - "PublicDescription": "Execution stalls while L1 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x3" }, @@ -44,7 +41,7 @@ "CounterMask": "5", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.STALLS_L2_MISS", - "PublicDescription": "Execution stalls while L2 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock). Available PDIST counters: 0", + "PublicDescription": "Execution stalls while L2 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock).", "SampleAfterValue": "1000003", "UMask": "0x5" }, @@ -54,7 +51,7 @@ "CounterMask": "9", "EventCode": "0x47", "EventName": "MEMORY_ACTIVITY.STALLS_L3_MISS", - "PublicDescription": "Execution stalls while L3 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock). Available PDIST counters: 0", + "PublicDescription": "Execution stalls while L3 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock).", "SampleAfterValue": "1000003", "UMask": "0x9" }, @@ -478,7 +475,6 @@ "Counter": "0,1,2,3", "EventCode": "0x21", "EventName": "OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD", - "PublicDescription": "Counts demand data read requests that miss t= he L3 cache. Available PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -487,7 +483,7 @@ "Counter": "0,1,2,3", "EventCode": "0x20", "EventName": "OFFCORE_REQUESTS_OUTSTANDING.L3_MISS_DEMAND_DATA_RD", - "PublicDescription": "For every cycle, increments by the number of= demand data read requests pending that are known to have missed the L3 cac= he. Note that this does not capture all elapsed cycles while requests are = outstanding - only cycles from when the requests were known by the requesti= ng core to have missed the L3 cache. Available PDIST counters: 0", + "PublicDescription": "For every cycle, increments by the number of= demand data read requests pending that are known to have missed the L3 cac= he. Note that this does not capture all elapsed cycles while requests are = outstanding - only cycles from when the requests were known by the requesti= ng core to have missed the L3 cache.", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -505,7 +501,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.ABORTED_EVENTS", - "PublicDescription": "Counts the number of times an RTM execution = aborted due to none of the previous 3 categories (e.g. interrupt). Availabl= e PDIST counters: 0", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to none of the previous 3 categories (e.g. interrupt).", "SampleAfterValue": "100003", "UMask": "0x80" }, @@ -514,7 +510,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.ABORTED_MEM", - "PublicDescription": "Counts the number of times an RTM execution = aborted due to various memory events (e.g. read/write capacity and conflict= s). Available PDIST counters: 0", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to various memory events (e.g. read/write capacity and conflict= s).", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -523,7 +519,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.ABORTED_MEMTYPE", - "PublicDescription": "Counts the number of times an RTM execution = aborted due to incompatible memory type. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to incompatible memory type.", "SampleAfterValue": "100003", "UMask": "0x40" }, @@ -532,7 +528,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.ABORTED_UNFRIENDLY", - "PublicDescription": "Counts the number of times an RTM execution = aborted due to HLE-unfriendly instructions. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to HLE-unfriendly instructions.", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -541,7 +537,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.COMMIT", - "PublicDescription": "Counts the number of times RTM commit succee= ded. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times RTM commit succee= ded.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -550,7 +546,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc9", "EventName": "RTM_RETIRED.START", - "PublicDescription": "Counts the number of times we entered an RTM= region. Does not count nested transactions. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times we entered an RTM= region. Does not count nested transactions.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -559,7 +555,7 @@ "Counter": "0,1,2,3", "EventCode": "0x54", "EventName": "TX_MEM.ABORT_CAPACITY_READ", - "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional reads Available PDIST counters: 0", + "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional reads", "SampleAfterValue": "100003", "UMask": "0x80" }, @@ -568,7 +564,7 @@ "Counter": "0,1,2,3", "EventCode": "0x54", "EventName": "TX_MEM.ABORT_CAPACITY_WRITE", - "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional writes. Available PDIST counters: 0", + "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional writes.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -577,7 +573,7 @@ "Counter": "0,1,2,3", "EventCode": "0x54", "EventName": "TX_MEM.ABORT_CONFLICT", - "PublicDescription": "Counts the number of times a TSX line had a = cache conflict. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times a TSX line had a = cache conflict.", "SampleAfterValue": "100003", "UMask": "0x1" } diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/other.json b/too= ls/perf/pmu-events/arch/x86/sapphirerapids/other.json index a58d65556609..21f49f609ed4 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/other.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/other.json @@ -4,10 +4,34 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.PAGE_FAULT", - "PublicDescription": "ASSISTS.PAGE_FAULT Available PDIST counters:= 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, + { + "BriefDescription": "HW_INTERRUPTS.MASKED", + "Counter": "0,1,2,3,4,5,6,7", + "EventCode": "0xcb", + "EventName": "HW_INTERRUPTS.MASKED", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "HW_INTERRUPTS.PENDING_AND_MASKED", + "Counter": "0,1,2,3,4,5,6,7", + "EventCode": "0xcb", + "EventName": "HW_INTERRUPTS.PENDING_AND_MASKED", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Number of hardware interrupts received by the= processor.", + "Counter": "0,1,2,3,4,5,6,7", + "EventCode": "0xcb", + "EventName": "HW_INTERRUPTS.RECEIVED", + "PublicDescription": "Counts the number of hardware interruptions = received by the processor.", + "SampleAfterValue": "203", + "UMask": "0x1" + }, { "BriefDescription": "Counts streaming stores that have any type of= response.", "Counter": "0,1,2,3", @@ -25,7 +49,7 @@ "CounterMask": "1", "EventCode": "0x2d", "EventName": "XQ.FULL_CYCLES", - "PublicDescription": "number of cycles when the thread is active a= nd the uncore cannot take any further requests (for example prefetches, loa= ds or stores initiated by the Core that miss the L2 cache). Available PDIST= counters: 0", + "PublicDescription": "number of cycles when the thread is active a= nd the uncore cannot take any further requests (for example prefetches, loa= ds or stores initiated by the Core that miss the L2 cache).", "SampleAfterValue": "1000003", "UMask": "0x1" } diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json b/= tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json index 48bec483b49a..1fa7957956df 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json @@ -6,7 +6,6 @@ "Deprecated": "1", "EventCode": "0xb0", "EventName": "ARITH.DIVIDER_ACTIVE", - "PublicDescription": "This event is deprecated. Refer to new event= ARITH.DIV_ACTIVE Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x9" }, @@ -16,7 +15,7 @@ "CounterMask": "1", "EventCode": "0xb0", "EventName": "ARITH.DIV_ACTIVE", - "PublicDescription": "Counts cycles when divide unit is busy execu= ting divide or square root operations. Accounts for integer and floating-po= int operations. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when divide unit is busy execu= ting divide or square root operations. Accounts for integer and floating-po= int operations.", "SampleAfterValue": "1000003", "UMask": "0x9" }, @@ -27,7 +26,6 @@ "Deprecated": "1", "EventCode": "0xb0", "EventName": "ARITH.FP_DIVIDER_ACTIVE", - "PublicDescription": "This event is deprecated. Refer to new event= ARITH.FPDIV_ACTIVE Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -37,7 +35,6 @@ "CounterMask": "1", "EventCode": "0xb0", "EventName": "ARITH.IDIV_ACTIVE", - "PublicDescription": "This event counts the cycles the integer div= ider is busy. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -48,7 +45,6 @@ "Deprecated": "1", "EventCode": "0xb0", "EventName": "ARITH.INT_DIVIDER_ACTIVE", - "PublicDescription": "This event is deprecated. Refer to new event= ARITH.IDIV_ACTIVE Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -57,7 +53,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc1", "EventName": "ASSISTS.ANY", - "PublicDescription": "Counts the number of occurrences where a mic= rocode assist is invoked by hardware. Examples include AD (page Access Dirt= y), FP and AVX related assists. Available PDIST counters: 0", + "PublicDescription": "Counts the number of occurrences where a mic= rocode assist is invoked by hardware. Examples include AD (page Access Dirt= y), FP and AVX related assists.", "SampleAfterValue": "100003", "UMask": "0x1b" }, @@ -217,7 +213,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.C01", - "PublicDescription": "Counts core clocks when the thread is in the= C0.1 light-weight slower wakeup time but more power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions. Availab= le PDIST counters: 0", + "PublicDescription": "Counts core clocks when the thread is in the= C0.1 light-weight slower wakeup time but more power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions.", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -226,7 +222,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.C02", - "PublicDescription": "Counts core clocks when the thread is in the= C0.2 light-weight faster wakeup time but less power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions. Availab= le PDIST counters: 0", + "PublicDescription": "Counts core clocks when the thread is in the= C0.2 light-weight faster wakeup time but less power saving optimized state= . This state can be entered via the TPAUSE or UMWAIT instructions.", "SampleAfterValue": "2000003", "UMask": "0x20" }, @@ -235,7 +231,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.C0_WAIT", - "PublicDescription": "Counts core clocks when the thread is in the= C0.1 or C0.2 power saving optimized states (TPAUSE or UMWAIT instructions)= or running the PAUSE instruction. Available PDIST counters: 0", + "PublicDescription": "Counts core clocks when the thread is in the= C0.1 or C0.2 power saving optimized states (TPAUSE or UMWAIT instructions)= or running the PAUSE instruction.", "SampleAfterValue": "2000003", "UMask": "0x70" }, @@ -244,7 +240,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.DISTRIBUTED", - "PublicDescription": "This event distributes cycle counts between = active hyperthreads, i.e., those in C0. A hyperthread becomes inactive whe= n it executes the HLT or MWAIT instructions. If all other hyperthreads are= inactive (or disabled or do not exist), all counts are attributed to this = hyperthread. To obtain the full count when the Core is active, sum the coun= ts from each hyperthread. Available PDIST counters: 0", + "PublicDescription": "This event distributes cycle counts between = active hyperthreads, i.e., those in C0. A hyperthread becomes inactive whe= n it executes the HLT or MWAIT instructions. If all other hyperthreads are= inactive (or disabled or do not exist), all counts are attributed to this = hyperthread. To obtain the full count when the Core is active, sum the coun= ts from each hyperthread.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -253,7 +249,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE", - "PublicDescription": "Counts Core crystal clock cycles when curren= t thread is unhalted and the other thread is halted. Available PDIST counte= rs: 0", + "PublicDescription": "Counts Core crystal clock cycles when curren= t thread is unhalted and the other thread is halted.", "SampleAfterValue": "25003", "UMask": "0x2" }, @@ -262,7 +258,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.PAUSE", - "PublicDescription": "CPU_CLK_UNHALTED.PAUSE Available PDIST count= ers: 0", "SampleAfterValue": "2000003", "UMask": "0x40" }, @@ -273,7 +268,6 @@ "EdgeDetect": "1", "EventCode": "0xec", "EventName": "CPU_CLK_UNHALTED.PAUSE_INST", - "PublicDescription": "CPU_CLK_UNHALTED.PAUSE_INST Available PDIST = counters: 0", "SampleAfterValue": "2000003", "UMask": "0x40" }, @@ -282,7 +276,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.REF_DISTRIBUTED", - "PublicDescription": "This event distributes Core crystal clock cy= cle counts between active hyperthreads, i.e., those in C0 sleep-state. A hy= perthread becomes inactive when it executes the HLT or MWAIT instructions. = If one thread is active in a core, all counts are attributed to this hypert= hread. To obtain the full count when the Core is active, sum the counts fro= m each hyperthread. Available PDIST counters: 0", + "PublicDescription": "This event distributes Core crystal clock cy= cle counts between active hyperthreads, i.e., those in C0 sleep-state. A hy= perthread becomes inactive when it executes the HLT or MWAIT instructions. = If one thread is active in a core, all counts are attributed to this hypert= hread. To obtain the full count when the Core is active, sum the counts fro= m each hyperthread.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -299,7 +293,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.REF_TSC_P", - "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the four (eight when Hyperthr= eading is disabled) programmable counters available for other events. Note:= On all current platforms this event stops counting during 'throttling (TM)= ' states duty off periods the processor is 'halted'. The counter update is= done at a lower clock rate then the core clock the overflow status bit for= this counter may appear 'sticky'. After the counter has overflowed and so= ftware clears the overflow status bit and resets the counter to less than M= AX. The reset value to the counter is not clocked immediately so the overfl= ow status bit will flip 'high (1)' and generate another PMI (if enabled) af= ter which the reset value gets clocked into the counter. Therefore, softwar= e will get the interrupt, read the overflow status bit '1 for bit 34 while = the counter value is less than MAX. Software should ignore this case. Avail= able PDIST counters: 0", + "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the four (eight when Hyperthr= eading is disabled) programmable counters available for other events. Note:= On all current platforms this event stops counting during 'throttling (TM)= ' states duty off periods the processor is 'halted'. The counter update is= done at a lower clock rate then the core clock the overflow status bit for= this counter may appear 'sticky'. After the counter has overflowed and so= ftware clears the overflow status bit and resets the counter to less than M= AX. The reset value to the counter is not clocked immediately so the overfl= ow status bit will flip 'high (1)' and generate another PMI (if enabled) af= ter which the reset value gets clocked into the counter. Therefore, softwar= e will get the interrupt, read the overflow status bit '1 for bit 34 while = the counter value is less than MAX. Software should ignore this case.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -316,7 +310,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0x3c", "EventName": "CPU_CLK_UNHALTED.THREAD_P", - "PublicDescription": "This is an architectural event that counts t= he number of thread cycles while the thread is not in a halt state. The thr= ead enters the halt state when it is running the HLT instruction. The core = frequency may change from time to time due to power or thermal throttling. = For this reason, this event may have a changing ratio with regards to wall = clock time. Available PDIST counters: 0", + "PublicDescription": "This is an architectural event that counts t= he number of thread cycles while the thread is not in a halt state. The thr= ead enters the halt state when it is running the HLT instruction. The core = frequency may change from time to time due to power or thermal throttling. = For this reason, this event may have a changing ratio with regards to wall = clock time.", "SampleAfterValue": "2000003" }, { @@ -325,7 +319,6 @@ "CounterMask": "8", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_L1D_MISS", - "PublicDescription": "Cycles while L1 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x8" }, @@ -335,7 +328,6 @@ "CounterMask": "1", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_L2_MISS", - "PublicDescription": "Cycles while L2 cache miss demand load is ou= tstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -345,7 +337,6 @@ "CounterMask": "16", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.CYCLES_MEM_ANY", - "PublicDescription": "Cycles while memory subsystem has an outstan= ding load. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x10" }, @@ -355,7 +346,6 @@ "CounterMask": "12", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_L1D_MISS", - "PublicDescription": "Execution stalls while L1 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0xc" }, @@ -365,7 +355,6 @@ "CounterMask": "5", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_L2_MISS", - "PublicDescription": "Execution stalls while L2 cache miss demand = load is outstanding. Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x5" }, @@ -375,7 +364,6 @@ "CounterMask": "4", "EventCode": "0xa3", "EventName": "CYCLE_ACTIVITY.STALLS_TOTAL", - "PublicDescription": "Total execution stalls. Available PDIST coun= ters: 0", "SampleAfterValue": "1000003", "UMask": "0x4" }, @@ -384,7 +372,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb7", "EventName": "EXE.AMX_BUSY", - "PublicDescription": "Counts the cycles where the AMX (Advance Mat= rix Extension) unit is busy performing an operation. Available PDIST counte= rs: 0", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -393,7 +380,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.1_PORTS_UTIL", - "PublicDescription": "Counts cycles during which a total of 1 uop = was executed on all ports and Reservation Station (RS) was not empty. Avail= able PDIST counters: 0", + "PublicDescription": "Counts cycles during which a total of 1 uop = was executed on all ports and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -402,7 +389,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.2_3_PORTS_UTIL", - "PublicDescription": "Cycles total of 2 or 3 uops are executed on = all ports and Reservation Station (RS) was not empty. Available PDIST count= ers: 0", "SampleAfterValue": "2000003", "UMask": "0xc" }, @@ -411,7 +397,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.2_PORTS_UTIL", - "PublicDescription": "Counts cycles during which a total of 2 uops= were executed on all ports and Reservation Station (RS) was not empty. Ava= ilable PDIST counters: 0", + "PublicDescription": "Counts cycles during which a total of 2 uops= were executed on all ports and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -420,7 +406,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.3_PORTS_UTIL", - "PublicDescription": "Cycles total of 3 uops are executed on all p= orts and Reservation Station (RS) was not empty. Available PDIST counters: = 0", + "PublicDescription": "Cycles total of 3 uops are executed on all p= orts and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -429,7 +415,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.4_PORTS_UTIL", - "PublicDescription": "Cycles total of 4 uops are executed on all p= orts and Reservation Station (RS) was not empty. Available PDIST counters: = 0", + "PublicDescription": "Cycles total of 4 uops are executed on all p= orts and Reservation Station (RS) was not empty.", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -439,7 +425,6 @@ "CounterMask": "5", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.BOUND_ON_LOADS", - "PublicDescription": "Execution stalls while memory subsystem has = an outstanding load. Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x21" }, @@ -449,7 +434,7 @@ "CounterMask": "2", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.BOUND_ON_STORES", - "PublicDescription": "Counts cycles where the Store Buffer was ful= l and no loads caused an execution stall. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where the Store Buffer was ful= l and no loads caused an execution stall.", "SampleAfterValue": "1000003", "UMask": "0x40" }, @@ -458,7 +443,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa6", "EventName": "EXE_ACTIVITY.EXE_BOUND_0_PORTS", - "PublicDescription": "Number of cycles total of 0 uops executed on= all ports, Reservation Station (RS) was not empty, the Store Buffer (SB) w= as not full and there was no outstanding load. Available PDIST counters: 0", + "PublicDescription": "Number of cycles total of 0 uops executed on= all ports, Reservation Station (RS) was not empty, the Store Buffer (SB) w= as not full and there was no outstanding load.", "SampleAfterValue": "1000003", "UMask": "0x80" }, @@ -467,7 +452,7 @@ "Counter": "0,1,2,3", "EventCode": "0x75", "EventName": "INST_DECODED.DECODERS", - "PublicDescription": "Number of decoders utilized in a cycle when = the MITE (legacy decode pipeline) fetches instructions. Available PDIST cou= nters: 0", + "PublicDescription": "Number of decoders utilized in a cycle when = the MITE (legacy decode pipeline) fetches instructions.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -492,7 +477,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc0", "EventName": "INST_RETIRED.MACRO_FUSED", - "PublicDescription": "INST_RETIRED.MACRO_FUSED Available PDIST cou= nters: 0", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -501,7 +485,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc0", "EventName": "INST_RETIRED.NOP", - "PublicDescription": "Counts all retired NOP or ENDBR32/64 instruc= tions Available PDIST counters: 0", + "PublicDescription": "Counts all retired NOP or ENDBR32/64 instruc= tions", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -518,7 +502,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc0", "EventName": "INST_RETIRED.REP_ITERATION", - "PublicDescription": "Number of iterations of Repeat (REP) string = retired instructions such as MOVS, CMPS, and SCAS. Each has a byte, word, a= nd doubleword version and string instructions can be repeated using a repet= ition prefix, REP, that allows their architectural execution to be repeated= a number of times as specified by the RCX register. Note the number of ite= rations is implementation-dependent. Available PDIST counters: 0", + "PublicDescription": "Number of iterations of Repeat (REP) string = retired instructions such as MOVS, CMPS, and SCAS. Each has a byte, word, a= nd doubleword version and string instructions can be repeated using a repet= ition prefix, REP, that allows their architectural execution to be repeated= a number of times as specified by the RCX register. Note the number of ite= rations is implementation-dependent.", "SampleAfterValue": "2000003", "UMask": "0x8" }, @@ -529,7 +513,7 @@ "EdgeDetect": "1", "EventCode": "0xad", "EventName": "INT_MISC.CLEARS_COUNT", - "PublicDescription": "Counts the number of speculative clears due = to any type of branch misprediction or machine clears Available PDIST count= ers: 0", + "PublicDescription": "Counts the number of speculative clears due = to any type of branch misprediction or machine clears", "SampleAfterValue": "500009", "UMask": "0x1" }, @@ -538,7 +522,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.CLEAR_RESTEER_CYCLES", - "PublicDescription": "Cycles after recovery from a branch mispredi= ction or machine clear till the first uop is issued from the resteered path= . Available PDIST counters: 0", + "PublicDescription": "Cycles after recovery from a branch mispredi= ction or machine clear till the first uop is issued from the resteered path= .", "SampleAfterValue": "500009", "UMask": "0x80" }, @@ -547,7 +531,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.MBA_STALLS", - "PublicDescription": "INT_MISC.MBA_STALLS Available PDIST counters= : 0", "SampleAfterValue": "1000003", "UMask": "0x20" }, @@ -556,7 +539,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.RECOVERY_CYCLES", - "PublicDescription": "Counts core cycles when the Resource allocat= or was stalled due to recovery from an earlier branch misprediction or mach= ine clear event. Available PDIST counters: 0", + "PublicDescription": "Counts core cycles when the Resource allocat= or was stalled due to recovery from an earlier branch misprediction or mach= ine clear event.", "SampleAfterValue": "500009", "UMask": "0x1" }, @@ -567,7 +550,6 @@ "EventName": "INT_MISC.UNKNOWN_BRANCH_CYCLES", "MSRIndex": "0x3F7", "MSRValue": "0x7", - "PublicDescription": "Bubble cycles of BAClear (Unknown Branch). A= vailable PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x40" }, @@ -576,7 +558,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xad", "EventName": "INT_MISC.UOP_DROPPING", - "PublicDescription": "Estimated number of Top-down Microarchitectu= re Analysis slots that got dropped due to non front-end reasons Available P= DIST counters: 0", + "PublicDescription": "Estimated number of Top-down Microarchitectu= re Analysis slots that got dropped due to non front-end reasons", "SampleAfterValue": "1000003", "UMask": "0x10" }, @@ -585,7 +567,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.128BIT", - "PublicDescription": "INT_VEC_RETIRED.128BIT Available PDIST count= ers: 0", "SampleAfterValue": "1000003", "UMask": "0x13" }, @@ -594,7 +575,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.256BIT", - "PublicDescription": "INT_VEC_RETIRED.256BIT Available PDIST count= ers: 0", "SampleAfterValue": "1000003", "UMask": "0xac" }, @@ -603,7 +583,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.ADD_128", - "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 128-bit vector instructions. Available PDIST counters: 0= ", + "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 128-bit vector instructions.", "SampleAfterValue": "1000003", "UMask": "0x3" }, @@ -612,7 +592,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.ADD_256", - "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 256-bit vector instructions. Available PDIST counters: 0= ", + "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 256-bit vector instructions.", "SampleAfterValue": "1000003", "UMask": "0xc" }, @@ -621,7 +601,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.MUL_256", - "PublicDescription": "INT_VEC_RETIRED.MUL_256 Available PDIST coun= ters: 0", "SampleAfterValue": "1000003", "UMask": "0x80" }, @@ -630,7 +609,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.SHUFFLES", - "PublicDescription": "INT_VEC_RETIRED.SHUFFLES Available PDIST cou= nters: 0", "SampleAfterValue": "1000003", "UMask": "0x40" }, @@ -639,7 +617,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.VNNI_128", - "PublicDescription": "INT_VEC_RETIRED.VNNI_128 Available PDIST cou= nters: 0", "SampleAfterValue": "1000003", "UMask": "0x10" }, @@ -648,7 +625,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe7", "EventName": "INT_VEC_RETIRED.VNNI_256", - "PublicDescription": "INT_VEC_RETIRED.VNNI_256 Available PDIST cou= nters: 0", "SampleAfterValue": "1000003", "UMask": "0x20" }, @@ -657,7 +633,7 @@ "Counter": "0,1,2,3", "EventCode": "0x03", "EventName": "LD_BLOCKS.ADDRESS_ALIAS", - "PublicDescription": "Counts the number of times a load got blocke= d due to false dependencies in MOB due to partial compare on address. Avail= able PDIST counters: 0", + "PublicDescription": "Counts the number of times a load got blocke= d due to false dependencies in MOB due to partial compare on address.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -666,7 +642,7 @@ "Counter": "0,1,2,3", "EventCode": "0x03", "EventName": "LD_BLOCKS.NO_SR", - "PublicDescription": "Counts the number of times that split load o= perations are temporarily blocked because all resources for handling the sp= lit accesses are in use. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times that split load o= perations are temporarily blocked because all resources for handling the sp= lit accesses are in use.", "SampleAfterValue": "100003", "UMask": "0x88" }, @@ -675,7 +651,7 @@ "Counter": "0,1,2,3", "EventCode": "0x03", "EventName": "LD_BLOCKS.STORE_FORWARD", - "PublicDescription": "Counts the number of times where store forwa= rding was prevented for a load operation. The most common case is a load bl= ocked due to the address of memory access (partially) overlapping with a pr= eceding uncompleted store. Note: See the table of not supported store forwa= rds in the Optimization Guide. Available PDIST counters: 0", + "PublicDescription": "Counts the number of times where store forwa= rding was prevented for a load operation. The most common case is a load bl= ocked due to the address of memory access (partially) overlapping with a pr= eceding uncompleted store. Note: See the table of not supported store forwa= rds in the Optimization Guide.", "SampleAfterValue": "100003", "UMask": "0x82" }, @@ -684,7 +660,7 @@ "Counter": "0,1,2,3", "EventCode": "0x4c", "EventName": "LOAD_HIT_PREFETCH.SWPF", - "PublicDescription": "Counts all software-prefetch load dispatches= that hit the fill buffer (FB) allocated for the software prefetch. It can = also be incremented by some lock instructions. So it should only be used wi= th profiling so that the locks can be excluded by ASM (Assembly File) inspe= ction of the nearby instructions. Available PDIST counters: 0", + "PublicDescription": "Counts all software-prefetch load dispatches= that hit the fill buffer (FB) allocated for the software prefetch. It can = also be incremented by some lock instructions. So it should only be used wi= th profiling so that the locks can be excluded by ASM (Assembly File) inspe= ction of the nearby instructions.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -694,7 +670,7 @@ "CounterMask": "1", "EventCode": "0xa8", "EventName": "LSD.CYCLES_ACTIVE", - "PublicDescription": "Counts the cycles when at least one uop is d= elivered by the LSD (Loop-stream detector). Available PDIST counters: 0", + "PublicDescription": "Counts the cycles when at least one uop is d= elivered by the LSD (Loop-stream detector).", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -704,7 +680,7 @@ "CounterMask": "6", "EventCode": "0xa8", "EventName": "LSD.CYCLES_OK", - "PublicDescription": "Counts the cycles when optimal number of uop= s is delivered by the LSD (Loop-stream detector). Available PDIST counters:= 0", + "PublicDescription": "Counts the cycles when optimal number of uop= s is delivered by the LSD (Loop-stream detector).", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -713,7 +689,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa8", "EventName": "LSD.UOPS", - "PublicDescription": "Counts the number of uops delivered to the b= ack-end by the LSD(Loop Stream Detector). Available PDIST counters: 0", + "PublicDescription": "Counts the number of uops delivered to the b= ack-end by the LSD(Loop Stream Detector).", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -724,7 +700,7 @@ "EdgeDetect": "1", "EventCode": "0xc3", "EventName": "MACHINE_CLEARS.COUNT", - "PublicDescription": "Counts the number of machine clears (nukes) = of any type. Available PDIST counters: 0", + "PublicDescription": "Counts the number of machine clears (nukes) = of any type.", "SampleAfterValue": "100003", "UMask": "0x1" }, @@ -733,7 +709,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc3", "EventName": "MACHINE_CLEARS.SMC", - "PublicDescription": "Counts self-modifying code (SMC) detected, w= hich causes a machine clear. Available PDIST counters: 0", + "PublicDescription": "Counts self-modifying code (SMC) detected, w= hich causes a machine clear.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -742,7 +718,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xe0", "EventName": "MISC2_RETIRED.LFENCE", - "PublicDescription": "number of LFENCE retired instructions Availa= ble PDIST counters: 0", + "PublicDescription": "number of LFENCE retired instructions", "SampleAfterValue": "400009", "UMask": "0x20" }, @@ -751,7 +727,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xcc", "EventName": "MISC_RETIRED.LBR_INSERTS", - "PublicDescription": "Increments when an entry is added to the Las= t Branch Record (LBR) array (or removed from the array in case of RETURNs i= n call stack mode). The event requires LBR enable via IA32_DEBUGCTL MSR and= branch type selection via MSR_LBR_SELECT. Available PDIST counters: 0", + "PublicDescription": "Increments when an entry is added to the Las= t Branch Record (LBR) array (or removed from the array in case of RETURNs i= n call stack mode). The event requires LBR enable via IA32_DEBUGCTL MSR and= branch type selection via MSR_LBR_SELECT.", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -760,7 +736,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa2", "EventName": "RESOURCE_STALLS.SB", - "PublicDescription": "Counts allocation stall cycles caused by the= store buffer (SB) being full. This counts cycles that the pipeline back-en= d blocked uop delivery from the front-end. Available PDIST counters: 0", + "PublicDescription": "Counts allocation stall cycles caused by the= store buffer (SB) being full. This counts cycles that the pipeline back-en= d blocked uop delivery from the front-end.", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -769,7 +745,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa2", "EventName": "RESOURCE_STALLS.SCOREBOARD", - "PublicDescription": "Counts cycles where the pipeline is stalled = due to serializing operations. Available PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -778,7 +753,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa5", "EventName": "RS.EMPTY", - "PublicDescription": "Counts cycles during which the reservation s= tation (RS) is empty for this logical processor. This is usually caused whe= n the front-end pipeline runs into starvation periods (e.g. branch mispredi= ctions or i-cache misses) Available PDIST counters: 0", + "PublicDescription": "Counts cycles during which the reservation s= tation (RS) is empty for this logical processor. This is usually caused whe= n the front-end pipeline runs into starvation periods (e.g. branch mispredi= ctions or i-cache misses)", "SampleAfterValue": "1000003", "UMask": "0x7" }, @@ -790,7 +765,7 @@ "EventCode": "0xa5", "EventName": "RS.EMPTY_COUNT", "Invert": "1", - "PublicDescription": "Counts end of periods where the Reservation = Station (RS) was empty. Could be useful to closely sample on front-end late= ncy issues (see the FRONTEND_RETIRED event of designated precise events) Av= ailable PDIST counters: 0", + "PublicDescription": "Counts end of periods where the Reservation = Station (RS) was empty. Could be useful to closely sample on front-end late= ncy issues (see the FRONTEND_RETIRED event of designated precise events)", "SampleAfterValue": "100003", "UMask": "0x7" }, @@ -799,7 +774,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa5", "EventName": "RS.EMPTY_RESOURCE", - "PublicDescription": "Cycles when Reservation Station (RS) is empt= y due to a resource in the back-end Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -812,7 +786,6 @@ "EventCode": "0xa5", "EventName": "RS_EMPTY.COUNT", "Invert": "1", - "PublicDescription": "This event is deprecated. Refer to new event= RS.EMPTY_COUNT Available PDIST counters: 0", "SampleAfterValue": "100003", "UMask": "0x7" }, @@ -822,7 +795,6 @@ "Deprecated": "1", "EventCode": "0xa5", "EventName": "RS_EMPTY.CYCLES", - "PublicDescription": "This event is deprecated. Refer to new event= RS.EMPTY Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x7" }, @@ -831,7 +803,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa4", "EventName": "TOPDOWN.BACKEND_BOUND_SLOTS", - "PublicDescription": "Number of slots in TMA method where no micro= -operations were being issued from front-end to back-end of the machine due= to lack of back-end resources. Available PDIST counters: 0", + "PublicDescription": "Number of slots in TMA method where no micro= -operations were being issued from front-end to back-end of the machine due= to lack of back-end resources.", "SampleAfterValue": "10000003", "UMask": "0x2" }, @@ -840,7 +812,7 @@ "Counter": "0", "EventCode": "0xa4", "EventName": "TOPDOWN.BAD_SPEC_SLOTS", - "PublicDescription": "Number of slots of TMA method that were wast= ed due to incorrect speculation. It covers all types of control-flow or dat= a-related mis-speculations. Available PDIST counters: 0", + "PublicDescription": "Number of slots of TMA method that were wast= ed due to incorrect speculation. It covers all types of control-flow or dat= a-related mis-speculations.", "SampleAfterValue": "10000003", "UMask": "0x4" }, @@ -849,7 +821,7 @@ "Counter": "0", "EventCode": "0xa4", "EventName": "TOPDOWN.BR_MISPREDICT_SLOTS", - "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by (any type of) branch mispredictions. This event es= timates number of speculative operations that were issued but not retired a= s well as the out-of-order engine recovery past a branch misprediction. Ava= ilable PDIST counters: 0", + "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by (any type of) branch mispredictions. This event es= timates number of speculative operations that were issued but not retired a= s well as the out-of-order engine recovery past a branch misprediction.", "SampleAfterValue": "10000003", "UMask": "0x8" }, @@ -858,7 +830,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa4", "EventName": "TOPDOWN.MEMORY_BOUND_SLOTS", - "PublicDescription": "TOPDOWN.MEMORY_BOUND_SLOTS Available PDIST c= ounters: 0", "SampleAfterValue": "10000003", "UMask": "0x10" }, @@ -875,7 +846,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xa4", "EventName": "TOPDOWN.SLOTS_P", - "PublicDescription": "Counts the number of available slots for an = unhalted logical processor. The event increments by machine-width of the na= rrowest pipeline as employed by the Top-down Microarchitecture Analysis met= hod. The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core. Available PDIST counters: 0", + "PublicDescription": "Counts the number of available slots for an = unhalted logical processor. The event increments by machine-width of the na= rrowest pipeline as employed by the Top-down Microarchitecture Analysis met= hod. The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core.", "SampleAfterValue": "10000003", "UMask": "0x1" }, @@ -884,7 +855,6 @@ "Counter": "0,1,2,3", "EventCode": "0x76", "EventName": "UOPS_DECODED.DEC0_UOPS", - "PublicDescription": "UOPS_DECODED.DEC0_UOPS Available PDIST count= ers: 0", "SampleAfterValue": "1000003", "UMask": "0x1" }, @@ -893,7 +863,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_0", - "PublicDescription": "Number of uops dispatch to execution port 0= . Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution port 0= .", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -902,7 +872,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_1", - "PublicDescription": "Number of uops dispatch to execution port 1= . Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution port 1= .", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -911,7 +881,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_2_3_10", - "PublicDescription": "Number of uops dispatch to execution ports 2= , 3 and 10 Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports 2= , 3 and 10", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -920,7 +890,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_4_9", - "PublicDescription": "Number of uops dispatch to execution ports 4= and 9 Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports 4= and 9", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -929,7 +899,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_5_11", - "PublicDescription": "Number of uops dispatch to execution ports 5= and 11 Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports 5= and 11", "SampleAfterValue": "2000003", "UMask": "0x20" }, @@ -938,7 +908,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_6", - "PublicDescription": "Number of uops dispatch to execution port 6= . Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution port 6= .", "SampleAfterValue": "2000003", "UMask": "0x40" }, @@ -947,7 +917,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb2", "EventName": "UOPS_DISPATCHED.PORT_7_8", - "PublicDescription": "Number of uops dispatch to execution ports = 7 and 8. Available PDIST counters: 0", + "PublicDescription": "Number of uops dispatch to execution ports = 7 and 8.", "SampleAfterValue": "2000003", "UMask": "0x80" }, @@ -956,7 +926,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE", - "PublicDescription": "Counts the number of uops executed from any = thread. Available PDIST counters: 0", + "PublicDescription": "Counts the number of uops executed from any = thread.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -966,7 +936,7 @@ "CounterMask": "1", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_1", - "PublicDescription": "Counts cycles when at least 1 micro-op is ex= ecuted from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 1 micro-op is ex= ecuted from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -976,7 +946,7 @@ "CounterMask": "2", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_2", - "PublicDescription": "Counts cycles when at least 2 micro-ops are = executed from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 2 micro-ops are = executed from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -986,7 +956,7 @@ "CounterMask": "3", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_3", - "PublicDescription": "Counts cycles when at least 3 micro-ops are = executed from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 3 micro-ops are = executed from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -996,7 +966,7 @@ "CounterMask": "4", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_4", - "PublicDescription": "Counts cycles when at least 4 micro-ops are = executed from any thread on physical core. Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least 4 micro-ops are = executed from any thread on physical core.", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -1006,7 +976,7 @@ "CounterMask": "1", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_1", - "PublicDescription": "Cycles where at least 1 uop was executed per= -thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 1 uop was executed per= -thread.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1016,7 +986,7 @@ "CounterMask": "2", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_2", - "PublicDescription": "Cycles where at least 2 uops were executed p= er-thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 2 uops were executed p= er-thread.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1026,7 +996,7 @@ "CounterMask": "3", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_3", - "PublicDescription": "Cycles where at least 3 uops were executed p= er-thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 3 uops were executed p= er-thread.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1036,7 +1006,7 @@ "CounterMask": "4", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.CYCLES_GE_4", - "PublicDescription": "Cycles where at least 4 uops were executed p= er-thread. Available PDIST counters: 0", + "PublicDescription": "Cycles where at least 4 uops were executed p= er-thread.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1047,7 +1017,7 @@ "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.STALLS", "Invert": "1", - "PublicDescription": "Counts cycles during which no uops were disp= atched from the Reservation Station (RS) per thread. Available PDIST counte= rs: 0", + "PublicDescription": "Counts cycles during which no uops were disp= atched from the Reservation Station (RS) per thread.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1059,7 +1029,6 @@ "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.STALL_CYCLES", "Invert": "1", - "PublicDescription": "This event is deprecated. Refer to new event= UOPS_EXECUTED.STALLS Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1068,7 +1037,6 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.THREAD", - "PublicDescription": "Counts the number of uops to be executed per= -thread each cycle. Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1077,7 +1045,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xb1", "EventName": "UOPS_EXECUTED.X87", - "PublicDescription": "Counts the number of x87 uops executed. Avai= lable PDIST counters: 0", + "PublicDescription": "Counts the number of x87 uops executed.", "SampleAfterValue": "2000003", "UMask": "0x10" }, @@ -1086,7 +1054,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xae", "EventName": "UOPS_ISSUED.ANY", - "PublicDescription": "Counts the number of uops that the Resource = Allocation Table (RAT) issues to the Reservation Station (RS). Available PD= IST counters: 0", + "PublicDescription": "Counts the number of uops that the Resource = Allocation Table (RAT) issues to the Reservation Station (RS).", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1096,7 +1064,6 @@ "CounterMask": "1", "EventCode": "0xae", "EventName": "UOPS_ISSUED.CYCLES", - "PublicDescription": "UOPS_ISSUED.CYCLES Available PDIST counters:= 0", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1106,7 +1073,7 @@ "CounterMask": "1", "EventCode": "0xc2", "EventName": "UOPS_RETIRED.CYCLES", - "PublicDescription": "Counts cycles where at least one uop has ret= ired. Available PDIST counters: 0", + "PublicDescription": "Counts cycles where at least one uop has ret= ired.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -1115,7 +1082,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc2", "EventName": "UOPS_RETIRED.HEAVY", - "PublicDescription": "Counts the number of retired micro-operation= s (uops) except the last uop of each instruction. An instruction that is de= coded into less than two uops does not contribute to the count. Available P= DIST counters: 0", + "PublicDescription": "Counts the number of retired micro-operation= s (uops) except the last uop of each instruction. An instruction that is de= coded into less than two uops does not contribute to the count.", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -1126,7 +1093,6 @@ "EventName": "UOPS_RETIRED.MS", "MSRIndex": "0x3F7", "MSRValue": "0x8", - "PublicDescription": "UOPS_RETIRED.MS Available PDIST counters: 0", "SampleAfterValue": "2000003", "UMask": "0x4" }, @@ -1135,7 +1101,7 @@ "Counter": "0,1,2,3,4,5,6,7", "EventCode": "0xc2", "EventName": "UOPS_RETIRED.SLOTS", - "PublicDescription": "Counts the retirement slots used each cycle.= Available PDIST counters: 0", + "PublicDescription": "Counts the retirement slots used each cycle.= ", "SampleAfterValue": "2000003", "UMask": "0x2" }, @@ -1146,7 +1112,7 @@ "EventCode": "0xc2", "EventName": "UOPS_RETIRED.STALLS", "Invert": "1", - "PublicDescription": "This event counts cycles without actually re= tired uops. Available PDIST counters: 0", + "PublicDescription": "This event counts cycles without actually re= tired uops.", "SampleAfterValue": "1000003", "UMask": "0x2" }, @@ -1158,7 +1124,6 @@ "EventCode": "0xc2", "EventName": "UOPS_RETIRED.STALL_CYCLES", "Invert": "1", - "PublicDescription": "This event is deprecated. Refer to new event= UOPS_RETIRED.STALLS Available PDIST counters: 0", "SampleAfterValue": "1000003", "UMask": "0x2" } diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json= b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json index fe3f288be10e..8a18d6e01ef4 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json @@ -1,28 +1,28 @@ [ { "BriefDescription": "C1 residency percent per core", - "MetricExpr": "cstate_core@c1\\-residency@ / TSC", + "MetricExpr": "cstate_core@c1\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C1_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" @@ -90,6 +90,12 @@ "MetricName": "io_bandwidth_read", "ScaleUnit": "1MB/s" }, + { + "BriefDescription": "Bandwidth of inbound IO reads that are initia= ted by end device controllers that are requesting memory from the CPU and m= iss the L3 cache", + "MetricExpr": "UNC_CHA_TOR_INSERTS.IO_MISS_PCIRDCUR * 64 / 1e6 / d= uration_time", + "MetricName": "io_bandwidth_read_l3_miss", + "ScaleUnit": "1MB/s" + }, { "BriefDescription": "Bandwidth of IO reads that are initiated by e= nd device controllers that are requesting memory from the local CPU socket", "MetricExpr": "UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_LOCAL * 64 / 1e6 / = duration_time", @@ -108,6 +114,12 @@ "MetricName": "io_bandwidth_write", "ScaleUnit": "1MB/s" }, + { + "BriefDescription": "Bandwidth of inbound IO writes that are initi= ated by end device controllers that are writing memory to the CPU", + "MetricExpr": "(UNC_CHA_TOR_INSERTS.IO_MISS_ITOM + UNC_CHA_TOR_INS= ERTS.IO_MISS_ITOMCACHENEAR) * 64 / 1e6 / duration_time", + "MetricName": "io_bandwidth_write_l3_miss", + "ScaleUnit": "1MB/s" + }, { "BriefDescription": "Bandwidth of IO writes that are initiated by = end device controllers that are writing memory to the local CPU socket", "MetricExpr": "(UNC_CHA_TOR_INSERTS.IO_ITOM_LOCAL + UNC_CHA_TOR_IN= SERTS.IO_ITOMCACHENEAR_LOCAL) * 64 / 1e6 / duration_time", @@ -123,19 +135,19 @@ { "BriefDescription": "Percentage of inbound full cacheline writes i= nitiated by end device controllers that miss the L3 cache", "MetricExpr": "UNC_CHA_TOR_INSERTS.IO_MISS_ITOM / UNC_CHA_TOR_INSE= RTS.IO_ITOM", - "MetricName": "io_percent_of_inbound_full_writes_that_miss_l3", + "MetricName": "io_full_write_l3_miss", "ScaleUnit": "100%" }, { "BriefDescription": "Percentage of inbound partial cacheline write= s initiated by end device controllers that miss the L3 cache", "MetricExpr": "(UNC_CHA_TOR_INSERTS.IO_MISS_ITOMCACHENEAR + UNC_CH= A_TOR_INSERTS.IO_MISS_RFO) / (UNC_CHA_TOR_INSERTS.IO_ITOMCACHENEAR + UNC_CH= A_TOR_INSERTS.IO_RFO)", - "MetricName": "io_percent_of_inbound_partial_writes_that_miss_l3", + "MetricName": "io_partial_write_l3_miss", "ScaleUnit": "100%" }, { "BriefDescription": "Percentage of inbound reads initiated by end = device controllers that miss the L3 cache", "MetricExpr": "UNC_CHA_TOR_INSERTS.IO_MISS_PCIRDCUR / UNC_CHA_TOR_= INSERTS.IO_PCIRDCUR", - "MetricName": "io_percent_of_inbound_reads_that_miss_l3", + "MetricName": "io_read_l3_miss", "ScaleUnit": "100%" }, { @@ -395,7 +407,7 @@ { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_inf= o_thread_slots", + "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "BvOB;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", @@ -429,40 +441,40 @@ "MetricThreshold": "tma_bottleneck_branching_overhead > 5", "PublicDescription": "Total pipeline cost of instructions used for= program control-flow - a subset of the Retiring category in TMA. Examples = include function calls; loops and alignments. (A lower bound)" }, + { + "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", + "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_amx_busy= + tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_c= ore_bound * tma_amx_busy / (tma_amx_busy + tma_divider + tma_ports_utilizat= ion + tma_serializing_operation) + tma_core_bound * (tma_ports_utilization = / (tma_amx_busy + tma_divider + tma_ports_utilization + tma_serializing_ope= ration)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_utili= zed_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", + "MetricGroup": "BvCB;Cor;tma_issueComp", + "MetricName": "tma_bottleneck_compute_bound_est", + "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", + "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " + }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Bandwidth related bottlenecks", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma_l1_l= atency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)= ))", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_cx= l_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_late= ncy)) + tma_memory_bound * (tma_l3_bound / (tma_cxl_mem_bound + tma_dram_bo= und + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma= _sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency = + tma_sq_full)) + tma_memory_bound * (tma_l1_bound / (tma_cxl_mem_bound + t= ma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_boun= d)) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependen= cy + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)))", "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", - "MetricName": "tma_bottleneck_cache_memory_bandwidth", - "MetricThreshold": "tma_bottleneck_cache_memory_bandwidth > 20", + "MetricName": "tma_bottleneck_data_cache_memory_bandwidth", + "MetricThreshold": "tma_bottleneck_data_cache_memory_bandwidth > 2= 0", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_= system_dram_bw_use, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Latency related bottlenecks", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependen= cy + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_= bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma= _l3_bound + tma_store_bound)) * (tma_lock_latency / (tma_dtlb_load + tma_fb= _full + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + tm= a_store_fwd_blk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tm= a_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_split_l= oads / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_= latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_s= tore_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_split_stores / (tma_dtlb_store + tma_false_sharin= g + tma_split_stores + tma_store_latency + tma_streaming_stores)) + tma_mem= ory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_boun= d + tma_l3_bound + tma_store_bound)) * (tma_store_latency / (tma_dtlb_store= + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming= _stores)))", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_cx= l_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latenc= y)) + tma_memory_bound * (tma_l3_bound / (tma_cxl_mem_bound + tma_dram_boun= d + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l= 3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_lat= ency + tma_sq_full)) + tma_memory_bound * tma_l2_bound / (tma_cxl_mem_bound= + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_= bound) + tma_memory_bound * (tma_l1_bound / (tma_cxl_mem_bound + tma_dram_b= ound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tm= a_l1_latency_dependency / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dep= endency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_me= mory_bound * (tma_l1_bound / (tma_cxl_mem_bound + tma_dram_bound + tma_l1_b= ound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_lock_latency = / (tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_laten= cy + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_l1_bou= nd / (tma_cxl_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tm= a_l3_bound + tma_store_bound)) * (tma_split_loads / (tma_dtlb_load + tma_fb= _full + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + tm= a_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_cxl_mem_boun= d + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store= _bound)) * (tma_split_stores / (tma_dtlb_store + tma_false_sharing + tma_sp= lit_stores + tma_store_latency + tma_streaming_stores)) + tma_memory_bound = * (tma_store_bound / (tma_cxl_mem_bound + tma_dram_bound + tma_l1_bound + t= ma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_store_latency / (tma_= dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma= _streaming_stores)))", "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", - "MetricName": "tma_bottleneck_cache_memory_latency", - "MetricThreshold": "tma_bottleneck_cache_memory_latency > 20", + "MetricName": "tma_bottleneck_data_cache_memory_latency", + "MetricThreshold": "tma_bottleneck_data_cache_memory_latency > 20", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_= mem_latency" }, - { - "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", - "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_amx_busy= + tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_c= ore_bound * tma_amx_busy / (tma_amx_busy + tma_divider + tma_ports_utilizat= ion + tma_serializing_operation) + tma_core_bound * (tma_ports_utilization = / (tma_amx_busy + tma_divider + tma_ports_utilization + tma_serializing_ope= ration)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_utili= zed_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", - "MetricGroup": "BvCB;Cor;tma_issueComp", - "MetricName": "tma_bottleneck_compute_bound_est", - "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", - "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " - }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks (when the front-end could not sustain operations = delivery to the back-end)", - "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - (1 - I= NST_RETIRED.REP_ITERATION / cpu@UOPS_RETIRED.MS\\,cmask\\=3D1@) * (tma_fetc= h_latency * (tma_ms_switches + tma_branch_resteers * (tma_clears_resteers += tma_mispredicts_resteers * tma_other_mispredicts / tma_branch_mispredicts)= / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_branches))= / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_m= isses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_ms / (tma_ds= b + tma_mite + tma_ms))) - tma_bottleneck_big_code", + "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - (1 - I= NST_RETIRED.REP_ITERATION / cpu@UOPS_RETIRED.MS\\,cmask\\=3D1@) * (tma_fetc= h_latency * (tma_ms_switches + tma_branch_resteers * (tma_clears_resteers += tma_mispredicts_resteers * tma_other_mispredicts / tma_branch_mispredicts)= / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_branches))= / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_m= isses + tma_lcp + tma_ms_switches) + tma_ms)) - tma_bottleneck_big_code", "MetricGroup": "BvFB;Fed;FetchBW;Frontend", "MetricName": "tma_bottleneck_instruction_fetch_bw", "MetricThreshold": "tma_bottleneck_instruction_fetch_bw > 20" }, { "BriefDescription": "Total pipeline cost of irregular execution (e= .g", - "MetricExpr": "100 * ((1 - INST_RETIRED.REP_ITERATION / cpu@UOPS_R= ETIRED.MS\\,cmask\\=3D1@) * (tma_fetch_latency * (tma_ms_switches + tma_bra= nch_resteers * (tma_clears_resteers + tma_mispredicts_resteers * tma_other_= mispredicts / tma_branch_mispredicts) / (tma_clears_resteers + tma_mispredi= cts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_dsb_swit= ches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + t= ma_fetch_bandwidth * tma_ms / (tma_dsb + tma_mite + tma_ms)) + 10 * tma_mic= rocode_sequencer * tma_other_mispredicts / tma_branch_mispredicts * tma_bra= nch_mispredicts + tma_machine_clears * tma_other_nukes / tma_other_nukes + = tma_core_bound * (tma_serializing_operation + RS.EMPTY_RESOURCE / tma_info_= thread_clks * tma_ports_utilized_0) / (tma_amx_busy + tma_divider + tma_por= ts_utilization + tma_serializing_operation) + tma_microcode_sequencer / (tm= a_few_uops_instructions + tma_microcode_sequencer) * (tma_assists / tma_mic= rocode_sequencer) * tma_heavy_operations)", + "MetricExpr": "100 * ((1 - INST_RETIRED.REP_ITERATION / cpu@UOPS_R= ETIRED.MS\\,cmask\\=3D1@) * (tma_fetch_latency * (tma_ms_switches + tma_bra= nch_resteers * (tma_clears_resteers + tma_mispredicts_resteers * tma_other_= mispredicts / tma_branch_mispredicts) / (tma_clears_resteers + tma_mispredi= cts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_dsb_swit= ches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + t= ma_ms) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_= mispredicts * tma_branch_mispredicts + tma_machine_clears * tma_other_nukes= / tma_other_nukes + tma_core_bound * (tma_serializing_operation + RS.EMPTY= _RESOURCE / tma_info_thread_clks * tma_ports_utilized_0) / (tma_amx_busy + = tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_micr= ocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (= tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", "MetricName": "tma_bottleneck_irregular_overhead", "MetricThreshold": "tma_bottleneck_irregular_overhead > 10", @@ -470,7 +482,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", - "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_m= emory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + = tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, tma_dtlb_load + tma_= fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + = tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_dram_bound= + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_dt= lb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_sto= re_latency + tma_streaming_stores)))", + "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_m= emory_bound, tma_cxl_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bou= nd + tma_l3_bound + tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, = tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency = + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bou= nd / (tma_cxl_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tm= a_l3_bound + tma_store_bound)) * (tma_dtlb_store / (tma_dtlb_store + tma_fa= lse_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores))= )", "MetricGroup": "BvMT;Mem;MemoryTLB;Offcore;tma_issueTLB", "MetricName": "tma_bottleneck_memory_data_tlbs", "MetricThreshold": "tma_bottleneck_memory_data_tlbs > 20", @@ -478,7 +490,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Synchronization= related bottlenecks (data transfers and coherency updates across processor= s)", - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * = (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) * tma_remote_cach= e / (tma_local_mem + tma_remote_cache + tma_remote_mem) + tma_l3_bound / (t= ma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_boun= d) * (tma_contested_accesses + tma_data_sharing) / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full) + tma_store_bound / = (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bo= und) * tma_false_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_= stores + tma_store_latency + tma_streaming_stores - tma_store_latency)) + t= ma_machine_clears * (1 - tma_other_nukes / tma_other_nukes))", + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_cx= l_mem_bound + tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency= )) * tma_remote_cache / (tma_local_mem + tma_remote_cache + tma_remote_mem)= + tma_l3_bound / (tma_cxl_mem_bound + tma_dram_bound + tma_l1_bound + tma_= l2_bound + tma_l3_bound + tma_store_bound) * (tma_contested_accesses + tma_= data_sharing) / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_lat= ency + tma_sq_full) + tma_store_bound / (tma_cxl_mem_bound + tma_dram_bound= + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * tma_fals= e_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_st= ore_latency + tma_streaming_stores - tma_store_latency)) + tma_machine_clea= rs * (1 - tma_other_nukes / tma_other_nukes))", "MetricGroup": "BvMS;LockCont;Mem;Offcore;tma_issueSyncxn", "MetricName": "tma_bottleneck_memory_synchronization", "MetricThreshold": "tma_bottleneck_memory_synchronization > 10", @@ -494,7 +506,7 @@ }, { "BriefDescription": "Total pipeline cost of remaining bottlenecks = in the back-end", - "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_cache_me= mory_bandwidth + tma_bottleneck_cache_memory_latency + tma_bottleneck_memor= y_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottleneck_comput= e_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_branching_= overhead + tma_bottleneck_useful_work)", + "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_data_cac= he_memory_bandwidth + tma_bottleneck_data_cache_memory_latency + tma_bottle= neck_memory_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottlen= eck_compute_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_= branching_overhead + tma_bottleneck_useful_work)", "MetricGroup": "BvOB;Cor;Offcore", "MetricName": "tma_bottleneck_other_bottlenecks", "MetricThreshold": "tma_bottleneck_other_bottlenecks > 20", @@ -510,7 +522,7 @@ { "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Branch Misprediction", "DefaultMetricgroupName": "TopdownL2", - "MetricExpr": "topdown\\-br\\-mispredict / (topdown\\-fe\\-bound += topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tm= a_info_thread_slots", + "MetricExpr": "topdown\\-br\\-mispredict / (topdown\\-fe\\-bound += topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "BadSpec;BrMispredicts;BvMP;Default;TmaL2;TopdownL2= ;tma_L2_group;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", @@ -611,7 +623,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to contested acces= ses", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(76.6 * tma_info_system_core_frequency * (MEM_LOAD_= L3_HIT_RETIRED.XSNP_FWD * (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM / (OCR.DEMA= ND_DATA_RD.L3_HIT.SNOOP_HITM + OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD= ))) + 74.6 * tma_info_system_core_frequency * MEM_LOAD_L3_HIT_RETIRED.XSNP_= MISS) * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_= info_thread_clks", "MetricGroup": "BvMS;DataSharing;LockCont;Offcore;Snoop;TopdownL4;= tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", "MetricName": "tma_contested_accesses", @@ -630,6 +641,15 @@ "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, + { + "BriefDescription": "This metric roughly estimates (based on idle = latencies) how often the CPU was stalled on accesses to external CXL Memory= by loads (e.g", + "MetricExpr": "(((1 - ((19 * (MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM= * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS)) + 10 * (MEM_LO= AD_L3_MISS_RETIRED.LOCAL_DRAM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RET= IRED.L1_MISS) + MEM_LOAD_L3_MISS_RETIRED.REMOTE_FWD * (1 + MEM_LOAD_RETIRED= .FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + MEM_LOAD_L3_MISS_RETIRED.REMOTE_HITM = * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS))) / (19 * (MEM_L= OAD_L3_MISS_RETIRED.REMOTE_DRAM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_R= ETIRED.L1_MISS)) + 10 * (MEM_LOAD_L3_MISS_RETIRED.LOCAL_DRAM * (1 + MEM_LOA= D_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + MEM_LOAD_L3_MISS_RETIRED.REM= OTE_FWD * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + MEM_LO= AD_L3_MISS_RETIRED.REMOTE_HITM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RE= TIRED.L1_MISS)) + (25 * (MEM_LOAD_RETIRED.LOCAL_PMM * (1 + MEM_LOAD_RETIRED= .FB_HIT / MEM_LOAD_RETIRED.L1_MISS) if #has_pmem > 0 else 0) + 33 * (MEM_LO= AD_L3_MISS_RETIRED.REMOTE_PMM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RET= IRED.L1_MISS) if #has_pmem > 0 else 0))) if #has_pmem > 0 else 1)) * (MEMOR= Y_ACTIVITY.STALLS_L3_MISS / tma_info_thread_clks) if 1e6 * (MEM_LOAD_L3_MIS= S_RETIRED.REMOTE_PMM + MEM_LOAD_RETIRED.LOCAL_PMM) > MEM_LOAD_RETIRED.L1_MI= SS else 0) if #has_pmem > 0 else 0)", + "MetricGroup": "MemoryBound;Server;TmaL3mem;TopdownL3;tma_L3_group= ;tma_memory_bound_group", + "MetricName": "tma_cxl_mem_bound", + "MetricThreshold": "tma_cxl_mem_bound > 0.1 & (tma_memory_bound > = 0.2 & tma_backend_bound > 0.2)", + "PublicDescription": "This metric roughly estimates (based on idle= latencies) how often the CPU was stalled on accesses to external CXL Memor= y by loads (e.g. 3D-Xpoint (Crystal Ridge, a.k.a. IXP) memory, PMM - Persis= tent Memory Module [from CLX to SPR] or any other CXL Type3 Memory [EMR onw= ards]).", + "ScaleUnit": "100%" + }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to data-sharing ac= cesses", "MetricConstraint": "NO_GROUP_EVENTS", @@ -660,7 +680,7 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled on accesses to external memory (DRAM) by loads", - "MetricExpr": "MEMORY_ACTIVITY.STALLS_L3_MISS / tma_info_thread_cl= ks", + "MetricExpr": "(MEMORY_ACTIVITY.STALLS_L3_MISS / tma_info_thread_c= lks - tma_cxl_mem_bound if #has_pmem > 0 else MEMORY_ACTIVITY.STALLS_L3_MIS= S / tma_info_thread_clks)", "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_me= mory_bound_group", "MetricName": "tma_dram_bound", "MetricThreshold": "tma_dram_bound > 0.1 & (tma_memory_bound > 0.2= & tma_backend_bound > 0.2)", @@ -718,7 +738,7 @@ "MetricGroup": "BvMB;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;t= ma_issueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", - "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_cache_memory_bandwidth, = tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_late= ncy, tma_streaming_stores", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_data_cache_memory_bandwi= dth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store= _latency, tma_streaming_stores", "ScaleUnit": "100%" }, { @@ -846,7 +866,7 @@ { "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring heavy-weight operations -- instructions that require= two or more uops or micro-coded sequences", "DefaultMetricgroupName": "TopdownL2", - "MetricExpr": "topdown\\-heavy\\-ops / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_in= fo_thread_slots", + "MetricExpr": "topdown\\-heavy\\-ops / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "Default;Retire;TmaL2;TopdownL2;tma_L2_group;tma_re= tiring_group", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", @@ -1357,19 +1377,19 @@ { "BriefDescription": "Off-core accesses per kilo instruction for mo= dified write requests", "MetricExpr": "1e3 * OCR.MODIFIED_WRITE.ANY_RESPONSE / tma_info_in= st_mix_instructions", - "MetricGroup": "Offcore", + "MetricGroup": "Offcore;Server", "MetricName": "tma_info_memory_mix_offcore_mwrite_any_pki" }, { "BriefDescription": "Off-core accesses per kilo instruction for re= ads-to-core requests (speculative; including in-core HW prefetches)", "MetricExpr": "1e3 * OCR.READS_TO_CORE.ANY_RESPONSE / tma_info_ins= t_mix_instructions", - "MetricGroup": "CacheHits;Offcore", + "MetricGroup": "CacheHits;Offcore;Server", "MetricName": "tma_info_memory_mix_offcore_read_any_pki" }, { "BriefDescription": "L3 cache misses per kilo instruction for read= s-to-core requests (speculative; including in-core HW prefetches)", "MetricExpr": "1e3 * OCR.READS_TO_CORE.L3_MISS / tma_info_inst_mix= _instructions", - "MetricGroup": "Offcore", + "MetricGroup": "Offcore;Server", "MetricName": "tma_info_memory_mix_offcore_read_l3m_pki" }, { @@ -1395,21 +1415,21 @@ { "BriefDescription": "Average DRAM BW for Reads-to-Core (R2C) cover= ing for memory attached to local- and remote-socket", "MetricExpr": "64 * OCR.READS_TO_CORE.DRAM / 1e9 / tma_info_system= _time", - "MetricGroup": "HPC;Mem;MemoryBW;SoC", + "MetricGroup": "HPC;Mem;MemoryBW;Offcore;Server", "MetricName": "tma_info_memory_soc_r2c_dram_bw", "PublicDescription": "Average DRAM BW for Reads-to-Core (R2C) cove= ring for memory attached to local- and remote-socket. See R2C_Offcore_BW." }, { "BriefDescription": "Average L3-cache miss BW for Reads-to-Core (R= 2C)", "MetricExpr": "64 * OCR.READS_TO_CORE.L3_MISS / 1e9 / tma_info_sys= tem_time", - "MetricGroup": "HPC;Mem;MemoryBW;SoC", + "MetricGroup": "HPC;Mem;MemoryBW;Offcore;Server", "MetricName": "tma_info_memory_soc_r2c_l3m_bw", "PublicDescription": "Average L3-cache miss BW for Reads-to-Core (= R2C). This covering going to DRAM or other memory off-chip memory tears. Se= e R2C_Offcore_BW." }, { "BriefDescription": "Average Off-core access BW for Reads-to-Core = (R2C)", "MetricExpr": "64 * OCR.READS_TO_CORE.ANY_RESPONSE / 1e9 / tma_inf= o_system_time", - "MetricGroup": "HPC;Mem;MemoryBW;SoC", + "MetricGroup": "HPC;Mem;MemoryBW;Offcore;Server", "MetricName": "tma_info_memory_soc_r2c_offcore_bw", "PublicDescription": "Average Off-core access BW for Reads-to-Core= (R2C). R2C account for demand or prefetch load/RFO/code access that fill d= ata into the Core caches." }, @@ -1439,7 +1459,7 @@ "MetricName": "tma_info_memory_tlb_store_stlb_mpki" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_G= E_1 / 2 if #SMT_on else cpu@UOPS_EXECUTED.THREAD\\,cmask\\=3D1@)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -1486,7 +1506,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -1498,16 +1518,28 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, + { + "BriefDescription": "Average 3DXP Memory Bandwidth Use for reads [= GB / sec]", + "MetricExpr": "(64 * UNC_M_PMM_RPQ_INSERTS / 1e9 / tma_info_system= _time if #has_pmem > 0 else 0)", + "MetricGroup": "MemOffcore;MemoryBW;Server;SoC", + "MetricName": "tma_info_system_cxl_mem_read_bw" + }, + { + "BriefDescription": "Average 3DXP Memory Bandwidth Use for Writes = [GB / sec]", + "MetricExpr": "(64 * UNC_M_PMM_WPQ_INSERTS / 1e9 / tma_info_system= _time if #has_pmem > 0 else 0)", + "MetricGroup": "MemOffcore;MemoryBW;Server;SoC", + "MetricName": "tma_info_system_cxl_mem_write_bw" + }, { "BriefDescription": "Average external Memory Bandwidth Use for rea= ds and writes [GB / sec]", "MetricExpr": "64 * (UNC_M_CAS_COUNT.RD + UNC_M_CAS_COUNT.WR) / 1e= 9 / tma_info_system_time", "MetricGroup": "HPC;MemOffcore;MemoryBW;SoC;tma_issueBW", "MetricName": "tma_info_system_dram_bw_use", - "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_cache_memory_ban= dwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_data_cache_memor= y_bandwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Giga Floating Point Operations Per Second", @@ -1571,9 +1603,15 @@ "MetricName": "tma_info_system_mem_parallel_reads", "PublicDescription": "Average number of parallel data read request= s to external memory. Accounts for demand loads and L1/L2 prefetches" }, + { + "BriefDescription": "Average latency of data read request to exter= nal 3D X-Point memory [in nanoseconds]", + "MetricExpr": "(1e9 * (UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_PMM / UNC= _CHA_TOR_INSERTS.IA_MISS_DRD_PMM) / uncore_cha_0@event\\=3D0x1@ if #has_pme= m > 0 else 0)", + "MetricGroup": "MemOffcore;MemoryLat;Server;SoC", + "MetricName": "tma_info_system_mem_pmm_read_latency", + "PublicDescription": "Average latency of data read request to exte= rnal 3D X-Point memory [in nanoseconds]. Accounts for demand loads and L1/L= 2 data-read prefetches" + }, { "BriefDescription": "Average latency of data read request to exter= nal memory (in nanoseconds)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "1e9 * (UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / UNC_CHA_= TOR_INSERTS.IA_MISS_DRD) / (tma_info_system_socket_clks / tma_info_system_t= ime)", "MetricGroup": "Mem;MemoryLat;SoC", "MetricName": "tma_info_system_mem_read_latency", @@ -1734,12 +1772,12 @@ "ScaleUnit": "100%" }, { - "BriefDescription": "This metric([SKL+] roughly; [LNL]) estimates = fraction of cycles with demand load accesses that hit the L1D cache", + "BriefDescription": "This metric ([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache", "MetricExpr": "min(2 * (MEM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETI= RED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CYCLE_ACTIVITY.CYCLE= S_MEM_ANY - MEMORY_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound= _group", "MetricName": "tma_l1_latency_dependency", "MetricThreshold": "tma_l1_latency_dependency > 0.1 & (tma_l1_boun= d > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache. The s= hort latency of the L1D cache may be exposed in pointer-chasing memory acce= ss patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", + "PublicDescription": "This metric ([SKL+] roughly; [LNL]) estimate= s fraction of cycles with demand load accesses that hit the L1D cache. The = short latency of the L1D cache may be exposed in pointer-chasing memory acc= ess patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", "ScaleUnit": "100%" }, { @@ -1753,7 +1791,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles wit= h demand load accesses that hit the L2 cache under unloaded scenarios (poss= ibly L2 latency limited)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "4.4 * tma_info_system_core_frequency * MEM_LOAD_RET= IRED.L2_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) = / tma_info_thread_clks", "MetricGroup": "MemoryLat;TopdownL4;tma_L4_group;tma_l2_bound_grou= p", "MetricName": "tma_l2_hit_latency", @@ -1772,12 +1809,11 @@ }, { "BriefDescription": "This metric estimates fraction of cycles with= demand load accesses that hit the L3 cache under unloaded scenarios (possi= bly L3 latency limited)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "32.6 * tma_info_system_core_frequency * (MEM_LOAD_R= ETIRED.L3_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2= )) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat= ;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_cache_memory_latency, tma_mem_latency", + "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_data_cache_memory_latency, tma_mem_latency", "ScaleUnit": "100%" }, { @@ -1860,6 +1896,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU spent handling cache misses due to lock operations", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(16 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS= .ALL_RFO) + MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES * (10= * L2_RQSTS.RFO_HIT + min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTAN= DING.CYCLES_WITH_DEMAND_RFO))) / tma_info_thread_clks", "MetricGroup": "LockCont;Offcore;TopdownL4;tma_L4_group;tma_issueR= FO;tma_l1_bound_group", "MetricName": "tma_lock_latency", @@ -1892,7 +1929,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_d= ram_bound_group;tma_issueBW", "MetricName": "tma_mem_bandwidth", "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_cache_memory_bandwidth,= tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_data_cache_memory_bandw= idth, tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", "ScaleUnit": "100%" }, { @@ -1901,13 +1938,13 @@ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_= dram_bound_group;tma_issueLat", "MetricName": "tma_mem_latency", "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_cache_memory_latency, tma_l3_hit_latency= ", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_data_cache_memory_latency, tma_l3_hit_la= tency", "ScaleUnit": "100%" }, { "BriefDescription": "This metric represents fraction of slots the = Memory subsystem within the Backend was a bottleneck", "DefaultMetricgroupName": "TopdownL2", - "MetricExpr": "topdown\\-mem\\-bound / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_in= fo_thread_slots", + "MetricExpr": "topdown\\-mem\\-bound / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "Backend;Default;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", @@ -1917,7 +1954,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to LFENCE Instructions.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * MISC2_RETIRED.LFENCE / tma_info_thread_clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_serializing_operation_g= roup", "MetricName": "tma_memory_fence", @@ -1970,7 +2006,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to the Microcode Sequencer (MS) unit = - see Microcode_Sequencer node for details.", - "MetricExpr": "max(IDQ.MS_CYCLES_ANY, cpu@UOPS_RETIRED.MS\\,cmask\= \=3D1@ / (UOPS_RETIRED.SLOTS / UOPS_ISSUED.ANY)) / tma_info_core_core_clks = / 2", + "MetricExpr": "max(IDQ.MS_CYCLES_ANY, cpu@UOPS_RETIRED.MS\\,cmask\= \=3D1@ / (UOPS_RETIRED.SLOTS / UOPS_ISSUED.ANY)) / tma_info_core_core_clks = / 2.4", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_fetch_bandwidt= h_group", "MetricName": "tma_ms", "MetricThreshold": "tma_ms > 0.05 & tma_fetch_bandwidth > 0.2", @@ -2005,6 +2041,7 @@ }, { "BriefDescription": "This metric represents the remaining light uo= ps fraction the CPU has executed - remaining means not covered by other sib= ling nodes", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_i= nt_operations + tma_memory_operations + tma_fused_instructions + tma_non_fu= sed_branches))", "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", "MetricName": "tma_other_light_ops", @@ -2066,6 +2103,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU performance was potentially limited due to Core computation issues (non= divider-related)", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "((tma_ports_utilized_0 * tma_info_thread_clks + (EX= E_ACTIVITY.1_PORTS_UTIL + tma_retiring * EXE_ACTIVITY.2_3_PORTS_UTIL)) / tm= a_info_thread_clks if ARITH.DIV_ACTIVE < CYCLE_ACTIVITY.STALLS_TOTAL - EXE_= ACTIVITY.BOUND_ON_LOADS else (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring * EX= E_ACTIVITY.2_3_PORTS_UTIL) / tma_info_thread_clks)", "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_gr= oup", "MetricName": "tma_ports_utilization", @@ -2075,6 +2113,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed no uops on any execution port (Logical Processor cycles since ICL= , Physical Core cycles otherwise)", + "MetricConstraint": "NO_THRESHOLD_AND_NMI", "MetricExpr": "(EXE_ACTIVITY.EXE_BOUND_0_PORTS + max(RS.EMPTY_RESO= URCE - RESOURCE_STALLS.SCOREBOARD, 0)) / tma_info_thread_clks * (CYCLE_ACTI= VITY.STALLS_TOTAL - EXE_ACTIVITY.BOUND_ON_LOADS) / tma_info_thread_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_0", @@ -2084,6 +2123,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", + "MetricConstraint": "NO_THRESHOLD_AND_NMI", "MetricExpr": "EXE_ACTIVITY.1_PORTS_UTIL / tma_info_thread_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_1", @@ -2093,7 +2133,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "EXE_ACTIVITY.2_PORTS_UTIL / tma_info_thread_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", @@ -2103,7 +2142,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 3 or more uops per cycle on all execution ports (Logical= Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "UOPS_EXECUTED.CYCLES_GE_3 / tma_info_thread_clks", "MetricGroup": "BvCB;PortsUtil;TopdownL4;tma_L4_group;tma_ports_ut= ilization_group", "MetricName": "tma_ports_utilized_3m", @@ -2132,7 +2170,7 @@ { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_= thread_slots", + "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "BvUW;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", @@ -2160,7 +2198,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to PAUSE Instructions", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "CPU_CLK_UNHALTED.PAUSE / tma_info_thread_clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_serializing_operation_g= roup", "MetricName": "tma_slow_pause", @@ -2192,7 +2229,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_i= ssueBW;tma_l3_bound_group", "MetricName": "tma_sq_full", "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, tma_m= em_bandwidth", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_data_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, = tma_mem_bandwidth", "ScaleUnit": "100%" }, { diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-memory.js= on b/tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-memory.json index 68be01dad7c9..90f61c9511fc 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-memory.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-memory.json @@ -2769,6 +2769,88 @@ "UMask": "0x3", "Unit": "iMC" }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.HIGH", + "Experimental": "1", + "PerPkg": "1", + "PublicDescription": "Number of DRAM Refreshes Issued : Counts the= number of refreshes issued.", + "UMask": "0x24", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.HIGH_ALL", + "Experimental": "1", + "PerPkg": "1", + "UMask": "0x24", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.HIGH_PCH0", + "Experimental": "1", + "PerPkg": "1", + "UMask": "0x4", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.HIGH_PCH1", + "Experimental": "1", + "PerPkg": "1", + "UMask": "0x20", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.PANIC", + "Experimental": "1", + "PerPkg": "1", + "PublicDescription": "Number of DRAM Refreshes Issued : Counts the= number of refreshes issued.", + "UMask": "0x12", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.PANIC_ALL", + "Experimental": "1", + "PerPkg": "1", + "UMask": "0x12", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.PANIC_PCH0", + "Experimental": "1", + "PerPkg": "1", + "UMask": "0x2", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of DRAM Refreshes Issued", + "Counter": "0,1,2,3", + "EventCode": "0x45", + "EventName": "UNC_M_DRAM_REFRESH.PANIC_PCH1", + "Experimental": "1", + "PerPkg": "1", + "UMask": "0x10", + "Unit": "iMC" + }, { "BriefDescription": "ECC Correctable Errors", "Counter": "0,1,2,3", diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/virtual-memory.j= son b/tools/perf/pmu-events/arch/x86/sapphirerapids/virtual-memory.json index 3d3f88600e26..609a9549cbf3 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/virtual-memory.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/virtual-memory.json @@ -4,7 +4,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.STLB_HIT", - "PublicDescription": "Counts loads that miss the DTLB (Data TLB) a= nd hit the STLB (Second level TLB). Available PDIST counters: 0", + "PublicDescription": "Counts loads that miss the DTLB (Data TLB) a= nd hit the STLB (Second level TLB).", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -14,7 +14,7 @@ "CounterMask": "1", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_ACTIVE", - "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a demand load. Available PDIST cou= nters: 0", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a demand load.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -23,7 +23,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED", - "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data loads. This implies it missed in the DTLB and furth= er levels of TLB. The page walk can end with or without a fault. Available = PDIST counters: 0", + "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data loads. This implies it missed in the DTLB and furth= er levels of TLB. The page walk can end with or without a fault.", "SampleAfterValue": "100003", "UMask": "0xe" }, @@ -32,7 +32,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_1G", - "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= . Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= .", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -41,7 +41,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_2M_4M", - "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data loads. This implies address translations missed in the= DTLB and further levels of TLB. The page walk can end with or without a fa= ult. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data loads. This implies address translations missed in the= DTLB and further levels of TLB. The page walk can end with or without a fa= ult.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -50,7 +50,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_4K", - "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= . Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data loads. This implies address translations missed in the DT= LB and further levels of TLB. The page walk can end with or without a fault= .", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -59,7 +59,7 @@ "Counter": "0,1,2,3", "EventCode": "0x12", "EventName": "DTLB_LOAD_MISSES.WALK_PENDING", - "PublicDescription": "Counts the number of page walks outstanding = for a demand load in the PMH (Page Miss Handler) each cycle. Available PDIS= T counters: 0", + "PublicDescription": "Counts the number of page walks outstanding = for a demand load in the PMH (Page Miss Handler) each cycle.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -68,7 +68,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.STLB_HIT", - "PublicDescription": "Counts stores that miss the DTLB (Data TLB) = and hit the STLB (2nd Level TLB). Available PDIST counters: 0", + "PublicDescription": "Counts stores that miss the DTLB (Data TLB) = and hit the STLB (2nd Level TLB).", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -78,7 +78,7 @@ "CounterMask": "1", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_ACTIVE", - "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a store. Available PDIST counters:= 0", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a store.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -87,7 +87,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED", - "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data stores. This implies it missed in the DTLB and furt= her levels of TLB. The page walk can end with or without a fault. Available= PDIST counters: 0", + "PublicDescription": "Counts completed page walks (all page sizes= ) caused by demand data stores. This implies it missed in the DTLB and furt= her levels of TLB. The page walk can end with or without a fault.", "SampleAfterValue": "100003", "UMask": "0xe" }, @@ -96,7 +96,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_1G", - "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t.", "SampleAfterValue": "100003", "UMask": "0x8" }, @@ -105,7 +105,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M", - "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data stores. This implies address translations missed in th= e DTLB and further levels of TLB. The page walk can end with or without a f= ault. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data stores. This implies address translations missed in th= e DTLB and further levels of TLB. The page walk can end with or without a f= ault.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -114,7 +114,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_4K", - "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t.", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -123,7 +123,7 @@ "Counter": "0,1,2,3", "EventCode": "0x13", "EventName": "DTLB_STORE_MISSES.WALK_PENDING", - "PublicDescription": "Counts the number of page walks outstanding = for a store in the PMH (Page Miss Handler) each cycle. Available PDIST coun= ters: 0", + "PublicDescription": "Counts the number of page walks outstanding = for a store in the PMH (Page Miss Handler) each cycle.", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -132,7 +132,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.STLB_HIT", - "PublicDescription": "Counts instruction fetch requests that miss = the ITLB (Instruction TLB) and hit the STLB (Second-level TLB). Available P= DIST counters: 0", + "PublicDescription": "Counts instruction fetch requests that miss = the ITLB (Instruction TLB) and hit the STLB (Second-level TLB).", "SampleAfterValue": "100003", "UMask": "0x20" }, @@ -142,7 +142,7 @@ "CounterMask": "1", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_ACTIVE", - "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a code (instruction fetch) request= . Available PDIST counters: 0", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a code (instruction fetch) request= .", "SampleAfterValue": "100003", "UMask": "0x10" }, @@ -151,7 +151,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_COMPLETED", - "PublicDescription": "Counts completed page walks (all page sizes)= caused by a code fetch. This implies it missed in the ITLB (Instruction TL= B) and further levels of TLB. The page walk can end with or without a fault= . Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (all page sizes)= caused by a code fetch. This implies it missed in the ITLB (Instruction TL= B) and further levels of TLB. The page walk can end with or without a fault= .", "SampleAfterValue": "100003", "UMask": "0xe" }, @@ -160,7 +160,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_COMPLETED_2M_4M", - "PublicDescription": "Counts completed page walks (2M/4M page size= s) caused by a code fetch. This implies it missed in the ITLB (Instruction = TLB) and further levels of TLB. The page walk can end with or without a fau= lt. Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (2M/4M page size= s) caused by a code fetch. This implies it missed in the ITLB (Instruction = TLB) and further levels of TLB. The page walk can end with or without a fau= lt.", "SampleAfterValue": "100003", "UMask": "0x4" }, @@ -169,7 +169,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_COMPLETED_4K", - "PublicDescription": "Counts completed page walks (4K page sizes) = caused by a code fetch. This implies it missed in the ITLB (Instruction TLB= ) and further levels of TLB. The page walk can end with or without a fault.= Available PDIST counters: 0", + "PublicDescription": "Counts completed page walks (4K page sizes) = caused by a code fetch. This implies it missed in the ITLB (Instruction TLB= ) and further levels of TLB. The page walk can end with or without a fault.= ", "SampleAfterValue": "100003", "UMask": "0x2" }, @@ -178,7 +178,7 @@ "Counter": "0,1,2,3", "EventCode": "0x11", "EventName": "ITLB_MISSES.WALK_PENDING", - "PublicDescription": "Counts the number of page walks outstanding = for an outstanding code (instruction fetch) request in the PMH (Page Miss H= andler) each cycle. Available PDIST counters: 0", + "PublicDescription": "Counts the number of page walks outstanding = for an outstanding code (instruction fetch) request in the PMH (Page Miss H= andler) each cycle.", "SampleAfterValue": "100003", "UMask": "0x10" } --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CBF6D225A24 for ; Sat, 19 Jul 2025 03:45:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896762; cv=none; b=rK1XmW1Invn5U+GLkseO2Tr1kBLrqOaHbv3x8ywe6q7EnbpYOeJrGAhUm0Pb8Zke9kO2b9MLes7x5PTX7zBCjIub8nlpd9PtZKJzX2btFuV2qEBXmQRx1Nl1m2spFzXvprLSTDB0ISiZ8HvbSQ5IdS7H6ix5uY5rY4IXSfB8q38= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896762; c=relaxed/simple; bh=BImgXBhe8gz9mi2Hr7xZg6e2yMMe4mk6zYrtyywYwVc=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=J7R0k94HU4BoAOrG6ONbgrLmBFt/4HL8/x89V5bm7qXdffsGHWuUaS2fHqLG/hSgk11Ie2e1cs5p6SDf8A6axRcWZ9hrgNZPxrZTeMJkxcEb7S9su+nbZAa0LT8YdxArU6oqLNu2ouRfvEeJbRzSG1ojwsoeCH074sEBpS4Xf0U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=fdnRu2/y; arc=none smtp.client-ip=209.85.215.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="fdnRu2/y" Received: by mail-pg1-f201.google.com with SMTP id 41be03b00d2f7-b3642ab4429so1922145a12.0 for ; Fri, 18 Jul 2025 20:45:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896756; x=1753501556; darn=vger.kernel.org; h=to:from:subject:message-id:references:mime-version:in-reply-to:date :from:to:cc:subject:date:message-id:reply-to; bh=uwbCpczO77l0xiPSlOdCxGpiGz3+duVbPCBhW+nDRFA=; b=fdnRu2/yvmaNEHOLkuTzP9siuEjvZb5AVFv0766+GABeiFid7JGkuaL9obAeAFai95 xeoe+5gxmrFbH/CwpikupZPR/igxVlJqIUn3vJUvrc4x8wvUy7npIrDZC7tG2Z32SPDd jBOrIpX2ZSK1YUn7smMyUBYmGSYPe7/ffN6faeloja2342QXlxac7CSVvWF0uNuZlIHo Xdy3XB3rkAbQ9Gg1QSVdp3P6e0eMcD1w6GwRkrXThOVI9I8Xuh54SMfvBCbfSffOQfvx vrZvLYR7qjK/zRXEzQpKRtFsf8eY8IgEYeF0yFTLWWM+478J2/+vXh3xAuwn18RqsxKS Qk+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896756; x=1753501556; h=to:from:subject:message-id:references:mime-version:in-reply-to:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=uwbCpczO77l0xiPSlOdCxGpiGz3+duVbPCBhW+nDRFA=; b=M2g/90QNWLya1DxoKlEjaUpp+HqCkw8khf2TFfHqIKrMcRp4YvRjDYus8SzmQ0C0Ih PuWz7iLSEZRAXEZhkXD4SaHkZT9OYjA1zpXy/u7K0PU8nOv8u28DI57qDrVX4Cc7G2T3 KFMlEqFm3xttco0xStMDPv6bB4Eg+rS1eUoFgV3cztUwCHKqRhe3ge13RxGEHVrDLq8k CQsaGkRCF10a2yuPaCTwHe1S9hDZ9VIvP3Ch6R/sb77rtJtYOnFZIqYjBLRRaoq/7MWy d6PjlMHARBQQFwIGhXdYc4nHRH+DrQAbBzXyogXjHTrkW427ZdDmsK2GmvyFaCIFWt4u lByQ== X-Forwarded-Encrypted: i=1; AJvYcCWO5YEQnswlPdArcbR/doIQykd1DMliBqQmcRq7dQzUc5RB7jr618goQnBrfdGCPqmxYkhkQfW+PBVUt7E=@vger.kernel.org X-Gm-Message-State: AOJu0YzlmD5UfrBfNuhFIZ/yMaLYm9V/RQDOEXH6EdVMjPRGDRd8pvH3 ghT8+GGUQfocTjrzaNqZSeMKAu9OxnD6evQkP6fJljOaVQW47LSIPq5tuKFAGfL4iHk0sLy5dQD uarIUOdZ2TA== X-Google-Smtp-Source: AGHT+IG9zb3O4YDYA+FzRpT+cFBELwmC1bv45iR6dsLUNQDetbmplXcLS/ZYZ/XjTEEAd3LxKlhmIjMpuHwi X-Received: from pjbnc8.prod.google.com ([2002:a17:90b:37c8:b0:312:26c4:236]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:50ce:b0:312:f2ee:a895 with SMTP id 98e67ed59e1d1-31c9f45b0abmr17336572a91.31.1752896756497; Fri, 18 Jul 2025 20:45:56 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:13 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-18-irogers@google.com> Subject: [PATCH v1 17/19] perf vendor events: Update sierraforest metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update metrics from TMA 5.0 to 5.1. Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/sierraforest/srf-metrics.json | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/sierraforest/srf-metrics.json b= /tools/perf/pmu-events/arch/x86/sierraforest/srf-metrics.json index b9f3c611d87b..ca2c55917e55 100644 --- a/tools/perf/pmu-events/arch/x86/sierraforest/srf-metrics.json +++ b/tools/perf/pmu-events/arch/x86/sierraforest/srf-metrics.json @@ -1,56 +1,56 @@ [ { "BriefDescription": "C10 residency percent per package", - "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c10\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C10_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C1 residency percent per core", - "MetricExpr": "cstate_core@c1\\-residency@ / TSC", + "MetricExpr": "cstate_core@c1\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C1_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C8 residency percent per package", - "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c8\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C8_Pkg_Residency", "ScaleUnit": "100%" @@ -735,7 +735,7 @@ }, { "BriefDescription": "Average CPU Utilization", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricName": "tma_info_system_cpu_utilization" }, { @@ -747,7 +747,7 @@ }, { "BriefDescription": "Fraction of cycles spent in Kernel mode", - "MetricExpr": "cpu@CPU_CLK_UNHALTED.CORE_P@k / CPU_CLK_UNHALTED.CO= RE", + "MetricExpr": "CPU_CLK_UNHALTED.CORE_P:k / CPU_CLK_UNHALTED.CORE", "MetricGroup": "Summary", "MetricName": "tma_info_system_kernel_utilization" }, --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C361C22D7BF for ; Sat, 19 Jul 2025 03:45:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896769; cv=none; b=QeZ9fbiqUqB1JLwANq4Ew0oYcq/y678OObMoEGCOZmfRmCbhVK7t6eBHKcQKSQCsXf3QgPV7O6Jd28GpjZVlrI8n755cPnbLoePFcwtWfEOuxAtbh9ndnNN3JZe4xeuxh4J3thT4y9f5TWXQreLiBFuDZ5190aBuTyspivyVZDw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896769; c=relaxed/simple; bh=iRfMjavfKgAMM/LFOzjPKazoTt4axyMDnytQ/JBBQdA=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=q4DLlLxlnYbI+Nc4XmbIYvqUeLPa6Hub1RV/kE3QYf7Hy68k4NuZnw/yQYAwjhnLeBYHsJ9kLNrxEhrmEjOI/PR7MrcoMuA3maXBXeLVGvWhfqVNeYghrze8I+yFWP4FRchvc0dnfdKvvwtRva2Izv6J1KlJntwZwLW4ic1E5kM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=iXLSRo/8; arc=none smtp.client-ip=209.85.215.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="iXLSRo/8" Received: by mail-pg1-f202.google.com with SMTP id 41be03b00d2f7-b3928ad6176so2393456a12.3 for ; Fri, 18 Jul 2025 20:45:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896758; x=1753501558; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=4tTPs7OslFBOCapfTRkksUPcX2J22mOEa781rKSRuzI=; b=iXLSRo/87lWkJl0wbOJ0wwcb3ezMzfIE/3YW67rWf6SAFFHK4NcWwMon4dL5LGtim/ ALn22PEC9V6XUHixscyFuT5fXWn0/k9ftfq327ENleMgX7t3hAr+ot4GFNWXvDqb11BK gi1sWd1nq2ng2qGGleVyoic+8naKffqC1fhjlAurapbKCYaMIVqW9USw0chl8MBBQ4QS /H2LdrNosGBwFDK4nnLvyuE2fXLLelsiaUX573EluwSNSlwTkYri6E5UCPrirdP1Gq01 YfPbhLO5cT7vhnMuT93vp4ILomNMW46KMpBlyB7bzeymZQm2KGD7wSFpNqwEsY8LVmFv mTXA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896758; x=1753501558; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=4tTPs7OslFBOCapfTRkksUPcX2J22mOEa781rKSRuzI=; b=g4wV5ZkwR5huXfj/cpAyZLVXNzI7f0CZrWtuRWXk/YCUfJIeHxCmUSGXNl4GoMu3aa BIJTeS5ZJb3j/UaxZAmkSkJ3LOUYANALnQeWDIgiwGgZAf9ptgbp2qzZCahuu3t+NFcz RTwDPjxycbYYDJoAPkOMBjUGDEQn9Buz/M00sOXMu74Jo+I58s0H4C04yVL6DcpBRxkC lG1H5TKrV3N8OxwUJNBzmEnY1RfLL4m+37GgOL29UGfPoeaLODTEsEh+XbggmG35wMGs wGmBqb3RHHSARE3sujwg9/3GvDou+PKLZyywre+R5s/7SVCu7yx0I81LIOwpMA7Nt4hz Bw0g== X-Forwarded-Encrypted: i=1; AJvYcCU9AKWjNA+0SZ0xROMpOmcS4awhjiw9WefzZ3f19COhyEO80NGXb+UeXSeAy1FtBldVV/CkuLmEy9DfHiU=@vger.kernel.org X-Gm-Message-State: AOJu0YxgU1wYgTEp8KhyBWtwuETn2YbkK4sp6G3anNb2+SjCbACEYwH5 crskEgg5x32Te81Xpqp9uIi135VNqTIIsd/wEwqsDqhm6TO20fyLybvYCiMc95sXOvYIlloLOQi d2m7zNB0o/g== X-Google-Smtp-Source: AGHT+IGKBTcp27HmQUZttThwlBEEPF33GR6cg0X0cudxZ34sgw8rkU0/OdElBm/hHbHApbwGoOC/e3jzkHk/ X-Received: from pjbqo12.prod.google.com ([2002:a17:90b:3dcc:b0:315:b7f8:7ff]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:520a:b0:31c:15d9:8a8 with SMTP id 98e67ed59e1d1-31c9e6e9639mr17218425a91.1.1752896758033; Fri, 18 Jul 2025 20:45:58 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:14 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-19-irogers@google.com> Subject: [PATCH v1 18/19] perf vendor events: Update skylake metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update metrics from TMA 5.0 to 5.1. Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/skylake/skl-metrics.json | 101 ++++++++++++------ .../arch/x86/skylakex/skx-metrics.json | 101 ++++++++++++------ 2 files changed, 134 insertions(+), 68 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json b/tool= s/perf/pmu-events/arch/x86/skylake/skl-metrics.json index 2d3a037e88b5..707bd8790ed2 100644 --- a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json @@ -1,49 +1,49 @@ [ { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per core", - "MetricExpr": "cstate_core@c3\\-residency@ / TSC", + "MetricExpr": "cstate_core@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" @@ -80,6 +80,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution ports for ALU operations.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_0 + UOPS_DISPATCHED_PORT= .PORT_1 + UOPS_DISPATCHED_PORT.PORT_5 + UOPS_DISPATCHED_PORT.PORT_6) / tma_= info_thread_slots", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_alu_op_utilization", @@ -117,6 +118,7 @@ }, { "BriefDescription": "Total pipeline cost of instruction fetch rela= ted bottlenecks by large code footprint programs (i-side cache; TLB and BTB= misses)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_ic= ache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switch= es + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)", "MetricGroup": "BigFootprint;BvBC;Fed;Frontend;IcMiss;MemoryTLB", "MetricName": "tma_bottleneck_big_code", @@ -130,32 +132,36 @@ "MetricThreshold": "tma_bottleneck_branching_overhead > 5", "PublicDescription": "Total pipeline cost of instructions used for= program control-flow - a subset of the Retiring category in TMA. Examples = include function calls; loops and alignments. (A lower bound)" }, + { + "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", + "MetricGroup": "BvCB;Cor;tma_issueComp", + "MetricName": "tma_bottleneck_compute_bound_est", + "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", + "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " + }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Bandwidth related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load + tma_= fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + = tma_store_fwd_blk)))", "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", - "MetricName": "tma_bottleneck_cache_memory_bandwidth", - "MetricThreshold": "tma_bottleneck_cache_memory_bandwidth > 20", + "MetricName": "tma_bottleneck_data_cache_memory_bandwidth", + "MetricThreshold": "tma_bottleneck_data_cache_memory_bandwidth > 2= 0", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_= system_dram_bw_use, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Latency related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l= 1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_b= lk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + = tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_lock_latency / (tma_= 4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma= _lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * = (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_boun= d + tma_store_bound)) * (tma_split_loads / (tma_4k_aliasing + tma_dtlb_load= + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_l= oads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * = (tma_split_stores / (tma_dtlb_store + tma_false_sharing + tma_split_stores = + tma_store_latency)) + tma_memory_bound * (tma_store_bound / (tma_dram_bou= nd + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_= store_latency / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tm= a_store_latency)))", "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", - "MetricName": "tma_bottleneck_cache_memory_latency", - "MetricThreshold": "tma_bottleneck_cache_memory_latency > 20", + "MetricName": "tma_bottleneck_data_cache_memory_latency", + "MetricThreshold": "tma_bottleneck_data_cache_memory_latency > 20", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_= mem_latency" }, - { - "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", - "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", - "MetricGroup": "BvCB;Cor;tma_issueComp", - "MetricName": "tma_bottleneck_compute_bound_est", - "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", - "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " - }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks (when the front-end could not sustain operations = delivery to the back-end)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - tma_mi= crocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) *= (tma_assists / tma_microcode_sequencer) * tma_fetch_latency * (tma_ms_swit= ches + tma_branch_resteers * (tma_clears_resteers + tma_mispredicts_resteer= s * (10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_misp= redicts)) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_b= ranches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + t= ma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_bottleneck_big_code", "MetricGroup": "BvFB;Fed;FetchBW;Frontend", "MetricName": "tma_bottleneck_instruction_fetch_bw", @@ -163,6 +169,7 @@ }, { "BriefDescription": "Total pipeline cost of irregular execution (e= .g", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_microcode_sequencer / (tma_few_uops_inst= ructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequence= r) * tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_clea= rs_resteers + tma_mispredicts_resteers * (10 * tma_microcode_sequencer * tm= a_other_mispredicts / tma_branch_mispredicts)) / (tma_clears_resteers + tma= _mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma= _dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_swit= ches) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_m= ispredicts * tma_branch_mispredicts + tma_machine_clears * tma_other_nukes = / tma_other_nukes + tma_core_bound * (tma_serializing_operation + tma_core_= bound * RS_EVENTS.EMPTY_CYCLES / tma_info_thread_clks * tma_ports_utilized_= 0) / (tma_divider + tma_ports_utilization + tma_serializing_operation) + tm= a_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequence= r) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", "MetricName": "tma_bottleneck_irregular_overhead", @@ -171,6 +178,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_m= emory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + = tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tm= a_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + = tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound= / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store= _bound)) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_spli= t_stores + tma_store_latency)))", "MetricGroup": "BvMT;Mem;MemoryTLB;Offcore;tma_issueTLB", "MetricName": "tma_bottleneck_memory_data_tlbs", @@ -179,6 +187,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Synchronization= related bottlenecks (data transfers and coherency updates across processor= s)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_l3_bound / (tma_dram= _bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (t= ma_contested_accesses + tma_data_sharing) / (tma_contested_accesses + tma_d= ata_sharing + tma_l3_hit_latency + tma_sq_full) + tma_store_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * = tma_false_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_stores = + tma_store_latency - tma_store_latency)) + tma_machine_clears * (1 - tma_o= ther_nukes / tma_other_nukes))", "MetricGroup": "BvMS;LockCont;Mem;Offcore;tma_issueSyncxn", "MetricName": "tma_bottleneck_memory_synchronization", @@ -187,6 +196,7 @@ }, { "BriefDescription": "Total pipeline cost of Branch Misprediction r= elated bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (1 - 10 * tma_microcode_sequencer * tma_other= _mispredicts / tma_branch_mispredicts) * (tma_branch_mispredicts + tma_fetc= h_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switc= hes + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))", "MetricGroup": "Bad;BadSpec;BrMispredicts;BvMP;tma_issueBM", "MetricName": "tma_bottleneck_mispredictions", @@ -195,7 +205,8 @@ }, { "BriefDescription": "Total pipeline cost of remaining bottlenecks = in the back-end", - "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_cache_me= mory_bandwidth + tma_bottleneck_cache_memory_latency + tma_bottleneck_memor= y_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottleneck_comput= e_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_branching_= overhead + tma_bottleneck_useful_work)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_data_cac= he_memory_bandwidth + tma_bottleneck_data_cache_memory_latency + tma_bottle= neck_memory_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottlen= eck_compute_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_= branching_overhead + tma_bottleneck_useful_work)", "MetricGroup": "BvOB;Cor;Offcore", "MetricName": "tma_bottleneck_other_bottlenecks", "MetricThreshold": "tma_bottleneck_other_bottlenecks > 20", @@ -203,6 +214,7 @@ }, { "BriefDescription": "Total pipeline cost of \"useful operations\" = - the portion of Retiring category not covered by Branching_Overhead nor Ir= regular_Overhead.", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_retiring - (BR_INST_RETIRED.ALL_BRANCHES= + 2 * BR_INST_RETIRED.NEAR_CALL + INST_RETIRED.NOP) / tma_info_thread_slot= s - tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_se= quencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "BvUW;Ret", "MetricName": "tma_bottleneck_useful_work", @@ -230,6 +242,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU retired uops originated from CISC (complex instruction set computer) in= struction", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)", "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_gro= up", "MetricName": "tma_cisc", @@ -391,7 +404,7 @@ "MetricGroup": "BvMB;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;t= ma_issueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", - "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_cache_memory_bandwidth, = tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_late= ncy, tma_streaming_stores", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_data_cache_memory_bandwi= dth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store= _latency, tma_streaming_stores", "ScaleUnit": "100%" }, { @@ -454,7 +467,6 @@ }, { "BriefDescription": "This metric approximates arithmetic floating-= point (FP) vector uops fraction the CPU has retired aggregated across all v= ector widths", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "FP_ARITH_INST_RETIRED.VECTOR / UOPS_RETIRED.RETIRE_= SLOTS", "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_= group;tma_issue2P", "MetricName": "tma_fp_vector", @@ -520,6 +532,7 @@ }, { "BriefDescription": "Branch Misprediction Cost: Cycles representin= g fraction of TMA slots wasted per non-speculative branch misprediction (re= tired JEClear)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "tma_bottleneck_mispredictions * tma_info_thread_slo= ts / 4 / BR_MISP_RETIRED.ALL_BRANCHES / 100", "MetricGroup": "Bad;BrMispredicts;tma_issueBM", "MetricName": "tma_info_bad_spec_branch_misprediction_cost", @@ -555,6 +568,7 @@ }, { "BriefDescription": "Total pipeline cost of DSB (uop cache) hits -= subset of the Instruction_Fetch_BW Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_frontend_bound * (tma_fetch_bandwidth / = (tma_fetch_bandwidth + tma_fetch_latency)) * (tma_dsb / (tma_dsb + tma_mite= )))", "MetricGroup": "DSB;Fed;FetchBW;tma_issueFB", "MetricName": "tma_info_botlnk_l2_dsb_bandwidth", @@ -572,6 +586,7 @@ }, { "BriefDescription": "Total pipeline cost of Instruction Cache miss= es - subset of the Big_Code Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / (tma= _branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses += tma_lcp + tma_ms_switches))", "MetricGroup": "Fed;FetchLat;IcMiss;tma_issueFL", "MetricName": "tma_info_botlnk_l2_ic_misses", @@ -713,7 +728,6 @@ }, { "BriefDescription": "Instructions per FP Arithmetic instruction (l= ower number means higher occurrence rate)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.SCALAR + = FP_ARITH_INST_RETIRED.VECTOR)", "MetricGroup": "Flops;InsType", "MetricName": "tma_info_inst_mix_iparith", @@ -975,7 +989,7 @@ "MetricName": "tma_info_memory_tlb_store_stlb_mpki" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_G= E_1 / 2 if #SMT_on else cpu@UOPS_EXECUTED.THREAD\\,cmask\\=3D1@)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -992,6 +1006,12 @@ "MetricGroup": "Fed;FetchBW", "MetricName": "tma_info_pipeline_fetch_mite" }, + { + "BriefDescription": "Average number of uops fetched from MS per cy= cle", + "MetricExpr": "IDQ.MS_UOPS / cpu@IDQ.MS_UOPS\\,cmask\\=3D1@", + "MetricGroup": "Fed;FetchLat;MicroSeq", + "MetricName": "tma_info_pipeline_fetch_ms" + }, { "BriefDescription": "Instructions per a microcode Assist invocatio= n", "MetricExpr": "INST_RETIRED.ANY / (FP_ASSIST.ANY + OTHER_ASSISTS.A= NY)", @@ -1008,7 +1028,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -1020,7 +1040,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, @@ -1029,7 +1049,7 @@ "MetricExpr": "64 * (UNC_ARB_TRK_REQUESTS.ALL + UNC_ARB_COH_TRK_RE= QUESTS.ALL) / 1e6 / tma_info_system_time / 1e3", "MetricGroup": "HPC;MemOffcore;MemoryBW;SoC;tma_issueBW", "MetricName": "tma_info_system_dram_bw_use", - "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_cache_memory_ban= dwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_data_cache_memor= y_bandwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Giga Floating Point Operations Per Second", @@ -1175,12 +1195,13 @@ "ScaleUnit": "100%" }, { - "BriefDescription": "This metric([SKL+] roughly; [LNL]) estimates = fraction of cycles with demand load accesses that hit the L1D cache", + "BriefDescription": "This metric ([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "min(2 * (MEM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETI= RED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CYCLE_ACTIVITY.CYCLE= S_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound= _group", "MetricName": "tma_l1_latency_dependency", "MetricThreshold": "tma_l1_latency_dependency > 0.1 & (tma_l1_boun= d > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache. The s= hort latency of the L1D cache may be exposed in pointer-chasing memory acce= ss patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", + "PublicDescription": "This metric ([SKL+] roughly; [LNL]) estimate= s fraction of cycles with demand load accesses that hit the L1D cache. The = short latency of the L1D cache may be exposed in pointer-chasing memory acc= ess patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", "ScaleUnit": "100%" }, { @@ -1217,7 +1238,7 @@ "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat= ;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_cache_memory_latency, tma_mem_latency", + "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_data_cache_memory_latency, tma_mem_latency", "ScaleUnit": "100%" }, { @@ -1241,6 +1262,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port for Load operations", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_2 + UOPS_DISPATCHED_PORT= .PORT_3 + UOPS_DISPATCHED_PORT.PORT_7 - UOPS_DISPATCHED_PORT.PORT_4) / (2 *= tma_info_core_core_clks)", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_load_op_utilization", @@ -1267,6 +1289,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 1 GB pages for= data load accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_load_stlb_miss * DTLB_LOAD_MISSES.WALK_COMPLETE= D_1G / (DTLB_LOAD_MISSES.WALK_COMPLETED_4K + DTLB_LOAD_MISSES.WALK_COMPLETE= D_2M_4M + DTLB_LOAD_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_load_stlb_mis= s_group", "MetricName": "tma_load_stlb_miss_1g", @@ -1275,6 +1298,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 2 or 4 MB page= s for data load accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_load_stlb_miss * DTLB_LOAD_MISSES.WALK_COMPLETE= D_2M_4M / (DTLB_LOAD_MISSES.WALK_COMPLETED_4K + DTLB_LOAD_MISSES.WALK_COMPL= ETED_2M_4M + DTLB_LOAD_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_load_stlb_mis= s_group", "MetricName": "tma_load_stlb_miss_2m", @@ -1283,6 +1307,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 4 KB pages for= data load accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_load_stlb_miss * DTLB_LOAD_MISSES.WALK_COMPLETE= D_4K / (DTLB_LOAD_MISSES.WALK_COMPLETED_4K + DTLB_LOAD_MISSES.WALK_COMPLETE= D_2M_4M + DTLB_LOAD_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_load_stlb_mis= s_group", "MetricName": "tma_load_stlb_miss_4k", @@ -1291,6 +1316,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU spent handling cache misses due to lock operations", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(12 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS= .ALL_RFO) + MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES * (9 = * L2_RQSTS.RFO_HIT + min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTAND= ING.CYCLES_WITH_DEMAND_RFO))) / tma_info_thread_clks", "MetricGroup": "LockCont;Offcore;TopdownL4;tma_L4_group;tma_issueR= FO;tma_l1_bound_group", "MetricName": "tma_lock_latency", @@ -1315,7 +1341,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_d= ram_bound_group;tma_issueBW", "MetricName": "tma_mem_bandwidth", "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_cache_memory_bandwidth,= tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_data_cache_memory_bandw= idth, tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", "ScaleUnit": "100%" }, { @@ -1324,7 +1350,7 @@ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_= dram_bound_group;tma_issueLat", "MetricName": "tma_mem_latency", "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_cache_memory_latency, tma_l3_hit_latency= ", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_data_cache_memory_latency, tma_l3_hit_la= tency", "ScaleUnit": "100%" }, { @@ -1348,7 +1374,6 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU was retiring uops fetched by the Microcode Sequencer (MS) unit", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / UOPS_ISSUED.ANY * IDQ.M= S_UOPS / tma_info_thread_slots", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_heavy_operatio= ns_group;tma_issueMC;tma_issueMS", "MetricName": "tma_microcode_sequencer", @@ -1358,6 +1383,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to Branch Resteers as a result of Branch Misprediction= at execution stage", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL= _BRANCHES + MACHINE_CLEARS.COUNT) * INT_MISC.CLEAR_RESTEER_CYCLES / tma_inf= o_thread_clks", "MetricGroup": "BadSpec;BrMispredicts;BvMP;TopdownL4;tma_L4_group;= tma_branch_resteers_group;tma_issueBM", "MetricName": "tma_mispredicts_resteers", @@ -1412,6 +1438,7 @@ }, { "BriefDescription": "This metric represents the remaining light uo= ps fraction the CPU has executed - remaining means not covered by other sib= ling nodes", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_m= emory_operations + tma_fused_instructions + tma_non_fused_branches))", "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", "MetricName": "tma_other_light_ops", @@ -1421,6 +1448,7 @@ }, { "BriefDescription": "This metric estimates fraction of slots the C= PU was stalled due to other cases of misprediction (non-retired x86 branche= s or other types).", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(tma_branch_mispredicts * (1 - BR_MISP_RETIRED.A= LL_BRANCHES / (INT_MISC.CLEARS_COUNT - MACHINE_CLEARS.COUNT)), 0.0001)", "MetricGroup": "BrMispredicts;BvIO;TopdownL3;tma_L3_group;tma_bran= ch_mispredicts_group", "MetricName": "tma_other_mispredicts", @@ -1429,6 +1457,7 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Nukes (Machine Clears) not related to memory ordering= .", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(tma_machine_clears * (1 - MACHINE_CLEARS.MEMORY= _ORDERING / MACHINE_CLEARS.COUNT), 0.0001)", "MetricGroup": "BvIO;Machine_Clears;TopdownL3;tma_L3_group;tma_mac= hine_clears_group", "MetricName": "tma_other_nukes", @@ -1509,6 +1538,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU performance was potentially limited due to Core computation issues (non= divider-related)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "((tma_ports_utilized_0 * tma_info_thread_clks + (EX= E_ACTIVITY.1_PORTS_UTIL + tma_retiring * EXE_ACTIVITY.2_PORTS_UTIL)) / tma_= info_thread_clks if ARITH.DIVIDER_ACTIVE < CYCLE_ACTIVITY.STALLS_TOTAL - CY= CLE_ACTIVITY.STALLS_MEM_ANY else (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring = * EXE_ACTIVITY.2_PORTS_UTIL) / tma_info_thread_clks)", "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_gr= oup", "MetricName": "tma_ports_utilization", @@ -1595,7 +1625,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_i= ssueBW;tma_l3_bound_group", "MetricName": "tma_sq_full", "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, tma_m= em_bandwidth", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_data_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, = tma_mem_bandwidth", "ScaleUnit": "100%" }, { @@ -1652,6 +1682,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 1 GB pages for= data store accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_store_stlb_miss * DTLB_STORE_MISSES.WALK_COMPLE= TED_1G / (DTLB_STORE_MISSES.WALK_COMPLETED_4K + DTLB_STORE_MISSES.WALK_COMP= LETED_2M_4M + DTLB_STORE_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_store_stlb_mi= ss_group", "MetricName": "tma_store_stlb_miss_1g", @@ -1660,6 +1691,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 2 or 4 MB page= s for data store accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_store_stlb_miss * DTLB_STORE_MISSES.WALK_COMPLE= TED_2M_4M / (DTLB_STORE_MISSES.WALK_COMPLETED_4K + DTLB_STORE_MISSES.WALK_C= OMPLETED_2M_4M + DTLB_STORE_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_store_stlb_mi= ss_group", "MetricName": "tma_store_stlb_miss_2m", @@ -1668,6 +1700,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 4 KB pages for= data store accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_store_stlb_miss * DTLB_STORE_MISSES.WALK_COMPLE= TED_4K / (DTLB_STORE_MISSES.WALK_COMPLETED_4K + DTLB_STORE_MISSES.WALK_COMP= LETED_2M_4M + DTLB_STORE_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_store_stlb_mi= ss_group", "MetricName": "tma_store_stlb_miss_4k", diff --git a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json b/too= ls/perf/pmu-events/arch/x86/skylakex/skx-metrics.json index 7cc7b076c3e2..56c2caaafef4 100644 --- a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json @@ -1,49 +1,49 @@ [ { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per core", - "MetricExpr": "cstate_core@c3\\-residency@ / TSC", + "MetricExpr": "cstate_core@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" @@ -301,6 +301,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution ports for ALU operations.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_0 + UOPS_DISPATCHED_PORT= .PORT_1 + UOPS_DISPATCHED_PORT.PORT_5 + UOPS_DISPATCHED_PORT.PORT_6) / tma_= info_thread_slots", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_alu_op_utilization", @@ -338,6 +339,7 @@ }, { "BriefDescription": "Total pipeline cost of instruction fetch rela= ted bottlenecks by large code footprint programs (i-side cache; TLB and BTB= misses)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_ic= ache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switch= es + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)", "MetricGroup": "BigFootprint;BvBC;Fed;Frontend;IcMiss;MemoryTLB", "MetricName": "tma_bottleneck_big_code", @@ -351,32 +353,36 @@ "MetricThreshold": "tma_bottleneck_branching_overhead > 5", "PublicDescription": "Total pipeline cost of instructions used for= program control-flow - a subset of the Retiring category in TMA. Examples = include function calls; loops and alignments. (A lower bound)" }, + { + "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", + "MetricGroup": "BvCB;Cor;tma_issueComp", + "MetricName": "tma_bottleneck_compute_bound_est", + "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", + "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " + }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Bandwidth related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load + tma_= fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + = tma_store_fwd_blk)))", "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", - "MetricName": "tma_bottleneck_cache_memory_bandwidth", - "MetricThreshold": "tma_bottleneck_cache_memory_bandwidth > 20", + "MetricName": "tma_bottleneck_data_cache_memory_bandwidth", + "MetricThreshold": "tma_bottleneck_data_cache_memory_bandwidth > 2= 0", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_= system_dram_bw_use, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Latency related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l= 1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_b= lk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + = tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_lock_latency / (tma_= 4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma= _lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * = (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_boun= d + tma_store_bound)) * (tma_split_loads / (tma_4k_aliasing + tma_dtlb_load= + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_l= oads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * = (tma_split_stores / (tma_dtlb_store + tma_false_sharing + tma_split_stores = + tma_store_latency)) + tma_memory_bound * (tma_store_bound / (tma_dram_bou= nd + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_= store_latency / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tm= a_store_latency)))", "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", - "MetricName": "tma_bottleneck_cache_memory_latency", - "MetricThreshold": "tma_bottleneck_cache_memory_latency > 20", + "MetricName": "tma_bottleneck_data_cache_memory_latency", + "MetricThreshold": "tma_bottleneck_data_cache_memory_latency > 20", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_= mem_latency" }, - { - "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", - "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", - "MetricGroup": "BvCB;Cor;tma_issueComp", - "MetricName": "tma_bottleneck_compute_bound_est", - "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", - "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " - }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks (when the front-end could not sustain operations = delivery to the back-end)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - tma_mi= crocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) *= (tma_assists / tma_microcode_sequencer) * tma_fetch_latency * (tma_ms_swit= ches + tma_branch_resteers * (tma_clears_resteers + tma_mispredicts_resteer= s * (10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_misp= redicts)) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_b= ranches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + t= ma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_bottleneck_big_code", "MetricGroup": "BvFB;Fed;FetchBW;Frontend", "MetricName": "tma_bottleneck_instruction_fetch_bw", @@ -384,6 +390,7 @@ }, { "BriefDescription": "Total pipeline cost of irregular execution (e= .g", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_microcode_sequencer / (tma_few_uops_inst= ructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequence= r) * tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_clea= rs_resteers + tma_mispredicts_resteers * (10 * tma_microcode_sequencer * tm= a_other_mispredicts / tma_branch_mispredicts)) / (tma_clears_resteers + tma= _mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma= _dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_swit= ches) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_m= ispredicts * tma_branch_mispredicts + tma_machine_clears * tma_other_nukes = / tma_other_nukes + tma_core_bound * (tma_serializing_operation + tma_core_= bound * RS_EVENTS.EMPTY_CYCLES / tma_info_thread_clks * tma_ports_utilized_= 0) / (tma_divider + tma_ports_utilization + tma_serializing_operation) + tm= a_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequence= r) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", "MetricName": "tma_bottleneck_irregular_overhead", @@ -392,6 +399,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_m= emory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + = tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tm= a_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + = tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound= / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store= _bound)) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_spli= t_stores + tma_store_latency)))", "MetricGroup": "BvMT;Mem;MemoryTLB;Offcore;tma_issueTLB", "MetricName": "tma_bottleneck_memory_data_tlbs", @@ -400,6 +408,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Synchronization= related bottlenecks (data transfers and coherency updates across processor= s)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * = (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) * tma_remote_cach= e / (tma_local_mem + tma_remote_cache + tma_remote_mem) + tma_l3_bound / (t= ma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_boun= d) * (tma_contested_accesses + tma_data_sharing) / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full) + tma_store_bound / = (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bo= und) * tma_false_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_= stores + tma_store_latency - tma_store_latency)) + tma_machine_clears * (1 = - tma_other_nukes / tma_other_nukes))", "MetricGroup": "BvMS;LockCont;Mem;Offcore;tma_issueSyncxn", "MetricName": "tma_bottleneck_memory_synchronization", @@ -408,6 +417,7 @@ }, { "BriefDescription": "Total pipeline cost of Branch Misprediction r= elated bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (1 - 10 * tma_microcode_sequencer * tma_other= _mispredicts / tma_branch_mispredicts) * (tma_branch_mispredicts + tma_fetc= h_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switc= hes + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))", "MetricGroup": "Bad;BadSpec;BrMispredicts;BvMP;tma_issueBM", "MetricName": "tma_bottleneck_mispredictions", @@ -416,7 +426,8 @@ }, { "BriefDescription": "Total pipeline cost of remaining bottlenecks = in the back-end", - "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_cache_me= mory_bandwidth + tma_bottleneck_cache_memory_latency + tma_bottleneck_memor= y_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottleneck_comput= e_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_branching_= overhead + tma_bottleneck_useful_work)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_data_cac= he_memory_bandwidth + tma_bottleneck_data_cache_memory_latency + tma_bottle= neck_memory_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottlen= eck_compute_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_= branching_overhead + tma_bottleneck_useful_work)", "MetricGroup": "BvOB;Cor;Offcore", "MetricName": "tma_bottleneck_other_bottlenecks", "MetricThreshold": "tma_bottleneck_other_bottlenecks > 20", @@ -424,6 +435,7 @@ }, { "BriefDescription": "Total pipeline cost of \"useful operations\" = - the portion of Retiring category not covered by Branching_Overhead nor Ir= regular_Overhead.", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_retiring - (BR_INST_RETIRED.ALL_BRANCHES= + 2 * BR_INST_RETIRED.NEAR_CALL + INST_RETIRED.NOP) / tma_info_thread_slot= s - tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_se= quencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "BvUW;Ret", "MetricName": "tma_bottleneck_useful_work", @@ -451,6 +463,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU retired uops originated from CISC (complex instruction set computer) in= struction", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)", "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_gro= up", "MetricName": "tma_cisc", @@ -612,7 +625,7 @@ "MetricGroup": "BvMB;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;t= ma_issueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", - "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_cache_memory_bandwidth, = tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_late= ncy, tma_streaming_stores", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_data_cache_memory_bandwi= dth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store= _latency, tma_streaming_stores", "ScaleUnit": "100%" }, { @@ -675,7 +688,6 @@ }, { "BriefDescription": "This metric approximates arithmetic floating-= point (FP) vector uops fraction the CPU has retired aggregated across all v= ector widths", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umas= k\\=3D0xfc@ / UOPS_RETIRED.RETIRE_SLOTS", "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_= group;tma_issue2P", "MetricName": "tma_fp_vector", @@ -750,6 +762,7 @@ }, { "BriefDescription": "Branch Misprediction Cost: Cycles representin= g fraction of TMA slots wasted per non-speculative branch misprediction (re= tired JEClear)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "tma_bottleneck_mispredictions * tma_info_thread_slo= ts / 4 / BR_MISP_RETIRED.ALL_BRANCHES / 100", "MetricGroup": "Bad;BrMispredicts;tma_issueBM", "MetricName": "tma_info_bad_spec_branch_misprediction_cost", @@ -785,6 +798,7 @@ }, { "BriefDescription": "Total pipeline cost of DSB (uop cache) hits -= subset of the Instruction_Fetch_BW Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_frontend_bound * (tma_fetch_bandwidth / = (tma_fetch_bandwidth + tma_fetch_latency)) * (tma_dsb / (tma_dsb + tma_mite= )))", "MetricGroup": "DSB;Fed;FetchBW;tma_issueFB", "MetricName": "tma_info_botlnk_l2_dsb_bandwidth", @@ -802,6 +816,7 @@ }, { "BriefDescription": "Total pipeline cost of Instruction Cache miss= es - subset of the Big_Code Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / (tma= _branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses += tma_lcp + tma_ms_switches))", "MetricGroup": "Fed;FetchLat;IcMiss;tma_issueFL", "MetricName": "tma_info_botlnk_l2_ic_misses", @@ -943,7 +958,6 @@ }, { "BriefDescription": "Instructions per FP Arithmetic instruction (l= ower number means higher occurrence rate)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.SCALAR + = cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=3D0xfc@)", "MetricGroup": "Flops;InsType", "MetricName": "tma_info_inst_mix_iparith", @@ -1225,7 +1239,7 @@ "MetricName": "tma_info_memory_tlb_store_stlb_mpki" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_G= E_1 / 2 if #SMT_on else cpu@UOPS_EXECUTED.THREAD\\,cmask\\=3D1@)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -1242,6 +1256,12 @@ "MetricGroup": "Fed;FetchBW", "MetricName": "tma_info_pipeline_fetch_mite" }, + { + "BriefDescription": "Average number of uops fetched from MS per cy= cle", + "MetricExpr": "IDQ.MS_UOPS / cpu@IDQ.MS_UOPS\\,cmask\\=3D1@", + "MetricGroup": "Fed;FetchLat;MicroSeq", + "MetricName": "tma_info_pipeline_fetch_ms" + }, { "BriefDescription": "Instructions per a microcode Assist invocatio= n", "MetricExpr": "INST_RETIRED.ANY / (FP_ASSIST.ANY + OTHER_ASSISTS.A= NY)", @@ -1258,7 +1278,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -1270,7 +1290,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, @@ -1279,7 +1299,7 @@ "MetricExpr": "64 * (UNC_M_CAS_COUNT.RD + UNC_M_CAS_COUNT.WR) / 1e= 9 / tma_info_system_time", "MetricGroup": "HPC;MemOffcore;MemoryBW;SoC;tma_issueBW", "MetricName": "tma_info_system_dram_bw_use", - "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_cache_memory_ban= dwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_data_cache_memor= y_bandwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Giga Floating Point Operations Per Second", @@ -1475,12 +1495,13 @@ "ScaleUnit": "100%" }, { - "BriefDescription": "This metric([SKL+] roughly; [LNL]) estimates = fraction of cycles with demand load accesses that hit the L1D cache", + "BriefDescription": "This metric ([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "min(2 * (MEM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETI= RED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CYCLE_ACTIVITY.CYCLE= S_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound= _group", "MetricName": "tma_l1_latency_dependency", "MetricThreshold": "tma_l1_latency_dependency > 0.1 & (tma_l1_boun= d > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache. The s= hort latency of the L1D cache may be exposed in pointer-chasing memory acce= ss patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", + "PublicDescription": "This metric ([SKL+] roughly; [LNL]) estimate= s fraction of cycles with demand load accesses that hit the L1D cache. The = short latency of the L1D cache may be exposed in pointer-chasing memory acc= ess patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", "ScaleUnit": "100%" }, { @@ -1517,7 +1538,7 @@ "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat= ;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_cache_memory_latency, tma_mem_latency", + "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_data_cache_memory_latency, tma_mem_latency", "ScaleUnit": "100%" }, { @@ -1541,6 +1562,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port for Load operations", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_2 + UOPS_DISPATCHED_PORT= .PORT_3 + UOPS_DISPATCHED_PORT.PORT_7 - UOPS_DISPATCHED_PORT.PORT_4) / (2 *= tma_info_core_core_clks)", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_load_op_utilization", @@ -1567,6 +1589,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 1 GB pages for= data load accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_load_stlb_miss * DTLB_LOAD_MISSES.WALK_COMPLETE= D_1G / (DTLB_LOAD_MISSES.WALK_COMPLETED_4K + DTLB_LOAD_MISSES.WALK_COMPLETE= D_2M_4M + DTLB_LOAD_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_load_stlb_mis= s_group", "MetricName": "tma_load_stlb_miss_1g", @@ -1575,6 +1598,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 2 or 4 MB page= s for data load accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_load_stlb_miss * DTLB_LOAD_MISSES.WALK_COMPLETE= D_2M_4M / (DTLB_LOAD_MISSES.WALK_COMPLETED_4K + DTLB_LOAD_MISSES.WALK_COMPL= ETED_2M_4M + DTLB_LOAD_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_load_stlb_mis= s_group", "MetricName": "tma_load_stlb_miss_2m", @@ -1583,6 +1607,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 4 KB pages for= data load accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_load_stlb_miss * DTLB_LOAD_MISSES.WALK_COMPLETE= D_4K / (DTLB_LOAD_MISSES.WALK_COMPLETED_4K + DTLB_LOAD_MISSES.WALK_COMPLETE= D_2M_4M + DTLB_LOAD_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_load_stlb_mis= s_group", "MetricName": "tma_load_stlb_miss_4k", @@ -1600,6 +1625,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU spent handling cache misses due to lock operations", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(12 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS= .ALL_RFO) + MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES * (11= * L2_RQSTS.RFO_HIT + min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTAN= DING.CYCLES_WITH_DEMAND_RFO))) / tma_info_thread_clks", "MetricGroup": "LockCont;Offcore;TopdownL4;tma_L4_group;tma_issueR= FO;tma_l1_bound_group", "MetricName": "tma_lock_latency", @@ -1624,7 +1650,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_d= ram_bound_group;tma_issueBW", "MetricName": "tma_mem_bandwidth", "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_cache_memory_bandwidth,= tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_data_cache_memory_bandw= idth, tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", "ScaleUnit": "100%" }, { @@ -1633,7 +1659,7 @@ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_= dram_bound_group;tma_issueLat", "MetricName": "tma_mem_latency", "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_cache_memory_latency, tma_l3_hit_latency= ", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_data_cache_memory_latency, tma_l3_hit_la= tency", "ScaleUnit": "100%" }, { @@ -1657,7 +1683,6 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU was retiring uops fetched by the Microcode Sequencer (MS) unit", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / UOPS_ISSUED.ANY * IDQ.M= S_UOPS / tma_info_thread_slots", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_heavy_operatio= ns_group;tma_issueMC;tma_issueMS", "MetricName": "tma_microcode_sequencer", @@ -1667,6 +1692,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to Branch Resteers as a result of Branch Misprediction= at execution stage", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL= _BRANCHES + MACHINE_CLEARS.COUNT) * INT_MISC.CLEAR_RESTEER_CYCLES / tma_inf= o_thread_clks", "MetricGroup": "BadSpec;BrMispredicts;BvMP;TopdownL4;tma_L4_group;= tma_branch_resteers_group;tma_issueBM", "MetricName": "tma_mispredicts_resteers", @@ -1721,6 +1747,7 @@ }, { "BriefDescription": "This metric represents the remaining light uo= ps fraction the CPU has executed - remaining means not covered by other sib= ling nodes", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_m= emory_operations + tma_fused_instructions + tma_non_fused_branches))", "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", "MetricName": "tma_other_light_ops", @@ -1730,6 +1757,7 @@ }, { "BriefDescription": "This metric estimates fraction of slots the C= PU was stalled due to other cases of misprediction (non-retired x86 branche= s or other types).", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(tma_branch_mispredicts * (1 - BR_MISP_RETIRED.A= LL_BRANCHES / (INT_MISC.CLEARS_COUNT - MACHINE_CLEARS.COUNT)), 0.0001)", "MetricGroup": "BrMispredicts;BvIO;TopdownL3;tma_L3_group;tma_bran= ch_mispredicts_group", "MetricName": "tma_other_mispredicts", @@ -1738,6 +1766,7 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Nukes (Machine Clears) not related to memory ordering= .", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(tma_machine_clears * (1 - MACHINE_CLEARS.MEMORY= _ORDERING / MACHINE_CLEARS.COUNT), 0.0001)", "MetricGroup": "BvIO;Machine_Clears;TopdownL3;tma_L3_group;tma_mac= hine_clears_group", "MetricName": "tma_other_nukes", @@ -1818,6 +1847,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU performance was potentially limited due to Core computation issues (non= divider-related)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "((tma_ports_utilized_0 * tma_info_thread_clks + (EX= E_ACTIVITY.1_PORTS_UTIL + tma_retiring * EXE_ACTIVITY.2_PORTS_UTIL)) / tma_= info_thread_clks if ARITH.DIVIDER_ACTIVE < CYCLE_ACTIVITY.STALLS_TOTAL - CY= CLE_ACTIVITY.STALLS_MEM_ANY else (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring = * EXE_ACTIVITY.2_PORTS_UTIL) / tma_info_thread_clks)", "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_gr= oup", "MetricName": "tma_ports_utilization", @@ -1923,7 +1953,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_i= ssueBW;tma_l3_bound_group", "MetricName": "tma_sq_full", "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, tma_m= em_bandwidth", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_data_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, = tma_mem_bandwidth", "ScaleUnit": "100%" }, { @@ -1980,6 +2010,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 1 GB pages for= data store accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_store_stlb_miss * DTLB_STORE_MISSES.WALK_COMPLE= TED_1G / (DTLB_STORE_MISSES.WALK_COMPLETED_4K + DTLB_STORE_MISSES.WALK_COMP= LETED_2M_4M + DTLB_STORE_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_store_stlb_mi= ss_group", "MetricName": "tma_store_stlb_miss_1g", @@ -1988,6 +2019,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 2 or 4 MB page= s for data store accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_store_stlb_miss * DTLB_STORE_MISSES.WALK_COMPLE= TED_2M_4M / (DTLB_STORE_MISSES.WALK_COMPLETED_4K + DTLB_STORE_MISSES.WALK_C= OMPLETED_2M_4M + DTLB_STORE_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_store_stlb_mi= ss_group", "MetricName": "tma_store_stlb_miss_2m", @@ -1996,6 +2028,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = to walk the memory paging structures to cache translation of 4 KB pages for= data store accesses.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "tma_store_stlb_miss * DTLB_STORE_MISSES.WALK_COMPLE= TED_4K / (DTLB_STORE_MISSES.WALK_COMPLETED_4K + DTLB_STORE_MISSES.WALK_COMP= LETED_2M_4M + DTLB_STORE_MISSES.WALK_COMPLETED_1G)", "MetricGroup": "MemoryTLB;TopdownL6;tma_L6_group;tma_store_stlb_mi= ss_group", "MetricName": "tma_store_stlb_miss_4k", --=20 2.50.0.727.gbf7dc18ff4-goog From nobody Mon Oct 6 15:13:22 2025 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 98EB622DFBA for ; Sat, 19 Jul 2025 03:46:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896767; cv=none; b=SJqrbs1hPG4fxILwGpWpATNBUuZG4DASGPmKjHQf8nriVdHXX1Sz7gXG1SOV9H5x1BmwRu/V70+wHrxJDiYCrhUsJZFeuZrLmLGV1sLJlb/R8qO58d2v7/A5jELxIdaZ7ei3yyPcccYDv5rspy1tDw83Kin3dNqrxKrSeEijtgc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752896767; c=relaxed/simple; bh=RW3sE/zzUhGGaYDbDBgH0ab3y8iZWKEGmmQZpthh+FI=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=kPZzTNgPxf/SDKneEY3r7xPbz4n7b4q/gmzRPDoR538UnvqjXjKcXin3f91MME2iB3Ua8wwxpVYXh1xRhmRRqyJAkaO+9+9kE6klhdzQQBDWcnh90p0M8dAXVODKxhnnf/uqF8oln3Mudg8RwvU0OOW++gMSyETY/+L7m9y0SX8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=b85OfHfw; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="b85OfHfw" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-3132c8437ffso4546374a91.1 for ; Fri, 18 Jul 2025 20:46:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752896760; x=1753501560; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=KHcNoZMwTfMQ/aHVRsUg2K+VIGfO91d7ITnZVoJR130=; b=b85OfHfwnBVfA1VMsuFKWDbuLhu9XGZGeApnqfdX5TOXuaNNk5I2om7kNyFKDIccq0 KJwQN9u29DP+iG3rfaTUTY+Lt4gYKdErR94DPWkplaJRGOIhIVsBr1/xT72mmV89bTiE R4z+mplPuszGKMotSQCVBEo5yF3Tlat/ygQ46Z4JHOrgCB03m9bh1DQ96BiKOkyLh5Gd uT0LYIVI8edGiLVuCI8HKD7GQI3Hx2jDvdj8WtgWHNg+OvM/B3RH9f8LRyfpzb7xgwT/ ptatI9W4JOvys8keLY4Lp2jouGwtldua6NJTiS0SIis3J1S6ZtiUWdb7SSPLCqMmFQe2 88Mg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752896760; x=1753501560; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=KHcNoZMwTfMQ/aHVRsUg2K+VIGfO91d7ITnZVoJR130=; b=ksznGpthXjWaWsJUT+Lb+GVnSd1y0u+CEVWkCFi4QhTL2ajfhCJGXGMsVV/hDAZ7YM 9WRzPRNccj05BDBkFahxy18rVQPX60UGUter01Vm4EYYNZqCQjxoYAgAAa3LzhRSrNZ7 lcdCWlW8jORKOGZ1u8df1fBJ782/Cp0E1qH9Wrb4HSVP+aBN1h85J6zpm6LHY/WbmpTt 1PAM7tgVy78cRf77nsZOl1SQYN5VQK/eaCyRhxsp3HZUTlQNkF4qiB1jwCf9k0Cdhs1Q iILMdtLcXTw1AYSW9zRPV3uftReBBlKZ2gleTo43UCDeA0Sa4CiFd9+o5PDN0K20Laxn AphA== X-Forwarded-Encrypted: i=1; AJvYcCW8LmpBi5BBiiO/8smgzbefhHpWlkbd1QFjED9aOrkOCNFUh75kuYI//gfEzwmVv4zXH9Y2+qXRSrkZU1U=@vger.kernel.org X-Gm-Message-State: AOJu0Yz2i1Ixe0lb5gZNumEhBRsRd/c+YoT3m2Du9zp7M2deF85+hdd/ sqvDnlrvy8Fp9ZCv0+9K/kmBuCTvkk20RCAmoldZ5+YP03AmfTYGHYb3/pLh1+8znSuwjs1AmZt +xUPGQZK2Jg== X-Google-Smtp-Source: AGHT+IFVfIwfnpP1lgOsCyFGDf+rrVS8i5c+LGA1Zeuqz1SGxm+eBI4aKVcH6fCtwLPrnxvLbTx3tUhdbX1H X-Received: from pji6.prod.google.com ([2002:a17:90b:3fc6:b0:313:2213:1f54]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:2683:b0:312:959:dc4d with SMTP id 98e67ed59e1d1-31c9f3ee9e7mr17487572a91.7.1752896759805; Fri, 18 Jul 2025 20:45:59 -0700 (PDT) Date: Fri, 18 Jul 2025 20:45:15 -0700 In-Reply-To: <20250719034515.2000467-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250719034515.2000467-1-irogers@google.com> X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog Message-ID: <20250719034515.2000467-20-irogers@google.com> Subject: [PATCH v1 19/19] perf vendor events: Update tigerlake metrics From: Ian Rogers To: Thomas Falcon , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update metrics from TMA 5.0 to 5.1. Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/tigerlake/tgl-metrics.json | 97 +++++++++++-------- 1 file changed, 56 insertions(+), 41 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json b/to= ols/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json index 2db7a70f7a07..908da985c594 100644 --- a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json @@ -1,63 +1,63 @@ [ { "BriefDescription": "C10 residency percent per package", - "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c10\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C10_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C8 residency percent per package", - "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c8\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C8_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C9 residency percent per package", - "MetricExpr": "cstate_pkg@c9\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c9\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C9_Pkg_Residency", "ScaleUnit": "100%" @@ -85,7 +85,6 @@ }, { "BriefDescription": "This metric estimates how often memory load a= ccesses were aliased by preceding stores (in program order) with a 4K addre= ss offset", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS / tma_info_thread_c= lks", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_4k_aliasing", @@ -134,6 +133,7 @@ }, { "BriefDescription": "Total pipeline cost of instruction fetch rela= ted bottlenecks by large code footprint programs (i-side cache; TLB and BTB= misses)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_ic= ache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switch= es + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)", "MetricGroup": "BigFootprint;BvBC;Fed;Frontend;IcMiss;MemoryTLB", "MetricName": "tma_bottleneck_big_code", @@ -147,40 +147,45 @@ "MetricThreshold": "tma_bottleneck_branching_overhead > 5", "PublicDescription": "Total pipeline cost of instructions used for= program control-flow - a subset of the Retiring category in TMA. Examples = include function calls; loops and alignments. (A lower bound)" }, + { + "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", + "MetricGroup": "BvCB;Cor;tma_issueComp", + "MetricName": "tma_bottleneck_compute_bound_est", + "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", + "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " + }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Bandwidth related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load + tma_= fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + = tma_store_fwd_blk)))", "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", - "MetricName": "tma_bottleneck_cache_memory_bandwidth", - "MetricThreshold": "tma_bottleneck_cache_memory_bandwidth > 20", + "MetricName": "tma_bottleneck_data_cache_memory_bandwidth", + "MetricThreshold": "tma_bottleneck_data_cache_memory_bandwidth > 2= 0", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_= system_dram_bw_use, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Latency related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l= 1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_b= lk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + = tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_lock_latency / (tma_= 4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma= _lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * = (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_boun= d + tma_store_bound)) * (tma_split_loads / (tma_4k_aliasing + tma_dtlb_load= + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_l= oads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * = (tma_split_stores / (tma_dtlb_store + tma_false_sharing + tma_split_stores = + tma_store_latency + tma_streaming_stores)) + tma_memory_bound * (tma_stor= e_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tm= a_store_bound)) * (tma_store_latency / (tma_dtlb_store + tma_false_sharing = + tma_split_stores + tma_store_latency + tma_streaming_stores)))", "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", - "MetricName": "tma_bottleneck_cache_memory_latency", - "MetricThreshold": "tma_bottleneck_cache_memory_latency > 20", + "MetricName": "tma_bottleneck_data_cache_memory_latency", + "MetricThreshold": "tma_bottleneck_data_cache_memory_latency > 20", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_= mem_latency" }, - { - "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", - "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", - "MetricGroup": "BvCB;Cor;tma_issueComp", - "MetricName": "tma_bottleneck_compute_bound_est", - "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", - "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " - }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks (when the front-end could not sustain operations = delivery to the back-end)", - "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - tma_mi= crocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) *= (tma_assists / tma_microcode_sequencer) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + 10 * tma_microcode_seq= uencer * tma_other_mispredicts / tma_branch_mispredicts * tma_mispredicts_r= esteers) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_br= anches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tm= a_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_ms /= (tma_dsb + tma_lsd + tma_mite + tma_ms))) - tma_bottleneck_big_code", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - tma_mi= crocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) *= (tma_assists / tma_microcode_sequencer) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + 10 * tma_microcode_seq= uencer * tma_other_mispredicts / tma_branch_mispredicts * tma_mispredicts_r= esteers) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_br= anches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tm= a_itlb_misses + tma_lcp + tma_ms_switches) + tma_ms)) - tma_bottleneck_big_= code", "MetricGroup": "BvFB;Fed;FetchBW;Frontend", "MetricName": "tma_bottleneck_instruction_fetch_bw", "MetricThreshold": "tma_bottleneck_instruction_fetch_bw > 20" }, { "BriefDescription": "Total pipeline cost of irregular execution (e= .g", - "MetricExpr": "100 * (tma_microcode_sequencer / (tma_few_uops_inst= ructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequence= r) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cle= ars_resteers + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_b= ranch_mispredicts * tma_mispredicts_resteers) / (tma_clears_resteers + tma_= mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_= dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switc= hes) + tma_fetch_bandwidth * tma_ms / (tma_dsb + tma_lsd + tma_mite + tma_m= s)) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mis= predicts * tma_branch_mispredicts + tma_machine_clears * tma_other_nukes / = tma_other_nukes + tma_core_bound * (tma_serializing_operation + tma_core_bo= und * RS_EVENTS.EMPTY_CYCLES / tma_info_thread_clks * tma_ports_utilized_0)= / (tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_= microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer)= * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_microcode_sequencer / (tma_few_uops_inst= ructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequence= r) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cle= ars_resteers + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_b= ranch_mispredicts * tma_mispredicts_resteers) / (tma_clears_resteers + tma_= mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_= dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switc= hes) + tma_ms) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma= _branch_mispredicts * tma_branch_mispredicts + tma_machine_clears * tma_oth= er_nukes / tma_other_nukes + tma_core_bound * (tma_serializing_operation + = tma_core_bound * RS_EVENTS.EMPTY_CYCLES / tma_info_thread_clks * tma_ports_= utilized_0) / (tma_divider + tma_ports_utilization + tma_serializing_operat= ion) + tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode= _sequencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operation= s)", "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", "MetricName": "tma_bottleneck_irregular_overhead", "MetricThreshold": "tma_bottleneck_irregular_overhead > 10", @@ -188,6 +193,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_m= emory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + = tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tm= a_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + = tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound= / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store= _bound)) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_spli= t_stores + tma_store_latency + tma_streaming_stores)))", "MetricGroup": "BvMT;Mem;MemoryTLB;Offcore;tma_issueTLB", "MetricName": "tma_bottleneck_memory_data_tlbs", @@ -196,6 +202,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Synchronization= related bottlenecks (data transfers and coherency updates across processor= s)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_l3_bound / (tma_dram= _bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (t= ma_contested_accesses + tma_data_sharing) / (tma_contested_accesses + tma_d= ata_sharing + tma_l3_hit_latency + tma_sq_full) + tma_store_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * = tma_false_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_stores = + tma_store_latency + tma_streaming_stores - tma_store_latency)) + tma_mach= ine_clears * (1 - tma_other_nukes / tma_other_nukes))", "MetricGroup": "BvMS;LockCont;Mem;Offcore;tma_issueSyncxn", "MetricName": "tma_bottleneck_memory_synchronization", @@ -204,6 +211,7 @@ }, { "BriefDescription": "Total pipeline cost of Branch Misprediction r= elated bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (1 - 10 * tma_microcode_sequencer * tma_other= _mispredicts / tma_branch_mispredicts) * (tma_branch_mispredicts + tma_fetc= h_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switc= hes + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))", "MetricGroup": "Bad;BadSpec;BrMispredicts;BvMP;tma_issueBM", "MetricName": "tma_bottleneck_mispredictions", @@ -212,7 +220,8 @@ }, { "BriefDescription": "Total pipeline cost of remaining bottlenecks = in the back-end", - "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_cache_me= mory_bandwidth + tma_bottleneck_cache_memory_latency + tma_bottleneck_memor= y_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottleneck_comput= e_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_branching_= overhead + tma_bottleneck_useful_work)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_data_cac= he_memory_bandwidth + tma_bottleneck_data_cache_memory_latency + tma_bottle= neck_memory_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottlen= eck_compute_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_= branching_overhead + tma_bottleneck_useful_work)", "MetricGroup": "BvOB;Cor;Offcore", "MetricName": "tma_bottleneck_other_bottlenecks", "MetricThreshold": "tma_bottleneck_other_bottlenecks > 20", @@ -220,6 +229,7 @@ }, { "BriefDescription": "Total pipeline cost of \"useful operations\" = - the portion of Retiring category not covered by Branching_Overhead nor Ir= regular_Overhead.", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_retiring - (BR_INST_RETIRED.ALL_BRANCHES= + 2 * BR_INST_RETIRED.NEAR_CALL + INST_RETIRED.NOP) / tma_info_thread_slot= s - tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_se= quencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "BvUW;Ret", "MetricName": "tma_bottleneck_useful_work", @@ -427,7 +437,7 @@ "MetricGroup": "BvMB;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;t= ma_issueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", - "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_cache_memory_bandwidth, = tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_late= ncy, tma_streaming_stores", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_data_cache_memory_bandwi= dth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store= _latency, tma_streaming_stores", "ScaleUnit": "100%" }, { @@ -619,6 +629,7 @@ }, { "BriefDescription": "Total pipeline cost of DSB (uop cache) hits -= subset of the Instruction_Fetch_BW Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_frontend_bound * (tma_fetch_bandwidth / = (tma_fetch_bandwidth + tma_fetch_latency)) * (tma_dsb / (tma_dsb + tma_lsd = + tma_mite + tma_ms)))", "MetricGroup": "DSB;Fed;FetchBW;tma_issueFB", "MetricName": "tma_info_botlnk_l2_dsb_bandwidth", @@ -1074,7 +1085,7 @@ "MetricName": "tma_info_memory_tlb_store_stlb_mpki" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_G= E_1 / 2 if #SMT_on else cpu@UOPS_EXECUTED.THREAD\\,cmask\\=3D1@)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -1097,6 +1108,12 @@ "MetricGroup": "Fed;FetchBW", "MetricName": "tma_info_pipeline_fetch_mite" }, + { + "BriefDescription": "Average number of uops fetched from MS per cy= cle", + "MetricExpr": "IDQ.MS_UOPS / cpu@IDQ.MS_UOPS\\,cmask\\=3D1@", + "MetricGroup": "Fed;FetchLat;MicroSeq", + "MetricName": "tma_info_pipeline_fetch_ms" + }, { "BriefDescription": "Instructions per a microcode Assist invocatio= n", "MetricExpr": "INST_RETIRED.ANY / ASSISTS.ANY", @@ -1113,7 +1130,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -1125,7 +1142,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, @@ -1134,7 +1151,7 @@ "MetricExpr": "64 * (UNC_ARB_TRK_REQUESTS.ALL + UNC_ARB_COH_TRK_RE= QUESTS.ALL) / 1e6 / tma_info_system_time / 1e3", "MetricGroup": "HPC;MemOffcore;MemoryBW;SoC;tma_issueBW", "MetricName": "tma_info_system_dram_bw_use", - "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_cache_memory_ban= dwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_data_cache_memor= y_bandwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Giga Floating Point Operations Per Second", @@ -1165,6 +1182,7 @@ }, { "BriefDescription": "Average number of parallel data read requests= to external memory", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "UNC_ARB_DAT_OCCUPANCY.RD / UNC_ARB_DAT_OCCUPANCY.RD= @cmask\\=3D1@", "MetricGroup": "Mem;MemoryBW;SoC", "MetricName": "tma_info_system_mem_parallel_reads", @@ -1316,12 +1334,12 @@ "ScaleUnit": "100%" }, { - "BriefDescription": "This metric([SKL+] roughly; [LNL]) estimates = fraction of cycles with demand load accesses that hit the L1D cache", + "BriefDescription": "This metric ([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache", "MetricExpr": "min(2 * (MEM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETI= RED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CYCLE_ACTIVITY.CYCLE= S_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound= _group", "MetricName": "tma_l1_latency_dependency", "MetricThreshold": "tma_l1_latency_dependency > 0.1 & (tma_l1_boun= d > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache. The s= hort latency of the L1D cache may be exposed in pointer-chasing memory acce= ss patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", + "PublicDescription": "This metric ([SKL+] roughly; [LNL]) estimate= s fraction of cycles with demand load accesses that hit the L1D cache. The = short latency of the L1D cache may be exposed in pointer-chasing memory acc= ess patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", "ScaleUnit": "100%" }, { @@ -1345,7 +1363,6 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled due to loads accesses to L3 cache or contended with a sibling Core", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STA= LLS_L3_MISS) / tma_info_thread_clks", "MetricGroup": "CacheHits;MemoryBound;TmaL3mem;TopdownL3;tma_L3_gr= oup;tma_memory_bound_group", "MetricName": "tma_l3_bound", @@ -1359,7 +1376,7 @@ "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat= ;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_cache_memory_latency, tma_mem_latency", + "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_data_cache_memory_latency, tma_mem_latency", "ScaleUnit": "100%" }, { @@ -1465,7 +1482,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_d= ram_bound_group;tma_issueBW", "MetricName": "tma_mem_bandwidth", "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_cache_memory_bandwidth,= tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_data_cache_memory_bandw= idth, tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", "ScaleUnit": "100%" }, { @@ -1474,7 +1491,7 @@ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_= dram_bound_group;tma_issueLat", "MetricName": "tma_mem_latency", "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_cache_memory_latency, tma_l3_hit_latency= ", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_data_cache_memory_latency, tma_l3_hit_la= tency", "ScaleUnit": "100%" }, { @@ -1542,7 +1559,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to the Microcode Sequencer (MS) unit = - see Microcode_Sequencer node for details.", - "MetricExpr": "cpu@IDQ.MS_UOPS\\,cmask\\=3D1@ / tma_info_core_core= _clks / 2", + "MetricExpr": "cpu@IDQ.MS_UOPS\\,cmask\\=3D1@ / tma_info_core_core= _clks / 3.3", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_fetch_bandwidt= h_group", "MetricName": "tma_ms", "MetricThreshold": "tma_ms > 0.05 & tma_fetch_bandwidth > 0.2", @@ -1676,7 +1693,7 @@ { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_= thread_slots", + "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "BvUW;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", @@ -1713,7 +1730,6 @@ }, { "BriefDescription": "This metric represents rate of split store ac= cesses", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bou= nd_group", "MetricName": "tma_split_stores", @@ -1727,7 +1743,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_i= ssueBW;tma_l3_bound_group", "MetricName": "tma_sq_full", "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, tma_m= em_bandwidth", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_data_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, = tma_mem_bandwidth", "ScaleUnit": "100%" }, { @@ -1741,7 +1757,6 @@ }, { "BriefDescription": "This metric roughly estimates fraction of cyc= les when the memory subsystem had loads blocked since they could not forwar= d data from earlier (in program order) overlapping stores", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * LD_BLOCKS.STORE_FORWARD / tma_info_thread_clks= ", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_store_fwd_blk", --=20 2.50.0.727.gbf7dc18ff4-goog