From nobody Sat Oct 4 09:39:48 2025 Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 858513451AA for ; Mon, 18 Aug 2025 19:06:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755543995; cv=none; b=eq51dcs4hmtwFyXCj63sB4f2rJ79VCoPn2zCxx1/KT8K/sj1cf/I9IBRAp1QxGECgeh01J3DHAxZyv5v3gloeIZ7IDBHF6EyZ7Ny/CGkoV7MFlQu24dq8a/HK3uVrXpciLjkxu65gdkIyVfsRTfe46ZNtAJIeXZBhIkzDUiQp/o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755543995; c=relaxed/simple; bh=UsXw9x29ZbNtGM96teVkQJv0iivIpANKZ4rBeHTFkLc=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Content-Type; b=mF+EP02LHbWqbQF0fuv3wCLNAW3MC1n0/MYMnZ+lXlkAc/ha8NOegMbmJWe1yAmJAE/k6VtEKZ4yNs+pP8FkXzT+qAvptVAzpCuslnJ32isPBHKJBG5EFy2uzY3tYsl3jxf59eT8WTBcFlkbwlZR/H/L2U9DuWdyiWpqalmGw58= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=TXGtJY4s; arc=none smtp.client-ip=209.85.214.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="TXGtJY4s" Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-2445823bc21so112559265ad.3 for ; Mon, 18 Aug 2025 12:06:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1755543991; x=1756148791; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=OadSD3Pb1UTaC9Y8QE9413VXg/3egXRdL1KX4X+y1NQ=; b=TXGtJY4s0oCLwNsC+O2ZFd7wPmSbaXaU+Vm1KIjIt8c2nxJEE0wX6jyQSxRDLqe4AR r5lHcZuEXfKuuQKDaLk+t//V+1nRAsFaV3Zshld8GgwDUTn76/t3lR60nJbLDzFFai// jDdZHD/wicAyAm1WW/lrbyxsBbZ5npxPUyAFWYBFo++d1TKM78OqIJ0/9f4ex0lHuc61 Ml8sbf77nziQ1E3+r2HvmlRactqWx8pjq/RcNzNu4rHUzV/P5tgcBC62ozekU93/XFD/ B9HuuGiI07rDoqrRSnEs2Q8ZrQO6kyTQ1GBOx23Obnzkljuhu9xGC7OyI4Fjp8ixjWFf T6Tw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1755543991; x=1756148791; h=content-transfer-encoding:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=OadSD3Pb1UTaC9Y8QE9413VXg/3egXRdL1KX4X+y1NQ=; b=rOkmD/k2VfsUFQP7B/jCV1QI3fNejoc+FPcB7hv37y6qXU7lkrhMCdkgECm3FZgi9A 7gJhiUu0LBv9DZoESGFU/NgDsd5i7OgCpLRew37QvZ8G9u1mgtevq7rnJf2ZP8EX0M53 9QOP/hJJIJJAvUr9uGFH/vvWQyZXiBiuHRS1N7uVu864Ahx8gKkx+R2QYXNXGBJCsBvV tL2lHi7oJqZRMbSykgQLiRC61FXRKOTnzDCLxpZzerAzC6PJDqf17kb9IN5cBGKRwXo9 Klc0UglHrUa9HYITk9Fu3CziMVPq+NYjW1h9jzLMrMfrfYn0rLLYBuN+mq/pQcB4gVDe /LNw== X-Forwarded-Encrypted: i=1; AJvYcCVXKlXmycm6r/F3LthVkDYXLEVQbiSmMJLI7ZvftB/61VurjLv3JWcSo26pggTGONZDu2dc+K3M2pjSY7w=@vger.kernel.org X-Gm-Message-State: AOJu0Yzg8Yj4zo0cjlWlDDGa3MUp+jSUwL5tUabXuu4WaRMuZYN2og4L Cq5lcTSI4MIGwVJXE8Uy1SJnY/Jp5Hn1oSg2aCQ71vfjNQy6lyEiaVvflzHLW4Dng87cFU/DwAg UgrkLZFHAOg== X-Google-Smtp-Source: AGHT+IFg33d5dO+ENVrLNt1kWtvSPQy3frFazMkPPlbi8VHrevR4RYgEp3gx9aTo45PTs5mFZwpAiEKJ/JxL X-Received: from plblh13.prod.google.com ([2002:a17:903:290d:b0:240:52d7:e88a]) (user=irogers job=prod-delivery.src-stubby-dispatcher) by 2002:a17:903:1249:b0:235:ec11:f0ee with SMTP id d9443c01a7336-2449cf92875mr5242365ad.14.1755543990751; Mon, 18 Aug 2025 12:06:30 -0700 (PDT) Date: Mon, 18 Aug 2025 12:04:16 -0700 In-Reply-To: <20250818190416.145274-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250818190416.145274-1-irogers@google.com> X-Mailer: git-send-email 2.51.0.rc1.167.g924127e9c0-goog Message-ID: <20250818190416.145274-21-irogers@google.com> Subject: [PATCH v2 20/20] perf vendor events: Update tigerlake metrics From: Ian Rogers To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Kan Liang , "=?UTF-8?q?Andreas=20F=C3=A4rber?=" , Manivannan Sadhasivam , Caleb Biggers , Thomas Falcon , Weilin Wang , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-actions@lists.infradead.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Update metrics from TMA 5.0 to 5.1. Signed-off-by: Ian Rogers Tested-by: Thomas Falcon --- .../arch/x86/tigerlake/tgl-metrics.json | 97 +++++++++++-------- 1 file changed, 56 insertions(+), 41 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json b/to= ols/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json index 2db7a70f7a07..908da985c594 100644 --- a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json @@ -1,63 +1,63 @@ [ { "BriefDescription": "C10 residency percent per package", - "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c10\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C10_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C2 residency percent per package", - "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c2\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C2_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C3 residency percent per package", - "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c3\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C3_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per core", - "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricExpr": "cstate_core@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C6 residency percent per package", - "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c6\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C6_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per core", - "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricExpr": "cstate_core@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Core_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C7 residency percent per package", - "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c7\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C7_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C8 residency percent per package", - "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c8\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C8_Pkg_Residency", "ScaleUnit": "100%" }, { "BriefDescription": "C9 residency percent per package", - "MetricExpr": "cstate_pkg@c9\\-residency@ / TSC", + "MetricExpr": "cstate_pkg@c9\\-residency@ / msr@tsc@", "MetricGroup": "Power", "MetricName": "C9_Pkg_Residency", "ScaleUnit": "100%" @@ -85,7 +85,6 @@ }, { "BriefDescription": "This metric estimates how often memory load a= ccesses were aliased by preceding stores (in program order) with a 4K addre= ss offset", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS / tma_info_thread_c= lks", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_4k_aliasing", @@ -134,6 +133,7 @@ }, { "BriefDescription": "Total pipeline cost of instruction fetch rela= ted bottlenecks by large code footprint programs (i-side cache; TLB and BTB= misses)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_ic= ache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switch= es + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)", "MetricGroup": "BigFootprint;BvBC;Fed;Frontend;IcMiss;MemoryTLB", "MetricName": "tma_bottleneck_big_code", @@ -147,40 +147,45 @@ "MetricThreshold": "tma_bottleneck_branching_overhead > 5", "PublicDescription": "Total pipeline cost of instructions used for= program control-flow - a subset of the Retiring category in TMA. Examples = include function calls; loops and alignments. (A lower bound)" }, + { + "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", + "MetricGroup": "BvCB;Cor;tma_issueComp", + "MetricName": "tma_bottleneck_compute_bound_est", + "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", + "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " + }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Bandwidth related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_b= ound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_= l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma= _data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tm= a_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load + tma_= fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_loads + = tma_store_fwd_blk)))", "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", - "MetricName": "tma_bottleneck_cache_memory_bandwidth", - "MetricThreshold": "tma_bottleneck_cache_memory_bandwidth > 20", + "MetricName": "tma_bottleneck_data_cache_memory_bandwidth", + "MetricThreshold": "tma_bottleneck_data_cache_memory_bandwidth > 2= 0", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_= system_dram_bw_use, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Total pipeline cost of external Memory- or Ca= che-Latency related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bou= nd * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3= _bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses = + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound = * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_store_bound) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound = + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_= latency_dependency / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l= 1_latency_dependency + tma_lock_latency + tma_split_loads + tma_store_fwd_b= lk)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + = tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_lock_latency / (tma_= 4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma= _lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * = (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_boun= d + tma_store_bound)) * (tma_split_loads / (tma_4k_aliasing + tma_dtlb_load= + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + tma_split_l= oads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * = (tma_split_stores / (tma_dtlb_store + tma_false_sharing + tma_split_stores = + tma_store_latency + tma_streaming_stores)) + tma_memory_bound * (tma_stor= e_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tm= a_store_bound)) * (tma_store_latency / (tma_dtlb_store + tma_false_sharing = + tma_split_stores + tma_store_latency + tma_streaming_stores)))", "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", - "MetricName": "tma_bottleneck_cache_memory_latency", - "MetricThreshold": "tma_bottleneck_cache_memory_latency > 20", + "MetricName": "tma_bottleneck_data_cache_memory_latency", + "MetricThreshold": "tma_bottleneck_data_cache_memory_latency > 20", "PublicDescription": "Total pipeline cost of external Memory- or C= ache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_= mem_latency" }, - { - "BriefDescription": "Total pipeline cost when the execution is com= pute-bound - an estimation", - "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider = + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tm= a_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializin= g_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_= utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", - "MetricGroup": "BvCB;Cor;tma_issueComp", - "MetricName": "tma_bottleneck_compute_bound_est", - "MetricThreshold": "tma_bottleneck_compute_bound_est > 20", - "PublicDescription": "Total pipeline cost when the execution is co= mpute-bound - an estimation. Covers Core Bound when High ILP as well as whe= n long-latency execution units are busy. Related metrics: " - }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks (when the front-end could not sustain operations = delivery to the back-end)", - "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - tma_mi= crocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) *= (tma_assists / tma_microcode_sequencer) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + 10 * tma_microcode_seq= uencer * tma_other_mispredicts / tma_branch_mispredicts * tma_mispredicts_r= esteers) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_br= anches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tm= a_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_ms /= (tma_dsb + tma_lsd + tma_mite + tma_ms))) - tma_bottleneck_big_code", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode= _sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_la= tency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches = + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - tma_mi= crocode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) *= (tma_assists / tma_microcode_sequencer) * (tma_fetch_latency * (tma_ms_swi= tches + tma_branch_resteers * (tma_clears_resteers + 10 * tma_microcode_seq= uencer * tma_other_mispredicts / tma_branch_mispredicts * tma_mispredicts_r= esteers) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_br= anches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tm= a_itlb_misses + tma_lcp + tma_ms_switches) + tma_ms)) - tma_bottleneck_big_= code", "MetricGroup": "BvFB;Fed;FetchBW;Frontend", "MetricName": "tma_bottleneck_instruction_fetch_bw", "MetricThreshold": "tma_bottleneck_instruction_fetch_bw > 20" }, { "BriefDescription": "Total pipeline cost of irregular execution (e= .g", - "MetricExpr": "100 * (tma_microcode_sequencer / (tma_few_uops_inst= ructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequence= r) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cle= ars_resteers + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_b= ranch_mispredicts * tma_mispredicts_resteers) / (tma_clears_resteers + tma_= mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_= dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switc= hes) + tma_fetch_bandwidth * tma_ms / (tma_dsb + tma_lsd + tma_mite + tma_m= s)) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mis= predicts * tma_branch_mispredicts + tma_machine_clears * tma_other_nukes / = tma_other_nukes + tma_core_bound * (tma_serializing_operation + tma_core_bo= und * RS_EVENTS.EMPTY_CYCLES / tma_info_thread_clks * tma_ports_utilized_0)= / (tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_= microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer)= * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_microcode_sequencer / (tma_few_uops_inst= ructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequence= r) * (tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_cle= ars_resteers + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_b= ranch_mispredicts * tma_mispredicts_resteers) / (tma_clears_resteers + tma_= mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_= dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switc= hes) + tma_ms) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma= _branch_mispredicts * tma_branch_mispredicts + tma_machine_clears * tma_oth= er_nukes / tma_other_nukes + tma_core_bound * (tma_serializing_operation + = tma_core_bound * RS_EVENTS.EMPTY_CYCLES / tma_info_thread_clks * tma_ports_= utilized_0) / (tma_divider + tma_ports_utilization + tma_serializing_operat= ion) + tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode= _sequencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operation= s)", "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", "MetricName": "tma_bottleneck_irregular_overhead", "MetricThreshold": "tma_bottleneck_irregular_overhead > 10", @@ -188,6 +193,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_m= emory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + = tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tm= a_dtlb_load + tma_fb_full + tma_l1_latency_dependency + tma_lock_latency + = tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound= / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store= _bound)) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_spli= t_stores + tma_store_latency + tma_streaming_stores)))", "MetricGroup": "BvMT;Mem;MemoryTLB;Offcore;tma_issueTLB", "MetricName": "tma_bottleneck_memory_data_tlbs", @@ -196,6 +202,7 @@ }, { "BriefDescription": "Total pipeline cost of Memory Synchronization= related bottlenecks (data transfers and coherency updates across processor= s)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_memory_bound * (tma_l3_bound / (tma_dram= _bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (t= ma_contested_accesses + tma_data_sharing) / (tma_contested_accesses + tma_d= ata_sharing + tma_l3_hit_latency + tma_sq_full) + tma_store_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * = tma_false_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_stores = + tma_store_latency + tma_streaming_stores - tma_store_latency)) + tma_mach= ine_clears * (1 - tma_other_nukes / tma_other_nukes))", "MetricGroup": "BvMS;LockCont;Mem;Offcore;tma_issueSyncxn", "MetricName": "tma_bottleneck_memory_synchronization", @@ -204,6 +211,7 @@ }, { "BriefDescription": "Total pipeline cost of Branch Misprediction r= elated bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (1 - 10 * tma_microcode_sequencer * tma_other= _mispredicts / tma_branch_mispredicts) * (tma_branch_mispredicts + tma_fetc= h_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switc= hes + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))", "MetricGroup": "Bad;BadSpec;BrMispredicts;BvMP;tma_issueBM", "MetricName": "tma_bottleneck_mispredictions", @@ -212,7 +220,8 @@ }, { "BriefDescription": "Total pipeline cost of remaining bottlenecks = in the back-end", - "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_cache_me= mory_bandwidth + tma_bottleneck_cache_memory_latency + tma_bottleneck_memor= y_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottleneck_comput= e_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_branching_= overhead + tma_bottleneck_useful_work)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_ins= truction_fetch_bw + tma_bottleneck_mispredictions + tma_bottleneck_data_cac= he_memory_bandwidth + tma_bottleneck_data_cache_memory_latency + tma_bottle= neck_memory_data_tlbs + tma_bottleneck_memory_synchronization + tma_bottlen= eck_compute_bound_est + tma_bottleneck_irregular_overhead + tma_bottleneck_= branching_overhead + tma_bottleneck_useful_work)", "MetricGroup": "BvOB;Cor;Offcore", "MetricName": "tma_bottleneck_other_bottlenecks", "MetricThreshold": "tma_bottleneck_other_bottlenecks > 20", @@ -220,6 +229,7 @@ }, { "BriefDescription": "Total pipeline cost of \"useful operations\" = - the portion of Retiring category not covered by Branching_Overhead nor Ir= regular_Overhead.", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_retiring - (BR_INST_RETIRED.ALL_BRANCHES= + 2 * BR_INST_RETIRED.NEAR_CALL + INST_RETIRED.NOP) / tma_info_thread_slot= s - tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_se= quencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", "MetricGroup": "BvUW;Ret", "MetricName": "tma_bottleneck_useful_work", @@ -427,7 +437,7 @@ "MetricGroup": "BvMB;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;t= ma_issueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", - "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_cache_memory_bandwidth, = tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_late= ncy, tma_streaming_stores", + "PublicDescription": "This metric does a *rough estimation* of how= often L1D Fill Buffer unavailability limited additional L1D miss memory ac= cess requests to proceed. The higher the metric value; the deeper the memor= y hierarchy level the misses are satisfied from (metric values >1 are valid= ). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or= external memory). Related metrics: tma_bottleneck_data_cache_memory_bandwi= dth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store= _latency, tma_streaming_stores", "ScaleUnit": "100%" }, { @@ -619,6 +629,7 @@ }, { "BriefDescription": "Total pipeline cost of DSB (uop cache) hits -= subset of the Instruction_Fetch_BW Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_frontend_bound * (tma_fetch_bandwidth / = (tma_fetch_bandwidth + tma_fetch_latency)) * (tma_dsb / (tma_dsb + tma_lsd = + tma_mite + tma_ms)))", "MetricGroup": "DSB;Fed;FetchBW;tma_issueFB", "MetricName": "tma_info_botlnk_l2_dsb_bandwidth", @@ -1074,7 +1085,7 @@ "MetricName": "tma_info_memory_tlb_store_stlb_mpki" }, { - "BriefDescription": "", + "BriefDescription": "Mem;Backend;CacheHits", "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_G= E_1 / 2 if #SMT_on else cpu@UOPS_EXECUTED.THREAD\\,cmask\\=3D1@)", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute" @@ -1097,6 +1108,12 @@ "MetricGroup": "Fed;FetchBW", "MetricName": "tma_info_pipeline_fetch_mite" }, + { + "BriefDescription": "Average number of uops fetched from MS per cy= cle", + "MetricExpr": "IDQ.MS_UOPS / cpu@IDQ.MS_UOPS\\,cmask\\=3D1@", + "MetricGroup": "Fed;FetchLat;MicroSeq", + "MetricName": "tma_info_pipeline_fetch_ms" + }, { "BriefDescription": "Instructions per a microcode Assist invocatio= n", "MetricExpr": "INST_RETIRED.ANY / ASSISTS.ANY", @@ -1113,7 +1130,7 @@ }, { "BriefDescription": "Measured Average Core Frequency for unhalted = processors [GHz]", - "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / tma= _info_system_time", + "MetricExpr": "tma_info_system_turbo_utilization * msr@tsc@ / 1e9 = / tma_info_system_time", "MetricGroup": "Power;Summary", "MetricName": "tma_info_system_core_frequency" }, @@ -1125,7 +1142,7 @@ }, { "BriefDescription": "Average number of utilized CPUs", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@", "MetricGroup": "Summary", "MetricName": "tma_info_system_cpus_utilized" }, @@ -1134,7 +1151,7 @@ "MetricExpr": "64 * (UNC_ARB_TRK_REQUESTS.ALL + UNC_ARB_COH_TRK_RE= QUESTS.ALL) / 1e6 / tma_info_system_time / 1e3", "MetricGroup": "HPC;MemOffcore;MemoryBW;SoC;tma_issueBW", "MetricName": "tma_info_system_dram_bw_use", - "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_cache_memory_ban= dwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" + "PublicDescription": "Average external Memory Bandwidth Use for re= ads and writes [GB / sec]. Related metrics: tma_bottleneck_data_cache_memor= y_bandwidth, tma_fb_full, tma_mem_bandwidth, tma_sq_full" }, { "BriefDescription": "Giga Floating Point Operations Per Second", @@ -1165,6 +1182,7 @@ }, { "BriefDescription": "Average number of parallel data read requests= to external memory", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "UNC_ARB_DAT_OCCUPANCY.RD / UNC_ARB_DAT_OCCUPANCY.RD= @cmask\\=3D1@", "MetricGroup": "Mem;MemoryBW;SoC", "MetricName": "tma_info_system_mem_parallel_reads", @@ -1316,12 +1334,12 @@ "ScaleUnit": "100%" }, { - "BriefDescription": "This metric([SKL+] roughly; [LNL]) estimates = fraction of cycles with demand load accesses that hit the L1D cache", + "BriefDescription": "This metric ([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache", "MetricExpr": "min(2 * (MEM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETI= RED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CYCLE_ACTIVITY.CYCLE= S_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound= _group", "MetricName": "tma_l1_latency_dependency", "MetricThreshold": "tma_l1_latency_dependency > 0.1 & (tma_l1_boun= d > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric([SKL+] roughly; [LNL]) estimates= fraction of cycles with demand load accesses that hit the L1D cache. The s= hort latency of the L1D cache may be exposed in pointer-chasing memory acce= ss patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", + "PublicDescription": "This metric ([SKL+] roughly; [LNL]) estimate= s fraction of cycles with demand load accesses that hit the L1D cache. The = short latency of the L1D cache may be exposed in pointer-chasing memory acc= ess patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", "ScaleUnit": "100%" }, { @@ -1345,7 +1363,6 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled due to loads accesses to L3 cache or contended with a sibling Core", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STA= LLS_L3_MISS) / tma_info_thread_clks", "MetricGroup": "CacheHits;MemoryBound;TmaL3mem;TopdownL3;tma_L3_gr= oup;tma_memory_bound_group", "MetricName": "tma_l3_bound", @@ -1359,7 +1376,7 @@ "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat= ;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.0= 5 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_cache_memory_latency, tma_mem_latency", + "PublicDescription": "This metric estimates fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3= hits) will improve the latency; reduce contention with sibling physical co= res and increase performance. Note the value of this node may overlap with= its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tm= a_bottleneck_data_cache_memory_latency, tma_mem_latency", "ScaleUnit": "100%" }, { @@ -1465,7 +1482,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_d= ram_bound_group;tma_issueBW", "MetricName": "tma_mem_bandwidth", "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.= 1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_cache_memory_bandwidth,= tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", + "PublicDescription": "This metric estimates fraction of cycles whe= re the core's performance was likely hurt due to approaching bandwidth limi= ts of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuris= tic assumes that a similar off-core traffic is generated by all IA cores. T= his metric does not aggregate non-data-read requests by this logical proces= sor; requests from other IA Logical Processors/Physical Cores/sockets; or o= ther non-IA devices like GPU; hence the maximum external memory bandwidth l= imits may or may not be approached when this metric is flagged (see Uncore = counters for that). Related metrics: tma_bottleneck_data_cache_memory_bandw= idth, tma_fb_full, tma_info_system_dram_bw_use, tma_sq_full", "ScaleUnit": "100%" }, { @@ -1474,7 +1491,7 @@ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_= dram_bound_group;tma_issueLat", "MetricName": "tma_mem_latency", "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 = & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_cache_memory_latency, tma_l3_hit_latency= ", + "PublicDescription": "This metric estimates fraction of cycles whe= re the performance was likely hurt due to latency from external memory - DR= AM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from ot= her Logical Processors/Physical Cores/sockets (see Uncore counters for that= ). Related metrics: tma_bottleneck_data_cache_memory_latency, tma_l3_hit_la= tency", "ScaleUnit": "100%" }, { @@ -1542,7 +1559,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s in which CPU was likely limited due to the Microcode Sequencer (MS) unit = - see Microcode_Sequencer node for details.", - "MetricExpr": "cpu@IDQ.MS_UOPS\\,cmask\\=3D1@ / tma_info_core_core= _clks / 2", + "MetricExpr": "cpu@IDQ.MS_UOPS\\,cmask\\=3D1@ / tma_info_core_core= _clks / 3.3", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_fetch_bandwidt= h_group", "MetricName": "tma_ms", "MetricThreshold": "tma_ms > 0.05 & tma_fetch_bandwidth > 0.2", @@ -1676,7 +1693,7 @@ { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_= thread_slots", + "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound)", "MetricGroup": "BvUW;Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", @@ -1713,7 +1730,6 @@ }, { "BriefDescription": "This metric represents rate of split store ac= cesses", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bou= nd_group", "MetricName": "tma_split_stores", @@ -1727,7 +1743,7 @@ "MetricGroup": "BvMB;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_i= ssueBW;tma_l3_bound_group", "MetricName": "tma_sq_full", "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tm= a_memory_bound > 0.2 & tma_backend_bound > 0.2))", - "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, tma_m= em_bandwidth", + "PublicDescription": "This metric measures fraction of cycles wher= e the Super Queue (SQ) was full taking into account all request-types and b= oth hardware SMT threads (Logical Processors). Related metrics: tma_bottlen= eck_data_cache_memory_bandwidth, tma_fb_full, tma_info_system_dram_bw_use, = tma_mem_bandwidth", "ScaleUnit": "100%" }, { @@ -1741,7 +1757,6 @@ }, { "BriefDescription": "This metric roughly estimates fraction of cyc= les when the memory subsystem had loads blocked since they could not forwar= d data from earlier (in program order) overlapping stores", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * LD_BLOCKS.STORE_FORWARD / tma_info_thread_clks= ", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_store_fwd_blk", --=20 2.51.0.rc1.167.g924127e9c0-goog