From nobody Fri Dec 26 15:29:44 2025 Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 84FD92D02A for ; Thu, 4 Jan 2024 23:19:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--irogers.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="lX3dLoaT" Received: by mail-yb1-f202.google.com with SMTP id 3f1490d57ef6-dbe9eeb4a60so1202526276.3 for ; Thu, 04 Jan 2024 15:19:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1704410348; x=1705015148; darn=vger.kernel.org; h=content-transfer-encoding:to:from:subject:mime-version:message-id :date:from:to:cc:subject:date:message-id:reply-to; bh=TPzuEKzRrL81jyZLVHH/kadM86g22bb5qzwLRzcADww=; b=lX3dLoaT8wUhFaGg0ggPiPuv1DpJusE+cKIn24F8cQWRT0oTPeAOBDA5jMMXIufMHp tHVPvznORnOU9DSQWp1SHRtkZxB09RkD/5yNGhd7Hd/SxAt2HAtcXvVlgvVFKrw33E5I zPgwpRgpUROAISEYeiF723cJbgppaYnVmO7dGdSEYBj5B0onWTbnXLhTo0loAorVdFJU 6ZWa1rOax7F5m1LJ5EDZCCpFgxQHyXw4TPVaCYwPhlbBbQqSg0sgUBbad4EbqQw0ZWwb 2KxaCep2FbTCRmt8PI7YTfmzHgmfkgSjkMTfi1qc9IogOy8FtpRfOSbliv1B4zfiGShK bOXQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1704410348; x=1705015148; h=content-transfer-encoding:to:from:subject:mime-version:message-id :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=TPzuEKzRrL81jyZLVHH/kadM86g22bb5qzwLRzcADww=; b=pi9PXA3MyNyntUg4N7uOPmNbKddH4JK5KJ2LsFDxDOSvoUq6I6S2sh9LvLCWHa4YEU t/UcdwFSX1Pu0waiCpvdg/D8DVh/p8fQDqpOwlJIdqUZ4NqWQ1eZQx2b4js4fstaijbF H+mtgOiRDLkSEU1OzrmAHOMSSrP2nTpyrDUw6iztUnshqQvVbIgi0/ZxbvXKvHjDLWG1 ashXkekS/DBCItYrcxd8rlzMx2eBcgVkkQD7Cagb0jhSBJV2O1GA5/mi413/MSCXVLiF UzhbigKMeAJcT+IQahBQBvCTa4sngB6Mkgr9yCIHUoQAACuLhRgU0Wa+XoTVx5BSiHmc wCTw== X-Gm-Message-State: AOJu0YzcKATUMVVOGFLHfc4JHOMyY6TTzyK+nUFlYwWKp2ft8OBpVPDP bDsTENd1l7Ab2cOP0CwiIYO4fn4nYSeW7UnJKCc= X-Google-Smtp-Source: AGHT+IEXZzt69hSSjJG846GhPE1tjpnhudfHoZA4qgFiL1IClk9Rc1sBiWrHIfrct/M12mLKVolM+Re6LSub X-Received: from irogers.svl.corp.google.com ([2620:15c:2a3:200:6a70:551e:bdf8:84f3]) (user=irogers job=sendgmr) by 2002:a25:6b01:0:b0:dbe:aab3:4371 with SMTP id g1-20020a256b01000000b00dbeaab34371mr40444ybc.8.1704410348569; Thu, 04 Jan 2024 15:19:08 -0800 (PST) Date: Thu, 4 Jan 2024 15:19:03 -0800 Message-Id: <20240104231903.775717-1-irogers@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog Subject: [PATCH v1] perf vendor events intel: Alderlake/sapphirerapids metric fixes From: Ian Rogers To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , Kan Liang , linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Edward Baker Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" As events are deduplicated by name, ensure PMU prefixes are always used in metrics. Previously they may be missed on the first event in a formula. Update metric constraints for architectures with topdown l2 events. Conversion script updated in: https://github.com/intel/perfmon/pull/128 Reported-by: Arnaldo Carvalho de Melo Closes: https://lore.kernel.org/lkml/ZZam-EG-UepcXtWw@kernel.org/ Signed-off-by: Ian Rogers Reviewed-by: Kan Liang --- .../arch/x86/alderlake/adl-metrics.json | 254 ++++++++---------- .../arch/x86/alderlaken/adln-metrics.json | 4 - .../arch/x86/sapphirerapids/spr-metrics.json | 25 +- 3 files changed, 123 insertions(+), 160 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json b/to= ols/perf/pmu-events/arch/x86/alderlake/adl-metrics.json index 35124a4ddcb2..bbfa3883e533 100644 --- a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json @@ -114,7 +114,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot consumed by the backend due to certain allocation restrictions.", - "MetricExpr": "TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS / tma_info_core= _slots", + "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS@ / tma= _info_core_slots", "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group", "MetricName": "tma_alloc_restriction", "MetricThreshold": "tma_alloc_restriction > 0.1", @@ -124,7 +124,7 @@ { "BriefDescription": "Counts the total number of issue slots that = were not consumed by the backend due to backend stalls", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "TOPDOWN_BE_BOUND.ALL / tma_info_core_slots", + "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.ALL@ / tma_info_core_slot= s", "MetricGroup": "Default;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.1", @@ -169,7 +169,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot delivered by the frontend due to BACLEARS, which occurs when the Branch = Target Buffer (BTB) prediction or lack thereof, was corrected by a later br= anch predictor in the frontend", - "MetricExpr": "TOPDOWN_FE_BOUND.BRANCH_DETECT / tma_info_core_slot= s", + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.BRANCH_DETECT@ / tma_info= _core_slots", "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_latency_group", "MetricName": "tma_branch_detect", "MetricThreshold": "tma_branch_detect > 0.05", @@ -179,7 +179,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot consumed by the backend due to branch mispredicts.", - "MetricExpr": "TOPDOWN_BAD_SPECULATION.MISPREDICT / tma_info_core_= slots", + "MetricExpr": "cpu_atom@TOPDOWN_BAD_SPECULATION.MISPREDICT@ / tma_= info_core_slots", "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.05", @@ -189,7 +189,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot delivered by the frontend due to BTCLEARS, which occurs when the Branch = Target Buffer (BTB) predicts a taken branch.", - "MetricExpr": "TOPDOWN_FE_BOUND.BRANCH_RESTEER / tma_info_core_slo= ts", + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.BRANCH_RESTEER@ / tma_inf= o_core_slots", "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_latency_group", "MetricName": "tma_branch_resteer", "MetricThreshold": "tma_branch_resteer > 0.05", @@ -198,7 +198,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot delivered by the frontend due to the microcode sequencer (MS).", - "MetricExpr": "TOPDOWN_FE_BOUND.CISC / tma_info_core_slots", + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.CISC@ / tma_info_core_slo= ts", "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_bandwidth_group", "MetricName": "tma_cisc", "MetricThreshold": "tma_cisc > 0.05", @@ -217,7 +217,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot delivered by the frontend due to decode stalls.", - "MetricExpr": "TOPDOWN_FE_BOUND.DECODE / tma_info_core_slots", + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.DECODE@ / tma_info_core_s= lots", "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_bandwidth_group", "MetricName": "tma_decode", "MetricThreshold": "tma_decode > 0.05", @@ -235,7 +235,6 @@ }, { "BriefDescription": "Counts the number of cycles the core is stall= ed due to a demand load miss which hit in DRAM or MMIO (Non-DRAM).", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_DRAM_HIT@ / tma_info= _core_clks - max((cpu_atom@MEM_BOUND_STALLS.LOAD@ - cpu_atom@LD_HEAD.L1_MIS= S_AT_RET@) / tma_info_core_clks, 0) * cpu_atom@MEM_BOUND_STALLS.LOAD_DRAM_H= IT@ / cpu_atom@MEM_BOUND_STALLS.LOAD@", "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group", "MetricName": "tma_dram_bound", @@ -245,7 +244,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot consumed by the backend due to a machine clear classified as a fast nuke= due to memory ordering, memory disambiguation and memory renaming.", - "MetricExpr": "TOPDOWN_BAD_SPECULATION.FASTNUKE / tma_info_core_sl= ots", + "MetricExpr": "cpu_atom@TOPDOWN_BAD_SPECULATION.FASTNUKE@ / tma_in= fo_core_slots", "MetricGroup": "TopdownL3;tma_L3_group;tma_machine_clears_group", "MetricName": "tma_fast_nuke", "MetricThreshold": "tma_fast_nuke > 0.05", @@ -254,7 +253,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot delivered by the frontend due to frontend bandwidth restrictions due to = decode, predecode, cisc, and other limitations.", - "MetricExpr": "TOPDOWN_FE_BOUND.FRONTEND_BANDWIDTH / tma_info_core= _slots", + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.FRONTEND_BANDWIDTH@ / tma= _info_core_slots", "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1", @@ -264,7 +263,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot delivered by the frontend due to frontend bandwidth restrictions due to = decode, predecode, cisc, and other limitations.", - "MetricExpr": "TOPDOWN_FE_BOUND.FRONTEND_LATENCY / tma_info_core_s= lots", + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.FRONTEND_LATENCY@ / tma_i= nfo_core_slots", "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.15", @@ -283,7 +282,7 @@ }, { "BriefDescription": "Counts the number of floating point divide op= erations per uop.", - "MetricExpr": "UOPS_RETIRED.FPDIV / tma_info_core_slots", + "MetricExpr": "cpu_atom@UOPS_RETIRED.FPDIV@ / tma_info_core_slots", "MetricGroup": "TopdownL3;tma_L3_group;tma_base_group", "MetricName": "tma_fpdiv_uops", "MetricThreshold": "tma_fpdiv_uops > 0.2", @@ -293,7 +292,7 @@ { "BriefDescription": "Counts the number of issue slots that were n= ot consumed by the backend due to frontend stalls.", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "TOPDOWN_FE_BOUND.ALL / tma_info_core_slots", + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.ALL@ / tma_info_core_slot= s", "MetricGroup": "Default;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.2", @@ -303,7 +302,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot delivered by the frontend due to instruction cache misses.", - "MetricExpr": "TOPDOWN_FE_BOUND.ICACHE / tma_info_core_slots", + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.ICACHE@ / tma_info_core_s= lots", "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_latency_group", "MetricName": "tma_icache_misses", "MetricThreshold": "tma_icache_misses > 0.05", @@ -330,7 +329,7 @@ }, { "BriefDescription": "Instructions Per Cycle", - "MetricExpr": "INST_RETIRED.ANY / tma_info_core_clks", + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / tma_info_core_clks", "MetricName": "tma_info_core_ipc", "Unit": "cpu_atom" }, @@ -342,7 +341,7 @@ }, { "BriefDescription": "Uops Per Instruction", - "MetricExpr": "UOPS_RETIRED.ALL / INST_RETIRED.ANY", + "MetricExpr": "cpu_atom@UOPS_RETIRED.ALL@ / INST_RETIRED.ANY", "MetricName": "tma_info_core_upi", "Unit": "cpu_atom" }, @@ -366,13 +365,13 @@ }, { "BriefDescription": "Ratio of all branches which mispredict", - "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.ALL_= BRANCHES", + "MetricExpr": "cpu_atom@BR_MISP_RETIRED.ALL_BRANCHES@ / BR_INST_RE= TIRED.ALL_BRANCHES", "MetricName": "tma_info_inst_mix_branch_mispredict_ratio", "Unit": "cpu_atom" }, { "BriefDescription": "Ratio between Mispredicted branches and unkno= wn branches", - "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / BACLEARS.ANY", + "MetricExpr": "cpu_atom@BR_MISP_RETIRED.ALL_BRANCHES@ / BACLEARS.A= NY", "MetricName": "tma_info_inst_mix_branch_mispredict_to_unknown_bran= ch_ratio", "Unit": "cpu_atom" }, @@ -390,61 +389,61 @@ }, { "BriefDescription": "Instructions per Branch (lower number means h= igher occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES", + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_INST_RETIRED.ALL_BR= ANCHES", "MetricName": "tma_info_inst_mix_ipbranch", "Unit": "cpu_atom" }, { "BriefDescription": "Instruction per (near) call (lower number mea= ns higher occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.CALL", + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_INST_RETIRED.CALL", "MetricName": "tma_info_inst_mix_ipcall", "Unit": "cpu_atom" }, { "BriefDescription": "Instructions per Far Branch", - "MetricExpr": "INST_RETIRED.ANY / (cpu_atom@BR_INST_RETIRED.FAR_BR= ANCH@ / 2)", + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / (cpu_atom@BR_INST_RETI= RED.FAR_BRANCH@ / 2)", "MetricName": "tma_info_inst_mix_ipfarbranch", "Unit": "cpu_atom" }, { "BriefDescription": "Instructions per Load", - "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS", + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / MEM_UOPS_RETIRED.ALL_L= OADS", "MetricName": "tma_info_inst_mix_ipload", "Unit": "cpu_atom" }, { "BriefDescription": "Instructions per retired conditional Branch M= isprediction where the branch was not taken", - "MetricExpr": "INST_RETIRED.ANY / (cpu_atom@BR_MISP_RETIRED.COND@ = - cpu_atom@BR_MISP_RETIRED.COND_TAKEN@)", + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / (cpu_atom@BR_MISP_RETI= RED.COND@ - cpu_atom@BR_MISP_RETIRED.COND_TAKEN@)", "MetricName": "tma_info_inst_mix_ipmisp_cond_ntaken", "Unit": "cpu_atom" }, { "BriefDescription": "Instructions per retired conditional Branch M= isprediction where the branch was taken", - "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.COND_TAKEN", + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_MISP_RETIRED.COND_T= AKEN", "MetricName": "tma_info_inst_mix_ipmisp_cond_taken", "Unit": "cpu_atom" }, { "BriefDescription": "Instructions per retired indirect call or jum= p Branch Misprediction", - "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.INDIRECT", + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_MISP_RETIRED.INDIRE= CT", "MetricName": "tma_info_inst_mix_ipmisp_indirect", "Unit": "cpu_atom" }, { "BriefDescription": "Instructions per retired return Branch Mispre= diction", - "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.RETURN", + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_MISP_RETIRED.RETURN= ", "MetricName": "tma_info_inst_mix_ipmisp_ret", "Unit": "cpu_atom" }, { "BriefDescription": "Instructions per retired Branch Misprediction= ", - "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_MISP_RETIRED.ALL_BR= ANCHES", "MetricName": "tma_info_inst_mix_ipmispredict", "Unit": "cpu_atom" }, { "BriefDescription": "Instructions per Store", - "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES", + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / MEM_UOPS_RETIRED.ALL_S= TORES", "MetricName": "tma_info_inst_mix_ipstore", "Unit": "cpu_atom" }, @@ -480,19 +479,19 @@ }, { "BriefDescription": "Cycle cost per DRAM hit", - "MetricExpr": "MEM_BOUND_STALLS.LOAD_DRAM_HIT / MEM_LOAD_UOPS_RETI= RED.DRAM_HIT", + "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_DRAM_HIT@ / MEM_LOAD= _UOPS_RETIRED.DRAM_HIT", "MetricName": "tma_info_memory_cycles_per_demand_load_dram_hit", "Unit": "cpu_atom" }, { "BriefDescription": "Cycle cost per L2 hit", - "MetricExpr": "MEM_BOUND_STALLS.LOAD_L2_HIT / MEM_LOAD_UOPS_RETIRE= D.L2_HIT", + "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_L2_HIT@ / MEM_LOAD_U= OPS_RETIRED.L2_HIT", "MetricName": "tma_info_memory_cycles_per_demand_load_l2_hit", "Unit": "cpu_atom" }, { "BriefDescription": "Cycle cost per LLC hit", - "MetricExpr": "MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_LOAD_UOPS_RETIR= ED.L3_HIT", + "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_LLC_HIT@ / MEM_LOAD_= UOPS_RETIRED.L3_HIT", "MetricName": "tma_info_memory_cycles_per_demand_load_l3_hit", "Unit": "cpu_atom" }, @@ -504,7 +503,7 @@ }, { "BriefDescription": "Average CPU Utilization", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.REF_TSC@ / TSC", "MetricName": "tma_info_system_cpu_utilization", "Unit": "cpu_atom" }, @@ -524,7 +523,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot delivered by the frontend due to Instruction Table Lookaside Buffer (ITL= B) misses.", - "MetricExpr": "TOPDOWN_FE_BOUND.ITLB / tma_info_core_slots", + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.ITLB@ / tma_info_core_slo= ts", "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_latency_group", "MetricName": "tma_itlb_misses", "MetricThreshold": "tma_itlb_misses > 0.05", @@ -533,7 +532,7 @@ }, { "BriefDescription": "Counts the number of cycles that the oldest l= oad of the load buffer is stalled at retirement due to a load block.", - "MetricExpr": "LD_HEAD.L1_BOUND_AT_RET / tma_info_core_clks", + "MetricExpr": "cpu_atom@LD_HEAD.L1_BOUND_AT_RET@ / tma_info_core_c= lks", "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group", "MetricName": "tma_l1_bound", "MetricThreshold": "tma_l1_bound > 0.1", @@ -542,7 +541,6 @@ }, { "BriefDescription": "Counts the number of cycles a core is stalled= due to a demand load which hit in the L2 Cache.", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_L2_HIT@ / tma_info_c= ore_clks - max((cpu_atom@MEM_BOUND_STALLS.LOAD@ - cpu_atom@LD_HEAD.L1_MISS_= AT_RET@) / tma_info_core_clks, 0) * cpu_atom@MEM_BOUND_STALLS.LOAD_L2_HIT@ = / cpu_atom@MEM_BOUND_STALLS.LOAD@", "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group", "MetricName": "tma_l2_bound", @@ -552,7 +550,6 @@ }, { "BriefDescription": "Counts the number of cycles a core is stalled= due to a demand load which hit in the Last Level Cache (LLC) or other core= with HITE/F/M.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_LLC_HIT@ / tma_info_= core_clks - max((cpu_atom@MEM_BOUND_STALLS.LOAD@ - cpu_atom@LD_HEAD.L1_MISS= _AT_RET@) / tma_info_core_clks, 0) * cpu_atom@MEM_BOUND_STALLS.LOAD_LLC_HIT= @ / cpu_atom@MEM_BOUND_STALLS.LOAD@", "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group", "MetricName": "tma_l3_bound", @@ -571,7 +568,7 @@ }, { "BriefDescription": "Counts the total number of issue slots that w= ere not consumed by the backend because allocation is stalled due to a mach= ine clear (nuke) of any kind including memory ordering and memory disambigu= ation.", - "MetricExpr": "TOPDOWN_BAD_SPECULATION.MACHINE_CLEARS / tma_info_c= ore_slots", + "MetricExpr": "cpu_atom@TOPDOWN_BAD_SPECULATION.MACHINE_CLEARS@ / = tma_info_core_slots", "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.05", @@ -581,7 +578,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot consumed by the backend due to memory reservation stalls in which a sche= duler is not able to accept uops.", - "MetricExpr": "TOPDOWN_BE_BOUND.MEM_SCHEDULER / tma_info_core_slot= s", + "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.MEM_SCHEDULER@ / tma_info= _core_slots", "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group", "MetricName": "tma_mem_scheduler", "MetricThreshold": "tma_mem_scheduler > 0.1", @@ -590,7 +587,7 @@ }, { "BriefDescription": "Counts the number of cycles the core is stall= ed due to stores or loads.", - "MetricExpr": "min(cpu_atom@TOPDOWN_BE_BOUND.ALL@ / tma_info_core_= slots, cpu_atom@LD_HEAD.ANY_AT_RET@ / tma_info_core_clks + tma_store_bound)= ", + "MetricExpr": "min(tma_backend_bound, cpu_atom@LD_HEAD.ANY_AT_RET@= / tma_info_core_clks + tma_store_bound)", "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2", @@ -609,7 +606,7 @@ }, { "BriefDescription": "Counts the number of uops that are from the c= omplex flows issued by the micro-sequencer (MS)", - "MetricExpr": "UOPS_RETIRED.MS / tma_info_core_slots", + "MetricExpr": "cpu_atom@UOPS_RETIRED.MS@ / tma_info_core_slots", "MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group", "MetricName": "tma_ms_uops", "MetricThreshold": "tma_ms_uops > 0.05", @@ -620,7 +617,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot consumed by the backend due to IEC or FPC RAT stalls, which can be due t= o FIQ or IEC reservation stalls in which the integer, floating point or SIM= D scheduler is not able to accept uops.", - "MetricExpr": "TOPDOWN_BE_BOUND.NON_MEM_SCHEDULER / tma_info_core_= slots", + "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.NON_MEM_SCHEDULER@ / tma_= info_core_slots", "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group", "MetricName": "tma_non_mem_scheduler", "MetricThreshold": "tma_non_mem_scheduler > 0.1", @@ -629,7 +626,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot consumed by the backend due to a machine clear (slow nuke).", - "MetricExpr": "TOPDOWN_BAD_SPECULATION.NUKE / tma_info_core_slots", + "MetricExpr": "cpu_atom@TOPDOWN_BAD_SPECULATION.NUKE@ / tma_info_c= ore_slots", "MetricGroup": "TopdownL3;tma_L3_group;tma_machine_clears_group", "MetricName": "tma_nuke", "MetricThreshold": "tma_nuke > 0.05", @@ -638,7 +635,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot delivered by the frontend due to other common frontend stalls not catego= rized.", - "MetricExpr": "TOPDOWN_FE_BOUND.OTHER / tma_info_core_slots", + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.OTHER@ / tma_info_core_sl= ots", "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_bandwidth_group", "MetricName": "tma_other_fb", "MetricThreshold": "tma_other_fb > 0.05", @@ -647,7 +644,7 @@ }, { "BriefDescription": "Counts the number of cycles that the oldest l= oad of the load buffer is stalled at retirement due to a number of other lo= ad blocks.", - "MetricExpr": "LD_HEAD.OTHER_AT_RET / tma_info_core_clks", + "MetricExpr": "cpu_atom@LD_HEAD.OTHER_AT_RET@ / tma_info_core_clks= ", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_other_l1", "MetricThreshold": "tma_other_l1 > 0.05", @@ -683,7 +680,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot delivered by the frontend due to wrong predecodes.", - "MetricExpr": "TOPDOWN_FE_BOUND.PREDECODE / tma_info_core_slots", + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.PREDECODE@ / tma_info_cor= e_slots", "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_bandwidth_group", "MetricName": "tma_predecode", "MetricThreshold": "tma_predecode > 0.05", @@ -692,7 +689,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot consumed by the backend due to the physical register file unable to acce= pt an entry (marble stalls).", - "MetricExpr": "TOPDOWN_BE_BOUND.REGISTER / tma_info_core_slots", + "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.REGISTER@ / tma_info_core= _slots", "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group", "MetricName": "tma_register", "MetricThreshold": "tma_register > 0.1", @@ -701,7 +698,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot consumed by the backend due to the reorder buffer being full (ROB stalls= ).", - "MetricExpr": "TOPDOWN_BE_BOUND.REORDER_BUFFER / tma_info_core_slo= ts", + "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.REORDER_BUFFER@ / tma_inf= o_core_slots", "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group", "MetricName": "tma_reorder_buffer", "MetricThreshold": "tma_reorder_buffer > 0.1", @@ -722,7 +719,7 @@ { "BriefDescription": "Counts the number of issue slots that result= in retirement slots.", "DefaultMetricgroupName": "TopdownL1", - "MetricExpr": "TOPDOWN_RETIRING.ALL / tma_info_core_slots", + "MetricExpr": "cpu_atom@TOPDOWN_RETIRING.ALL@ / tma_info_core_slot= s", "MetricGroup": "Default;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.75", @@ -741,7 +738,7 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot consumed by the backend due to scoreboards from the instruction queue (I= Q), jump execution unit (JEU), or microcode sequencer (MS).", - "MetricExpr": "TOPDOWN_BE_BOUND.SERIALIZATION / tma_info_core_slot= s", + "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.SERIALIZATION@ / tma_info= _core_slots", "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group", "MetricName": "tma_serialization", "MetricThreshold": "tma_serialization > 0.1", @@ -768,7 +765,7 @@ }, { "BriefDescription": "Counts the number of cycles that the oldest l= oad of the load buffer is stalled at retirement due to a first level TLB mi= ss.", - "MetricExpr": "LD_HEAD.DTLB_MISS_AT_RET / tma_info_core_clks", + "MetricExpr": "cpu_atom@LD_HEAD.DTLB_MISS_AT_RET@ / tma_info_core_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_stlb_hit", "MetricThreshold": "tma_stlb_hit > 0.05", @@ -777,7 +774,7 @@ }, { "BriefDescription": "Counts the number of cycles that the oldest l= oad of the load buffer is stalled at retirement due to a second level TLB m= iss requiring a page walk.", - "MetricExpr": "LD_HEAD.PGWALK_AT_RET / tma_info_core_clks", + "MetricExpr": "cpu_atom@LD_HEAD.PGWALK_AT_RET@ / tma_info_core_clk= s", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_stlb_miss", "MetricThreshold": "tma_stlb_miss > 0.05", @@ -795,8 +792,7 @@ }, { "BriefDescription": "Counts the number of cycles that the oldest l= oad of the load buffer is stalled at retirement due to a store forward bloc= k.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", - "MetricExpr": "LD_HEAD.ST_ADDR_AT_RET / tma_info_core_clks", + "MetricExpr": "cpu_atom@LD_HEAD.ST_ADDR_AT_RET@ / tma_info_core_cl= ks", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_store_fwd_blk", "MetricThreshold": "tma_store_fwd_blk > 0.05", @@ -875,7 +871,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to Branch Resteers", - "MetricExpr": "INT_MISC.CLEAR_RESTEER_CYCLES / tma_info_thread_clk= s + tma_unknown_branches", + "MetricExpr": "cpu_core@INT_MISC.CLEAR_RESTEER_CYCLES@ / tma_info_= thread_clks + tma_unknown_branches", "MetricGroup": "FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_= group", "MetricName": "tma_branch_resteers", "MetricThreshold": "tma_branch_resteers > 0.05 & (tma_fetch_latenc= y > 0.1 & tma_frontend_bound > 0.15)", @@ -905,7 +901,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to contested acces= ses", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(25 * tma_info_system_average_frequency * (cpu_core= @MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD@ * (cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SN= OOP_HITM@ / (cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM@ + cpu_core@OCR.= DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD@))) + 24 * tma_info_system_average= _frequency * cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS@) * (1 + cpu_core@M= EM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2) / tma_inf= o_thread_clks", "MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;t= ma_issueSyncxn;tma_l3_bound_group", "MetricName": "tma_contested_accesses", @@ -927,7 +922,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to data-sharing ac= cesses", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "24 * tma_info_system_average_frequency * (cpu_core@= MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD@ + cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSN= P_FWD@ * (1 - cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM@ / (cpu_core@OC= R.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM@ + cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SN= OOP_HIT_WITH_FWD@))) * (1 + cpu_core@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@ME= M_LOAD_RETIRED.L1_MISS@ / 2) / tma_info_thread_clks", "MetricGroup": "Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSync= xn;tma_l3_bound_group", "MetricName": "tma_data_sharing", @@ -948,7 +942,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the Divider unit was active", - "MetricExpr": "ARITH.DIV_ACTIVE / tma_info_thread_clks", + "MetricExpr": "cpu_core@ARITH.DIV_ACTIVE@ / tma_info_thread_clks", "MetricGroup": "TopdownL3;tma_L3_group;tma_core_bound_group", "MetricName": "tma_divider", "MetricThreshold": "tma_divider > 0.2 & (tma_core_bound > 0.1 & tm= a_backend_bound > 0.2)", @@ -958,7 +952,6 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled on accesses to external memory (DRAM) by loads", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "cpu_core@MEMORY_ACTIVITY.STALLS_L3_MISS@ / tma_info= _thread_clks", "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_me= mory_bound_group", "MetricName": "tma_dram_bound", @@ -979,7 +972,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to switches from DSB to MITE pipelines", - "MetricExpr": "DSB2MITE_SWITCHES.PENALTY_CYCLES / tma_info_thread_= clks", + "MetricExpr": "cpu_core@DSB2MITE_SWITCHES.PENALTY_CYCLES@ / tma_in= fo_thread_clks", "MetricGroup": "DSBmiss;FetchLat;TopdownL3;tma_L3_group;tma_fetch_= latency_group;tma_issueFB", "MetricName": "tma_dsb_switches", "MetricThreshold": "tma_dsb_switches > 0.05 & (tma_fetch_latency >= 0.1 & tma_frontend_bound > 0.15)", @@ -1019,7 +1012,7 @@ }, { "BriefDescription": "This metric does a *rough estimation* of how = often L1D Fill Buffer unavailability limited additional L1D miss memory acc= ess requests to proceed", - "MetricExpr": "L1D_PEND_MISS.FB_FULL / tma_info_thread_clks", + "MetricExpr": "cpu_core@L1D_PEND_MISS.FB_FULL@ / tma_info_thread_c= lks", "MetricGroup": "MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;tma_is= sueSL;tma_issueSmSt;tma_l1_bound_group", "MetricName": "tma_fb_full", "MetricThreshold": "tma_fb_full > 0.3", @@ -1154,7 +1147,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to instruction cache misses", - "MetricExpr": "ICACHE_DATA.STALLS / tma_info_thread_clks", + "MetricExpr": "cpu_core@ICACHE_DATA.STALLS@ / tma_info_thread_clks= ", "MetricGroup": "BigFoot;FetchLat;IcMiss;TopdownL3;tma_L3_group;tma= _fetch_latency_group", "MetricName": "tma_icache_misses", "MetricThreshold": "tma_icache_misses > 0.05 & (tma_fetch_latency = > 0.1 & tma_frontend_bound > 0.15)", @@ -1164,7 +1157,6 @@ }, { "BriefDescription": "Branch Misprediction Cost: Fraction of TMA sl= ots wasted per non-speculative branch misprediction (retired JEClear)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(tma_branch_mispredicts + tma_fetch_latency * tma_m= ispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_= misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) * tma_info_thread_sl= ots / BR_MISP_RETIRED.ALL_BRANCHES", "MetricGroup": "Bad;BrMispredicts;tma_issueBM", "MetricName": "tma_info_bad_spec_branch_misprediction_cost", @@ -1173,7 +1165,7 @@ }, { "BriefDescription": "Instructions per retired mispredicts for cond= itional non-taken branches (lower number means higher occurrence rate).", - "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.COND_NTAKEN", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / BR_MISP_RETIRED.COND_N= TAKEN", "MetricGroup": "Bad;BrMispredicts", "MetricName": "tma_info_bad_spec_ipmisp_cond_ntaken", "MetricThreshold": "tma_info_bad_spec_ipmisp_cond_ntaken < 200", @@ -1181,7 +1173,7 @@ }, { "BriefDescription": "Instructions per retired mispredicts for cond= itional taken branches (lower number means higher occurrence rate).", - "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.COND_TAKEN", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / BR_MISP_RETIRED.COND_T= AKEN", "MetricGroup": "Bad;BrMispredicts", "MetricName": "tma_info_bad_spec_ipmisp_cond_taken", "MetricThreshold": "tma_info_bad_spec_ipmisp_cond_taken < 200", @@ -1197,7 +1189,7 @@ }, { "BriefDescription": "Instructions per retired mispredicts for retu= rn branches (lower number means higher occurrence rate).", - "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.RET", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / BR_MISP_RETIRED.RET", "MetricGroup": "Bad;BrMispredicts", "MetricName": "tma_info_bad_spec_ipmisp_ret", "MetricThreshold": "tma_info_bad_spec_ipmisp_ret < 500", @@ -1205,7 +1197,7 @@ }, { "BriefDescription": "Number of Instructions per non-speculative Br= anch Misprediction (JEClear) (lower number means higher occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / BR_MISP_RETIRED.ALL_BR= ANCHES", "MetricGroup": "Bad;BadSpec;BrMispredicts", "MetricName": "tma_info_bad_spec_ipmispredict", "MetricThreshold": "tma_info_bad_spec_ipmispredict < 200", @@ -1213,7 +1205,6 @@ }, { "BriefDescription": "Probability of Core Bound bottleneck hidden b= y SMT-profiling artifacts", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization = if tma_core_bound < tma_ports_utilization else 1) if tma_info_system_smt_2t= _utilization > 0.5 else 0)", "MetricGroup": "Cor;SMT", "MetricName": "tma_info_botlnk_l0_core_bound_likely", @@ -1222,7 +1213,6 @@ }, { "BriefDescription": "Total pipeline cost of DSB (uop cache) misses= - subset of the Instruction_Fetch_BW Bottleneck", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_fetch_latency * tma_dsb_switches / (tma_= branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + = tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_mite / (tma_dsb + tm= a_lsd + tma_mite))", "MetricGroup": "DSBmiss;Fed;tma_issueFB", "MetricName": "tma_info_botlnk_l2_dsb_misses", @@ -1232,7 +1222,6 @@ }, { "BriefDescription": "Total pipeline cost of Instruction Cache miss= es - subset of the Big_Code Bottleneck", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / (tma= _branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses += tma_lcp + tma_ms_switches))", "MetricGroup": "Fed;FetchLat;IcMiss;tma_issueFL", "MetricName": "tma_info_botlnk_l2_ic_misses", @@ -1242,7 +1231,6 @@ }, { "BriefDescription": "Total pipeline cost of instruction fetch rela= ted bottlenecks by large code footprint programs (i-side cache; TLB and BTB= misses)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_ic= ache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switch= es + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)", "MetricGroup": "BigFoot;Fed;Frontend;IcMiss;MemoryTLB;tma_issueBC", "MetricName": "tma_info_bottleneck_big_code", @@ -1261,7 +1249,6 @@ }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_frontend_bound - tma_fetch_latency * tma= _mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icach= e_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_info_bottlen= eck_big_code", "MetricGroup": "Fed;FetchBW;Frontend", "MetricName": "tma_info_bottleneck_instruction_fetch_bw", @@ -1270,7 +1257,6 @@ }, { "BriefDescription": "Total pipeline cost of (external) Memory Band= width related bottlenecks", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (= tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound /= (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_b= ound) * (tma_sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_= hit_latency + tma_sq_full))) + tma_l1_bound / (tma_dram_bound + tma_l1_boun= d + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_fb_full / (tma_dt= lb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_= blk))", "MetricGroup": "Mem;MemoryBW;Offcore;tma_issueBW", "MetricName": "tma_info_bottleneck_memory_bandwidth", @@ -1280,7 +1266,6 @@ }, { "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_memory_bound * (tma_l1_bound / max(tma_me= mory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + t= ma_store_bound) * (tma_dtlb_load / max(tma_l1_bound, tma_dtlb_load + tma_fb= _full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_stor= e_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tm= a_store_bound) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tm= a_split_stores + tma_store_latency + tma_streaming_stores)))", "MetricGroup": "Mem;MemoryTLB;Offcore;tma_issueTLB", "MetricName": "tma_info_bottleneck_memory_data_tlbs", @@ -1290,7 +1275,6 @@ }, { "BriefDescription": "Total pipeline cost of Memory Latency related= bottlenecks (external memory and off-core caches)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (= tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (= tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bou= nd) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tm= a_l3_hit_latency + tma_sq_full)) + tma_l2_bound / (tma_dram_bound + tma_l1_= bound + tma_l2_bound + tma_l3_bound + tma_store_bound))", "MetricGroup": "Mem;MemoryLat;Offcore;tma_issueLat", "MetricName": "tma_info_bottleneck_memory_latency", @@ -1300,7 +1284,6 @@ }, { "BriefDescription": "Total pipeline cost of Branch Misprediction r= elated bottlenecks", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_branch_mispredicts + tma_fetch_latency *= tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_i= cache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))", "MetricGroup": "Bad;BadSpec;BrMispredicts;tma_issueBM", "MetricName": "tma_info_bottleneck_mispredictions", @@ -1317,14 +1300,14 @@ }, { "BriefDescription": "Fraction of branches that are non-taken condi= tionals", - "MetricExpr": "BR_INST_RETIRED.COND_NTAKEN / BR_INST_RETIRED.ALL_B= RANCHES", + "MetricExpr": "cpu_core@BR_INST_RETIRED.COND_NTAKEN@ / BR_INST_RET= IRED.ALL_BRANCHES", "MetricGroup": "Bad;Branches;CodeGen;PGO", "MetricName": "tma_info_branches_cond_nt", "Unit": "cpu_core" }, { "BriefDescription": "Fraction of branches that are taken condition= als", - "MetricExpr": "BR_INST_RETIRED.COND_TAKEN / BR_INST_RETIRED.ALL_BR= ANCHES", + "MetricExpr": "cpu_core@BR_INST_RETIRED.COND_TAKEN@ / BR_INST_RETI= RED.ALL_BRANCHES", "MetricGroup": "Bad;Branches;CodeGen;PGO", "MetricName": "tma_info_branches_cond_tk", "Unit": "cpu_core" @@ -1352,7 +1335,7 @@ }, { "BriefDescription": "Instructions Per Cycle across hyper-threads (= per physical core)", - "MetricExpr": "INST_RETIRED.ANY / tma_info_core_core_clks", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / tma_info_core_core_clk= s", "MetricGroup": "Ret;SMT;TmaL1;tma_L1_group", "MetricName": "tma_info_core_coreipc", "Unit": "cpu_core" @@ -1374,14 +1357,14 @@ }, { "BriefDescription": "Instruction-Level-Parallelism (average number= of uops executed when there is execution) per-core", - "MetricExpr": "UOPS_EXECUTED.THREAD / (cpu_core@UOPS_EXECUTED.CORE= _CYCLES_GE_1@ / 2 if #SMT_on else cpu_core@UOPS_EXECUTED.CORE_CYCLES_GE_1@)= ", + "MetricExpr": "cpu_core@UOPS_EXECUTED.THREAD@ / (cpu_core@UOPS_EXE= CUTED.CORE_CYCLES_GE_1@ / 2 if #SMT_on else cpu_core@UOPS_EXECUTED.CORE_CYC= LES_GE_1@)", "MetricGroup": "Backend;Cor;Pipeline;PortsUtil", "MetricName": "tma_info_core_ilp", "Unit": "cpu_core" }, { "BriefDescription": "Fraction of Uops delivered by the DSB (aka De= coded ICache; or Uop Cache)", - "MetricExpr": "IDQ.DSB_UOPS / cpu_core@UOPS_ISSUED.ANY@", + "MetricExpr": "cpu_core@IDQ.DSB_UOPS@ / cpu_core@UOPS_ISSUED.ANY@", "MetricGroup": "DSB;Fed;FetchBW;tma_issueFB", "MetricName": "tma_info_frontend_dsb_coverage", "MetricThreshold": "tma_info_frontend_dsb_coverage < 0.7 & tma_inf= o_thread_ipc / 6 > 0.35", @@ -1390,28 +1373,28 @@ }, { "BriefDescription": "Average number of cycles of a switch from the= DSB fetch-unit to MITE fetch unit - see DSB_Switches tree node for details= .", - "MetricExpr": "DSB2MITE_SWITCHES.PENALTY_CYCLES / cpu_core@DSB2MIT= E_SWITCHES.PENALTY_CYCLES\\,cmask\\=3D1\\,edge@", + "MetricExpr": "cpu_core@DSB2MITE_SWITCHES.PENALTY_CYCLES@ / cpu_co= re@DSB2MITE_SWITCHES.PENALTY_CYCLES\\,cmask\\=3D1\\,edge@", "MetricGroup": "DSBmiss", "MetricName": "tma_info_frontend_dsb_switch_cost", "Unit": "cpu_core" }, { "BriefDescription": "Average number of Uops issued by front-end wh= en it issued something", - "MetricExpr": "UOPS_ISSUED.ANY / cpu_core@UOPS_ISSUED.ANY\\,cmask\= \=3D1@", + "MetricExpr": "cpu_core@UOPS_ISSUED.ANY@ / cpu_core@UOPS_ISSUED.AN= Y\\,cmask\\=3D1@", "MetricGroup": "Fed;FetchBW", "MetricName": "tma_info_frontend_fetch_upc", "Unit": "cpu_core" }, { "BriefDescription": "Average Latency for L1 instruction cache miss= es", - "MetricExpr": "ICACHE_DATA.STALLS / cpu_core@ICACHE_DATA.STALLS\\,= cmask\\=3D1\\,edge@", + "MetricExpr": "cpu_core@ICACHE_DATA.STALLS@ / cpu_core@ICACHE_DATA= .STALLS\\,cmask\\=3D1\\,edge@", "MetricGroup": "Fed;FetchLat;IcMiss", "MetricName": "tma_info_frontend_icache_miss_latency", "Unit": "cpu_core" }, { "BriefDescription": "Instructions per non-speculative DSB miss (lo= wer number means higher occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / FRONTEND_RETIRED.ANY_DSB_MISS", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / FRONTEND_RETIRED.ANY_D= SB_MISS", "MetricGroup": "DSBmiss;Fed", "MetricName": "tma_info_frontend_ipdsb_miss_ret", "MetricThreshold": "tma_info_frontend_ipdsb_miss_ret < 50", @@ -1440,14 +1423,14 @@ }, { "BriefDescription": "Fraction of Uops delivered by the LSD (Loop S= tream Detector; aka Loop Cache)", - "MetricExpr": "LSD.UOPS / cpu_core@UOPS_ISSUED.ANY@", + "MetricExpr": "cpu_core@LSD.UOPS@ / cpu_core@UOPS_ISSUED.ANY@", "MetricGroup": "Fed;LSD", "MetricName": "tma_info_frontend_lsd_coverage", "Unit": "cpu_core" }, { "BriefDescription": "Branch instructions per taken branch.", - "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR= _TAKEN", + "MetricExpr": "cpu_core@BR_INST_RETIRED.ALL_BRANCHES@ / BR_INST_RE= TIRED.NEAR_TAKEN", "MetricGroup": "Branches;Fed;PGO", "MetricName": "tma_info_inst_mix_bptkbranch", "Unit": "cpu_core" @@ -1462,7 +1445,7 @@ }, { "BriefDescription": "Instructions per FP Arithmetic instruction (l= ower number means higher occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / (cpu_core@FP_ARITH_INST_RETIRED.= SCALAR_SINGLE\\,umask\\=3D0x03@ + cpu_core@FP_ARITH_INST_RETIRED.128B_PACKE= D_DOUBLE\\,umask\\=3D0x3c@)", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INS= T_RETIRED.SCALAR_SINGLE\\,umask\\=3D0x03@ + cpu_core@FP_ARITH_INST_RETIRED.= 128B_PACKED_DOUBLE\\,umask\\=3D0x3c@)", "MetricGroup": "Flops;InsType", "MetricName": "tma_info_inst_mix_iparith", "MetricThreshold": "tma_info_inst_mix_iparith < 10", @@ -1471,7 +1454,7 @@ }, { "BriefDescription": "Instructions per FP Arithmetic AVX/SSE 128-bi= t instruction (lower number means higher occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / (cpu_core@FP_ARITH_INST_RETIRED.= 128B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE@)", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INS= T_RETIRED.128B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_= SINGLE@)", "MetricGroup": "Flops;FpVector;InsType", "MetricName": "tma_info_inst_mix_iparith_avx128", "MetricThreshold": "tma_info_inst_mix_iparith_avx128 < 10", @@ -1480,7 +1463,7 @@ }, { "BriefDescription": "Instructions per FP Arithmetic AVX* 256-bit i= nstruction (lower number means higher occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / (cpu_core@FP_ARITH_INST_RETIRED.= 256B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@)", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INS= T_RETIRED.256B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_= SINGLE@)", "MetricGroup": "Flops;FpVector;InsType", "MetricName": "tma_info_inst_mix_iparith_avx256", "MetricThreshold": "tma_info_inst_mix_iparith_avx256 < 10", @@ -1489,7 +1472,7 @@ }, { "BriefDescription": "Instructions per FP Arithmetic Scalar Double-= Precision instruction (lower number means higher occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / FP_ARITH_INST_RETIRED.SCALAR_DOU= BLE", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / FP_ARITH_INST_RETIRED.= SCALAR_DOUBLE", "MetricGroup": "Flops;FpScalar;InsType", "MetricName": "tma_info_inst_mix_iparith_scalar_dp", "MetricThreshold": "tma_info_inst_mix_iparith_scalar_dp < 10", @@ -1498,7 +1481,7 @@ }, { "BriefDescription": "Instructions per FP Arithmetic Scalar Single-= Precision instruction (lower number means higher occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / FP_ARITH_INST_RETIRED.SCALAR_SIN= GLE", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / FP_ARITH_INST_RETIRED.= SCALAR_SINGLE", "MetricGroup": "Flops;FpScalar;InsType", "MetricName": "tma_info_inst_mix_iparith_scalar_sp", "MetricThreshold": "tma_info_inst_mix_iparith_scalar_sp < 10", @@ -1507,7 +1490,7 @@ }, { "BriefDescription": "Instructions per Branch (lower number means h= igher occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / BR_INST_RETIRED.ALL_BR= ANCHES", "MetricGroup": "Branches;Fed;InsType", "MetricName": "tma_info_inst_mix_ipbranch", "MetricThreshold": "tma_info_inst_mix_ipbranch < 8", @@ -1515,7 +1498,7 @@ }, { "BriefDescription": "Instructions per (near) call (lower number me= ans higher occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / BR_INST_RETIRED.NEAR_C= ALL", "MetricGroup": "Branches;Fed;PGO", "MetricName": "tma_info_inst_mix_ipcall", "MetricThreshold": "tma_info_inst_mix_ipcall < 200", @@ -1523,7 +1506,7 @@ }, { "BriefDescription": "Instructions per Floating Point (FP) Operatio= n (lower number means higher occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / (cpu_core@FP_ARITH_INST_RETIRED.= SCALAR_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.SCALAR_DOUBLE@ + 2 * cpu_co= re@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + 4 * (cpu_core@FP_ARITH_INST_= RETIRED.128B_PACKED_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_DO= UBLE@) + 8 * cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@)", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INS= T_RETIRED.SCALAR_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.SCALAR_DOUBLE@ + = 2 * cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + 4 * (cpu_core@FP_A= RITH_INST_RETIRED.128B_PACKED_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B= _PACKED_DOUBLE@) + 8 * cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@)", "MetricGroup": "Flops;InsType", "MetricName": "tma_info_inst_mix_ipflop", "MetricThreshold": "tma_info_inst_mix_ipflop < 10", @@ -1531,7 +1514,7 @@ }, { "BriefDescription": "Instructions per Load (lower number means hig= her occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_LOADS", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / MEM_INST_RETIRED.ALL_L= OADS", "MetricGroup": "InsType", "MetricName": "tma_info_inst_mix_ipload", "MetricThreshold": "tma_info_inst_mix_ipload < 3", @@ -1539,7 +1522,7 @@ }, { "BriefDescription": "Instructions per Store (lower number means hi= gher occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_STORES", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / MEM_INST_RETIRED.ALL_S= TORES", "MetricGroup": "InsType", "MetricName": "tma_info_inst_mix_ipstore", "MetricThreshold": "tma_info_inst_mix_ipstore < 8", @@ -1547,7 +1530,7 @@ }, { "BriefDescription": "Instructions per Software prefetch instructio= n (of any type: NTA/T0/T1/T2/Prefetch) (lower number means higher occurrenc= e rate)", - "MetricExpr": "INST_RETIRED.ANY / cpu_core@SW_PREFETCH_ACCESS.T0\\= ,umask\\=3D0xF@", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@SW_PREFETCH_A= CCESS.T0\\,umask\\=3D0xF@", "MetricGroup": "Prefetches", "MetricName": "tma_info_inst_mix_ipswpf", "MetricThreshold": "tma_info_inst_mix_ipswpf < 100", @@ -1555,7 +1538,7 @@ }, { "BriefDescription": "Instruction per taken branch", - "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / BR_INST_RETIRED.NEAR_T= AKEN", "MetricGroup": "Branches;Fed;FetchBW;Frontend;PGO;tma_issueFB", "MetricName": "tma_info_inst_mix_iptb", "MetricThreshold": "tma_info_inst_mix_iptb < 13", @@ -1655,14 +1638,14 @@ }, { "BriefDescription": "Actual Average Latency for L1 data-cache miss= demand load operations (in core cycles)", - "MetricExpr": "L1D_PEND_MISS.PENDING / MEM_LOAD_COMPLETED.L1_MISS_= ANY", + "MetricExpr": "cpu_core@L1D_PEND_MISS.PENDING@ / MEM_LOAD_COMPLETE= D.L1_MISS_ANY", "MetricGroup": "Mem;MemoryBound;MemoryLat", "MetricName": "tma_info_memory_load_miss_real_latency", "Unit": "cpu_core" }, { "BriefDescription": "Memory-Level-Parallelism (average number of L= 1 miss demand load when there is at least one such miss", - "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLE= S", + "MetricExpr": "cpu_core@L1D_PEND_MISS.PENDING@ / L1D_PEND_MISS.PEN= DING_CYCLES", "MetricGroup": "Mem;MemoryBW;MemoryBound", "MetricName": "tma_info_memory_mlp", "PublicDescription": "Memory-Level-Parallelism (average number of = L1 miss demand load when there is at least one such miss. Per-Logical Proce= ssor)", @@ -1670,28 +1653,28 @@ }, { "BriefDescription": "Average Parallel L2 cache miss data reads", - "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD / OFFCORE_= REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD", + "MetricExpr": "cpu_core@OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD@ = / OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD", "MetricGroup": "Memory_BW;Offcore", "MetricName": "tma_info_memory_oro_data_l2_mlp", "Unit": "cpu_core" }, { "BriefDescription": "Average Latency for L2 cache miss demand Load= s", - "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD / OFFCO= RE_REQUESTS.DEMAND_DATA_RD", + "MetricExpr": "cpu_core@OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_R= D@ / OFFCORE_REQUESTS.DEMAND_DATA_RD", "MetricGroup": "Memory_Lat;Offcore", "MetricName": "tma_info_memory_oro_load_l2_miss_latency", "Unit": "cpu_core" }, { "BriefDescription": "Average Parallel L2 cache miss demand Loads", - "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD / cpu_c= ore@OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD\\,cmask\\=3D1@", + "MetricExpr": "cpu_core@OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_R= D@ / cpu_core@OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD\\,cmask\\=3D1@", "MetricGroup": "Memory_BW;Offcore", "MetricName": "tma_info_memory_oro_load_l2_mlp", "Unit": "cpu_core" }, { "BriefDescription": "Average Latency for L3 cache miss demand Load= s", - "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.L3_MISS_DEMAND_DATA_RD= / OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD", + "MetricExpr": "cpu_core@OFFCORE_REQUESTS_OUTSTANDING.L3_MISS_DEMAN= D_DATA_RD@ / OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD", "MetricGroup": "Memory_Lat;Offcore", "MetricName": "tma_info_memory_oro_load_l3_miss_latency", "Unit": "cpu_core" @@ -1755,14 +1738,14 @@ }, { "BriefDescription": "Instruction-Level-Parallelism (average number= of uops executed when there is execution) per-thread", - "MetricExpr": "UOPS_EXECUTED.THREAD / cpu_core@UOPS_EXECUTED.THREA= D\\,cmask\\=3D1@", + "MetricExpr": "cpu_core@UOPS_EXECUTED.THREAD@ / cpu_core@UOPS_EXEC= UTED.THREAD\\,cmask\\=3D1@", "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", "MetricName": "tma_info_pipeline_execute", "Unit": "cpu_core" }, { "BriefDescription": "Instructions per a microcode Assist invocatio= n", - "MetricExpr": "INST_RETIRED.ANY / cpu_core@ASSISTS.ANY\\,umask\\= =3D0x1B@", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@ASSISTS.ANY\\= ,umask\\=3D0x1B@", "MetricGroup": "Pipeline;Ret;Retire", "MetricName": "tma_info_pipeline_ipassist", "MetricThreshold": "tma_info_pipeline_ipassist < 100e3", @@ -1778,7 +1761,7 @@ }, { "BriefDescription": "Estimated fraction of retirement-cycles deali= ng with repeat instructions", - "MetricExpr": "INST_RETIRED.REP_ITERATION / cpu_core@UOPS_RETIRED.= SLOTS\\,cmask\\=3D1@", + "MetricExpr": "cpu_core@INST_RETIRED.REP_ITERATION@ / cpu_core@UOP= S_RETIRED.SLOTS\\,cmask\\=3D1@", "MetricGroup": "Pipeline;Ret", "MetricName": "tma_info_pipeline_strings_cycles", "MetricThreshold": "tma_info_pipeline_strings_cycles > 0.1", @@ -1793,7 +1776,7 @@ }, { "BriefDescription": "Average CPU Utilization", - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", + "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.REF_TSC@ / TSC", "MetricGroup": "HPC;Summary", "MetricName": "tma_info_system_cpu_utilization", "Unit": "cpu_core" @@ -1816,7 +1799,7 @@ }, { "BriefDescription": "Instructions per Far Branch ( Far Branches ap= ply upon transition from application to operating system, handling interrup= ts, exceptions) [lower number means higher occurrence rate]", - "MetricExpr": "INST_RETIRED.ANY / cpu_core@BR_INST_RETIRED.FAR_BRA= NCH@u", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@BR_INST_RETIR= ED.FAR_BRANCH@u", "MetricGroup": "Branches;OS", "MetricName": "tma_info_system_ipfarbranch", "MetricThreshold": "tma_info_system_ipfarbranch < 1e6", @@ -1847,6 +1830,7 @@ }, { "BriefDescription": "Average latency of data read request to exter= nal memory (in nanoseconds)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(UNC_ARB_TRK_OCCUPANCY.RD + UNC_ARB_DAT_OCCUPANCY.R= D) / UNC_ARB_TRK_REQUESTS.RD", "MetricGroup": "Mem;MemoryLat;SoC", "MetricName": "tma_info_system_mem_read_latency", @@ -1855,6 +1839,7 @@ }, { "BriefDescription": "Average latency of all requests to external m= emory (in Uncore cycles)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(UNC_ARB_TRK_OCCUPANCY.ALL + UNC_ARB_DAT_OCCUPANCY.= RD) / UNC_ARB_TRK_REQUESTS.ALL", "MetricGroup": "Mem;SoC", "MetricName": "tma_info_system_mem_request_latency", @@ -1897,7 +1882,7 @@ }, { "BriefDescription": "The ratio of Executed- by Issued-Uops", - "MetricExpr": "UOPS_EXECUTED.THREAD / UOPS_ISSUED.ANY", + "MetricExpr": "cpu_core@UOPS_EXECUTED.THREAD@ / UOPS_ISSUED.ANY", "MetricGroup": "Cor;Pipeline", "MetricName": "tma_info_thread_execute_per_issue", "PublicDescription": "The ratio of Executed- by Issued-Uops. Ratio= > 1 suggests high rate of uop micro-fusions. Ratio < 1 suggest high rate o= f \"execute\" at rename stage.", @@ -1905,7 +1890,7 @@ }, { "BriefDescription": "Instructions Per Cycle (per Logical Processor= )", - "MetricExpr": "INST_RETIRED.ANY / tma_info_thread_clks", + "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / tma_info_thread_clks", "MetricGroup": "Ret;Summary", "MetricName": "tma_info_thread_ipc", "Unit": "cpu_core" @@ -1972,7 +1957,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to Instruction TLB (ITLB) misses", - "MetricExpr": "ICACHE_TAG.STALLS / tma_info_thread_clks", + "MetricExpr": "cpu_core@ICACHE_TAG.STALLS@ / tma_info_thread_clks", "MetricGroup": "BigFoot;FetchLat;MemoryTLB;TopdownL3;tma_L3_group;= tma_fetch_latency_group", "MetricName": "tma_itlb_misses", "MetricThreshold": "tma_itlb_misses > 0.05 & (tma_fetch_latency > = 0.1 & tma_frontend_bound > 0.15)", @@ -1992,7 +1977,6 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled due to L2 cache accesses by loads", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(cpu_core@MEMORY_ACTIVITY.STALLS_L1D_MISS@ - cpu_co= re@MEMORY_ACTIVITY.STALLS_L2_MISS@) / tma_info_thread_clks", "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_= group;tma_memory_bound_group", "MetricName": "tma_l2_bound", @@ -2003,7 +1987,6 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled due to loads accesses to L3 cache or contended with a sibling Core", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(cpu_core@MEMORY_ACTIVITY.STALLS_L2_MISS@ - cpu_cor= e@MEMORY_ACTIVITY.STALLS_L3_MISS@) / tma_info_thread_clks", "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_= group;tma_memory_bound_group", "MetricName": "tma_l3_bound", @@ -2024,7 +2007,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= was stalled due to Length Changing Prefixes (LCPs)", - "MetricExpr": "DECODE.LCP / tma_info_thread_clks", + "MetricExpr": "cpu_core@DECODE.LCP@ / tma_info_thread_clks", "MetricGroup": "FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_= group;tma_issueFB", "MetricName": "tma_lcp", "MetricThreshold": "tma_lcp > 0.05 & (tma_fetch_latency > 0.1 & tm= a_frontend_bound > 0.15)", @@ -2045,7 +2028,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port for Load operations", - "MetricExpr": "UOPS_DISPATCHED.PORT_2_3_10 / (3 * tma_info_core_co= re_clks)", + "MetricExpr": "cpu_core@UOPS_DISPATCHED.PORT_2_3_10@ / (3 * tma_in= fo_core_core_clks)", "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group= ", "MetricName": "tma_load_op_utilization", "MetricThreshold": "tma_load_op_utilization > 0.6", @@ -2064,7 +2047,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = where the Second-level TLB (STLB) was missed by load accesses, performing a= hardware page walk", - "MetricExpr": "DTLB_LOAD_MISSES.WALK_ACTIVE / tma_info_thread_clks= ", + "MetricExpr": "cpu_core@DTLB_LOAD_MISSES.WALK_ACTIVE@ / tma_info_t= hread_clks", "MetricGroup": "MemoryTLB;TopdownL5;tma_L5_group;tma_dtlb_load_gro= up", "MetricName": "tma_load_stlb_miss", "MetricThreshold": "tma_load_stlb_miss > 0.05 & (tma_dtlb_load > 0= .1 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.= 2)))", @@ -2073,7 +2056,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU spent handling cache misses due to lock operations", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(16 * max(0, cpu_core@MEM_INST_RETIRED.LOCK_LOADS@ = - cpu_core@L2_RQSTS.ALL_RFO@) + cpu_core@MEM_INST_RETIRED.LOCK_LOADS@ / cpu= _core@MEM_INST_RETIRED.ALL_STORES@ * (10 * cpu_core@L2_RQSTS.RFO_HIT@ + min= (cpu_core@CPU_CLK_UNHALTED.THREAD@, cpu_core@OFFCORE_REQUESTS_OUTSTANDING.C= YCLES_WITH_DEMAND_RFO@))) / tma_info_thread_clks", "MetricGroup": "Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_l1= _bound_group", "MetricName": "tma_lock_latency", @@ -2136,6 +2118,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to LFENCE Instructions.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * cpu_core@MISC2_RETIRED.LFENCE@ / tma_info_thre= ad_clks", "MetricGroup": "TopdownL6;tma_L6_group;tma_serializing_operation_g= roup", "MetricName": "tma_memory_fence", @@ -2145,7 +2128,6 @@ }, { "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring memory operations -- uops for memory load or store a= ccesses.", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "tma_light_operations * cpu_core@MEM_UOP_RETIRED.ANY= @ / (tma_retiring * tma_info_thread_slots)", "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", "MetricName": "tma_memory_operations", @@ -2155,7 +2137,7 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU was retiring uops fetched by the Microcode Sequencer (MS) unit", - "MetricExpr": "UOPS_RETIRED.MS / tma_info_thread_slots", + "MetricExpr": "cpu_core@UOPS_RETIRED.MS@ / tma_info_thread_slots", "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_heavy_operatio= ns_group;tma_issueMC;tma_issueMS", "MetricName": "tma_microcode_sequencer", "MetricThreshold": "tma_microcode_sequencer > 0.05 & tma_heavy_ope= rations > 0.1", @@ -2225,7 +2207,6 @@ }, { "BriefDescription": "This metric represents the remaining light uo= ps fraction the CPU has executed - remaining means not covered by other sib= ling nodes", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_i= nt_operations + tma_memory_operations + tma_fused_instructions + tma_non_fu= sed_branches + tma_nop_instructions))", "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", "MetricName": "tma_other_light_ops", @@ -2246,7 +2227,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd b= ranch)", - "MetricExpr": "UOPS_DISPATCHED.PORT_0 / tma_info_core_core_clks", + "MetricExpr": "cpu_core@UOPS_DISPATCHED.PORT_0@ / tma_info_core_co= re_clks", "MetricGroup": "Compute;TopdownL6;tma_L6_group;tma_alu_op_utilizat= ion_group;tma_issue2P", "MetricName": "tma_port_0", "MetricThreshold": "tma_port_0 > 0.6", @@ -2256,7 +2237,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port 1 (ALU)", - "MetricExpr": "UOPS_DISPATCHED.PORT_1 / tma_info_core_core_clks", + "MetricExpr": "cpu_core@UOPS_DISPATCHED.PORT_1@ / tma_info_core_co= re_clks", "MetricGroup": "TopdownL6;tma_L6_group;tma_alu_op_utilization_grou= p;tma_issue2P", "MetricName": "tma_port_1", "MetricThreshold": "tma_port_1 > 0.6", @@ -2266,7 +2247,7 @@ }, { "BriefDescription": "This metric represents Core fraction of cycle= s CPU dispatched uops on execution port 6 ([HSW+]Primary Branch and simple = ALU)", - "MetricExpr": "UOPS_DISPATCHED.PORT_6 / tma_info_core_core_clks", + "MetricExpr": "cpu_core@UOPS_DISPATCHED.PORT_6@ / tma_info_core_co= re_clks", "MetricGroup": "TopdownL6;tma_L6_group;tma_alu_op_utilization_grou= p;tma_issue2P", "MetricName": "tma_port_6", "MetricThreshold": "tma_port_6 > 0.6", @@ -2296,7 +2277,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles whe= re the CPU executed total of 1 uop per cycle on all execution ports (Logica= l Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "EXE_ACTIVITY.1_PORTS_UTIL / tma_info_thread_clks", + "MetricExpr": "cpu_core@EXE_ACTIVITY.1_PORTS_UTIL@ / tma_info_thre= ad_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_1", "MetricThreshold": "tma_ports_utilized_1 > 0.2 & (tma_ports_utiliz= ation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -2306,7 +2287,8 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "EXE_ACTIVITY.2_PORTS_UTIL / tma_info_thread_clks", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", + "MetricExpr": "cpu_core@EXE_ACTIVITY.2_PORTS_UTIL@ / tma_info_thre= ad_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", "MetricThreshold": "tma_ports_utilized_2 > 0.15 & (tma_ports_utili= zation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -2316,7 +2298,8 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 3 or more uops per cycle on all execution ports (Logical= Processor cycles since ICL, Physical Core cycles otherwise)", - "MetricExpr": "UOPS_EXECUTED.CYCLES_GE_3 / tma_info_thread_clks", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", + "MetricExpr": "cpu_core@UOPS_EXECUTED.CYCLES_GE_3@ / tma_info_thre= ad_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_3m", "MetricThreshold": "tma_ports_utilized_3m > 0.7 & (tma_ports_utili= zation > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", @@ -2338,7 +2321,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU issue-pipeline was stalled due to serializing operations", - "MetricExpr": "RESOURCE_STALLS.SCOREBOARD / tma_info_thread_clks", + "MetricExpr": "cpu_core@RESOURCE_STALLS.SCOREBOARD@ / tma_info_thr= ead_clks", "MetricGroup": "PortsUtil;TopdownL5;tma_L5_group;tma_issueSO;tma_p= orts_utilized_0_group", "MetricName": "tma_serializing_operation", "MetricThreshold": "tma_serializing_operation > 0.1 & (tma_ports_u= tilized_0 > 0.2 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & t= ma_backend_bound > 0.2)))", @@ -2348,7 +2331,7 @@ }, { "BriefDescription": "This metric represents Shuffle (cross \"vecto= r lane\" data transfers) uops fraction the CPU has retired.", - "MetricExpr": "INT_VEC_RETIRED.SHUFFLES / (tma_retiring * tma_info= _thread_slots)", + "MetricExpr": "cpu_core@INT_VEC_RETIRED.SHUFFLES@ / (tma_retiring = * tma_info_thread_slots)", "MetricGroup": "HPC;Pipeline;TopdownL4;tma_L4_group;tma_int_operat= ions_group", "MetricName": "tma_shuffles", "MetricThreshold": "tma_shuffles > 0.1 & (tma_int_operations > 0.1= & tma_light_operations > 0.6)", @@ -2357,7 +2340,8 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to PAUSE Instructions", - "MetricExpr": "CPU_CLK_UNHALTED.PAUSE / tma_info_thread_clks", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", + "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.PAUSE@ / tma_info_thread_= clks", "MetricGroup": "TopdownL6;tma_L6_group;tma_serializing_operation_g= roup", "MetricName": "tma_slow_pause", "MetricThreshold": "tma_slow_pause > 0.05 & (tma_serializing_opera= tion > 0.1 & (tma_ports_utilized_0 > 0.2 & (tma_ports_utilization > 0.15 & = (tma_core_bound > 0.1 & tma_backend_bound > 0.2))))", @@ -2377,8 +2361,7 @@ }, { "BriefDescription": "This metric represents rate of split store ac= cesses", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", - "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_= clks", + "MetricExpr": "cpu_core@MEM_INST_RETIRED.SPLIT_STORES@ / tma_info_= core_core_clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bou= nd_group", "MetricName": "tma_split_stores", "MetricThreshold": "tma_split_stores > 0.2 & (tma_store_bound > 0.= 2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", @@ -2398,7 +2381,7 @@ }, { "BriefDescription": "This metric estimates how often CPU was stall= ed due to RFO store memory accesses; RFO store issue a read-for-ownership = request before the write", - "MetricExpr": "EXE_ACTIVITY.BOUND_ON_STORES / tma_info_thread_clks= ", + "MetricExpr": "cpu_core@EXE_ACTIVITY.BOUND_ON_STORES@ / tma_info_t= hread_clks", "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_me= mory_bound_group", "MetricName": "tma_store_bound", "MetricThreshold": "tma_store_bound > 0.2 & (tma_memory_bound > 0.= 2 & tma_backend_bound > 0.2)", @@ -2408,7 +2391,6 @@ }, { "BriefDescription": "This metric roughly estimates fraction of cyc= les when the memory subsystem had loads blocked since they could not forwar= d data from earlier (in program order) overlapping stores", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * cpu_core@LD_BLOCKS.STORE_FORWARD@ / tma_info_t= hread_clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_store_fwd_blk", @@ -2448,7 +2430,7 @@ }, { "BriefDescription": "This metric estimates the fraction of cycles = where the STLB was missed by store accesses, performing a hardware page wal= k", - "MetricExpr": "DTLB_STORE_MISSES.WALK_ACTIVE / tma_info_core_core_= clks", + "MetricExpr": "cpu_core@DTLB_STORE_MISSES.WALK_ACTIVE@ / tma_info_= core_core_clks", "MetricGroup": "MemoryTLB;TopdownL5;tma_L5_group;tma_dtlb_store_gr= oup", "MetricName": "tma_store_stlb_miss", "MetricThreshold": "tma_store_stlb_miss > 0.05 & (tma_dtlb_store >= 0.05 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_boun= d > 0.2)))", @@ -2467,7 +2449,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to new branch address clears", - "MetricExpr": "INT_MISC.UNKNOWN_BRANCH_CYCLES / tma_info_thread_cl= ks", + "MetricExpr": "cpu_core@INT_MISC.UNKNOWN_BRANCH_CYCLES@ / tma_info= _thread_clks", "MetricGroup": "BigFoot;FetchLat;TopdownL4;tma_L4_group;tma_branch= _resteers_group", "MetricName": "tma_unknown_branches", "MetricThreshold": "tma_unknown_branches > 0.05 & (tma_branch_rest= eers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15))", diff --git a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json b/= tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json index c150c14ac6ed..a35edf7d86a9 100644 --- a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json +++ b/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json @@ -195,7 +195,6 @@ }, { "BriefDescription": "Counts the number of cycles the core is stall= ed due to a demand load miss which hit in DRAM or MMIO (Non-DRAM).", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "MEM_BOUND_STALLS.LOAD_DRAM_HIT / tma_info_core_clks= - max((MEM_BOUND_STALLS.LOAD - LD_HEAD.L1_MISS_AT_RET) / tma_info_core_clk= s, 0) * MEM_BOUND_STALLS.LOAD_DRAM_HIT / MEM_BOUND_STALLS.LOAD", "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group", "MetricName": "tma_dram_bound", @@ -457,7 +456,6 @@ }, { "BriefDescription": "Counts the number of cycles a core is stalled= due to a demand load which hit in the L2 Cache.", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "MEM_BOUND_STALLS.LOAD_L2_HIT / tma_info_core_clks -= max((MEM_BOUND_STALLS.LOAD - LD_HEAD.L1_MISS_AT_RET) / tma_info_core_clks,= 0) * MEM_BOUND_STALLS.LOAD_L2_HIT / MEM_BOUND_STALLS.LOAD", "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group", "MetricName": "tma_l2_bound", @@ -466,7 +464,6 @@ }, { "BriefDescription": "Counts the number of cycles a core is stalled= due to a demand load which hit in the Last Level Cache (LLC) or other core= with HITE/F/M.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "MEM_BOUND_STALLS.LOAD_LLC_HIT / tma_info_core_clks = - max((MEM_BOUND_STALLS.LOAD - LD_HEAD.L1_MISS_AT_RET) / tma_info_core_clks= , 0) * MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_BOUND_STALLS.LOAD", "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group", "MetricName": "tma_l3_bound", @@ -683,7 +680,6 @@ }, { "BriefDescription": "Counts the number of cycles that the oldest l= oad of the load buffer is stalled at retirement due to a store forward bloc= k.", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "LD_HEAD.ST_ADDR_AT_RET / tma_info_core_clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_store_fwd_blk", diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json= b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json index e31a4aac9f20..56e54babcc26 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json @@ -400,7 +400,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to contested acces= ses", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(76 * tma_info_system_average_frequency * (MEM_LOAD= _L3_HIT_RETIRED.XSNP_FWD * (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM / (OCR.DEM= AND_DATA_RD.L3_HIT.SNOOP_HITM + OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FW= D))) + 75.5 * tma_info_system_average_frequency * MEM_LOAD_L3_HIT_RETIRED.X= SNP_MISS) * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / = tma_info_thread_clks", "MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;t= ma_issueSyncxn;tma_l3_bound_group", "MetricName": "tma_contested_accesses", @@ -421,7 +420,6 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to data-sharing ac= cesses", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "75.5 * tma_info_system_average_frequency * (MEM_LOA= D_L3_HIT_RETIRED.XSNP_NO_FWD + MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD * (1 - OCR.= DEMAND_DATA_RD.L3_HIT.SNOOP_HITM / (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM + = OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD))) * (1 + MEM_LOAD_RETIRED.FB_= HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_thread_clks", "MetricGroup": "Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSync= xn;tma_l3_bound_group", "MetricName": "tma_data_sharing", @@ -449,7 +447,6 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled on accesses to external memory (DRAM) by loads", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(MEMORY_ACTIVITY.STALLS_L3_MISS / tma_info_thread_c= lks - tma_pmm_bound if #has_pmem > 0 else MEMORY_ACTIVITY.STALLS_L3_MISS / = tma_info_thread_clks)", "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_me= mory_bound_group", "MetricName": "tma_dram_bound", @@ -656,7 +653,6 @@ }, { "BriefDescription": "Branch Misprediction Cost: Fraction of TMA sl= ots wasted per non-speculative branch misprediction (retired JEClear)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(tma_branch_mispredicts + tma_fetch_latency * tma_m= ispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_= misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) * tma_info_thread_sl= ots / BR_MISP_RETIRED.ALL_BRANCHES", "MetricGroup": "Bad;BrMispredicts;tma_issueBM", "MetricName": "tma_info_bad_spec_branch_misprediction_cost", @@ -699,7 +695,6 @@ }, { "BriefDescription": "Probability of Core Bound bottleneck hidden b= y SMT-profiling artifacts", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization = if tma_core_bound < tma_ports_utilization else 1) if tma_info_system_smt_2t= _utilization > 0.5 else 0)", "MetricGroup": "Cor;SMT", "MetricName": "tma_info_botlnk_l0_core_bound_likely", @@ -707,7 +702,6 @@ }, { "BriefDescription": "Total pipeline cost of DSB (uop cache) misses= - subset of the Instruction_Fetch_BW Bottleneck", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_fetch_latency * tma_dsb_switches / (tma_= branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + = tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_mite / (tma_dsb + tm= a_mite))", "MetricGroup": "DSBmiss;Fed;tma_issueFB", "MetricName": "tma_info_botlnk_l2_dsb_misses", @@ -716,7 +710,6 @@ }, { "BriefDescription": "Total pipeline cost of Instruction Cache miss= es - subset of the Big_Code Bottleneck", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / (tma= _branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses += tma_lcp + tma_ms_switches))", "MetricGroup": "Fed;FetchLat;IcMiss;tma_issueFL", "MetricName": "tma_info_botlnk_l2_ic_misses", @@ -725,7 +718,6 @@ }, { "BriefDescription": "Total pipeline cost of instruction fetch rela= ted bottlenecks by large code footprint programs (i-side cache; TLB and BTB= misses)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_ic= ache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switch= es + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)", "MetricGroup": "BigFoot;Fed;Frontend;IcMiss;MemoryTLB;tma_issueBC", "MetricName": "tma_info_bottleneck_big_code", @@ -742,7 +734,6 @@ }, { "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_frontend_bound - tma_fetch_latency * tma= _mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icach= e_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_info_bottlen= eck_big_code", "MetricGroup": "Fed;FetchBW;Frontend", "MetricName": "tma_info_bottleneck_instruction_fetch_bw", @@ -750,7 +741,6 @@ }, { "BriefDescription": "Total pipeline cost of (external) Memory Band= width related bottlenecks", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_= store_bound) * (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) = + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_pmm_bound + tma_store_bound) * (tma_sq_full / (tma_contested_acces= ses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full))) + tma_l1_bound= / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_b= ound + tma_store_bound) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma= _lock_latency + tma_split_loads + tma_store_fwd_blk))", "MetricGroup": "Mem;MemoryBW;Offcore;tma_issueBW", "MetricName": "tma_info_bottleneck_memory_bandwidth", @@ -759,7 +749,6 @@ }, { "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_memory_bound * (tma_l1_bound / max(tma_me= mory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + t= ma_pmm_bound + tma_store_bound) * (tma_dtlb_load / max(tma_l1_bound, tma_dt= lb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_= blk)) + tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + t= ma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_dtlb_store / (tma_dtl= b_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_st= reaming_stores)))", "MetricGroup": "Mem;MemoryTLB;Offcore;tma_issueTLB", "MetricName": "tma_info_bottleneck_memory_data_tlbs", @@ -768,7 +757,6 @@ }, { "BriefDescription": "Total pipeline cost of Memory Latency related= bottlenecks (external memory and off-core caches)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_= store_bound) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + = tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound= + tma_pmm_bound + tma_store_bound) * (tma_l3_hit_latency / (tma_contested_= accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_l2_b= ound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_p= mm_bound + tma_store_bound))", "MetricGroup": "Mem;MemoryLat;Offcore;tma_issueLat", "MetricName": "tma_info_bottleneck_memory_latency", @@ -777,7 +765,6 @@ }, { "BriefDescription": "Total pipeline cost of Branch Misprediction r= elated bottlenecks", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_branch_mispredicts + tma_fetch_latency *= tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_i= cache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))", "MetricGroup": "Bad;BadSpec;BrMispredicts;tma_issueBM", "MetricName": "tma_info_bottleneck_mispredictions", @@ -1301,6 +1288,7 @@ }, { "BriefDescription": "Average latency of data read request to exter= nal memory (in nanoseconds)", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "1e9 * (UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / UNC_CHA_= TOR_INSERTS.IA_MISS_DRD) / (tma_info_system_socket_clks / duration_time)", "MetricGroup": "Mem;MemoryLat;SoC", "MetricName": "tma_info_system_mem_read_latency", @@ -1455,7 +1443,6 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled due to L2 cache accesses by loads", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(MEMORY_ACTIVITY.STALLS_L1D_MISS - MEMORY_ACTIVITY.= STALLS_L2_MISS) / tma_info_thread_clks", "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_= group;tma_memory_bound_group", "MetricName": "tma_l2_bound", @@ -1465,7 +1452,6 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled due to loads accesses to L3 cache or contended with a sibling Core", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(MEMORY_ACTIVITY.STALLS_L2_MISS - MEMORY_ACTIVITY.S= TALLS_L3_MISS) / tma_info_thread_clks", "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_= group;tma_memory_bound_group", "MetricName": "tma_l3_bound", @@ -1538,7 +1524,6 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU spent handling cache misses due to lock operations", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(16 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS= .ALL_RFO) + MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES * (10= * L2_RQSTS.RFO_HIT + min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTAN= DING.CYCLES_WITH_DEMAND_RFO))) / tma_info_thread_clks", "MetricGroup": "Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_l1= _bound_group", "MetricName": "tma_lock_latency", @@ -1596,6 +1581,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to LFENCE Instructions.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * MISC2_RETIRED.LFENCE / tma_info_thread_clks", "MetricGroup": "TopdownL6;tma_L6_group;tma_serializing_operation_g= roup", "MetricName": "tma_memory_fence", @@ -1604,7 +1590,6 @@ }, { "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring memory operations -- uops for memory load or store a= ccesses.", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "tma_light_operations * MEM_UOP_RETIRED.ANY / (tma_r= etiring * tma_info_thread_slots)", "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", "MetricName": "tma_memory_operations", @@ -1676,7 +1661,6 @@ }, { "BriefDescription": "This metric represents the remaining light uo= ps fraction the CPU has executed - remaining means not covered by other sib= ling nodes", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_i= nt_operations + tma_memory_operations + tma_fused_instructions + tma_non_fu= sed_branches + tma_nop_instructions))", "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operatio= ns_group", "MetricName": "tma_other_light_ops", @@ -1758,6 +1742,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 2 uops per cycle on all execution ports (Logical Process= or cycles since ICL, Physical Core cycles otherwise)", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "EXE_ACTIVITY.2_PORTS_UTIL / tma_info_thread_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_p= orts_utilization_group", "MetricName": "tma_ports_utilized_2", @@ -1767,6 +1752,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles CPU= executed total of 3 or more uops per cycle on all execution ports (Logical= Processor cycles since ICL, Physical Core cycles otherwise)", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "UOPS_EXECUTED.CYCLES_GE_3 / tma_info_thread_clks", "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utiliza= tion_group", "MetricName": "tma_ports_utilized_3m", @@ -1822,6 +1808,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to PAUSE Instructions", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "CPU_CLK_UNHALTED.PAUSE / tma_info_thread_clks", "MetricGroup": "TopdownL6;tma_L6_group;tma_serializing_operation_g= roup", "MetricName": "tma_slow_pause", @@ -1840,7 +1827,6 @@ }, { "BriefDescription": "This metric represents rate of split store ac= cesses", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bou= nd_group", "MetricName": "tma_split_stores", @@ -1868,7 +1854,6 @@ }, { "BriefDescription": "This metric roughly estimates fraction of cyc= les when the memory subsystem had loads blocked since they could not forwar= d data from earlier (in program order) overlapping stores", - "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * LD_BLOCKS.STORE_FORWARD / tma_info_thread_clks= ", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_store_fwd_blk", --=20 2.43.0.472.g3155946c3a-goog