Date: Wed, 14 Dec 2022 22:47:25 -0800
In-Reply-To: <20221215064755.1620246-1-irogers@google.com>
Message-Id: <20221215064755.1620246-3-irogers@google.com>
Mime-Version: 1.0
References: <20221215064755.1620246-1-irogers@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Subject: [PATCH v1 02/32] perf vendor events intel: Refresh alderlake metrics
From: Ian Rogers
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
 Alexander Shishkin, Jiri Olsa, Namhyung Kim, Adrian Hunter, Kan Liang,
 Xing Zhengjun, linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: John Garry, Stephane Eranian, Perry Taylor, Caleb Biggers, Ian Rogers
Content-Transfer-Encoding: quoted-printable
X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Update the alderlake metrics using the new tooling from: https://github.com/intel/perfmon The metrics are unchanged but the formulas differ due to parentheses, use of exponents and removal of redundant operations like "* 1". Signed-off-by: Ian Rogers --- .../arch/x86/alderlake/adl-metrics.json | 116 +++++++++--------- 1 file changed, 58 insertions(+), 58 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json b/to= ols/perf/pmu-events/arch/x86/alderlake/adl-metrics.json index edf440e9359a..2eb3d7464d9f 100644 --- a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json @@ -10,7 +10,7 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU was stalled due to Frontend latency issues", - "MetricExpr": "(topdown\\-fetch\\-lat / (topdown\\-fe\\-bound + to= pdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.= UOP_DROPPING / SLOTS)", + "MetricExpr": "topdown\\-fetch\\-lat / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.U= OP_DROPPING / SLOTS", "MetricGroup": "Frontend;TopdownL2;tma_L2_group;tma_frontend_bound= _group", "MetricName": "tma_fetch_latency", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_= 16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS", @@ -46,7 +46,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to Branch Resteers as a result of Branch Misprediction= at execution stage", - "MetricExpr": "(tma_branch_mispredicts / tma_bad_speculation) * IN= T_MISC.CLEAR_RESTEER_CYCLES / CLKS", + "MetricExpr": "tma_branch_mispredicts / tma_bad_speculation * INT_= MISC.CLEAR_RESTEER_CYCLES / CLKS", "MetricGroup": "BadSpec;BrMispredicts;TopdownL4;tma_branch_resteer= s_group", "MetricName": "tma_mispredicts_resteers", "PublicDescription": "This metric represents fraction of cycles th= e CPU was stalled due to Branch Resteers as a result of Branch Mispredictio= n at execution stage. Sample with: INT_MISC.CLEAR_RESTEER_CYCLES", @@ -55,7 +55,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU was stalled due to Branch Resteers as a result of Machine Clears", - "MetricExpr": "(1 - (tma_branch_mispredicts / tma_bad_speculation)= ) * INT_MISC.CLEAR_RESTEER_CYCLES / CLKS", + "MetricExpr": "(1 - tma_branch_mispredicts / tma_bad_speculation) = * INT_MISC.CLEAR_RESTEER_CYCLES / CLKS", "MetricGroup": "BadSpec;MachineClears;TopdownL4;tma_branch_resteer= s_group", "MetricName": "tma_clears_resteers", "PublicDescription": "This metric represents fraction of cycles th= e CPU was stalled due to Branch Resteers as a result of Machine Clears. 
Sa= mple with: INT_MISC.CLEAR_RESTEER_CYCLES", @@ -153,7 +153,7 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Branch Misprediction", - "MetricExpr": "topdown\\-br\\-mispredict / (topdown\\-fe\\-bound += topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0*SLOT= S", + "MetricExpr": "topdown\\-br\\-mispredict / (topdown\\-fe\\-bound += topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * SL= OTS", "MetricGroup": "BadSpec;BrMispredicts;TopdownL2;tma_L2_group;tma_b= ad_speculation_group", "MetricName": "tma_branch_mispredicts", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: TOPDOWN.BR_MISPREDICT_SLOTS", @@ -171,7 +171,7 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", - "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0*SLOTS", + "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * SLOTS", "MetricGroup": "TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", @@ -180,7 +180,7 @@ }, { "BriefDescription": "This metric represents fraction of slots the = Memory subsystem within the Backend was a bottleneck", - "MetricExpr": "topdown\\-mem\\-bound / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0*SLOTS", + "MetricExpr": "topdown\\-mem\\-bound / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * SLOTS", "MetricGroup": "Backend;TopdownL2;tma_L2_group;tma_backend_bound_g= roup", "MetricName": "tma_memory_bound", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. 
This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", @@ -232,7 +232,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles the= CPU spent handling cache misses due to lock operations", - "MetricExpr": "(16 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS= .ALL_RFO) + (MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES) * (= 10 * L2_RQSTS.RFO_HIT + min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTST= ANDING.CYCLES_WITH_DEMAND_RFO))) / CLKS", + "MetricExpr": "(16 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS= .ALL_RFO) + MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES * (10= * L2_RQSTS.RFO_HIT + min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTAN= DING.CYCLES_WITH_DEMAND_RFO))) / CLKS", "MetricGroup": "Offcore;TopdownL4;tma_l1_bound_group", "MetricName": "tma_lock_latency", "PublicDescription": "This metric represents fraction of cycles th= e CPU spent handling cache misses due to lock operations. Due to the microa= rchitecture handling of locks; they are classified as L1_Bound regardless o= f what memory source satisfied them. Sample with: MEM_INST_RETIRED.LOCK_LOA= DS_PS", @@ -277,7 +277,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to contested acces= ses", - "MetricExpr": "((25 * Average_Frequency) * (MEM_LOAD_L3_HIT_RETIRE= D.XSNP_FWD * (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM / (OCR.DEMAND_DATA_RD.L3= _HIT.SNOOP_HITM + OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD))) + (24 * A= verage_Frequency) * MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS) * (1 + (MEM_LOAD_RET= IRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) / 2) / CLKS", + "MetricExpr": "(25 * Average_Frequency * (MEM_LOAD_L3_HIT_RETIRED.= XSNP_FWD * (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM / (OCR.DEMAND_DATA_RD.L3_H= IT.SNOOP_HITM + OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD))) + 24 * Aver= age_Frequency * MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS) * (1 + MEM_LOAD_RETIRED.= FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / CLKS", "MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_l3_bound_g= roup", "MetricName": "tma_contested_accesses", "PublicDescription": "This metric estimates fraction of cycles whi= le the memory subsystem was handling synchronizations due to contested acce= sses. Contested accesses occur when data written by one Logical Processor a= re read by another Logical Processor on a different Physical Core. Examples= of contested accesses include synchronizations such as locks; true data sh= aring such as modified locked variables; and false sharing. 
Sample with: ME= M_LOAD_L3_HIT_RETIRED.XSNP_FWD;MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS", @@ -286,7 +286,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles whil= e the memory subsystem was handling synchronizations due to data-sharing ac= cesses", - "MetricExpr": "(24 * Average_Frequency) * (MEM_LOAD_L3_HIT_RETIRED= .XSNP_NO_FWD + MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD * (1 - (OCR.DEMAND_DATA_RD.= L3_HIT.SNOOP_HITM / (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM + OCR.DEMAND_DATA= _RD.L3_HIT.SNOOP_HIT_WITH_FWD)))) * (1 + (MEM_LOAD_RETIRED.FB_HIT / MEM_LOA= D_RETIRED.L1_MISS) / 2) / CLKS", + "MetricExpr": "24 * Average_Frequency * (MEM_LOAD_L3_HIT_RETIRED.X= SNP_NO_FWD + MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD * (1 - OCR.DEMAND_DATA_RD.L3_= HIT.SNOOP_HITM / (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM + OCR.DEMAND_DATA_RD= .L3_HIT.SNOOP_HIT_WITH_FWD))) * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RET= IRED.L1_MISS / 2) / CLKS", "MetricGroup": "Offcore;Snoop;TopdownL4;tma_l3_bound_group", "MetricName": "tma_data_sharing", "PublicDescription": "This metric estimates fraction of cycles whi= le the memory subsystem was handling synchronizations due to data-sharing a= ccesses. Data shared by multiple Logical Processors (even just read shared)= may cause increased access latency due to cache coherency. Excessive data = sharing can drastically harm multithreaded performance. Sample with: MEM_LO= AD_L3_HIT_RETIRED.XSNP_NO_FWD", @@ -295,7 +295,7 @@ }, { "BriefDescription": "This metric represents fraction of cycles wit= h demand load accesses that hit the L3 cache under unloaded scenarios (poss= ibly L3 latency limited)", - "MetricExpr": "(9 * Average_Frequency) * MEM_LOAD_RETIRED.L3_HIT *= (1 + (MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) / 2) / CLKS", + "MetricExpr": "9 * Average_Frequency * MEM_LOAD_RETIRED.L3_HIT * (= 1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / CLKS", "MetricGroup": "MemoryLat;TopdownL4;tma_l3_bound_group", "MetricName": "tma_l3_hit_latency", "PublicDescription": "This metric represents fraction of cycles wi= th demand load accesses that hit the L3 cache under unloaded scenarios (pos= sibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L= 3 hits) will improve the latency; reduce contention with sibling physical c= ores and increase performance. Note the value of this node may overlap wit= h its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS", @@ -313,7 +313,7 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled on accesses to external memory (DRAM) by loads", - "MetricExpr": "(MEMORY_ACTIVITY.STALLS_L3_MISS / CLKS)", + "MetricExpr": "MEMORY_ACTIVITY.STALLS_L3_MISS / CLKS", "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_memory_bound_gr= oup", "MetricName": "tma_dram_bound", "PublicDescription": "This metric estimates how often the CPU was = stalled on accesses to external memory (DRAM) by loads. Better caching can = improve the latency and increase performance. 
Sample with: MEM_LOAD_RETIRED= .L3_MISS_PS", @@ -340,7 +340,7 @@ }, { "BriefDescription": "This metric estimates how often CPU was stall= ed due to RFO store memory accesses; RFO store issue a read-for-ownership = request before the write", - "MetricExpr": "EXE_ACTIVITY.BOUND_ON_STORES / CLKS", + "MetricExpr": "tma_st_buffer", "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_memory_bound_gr= oup", "MetricName": "tma_store_bound", "PublicDescription": "This metric estimates how often CPU was stal= led due to RFO store memory accesses; RFO store issue a read-for-ownership= request before the write. Even though store accesses do not typically stal= l out-of-order CPUs; there are few cases where stores can lead to actual st= alls. This metric will be flagged should RFO stores be a bottleneck. Sample= with: MEM_INST_RETIRED.ALL_STORES_PS", @@ -349,7 +349,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU spent handling L1D store misses", - "MetricExpr": "((MEM_STORE_RETIRED.L2_HIT * 10 * (1 - (MEM_INST_RE= TIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES))) + (1 - (MEM_INST_RETIRED.= LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES)) * min(CPU_CLK_UNHALTED.THREAD, O= FFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO)) / CLKS", + "MetricExpr": "(MEM_STORE_RETIRED.L2_HIT * 10 * (1 - MEM_INST_RETI= RED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES) + (1 - MEM_INST_RETIRED.LOCK_= LOADS / MEM_INST_RETIRED.ALL_STORES) * min(CPU_CLK_UNHALTED.THREAD, OFFCORE= _REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO)) / CLKS", "MetricGroup": "MemoryLat;Offcore;TopdownL4;tma_store_bound_group", "MetricName": "tma_store_latency", "PublicDescription": "This metric estimates fraction of cycles the= CPU spent handling L1D store misses. Store accesses usually less impact ou= t-of-order core performance; however; holding resources for longer time can= lead into undesired implications (e.g. contention on L1D fill-buffer entri= es - see FB_Full)", @@ -358,7 +358,7 @@ }, { "BriefDescription": "This metric roughly estimates how often CPU w= as handling synchronizations due to False Sharing", - "MetricExpr": "(28 * Average_Frequency) * OCR.DEMAND_RFO.L3_HIT.SN= OOP_HITM / CLKS", + "MetricExpr": "28 * Average_Frequency * OCR.DEMAND_RFO.L3_HIT.SNOO= P_HITM / CLKS", "MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_store_boun= d_group", "MetricName": "tma_false_sharing", "PublicDescription": "This metric roughly estimates how often CPU = was handling synchronizations due to False Sharing. False Sharing is a mult= ithreading hiccup; where multiple Logical Processors contend on different d= ata-elements mapped into the same cache line. 
Sample with: OCR.DEMAND_RFO.= L3_HIT.SNOOP_HITM", @@ -428,7 +428,7 @@ }, { "BriefDescription": "This metric estimates fraction of cycles the = CPU performance was potentially limited due to Core computation issues (non= divider-related)", - "MetricExpr": "(cpu_core@EXE_ACTIVITY.3_PORTS_UTIL\\,umask\\=3D0x8= 0@ + (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring * cpu_core@EXE_ACTIVITY.2_PO= RTS_UTIL\\,umask\\=3D0xc@)) / CLKS if (ARITH.DIVIDER_ACTIVE < (CYCLE_ACTIVI= TY.STALLS_TOTAL - EXE_ACTIVITY.BOUND_ON_LOADS)) else (EXE_ACTIVITY.1_PORTS_= UTIL + tma_retiring * cpu_core@EXE_ACTIVITY.2_PORTS_UTIL\\,umask\\=3D0xc@) = / CLKS", + "MetricExpr": "((cpu_core@EXE_ACTIVITY.3_PORTS_UTIL\\,umask\\=3D0x= 80@ + (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring * cpu_core@EXE_ACTIVITY.2_P= ORTS_UTIL\\,umask\\=3D0xc@)) / CLKS if ARITH.DIVIDER_ACTIVE < CYCLE_ACTIVIT= Y.STALLS_TOTAL - EXE_ACTIVITY.BOUND_ON_LOADS else (EXE_ACTIVITY.1_PORTS_UTI= L + tma_retiring * cpu_core@EXE_ACTIVITY.2_PORTS_UTIL\\,umask\\=3D0xc@) / C= LKS)", "MetricGroup": "PortsUtil;TopdownL3;tma_core_bound_group", "MetricName": "tma_ports_utilization", "PublicDescription": "This metric estimates fraction of cycles the= CPU performance was potentially limited due to Core computation issues (no= n divider-related). Two distinct categories can be attributed into this me= tric: (1) heavy data-dependency among contiguous instructions would manifes= t in this metric - such cases are often referred to as low Instruction Leve= l Parallelism (ILP). (2) Contention on some hardware execution unit other t= han Divider. For example; when there are too many multiply operations.", @@ -556,7 +556,7 @@ }, { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", - "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0*SLOTS", + "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * SLOTS", "MetricGroup": "TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. 
Sample with: UOPS_RETIRE= D.SLOTS", @@ -704,7 +704,7 @@ }, { "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring heavy-weight operations -- instructions that require= two or more uops or microcoded sequences", - "MetricExpr": "topdown\\-heavy\\-ops / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0*SLOTS", + "MetricExpr": "topdown\\-heavy\\-ops / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * SLOTS", "MetricGroup": "Retire;TopdownL2;tma_L2_group;tma_retiring_group", "MetricName": "tma_heavy_operations", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or microcoded sequences. This highly-correlates with the= uop length of these instructions/sequences. Sample with: UOPS_RETIRED.HEAV= Y", @@ -722,7 +722,7 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU was retiring uops fetched by the Microcode Sequencer (MS) unit", - "MetricExpr": "UOPS_RETIRED.MS / SLOTS", + "MetricExpr": "tma_ms_uops", "MetricGroup": "MicroSeq;TopdownL3;tma_heavy_operations_group", "MetricName": "tma_microcode_sequencer", "PublicDescription": "This metric represents fraction of slots the= CPU was retiring uops fetched by the Microcode Sequencer (MS) unit. The M= S is used for CISC instructions not supported by the default decoders (like= repeat move strings; or CPUID); or by microcode assists used to address so= me operation modes (like in Floating Point assists). These cases can often = be avoided. Sample with: UOPS_RETIRED.MS", @@ -782,21 +782,21 @@ }, { "BriefDescription": "Total pipeline cost of (external) Memory Band= width related bottlenecks", - "MetricExpr": "100 * tma_memory_bound * ((tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + (tma_l3_boun= d / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_stor= e_bound)) * (tma_sq_full / (tma_contested_accesses + tma_data_sharing + tma= _l3_hit_latency + tma_sq_full))) + (tma_l1_bound / (tma_dram_bound + tma_l1= _bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_fb_full / (= tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_stor= e_fwd_blk)) ", + "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (= tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound /= (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_b= ound) * (tma_sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_= hit_latency + tma_sq_full))) + tma_l1_bound / (tma_dram_bound + tma_l1_boun= d + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_fb_full / (tma_dt= lb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_= blk))", "MetricGroup": "Mem;MemoryBW;Offcore", "MetricName": "Memory_Bandwidth", "Unit": "cpu_core" }, { "BriefDescription": "Total pipeline cost of Memory Latency related= bottlenecks (external memory and off-core caches)", - "MetricExpr": "100 * tma_memory_bound * ((tma_dram_bound / (tma_dr= am_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) *= (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + (tma_l3_bound = / (tma_dram_bound + tma_l1_bound + tma_l2_bound 
+ tma_l3_bound + tma_store_= bound)) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing = + tma_l3_hit_latency + tma_sq_full)) + (tma_l2_bound / (tma_dram_bound + tm= a_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)))", + "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (= tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (= tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bou= nd) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tm= a_l3_hit_latency + tma_sq_full)) + tma_l2_bound / (tma_dram_bound + tma_l1_= bound + tma_l2_bound + tma_l3_bound + tma_store_bound))", "MetricGroup": "Mem;MemoryLat;Offcore", "MetricName": "Memory_Latency", "Unit": "cpu_core" }, { "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", - "MetricExpr": "100 * tma_memory_bound * ((tma_l1_bound / max(tma_m= emory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + = tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, tma_dtlb_load + tma_= fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + (tma_s= tore_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound += tma_store_bound)) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing = + tma_split_stores + tma_store_latency + tma_streaming_stores))) ", + "MetricExpr": "100 * tma_memory_bound * (tma_l1_bound / max(tma_me= mory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + t= ma_store_bound) * (tma_dtlb_load / max(tma_l1_bound, tma_dtlb_load + tma_fb= _full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_stor= e_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tm= a_store_bound) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tm= a_split_stores + tma_store_latency + tma_streaming_stores)))", "MetricGroup": "Mem;MemoryTLB;Offcore", "MetricName": "Memory_Data_TLBs", "Unit": "cpu_core" @@ -831,14 +831,14 @@ }, { "BriefDescription": "Uops Per Instruction", - "MetricExpr": "(tma_retiring * SLOTS) / INST_RETIRED.ANY", + "MetricExpr": "tma_retiring * SLOTS / INST_RETIRED.ANY", "MetricGroup": "Pipeline;Ret;Retire", "MetricName": "UPI", "Unit": "cpu_core" }, { "BriefDescription": "Instruction per taken branch", - "MetricExpr": "(tma_retiring * SLOTS) / BR_INST_RETIRED.NEAR_TAKEN= ", + "MetricExpr": "tma_retiring * SLOTS / BR_INST_RETIRED.NEAR_TAKEN", "MetricGroup": "Branches;Fed;FetchBW", "MetricName": "UpTB", "Unit": "cpu_core" @@ -866,7 +866,7 @@ }, { "BriefDescription": "Fraction of Physical Core issue-slots utilize= d by this Logical Processor", - "MetricExpr": "SLOTS / (TOPDOWN.SLOTS / 2) if #SMT_on else 1", + "MetricExpr": "(SLOTS / (TOPDOWN.SLOTS / 2) if #SMT_on else 1)", "MetricGroup": "SMT;tma_L1_group", "MetricName": "Slots_Utilization", "Unit": "cpu_core" @@ -888,7 +888,7 @@ }, { "BriefDescription": "Floating Point Operations Per Cycle", - "MetricExpr": "(1 * (FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARIT= H_INST_RETIRED.SCALAR_DOUBLE) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBL= E + 4 * (FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.2= 56B_PACKED_DOUBLE) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) / CORE_C= LKS", + "MetricExpr": "(FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INS= T_RETIRED.SCALAR_DOUBLE + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 = * 
(FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PA= CKED_DOUBLE) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) / CORE_CLKS", "MetricGroup": "Flops;Ret", "MetricName": "FLOPc", "Unit": "cpu_core" @@ -903,14 +903,14 @@ }, { "BriefDescription": "Instruction-Level-Parallelism (average number= of uops executed when there is execution) per-core", - "MetricExpr": "UOPS_EXECUTED.THREAD / ((UOPS_EXECUTED.CORE_CYCLES_= GE_1 / 2) if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)", + "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_G= E_1 / 2 if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)", "MetricGroup": "Backend;Cor;Pipeline;PortsUtil", "MetricName": "ILP", "Unit": "cpu_core" }, { "BriefDescription": "Probability of Core Bound bottleneck hidden b= y SMT-profiling artifacts", - "MetricExpr": "(1 - tma_core_bound / tma_ports_utilization if tma_= core_bound < tma_ports_utilization else 1) if SMT_2T_Utilization > 0.5 else= 0", + "MetricExpr": "((1 - tma_core_bound / tma_ports_utilization if tma= _core_bound < tma_ports_utilization else 1) if SMT_2T_Utilization > 0.5 els= e 0)", "MetricGroup": "Cor;SMT", "MetricName": "Core_Bound_Likely", "Unit": "cpu_core" @@ -966,14 +966,14 @@ }, { "BriefDescription": "Instructions per Floating Point (FP) Operatio= n (lower number means higher occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / (1 * (FP_ARITH_INST_RETIRED.SCAL= AR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE) + 2 * FP_ARITH_INST_RETIRE= D.128B_PACKED_DOUBLE + 4 * (FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_A= RITH_INST_RETIRED.256B_PACKED_DOUBLE) + 8 * FP_ARITH_INST_RETIRED.256B_PACK= ED_SINGLE)", + "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.SCALAR_SI= NGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE + 2 * FP_ARITH_INST_RETIRED.128B= _PACKED_DOUBLE + 4 * (FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_I= NST_RETIRED.256B_PACKED_DOUBLE) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SIN= GLE)", "MetricGroup": "Flops;InsType", "MetricName": "IpFLOP", "Unit": "cpu_core" }, { "BriefDescription": "Instructions per FP Arithmetic instruction (l= ower number means higher occurrence rate)", - "MetricExpr": "INST_RETIRED.ANY / ((FP_ARITH_INST_RETIRED.SCALAR_S= INGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE) + (FP_ARITH_INST_RETIRED.128B_= PACKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RE= TIRED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE))", + "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.SCALAR_SI= NGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE + (FP_ARITH_INST_RETIRED.128B_PA= CKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETI= RED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE))", "MetricGroup": "Flops;InsType", "MetricName": "IpArith", "PublicDescription": "Instructions per FP Arithmetic instruction (= lower number means higher occurrence rate). May undercount due to FMA doubl= e counting. 
Approximated prior to BDW.", @@ -1027,7 +1027,7 @@ }, { "BriefDescription": "Average number of Uops retired in cycles wher= e at least one uop has retired.", - "MetricExpr": "(tma_retiring * SLOTS) / cpu_core@UOPS_RETIRED.SLOT= S\\,cmask\\=3D1@", + "MetricExpr": "tma_retiring * SLOTS / cpu_core@UOPS_RETIRED.SLOTS\= \,cmask\\=3D1@", "MetricGroup": "Pipeline;Ret", "MetricName": "Retire", "Unit": "cpu_core" @@ -1104,7 +1104,7 @@ }, { "BriefDescription": "Branch Misprediction Cost: Fraction of TMA sl= ots wasted per non-speculative branch misprediction (retired JEClear)", - "MetricExpr": " (tma_branch_mispredicts + tma_fetch_latency * tma_= mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache= _misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) * SLOTS / BR_MISP_R= ETIRED.ALL_BRANCHES", + "MetricExpr": "(tma_branch_mispredicts + tma_fetch_latency * tma_m= ispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_= misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) * SLOTS / BR_MISP_RE= TIRED.ALL_BRANCHES", "MetricGroup": "Bad;BrMispredicts", "MetricName": "Branch_Misprediction_Cost", "Unit": "cpu_core" @@ -1160,63 +1160,63 @@ }, { "BriefDescription": "L1 cache true misses per kilo instruction for= retired demand loads", - "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY", + "MetricExpr": "1e3 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY", "MetricGroup": "CacheMisses;Mem", "MetricName": "L1MPKI", "Unit": "cpu_core" }, { "BriefDescription": "L1 cache true misses per kilo instruction for= all demand loads (including speculative)", - "MetricExpr": "1000 * L2_RQSTS.ALL_DEMAND_DATA_RD / INST_RETIRED.A= NY", + "MetricExpr": "1e3 * L2_RQSTS.ALL_DEMAND_DATA_RD / INST_RETIRED.AN= Y", "MetricGroup": "CacheMisses;Mem", "MetricName": "L1MPKI_Load", "Unit": "cpu_core" }, { "BriefDescription": "L2 cache true misses per kilo instruction for= retired demand loads", - "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricExpr": "1e3 * MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY", "MetricGroup": "Backend;CacheMisses;Mem", "MetricName": "L2MPKI", "Unit": "cpu_core" }, { "BriefDescription": "L2 cache ([RKL+] true) misses per kilo instru= ction for all request types (including speculative)", - "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY", + "MetricExpr": "1e3 * L2_RQSTS.MISS / INST_RETIRED.ANY", "MetricGroup": "CacheMisses;Mem;Offcore", "MetricName": "L2MPKI_All", "Unit": "cpu_core" }, { "BriefDescription": "L2 cache ([RKL+] true) misses per kilo instru= ction for all demand loads (including speculative)", - "MetricExpr": "1000 * L2_RQSTS.DEMAND_DATA_RD_MISS / INST_RETIRED.= ANY", + "MetricExpr": "1e3 * L2_RQSTS.DEMAND_DATA_RD_MISS / INST_RETIRED.A= NY", "MetricGroup": "CacheMisses;Mem", "MetricName": "L2MPKI_Load", "Unit": "cpu_core" }, { "BriefDescription": "L2 cache hits per kilo instruction for all re= quest types (including speculative)", - "MetricExpr": "1000 * (L2_RQSTS.REFERENCES - L2_RQSTS.MISS) / INST= _RETIRED.ANY", + "MetricExpr": "1e3 * (L2_RQSTS.REFERENCES - L2_RQSTS.MISS) / INST_= RETIRED.ANY", "MetricGroup": "CacheMisses;Mem", "MetricName": "L2HPKI_All", "Unit": "cpu_core" }, { "BriefDescription": "L2 cache hits per kilo instruction for all de= mand loads (including speculative)", - "MetricExpr": "1000 * L2_RQSTS.DEMAND_DATA_RD_HIT / INST_RETIRED.A= NY", + "MetricExpr": "1e3 * L2_RQSTS.DEMAND_DATA_RD_HIT / INST_RETIRED.AN= Y", "MetricGroup": "CacheMisses;Mem", "MetricName": "L2HPKI_Load", "Unit": 
"cpu_core" }, { "BriefDescription": "L3 cache true misses per kilo instruction for= retired demand loads", - "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY", + "MetricExpr": "1e3 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY", "MetricGroup": "CacheMisses;Mem", "MetricName": "L3MPKI", "Unit": "cpu_core" }, { "BriefDescription": "Fill Buffer (FB) hits per kilo instructions f= or retired demand loads (L1D misses that merge into ongoing miss-handling e= ntries)", - "MetricExpr": "1000 * MEM_LOAD_RETIRED.FB_HIT / INST_RETIRED.ANY", + "MetricExpr": "1e3 * MEM_LOAD_RETIRED.FB_HIT / INST_RETIRED.ANY", "MetricGroup": "CacheMisses;Mem", "MetricName": "FB_HPKI", "Unit": "cpu_core" @@ -1231,28 +1231,28 @@ }, { "BriefDescription": "Average per-core data fill bandwidth to the L= 1 data cache [GB / sec]", - "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", + "MetricExpr": "64 * L1D.REPLACEMENT / 1e9 / duration_time", "MetricGroup": "Mem;MemoryBW", "MetricName": "L1D_Cache_Fill_BW", "Unit": "cpu_core" }, { "BriefDescription": "Average per-core data fill bandwidth to the L= 2 cache [GB / sec]", - "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", + "MetricExpr": "64 * L2_LINES_IN.ALL / 1e9 / duration_time", "MetricGroup": "Mem;MemoryBW", "MetricName": "L2_Cache_Fill_BW", "Unit": "cpu_core" }, { "BriefDescription": "Average per-core data fill bandwidth to the L= 3 cache [GB / sec]", - "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration= _time", + "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1e9 / duration_time", "MetricGroup": "Mem;MemoryBW", "MetricName": "L3_Cache_Fill_BW", "Unit": "cpu_core" }, { "BriefDescription": "Average per-core data access bandwidth to the= L3 cache [GB / sec]", - "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / d= uration_time", + "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1e9 / duration= _time", "MetricGroup": "Mem;MemoryBW;Offcore", "MetricName": "L3_Cache_Access_BW", "Unit": "cpu_core" @@ -1294,14 +1294,14 @@ }, { "BriefDescription": "Measured Average Frequency for unhalted proce= ssors [GHz]", - "MetricExpr": "Turbo_Utilization * TSC / 1000000000 / duration_tim= e", + "MetricExpr": "Turbo_Utilization * TSC / 1e9 / duration_time", "MetricGroup": "Power;Summary", "MetricName": "Average_Frequency", "Unit": "cpu_core" }, { "BriefDescription": "Giga Floating Point Operations Per Second", - "MetricExpr": "((1 * (FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARI= TH_INST_RETIRED.SCALAR_DOUBLE) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUB= LE + 4 * (FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.= 256B_PACKED_DOUBLE) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) / 10000= 00000) / duration_time", + "MetricExpr": "(FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INS= T_RETIRED.SCALAR_DOUBLE + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 = * (FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PA= CKED_DOUBLE) + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) / 1e9 / durati= on_time", "MetricGroup": "Cor;Flops;HPC", "MetricName": "GFLOPs", "PublicDescription": "Giga Floating Point Operations Per Second. 
A= ggregate across all supported options of: FP precisions, scalar and vector = instructions, vector-width and AMX engine.", @@ -1316,7 +1316,7 @@ }, { "BriefDescription": "Fraction of cycles where both hardware Logica= l Processors were active", - "MetricExpr": "1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UN= HALTED.REF_DISTRIBUTED if #SMT_on else 0", + "MetricExpr": "(1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_U= NHALTED.REF_DISTRIBUTED if #SMT_on else 0)", "MetricGroup": "SMT", "MetricName": "SMT_2T_Utilization", "Unit": "cpu_core" @@ -1337,7 +1337,7 @@ }, { "BriefDescription": "Average external Memory Bandwidth Use for rea= ds and writes [GB / sec]", - "MetricExpr": "64 * (UNC_ARB_TRK_REQUESTS.ALL + UNC_ARB_COH_TRK_RE= QUESTS.ALL) / 1000000 / duration_time / 1000", + "MetricExpr": "64 * (UNC_ARB_TRK_REQUESTS.ALL + UNC_ARB_COH_TRK_RE= QUESTS.ALL) / 1e6 / duration_time / 1e3", "MetricGroup": "HPC;Mem;MemoryBW;SoC", "MetricName": "DRAM_BW_Use", "Unit": "cpu_core" @@ -1365,7 +1365,7 @@ }, { "BriefDescription": "Uncore frequency per die [GHZ]", - "MetricExpr": "Socket_CLKS / #num_dies / duration_time / 100000000= 0", + "MetricExpr": "Socket_CLKS / #num_dies / duration_time / 1e9", "MetricGroup": "SoC", "MetricName": "UNCORE_FREQ" }, @@ -1558,7 +1558,7 @@ }, { "BriefDescription": "Counts the number of cycles the core is stall= ed due to stores or loads. ", - "MetricExpr": "min((TOPDOWN_BE_BOUND.ALL / SLOTS), (LD_HEAD.ANY_AT= _RET / CLKS) + tma_store_bound)", + "MetricExpr": "min(tma_backend_bound, LD_HEAD.ANY_AT_RET / CLKS + = tma_store_bound)", "MetricGroup": "TopdownL2;tma_backend_bound_group", "MetricName": "tma_load_store_bound", "ScaleUnit": "100%", @@ -1566,7 +1566,7 @@ }, { "BriefDescription": "Counts the number of cycles the core is stall= ed due to store buffer full.", - "MetricExpr": "tma_mem_scheduler * (MEM_SCHEDULER_BLOCK.ST_BUF / M= EM_SCHEDULER_BLOCK.ALL)", + "MetricExpr": "tma_st_buffer", "MetricGroup": "TopdownL3;tma_load_store_bound_group", "MetricName": "tma_store_bound", "ScaleUnit": "100%", @@ -1614,7 +1614,7 @@ }, { "BriefDescription": "Counts the number of cycles a core is stalled= due to a demand load which hit in the L2 Cache.", - "MetricExpr": "(MEM_BOUND_STALLS.LOAD_L2_HIT / CLKS) - (MEM_BOUND_= STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_L2_HIT / MEM_BOUND_STALLS.= LOAD)", + "MetricExpr": "MEM_BOUND_STALLS.LOAD_L2_HIT / CLKS - MEM_BOUND_STA= LLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_L2_HIT / MEM_BOUND_STALLS.LOA= D", "MetricGroup": "TopdownL3;tma_load_store_bound_group", "MetricName": "tma_l2_bound", "ScaleUnit": "100%", @@ -1622,7 +1622,7 @@ }, { "BriefDescription": "Counts the number of cycles a core is stalled= due to a demand load which hit in the Last Level Cache (LLC) or other core= with HITE/F/M.", - "MetricExpr": "(MEM_BOUND_STALLS.LOAD_LLC_HIT / CLKS) - (MEM_BOUND= _STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_BOUND_STALL= S.LOAD)", + "MetricExpr": "MEM_BOUND_STALLS.LOAD_LLC_HIT / CLKS - MEM_BOUND_ST= ALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_BOUND_STALLS.L= OAD", "MetricGroup": "TopdownL3;tma_load_store_bound_group", "MetricName": "tma_l3_bound", "ScaleUnit": "100%", @@ -1630,7 +1630,7 @@ }, { "BriefDescription": "Counts the number of cycles the core is stall= ed due to a demand load miss which hit in DRAM or MMIO (Non-DRAM).", - "MetricExpr": "(MEM_BOUND_STALLS.LOAD_DRAM_HIT / CLKS) - (MEM_BOUN= D_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_DRAM_HIT / MEM_BOUND_STA= LLS.LOAD)", + 
"MetricExpr": "MEM_BOUND_STALLS.LOAD_DRAM_HIT / CLKS - MEM_BOUND_S= TALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_DRAM_HIT / MEM_BOUND_STALLS= .LOAD", "MetricGroup": "TopdownL3;tma_load_store_bound_group", "MetricName": "tma_dram_bound", "ScaleUnit": "100%", @@ -1939,25 +1939,25 @@ }, { "BriefDescription": "Percent of instruction miss cost that hit in = the L2", - "MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_L2_HIT / (MEM_BOUND_S= TALLS.IFETCH)", + "MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_L2_HIT / MEM_BOUND_ST= ALLS.IFETCH", "MetricName": "Inst_Miss_Cost_L2Hit_Percent", "Unit": "cpu_atom" }, { "BriefDescription": "Percent of instruction miss cost that hit in = the L3", - "MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_LLC_HIT / (MEM_BOUND_= STALLS.IFETCH)", + "MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_LLC_HIT / MEM_BOUND_S= TALLS.IFETCH", "MetricName": "Inst_Miss_Cost_L3Hit_Percent", "Unit": "cpu_atom" }, { "BriefDescription": "Percent of instruction miss cost that hit in = DRAM", - "MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_DRAM_HIT / (MEM_BOUND= _STALLS.IFETCH)", + "MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_DRAM_HIT / MEM_BOUND_= STALLS.IFETCH", "MetricName": "Inst_Miss_Cost_DRAMHit_Percent", "Unit": "cpu_atom" }, { "BriefDescription": "load ops retired per 1000 instruction", - "MetricExpr": "1000 * MEM_UOPS_RETIRED.ALL_LOADS / INST_RETIRED.AN= Y", + "MetricExpr": "1e3 * MEM_UOPS_RETIRED.ALL_LOADS / INST_RETIRED.ANY= ", "MetricName": "MemLoadPKI", "Unit": "cpu_atom" }, --=20 2.39.0.314.g84b9a713c41-goog