Date: Mon, 31 Jul 2023 22:36:31 -0700
From: Ian Rogers <irogers@google.com>
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim, Adrian Hunter, Kan Liang,
    Ravi Bangoria, Zhengjun Xing, linux-perf-users@vger.kernel.org,
    linux-kernel@vger.kernel.org, Andi Kleen, Weilin Wang
Cc: Ian Rogers
Subject: [PATCH v1 1/4] perf parse-events x86: Avoid sorting uops_retired.slots
Message-Id: <20230801053634.1142634-2-irogers@google.com>
In-Reply-To: <20230801053634.1142634-1-irogers@google.com>
References: <20230801053634.1142634-1-irogers@google.com>
As topdown.slots may appear under the alias "slots", it can be confused
with uops_retired.slots, which is not a valid group leader for the perf
metric events. Special case uops_retired.slots to avoid this confusion:
it is neither sorted to the front of the event list nor forced into the
perf metric group.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang
---
 tools/perf/arch/x86/util/evlist.c | 7 ++++---
 tools/perf/arch/x86/util/evsel.c  | 7 +++----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
index cbd582182932..b1ce0c52d88d 100644
--- a/tools/perf/arch/x86/util/evlist.c
+++ b/tools/perf/arch/x86/util/evlist.c
@@ -75,11 +75,12 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
 
 int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
 {
-	if (topdown_sys_has_perf_metrics() && evsel__sys_has_perf_metrics(lhs)) {
+	if (topdown_sys_has_perf_metrics() &&
+	    (arch_evsel__must_be_in_group(lhs) || arch_evsel__must_be_in_group(rhs))) {
 		/* Ensure the topdown slots comes first. */
-		if (strcasestr(lhs->name, "slots"))
+		if (strcasestr(lhs->name, "slots") && !strcasestr(lhs->name, "uops_retired.slots"))
 			return -1;
-		if (strcasestr(rhs->name, "slots"))
+		if (strcasestr(rhs->name, "slots") && !strcasestr(rhs->name, "uops_retired.slots"))
 			return 1;
 		/* Followed by topdown events. */
 		if (strcasestr(lhs->name, "topdown") && !strcasestr(rhs->name, "topdown"))
diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
index 81d22657922a..090d0f371891 100644
--- a/tools/perf/arch/x86/util/evsel.c
+++ b/tools/perf/arch/x86/util/evsel.c
@@ -40,12 +40,11 @@ bool evsel__sys_has_perf_metrics(const struct evsel *evsel)
 
 bool arch_evsel__must_be_in_group(const struct evsel *evsel)
 {
-	if (!evsel__sys_has_perf_metrics(evsel))
+	if (!evsel__sys_has_perf_metrics(evsel) || !evsel->name ||
+	    strcasestr(evsel->name, "uops_retired.slots"))
 		return false;
 
-	return evsel->name &&
-	       (strcasestr(evsel->name, "slots") ||
-		strcasestr(evsel->name, "topdown"));
+	return strcasestr(evsel->name, "topdown") || strcasestr(evsel->name, "slots");
 }
 
 int arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
-- 
2.41.0.585.gd2178a4bd4-goog
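[Editor's note: a minimal standalone sketch of the ordering the fixed
comparator produces, not the perf code itself; the helper names and the
mirrored "topdown" check are assumptions for illustration. topdown.slots
still sorts first, uops_retired.slots no longer does.]

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/* "slots" counts as the topdown leader only if it is not uops_retired.slots. */
static int is_topdown_slots(const char *name)
{
	return strcasestr(name, "slots") != NULL &&
	       strcasestr(name, "uops_retired.slots") == NULL;
}

static int cmp_events(const char *lhs, const char *rhs)
{
	if (is_topdown_slots(lhs))
		return -1;
	if (is_topdown_slots(rhs))
		return 1;
	/* Topdown events follow the slots event. */
	if (strcasestr(lhs, "topdown") && !strcasestr(rhs, "topdown"))
		return -1;
	if (!strcasestr(lhs, "topdown") && strcasestr(rhs, "topdown"))
		return 1;
	return 0;
}

int main(void)
{
	/* uops_retired.slots now sorts after the true slots event... */
	printf("%d\n", cmp_events("uops_retired.slots", "slots"));  /* 1 */
	/* ...while the perf metric leader still comes first. */
	printf("%d\n", cmp_events("slots", "topdown-retiring"));    /* -1 */
	return 0;
}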
Date: Mon, 31 Jul 2023 22:36:32 -0700
From: Ian Rogers <irogers@google.com>
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim, Adrian Hunter, Kan Liang,
    Ravi Bangoria, Zhengjun Xing, linux-perf-users@vger.kernel.org,
    linux-kernel@vger.kernel.org, Andi Kleen, Weilin Wang
Cc: Ian Rogers
Subject: [PATCH v1 2/4] perf vendor events intel: Update meteorlake to 1.04
Message-Id: <20230801053634.1142634-3-irogers@google.com>
In-Reply-To: <20230801053634.1142634-1-irogers@google.com>
References: <20230801053634.1142634-1-irogers@google.com>

The 1.04 events were released in:
https://github.com/intel/perfmon/commit/44fe3681501f43fc515577aced8e944b187c8e51

This adds 51 core events.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang
---
 tools/perf/pmu-events/arch/x86/mapfile.csv    |   2 +-
 .../pmu-events/arch/x86/meteorlake/cache.json | 165 ++++++++++++++++++
 .../arch/x86/meteorlake/floating-point.json   |   8 +
 .../arch/x86/meteorlake/frontend.json         |  56 ++++++
 .../arch/x86/meteorlake/memory.json           |  80 +++++++++
 .../pmu-events/arch/x86/meteorlake/other.json |  16 ++
 .../arch/x86/meteorlake/pipeline.json         | 159 +++++++++++++++++
 7 files changed, 485 insertions(+), 1 deletion(-)

diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 6650100830c4..9020d7a23c91 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -19,7 +19,7 @@ GenuineIntel-6-3A,v24,ivybridge,core
 GenuineIntel-6-3E,v23,ivytown,core
 GenuineIntel-6-2D,v23,jaketown,core
 GenuineIntel-6-(57|85),v10,knightslanding,core
-GenuineIntel-6-A[AC],v1.03,meteorlake,core
+GenuineIntel-6-A[AC],v1.04,meteorlake,core
 GenuineIntel-6-1[AEF],v3,nehalemep,core
 GenuineIntel-6-2E,v3,nehalemex,core
 GenuineIntel-6-A7,v1.01,rocketlake,core
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/cache.json b/tools/perf/pmu-events/arch/x86/meteorlake/cache.json
index e1ae7c92f38e..1de0200b32f6 100644
--- a/tools/perf/pmu-events/arch/x86/meteorlake/cache.json
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/cache.json
@@ -36,6 +36,15 @@
         "UMask": "0x2",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Number of cycles a demand request has waited due to L1D due to lack of L2 resources.",
+        "EventCode": "0x48",
+        "EventName": "L1D_PEND_MISS.L2_STALLS",
+        "PublicDescription": "Counts number of cycles a demand request has waited due to L1D due to lack of L2 resources. Demand requests include cacheable/uncacheable demand load, store, lock or SW prefetch accesses.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x4",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Number of L1D misses that are outstanding",
         "EventCode": "0x48",
@@ -260,6 +269,15 @@
         "UMask": "0x40",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Cycles when L1D is locked",
+        "EventCode": "0x42",
+        "EventName": "LOCK_CYCLES.CACHE_LOCK_DURATION",
+        "PublicDescription": "This event counts the number of cycles when the L1D is locked. It is a superset of the 0x1 mask (BUS_LOCK_CLOCKS.BUS_LOCK_DURATION).",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x2",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Counts the number of cacheable memory requests that miss in the LLC. Counts on a per core basis.",
         "EventCode": "0x2e",
@@ -514,6 +532,17 @@
         "UMask": "0x4",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Retired load instructions whose data sources were L3 hit and cross-core snoop missed in on-pkg core cache.",
+        "Data_LA": "1",
+        "EventCode": "0xd2",
+        "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS",
+        "PEBS": "1",
+        "PublicDescription": "Counts the retired load instructions whose data sources were L3 hit and cross-core snoop missed in on-pkg core cache.",
+        "SampleAfterValue": "20011",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Retired load instructions whose data sources were hits in L3 without snoops required",
         "Data_LA": "1",
@@ -730,6 +759,14 @@
         "UMask": "0x1",
         "Unit": "cpu_atom"
     },
+    {
+        "BriefDescription": "MEM_STORE_RETIRED.L2_HIT",
+        "EventCode": "0x44",
+        "EventName": "MEM_STORE_RETIRED.L2_HIT",
+        "SampleAfterValue": "200003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Counts the number of load ops retired.",
         "Data_LA": "1",
@@ -977,6 +1014,15 @@
         "UMask": "0x8",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Cacheable and Non-Cacheable code read requests",
+        "EventCode": "0x21",
+        "EventName": "OFFCORE_REQUESTS.DEMAND_CODE_RD",
+        "PublicDescription": "Counts both cacheable and Non-Cacheable code read requests.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x2",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Demand Data Read requests sent to uncore",
         "EventCode": "0x21",
@@ -995,6 +1041,89 @@
         "UMask": "0x4",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Cycles when offcore outstanding cacheable Core Data Read transactions are present in SuperQueue (SQ), queue to uncore.",
+        "CounterMask": "1",
+        "EventCode": "0x20",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD",
+        "PublicDescription": "Counts cycles when offcore outstanding cacheable Core Data Read transactions are present in the super queue. A transaction is considered to be in the Offcore outstanding state between L2 miss and transaction completion sent to requestor (SQ de-allocation). See corresponding Umask under OFFCORE_REQUESTS.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x8",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Cycles with offcore outstanding Code Reads transactions in the SuperQueue (SQ), queue to uncore.",
+        "CounterMask": "1",
+        "EventCode": "0x20",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE_RD",
+        "PublicDescription": "Counts the number of offcore outstanding Code Reads transactions in the super queue every cycle. The 'Offcore outstanding' state of the transaction lasts from the L2 miss until the sending transaction completion to requestor (SQ deallocation). See the corresponding Umask under OFFCORE_REQUESTS.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x2",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Cycles where at least 1 outstanding demand data read request is pending.",
+        "CounterMask": "1",
+        "EventCode": "0x20",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_DATA_RD",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Cycles with offcore outstanding demand rfo reads transactions in SuperQueue (SQ), queue to uncore.",
+        "CounterMask": "1",
+        "EventCode": "0x20",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO",
+        "PublicDescription": "Counts the number of offcore outstanding demand rfo Reads transactions in the super queue every cycle. The 'Offcore outstanding' state of the transaction lasts from the L2 miss until the sending transaction completion to requestor (SQ deallocation). See the corresponding Umask under OFFCORE_REQUESTS.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x4",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "OFFCORE_REQUESTS_OUTSTANDING.DATA_RD",
+        "EventCode": "0x20",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DATA_RD",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x8",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Offcore outstanding Code Reads transactions in the SuperQueue (SQ), queue to uncore, every cycle.",
+        "EventCode": "0x20",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD",
+        "PublicDescription": "Counts the number of offcore outstanding Code Reads transactions in the super queue every cycle. The 'Offcore outstanding' state of the transaction lasts from the L2 miss until the sending transaction completion to requestor (SQ deallocation). See the corresponding Umask under OFFCORE_REQUESTS.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x2",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "For every cycle, increments by the number of outstanding demand data read requests pending.",
+        "EventCode": "0x20",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD",
+        "PublicDescription": "For every cycle, increments by the number of outstanding demand data read requests pending. Requests are considered outstanding from the time they miss the core's L2 cache until the transaction completion message is sent to the requestor.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Cycles with at least 6 offcore outstanding Demand Data Read transactions in uncore queue.",
+        "CounterMask": "6",
+        "EventCode": "0x20",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD_GE_6",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Store Read transactions pending for off-core. Highly correlated.",
+        "EventCode": "0x20",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_RFO",
+        "PublicDescription": "Counts the number of off-core outstanding read-for-ownership (RFO) store transactions every cycle. An RFO transaction is considered to be in the Off-core outstanding state between L2 cache miss and transaction completion.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x4",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Counts bus locks, accounts for cache line split locks and UC locks.",
         "EventCode": "0x2c",
@@ -1004,6 +1133,42 @@
         "UMask": "0x10",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Number of PREFETCHNTA instructions executed.",
+        "EventCode": "0x40",
+        "EventName": "SW_PREFETCH_ACCESS.NTA",
+        "PublicDescription": "Counts the number of PREFETCHNTA instructions executed.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Number of PREFETCHW instructions executed.",
+        "EventCode": "0x40",
+        "EventName": "SW_PREFETCH_ACCESS.PREFETCHW",
+        "PublicDescription": "Counts the number of PREFETCHW instructions executed.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x8",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Number of PREFETCHT0 instructions executed.",
+        "EventCode": "0x40",
+        "EventName": "SW_PREFETCH_ACCESS.T0",
+        "PublicDescription": "Counts the number of PREFETCHT0 instructions executed.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x2",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Number of PREFETCHT1 or PREFETCHT2 instructions executed.",
+        "EventCode": "0x40",
+        "EventName": "SW_PREFETCH_ACCESS.T1_T2",
+        "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT2 instructions executed.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x4",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Counts the number of issue slots every cycle that were not delivered by the frontend due to an icache miss",
         "EventCode": "0x71",
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/floating-point.json b/tools/perf/pmu-events/arch/x86/meteorlake/floating-point.json
index 616489f0974a..f66506ee37ef 100644
--- a/tools/perf/pmu-events/arch/x86/meteorlake/floating-point.json
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/floating-point.json
@@ -41,6 +41,14 @@
         "UMask": "0x2",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "FP_ARITH_DISPATCHED.PORT_5",
+        "EventCode": "0xb3",
+        "EventName": "FP_ARITH_DISPATCHED.PORT_5",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x4",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Counts number of SSE/AVX computational 128-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 2 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
         "EventCode": "0xc7",
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json b/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json
index 0f064518d1c0..8264419500a5 100644
--- a/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json
@@ -43,6 +43,14 @@
         "UMask": "0x2",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "DSB_FILL.FB_STALL_OT",
+        "EventCode": "0x62",
+        "EventName": "DSB_FILL.FB_STALL_OT",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x10",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Retired ANT branches",
         "EventCode": "0xc6",
@@ -55,6 +63,30 @@
         "UMask": "0x3",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Retired Instructions who experienced DSB miss.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.ANY_DSB_MISS",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x1",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired Instructions that experienced DSB (Decode stream buffer i.e. the decoded instruction-cache) miss.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x3",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Retired Instructions who experienced a critical DSB miss.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.DSB_MISS",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x11",
+        "PEBS": "1",
+        "PublicDescription": "Number of retired Instructions that experienced a critical DSB (Decode stream buffer i.e. the decoded instruction-cache) miss. Critical means stalls were exposed to the back-end as a result of the DSB miss.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x3",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Counts the number of instructions retired that were tagged because empty issue slots were seen before the uop due to ITLB miss",
         "EventCode": "0xc6",
@@ -88,6 +120,18 @@
         "UMask": "0x3",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Retired Instructions who experienced Instruction L2 Cache true miss.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.L2_MISS",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x13",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired Instructions who experienced Instruction L2 Cache true miss.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x3",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Retired instructions after front-end starvation of at least 1 cycle",
         "EventCode": "0xc6",
@@ -243,6 +287,18 @@
         "UMask": "0x3",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Retired Instructions who experienced STLB (2nd level TLB) true miss.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.STLB_MISS",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x15",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired Instructions that experienced STLB (2nd level TLB) true miss.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x3",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "FRONTEND_RETIRED.UNKNOWN_BRANCH",
         "EventCode": "0xc6",
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/memory.json b/tools/perf/pmu-events/arch/x86/meteorlake/memory.json
index 67e949b4c789..2605e1d0ba9f 100644
--- a/tools/perf/pmu-events/arch/x86/meteorlake/memory.json
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/memory.json
@@ -66,6 +66,15 @@
         "UMask": "0x84",
         "Unit": "cpu_atom"
     },
+    {
+        "BriefDescription": "Number of machine clears due to memory ordering conflicts.",
+        "EventCode": "0xc3",
+        "EventName": "MACHINE_CLEARS.MEMORY_ORDERING",
"PublicDescription": "Counts the number of Machine Clears detected= dye to memory ordering. Memory Ordering Machine Clears may apply when a me= mory read may not conform to the memory ordering rules of the x86 architect= ure", + "SampleAfterValue": "100003", + "UMask": "0x2", + "Unit": "cpu_core" + }, { "BriefDescription": "Execution stalls while L1 cache miss demand l= oad is outstanding.", "CounterMask": "3", @@ -95,6 +104,35 @@ "UMask": "0x9", "Unit": "cpu_core" }, + { + "BriefDescription": "MEMORY_ORDERING.MD_NUKE", + "EventCode": "0x09", + "EventName": "MEMORY_ORDERING.MD_NUKE", + "SampleAfterValue": "100003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of memory ordering machine = clears due to memory renaming.", + "EventCode": "0x09", + "EventName": "MEMORY_ORDERING.MRN_NUKE", + "SampleAfterValue": "100003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 1024 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_1024", + "MSRIndex": "0x3F6", + "MSRValue": "0x400", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 1024 cycles. Reporte= d latency may be longer than just the memory latency.", + "SampleAfterValue": "53", + "UMask": "0x1", + "Unit": "cpu_core" + }, { "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 128 cycles.", "Data_LA": "1", @@ -121,6 +159,19 @@ "UMask": "0x1", "Unit": "cpu_core" }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 2048 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_2048", + "MSRIndex": "0x3F6", + "MSRValue": "0x800", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 2048 cycles. Reporte= d latency may be longer than just the memory latency.", + "SampleAfterValue": "23", + "UMask": "0x1", + "Unit": "cpu_core" + }, { "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 256 cycles.", "Data_LA": "1", @@ -235,5 +286,34 @@ "SampleAfterValue": "100003", "UMask": "0x10", "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles where data return is pending for a Dem= and Data Read request who miss L3 cache.", + "CounterMask": "1", + "EventCode": "0x20", + "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_L3_MISS_DEM= AND_DATA_RD", + "PublicDescription": "Cycles with at least 1 Demand Data Read requ= ests who miss L3 cache in the superQ.", + "SampleAfterValue": "1000003", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "For every cycle, increments by the number of = demand data read requests pending that are known to have missed the L3 cach= e.", + "EventCode": "0x20", + "EventName": "OFFCORE_REQUESTS_OUTSTANDING.L3_MISS_DEMAND_DATA_RD", + "PublicDescription": "For every cycle, increments by the number of= demand data read requests pending that are known to have missed the L3 cac= he. 
+        "SampleAfterValue": "2000003",
+        "UMask": "0x10",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Cycles where the core is waiting on at least 6 outstanding demand data read requests known to have missed the L3 cache.",
+        "CounterMask": "6",
+        "EventCode": "0x20",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.L3_MISS_DEMAND_DATA_RD_GE_6",
+        "PublicDescription": "Cycles where the core is waiting on at least 6 outstanding demand data read requests known to have missed the L3 cache. Note that this event does not capture all elapsed cycles while the requests are outstanding - only cycles from when the requests were known to have missed the L3 cache.",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x10",
+        "Unit": "cpu_core"
     }
 ]
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/other.json b/tools/perf/pmu-events/arch/x86/meteorlake/other.json
index 2ec57f487525..f4c603599df4 100644
--- a/tools/perf/pmu-events/arch/x86/meteorlake/other.json
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/other.json
@@ -1,4 +1,12 @@
 [
+    {
+        "BriefDescription": "ASSISTS.PAGE_FAULT",
+        "EventCode": "0xc1",
+        "EventName": "ASSISTS.PAGE_FAULT",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x8",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Counts streaming stores that have any type of response.",
         "EventCode": "0x2A,0x2B",
@@ -30,6 +38,14 @@
         "UMask": "0x7",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "RS.EMPTY_RESOURCE",
+        "EventCode": "0xa5",
+        "EventName": "RS.EMPTY_RESOURCE",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Counts the number of issue slots in a UMWAIT or TPAUSE instruction where no uop issues due to the instruction putting the CPU into the C0.1 activity state. For Tremont, UMWAIT and TPAUSE will only put the CPU into C0.1 activity state (not C0.2 activity state)",
         "EventCode": "0x75",
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
index eeaa7a97f71c..352c5efafc06 100644
--- a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
@@ -311,6 +311,16 @@
         "UMask": "0x60",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "This event counts the number of mispredicted ret instructions retired. Non PEBS",
+        "EventCode": "0xc5",
+        "EventName": "BR_MISP_RETIRED.RET",
+        "PEBS": "1",
+        "PublicDescription": "This is a non-precise version (that is, does not use PEBS) of the event that counts mispredicted return instructions retired.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x8",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Counts the number of mispredicted near RET branch instructions retired.",
         "EventCode": "0xc5",
@@ -329,6 +339,33 @@
         "UMask": "0x48",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Core clocks when the thread is in the C0.1 light-weight slower wakeup time but more power saving optimized state.",
+        "EventCode": "0xec",
+        "EventName": "CPU_CLK_UNHALTED.C01",
+        "PublicDescription": "Counts core clocks when the thread is in the C0.1 light-weight slower wakeup time but more power saving optimized state. This state can be entered via the TPAUSE or UMWAIT instructions.",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x10",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Core clocks when the thread is in the C0.2 light-weight faster wakeup time but less power saving optimized state.",
+        "EventCode": "0xec",
+        "EventName": "CPU_CLK_UNHALTED.C02",
+        "PublicDescription": "Counts core clocks when the thread is in the C0.2 light-weight faster wakeup time but less power saving optimized state. This state can be entered via the TPAUSE or UMWAIT instructions.",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x20",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Core clocks when the thread is in the C0.1 or C0.2 or running a PAUSE in C0 ACPI state.",
+        "EventCode": "0xec",
+        "EventName": "CPU_CLK_UNHALTED.C0_WAIT",
+        "PublicDescription": "Counts core clocks when the thread is in the C0.1 or C0.2 power saving optimized states (TPAUSE or UMWAIT instructions) or running the PAUSE instruction.",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x70",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
         "EventName": "CPU_CLK_UNHALTED.CORE",
@@ -361,6 +398,24 @@
         "UMask": "0x2",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "CPU_CLK_UNHALTED.PAUSE",
+        "EventCode": "0xec",
+        "EventName": "CPU_CLK_UNHALTED.PAUSE",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x40",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "CPU_CLK_UNHALTED.PAUSE_INST",
+        "CounterMask": "1",
+        "EdgeDetect": "1",
+        "EventCode": "0xec",
+        "EventName": "CPU_CLK_UNHALTED.PAUSE_INST",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x40",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Core crystal clock cycles. Cycle counts are evenly distributed between active threads in the Core.",
         "EventCode": "0x3c",
@@ -602,6 +657,15 @@
         "UMask": "0x10",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Retired NOP instructions.",
+        "EventCode": "0xc0",
+        "EventName": "INST_RETIRED.NOP",
+        "PublicDescription": "Counts all retired NOP or ENDBR32/64 or PREFETCHIT0/1 instructions",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x2",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Precise instruction retired with PEBS precise-distribution",
         "EventName": "INST_RETIRED.PREC_DIST",
@@ -611,6 +675,15 @@
         "UMask": "0x1",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Iterations of Repeat string retired instructions.",
+        "EventCode": "0xc0",
+        "EventName": "INST_RETIRED.REP_ITERATION",
+        "PublicDescription": "Number of iterations of Repeat (REP) string retired instructions such as MOVS, CMPS, and SCAS. Each has a byte, word, and doubleword version and string instructions can be repeated using a repetition prefix, REP, that allows their architectural execution to be repeated a number of times as specified by the RCX register. Note the number of iterations is implementation-dependent.",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x8",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Cycles the Backend cluster is recovering after a miss-speculation or a Store Buffer or Load Buffer drain stall.",
         "CounterMask": "1",
@@ -621,6 +694,17 @@
         "UMask": "0x3",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Clears speculative count",
+        "CounterMask": "1",
+        "EdgeDetect": "1",
+        "EventCode": "0xad",
+        "EventName": "INT_MISC.CLEARS_COUNT",
+        "PublicDescription": "Counts the number of speculative clears due to any type of branch misprediction or machine clears",
+        "SampleAfterValue": "500009",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Counts cycles after recovery from a branch misprediction or machine clear till the first uop is issued from the resteered path.",
         "EventCode": "0xad",
@@ -630,6 +714,15 @@
         "UMask": "0x80",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Cycles when Resource Allocation Table (RAT) external stall is sent to Instruction Decode Queue (IDQ) for the thread",
+        "EventCode": "0xad",
+        "EventName": "INT_MISC.RAT_STALLS",
+        "PublicDescription": "This event counts the number of cycles during which Resource Allocation Table (RAT) external stall is sent to Instruction Decode Queue (IDQ) for the current thread. This also includes the cycles during which the Allocator is serving another thread.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x8",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Core cycles the allocator was stalled due to recovery from earlier clear event for this thread",
         "EventCode": "0xad",
@@ -733,6 +826,15 @@
         "UMask": "0x4",
         "Unit": "cpu_atom"
     },
+    {
+        "BriefDescription": "False dependencies in MOB due to partial compare on address.",
+        "EventCode": "0x03",
+        "EventName": "LD_BLOCKS.ADDRESS_ALIAS",
+        "PublicDescription": "Counts the number of times a load got blocked due to false dependencies in MOB due to partial compare on address.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x4",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Counts the number of retired loads that are blocked because its address exactly matches an older store whose data is not ready.",
         "EventCode": "0x03",
@@ -742,6 +844,15 @@
         "UMask": "0x1",
         "Unit": "cpu_atom"
     },
+    {
+        "BriefDescription": "The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.",
+        "EventCode": "0x03",
+        "EventName": "LD_BLOCKS.NO_SR",
+        "PublicDescription": "Counts the number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x88",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Counts the number of retired loads that are blocked because its address partially overlapped with an older store.",
         "EventCode": "0x03",
@@ -751,6 +862,15 @@
         "UMask": "0x2",
         "Unit": "cpu_atom"
     },
+    {
+        "BriefDescription": "Loads blocked due to overlapping with a preceding store that cannot be forwarded.",
+        "EventCode": "0x03",
+        "EventName": "LD_BLOCKS.STORE_FORWARD",
+        "PublicDescription": "Counts the number of times where store forwarding was prevented for a load operation. The most common case is a load blocked due to the address of memory access (partially) overlapping with a preceding uncompleted store. Note: See the table of not supported store forwards in the Optimization Guide.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x82",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Cycles Uops delivered by the LSD, but didn't come from the decoder.",
         "CounterMask": "1",
@@ -823,6 +943,24 @@
         "UMask": "0x1",
         "Unit": "cpu_atom"
     },
+    {
+        "BriefDescription": "Self-modifying code (SMC) detected.",
+        "EventCode": "0xc3",
+        "EventName": "MACHINE_CLEARS.SMC",
+        "PublicDescription": "Counts self-modifying code (SMC) detected, which causes a machine clear.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x4",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "LFENCE instructions retired",
+        "EventCode": "0xe0",
+        "EventName": "MISC2_RETIRED.LFENCE",
+        "PublicDescription": "number of LFENCE retired instructions",
+        "SampleAfterValue": "400009",
+        "UMask": "0x20",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Counts cycles where the pipeline is stalled due to serializing operations.",
         "EventCode": "0xa2",
@@ -1260,6 +1398,16 @@
         "UMask": "0x1",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Cycles with retired uop(s).",
+        "CounterMask": "1",
+        "EventCode": "0xc2",
+        "EventName": "UOPS_RETIRED.CYCLES",
+        "PublicDescription": "Counts cycles where at least one uop has retired.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x2",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Retired uops except the last uop of each instruction.",
         "EventCode": "0xc2",
@@ -1306,6 +1454,17 @@
         "UMask": "0x2",
         "Unit": "cpu_core"
     },
+    {
+        "BriefDescription": "Cycles without actually retired uops.",
+        "CounterMask": "1",
+        "EventCode": "0xc2",
+        "EventName": "UOPS_RETIRED.STALLS",
+        "Invert": "1",
+        "PublicDescription": "This event counts cycles without actually retired uops.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x2",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "Cycles with less than 10 actually retired uops.",
         "CounterMask": "10",
-- 
2.41.0.585.gd2178a4bd4-goog
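[Editor's note: the EventCode/UMask/CounterMask/EdgeDetect/Invert fields in
these JSON entries map onto the raw config layout perf exposes for the core
PMUs (per /sys/bus/event_source/devices/cpu*/format: event in bits 0-7,
umask 8-15, edge bit 18, inv bit 23, cmask bits 24-31). A small illustrative
helper under that assumption, not code from this series:]

#include <stdint.h>

/* Fold the JSON fields of a core event into a raw perf config value. */
static uint64_t raw_config(uint64_t event, uint64_t umask, uint64_t cmask,
			   int edge, int inv)
{
	return (event & 0xff) |
	       ((umask & 0xff) << 8) |
	       ((uint64_t)!!edge << 18) |
	       ((uint64_t)!!inv << 23) |
	       ((cmask & 0xff) << 24);
}

/*
 * For example, CPU_CLK_UNHALTED.PAUSE_INST above has EventCode 0xec,
 * UMask 0x40, CounterMask 1 and EdgeDetect 1, so
 * raw_config(0xec, 0x40, 1, 1, 0) programs the same counter as:
 *   perf stat -e cpu_core/event=0xec,umask=0x40,cmask=1,edge=1/
 */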
Date: Mon, 31 Jul 2023 22:36:33 -0700
From: Ian Rogers <irogers@google.com>
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim, Adrian Hunter, Kan Liang,
    Ravi Bangoria, Zhengjun Xing, linux-perf-users@vger.kernel.org,
    linux-kernel@vger.kernel.org, Andi Kleen, Weilin Wang
Cc: Ian Rogers
Subject: [PATCH v1 3/4] perf vendor events intel: Update sapphirerapids to 1.15
Message-Id: <20230801053634.1142634-4-irogers@google.com>
In-Reply-To: <20230801053634.1142634-1-irogers@google.com>
References: <20230801053634.1142634-1-irogers@google.com>

The 1.15 events were released in:
https://github.com/intel/perfmon/commit/76dfb81a1148ec049fd9caae9c62529404da63df

This adds the events OCR.DEMAND_DATA_RD.LOCAL_SOCKET_PMM and
OCR.DEMAND_DATA_RD.PMM.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang
---
 tools/perf/pmu-events/arch/x86/mapfile.csv |  2 +-
 .../arch/x86/sapphirerapids/other.json     | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 9020d7a23c91..3a8770e29fe8 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -24,7 +24,7 @@ GenuineIntel-6-1[AEF],v3,nehalemep,core
 GenuineIntel-6-2E,v3,nehalemex,core
 GenuineIntel-6-A7,v1.01,rocketlake,core
 GenuineIntel-6-2A,v19,sandybridge,core
-GenuineIntel-6-(8F|CF),v1.14,sapphirerapids,core
+GenuineIntel-6-(8F|CF),v1.15,sapphirerapids,core
 GenuineIntel-6-AF,v1.00,sierraforest,core
 GenuineIntel-6-(37|4A|4C|4D|5A),v15,silvermont,core
 GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v57,skylake,core
diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/other.json b/tools/perf/pmu-events/arch/x86/sapphirerapids/other.json
index 31b6be9fb8c7..442ef3807a9d 100644
--- a/tools/perf/pmu-events/arch/x86/sapphirerapids/other.json
+++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/other.json
@@ -76,6 +76,24 @@
         "SampleAfterValue": "100003",
         "UMask": "0x1"
     },
+    {
+        "BriefDescription": "Counts demand data reads that were supplied by PMM attached to this socket, whether or not in Sub NUMA Cluster(SNC) Mode. In SNC Mode counts PMM accesses that are controlled by the close or distant SNC Cluster.",
+        "EventCode": "0x2A,0x2B",
+        "EventName": "OCR.DEMAND_DATA_RD.LOCAL_SOCKET_PMM",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x700C00001",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand data reads that were supplied by PMM.",
+        "EventCode": "0x2A,0x2B",
+        "EventName": "OCR.DEMAND_DATA_RD.PMM",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x703C00001",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
     {
         "BriefDescription": "Counts demand data reads that were supplied by DRAM attached to another socket.",
         "EventCode": "0x2A,0x2B",
-- 
2.41.0.585.gd2178a4bd4-goog
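[Editor's note: OCR (offcore response) events pair an ordinary event/umask
selection with a value written to one of the OFFCORE_RSP MSRs (0x1a6/0x1a7),
which the kernel's x86 PMU carries in perf_event_attr.config1. A hedged
sketch of the programming, not code from this series; choosing the first of
the two event code/MSR pairs is an assumption, either counter works:]

#include <string.h>
#include <stdint.h>
#include <linux/perf_event.h>

/*
 * Build the attribute for an OCR event such as OCR.DEMAND_DATA_RD.PMM
 * above (EventCode 0x2A or 0x2B, UMask 0x1, MSRValue 0x703C00001).
 */
static void ocr_event_attr(struct perf_event_attr *attr,
			   uint64_t event, uint64_t umask, uint64_t msr_value)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_RAW;
	attr->config = (event & 0xff) | ((umask & 0xff) << 8);
	attr->config1 = msr_value;	/* lands in the MSR_OFFCORE_RSP_x register */
}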
Date: Mon, 31 Jul 2023 22:36:34 -0700
From: Ian Rogers <irogers@google.com>
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim, Adrian Hunter, Kan Liang,
    Ravi Bangoria, Zhengjun Xing, linux-perf-users@vger.kernel.org,
    linux-kernel@vger.kernel.org, Andi Kleen, Weilin Wang
Cc: Ian Rogers
Subject: [PATCH v1 4/4] perf vendor events intel: Update Icelake+ metric constraints
Message-Id: <20230801053634.1142634-5-irogers@google.com>
In-Reply-To: <20230801053634.1142634-1-irogers@google.com>
References: <20230801053634.1142634-1-irogers@google.com>

Avoid grouping events, especially in cases where the kernel's PMU driver
fails to open the event group, causing the events to report back as
"<not counted>".

This update comes from:
https://github.com/intel/perfmon/pull/94

It fixes issues reported with the patch:
https://lore.kernel.org/lkml/20230719001836.198363-3-irogers@google.com/

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang
---
 .../pmu-events/arch/x86/alderlake/adl-metrics.json   | 11 +++++++----
 .../pmu-events/arch/x86/alderlaken/adln-metrics.json |  2 ++
 .../pmu-events/arch/x86/icelake/icl-metrics.json     | 10 ++++++----
 .../pmu-events/arch/x86/icelakex/icx-metrics.json    | 10 ++++++----
 .../pmu-events/arch/x86/rocketlake/rkl-metrics.json  | 10 ++++++----
 .../arch/x86/sapphirerapids/spr-metrics.json         |  9 +++++----
 .../pmu-events/arch/x86/tigerlake/tgl-metrics.json   | 10 ++++++----
 7 files changed, 38 insertions(+), 24 deletions(-)
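[Editor's note: the constraint values here steer how perf builds event
groups for a metric. NO_GROUP_EVENTS means the metric's events are never
opened as a single group; NO_GROUP_EVENTS_NMI relaxes grouping only when
the NMI watchdog is enabled, since the watchdog pins a counter the group
would need. A hedged, self-contained sketch of that decision with made-up
names, not perf's own metricgroup code:]

#include <stdbool.h>
#include <stdio.h>

enum metric_constraint {
	CONSTRAINT_NONE,
	CONSTRAINT_NO_GROUP_EVENTS,
	CONSTRAINT_NO_GROUP_EVENTS_NMI,
};

/* True if the NMI watchdog is consuming a hardware counter. */
static bool nmi_watchdog_enabled(void)
{
	FILE *f = fopen("/proc/sys/kernel/nmi_watchdog", "r");
	int val = 0;

	if (!f)
		return false;
	if (fscanf(f, "%d", &val) != 1)
		val = 0;
	fclose(f);
	return val != 0;
}

static bool metric_may_group(enum metric_constraint c)
{
	switch (c) {
	case CONSTRAINT_NO_GROUP_EVENTS:
		return false;
	case CONSTRAINT_NO_GROUP_EVENTS_NMI:
		return !nmi_watchdog_enabled();
	case CONSTRAINT_NONE:
	default:
		return true;
	}
}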
diff --git a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
index daf9458f0b77..c6780d5c456b 100644
--- a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
@@ -558,6 +558,7 @@
     },
     {
         "BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the Last Level Cache (LLC) or other core with HITE/F/M.",
+        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_LLC_HIT@ / tma_info_core_clks - max((cpu_atom@MEM_BOUND_STALLS.LOAD@ - cpu_atom@LD_HEAD.L1_MISS_AT_RET@) / tma_info_core_clks, 0) * cpu_atom@MEM_BOUND_STALLS.LOAD_LLC_HIT@ / cpu_atom@MEM_BOUND_STALLS.LOAD@",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_l3_bound",
@@ -800,6 +801,7 @@
     },
     {
         "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a store forward block.",
+        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "LD_HEAD.ST_ADDR_AT_RET / tma_info_core_clks",
         "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
         "MetricName": "tma_store_fwd_blk",
@@ -1058,7 +1060,6 @@
     },
     {
         "BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired)",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector",
         "MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_group",
         "MetricName": "tma_fp_arith",
@@ -1230,6 +1231,7 @@
     },
     {
         "BriefDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bottleneck",
+        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))",
         "MetricGroup": "Fed;FetchLat;IcMiss;tma_issueFL",
         "MetricName": "tma_info_botlnk_l2_ic_misses",
@@ -1267,6 +1269,7 @@
     },
     {
         "BriefDescription": "Total pipeline cost of (external) Memory Bandwidth related bottlenecks",
+        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full))) + tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk))",
         "MetricGroup": "Mem;MemoryBW;Offcore;tma_issueBW",
         "MetricName": "tma_info_bottleneck_memory_bandwidth",
@@ -1355,7 +1358,6 @@
     },
     {
         "BriefDescription": "Floating Point Operations Per Cycle",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "(cpu_core@FP_ARITH_INST_RETIRED.SCALAR_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.SCALAR_DOUBLE@ + 2 * cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + 4 * (cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE@) + 8 * cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@) / tma_info_core_core_clks",
         "MetricGroup": "Flops;Ret",
         "MetricName": "tma_info_core_flopc",
@@ -1363,7 +1365,6 @@
     },
     {
         "BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width)",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "(cpu_core@FP_ARITH_DISPATCHED.PORT_0@ + cpu_core@FP_ARITH_DISPATCHED.PORT_1@ + cpu_core@FP_ARITH_DISPATCHED.PORT_5@) / (2 * tma_info_core_core_clks)",
         "MetricGroup": "Cor;Flops;HPC",
         "MetricName": "tma_info_core_fp_arith_utilization",
@@ -1769,7 +1770,6 @@
     },
     {
         "BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "tma_retiring * tma_info_thread_slots / cpu_core@UOPS_RETIRED.SLOTS\\,cmask\\=1@",
         "MetricGroup": "Pipeline;Ret",
         "MetricName": "tma_info_pipeline_retire",
@@ -2002,6 +2002,7 @@
     },
     {
         "BriefDescription": "This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core",
+        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "(cpu_core@MEMORY_ACTIVITY.STALLS_L2_MISS@ - cpu_core@MEMORY_ACTIVITY.STALLS_L3_MISS@) / tma_info_thread_clks",
         "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_l3_bound",
@@ -2375,6 +2376,7 @@
     },
     {
         "BriefDescription": "This metric represents rate of split store accesses",
+        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_clks",
         "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bound_group",
         "MetricName": "tma_split_stores",
@@ -2405,6 +2407,7 @@
     },
     {
         "BriefDescription": "This metric roughly estimates fraction of cycles when the memory subsystem had loads blocked since they could not forward data from earlier (in program order) overlapping stores",
+        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "13 * cpu_core@LD_BLOCKS.STORE_FORWARD@ / tma_info_thread_clks",
         "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
         "MetricName": "tma_store_fwd_blk",
diff --git a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json b/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
index 0f1628d698da..06e67e34e1bf 100644
--- a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
@@ -466,6 +466,7 @@
     },
     {
         "BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the Last Level Cache (LLC) or other core with HITE/F/M.",
+        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "MEM_BOUND_STALLS.LOAD_LLC_HIT / tma_info_core_clks - max((MEM_BOUND_STALLS.LOAD - LD_HEAD.L1_MISS_AT_RET) / tma_info_core_clks, 0) * MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_BOUND_STALLS.LOAD",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
"TopdownL3;tma_L3_group;tma_memory_bound_group", "MetricName": "tma_l3_bound", @@ -682,6 +683,7 @@ }, { "BriefDescription": "Counts the number of cycles that the oldest l= oad of the load buffer is stalled at retirement due to a store forward bloc= k.", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "LD_HEAD.ST_ADDR_AT_RET / tma_info_core_clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_store_fwd_blk", diff --git a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json b/tool= s/perf/pmu-events/arch/x86/icelake/icl-metrics.json index 8fcc05c4e0a1..a6eed0d9a26d 100644 --- a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json @@ -85,6 +85,7 @@ }, { "BriefDescription": "This metric estimates how often memory load a= ccesses were aliased by preceding stores (in program order) with a 4K addre= ss offset", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS / tma_info_thread_c= lks", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_4k_aliasing", @@ -319,7 +320,6 @@ }, { "BriefDescription": "This metric represents overall arithmetic flo= ating-point (FP) operations fraction the CPU has executed (retired)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector", "MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_gr= oup", "MetricName": "tma_fp_arith", @@ -464,6 +464,7 @@ }, { "BriefDescription": "Total pipeline cost of Instruction Cache miss= es - subset of the Big_Code Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / (tma= _branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses += tma_lcp + tma_ms_switches))", "MetricGroup": "Fed;FetchLat;IcMiss;tma_issueFL", "MetricName": "tma_info_botlnk_l2_ic_misses", @@ -497,6 +498,7 @@ }, { "BriefDescription": "Total pipeline cost of (external) Memory Band= width related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (= tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound /= (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_b= ound) * (tma_sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_= hit_latency + tma_sq_full))) + tma_l1_bound / (tma_dram_bound + tma_l1_boun= d + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_fb_full / (tma_4k= _aliasing + tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_load= s + tma_store_fwd_blk))", "MetricGroup": "Mem;MemoryBW;Offcore;tma_issueBW", "MetricName": "tma_info_bottleneck_memory_bandwidth", @@ -574,14 +576,12 @@ }, { "BriefDescription": "Floating Point Operations Per Cycle", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\= =3D0x03@ + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * cpu@FP_ARITH_= INST_RETIRED.128B_PACKED_SINGLE\\,umask\\=3D0x18@ + 8 * cpu@FP_ARITH_INST_R= ETIRED.256B_PACKED_SINGLE\\,umask\\=3D0x60@ + 16 * FP_ARITH_INST_RETIRED.51= 2B_PACKED_SINGLE) / tma_info_core_core_clks", "MetricGroup": "Flops;Ret", "MetricName": "tma_info_core_flopc" }, { "BriefDescription": "Actual per-core usage of the Floating Point n= on-X87 execution units (regardless of precision or vector-width)", - "MetricConstraint": 
"NO_GROUP_EVENTS", "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\= =3D0x03@ + cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=3D0xfc@) = / (2 * tma_info_core_core_clks)", "MetricGroup": "Cor;Flops;HPC", "MetricName": "tma_info_core_fp_arith_utilization", @@ -927,7 +927,6 @@ }, { "BriefDescription": "Average number of Uops retired in cycles wher= e at least one uop has retired.", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "tma_retiring * tma_info_thread_slots / cpu@UOPS_RET= IRED.SLOTS\\,cmask\\=3D1@", "MetricGroup": "Pipeline;Ret", "MetricName": "tma_info_pipeline_retire" @@ -1100,6 +1099,7 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled due to loads accesses to L3 cache or contended with a sibling Core", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STA= LLS_L3_MISS) / tma_info_thread_clks", "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_= group;tma_memory_bound_group", "MetricName": "tma_l3_bound", @@ -1419,6 +1419,7 @@ }, { "BriefDescription": "This metric represents rate of split store ac= cesses", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bou= nd_group", "MetricName": "tma_split_stores", @@ -1446,6 +1447,7 @@ }, { "BriefDescription": "This metric roughly estimates fraction of cyc= les when the memory subsystem had loads blocked since they could not forwar= d data from earlier (in program order) overlapping stores", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * LD_BLOCKS.STORE_FORWARD / tma_info_thread_clks= ", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_store_fwd_blk", diff --git a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json b/too= ls/perf/pmu-events/arch/x86/icelakex/icx-metrics.json index 9bb7e3f20f7f..7082ad5ba961 100644 --- a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json @@ -289,6 +289,7 @@ }, { "BriefDescription": "This metric estimates how often memory load a= ccesses were aliased by preceding stores (in program order) with a 4K addre= ss offset", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS / tma_info_thread_c= lks", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_4k_aliasing", @@ -523,7 +524,6 @@ }, { "BriefDescription": "This metric represents overall arithmetic flo= ating-point (FP) operations fraction the CPU has executed (retired)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector", "MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_gr= oup", "MetricName": "tma_fp_arith", @@ -668,6 +668,7 @@ }, { "BriefDescription": "Total pipeline cost of Instruction Cache miss= es - subset of the Big_Code Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / (tma= _branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses += tma_lcp + tma_ms_switches))", "MetricGroup": "Fed;FetchLat;IcMiss;tma_issueFL", "MetricName": "tma_info_botlnk_l2_ic_misses", @@ -701,6 +702,7 @@ }, { "BriefDescription": "Total pipeline cost of (external) Memory Band= width related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_memory_bound 
* (tma_dram_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_= store_bound) * (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) = + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_pmm_bound + tma_store_bound) * (tma_sq_full / (tma_contested_acces= ses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full))) + tma_l1_bound= / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_b= ound + tma_store_bound) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load += tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk))", "MetricGroup": "Mem;MemoryBW;Offcore;tma_issueBW", "MetricName": "tma_info_bottleneck_memory_bandwidth", @@ -778,14 +780,12 @@ }, { "BriefDescription": "Floating Point Operations Per Cycle", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\= =3D0x03@ + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * cpu@FP_ARITH_= INST_RETIRED.128B_PACKED_SINGLE\\,umask\\=3D0x18@ + 8 * cpu@FP_ARITH_INST_R= ETIRED.256B_PACKED_SINGLE\\,umask\\=3D0x60@ + 16 * FP_ARITH_INST_RETIRED.51= 2B_PACKED_SINGLE) / tma_info_core_core_clks", "MetricGroup": "Flops;Ret", "MetricName": "tma_info_core_flopc" }, { "BriefDescription": "Actual per-core usage of the Floating Point n= on-X87 execution units (regardless of precision or vector-width)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\= =3D0x03@ + cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=3D0xfc@) = / (2 * tma_info_core_core_clks)", "MetricGroup": "Cor;Flops;HPC", "MetricName": "tma_info_core_fp_arith_utilization", @@ -1144,7 +1144,6 @@ }, { "BriefDescription": "Average number of Uops retired in cycles wher= e at least one uop has retired.", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "tma_retiring * tma_info_thread_slots / cpu@UOPS_RET= IRED.SLOTS\\,cmask\\=3D1@", "MetricGroup": "Pipeline;Ret", "MetricName": "tma_info_pipeline_retire" @@ -1369,6 +1368,7 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled due to loads accesses to L3 cache or contended with a sibling Core", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STA= LLS_L3_MISS) / tma_info_thread_clks", "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_= group;tma_memory_bound_group", "MetricName": "tma_l3_bound", @@ -1715,6 +1715,7 @@ }, { "BriefDescription": "This metric represents rate of split store ac= cesses", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bou= nd_group", "MetricName": "tma_split_stores", @@ -1742,6 +1743,7 @@ }, { "BriefDescription": "This metric roughly estimates fraction of cyc= les when the memory subsystem had loads blocked since they could not forwar= d data from earlier (in program order) overlapping stores", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * LD_BLOCKS.STORE_FORWARD / tma_info_thread_clks= ", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_store_fwd_blk", diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json b/t= ools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json index 1bb9cededa56..a0191c8b708d 100644 --- a/tools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json +++ 
b/tools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json @@ -85,6 +85,7 @@ }, { "BriefDescription": "This metric estimates how often memory load a= ccesses were aliased by preceding stores (in program order) with a 4K addre= ss offset", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS / tma_info_thread_c= lks", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_4k_aliasing", @@ -319,7 +320,6 @@ }, { "BriefDescription": "This metric represents overall arithmetic flo= ating-point (FP) operations fraction the CPU has executed (retired)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector", "MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_gr= oup", "MetricName": "tma_fp_arith", @@ -464,6 +464,7 @@ }, { "BriefDescription": "Total pipeline cost of Instruction Cache miss= es - subset of the Big_Code Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / (tma= _branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses += tma_lcp + tma_ms_switches))", "MetricGroup": "Fed;FetchLat;IcMiss;tma_issueFL", "MetricName": "tma_info_botlnk_l2_ic_misses", @@ -497,6 +498,7 @@ }, { "BriefDescription": "Total pipeline cost of (external) Memory Band= width related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (= tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound /= (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_b= ound) * (tma_sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_= hit_latency + tma_sq_full))) + tma_l1_bound / (tma_dram_bound + tma_l1_boun= d + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_fb_full / (tma_4k= _aliasing + tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_load= s + tma_store_fwd_blk))", "MetricGroup": "Mem;MemoryBW;Offcore;tma_issueBW", "MetricName": "tma_info_bottleneck_memory_bandwidth", @@ -574,14 +576,12 @@ }, { "BriefDescription": "Floating Point Operations Per Cycle", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\= =3D0x03@ + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * cpu@FP_ARITH_= INST_RETIRED.128B_PACKED_SINGLE\\,umask\\=3D0x18@ + 8 * cpu@FP_ARITH_INST_R= ETIRED.256B_PACKED_SINGLE\\,umask\\=3D0x60@ + 16 * FP_ARITH_INST_RETIRED.51= 2B_PACKED_SINGLE) / tma_info_core_core_clks", "MetricGroup": "Flops;Ret", "MetricName": "tma_info_core_flopc" }, { "BriefDescription": "Actual per-core usage of the Floating Point n= on-X87 execution units (regardless of precision or vector-width)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\= =3D0x03@ + cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=3D0xfc@) = / (2 * tma_info_core_core_clks)", "MetricGroup": "Cor;Flops;HPC", "MetricName": "tma_info_core_fp_arith_utilization", @@ -933,7 +933,6 @@ }, { "BriefDescription": "Average number of Uops retired in cycles wher= e at least one uop has retired.", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "tma_retiring * tma_info_thread_slots / cpu@UOPS_RET= IRED.SLOTS\\,cmask\\=3D1@", "MetricGroup": "Pipeline;Ret", "MetricName": "tma_info_pipeline_retire" @@ -1126,6 +1125,7 @@ }, { "BriefDescription": "This metric estimates how 
often the CPU was s= talled due to loads accesses to L3 cache or contended with a sibling Core", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STA= LLS_L3_MISS) / tma_info_thread_clks", "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_= group;tma_memory_bound_group", "MetricName": "tma_l3_bound", @@ -1445,6 +1445,7 @@ }, { "BriefDescription": "This metric represents rate of split store ac= cesses", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bou= nd_group", "MetricName": "tma_split_stores", @@ -1472,6 +1473,7 @@ }, { "BriefDescription": "This metric roughly estimates fraction of cyc= les when the memory subsystem had loads blocked since they could not forwar= d data from earlier (in program order) overlapping stores", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * LD_BLOCKS.STORE_FORWARD / tma_info_thread_clks= ", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_store_fwd_blk", diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json= b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json index c207c851a9f9..222212abd811 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json @@ -553,7 +553,6 @@ }, { "BriefDescription": "This metric represents overall arithmetic flo= ating-point (FP) operations fraction the CPU has executed (retired)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector + tma_f= p_amx", "MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_gr= oup", "MetricName": "tma_fp_arith", @@ -717,6 +716,7 @@ }, { "BriefDescription": "Total pipeline cost of Instruction Cache miss= es - subset of the Big_Code Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / (tma= _branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses += tma_lcp + tma_ms_switches))", "MetricGroup": "Fed;FetchLat;IcMiss;tma_issueFL", "MetricName": "tma_info_botlnk_l2_ic_misses", @@ -750,6 +750,7 @@ }, { "BriefDescription": "Total pipeline cost of (external) Memory Band= width related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_= store_bound) * (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) = + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bou= nd + tma_pmm_bound + tma_store_bound) * (tma_sq_full / (tma_contested_acces= ses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full))) + tma_l1_bound= / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_b= ound + tma_store_bound) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma= _lock_latency + tma_split_loads + tma_store_fwd_blk))", "MetricGroup": "Mem;MemoryBW;Offcore;tma_issueBW", "MetricName": "tma_info_bottleneck_memory_bandwidth", @@ -827,14 +828,12 @@ }, { "BriefDescription": "Floating Point Operations Per Cycle", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\= =3D0x03@ + FP_ARITH_INST_RETIRED2.SCALAR_HALF + 2 * (FP_ARITH_INST_RETIRED.= 128B_PACKED_DOUBLE + 
FP_ARITH_INST_RETIRED2.COMPLEX_SCALAR_HALF) + 4 * cpu@= FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE\\,umask\\=3D0x18@ + 8 * (FP_ARITH_= INST_RETIRED2.128B_PACKED_HALF + cpu@FP_ARITH_INST_RETIRED.256B_PACKED_SING= LE\\,umask\\=3D0x60@) + 16 * (FP_ARITH_INST_RETIRED2.256B_PACKED_HALF + FP_= ARITH_INST_RETIRED.512B_PACKED_SINGLE) + 32 * FP_ARITH_INST_RETIRED2.512B_P= ACKED_HALF + 4 * AMX_OPS_RETIRED.BF16", "MetricGroup": "Flops;Ret", "MetricName": "tma_info_core_flopc" }, { "BriefDescription": "Actual per-core usage of the Floating Point n= on-X87 execution units (regardless of precision or vector-width)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(FP_ARITH_DISPATCHED.PORT_0 + FP_ARITH_DISPATCHED.P= ORT_1 + FP_ARITH_DISPATCHED.PORT_5) / (2 * tma_info_core_core_clks)", "MetricGroup": "Cor;Flops;HPC", "MetricName": "tma_info_core_fp_arith_utilization", @@ -1216,7 +1215,6 @@ }, { "BriefDescription": "Average number of Uops retired in cycles wher= e at least one uop has retired.", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "tma_retiring * tma_info_thread_slots / cpu@UOPS_RET= IRED.SLOTS\\,cmask\\=3D1@", "MetricGroup": "Pipeline;Ret", "MetricName": "tma_info_pipeline_retire" @@ -1467,6 +1465,7 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled due to loads accesses to L3 cache or contended with a sibling Core", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(MEMORY_ACTIVITY.STALLS_L2_MISS - MEMORY_ACTIVITY.S= TALLS_L3_MISS) / tma_info_thread_clks", "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_= group;tma_memory_bound_group", "MetricName": "tma_l3_bound", @@ -1841,6 +1840,7 @@ }, { "BriefDescription": "This metric represents rate of split store ac= cesses", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bou= nd_group", "MetricName": "tma_split_stores", @@ -1868,6 +1868,7 @@ }, { "BriefDescription": "This metric roughly estimates fraction of cyc= les when the memory subsystem had loads blocked since they could not forwar= d data from earlier (in program order) overlapping stores", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * LD_BLOCKS.STORE_FORWARD / tma_info_thread_clks= ", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_store_fwd_blk", diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json b/to= ols/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json index c7c2d6ab1a93..fab084e1bc69 100644 --- a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json @@ -79,6 +79,7 @@ }, { "BriefDescription": "This metric estimates how often memory load a= ccesses were aliased by preceding stores (in program order) with a 4K addre= ss offset", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS / tma_info_thread_c= lks", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_4k_aliasing", @@ -313,7 +314,6 @@ }, { "BriefDescription": "This metric represents overall arithmetic flo= ating-point (FP) operations fraction the CPU has executed (retired)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector", "MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_gr= oup", "MetricName": "tma_fp_arith", @@ -458,6 +458,7 @@ }, { "BriefDescription": 
"Total pipeline cost of Instruction Cache miss= es - subset of the Big_Code Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / (tma= _branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses += tma_lcp + tma_ms_switches))", "MetricGroup": "Fed;FetchLat;IcMiss;tma_issueFL", "MetricName": "tma_info_botlnk_l2_ic_misses", @@ -491,6 +492,7 @@ }, { "BriefDescription": "Total pipeline cost of (external) Memory Band= width related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dra= m_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (= tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound /= (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_b= ound) * (tma_sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_= hit_latency + tma_sq_full))) + tma_l1_bound / (tma_dram_bound + tma_l1_boun= d + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_fb_full / (tma_4k= _aliasing + tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_load= s + tma_store_fwd_blk))", "MetricGroup": "Mem;MemoryBW;Offcore;tma_issueBW", "MetricName": "tma_info_bottleneck_memory_bandwidth", @@ -568,14 +570,12 @@ }, { "BriefDescription": "Floating Point Operations Per Cycle", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\= =3D0x03@ + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * cpu@FP_ARITH_= INST_RETIRED.128B_PACKED_SINGLE\\,umask\\=3D0x18@ + 8 * cpu@FP_ARITH_INST_R= ETIRED.256B_PACKED_SINGLE\\,umask\\=3D0x60@ + 16 * FP_ARITH_INST_RETIRED.51= 2B_PACKED_SINGLE) / tma_info_core_core_clks", "MetricGroup": "Flops;Ret", "MetricName": "tma_info_core_flopc" }, { "BriefDescription": "Actual per-core usage of the Floating Point n= on-X87 execution units (regardless of precision or vector-width)", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\= =3D0x03@ + cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=3D0xfc@) = / (2 * tma_info_core_core_clks)", "MetricGroup": "Cor;Flops;HPC", "MetricName": "tma_info_core_fp_arith_utilization", @@ -927,7 +927,6 @@ }, { "BriefDescription": "Average number of Uops retired in cycles wher= e at least one uop has retired.", - "MetricConstraint": "NO_GROUP_EVENTS", "MetricExpr": "tma_retiring * tma_info_thread_slots / cpu@UOPS_RET= IRED.SLOTS\\,cmask\\=3D1@", "MetricGroup": "Pipeline;Ret", "MetricName": "tma_info_pipeline_retire" @@ -1114,6 +1113,7 @@ }, { "BriefDescription": "This metric estimates how often the CPU was s= talled due to loads accesses to L3 cache or contended with a sibling Core", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STA= LLS_L3_MISS) / tma_info_thread_clks", "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_= group;tma_memory_bound_group", "MetricName": "tma_l3_bound", @@ -1433,6 +1433,7 @@ }, { "BriefDescription": "This metric represents rate of split store ac= cesses", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_= clks", "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bou= nd_group", "MetricName": "tma_split_stores", @@ -1460,6 +1461,7 @@ }, { "BriefDescription": "This metric roughly estimates fraction of cyc= les when the memory subsystem had loads blocked since 
they could not forwar= d data from earlier (in program order) overlapping stores", + "MetricConstraint": "NO_GROUP_EVENTS_NMI", "MetricExpr": "13 * LD_BLOCKS.STORE_FORWARD / tma_info_thread_clks= ", "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", "MetricName": "tma_store_fwd_blk", --=20 2.41.0.585.gd2178a4bd4-goog