From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@kernel.org, acme@kernel.org,
	namhyung@kernel.org, irogers@google.com, adrian.hunter@intel.com,
	alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, Kan Liang <kan.liang@linux.intel.com>
Subject: [RESEND PATCH 04/12] perf/x86/intel: Support new data source for Lunar Lake
Date: Tue, 18 Jun 2024 08:10:36 -0700
Message-Id: <20240618151044.1318612-5-kan.liang@linux.intel.com>
In-Reply-To: <20240618151044.1318612-1-kan.liang@linux.intel.com>
References: <20240618151044.1318612-1-kan.liang@linux.intel.com>

From: Kan Liang <kan.liang@linux.intel.com>

A new PEBS data source format is introduced for the p-core of Lunar Lake.
The data source field is extended to 8 bits with new encodings.

A new layout is added to the union intel_x86_pebs_dse. Introduce
lnl_latency_data() to parse the new format. Enlarge pebs_data_source[]
accordingly to hold the new encodings.

Only the mem load and the mem store events can generate a data source.
Introduce INTEL_HYBRID_LDLAT_CONSTRAINT and INTEL_HYBRID_STLAT_CONSTRAINT
to mark them.

Add two new bits for the new cache-related data sources, L2_MHB and MSC.
L2_MHB is short for the L2 Miss Handling Buffer, which is similar to the
LFB (Line Fill Buffer) but tracks L2 cache misses. MSC stands for the
memory-side cache.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
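Note for reviewers, not part of the patch: below is a minimal user-space
sketch of how a PERF_SAMPLE_DATA_SRC value carrying the new level numbers
could be decoded through the existing union perf_mem_data_src. It assumes
a <linux/perf_event.h> uapi header that already contains the
PERF_MEM_LVLNUM_L2_MHB/PERF_MEM_LVLNUM_MSC definitions added below; the
helper names (lvlnum_str, print_data_src) are made up for the sketch.

#include <stdio.h>
#include <linux/perf_event.h>

static const char *lvlnum_str(__u64 lvl_num)
{
	switch (lvl_num) {
	case PERF_MEM_LVLNUM_L1:	return "L1";
	case PERF_MEM_LVLNUM_L2:	return "L2";
	case PERF_MEM_LVLNUM_L2_MHB:	return "L2 Miss Handling Buffer"; /* new */
	case PERF_MEM_LVLNUM_MSC:	return "Memory-side Cache";       /* new */
	case PERF_MEM_LVLNUM_LFB:	return "LFB/L1 Miss Handling Buffer";
	case PERF_MEM_LVLNUM_L3:	return "L3";
	case PERF_MEM_LVLNUM_RAM:	return "RAM";
	default:			return "other";
	}
}

/* data_src is the u64 sampled via PERF_SAMPLE_DATA_SRC */
static void print_data_src(__u64 data_src)
{
	union perf_mem_data_src src = { .val = data_src };

	printf("op=%s lvl=%s\n",
	       src.mem_op & PERF_MEM_OP_STORE ? "store" :
	       src.mem_op & PERF_MEM_OP_LOAD ? "load" : "other",
	       lvlnum_str(src.mem_lvl_num));
}

int main(void)
{
	union perf_mem_data_src src = { .val = 0 };

	/* e.g. a load reported as hitting the new L2 Miss Handling Buffer */
	src.mem_op = PERF_MEM_OP_LOAD;
	src.mem_lvl_num = PERF_MEM_LVLNUM_L2_MHB;
	print_data_src(src.val);

	return 0;
}

The sketch only looks at mem_op and mem_lvl_num; the TLB, lock, snoop and
block information that lnl_latency_data() also encodes lives in the other
bit-fields of the same union.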
 arch/x86/events/intel/core.c    |  2 +
 arch/x86/events/intel/ds.c      | 88 ++++++++++++++++++++++++++++++++-
 arch/x86/events/perf_event.h    | 16 +++++-
 include/uapi/linux/perf_event.h |  6 ++-
 4 files changed, 107 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 42e65fb6f2ff..60806f373226 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6960,6 +6960,7 @@ __init int intel_pmu_init(void)
 	case INTEL_FAM6_ARROWLAKE:
 		intel_pmu_init_hybrid(hybrid_big_small);
 
+		x86_pmu.pebs_latency_data = lnl_latency_data;
 		x86_pmu.get_event_constraints = mtl_get_event_constraints;
 		x86_pmu.hw_config = adl_hw_config;
 
@@ -6977,6 +6978,7 @@ __init int intel_pmu_init(void)
 		pmu = &x86_pmu.hybrid_pmu[X86_HYBRID_PMU_ATOM_IDX];
 		intel_pmu_init_skt(&pmu->pmu);
 
+		intel_pmu_pebs_data_source_lnl();
 		pr_cont("Lunarlake Hybrid events, ");
 		name = "lunarlake_hybrid";
 		break;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 42a38222c044..abf2b1991bc0 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -63,6 +63,15 @@ union intel_x86_pebs_dse {
 		unsigned int mtl_fwd_blk:1;
 		unsigned int ld_reserved4:24;
 	};
+	struct {
+		unsigned int lnc_dse:8;
+		unsigned int ld_reserved5:2;
+		unsigned int lnc_stlb_miss:1;
+		unsigned int lnc_locked:1;
+		unsigned int lnc_data_blk:1;
+		unsigned int lnc_addr_blk:1;
+		unsigned int ld_reserved6:18;
+	};
 };
 
 
@@ -77,7 +86,7 @@ union intel_x86_pebs_dse {
 #define SNOOP_NONE_MISS (P(SNOOP, NONE) | P(SNOOP, MISS))
 
 /* Version for Sandy Bridge and later */
-static u64 pebs_data_source[] = {
+static u64 pebs_data_source[PERF_PEBS_DATA_SOURCE_MAX] = {
 	P(OP, LOAD) | P(LVL, MISS) | LEVEL(L3) | P(SNOOP, NA),/* 0x00:ukn L3 */
 	OP_LH | P(LVL, L1)  | LEVEL(L1) | P(SNOOP, NONE),  /* 0x01: L1 local */
 	OP_LH | P(LVL, LFB) | LEVEL(LFB) | P(SNOOP, NONE), /* 0x02: LFB hit */
@@ -173,6 +182,40 @@ void __init intel_pmu_pebs_data_source_cmt(void)
 	__intel_pmu_pebs_data_source_cmt(pebs_data_source);
 }
 
+/* Version for Lunar Lake p-core and later */
+static u64 lnc_pebs_data_source[PERF_PEBS_DATA_SOURCE_MAX] = {
+	P(OP, LOAD) | P(LVL, MISS) | LEVEL(L3) | P(SNOOP, NA),	/* 0x00: ukn L3 */
+	OP_LH | P(LVL, L1) | LEVEL(L1) | P(SNOOP, NONE),	/* 0x01: L1 hit */
+	OP_LH | P(LVL, L1) | LEVEL(L1) | P(SNOOP, NONE),	/* 0x02: L1 hit */
+	OP_LH | P(LVL, LFB) | LEVEL(LFB) | P(SNOOP, NONE),	/* 0x03: LFB/L1 Miss Handling Buffer hit */
+	0,							/* 0x04: Reserved */
+	OP_LH | P(LVL, L2) | LEVEL(L2) | P(SNOOP, NONE),	/* 0x05: L2 Hit */
+	OP_LH | LEVEL(L2_MHB) | P(SNOOP, NONE),			/* 0x06: L2 Miss Handling Buffer Hit */
+	0,							/* 0x07: Reserved */
+	OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, NONE),	/* 0x08: L3 Hit */
+	0,							/* 0x09: Reserved */
+	0,							/* 0x0a: Reserved */
+	0,							/* 0x0b: Reserved */
+	OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOPX, FWD),	/* 0x0c: L3 Hit Snoop Fwd */
+	OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HITM),	/* 0x0d: L3 Hit Snoop HitM */
+	0,							/* 0x0e: Reserved */
+	P(OP, LOAD) | P(LVL, MISS) | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HITM),	/* 0x0f: L3 Miss Snoop HitM */
+	OP_LH | LEVEL(MSC) | P(SNOOP, NONE),			/* 0x10: Memory-side Cache Hit */
+	OP_LH | P(LVL, LOC_RAM) | LEVEL(RAM) | P(SNOOP, NONE),	/* 0x11: Local Memory Hit */
+};
+
+void __init intel_pmu_pebs_data_source_lnl(void)
+{
+	u64 *data_source;
+
+	data_source = x86_pmu.hybrid_pmu[X86_HYBRID_PMU_CORE_IDX].pebs_data_source;
+	memcpy(data_source, lnc_pebs_data_source, sizeof(lnc_pebs_data_source));
+
+	data_source = x86_pmu.hybrid_pmu[X86_HYBRID_PMU_ATOM_IDX].pebs_data_source;
+	memcpy(data_source, pebs_data_source, sizeof(pebs_data_source));
+	__intel_pmu_pebs_data_source_cmt(data_source);
+}
+
 static u64 precise_store_data(u64 status)
 {
 	union intel_x86_pebs_dse dse;
@@ -264,7 +307,7 @@ static u64 __adl_latency_data_small(struct perf_event *event, u64 status,
 
 	WARN_ON_ONCE(hybrid_pmu(event->pmu)->pmu_type == hybrid_big);
 
-	dse &= PERF_PEBS_DATA_SOURCE_MASK;
+	dse &= PERF_PEBS_DATA_SOURCE_GRT_MASK;
 	val = hybrid_var(event->pmu, pebs_data_source)[dse];
 
 	pebs_set_tlb_lock(&val, tlb, lock);
@@ -300,6 +343,45 @@ u64 mtl_latency_data_small(struct perf_event *event, u64 status)
 			       dse.mtl_fwd_blk);
 }
 
+u64 lnl_latency_data(struct perf_event *event, u64 status)
+{
+	struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
+	union intel_x86_pebs_dse dse;
+	union perf_mem_data_src src;
+	u64 val;
+
+	if (pmu->pmu_type == hybrid_small)
+		return mtl_latency_data_small(event, status);
+
+	dse.val = status;
+
+	/* LNC core latency data */
+	val = hybrid_var(event->pmu, pebs_data_source)[status & PERF_PEBS_DATA_SOURCE_MASK];
+	if (!val)
+		val = P(OP, LOAD) | LEVEL(NA) | P(SNOOP, NA);
+
+	if (dse.lnc_stlb_miss)
+		val |= P(TLB, MISS) | P(TLB, L2);
+	else
+		val |= P(TLB, HIT) | P(TLB, L1) | P(TLB, L2);
+
+	if (dse.lnc_locked)
+		val |= P(LOCK, LOCKED);
+
+	if (dse.lnc_data_blk)
+		val |= P(BLK, DATA);
+	if (dse.lnc_addr_blk)
+		val |= P(BLK, ADDR);
+	if (!dse.lnc_data_blk && !dse.lnc_addr_blk)
+		val |= P(BLK, NA);
+
+	src.val = val;
+	if (event->hw.flags & PERF_X86_EVENT_PEBS_ST_HSW)
+		src.mem_op = P(OP, STORE);
+
+	return src.val;
+}
+
 static u64 load_latency_data(struct perf_event *event, u64 status)
 {
 	union intel_x86_pebs_dse dse;
@@ -1090,6 +1172,8 @@ struct event_constraint intel_lnc_pebs_event_constraints[] = {
 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x100, 0x100000000ULL),	/* INST_RETIRED.PREC_DIST */
 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x0400, 0x800000000ULL),
 
+	INTEL_HYBRID_LDLAT_CONSTRAINT(0x1cd, 0x3ff),
+	INTEL_HYBRID_STLAT_CONSTRAINT(0x2cd, 0x3),
 	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf),	/* MEM_INST_RETIRED.STLB_MISS_LOADS */
 	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x12d0, 0xf),	/* MEM_INST_RETIRED.STLB_MISS_STORES */
 	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x21d0, 0xf),	/* MEM_INST_RETIRED.LOCK_LOADS */
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 92df40b8926c..66209bb2ba77 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -476,6 +476,14 @@ struct cpu_hw_events {
 	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
 			  HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LAT_HYBRID)
 
+#define INTEL_HYBRID_LDLAT_CONSTRAINT(c, n)	\
+	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
+			  HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LAT_HYBRID|PERF_X86_EVENT_PEBS_LD_HSW)
+
+#define INTEL_HYBRID_STLAT_CONSTRAINT(c, n)	\
+	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
+			  HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LAT_HYBRID|PERF_X86_EVENT_PEBS_ST_HSW)
+
 /* Event constraint, but match on all event flags too. */
 #define INTEL_FLAGS_EVENT_CONSTRAINT(c, n) \
 	EVENT_CONSTRAINT(c, n, ARCH_PERFMON_EVENTSEL_EVENT|X86_ALL_EVENT_FLAGS)
@@ -655,8 +663,10 @@ enum {
 	x86_lbr_exclusive_max,
 };
 
-#define PERF_PEBS_DATA_SOURCE_MAX	0x10
+#define PERF_PEBS_DATA_SOURCE_MAX	0x100
 #define PERF_PEBS_DATA_SOURCE_MASK	(PERF_PEBS_DATA_SOURCE_MAX - 1)
+#define PERF_PEBS_DATA_SOURCE_GRT_MAX	0x10
+#define PERF_PEBS_DATA_SOURCE_GRT_MASK	(PERF_PEBS_DATA_SOURCE_GRT_MAX - 1)
 
 enum hybrid_cpu_type {
 	HYBRID_INTEL_NONE,
@@ -1542,6 +1552,8 @@ u64 adl_latency_data_small(struct perf_event *event, u64 status);
 
 u64 mtl_latency_data_small(struct perf_event *event, u64 status);
 
+u64 lnl_latency_data(struct perf_event *event, u64 status);
+
 extern struct event_constraint intel_core2_pebs_event_constraints[];
 
 extern struct event_constraint intel_atom_pebs_event_constraints[];
@@ -1663,6 +1675,8 @@ void intel_pmu_pebs_data_source_mtl(void);
 
 void intel_pmu_pebs_data_source_cmt(void);
 
+void intel_pmu_pebs_data_source_lnl(void);
+
 int intel_pmu_setup_lbr_filter(struct perf_event *event);
 
 void intel_pt_interrupt(void);
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 3a64499b0f5d..4842c36fdf80 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1349,12 +1349,14 @@ union perf_mem_data_src {
 #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
 #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
 #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
-/* 5-0x7 available */
+#define PERF_MEM_LVLNUM_L2_MHB	0x05 /* L2 Miss Handling Buffer */
+#define PERF_MEM_LVLNUM_MSC	0x06 /* Memory-side Cache */
+/* 0x7 available */
 #define PERF_MEM_LVLNUM_UNC	0x08 /* Uncached */
 #define PERF_MEM_LVLNUM_CXL	0x09 /* CXL */
 #define PERF_MEM_LVLNUM_IO	0x0a /* I/O */
 #define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
-#define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */
+#define PERF_MEM_LVLNUM_LFB	0x0c /* LFB / L1 Miss Handling Buffer */
 #define PERF_MEM_LVLNUM_RAM	0x0d /* RAM */
 #define PERF_MEM_LVLNUM_PMEM	0x0e /* PMEM */
 #define PERF_MEM_LVLNUM_NA	0x0f /* N/A */
-- 
2.35.1