From: Kan Liang <kan.liang@linux.intel.com>
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	namhyung@kernel.org, irogers@google.com, adrian.hunter@intel.com,
	alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, dapeng1.mi@linux.intel.com,
	thomas.falcon@intel.com, Kan Liang <kan.liang@linux.intel.com>
Subject: [PATCH V3 1/5] perf/x86: Add dynamic constraint
Date: Thu, 13 Feb 2025 13:17:14 -0800
Message-Id: <20250213211718.2406744-2-kan.liang@linux.intel.com>
In-Reply-To: <20250213211718.2406744-1-kan.liang@linux.intel.com>
References: <20250213211718.2406744-1-kan.liang@linux.intel.com>

More and more features require a dynamic event constraint, e.g., branch
counter logging, auto counter reload, and Arch PEBS. Add a generic flag,
PMU_FL_DYN_CONSTRAINT, to indicate such a case. This avoids having to
add an individual flag check in intel_cpuc_prepare() for each new
feature.

Add a dyn_constraint field to struct hw_perf_event to track the dynamic
constraint of the event, and apply it whenever it has been updated.

Apply the generic dynamic constraint to branch counter logging. Since
many features on and after architectural perfmon version 6 require a
dynamic constraint, unconditionally set the flag for V6 and later.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
---
 arch/x86/events/core.c       |  1 +
 arch/x86/events/intel/core.c | 21 +++++++++++++++------
 arch/x86/events/intel/lbr.c  |  2 +-
 arch/x86/events/perf_event.h |  1 +
 include/linux/perf_event.h   |  1 +
 5 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 20ad5cca6ad2..b56fa6a9d7a4 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -675,6 +675,7 @@ static int __x86_pmu_event_init(struct perf_event *event)
 	event->hw.idx = -1;
 	event->hw.last_cpu = -1;
 	event->hw.last_tag = ~0ULL;
+	event->hw.dyn_constraint = ~0ULL;
 
 	/* mark unused */
 	event->hw.extra_reg.idx = EXTRA_REG_NONE;
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index f728d2cfdf1c..2df05b18ff04 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3736,10 +3736,9 @@ intel_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 	if (cpuc->excl_cntrs)
 		return intel_get_excl_constraints(cpuc, event, idx, c2);
 
-	/* Not all counters support the branch counter feature. */
-	if (branch_sample_counters(event)) {
+	if (event->hw.dyn_constraint != ~0ULL) {
 		c2 = dyn_constraint(cpuc, c2, idx);
-		c2->idxmsk64 &= x86_pmu.lbr_counters;
+		c2->idxmsk64 &= event->hw.dyn_constraint;
 		c2->weight = hweight64(c2->idxmsk64);
 	}
 
@@ -4141,15 +4140,19 @@ static int intel_pmu_hw_config(struct perf_event *event)
 		leader = event->group_leader;
 		if (branch_sample_call_stack(leader))
 			return -EINVAL;
-		if (branch_sample_counters(leader))
+		if (branch_sample_counters(leader)) {
 			num++;
+			leader->hw.dyn_constraint &= x86_pmu.lbr_counters;
+		}
 		leader->hw.flags |= PERF_X86_EVENT_BRANCH_COUNTERS;
 
 		for_each_sibling_event(sibling, leader) {
 			if (branch_sample_call_stack(sibling))
 				return -EINVAL;
-			if (branch_sample_counters(sibling))
+			if (branch_sample_counters(sibling)) {
 				num++;
+				sibling->hw.dyn_constraint &= x86_pmu.lbr_counters;
+			}
 		}
 
 		if (num > fls(x86_pmu.lbr_counters))
@@ -4949,7 +4952,7 @@ int intel_cpuc_prepare(struct cpu_hw_events *cpuc, int cpu)
 			goto err;
 	}
 
-	if (x86_pmu.flags & (PMU_FL_EXCL_CNTRS | PMU_FL_TFA | PMU_FL_BR_CNTR)) {
+	if (x86_pmu.flags & (PMU_FL_EXCL_CNTRS | PMU_FL_TFA | PMU_FL_DYN_CONSTRAINT)) {
 		size_t sz = X86_PMC_IDX_MAX * sizeof(struct event_constraint);
 
 		cpuc->constraint_list = kzalloc_node(sz, GFP_KERNEL, cpu_to_node(cpu));
@@ -6667,6 +6670,12 @@ __init int intel_pmu_init(void)
 		pr_cont(" AnyThread deprecated, ");
 	}
 
+	/*
+	 * Many features on and after V6 require a dynamic constraint,
+	 * e.g., Arch PEBS, ACR.
+	 */
+	if (version >= 6)
+		x86_pmu.flags |= PMU_FL_DYN_CONSTRAINT;
 	/*
 	 * Install the hw-cache-events table:
 	 */
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index dc641b50814e..743dcc322085 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1609,7 +1609,7 @@ void __init intel_pmu_arch_lbr_init(void)
 	x86_pmu.lbr_nr = lbr_nr;
 
 	if (!!x86_pmu.lbr_counters)
-		x86_pmu.flags |= PMU_FL_BR_CNTR;
+		x86_pmu.flags |= PMU_FL_BR_CNTR | PMU_FL_DYN_CONSTRAINT;
 
 	if (x86_pmu.lbr_mispred)
 		static_branch_enable(&x86_lbr_mispred);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index a698e6484b3b..f4693409e191 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1066,6 +1066,7 @@ do { \
 #define PMU_FL_MEM_LOADS_AUX	0x100 /* Require an auxiliary event for the complete memory info */
 #define PMU_FL_RETIRE_LATENCY	0x200 /* Support Retire Latency in PEBS */
 #define PMU_FL_BR_CNTR		0x400 /* Support branch counter logging */
+#define PMU_FL_DYN_CONSTRAINT	0x800 /* Needs dynamic constraint */
 
 #define EVENT_VAR(_id) event_attr_##_id
 #define EVENT_PTR(_id) &event_attr_##_id.attr.attr
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2d07bc1193f3..c381ea7135df 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -158,6 +158,7 @@ struct hw_perf_event {
 		struct { /* hardware */
 			u64		config;
 			u64		last_tag;
+			u64		dyn_constraint;
 			unsigned long	config_base;
 			unsigned long	event_base;
 			int		event_base_rdpmc;
-- 
2.38.1
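
To make the intended usage pattern concrete, here is a minimal sketch
of how a feature hooks into the new mechanism. The function and mask
names (example_feature_hw_config, feature_cntr_mask) are illustrative,
not part of the patch:

	/*
	 * Sketch: a feature that can only run on a subset of counters
	 * narrows the per-event mask in its hw_config()-style path.
	 * dyn_constraint starts life as ~0ULL (no restriction), so each
	 * interested feature simply ANDs in its own counter mask.
	 */
	static void example_feature_hw_config(struct perf_event *event,
					      u64 feature_cntr_mask)
	{
		event->hw.dyn_constraint &= feature_cntr_mask;
	}

The scheduler side then only pays for the dynamic constraint when some
feature actually narrowed the mask, which is exactly the
event->hw.dyn_constraint != ~0ULL test added to
intel_get_event_constraints() above.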
From: Kan Liang <kan.liang@linux.intel.com>
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	namhyung@kernel.org, irogers@google.com, adrian.hunter@intel.com,
	alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, dapeng1.mi@linux.intel.com,
	thomas.falcon@intel.com, Kan Liang <kan.liang@linux.intel.com>
Subject: [PATCH V3 2/5] perf/x86/intel: Track the number of events that need late setup
Date: Thu, 13 Feb 2025 13:17:15 -0800
Message-Id: <20250213211718.2406744-3-kan.liang@linux.intel.com>
In-Reply-To: <20250213211718.2406744-1-kan.liang@linux.intel.com>
References: <20250213211718.2406744-1-kan.liang@linux.intel.com>

When a machine supports PEBS v6, perf unconditionally searches
cpuc->event_list[] for every event and checks whether late setup is
required, which is unnecessary. Late setup is only required for special
events, e.g., events that support the counters-snapshotting feature.

Add n_late_setup to track the number of events that need late setup.
Other features, e.g., auto counter reload, require late setup as well.

Add a wrapper, intel_pmu_pebs_late_setup(), for the events that support
the counters-snapshotting feature.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
---
 arch/x86/events/intel/core.c | 14 ++++++++++++++
 arch/x86/events/intel/ds.c   |  3 +--
 arch/x86/events/perf_event.h |  5 +++++
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 2df05b18ff04..ce04553910ab 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2609,6 +2609,8 @@ static void intel_pmu_del_event(struct perf_event *event)
 		intel_pmu_lbr_del(event);
 	if (event->attr.precise_ip)
 		intel_pmu_pebs_del(event);
+	if (is_pebs_counter_event_group(event))
+		this_cpu_ptr(&cpu_hw_events)->n_late_setup--;
 }
 
 static int icl_set_topdown_event_period(struct perf_event *event)
@@ -2920,12 +2922,24 @@ static void intel_pmu_enable_event(struct perf_event *event)
 	}
 }
 
+void intel_pmu_late_setup(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	if (!cpuc->n_late_setup)
+		return;
+
+	intel_pmu_pebs_late_setup(cpuc);
+}
+
 static void intel_pmu_add_event(struct perf_event *event)
 {
 	if (event->attr.precise_ip)
 		intel_pmu_pebs_add(event);
 	if (intel_pmu_needs_branch_stack(event))
 		intel_pmu_lbr_add(event);
+	if (is_pebs_counter_event_group(event))
+		this_cpu_ptr(&cpu_hw_events)->n_late_setup++;
 }
 
 /*
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index e8f808905871..df9499d6e4dc 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1355,9 +1355,8 @@ static void __intel_pmu_pebs_update_cfg(struct perf_event *event,
 }
 
 
-static void intel_pmu_late_setup(void)
+void intel_pmu_pebs_late_setup(struct cpu_hw_events *cpuc)
 {
-	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct perf_event *event;
 	u64 pebs_data_cfg = 0;
 	int i;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index f4693409e191..5bf9c117e9ef 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -261,6 +261,7 @@ struct cpu_hw_events {
 	struct event_constraint	*event_constraint[X86_PMC_IDX_MAX];
 
 	int			n_excl; /* the number of exclusive events */
+	int			n_late_setup; /* the number of events that need late setup */
 
 	unsigned int		txn_flags;
 	int			is_fake;
@@ -1602,6 +1603,8 @@ void intel_pmu_disable_bts(void);
 
 int intel_pmu_drain_bts_buffer(void);
 
+void intel_pmu_late_setup(void);
+
 u64 grt_latency_data(struct perf_event *event, u64 status);
 
 u64 cmt_latency_data(struct perf_event *event, u64 status);
@@ -1658,6 +1661,8 @@ void intel_pmu_pebs_disable_all(void);
 
 void intel_pmu_pebs_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in);
 
+void intel_pmu_pebs_late_setup(struct cpu_hw_events *cpuc);
+
 void intel_pmu_drain_pebs_buffer(void);
 
 void intel_pmu_store_pebs_lbrs(struct lbr_entry *lbr);
-- 
2.38.1
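
The bookkeeping pattern a new feature is expected to follow mirrors
what this patch does for the counters-snapshotting events; a minimal
sketch, assuming a hypothetical predicate event_needs_late_setup()
(illustrative name, not part of the patch):

	/* add path: account the event */
	if (event_needs_late_setup(event))	/* hypothetical */
		this_cpu_ptr(&cpu_hw_events)->n_late_setup++;

	/* del path: undo the accounting */
	if (event_needs_late_setup(event))	/* hypothetical */
		this_cpu_ptr(&cpu_hw_events)->n_late_setup--;

With the counter in place, intel_pmu_late_setup() reduces to a single
test of cpuc->n_late_setup on the common path, instead of scanning
cpuc->event_list[] unconditionally.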
From: Kan Liang <kan.liang@linux.intel.com>
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	namhyung@kernel.org, irogers@google.com, adrian.hunter@intel.com,
	alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, dapeng1.mi@linux.intel.com,
	thomas.falcon@intel.com, Kan Liang <kan.liang@linux.intel.com>
Subject: [PATCH V3 3/5] perf: Extend the bit width of the arch-specific flag
Date: Thu, 13 Feb 2025 13:17:16 -0800
Message-Id: <20250213211718.2406744-4-kan.liang@linux.intel.com>
In-Reply-To: <20250213211718.2406744-1-kan.liang@linux.intel.com>
References: <20250213211718.2406744-1-kan.liang@linux.intel.com>

The auto counter reload feature requires an event flag to indicate an
auto counter reload group, which can only be scheduled on specific
counters that are enumerated in CPUID. However, hw_perf_event.flags has
run out of bits on x86.

Two solutions were considered to address the issue:

- Currently, 20 bits are reserved for the architecture-specific flags.
  Only bit 31 is used for a generic flag, so there is still plenty of
  space left. Reserve 8 more bits for the arch-specific flags.
- Add a new x86-specific hw_perf_event.flags1 to support more flags.

The former is implemented, since enough room is still left in the
global generic flags.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
---
 arch/x86/events/perf_event_flags.h | 41 +++++++++++++++---------------
 include/linux/perf_event.h         |  2 +-
 2 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/x86/events/perf_event_flags.h b/arch/x86/events/perf_event_flags.h
index 1d9e385649b5..70078334e4a3 100644
--- a/arch/x86/events/perf_event_flags.h
+++ b/arch/x86/events/perf_event_flags.h
@@ -2,23 +2,24 @@
 /*
  * struct hw_perf_event.flags flags
  */
-PERF_ARCH(PEBS_LDLAT,		0x00001) /* ld+ldlat data address sampling */
-PERF_ARCH(PEBS_ST,		0x00002) /* st data address sampling */
-PERF_ARCH(PEBS_ST_HSW,		0x00004) /* haswell style datala, store */
-PERF_ARCH(PEBS_LD_HSW,		0x00008) /* haswell style datala, load */
-PERF_ARCH(PEBS_NA_HSW,		0x00010) /* haswell style datala, unknown */
-PERF_ARCH(EXCL,			0x00020) /* HT exclusivity on counter */
-PERF_ARCH(DYNAMIC,		0x00040) /* dynamic alloc'd constraint */
-PERF_ARCH(PEBS_CNTR,		0x00080) /* PEBS counters snapshot */
-PERF_ARCH(EXCL_ACCT,		0x00100) /* accounted EXCL event */
-PERF_ARCH(AUTO_RELOAD,		0x00200) /* use PEBS auto-reload */
-PERF_ARCH(LARGE_PEBS,		0x00400) /* use large PEBS */
-PERF_ARCH(PEBS_VIA_PT,		0x00800) /* use PT buffer for PEBS */
-PERF_ARCH(PAIR,			0x01000) /* Large Increment per Cycle */
-PERF_ARCH(LBR_SELECT,		0x02000) /* Save/Restore MSR_LBR_SELECT */
-PERF_ARCH(TOPDOWN,		0x04000) /* Count Topdown slots/metrics events */
-PERF_ARCH(PEBS_STLAT,		0x08000) /* st+stlat data address sampling */
-PERF_ARCH(AMD_BRS,		0x10000) /* AMD Branch Sampling */
-PERF_ARCH(PEBS_LAT_HYBRID,	0x20000) /* ld and st lat for hybrid */
-PERF_ARCH(NEEDS_BRANCH_STACK,	0x40000) /* require branch stack setup */
-PERF_ARCH(BRANCH_COUNTERS,	0x80000) /* logs the counters in the extra space of each branch */
+PERF_ARCH(PEBS_LDLAT,		0x0000001) /* ld+ldlat data address sampling */
+PERF_ARCH(PEBS_ST,		0x0000002) /* st data address sampling */
+PERF_ARCH(PEBS_ST_HSW,		0x0000004) /* haswell style datala, store */
+PERF_ARCH(PEBS_LD_HSW,		0x0000008) /* haswell style datala, load */
+PERF_ARCH(PEBS_NA_HSW,		0x0000010) /* haswell style datala, unknown */
+PERF_ARCH(EXCL,			0x0000020) /* HT exclusivity on counter */
+PERF_ARCH(DYNAMIC,		0x0000040) /* dynamic alloc'd constraint */
+PERF_ARCH(PEBS_CNTR,		0x0000080) /* PEBS counters snapshot */
+PERF_ARCH(EXCL_ACCT,		0x0000100) /* accounted EXCL event */
+PERF_ARCH(AUTO_RELOAD,		0x0000200) /* use PEBS auto-reload */
+PERF_ARCH(LARGE_PEBS,		0x0000400) /* use large PEBS */
+PERF_ARCH(PEBS_VIA_PT,		0x0000800) /* use PT buffer for PEBS */
+PERF_ARCH(PAIR,			0x0001000) /* Large Increment per Cycle */
+PERF_ARCH(LBR_SELECT,		0x0002000) /* Save/Restore MSR_LBR_SELECT */
+PERF_ARCH(TOPDOWN,		0x0004000) /* Count Topdown slots/metrics events */
+PERF_ARCH(PEBS_STLAT,		0x0008000) /* st+stlat data address sampling */
+PERF_ARCH(AMD_BRS,		0x0010000) /* AMD Branch Sampling */
+PERF_ARCH(PEBS_LAT_HYBRID,	0x0020000) /* ld and st lat for hybrid */
+PERF_ARCH(NEEDS_BRANCH_STACK,	0x0040000) /* require branch stack setup */
+PERF_ARCH(BRANCH_COUNTERS,	0x0080000) /* logs the counters in the extra space of each branch */
+PERF_ARCH(ACR,			0x0100000) /* Auto counter reload */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c381ea7135df..238879f6c3e3 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -144,7 +144,7 @@ struct hw_perf_event_extra {
  * PERF_EVENT_FLAG_ARCH bits are reserved for architecture-specific
  * usage.
  */
-#define PERF_EVENT_FLAG_ARCH			0x000fffff
+#define PERF_EVENT_FLAG_ARCH			0x0fffffff
 #define PERF_EVENT_FLAG_USER_READ_CNT		0x80000000
 
 static_assert((PERF_EVENT_FLAG_USER_READ_CNT & PERF_EVENT_FLAG_ARCH) == 0);
-- 
2.38.1
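
The new flag layout can be sanity-checked in isolation. The following
standalone program is illustrative (not part of the patch); it mirrors
the kernel's static_assert with the values from the diffs above: the
widened arch mask must not collide with the generic
PERF_EVENT_FLAG_USER_READ_CNT bit, and the new ACR flag (bit 20) must
fall inside the arch-reserved region:

	#include <assert.h>
	#include <stdio.h>

	#define PERF_EVENT_FLAG_ARCH		0x0fffffff
	#define PERF_EVENT_FLAG_USER_READ_CNT	0x80000000u
	#define PERF_X86_EVENT_ACR		0x0100000	/* new bit 20 */

	int main(void)
	{
		/* The widened arch mask still avoids the generic flag. */
		assert((PERF_EVENT_FLAG_USER_READ_CNT & PERF_EVENT_FLAG_ARCH) == 0);
		/* The new ACR flag fits inside the arch-reserved region. */
		assert((PERF_X86_EVENT_ACR & PERF_EVENT_FLAG_ARCH) == PERF_X86_EVENT_ACR);
		printf("flag layout ok\n");
		return 0;
	}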
From: Kan Liang <kan.liang@linux.intel.com>
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	namhyung@kernel.org, irogers@google.com, adrian.hunter@intel.com,
	alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, dapeng1.mi@linux.intel.com,
	thomas.falcon@intel.com, Kan Liang <kan.liang@linux.intel.com>
Subject: [PATCH V3 4/5] perf/x86/intel: Add CPUID enumeration for the auto counter reload
Date: Thu, 13 Feb 2025 13:17:17 -0800
Message-Id: <20250213211718.2406744-5-kan.liang@linux.intel.com>
In-Reply-To: <20250213211718.2406744-1-kan.liang@linux.intel.com>
References: <20250213211718.2406744-1-kan.liang@linux.intel.com>

The counters that support the auto counter reload feature are
enumerated in CPUID Leaf 0x23 sub-leaf 0x2.

Add acr_cntr_mask to store the mask of counters that are reloadable.
Add acr_cause_mask to store the mask of counters that can cause a
reload. Since the e-core and p-core may have different numbers of
counters, track the masks in struct x86_hybrid_pmu as well.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
---
 arch/x86/events/intel/core.c      | 10 ++++++++++
 arch/x86/events/perf_event.h      | 17 +++++++++++++++++
 arch/x86/include/asm/perf_event.h |  1 +
 3 files changed, 28 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ce04553910ab..8e3ad9efd798 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5075,6 +5075,16 @@ static void update_pmu_cap(struct x86_hybrid_pmu *pmu)
 		pmu->fixed_cntr_mask64 = fixed_cntr;
 	}
 
+	if (eax.split.acr_subleaf) {
+		cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_ACR_LEAF,
+			    &cntr, &fixed_cntr, &ecx, &edx);
+		/* The mask of the counters which can be reloaded */
+		pmu->acr_cntr_mask64 = cntr | ((u64)fixed_cntr << INTEL_PMC_IDX_FIXED);
+
+		/* The mask of the counters which can cause a reload of reloadable counters */
+		pmu->acr_cause_mask64 = ecx | ((u64)edx << INTEL_PMC_IDX_FIXED);
+	}
+
 	if (!intel_pmu_broken_perf_cap()) {
 		/* Perf Metric (Bit 15) and PEBS via PT (Bit 16) are hybrid enumeration */
 		rdmsrl(MSR_IA32_PERF_CAPABILITIES, pmu->intel_cap.capabilities);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 5bf9c117e9ef..2184ae0c9a4a 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -725,6 +725,15 @@ struct x86_hybrid_pmu {
 		u64		fixed_cntr_mask64;
 		unsigned long	fixed_cntr_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	};
+
+	union {
+		u64		acr_cntr_mask64;
+		unsigned long	acr_cntr_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+	};
+	union {
+		u64		acr_cause_mask64;
+		unsigned long	acr_cause_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+	};
 	struct event_constraint		unconstrained;
 
 	u64				hw_cache_event_ids
@@ -823,6 +832,14 @@ struct x86_pmu {
 		u64		fixed_cntr_mask64;
 		unsigned long	fixed_cntr_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	};
+	union {
+		u64		acr_cntr_mask64;
+		unsigned long	acr_cntr_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+	};
+	union {
+		u64		acr_cause_mask64;
+		unsigned long	acr_cause_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+	};
 	int		cntval_bits;
 	u64		cntval_mask;
 	union {
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index eaf0d5245999..5b7a84254ee5 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -195,6 +195,7 @@ union cpuid10_edx {
  */
 #define ARCH_PERFMON_EXT_LEAF			0x00000023
 #define ARCH_PERFMON_NUM_COUNTER_LEAF		0x1
+#define ARCH_PERFMON_ACR_LEAF			0x2
 
 union cpuid35_eax {
 	struct {
-- 
2.38.1
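
For reference, the same enumeration can be probed from user space. A
sketch in C, assuming the compiler's cpuid.h and a CPU that actually
reports the ACR sub-leaf (a complete probe would first check the
acr_subleaf bit in sub-leaf 0, which is omitted here):

	#include <cpuid.h>
	#include <stdio.h>

	#define ARCH_PERFMON_EXT_LEAF	0x23
	#define ARCH_PERFMON_ACR_LEAF	0x2
	#define INTEL_PMC_IDX_FIXED	32

	int main(void)
	{
		unsigned int cntr, fixed_cntr, ecx, edx;
		unsigned long long acr_cntr_mask, acr_cause_mask;

		if (!__get_cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_ACR_LEAF,
				       &cntr, &fixed_cntr, &ecx, &edx))
			return 1;

		/* Same composition as update_pmu_cap() in the patch above. */
		acr_cntr_mask  = cntr | ((unsigned long long)fixed_cntr << INTEL_PMC_IDX_FIXED);
		acr_cause_mask = ecx  | ((unsigned long long)edx << INTEL_PMC_IDX_FIXED);

		printf("reloadable counters:   %#llx\n", acr_cntr_mask);
		printf("reload-cause counters: %#llx\n", acr_cause_mask);
		return 0;
	}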
From: Kan Liang <kan.liang@linux.intel.com>
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	namhyung@kernel.org, irogers@google.com, adrian.hunter@intel.com,
	alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, dapeng1.mi@linux.intel.com,
	thomas.falcon@intel.com, Kan Liang <kan.liang@linux.intel.com>
Subject: [PATCH V3 5/5] perf/x86/intel: Support auto counter reload
Date: Thu, 13 Feb 2025 13:17:18 -0800
Message-Id: <20250213211718.2406744-6-kan.liang@linux.intel.com>
In-Reply-To: <20250213211718.2406744-1-kan.liang@linux.intel.com>
References: <20250213211718.2406744-1-kan.liang@linux.intel.com>

The relative rates among two or more events are useful for performance
analysis, e.g., a high branch-miss rate may indicate a performance
issue. Usually, the samples whose relative rate exceeds some threshold
are the most useful ones. However, traditional sampling takes samples
of each event separately. To get the relative rates among two or more
events, a high sample rate is required, which brings high overhead.
Many samples taken in non-hotspot areas are also dropped as useless in
post-processing.

The auto counter reload (ACR) feature takes a sample only when the
relative rate of two or more events exceeds some threshold, which
provides fine-grained information at a low cost.

To support the feature, two sets of MSRs are introduced. For a given
counter IA32_PMC_GPn_CTR/IA32_PMC_FXm_CTR, bit fields in the
IA32_PMC_GPn_CFG_B/IA32_PMC_FXm_CFG_B MSR indicate which counter(s) can
cause a reload of that counter. The reload value is stored in
IA32_PMC_GPn_CFG_C/IA32_PMC_FXm_CFG_C. The details can be found in the
Intel SDM (revision 085), Volume 3, Section 21.9.11 "Auto Counter
Reload".

In hw_config(), an ACR event is specially configured, because the
cause/reloadable counter mask has to be applied to dyn_constraint.
Besides the HW limits (e.g., no support for perf metrics or PDist), a
SW limit is applied as well: ACR events in a group must be contiguous.
That facilitates the later conversion from the event idx to the counter
idx. Otherwise, intel_pmu_acr_late_setup() would have to traverse the
whole event list again to find the "cause" event. Also, add a new flag,
PERF_X86_EVENT_ACR, to indicate an ACR group, which is set on the group
leader.

Late setup is also required for an ACR group. It converts the event idx
to the counter idx and saves it in hw.config1.

The ACR configuration MSRs are only updated in enable_event();
disable_event() doesn't clear the ACR CFG registers. Add
acr_cfg_b/acr_cfg_c to struct cpu_hw_events to cache the MSR values,
which avoids an MSR write when the value is unchanged.

Expose an acr_mask format attribute in sysfs. The perf tool can utilize
the new format to configure the relationship of events in the group.
The bit sequence of the acr_mask follows the order in which the events
are enabled in the group.

Example:

Here is a snippet of mispredict.c. Since the array holds random
numbers, the branches are random and often mispredicted. The
misprediction rate depends on the compared value. For Loop 1, ~11% of
all branches are mispredicted. For Loop 2, ~21% of all branches are
mispredicted.

main()
{
	...
	for (i = 0; i < N; i++)
		data[i] = rand() % 256;
	...
	/* Loop 1 */
	for (k = 0; k < 50; k++)
		for (i = 0; i < N; i++)
			if (data[i] >= 64)
				sum += data[i];
	...

	/* Loop 2 */
	for (k = 0; k < 50; k++)
		for (i = 0; i < N; i++)
			if (data[i] >= 128)
				sum += data[i];
	...
}

Usually, code with a high branch-miss rate performs badly. To
understand the branch-miss rate of the code, the traditional method
usually samples both the branches and branch-misses events.
E.g.,

perf record -e "{cpu_atom/branch-misses/ppu, cpu_atom/branch-instructions/u}"
            -c 1000000 -- ./mispredict

[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.925 MB perf.data (5106 samples) ]

The 5106 samples are from both events and spread over both loops. In
the post-processing stage, a user can learn that Loop 2 has a 21%
branch-miss rate. Then they can focus on the samples of the
branch-misses event for Loop 2.

With this patch, the user can generate samples only when the
branch-miss rate is > 20%. For example,

perf record -e "{cpu_atom/branch-misses,period=200000,acr_mask=0x2/ppu,
                 cpu_atom/branch-instructions,period=1000000,acr_mask=0x3/u}"
            -- ./mispredict

(Two different periods are applied to branch-misses and
branch-instructions. The ratio is set to 20%.
If branch-instructions overflows first, the branch-miss rate is < 20%.
No samples should be generated, and all counters should be
automatically reloaded.
If branch-misses overflows first, the branch-miss rate is > 20%. A
sample triggered by the branch-misses event should be generated, and
just the counter of branch-instructions should be automatically
reloaded.

The branch-misses event should only be automatically reloaded when
branch-instructions overflows. So the "cause" event is the
branch-instructions event. The acr_mask is set to 0x2, since the event
index of branch-instructions is 1.

The branch-instructions event is automatically reloaded no matter which
event overflows. So the "cause" events are both the branch-misses and
branch-instructions events. The acr_mask should be set to 0x3.)

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.098 MB perf.data (2498 samples) ]

$ perf report

Percent       │154:   movl    $0x0,-0x14(%rbp)
              │     ↓ jmp     1af
              │       for (i = j; i < N; i++)
              │15d:   mov     -0x10(%rbp),%eax
              │       mov     %eax,-0x18(%rbp)
              │     ↓ jmp     1a2
              │       if (data[i] >= 128)
              │165:   mov     -0x18(%rbp),%eax
              │       cltq
              │       lea     0x0(,%rax,4),%rdx
              │       mov     -0x8(%rbp),%rax
              │       add     %rdx,%rax
              │       mov     (%rax),%eax
              │     ┌──cmp    $0x7f,%eax
100.00   0.00 │     ├──jle    19e
              │     │ sum += data[i];

The 2498 samples all come from the branch-misses event for Loop 2. The
number of samples and the overhead are significantly reduced without
losing any information.
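
The perf-record command above can also be expressed directly against
the kernel ABI, since acr_mask maps to perf_event_attr.config2 (the
format is "config2:0-63"). A minimal sketch using perf_event_open(),
with error handling omitted; it uses the generic PERF_TYPE_HARDWARE
events for brevity, whereas on a hybrid system one would instead use
the cpu_atom PMU's dynamic type from sysfs:

	#include <linux/perf_event.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <string.h>

	static int open_event(__u64 config, __u64 period, __u64 acr_mask, int group_fd)
	{
		struct perf_event_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_HARDWARE;
		attr.config = config;
		attr.sample_period = period;
		attr.config2 = acr_mask;	/* the new acr_mask format */

		return syscall(__NR_perf_event_open, &attr, 0, -1, group_fd, 0);
	}

	int main(void)
	{
		/* Leader: branch-misses, reloaded only when event #1 overflows. */
		int leader = open_event(PERF_COUNT_HW_BRANCH_MISSES, 200000, 0x2, -1);
		/* Sibling: branch-instructions, reloaded on any overflow in the group. */
		open_event(PERF_COUNT_HW_BRANCH_INSTRUCTIONS, 1000000, 0x3, leader);
		/* ...enable via ioctl(PERF_EVENT_IOC_ENABLE) and mmap a ring buffer... */
		return 0;
	}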
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
---
 arch/x86/events/core.c           |   2 +-
 arch/x86/events/intel/core.c     | 219 ++++++++++++++++++++++++++++++-
 arch/x86/events/perf_event.h     |  10 ++
 arch/x86/include/asm/msr-index.h |   4 +
 include/linux/perf_event.h       |   1 +
 5 files changed, 233 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index b56fa6a9d7a4..c2525567a03f 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -756,7 +756,7 @@ void x86_pmu_enable_all(int added)
 	}
 }
 
-static inline int is_x86_event(struct perf_event *event)
+int is_x86_event(struct perf_event *event)
 {
 	int i;
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 8e3ad9efd798..89298cde7056 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2609,7 +2609,8 @@ static void intel_pmu_del_event(struct perf_event *event)
 		intel_pmu_lbr_del(event);
 	if (event->attr.precise_ip)
 		intel_pmu_pebs_del(event);
-	if (is_pebs_counter_event_group(event))
+	if (is_pebs_counter_event_group(event) ||
+	    is_acr_event_group(event))
 		this_cpu_ptr(&cpu_hw_events)->n_late_setup--;
 }
 
@@ -2888,6 +2889,54 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
 	cpuc->fixed_ctrl_val |= bits;
 }
 
+static void intel_pmu_config_acr(int idx, u64 mask, u32 reload)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int msr_b, msr_c;
+
+	if (!mask && !cpuc->acr_cfg_b[idx])
+		return;
+
+	if (idx < INTEL_PMC_IDX_FIXED) {
+		msr_b = MSR_IA32_PMC_V6_GP0_CFG_B;
+		msr_c = MSR_IA32_PMC_V6_GP0_CFG_C;
+	} else {
+		msr_b = MSR_IA32_PMC_V6_FX0_CFG_B;
+		msr_c = MSR_IA32_PMC_V6_FX0_CFG_C;
+		idx -= INTEL_PMC_IDX_FIXED;
+	}
+
+	if (cpuc->acr_cfg_b[idx] != mask) {
+		wrmsrl(msr_b + x86_pmu.addr_offset(idx, false), mask);
+		cpuc->acr_cfg_b[idx] = mask;
+	}
+	/* Only need to update the reload value when there is a valid config value. */
+	if (mask && cpuc->acr_cfg_c[idx] != reload) {
+		wrmsrl(msr_c + x86_pmu.addr_offset(idx, false), reload);
+		cpuc->acr_cfg_c[idx] = reload;
+	}
+}
+
+static void intel_pmu_enable_acr(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	/* The PMU doesn't support ACR */
+	if (!hybrid(event->pmu, acr_cntr_mask64))
+		return;
+
+	if (!is_acr_event_group(event) || !event->attr.config2) {
+		/*
+		 * The disable doesn't clear the ACR CFG register.
+		 * Check and clear the ACR CFG register.
+		 */
+		intel_pmu_config_acr(hwc->idx, 0, 0);
+		return;
+	}
+
+	intel_pmu_config_acr(hwc->idx, hwc->config1, -hwc->sample_period);
+}
+
 static void intel_pmu_enable_event(struct perf_event *event)
 {
 	u64 enable_mask = ARCH_PERFMON_EVENTSEL_ENABLE;
@@ -2903,8 +2952,11 @@ static void intel_pmu_enable_event(struct perf_event *event)
 			enable_mask |= ARCH_PERFMON_EVENTSEL_BR_CNTR;
 		intel_set_masks(event, idx);
 		__x86_pmu_enable_event(hwc, enable_mask);
+		intel_pmu_enable_acr(event);
 		break;
 	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
+		intel_pmu_enable_acr(event);
+		fallthrough;
 	case INTEL_PMC_IDX_METRIC_BASE ... INTEL_PMC_IDX_METRIC_END:
 		intel_pmu_enable_fixed(event);
 		break;
@@ -2922,6 +2974,31 @@ static void intel_pmu_enable_event(struct perf_event *event)
 	}
 }
 
+static void intel_pmu_acr_late_setup(struct cpu_hw_events *cpuc)
+{
+	struct perf_event *event, *leader;
+	int i, j, idx;
+
+	for (i = 0; i < cpuc->n_events; i++) {
+		leader = cpuc->event_list[i];
+		if (!is_acr_event_group(leader))
+			continue;
+
+		/* The ACR events must be contiguous. */
+		for (j = i; j < cpuc->n_events; j++) {
+			event = cpuc->event_list[j];
+			if (event->group_leader != leader->group_leader)
+				break;
+			for_each_set_bit(idx, (unsigned long *)&event->attr.config2, X86_PMC_IDX_MAX) {
+				if (WARN_ON_ONCE(i + idx > cpuc->n_events))
+					return;
+				set_bit(cpuc->assign[i + idx], (unsigned long *)&event->hw.config1);
+			}
+		}
+		i = j - 1;
+	}
+}
+
 void intel_pmu_late_setup(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -2930,6 +3007,7 @@ void intel_pmu_late_setup(void)
 		return;
 
 	intel_pmu_pebs_late_setup(cpuc);
+	intel_pmu_acr_late_setup(cpuc);
 }
 
 static void intel_pmu_add_event(struct perf_event *event)
@@ -2938,7 +3016,8 @@ static void intel_pmu_add_event(struct perf_event *event)
 		intel_pmu_pebs_add(event);
 	if (intel_pmu_needs_branch_stack(event))
 		intel_pmu_lbr_add(event);
-	if (is_pebs_counter_event_group(event))
+	if (is_pebs_counter_event_group(event) ||
+	    is_acr_event_group(event))
 		this_cpu_ptr(&cpu_hw_events)->n_late_setup++;
 }
 
@@ -4093,6 +4172,22 @@ static u64 intel_pmu_freq_start_period(struct perf_event *event)
 	return start;
 }
 
+static bool intel_pmu_is_acr_group(struct perf_event *event)
+{
+	if (!hybrid(event->pmu, acr_cntr_mask64))
+		return false;
+
+	/* The group leader has the ACR flag set */
+	if (is_acr_event_group(event))
+		return true;
+
+	/* The acr_mask is set */
+	if (event->attr.config2)
+		return true;
+
+	return false;
+}
+
 static int intel_pmu_hw_config(struct perf_event *event)
 {
 	int ret = x86_pmu_hw_config(event);
@@ -4221,6 +4316,103 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	    event->attr.precise_ip)
 		event->group_leader->hw.flags |= PERF_X86_EVENT_PEBS_CNTR;
 
+	if (intel_pmu_is_acr_group(event)) {
+		struct perf_event *sibling, *leader = event->group_leader;
+		struct pmu *pmu = event->pmu;
+		u64 constraint = hybrid(pmu, acr_cntr_mask64);
+		bool has_sw_event = false;
+		int num = 0, idx = 0;
+		u64 cause_mask = 0;
+
+		/* Perf metrics are not supported */
+		if (is_metric_event(event))
+			return -EINVAL;
+
+		/* Freq mode is not supported */
+		if (event->attr.freq)
+			return -EINVAL;
+
+		/* PDist is not supported */
+		if (event->attr.config2 && event->attr.precise_ip > 2)
+			return -EINVAL;
+
+		/* The reload value cannot exceed the max period */
+		if (event->attr.sample_period > x86_pmu.max_period)
+			return -EINVAL;
+		/*
+		 * The counter-constraints of each event cannot be finalized
+		 * unless the whole group is scanned. However, it's hard
+		 * to know whether the event is the last one of the group.
+		 * Recalculate the counter-constraints for each event when
+		 * adding a new event.
+		 *
+		 * The group is traversed twice, which may be optimized later.
+		 * In the first round,
+		 * - Find all events which do reload when other events
+		 *   overflow and set the corresponding counter-constraints
+		 * - Add all events, which can cause other events reload,
+		 *   in the cause_mask
+		 * - Error out if the number of events exceeds the HW limit
+		 * - The ACR events must be contiguous.
+		 *   Error out if there are non-X86 events between ACR events.
+		 *   This is not a HW limit, but a SW limit.
+		 *   With the assumption, the intel_pmu_acr_late_setup() can
+		 *   easily convert the event idx to counter idx without
+		 *   traversing the whole event list.
+		 */
+		if (!is_x86_event(leader))
+			return -EINVAL;
+
+		if (leader->attr.config2) {
+			leader->hw.dyn_constraint &= constraint;
+			cause_mask |= leader->attr.config2;
+			num++;
+		}
+
+		for_each_sibling_event(sibling, leader) {
+			if (!is_x86_event(sibling)) {
+				has_sw_event = true;
+				continue;
+			}
+			if (!sibling->attr.config2)
+				continue;
+			if (has_sw_event)
+				return -EINVAL;
+			sibling->hw.dyn_constraint &= constraint;
+			cause_mask |= sibling->attr.config2;
+			num++;
+		}
+
+		if (leader != event && event->attr.config2) {
+			if (has_sw_event)
+				return -EINVAL;
+			event->hw.dyn_constraint &= constraint;
+			cause_mask |= event->attr.config2;
+			num++;
+		}
+
+		if (hweight64(cause_mask) > hweight64(hybrid(pmu, acr_cause_mask64)) ||
+		    num > hweight64(constraint))
+			return -EINVAL;
+		/*
+		 * In the second round, apply the counter-constraints for
+		 * the events which can cause other events reload.
+		 */
+		constraint = hybrid(pmu, acr_cause_mask64);
+		if (test_bit(idx++, (unsigned long *)&cause_mask))
+			leader->hw.dyn_constraint &= constraint;
+
+		for_each_sibling_event(sibling, leader) {
+			if (test_bit(idx++, (unsigned long *)&cause_mask))
+				sibling->hw.dyn_constraint &= constraint;
+		}
+
+		if ((leader != event) && test_bit(idx, (unsigned long *)&cause_mask))
+			event->hw.dyn_constraint &= constraint;
+
+		leader->hw.flags |= PERF_X86_EVENT_ACR;
+	}
+
 	if ((event->attr.type == PERF_TYPE_HARDWARE) ||
 	    (event->attr.type == PERF_TYPE_HW_CACHE))
 		return 0;
@@ -6070,6 +6262,21 @@ td_is_visible(struct kobject *kobj, struct attribute *attr, int i)
 	return attr->mode;
 }
 
+PMU_FORMAT_ATTR(acr_mask, "config2:0-63");
+
+static struct attribute *format_acr_attrs[] = {
+	&format_attr_acr_mask.attr,
+	NULL
+};
+
+static umode_t
+acr_is_visible(struct kobject *kobj, struct attribute *attr, int i)
+{
+	struct device *dev = kobj_to_dev(kobj);
+
+	return hybrid(dev_get_drvdata(dev), acr_cntr_mask64) ?
+	       attr->mode : 0;
+}
+
 static struct attribute_group group_events_td = {
 	.name = "events",
 	.is_visible = td_is_visible,
@@ -6112,6 +6319,12 @@ static struct attribute_group group_format_evtsel_ext = {
 	.is_visible = evtsel_ext_is_visible,
 };
 
+static struct attribute_group group_format_acr = {
+	.name = "format",
+	.attrs = format_acr_attrs,
+	.is_visible = acr_is_visible,
+};
+
 static struct attribute_group group_default = {
 	.attrs = intel_pmu_attrs,
 	.is_visible = default_is_visible,
@@ -6126,6 +6339,7 @@ static const struct attribute_group *attr_update[] = {
 	&group_format_extra,
 	&group_format_extra_skl,
 	&group_format_evtsel_ext,
+	&group_format_acr,
 	&group_default,
 	NULL,
 };
@@ -6410,6 +6624,7 @@ static const struct attribute_group *hybrid_attr_update[] = {
 	&group_caps_lbr,
 	&hybrid_group_format_extra,
 	&group_format_evtsel_ext,
+	&group_format_acr,
 	&group_default,
 	&hybrid_group_cpus,
 	NULL,
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 2184ae0c9a4a..e43e5fe01905 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -120,6 +120,11 @@ static inline bool is_pebs_counter_event_group(struct perf_event *event)
 	return event->group_leader->hw.flags & PERF_X86_EVENT_PEBS_CNTR;
 }
 
+static inline bool is_acr_event_group(struct perf_event *event)
+{
+	return event->group_leader->hw.flags & PERF_X86_EVENT_ACR;
+}
+
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
 	int refcnt; /* reference count */
@@ -287,6 +292,10 @@ struct cpu_hw_events {
 	u64			fixed_ctrl_val;
 	u64			active_fixed_ctrl_val;
 
+	/* Intel ACR configuration */
+	u64			acr_cfg_b[X86_PMC_IDX_MAX];
+	u64			acr_cfg_c[X86_PMC_IDX_MAX];
+
 	/*
 	 * Intel LBR bits
 	 */
@@ -1127,6 +1136,7 @@ static struct perf_pmu_format_hybrid_attr format_attr_hybrid_##_name = {\
 	.pmu_type	= _pmu,						\
 }
 
+int is_x86_event(struct perf_event *event);
 struct pmu *x86_get_pmu(unsigned int cpu);
 extern struct x86_pmu x86_pmu __read_mostly;
 
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 9a71880eec07..4c9361d8f05d 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -591,7 +591,11 @@
 /* V6 PMON MSR range */
 #define MSR_IA32_PMC_V6_GP0_CTR		0x1900
 #define MSR_IA32_PMC_V6_GP0_CFG_A	0x1901
+#define MSR_IA32_PMC_V6_GP0_CFG_B	0x1902
+#define MSR_IA32_PMC_V6_GP0_CFG_C	0x1903
 #define MSR_IA32_PMC_V6_FX0_CTR		0x1980
+#define MSR_IA32_PMC_V6_FX0_CFG_B	0x1982
+#define MSR_IA32_PMC_V6_FX0_CFG_C	0x1983
 #define MSR_IA32_PMC_V6_STEP		4
 
 /* KeyID partitioning between MKTME and TDX */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 238879f6c3e3..24f2eba200ac 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -157,6 +157,7 @@ struct hw_perf_event {
 	union {
 		struct { /* hardware */
 			u64		config;
+			u64		config1;
 			u64		last_tag;
 			u64		dyn_constraint;
 			unsigned long	config_base;
-- 
2.38.1