From nobody Thu Dec 18 08:37:56 2025
From: Kan Liang <kan.liang@linux.intel.com>
To: peterz@infradead.org, mingo@kernel.org, acme@kernel.org,
 namhyung@kernel.org, irogers@google.com, adrian.hunter@intel.com,
 alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, Kan Liang, Sandipan Das,
 Ravi Bangoria, silviazhao
Subject: [PATCH V4 1/5] perf/x86: Extend event update interface
Date: Wed, 31 Jul 2024 07:38:31 -0700
Message-Id: <20240731143835.771618-2-kan.liang@linux.intel.com>
In-Reply-To: <20240731143835.771618-1-kan.liang@linux.intel.com>
References: <20240731143835.771618-1-kan.liang@linux.intel.com>

The current event update interface directly reads the values from the
counter, but the values
may not be the accurate ones users require. For example, the sample read
feature wants the counter values of the member events when the leader
event overflows. But with the current implementation, the read (event
update) actually happens in the NMI handler, and there may be a small gap
between the overflow and the NMI handler. The new Intel PEBS counters
snapshotting feature can provide the accurate counter value at the
overflow.

The event update interface has to be updated to apply the given accurate
values. Pass the accurate values via the event update interface. If a
value is not available, still read the counter directly.

Use u64 * rather than u64 as the new parameter, because 0 might be a
valid rdpmc() value, so !val cannot be used to distinguish between an
argument being present and one being absent. Also, for some cases, e.g.,
intel_update_topdown_event(), more than one counter/register is read.

Reviewed-by: Andi Kleen
Reviewed-by: Ian Rogers
Signed-off-by: Kan Liang
Cc: Sandipan Das
Cc: Ravi Bangoria
Cc: silviazhao
---
 arch/x86/events/amd/core.c     |  2 +-
 arch/x86/events/core.c         | 13 ++++++-----
 arch/x86/events/intel/core.c   | 40 +++++++++++++++++++---------------
 arch/x86/events/intel/p4.c     |  2 +-
 arch/x86/events/perf_event.h   |  4 ++--
 arch/x86/events/zhaoxin/core.c |  2 +-
 6 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 920e3a640cad..284bf6157545 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -986,7 +986,7 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)

 		event = cpuc->events[idx];
 		hwc = &event->hw;
-		x86_perf_event_update(event);
+		x86_perf_event_update(event, NULL);
 		mask = BIT_ULL(idx);

 		if (!(status & mask))
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 12f2a0c14d33..07a56bf71160 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -112,7 +112,7 @@ u64 __read_mostly hw_cache_extra_regs
  * Can only be executed on the CPU where the event is active.
  * Returns the delta events processed.
  */
-u64 x86_perf_event_update(struct perf_event *event)
+u64 x86_perf_event_update(struct perf_event *event, u64 *val)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int shift = 64 - x86_pmu.cntval_bits;
@@ -131,7 +131,10 @@ u64 x86_perf_event_update(struct perf_event *event)
 	 */
 	prev_raw_count = local64_read(&hwc->prev_count);
 	do {
-		rdpmcl(hwc->event_base_rdpmc, new_raw_count);
+		if (!val)
+			rdpmcl(hwc->event_base_rdpmc, new_raw_count);
+		else
+			new_raw_count = *val;
 	} while (!local64_try_cmpxchg(&hwc->prev_count,
 				      &prev_raw_count, new_raw_count));

@@ -1598,7 +1601,7 @@ void x86_pmu_stop(struct perf_event *event, int flags)
 		 * Drain the remaining delta count out of a event
 		 * that we are disabling:
 		 */
-		static_call(x86_pmu_update)(event);
+		static_call(x86_pmu_update)(event, NULL);
 		hwc->state |= PERF_HES_UPTODATE;
 	}
 }
@@ -1689,7 +1692,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)

 		event = cpuc->events[idx];

-		val = static_call(x86_pmu_update)(event);
+		val = static_call(x86_pmu_update)(event, NULL);
 		if (val & (1ULL << (x86_pmu.cntval_bits - 1)))
 			continue;

@@ -2036,7 +2039,7 @@ static void x86_pmu_static_call_update(void)

 static void _x86_pmu_read(struct perf_event *event)
 {
-	static_call(x86_pmu_update)(event);
+	static_call(x86_pmu_update)(event, NULL);
 }

 void x86_pmu_show_pmu_cap(struct pmu *pmu)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 0c9c2706d4ec..f32d47cbe37f 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2418,7 +2418,7 @@ static void intel_pmu_nhm_workaround(void)
 	for (i = 0; i < 4; i++) {
 		event = cpuc->events[i];
 		if (event)
-			static_call(x86_pmu_update)(event);
+			static_call(x86_pmu_update)(event, NULL);
 	}

 	for (i = 0; i < 4; i++) {
@@ -2710,7 +2710,7 @@ static void update_saved_topdown_regs(struct perf_event *event, u64 slots,
 * modify by a NMI. PMU has to be disabled before calling this function.
 */

-static u64 intel_update_topdown_event(struct perf_event *event, int metric_end)
+static u64 intel_update_topdown_event(struct perf_event *event, int metric_end, u64 *val)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct perf_event *other;
@@ -2718,13 +2718,18 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end)
 	bool reset = true;
 	int idx;

-	/* read Fixed counter 3 */
-	rdpmcl((3 | INTEL_PMC_FIXED_RDPMC_BASE), slots);
-	if (!slots)
-		return 0;
+	if (!val) {
+		/* read Fixed counter 3 */
+		rdpmcl((3 | INTEL_PMC_FIXED_RDPMC_BASE), slots);
+		if (!slots)
+			return 0;

-	/* read PERF_METRICS */
-	rdpmcl(INTEL_PMC_FIXED_RDPMC_METRICS, metrics);
+		/* read PERF_METRICS */
+		rdpmcl(INTEL_PMC_FIXED_RDPMC_METRICS, metrics);
+	} else {
+		slots = val[0];
+		metrics = val[1];
+	}

 	for_each_set_bit(idx, cpuc->active_mask, metric_end + 1) {
 		if (!is_topdown_idx(idx))
@@ -2767,10 +2772,11 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end)
 	return slots;
 }

-static u64 icl_update_topdown_event(struct perf_event *event)
+static u64 icl_update_topdown_event(struct perf_event *event, u64 *val)
 {
 	return intel_update_topdown_event(event, INTEL_PMC_IDX_METRIC_BASE +
-					  x86_pmu.num_topdown_events - 1);
+					  x86_pmu.num_topdown_events - 1,
+					  val);
 }

 DEFINE_STATIC_CALL(intel_pmu_update_topdown_event, x86_perf_event_update);
@@ -2785,7 +2791,7 @@ static void intel_pmu_read_topdown_event(struct perf_event *event)
 		return;

 	perf_pmu_disable(event->pmu);
-	static_call(intel_pmu_update_topdown_event)(event);
+	static_call(intel_pmu_update_topdown_event)(event, NULL);
 	perf_pmu_enable(event->pmu);
 }

@@ -2796,7 +2802,7 @@ static void intel_pmu_read_event(struct perf_event *event)
 	else if (is_topdown_count(event))
 		intel_pmu_read_topdown_event(event);
 	else
-		x86_perf_event_update(event);
+		x86_perf_event_update(event, NULL);
 }

 static void intel_pmu_enable_fixed(struct perf_event *event)
@@ -2899,7 +2905,7 @@ static void intel_pmu_add_event(struct perf_event *event)
 */
 int intel_pmu_save_and_restart(struct perf_event *event)
 {
-	static_call(x86_pmu_update)(event);
+	static_call(x86_pmu_update)(event, NULL);
 	/*
 	 * For a checkpointed counter always reset back to 0. This
 	 * avoids a situation where the counter overflows, aborts the
@@ -2922,12 +2928,12 @@ static int intel_pmu_set_period(struct perf_event *event)
 	return x86_perf_event_set_period(event);
 }

-static u64 intel_pmu_update(struct perf_event *event)
+static u64 intel_pmu_update(struct perf_event *event, u64 *val)
 {
 	if (unlikely(is_topdown_count(event)))
-		return static_call(intel_pmu_update_topdown_event)(event);
+		return static_call(intel_pmu_update_topdown_event)(event, val);

-	return x86_perf_event_update(event);
+	return x86_perf_event_update(event, val);
 }

 static void intel_pmu_reset(void)
@@ -3091,7 +3097,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	 */
 	if (__test_and_clear_bit(GLOBAL_STATUS_PERF_METRICS_OVF_BIT, (unsigned long *)&status)) {
 		handled++;
-		static_call(intel_pmu_update_topdown_event)(NULL);
+		static_call(intel_pmu_update_topdown_event)(NULL, NULL);
 	}

 	/*
diff --git a/arch/x86/events/intel/p4.c b/arch/x86/events/intel/p4.c
index 844bc4fc4724..3177be0dedd1 100644
--- a/arch/x86/events/intel/p4.c
+++ b/arch/x86/events/intel/p4.c
@@ -1058,7 +1058,7 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
 		/* it might be unflagged overflow */
 		overflow = p4_pmu_clear_cccr_ovf(hwc);

-		val = x86_perf_event_update(event);
+		val = x86_perf_event_update(event, NULL);
 		if (!overflow && (val & (1ULL << (x86_pmu.cntval_bits - 1))))
 			continue;

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index ac1182141bf6..2cb5c2e31b1f 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -782,7 +782,7 @@ struct x86_pmu {
 	void		(*del)(struct perf_event *);
 	void		(*read)(struct perf_event *event);
 	int		(*set_period)(struct perf_event *event);
-	u64		(*update)(struct perf_event *event);
+	u64		(*update)(struct perf_event *event, u64 *val);
 	int		(*hw_config)(struct perf_event *event);
 	int		(*schedule_events)(struct cpu_hw_events *cpuc, int n, int *assign);
 	unsigned	eventsel;
@@ -1131,7 +1131,7 @@ extern u64 __read_mostly hw_cache_extra_regs
 				[PERF_COUNT_HW_CACHE_OP_MAX]
 				[PERF_COUNT_HW_CACHE_RESULT_MAX];

-u64 x86_perf_event_update(struct perf_event *event);
+u64 x86_perf_event_update(struct perf_event *event, u64 *cntr);

 static inline unsigned int x86_pmu_config_addr(int index)
 {
diff --git a/arch/x86/events/zhaoxin/core.c b/arch/x86/events/zhaoxin/core.c
index 2fd9b0cf9a5e..5fe3a9eed650 100644
--- a/arch/x86/events/zhaoxin/core.c
+++ b/arch/x86/events/zhaoxin/core.c
@@ -391,7 +391,7 @@ static int zhaoxin_pmu_handle_irq(struct pt_regs *regs)
 		if (!test_bit(bit, cpuc->active_mask))
 			continue;

-		x86_perf_event_update(event);
+		x86_perf_event_update(event, NULL);
 		perf_sample_data_init(&data, 0, event->hw.last_period);

 		if (!x86_perf_event_set_period(event))
-- 
2.38.1

From nobody Thu Dec 18 08:37:56 2025
From: Kan Liang <kan.liang@linux.intel.com>
To: peterz@infradead.org, mingo@kernel.org, acme@kernel.org,
 namhyung@kernel.org, irogers@google.com, adrian.hunter@intel.com,
 alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com
Subject: [PATCH V4 2/5] perf: Extend perf_output_read
Date: Wed, 31 Jul 2024 07:38:32 -0700
Message-Id: <20240731143835.771618-3-kan.liang@linux.intel.com>
In-Reply-To: <20240731143835.771618-1-kan.liang@linux.intel.com>
References: <20240731143835.771618-1-kan.liang@linux.intel.com>

The event may have been updated in the PMU-specific implementation,
e.g., Intel PEBS counters snapshotting. The common code should not read
and overwrite the value.

The PERF_SAMPLE_READ in data->sample_type can be used to detect whether
the PMU-specific value is available. If so, avoid the pmu->read() in the
common code.
Reviewed-by: Andi Kleen
Reviewed-by: Ian Rogers
Signed-off-by: Kan Liang
---
 kernel/events/core.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index aa3450bdc227..fcc55d0b5848 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7269,7 +7269,7 @@ static void perf_output_read_one(struct perf_output_handle *handle,

 static void perf_output_read_group(struct perf_output_handle *handle,
 				   struct perf_event *event,
-				   u64 enabled, u64 running)
+				   u64 enabled, u64 running, bool read)
 {
 	struct perf_event *leader = event->group_leader, *sub;
 	u64 read_format = event->attr.read_format;
@@ -7291,7 +7291,7 @@ static void perf_output_read_group(struct perf_output_handle *handle,
 	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
 		values[n++] = running;

-	if ((leader != event) &&
+	if ((leader != event) && read &&
 	    (leader->state == PERF_EVENT_STATE_ACTIVE))
 		leader->pmu->read(leader);

@@ -7306,7 +7306,7 @@ static void perf_output_read_group(struct perf_output_handle *handle,
 	for_each_sibling_event(sub, leader) {
 		n = 0;

-		if ((sub != event) &&
+		if ((sub != event) && read &&
 		    (sub->state == PERF_EVENT_STATE_ACTIVE))
 			sub->pmu->read(sub);

@@ -7333,7 +7333,8 @@ static void perf_output_read_group(struct perf_output_handle *handle,
 * on another CPU, from interrupt/NMI context.
 */
 static void perf_output_read(struct perf_output_handle *handle,
-			     struct perf_event *event)
+			     struct perf_event *event,
+			     bool read)
 {
 	u64 enabled = 0, running = 0, now;
 	u64 read_format = event->attr.read_format;
@@ -7351,7 +7352,7 @@ static void perf_output_read(struct perf_output_handle *handle,
 		calc_timer_values(event, &now, &enabled, &running);

 	if (event->attr.read_format & PERF_FORMAT_GROUP)
-		perf_output_read_group(handle, event, enabled, running);
+		perf_output_read_group(handle, event, enabled, running, read);
 	else
 		perf_output_read_one(handle, event, enabled, running);
 }
@@ -7393,7 +7394,7 @@ void perf_output_sample(struct perf_output_handle *handle,
 		perf_output_put(handle, data->period);

 	if (sample_type & PERF_SAMPLE_READ)
-		perf_output_read(handle, event);
+		perf_output_read(handle, event, !(data->sample_flags & PERF_SAMPLE_READ));

 	if (sample_type & PERF_SAMPLE_CALLCHAIN) {
 		int size = 1;
@@ -7994,7 +7995,7 @@ perf_event_read_event(struct perf_event *event,
 		return;

 	perf_output_put(&handle, read_event);
-	perf_output_read(&handle, event);
+	perf_output_read(&handle, event, true);
 	perf_event__output_id_sample(event, &handle, &sample);

 	perf_output_end(&handle);
-- 
2.38.1

From nobody Thu Dec 18 08:37:56 2025
From: Kan Liang <kan.liang@linux.intel.com>
To: peterz@infradead.org, mingo@kernel.org, acme@kernel.org,
 namhyung@kernel.org, irogers@google.com, adrian.hunter@intel.com,
 alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com
Subject: [PATCH V4 3/5] perf/x86/intel: Move PEBS event update after the sample output
Date: Wed, 31 Jul 2024 07:38:33 -0700
Message-Id: <20240731143835.771618-4-kan.liang@linux.intel.com>
In-Reply-To: <20240731143835.771618-1-kan.liang@linux.intel.com>
References: <20240731143835.771618-1-kan.liang@linux.intel.com>

In drain_pebs(), besides outputting the sample data, perf also needs to
update the PEBS event (e.g., prev_count, event->count, etc.). Both
operations may invoke perf_event_update(), but the sequence of the two
operations doesn't matter for now, because the updated event value is
read directly from the counter via rdpmc, and the counter stops in
drain_pebs().

But if the updated event value comes from different places (PEBS record
vs. counter), the sequence does matter. For example, with the new Intel
PEBS counters snapshotting feature, large PEBS can be enabled for sample
read, since the counter values for each sample are recorded in the PEBS
records. The current perf does the PEBS event update first, which
updates the event for all the records altogether.
It's then impossible for the later sample read output to dump the value
for each sample, since prev_count is already the newest one from the
current counter.

Move the PEBS event update after the sample output. For each sample read
output, update and output the value only for this sample (according to
the value in the PEBS record). Once all samples are output, update the
PEBS event again according to the current counter, and set the left
period.

The !intel_pmu_save_and_restart() case only happens when
!hwc->event_base or the left period > 0. The !hwc->event_base is
impossible for a PEBS event, which is only available on GP and fixed
counters. The __intel_pmu_pebs_event() only processes the overflowed
sample, so the left period should always be <= 0. It's safe to ignore
the return value of the !intel_pmu_save_and_restart() check.

Reviewed-by: Andi Kleen
Reviewed-by: Ian Rogers
Signed-off-by: Kan Liang
---
 arch/x86/events/intel/ds.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index fa5ea65de0d0..9c28c7e34b57 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2168,17 +2168,6 @@ __intel_pmu_pebs_event(struct perf_event *event,
 	void *at = get_next_pebs_record_by_bit(base, top, bit);
 	static struct pt_regs dummy_iregs;

-	if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
-		/*
-		 * Now, auto-reload is only enabled in fixed period mode.
-		 * The reload value is always hwc->sample_period.
-		 * May need to change it, if auto-reload is enabled in
-		 * freq mode later.
-		 */
-		intel_pmu_save_and_restart_reload(event, count);
-	} else if (!intel_pmu_save_and_restart(event))
-		return;
-
 	if (!iregs)
 		iregs = &dummy_iregs;

@@ -2207,6 +2196,17 @@ __intel_pmu_pebs_event(struct perf_event *event,
 		if (perf_event_overflow(event, data, regs))
 			x86_pmu_stop(event, 0);
 	}
+
+	if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
+		/*
+		 * Now, auto-reload is only enabled in fixed period mode.
+		 * The reload value is always hwc->sample_period.
+		 * May need to change it, if auto-reload is enabled in
+		 * freq mode later.
+		 */
+		intel_pmu_save_and_restart_reload(event, count);
+	} else
+		intel_pmu_save_and_restart(event);
 }

 static void intel_pmu_drain_pebs_core(struct pt_regs *iregs, struct perf_sample_data *data)
-- 
2.38.1

From nobody Thu Dec 18 08:37:56 2025
From: Kan Liang <kan.liang@linux.intel.com>
To: peterz@infradead.org, mingo@kernel.org, acme@kernel.org,
 namhyung@kernel.org, irogers@google.com, adrian.hunter@intel.com,
 alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com
Subject: [PATCH V4 4/5] perf/x86/intel: Support PEBS counters snapshotting
Date: Wed, 31 Jul 2024 07:38:34 -0700
Message-Id: <20240731143835.771618-5-kan.liang@linux.intel.com>
In-Reply-To: <20240731143835.771618-1-kan.liang@linux.intel.com>
References: <20240731143835.771618-1-kan.liang@linux.intel.com>

The counters snapshotting is a new adaptive PEBS extension, which can
capture programmable counters, fixed-function counters, and performance
metrics in a PEBS record. The feature is available in PEBS format V6.

The target counters can be configured in the new fields of MSR_PEBS_CFG.
The PEBS HW will then generate the bit mask of counters (Counters Group
Header) followed by the content of all the requested counters into a
PEBS record.

The current Linux perf sample read feature intends to read the counters
of other member events when the leader event is overflowing. But the
current read is in the NMI handler, which may have a small gap from the
overflow. Use the counters snapshotting feature for the sample read.

Add a new PEBS_CNTR flag to indicate a sample read group that utilizes
the counters snapshotting feature. When the group is scheduled, the PEBS
configuration can be updated accordingly.
Reviewed-by: Andi Kleen
Reviewed-by: Ian Rogers
Signed-off-by: Kan Liang
---
 arch/x86/events/intel/core.c       |  33 ++++++++-
 arch/x86/events/intel/ds.c         | 114 +++++++++++++++++++++++++++--
 arch/x86/events/perf_event.h       |   3 +
 arch/x86/events/perf_event_flags.h |   2 +-
 arch/x86/include/asm/perf_event.h  |  15 ++++
 5 files changed, 157 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index f32d47cbe37f..1988de2dd4f4 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4058,6 +4058,19 @@ static int intel_pmu_hw_config(struct perf_event *event)
 		event->hw.flags |= PERF_X86_EVENT_PEBS_VIA_PT;
 	}
 
+	if ((event->attr.sample_type & PERF_SAMPLE_READ) &&
+	    (x86_pmu.intel_cap.pebs_format >= 6)) {
+		struct perf_event *leader = event->group_leader;
+
+		if (is_slots_event(leader))
+			leader = list_next_entry(leader, sibling_list);
+
+		if (leader->attr.precise_ip) {
+			leader->hw.flags |= PERF_X86_EVENT_PEBS_CNTR;
+			event->hw.flags |= PERF_X86_EVENT_PEBS_CNTR;
+		}
+	}
+
 	if ((event->attr.type == PERF_TYPE_HARDWARE) ||
 	    (event->attr.type == PERF_TYPE_HW_CACHE))
 		return 0;
@@ -4161,6 +4174,24 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	return 0;
 }
 
+static int intel_pmu_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
+{
+	struct perf_event *event;
+	int ret = x86_schedule_events(cpuc, n, assign);
+
+	if (ret)
+		return ret;
+
+	if (cpuc->is_fake)
+		return ret;
+
+	event = cpuc->event_list[n - 1];
+	if (event && (event->hw.flags & PERF_X86_EVENT_PEBS_CNTR))
+		intel_pmu_pebs_update_cfg(cpuc, n, assign);
+
+	return 0;
+}
+
 /*
  * Currently, the only caller of this function is the atomic_switch_perf_msrs().
  * The host perf context helps to prepare the values of the real hardware for
@@ -5245,7 +5276,7 @@ static __initconst const struct x86_pmu intel_pmu = {
 	.set_period		= intel_pmu_set_period,
 	.update			= intel_pmu_update,
 	.hw_config		= intel_pmu_hw_config,
-	.schedule_events	= x86_schedule_events,
+	.schedule_events	= intel_pmu_schedule_events,
 	.eventsel		= MSR_ARCH_PERFMON_EVENTSEL0,
 	.perfctr		= MSR_ARCH_PERFMON_PERFCTR0,
 	.fixedctr		= MSR_ARCH_PERFMON_FIXED_CTR0,
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 9c28c7e34b57..1bb9223c31cc 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1287,10 +1287,61 @@ static void adaptive_pebs_record_size_update(void)
 		sz += sizeof(struct pebs_xmm);
 	if (pebs_data_cfg & PEBS_DATACFG_LBRS)
 		sz += x86_pmu.lbr_nr * sizeof(struct lbr_entry);
+	if (pebs_data_cfg & (PEBS_DATACFG_METRICS | PEBS_DATACFG_CNTR)) {
+		sz += sizeof(struct pebs_cntr_header);
+
+		/* Metrics base and Metrics Data */
+		if (pebs_data_cfg & PEBS_DATACFG_METRICS)
+			sz += 2 * sizeof(u64);
+
+		if (pebs_data_cfg & PEBS_DATACFG_CNTR) {
+			sz += hweight64((pebs_data_cfg >> PEBS_DATACFG_CNTR_SHIFT) & PEBS_DATACFG_CNTR_MASK)
+			      * sizeof(u64);
+			sz += hweight64((pebs_data_cfg >> PEBS_DATACFG_FIX_SHIFT) & PEBS_DATACFG_FIX_MASK)
+			      * sizeof(u64);
+		}
+	}
 
 	cpuc->pebs_record_size = sz;
 }
 
+static void __intel_pmu_pebs_update_cfg(struct perf_event *event,
+					int idx, u64 *pebs_data_cfg)
+{
+	if (is_metric_event(event)) {
+		*pebs_data_cfg |= PEBS_DATACFG_METRICS;
+		return;
+	}
+
+	*pebs_data_cfg |= PEBS_DATACFG_CNTR;
+
+	if (idx >= INTEL_PMC_IDX_FIXED) {
+		*pebs_data_cfg |= ((1ULL << (idx - INTEL_PMC_IDX_FIXED)) & PEBS_DATACFG_FIX_MASK)
+				  << PEBS_DATACFG_FIX_SHIFT;
+	} else {
+		*pebs_data_cfg |= ((1ULL << idx) & PEBS_DATACFG_CNTR_MASK)
+				  << PEBS_DATACFG_CNTR_SHIFT;
+	}
+}
+
+void intel_pmu_pebs_update_cfg(struct cpu_hw_events *cpuc, int n, int *assign)
+{
+	struct perf_event *leader, *event;
+	u64 pebs_data_cfg = 0;
+	int i = n - 1;
+
+	leader = cpuc->event_list[i]->group_leader;
+	for (; i >= 0; i--) {
+		event = cpuc->event_list[i];
+		if (leader != event->group_leader)
+			break;
+		__intel_pmu_pebs_update_cfg(event, assign[i], &pebs_data_cfg);
+	}
+
+	if (pebs_data_cfg & ~cpuc->pebs_data_cfg)
+		cpuc->pebs_data_cfg |= pebs_data_cfg | PEBS_UPDATE_DS_SW;
+}
+
 #define PERF_PEBS_MEMINFO_TYPE	(PERF_SAMPLE_ADDR | PERF_SAMPLE_DATA_SRC |	\
				 PERF_SAMPLE_PHYS_ADDR |			\
				 PERF_SAMPLE_WEIGHT_TYPE |			\
@@ -2034,6 +2085,40 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
 		}
 	}
 
+	if (format_size & (PEBS_DATACFG_CNTR | PEBS_DATACFG_METRICS)) {
+		struct pebs_cntr_header *cntr = next_record;
+		int bit;
+
+		next_record += sizeof(struct pebs_cntr_header);
+
+		for_each_set_bit(bit, (unsigned long *)&cntr->cntr, INTEL_PMC_MAX_GENERIC) {
+			x86_perf_event_update(cpuc->events[bit], (u64 *)next_record);
+			next_record += sizeof(u64);
+		}
+
+		for_each_set_bit(bit, (unsigned long *)&cntr->fixed, INTEL_PMC_MAX_FIXED) {
+			/* The slots event will be handled with perf_metric later */
+			if ((cntr->metrics == INTEL_CNTR_METRICS) &&
+			    (INTEL_PMC_IDX_FIXED_SLOTS == bit + INTEL_PMC_IDX_FIXED)) {
+				next_record += sizeof(u64);
+				continue;
+			}
+			x86_perf_event_update(cpuc->events[bit + INTEL_PMC_IDX_FIXED], (u64 *)next_record);
+			next_record += sizeof(u64);
+		}
+
+		/* HW will reload the value right after the overflow. */
+		if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
+			local64_set(&event->hw.prev_count, (u64)-event->hw.sample_period);
+
+		if (cntr->metrics == INTEL_CNTR_METRICS) {
+			static_call(intel_pmu_update_topdown_event)
+					(event->group_leader, (u64 *)next_record);
+			next_record += 2 * sizeof(u64);
+		}
+		data->sample_flags |= PERF_SAMPLE_READ;
+	}
+
	WARN_ONCE(next_record != __pebs + (format_size >> 48),
		  "PEBS record size %llu, expected %llu, config %llx\n",
		  format_size >> 48,
@@ -2198,13 +2283,22 @@ __intel_pmu_pebs_event(struct perf_event *event,
 	}
 
 	if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
-		/*
-		 * Now, auto-reload is only enabled in fixed period mode.
-		 * The reload value is always hwc->sample_period.
-		 * May need to change it, if auto-reload is enabled in
-		 * freq mode later.
-		 */
-		intel_pmu_save_and_restart_reload(event, count);
+		if (event->hw.flags & PERF_X86_EVENT_PEBS_CNTR) {
+			/*
+			 * The value of each sample has been updated when
+			 * setting up the corresponding sample data. But there
+			 * may be a small gap between the last overflow and
+			 * the drain_pebs().
+			 */
+			intel_pmu_save_and_restart_reload(event, 0);
+		} else {
+			/*
+			 * Now, auto-reload is only enabled in fixed period mode.
+			 * The reload value is always hwc->sample_period.
+			 * May need to change it, if auto-reload is enabled in
+			 * freq mode later.
+			 */
+			intel_pmu_save_and_restart_reload(event, count);
+		}
 	} else
 		intel_pmu_save_and_restart(event);
 }
@@ -2496,6 +2590,10 @@ void __init intel_ds_init(void)
 			x86_pmu.large_pebs_flags |= PERF_SAMPLE_TIME;
 			break;
 
+		case 6:
+			if (x86_pmu.intel_cap.pebs_baseline)
+				x86_pmu.large_pebs_flags |= PERF_SAMPLE_READ;
+			fallthrough;
 		case 5:
 			x86_pmu.pebs_ept = 1;
 			fallthrough;
@@ -2520,7 +2618,7 @@ void __init intel_ds_init(void)
					       PERF_SAMPLE_REGS_USER |
					       PERF_SAMPLE_REGS_INTR);
			}
-			pr_cont("PEBS fmt4%c%s, ", pebs_type, pebs_qual);
+			pr_cont("PEBS fmt%d%c%s, ", format, pebs_type, pebs_qual);
 
	if (!is_hybrid() && x86_pmu.intel_cap.pebs_output_pt_available) {
		pr_cont("PEBS-via-PT, ");
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 2cb5c2e31b1f..de839dfa7dfb 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1132,6 +1132,7 @@ extern u64 __read_mostly hw_cache_extra_regs
				[PERF_COUNT_HW_CACHE_RESULT_MAX];
 
 u64 x86_perf_event_update(struct perf_event *event, u64 *cntr);
+DECLARE_STATIC_CALL(intel_pmu_update_topdown_event, x86_perf_event_update);
 
 static inline unsigned int x86_pmu_config_addr(int index)
 {
@@ -1626,6 +1627,8 @@ void intel_pmu_pebs_disable_all(void);
 
 void intel_pmu_pebs_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in);
 
+void intel_pmu_pebs_update_cfg(struct cpu_hw_events *cpuc, int n, int *assign);
+
 void intel_pmu_auto_reload_read(struct perf_event *event);
 
 void intel_pmu_store_pebs_lbrs(struct lbr_entry *lbr);
diff --git a/arch/x86/events/perf_event_flags.h b/arch/x86/events/perf_event_flags.h
index 6c977c19f2cd..1d9e385649b5 100644
--- a/arch/x86/events/perf_event_flags.h
+++ b/arch/x86/events/perf_event_flags.h
@@ -9,7 +9,7 @@
PERF_ARCH(PEBS_LD_HSW,	0x00008) /* haswell style datala, load */
PERF_ARCH(PEBS_NA_HSW,	0x00010) /* haswell style datala, unknown */
PERF_ARCH(EXCL,		0x00020) /* HT exclusivity on counter */
PERF_ARCH(DYNAMIC,	0x00040) /* dynamic alloc'd constraint */
-			/*	0x00080	*/
+PERF_ARCH(PEBS_CNTR,	0x00080) /* PEBS counters snapshot */
PERF_ARCH(EXCL_ACCT,	0x00100) /* accounted EXCL event */
PERF_ARCH(AUTO_RELOAD,	0x00200) /* use PEBS auto-reload */
PERF_ARCH(LARGE_PEBS,	0x00400) /* use large PEBS */
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 91b73571412f..709746cd7c19 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -140,6 +140,12 @@
 #define PEBS_DATACFG_XMMS	BIT_ULL(2)
 #define PEBS_DATACFG_LBRS	BIT_ULL(3)
 #define PEBS_DATACFG_LBR_SHIFT	24
+#define PEBS_DATACFG_CNTR	BIT_ULL(4)
+#define PEBS_DATACFG_CNTR_SHIFT	32
+#define PEBS_DATACFG_CNTR_MASK	GENMASK_ULL(15, 0)
+#define PEBS_DATACFG_FIX_SHIFT	48
+#define PEBS_DATACFG_FIX_MASK	GENMASK_ULL(7, 0)
+#define PEBS_DATACFG_METRICS	BIT_ULL(5)
 
 /* Steal the highest bit of pebs_data_cfg for SW usage */
 #define PEBS_UPDATE_DS_SW	BIT_ULL(63)
@@ -444,6 +450,15 @@ struct pebs_xmm {
	u64 xmm[16*2];	/* two entries for each register */
 };
 
+struct pebs_cntr_header {
+	u32 cntr;
+	u32 fixed;
+	u32 metrics;
+	u32 reserved;
+};
+
+#define INTEL_CNTR_METRICS		0x3
+
 /*
 * AMD Extended Performance Monitoring and Debug cpuid feature detection
 */
-- 
2.38.1

From nobody Thu Dec 18 08:37:56 2025
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@kernel.org, acme@kernel.org, namhyung@kernel.org, irogers@google.com, adrian.hunter@intel.com, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, Kan Liang
Subject: [PATCH V4 5/5] perf/x86/intel: Support RDPMC metrics clear mode
Date: Wed, 31 Jul 2024 07:38:35 -0700
Message-Id: <20240731143835.771618-6-kan.liang@linux.intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20240731143835.771618-1-kan.liang@linux.intel.com>
References: <20240731143835.771618-1-kan.liang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: List-Subscribe: List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Kan Liang

The new RDPMC enhancement, metrics clear mode, clears the
PERF_METRICS-related resources as well as the fixed-function performance
monitoring counter 3 after a read is performed. It is available for
ring 3. The feature is enumerated by
IA32_PERF_CAPABILITIES.RDPMC_CLEAR_METRICS[bit 19]. To enable the
feature, IA32_FIXED_CTR_CTRL.METRICS_CLEAR_EN[bit 14] must be set.

Two ways were considered to enable the feature.
- Expose a knob in sysfs globally. One user may affect the measurement
  of other users when changing the knob. This solution was dropped.
- Introduce a new event format, metrics_clear, for the slots event to
  disable/enable the feature only for the current process. Users can
  utilize the feature as needed.
The latter solution is implemented in this patch.

KVM doesn't support the perf metrics yet.
For virtualization, the feature can be enabled later separately.

Update the perf metrics documentation.

Suggested-by: Andi Kleen
Reviewed-by: Andi Kleen
Reviewed-by: Ian Rogers
Signed-off-by: Kan Liang
---
 arch/x86/events/intel/core.c         | 20 +++++++++++++++++++-
 arch/x86/events/perf_event.h         |  1 +
 arch/x86/include/asm/perf_event.h    |  4 ++++
 tools/perf/Documentation/topdown.txt |  9 +++++++--
 4 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 1988de2dd4f4..ba981b37900e 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2822,6 +2822,9 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
			return;
 
		idx = INTEL_PMC_IDX_FIXED_SLOTS;
+
+		if (event->attr.config1 & INTEL_TD_CFG_METRIC_CLEAR)
+			bits |= INTEL_FIXED_3_METRICS_CLEAR;
	}
 
	intel_set_masks(event, idx);
@@ -4086,7 +4089,12 @@ static int intel_pmu_hw_config(struct perf_event *event)
	 * is used in a metrics group, it too cannot support sampling.
	 */
	if (intel_pmu_has_cap(event, PERF_CAP_METRICS_IDX) && is_topdown_event(event)) {
-		if (event->attr.config1 || event->attr.config2)
+		/* The metrics_clear can only be set for the slots event */
+		if (event->attr.config1 &&
+		    (!is_slots_event(event) || (event->attr.config1 & ~INTEL_TD_CFG_METRIC_CLEAR)))
+			return -EINVAL;
+
+		if (event->attr.config2)
			return -EINVAL;
 
		/*
@@ -4673,6 +4681,8 @@
PMU_FORMAT_ATTR(in_tx,		"config:32");
PMU_FORMAT_ATTR(in_tx_cp,	"config:33");
PMU_FORMAT_ATTR(eq,		"config:36"); /* v6 + */
 
+PMU_FORMAT_ATTR(metrics_clear,	"config1:0"); /* PERF_CAPABILITIES.RDPMC_METRICS_CLEAR */
+
 static ssize_t umask2_show(struct device *dev,
			   struct device_attribute *attr,
			   char *page)
@@ -4692,6 +4702,7 @@ static struct device_attribute format_attr_umask2 =
 static struct attribute *format_evtsel_ext_attrs[] = {
	&format_attr_umask2.attr,
	&format_attr_eq.attr,
+	&format_attr_metrics_clear.attr,
	NULL
 };
 
@@ -4716,6 +4727,13 @@ evtsel_ext_is_visible(struct kobject *kobj, struct attribute *attr, int i)
	if (i == 1)
		return (mask & ARCH_PERFMON_EVENTSEL_EQ) ? attr->mode : 0;
 
+	/* PERF_CAPABILITIES.RDPMC_METRICS_CLEAR */
+	if (i == 2) {
+		union perf_capabilities intel_cap = hybrid(dev_get_drvdata(dev), intel_cap);
+
+		return intel_cap.rdpmc_metrics_clear ? attr->mode : 0;
+	}
+
	return 0;
 }
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index de839dfa7dfb..c50f8b4f7a89 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -624,6 +624,7 @@ union perf_capabilities {
		u64	pebs_output_pt_available:1;
		u64	pebs_timing_info:1;
		u64	anythread_deprecated:1;
+		u64	rdpmc_metrics_clear:1;
	};
	u64	capabilities;
 };
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 709746cd7c19..21e1d1fe5972 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -41,6 +41,7 @@
 #define INTEL_FIXED_0_USER		(1ULL << 1)
 #define INTEL_FIXED_0_ANYTHREAD		(1ULL << 2)
 #define INTEL_FIXED_0_ENABLE_PMI	(1ULL << 3)
+#define INTEL_FIXED_3_METRICS_CLEAR	(1ULL << 2)
 
 #define HSW_IN_TX			(1ULL << 32)
 #define HSW_IN_TX_CHECKPOINTED		(1ULL << 33)
@@ -378,6 +379,9 @@ static inline bool use_fixed_pseudo_encoding(u64 code)
 #define INTEL_TD_METRIC_MAX		INTEL_TD_METRIC_MEM_BOUND
 #define INTEL_TD_METRIC_NUM		8
 
+#define INTEL_TD_CFG_METRIC_CLEAR_BIT	0
+#define INTEL_TD_CFG_METRIC_CLEAR	BIT_ULL(INTEL_TD_CFG_METRIC_CLEAR_BIT)
+
 static inline bool is_metric_idx(int idx)
 {
	return (unsigned)(idx - INTEL_PMC_IDX_METRIC_BASE) < INTEL_TD_METRIC_NUM;
diff --git a/tools/perf/Documentation/topdown.txt b/tools/perf/Documentation/topdown.txt
index ae0aee86844f..f36c8ca1dc53 100644
--- a/tools/perf/Documentation/topdown.txt
+++ b/tools/perf/Documentation/topdown.txt
@@ -280,8 +280,13 @@ with no longer interval than a few seconds
 
	perf stat -I 1000 --topdown ...
 
-For user programs using RDPMC directly the counter can
-be reset explicitly using ioctl:
+Starting from the Lunar Lake P-core, an RDPMC metrics clear mode is
+introduced. The metrics and the fixed counter 3 are automatically
+cleared after the read is performed. It is recommended to always enable
+the mode. To enable the mode, set config1 of the slots event to 1.
+
+On previous platforms, for user programs using RDPMC directly, the
+counter has to be reset explicitly using an ioctl:
 
	ioctl(perf_fd, PERF_EVENT_IOC_RESET, 0);
 
-- 
2.38.1