From nobody Wed Feb 11 05:49:46 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8D0DD1BFE05; Thu, 23 Jan 2025 06:20:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737613214; cv=none; b=kgN1Ci2AnlI24qpqL9yKlPIGZiafNackzNE2hR4d53kw8ReGZsU0/3McUm8lz1rNSdE8Sc5ZRPc5AvDAcSUVMVAMXdg9ZtUUDZKMrksYQDt+/XJIg3xApR8WqjhKT47RWkXhf3NzftnwnGDRWX5c4NPnl67INFBFTIcgL9iiL2E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737613214; c=relaxed/simple; bh=pSwGsDNCYRGk2x+g0Drr8wYURsrGsK3/Ut8o4ZT9Iwc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=r+n6bSd0d0y1lQiYb2iLgZaaY7xvWSzXvZXDsE4NxR5v2zLXENjq9b/Kf0dqdMibbIyLUfC73aWfJ4qBk1OTtltzg7Y8SQ0RrjsB43d2ccShcVoj8RtfC1J0OIVlc+FgtXbCYAk7o3a7QJtFdO/XU1BKZHZIXTxjAQRH+wzwLdM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=VCju4UFr; arc=none smtp.client-ip=198.175.65.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="VCju4UFr" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1737613213; x=1769149213; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=pSwGsDNCYRGk2x+g0Drr8wYURsrGsK3/Ut8o4ZT9Iwc=; 
b=VCju4UFrXhf1vXNUWiSe0esL2dWQjJ/Z6P9PdM4ra6gAa9FY/tkJWjAW gXVK4HfsNZ8eS00IYASPN0VWZ+OP/YdUGbCZ2iW7D6uckmILVWNS5GI1X 5BeGaXXJei6an+zdW0Mhedd2kevmIg5FjwfudpmhLpES90Zb8xvkSkzfQ paGXFCxqYGblQLs1GhleYysN9UIfxxVns5ewcJ+5c89MpB+GAxTSwTRWH MSYJyz3XvFxVksql9U5knUOG4akeF7QfomUfAAYd0r/I+4/S/iTT9gSfm 1kVjt6Pzjbmj8u5nafLNX5Vgyl3ZA64hxWSpaR3LuckpNxOOd962rmEik w==; X-CSE-ConnectionGUID: Q+9Z9XxwSqKLQxNQftOSig== X-CSE-MsgGUID: 2uHciqOZSO+kgTaY09UgEg== X-IronPort-AV: E=McAfee;i="6700,10204,11323"; a="55513023" X-IronPort-AV: E=Sophos;i="6.13,227,1732608000"; d="scan'208,223";a="55513023" Received: from orviesa003.jf.intel.com ([10.64.159.143]) by orvoesa102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jan 2025 22:20:13 -0800 X-CSE-ConnectionGUID: pznLnFkWRx6Xy8EOtCspyA== X-CSE-MsgGUID: NK12cE/2R+Sk8pWhihdXUA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,199,1725346800"; d="scan'208,223";a="112334410" Received: from emr.sh.intel.com ([10.112.229.56]) by orviesa003.jf.intel.com with ESMTP; 22 Jan 2025 22:20:09 -0800 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Ian Rogers , Adrian Hunter , Alexander Shishkin , Kan Liang , Andi Kleen , Eranian Stephane Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi , Dapeng Mi Subject: [PATCH 01/20] perf/x86/intel: Add PMU support for Clearwater Forest Date: Thu, 23 Jan 2025 14:07:02 +0000 Message-Id: <20250123140721.2496639-2-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com> References: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From PMU's perspective, Clearwater Forest is similar to the previous generation Sierra Forest. 
The key differences are the ARCH PEBS feature and the three newly added fixed counters for topdown L1 metrics events. ARCH PEBS support is added in the following patches; this patch adds support for the basic perfmon features and the three new fixed counters.

Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/core.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index b140c1473a9d..5e8521a54474 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2220,6 +2220,18 @@ static struct extra_reg intel_cmt_extra_regs[] __read_mostly = {
 	EVENT_EXTRA_END
 };
 
+EVENT_ATTR_STR(topdown-fe-bound,	td_fe_bound_skt,	"event=0x9c,umask=0x01");
+EVENT_ATTR_STR(topdown-retiring,	td_retiring_skt,	"event=0xc2,umask=0x02");
+EVENT_ATTR_STR(topdown-be-bound,	td_be_bound_skt,	"event=0xa4,umask=0x02");
+
+static struct attribute *skt_events_attrs[] = {
+	EVENT_PTR(td_fe_bound_skt),
+	EVENT_PTR(td_retiring_skt),
+	EVENT_PTR(td_bad_spec_cmt),
+	EVENT_PTR(td_be_bound_skt),
+	NULL,
+};
+
 #define KNL_OT_L2_HITE		BIT_ULL(19) /* Other Tile L2 Hit */
 #define KNL_OT_L2_HITF		BIT_ULL(20) /* Other Tile L2 Hit */
 #define KNL_MCDRAM_LOCAL	BIT_ULL(21)
@@ -6801,6 +6813,18 @@ __init int intel_pmu_init(void)
 		name = "crestmont";
 		break;
 
+	case INTEL_ATOM_DARKMONT_X:
+		intel_pmu_init_skt(NULL);
+		intel_pmu_pebs_data_source_cmt();
+		x86_pmu.pebs_latency_data = cmt_latency_data;
+		x86_pmu.get_event_constraints = cmt_get_event_constraints;
+		td_attr = skt_events_attrs;
+		mem_attr = grt_mem_attrs;
+		extra_attr = cmt_format_attr;
+		pr_cont("Darkmont events, ");
+		name = "darkmont";
+		break;
+
 	case INTEL_WESTMERE:
 	case INTEL_WESTMERE_EP:
 	case INTEL_WESTMERE_EX:
-- 
2.40.1
From nobody Wed Feb 11 05:49:46 2026
From: Dapeng Mi
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi, stable@vger.kernel.org
Subject: [PATCH 02/20] perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF
Date: Thu, 23 Jan 2025 14:07:03 +0000
Message-Id: <20250123140721.2496639-3-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>

From: Kan Liang

The EAX of CPUID leaf 023H enumerates the mask of valid sub-leaves. To tell the availability of sub-leaf 1 (which enumerates the counter mask), perf should check bit 1 (0x2) of EAX, rather than bit 0 (0x1). The error is not user-visible on bare metal.
Both sub-leaf 0 and sub-leaf 1 are always available on bare metal. However, the error may cause issues in a virtualization environment when a VMM enumerates only sub-leaf 0.

Fixes: eb467aaac21e ("perf/x86/intel: Support Architectural PerfMon Extension leaf")
Signed-off-by: Kan Liang
Cc: stable@vger.kernel.org
---
 arch/x86/events/intel/core.c      | 4 ++--
 arch/x86/include/asm/perf_event.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 5e8521a54474..12eb96219740 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4966,8 +4966,8 @@ static void update_pmu_cap(struct x86_hybrid_pmu *pmu)
 	if (ebx & ARCH_PERFMON_EXT_EQ)
 		pmu->config_mask |= ARCH_PERFMON_EVENTSEL_EQ;
 
-	if (sub_bitmaps & ARCH_PERFMON_NUM_COUNTER_LEAF_BIT) {
-		cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_NUM_COUNTER_LEAF,
+	if (sub_bitmaps & ARCH_PERFMON_NUM_COUNTER_LEAF) {
+		cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_NUM_COUNTER_LEAF_BIT,
 			    &eax, &ebx, &ecx, &edx);
 		pmu->cntr_mask64 = eax;
 		pmu->fixed_cntr_mask64 = ebx;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index adaeb8ca3a8a..71e2ae021374 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -197,7 +197,7 @@ union cpuid10_edx {
 #define ARCH_PERFMON_EXT_UMASK2			0x1
 #define ARCH_PERFMON_EXT_EQ			0x2
 #define ARCH_PERFMON_NUM_COUNTER_LEAF_BIT	0x1
-#define ARCH_PERFMON_NUM_COUNTER_LEAF		0x1
+#define ARCH_PERFMON_NUM_COUNTER_LEAF		BIT(ARCH_PERFMON_NUM_COUNTER_LEAF_BIT)
 
 /*
  * Intel Architectural LBR CPUID detection/enumeration details:
-- 
2.40.1
From nobody Wed Feb 11 05:49:46 2026
From: Dapeng Mi
Subject: [PATCH 03/20] perf/x86/intel: Parse CPUID archPerfmonExt leaves for non-hybrid CPUs
Date: Thu, 23 Jan 2025 14:07:04 +0000
Message-Id: <20250123140721.2496639-4-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>

The CPUID archPerfmonExt (0x23) leaves can enumerate the CPU-level PMU capabilities on non-hybrid processors as well. This patch adds parsing of the archPerfmonExt leaves on non-hybrid processors. Architectural PEBS leverages archPerfmonExt sub-leaves 0x4 and 0x5 to enumerate its capabilities; this patch is a precursor of the subsequent arch-PEBS enabling patches.
Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/core.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 12eb96219740..d29e7ada96aa 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4955,27 +4955,27 @@ static inline bool intel_pmu_broken_perf_cap(void)
 	return false;
 }
 
-static void update_pmu_cap(struct x86_hybrid_pmu *pmu)
+static void update_pmu_cap(struct pmu *pmu)
 {
 	unsigned int sub_bitmaps, eax, ebx, ecx, edx;
 
 	cpuid(ARCH_PERFMON_EXT_LEAF, &sub_bitmaps, &ebx, &ecx, &edx);
 
 	if (ebx & ARCH_PERFMON_EXT_UMASK2)
-		pmu->config_mask |= ARCH_PERFMON_EVENTSEL_UMASK2;
+		hybrid(pmu, config_mask) |= ARCH_PERFMON_EVENTSEL_UMASK2;
 	if (ebx & ARCH_PERFMON_EXT_EQ)
-		pmu->config_mask |= ARCH_PERFMON_EVENTSEL_EQ;
+		hybrid(pmu, config_mask) |= ARCH_PERFMON_EVENTSEL_EQ;
 
 	if (sub_bitmaps & ARCH_PERFMON_NUM_COUNTER_LEAF) {
 		cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_NUM_COUNTER_LEAF_BIT,
 			    &eax, &ebx, &ecx, &edx);
-		pmu->cntr_mask64 = eax;
-		pmu->fixed_cntr_mask64 = ebx;
+		hybrid(pmu, cntr_mask64) = eax;
+		hybrid(pmu, fixed_cntr_mask64) = ebx;
 	}
 
 	if (!intel_pmu_broken_perf_cap()) {
 		/* Perf Metric (Bit 15) and PEBS via PT (Bit 16) are hybrid enumeration */
-		rdmsrl(MSR_IA32_PERF_CAPABILITIES, pmu->intel_cap.capabilities);
+		rdmsrl(MSR_IA32_PERF_CAPABILITIES, hybrid(pmu, intel_cap).capabilities);
 	}
 }
 
@@ -5066,7 +5066,7 @@ static bool init_hybrid_pmu(int cpu)
 		goto end;
 
 	if (this_cpu_has(X86_FEATURE_ARCH_PERFMON_EXT))
-		update_pmu_cap(pmu);
+		update_pmu_cap(&pmu->pmu);
 
 	intel_pmu_check_hybrid_pmus(pmu);
 
@@ -6564,6 +6564,7 @@ __init int intel_pmu_init(void)
 
 	x86_pmu.pebs_events_mask = intel_pmu_pebs_mask(x86_pmu.cntr_mask64);
 	x86_pmu.pebs_capable = PEBS_COUNTER_MASK;
+	x86_pmu.config_mask = X86_RAW_EVENT_MASK;
 
 	/*
	 * Quirk: v2 perfmon does not report fixed-purpose events, so
@@ -7374,6 +7375,18 @@ __init int intel_pmu_init(void)
 		x86_pmu.attr_update = hybrid_attr_update;
 	}
 
+	/*
+	 * The archPerfmonExt (0x23) includes an enhanced enumeration of
+	 * PMU architectural features with a per-core view. For non-hybrid,
+	 * each core has the same PMU capabilities. It's good enough to
+	 * update the x86_pmu from the booting CPU. For hybrid, the x86_pmu
+	 * is used to keep the common capabilities. Still keep the values
+	 * from the leaf 0xa. The core specific update will be done later
+	 * when a new type is online.
+	 */
+	if (!is_hybrid() && boot_cpu_has(X86_FEATURE_ARCH_PERFMON_EXT))
+		update_pmu_cap(NULL);
+
 	intel_pmu_check_counters_mask(&x86_pmu.cntr_mask64,
 				      &x86_pmu.fixed_cntr_mask64,
 				      &x86_pmu.intel_ctrl);
-- 
2.40.1
From nobody Wed Feb 11 05:49:46 2026
From: Dapeng Mi
Subject: [PATCH 04/20] perf/x86/intel: Decouple BTS initialization from PEBS initialization
Date: Thu, 23 Jan 2025 14:07:05 +0000
Message-Id: <20250123140721.2496639-5-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>

Move the x86_pmu.bts flag initialization from intel_ds_init() into bts_init(), and rename intel_ds_init() to intel_pebs_init() since, with the x86_pmu.bts initialization removed, it now fully initializes only PEBS. The move is safe because all reads of the x86_pmu.bts flag happen after bts_init() has run.

Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/bts.c  | 6 +++++-
 arch/x86/events/intel/core.c | 2 +-
 arch/x86/events/intel/ds.c   | 5 ++---
 arch/x86/events/perf_event.h | 2 +-
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index 8f78b0c900ef..a205d1fb37b1 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -584,7 +584,11 @@ static void bts_event_read(struct perf_event *event)
 
 static __init int bts_init(void)
 {
-	if (!boot_cpu_has(X86_FEATURE_DTES64) || !x86_pmu.bts)
+	if (!boot_cpu_has(X86_FEATURE_DTES64))
+		return -ENODEV;
+
+	x86_pmu.bts = boot_cpu_has(X86_FEATURE_BTS);
+	if (!x86_pmu.bts)
 		return -ENODEV;
 
 	if (boot_cpu_has(X86_FEATURE_PTI)) {
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index d29e7ada96aa..91afba51038f 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6593,7 +6593,7 @@ __init int intel_pmu_init(void)
 	if (boot_cpu_has(X86_FEATURE_ARCH_LBR))
 		intel_pmu_arch_lbr_init();
 
-	intel_ds_init();
+	intel_pebs_init();
 
 	x86_add_quirk(intel_arch_events_quirk); /* Install first, so it runs last */
 
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 13a78a8a2780..86fa6d8c45cf 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2650,10 +2650,10 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
 }
 
 /*
- * BTS, PEBS probe and setup
+ * PEBS probe and setup
  */
 
-void __init intel_ds_init(void)
+void __init intel_pebs_init(void)
 {
 	/*
	 * No support for 32bit formats
@@ -2661,7 +2661,6 @@ void __init intel_ds_init(void)
 	if (!boot_cpu_has(X86_FEATURE_DTES64))
 		return;
 
-	x86_pmu.bts = boot_cpu_has(X86_FEATURE_BTS);
 	x86_pmu.pebs = boot_cpu_has(X86_FEATURE_PEBS);
 	x86_pmu.pebs_buffer_size = PEBS_BUFFER_SIZE;
 	if (x86_pmu.version <= 4)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index a698e6484b3b..e15c2d0dbb27 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1661,7 +1661,7 @@ void intel_pmu_drain_pebs_buffer(void);
 
 void intel_pmu_store_pebs_lbrs(struct lbr_entry *lbr);
 
-void intel_ds_init(void);
+void intel_pebs_init(void);
 
 void intel_pmu_lbr_save_brstack(struct perf_sample_data *data,
 				struct cpu_hw_events *cpuc,
-- 
2.40.1
From nobody Wed Feb 11 05:49:46 2026
From: Dapeng Mi
Subject: [PATCH 05/20] perf/x86/intel: Rename x86_pmu.pebs to x86_pmu.ds_pebs
Date: Thu, 23 Jan 2025 14:07:06 +0000
Message-Id: <20250123140721.2496639-6-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>

Since architectural PEBS will be introduced in subsequent patches, rename x86_pmu.pebs to x86_pmu.ds_pebs to distinguish the legacy DS-based PEBS from the upcoming architectural PEBS.
Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/core.c |  6 +++---
 arch/x86/events/intel/ds.c   | 20 ++++++++++----------
 arch/x86/events/perf_event.h |  2 +-
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 91afba51038f..0063afa0ddac 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4268,7 +4268,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 		.guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
 	};
 
-	if (!x86_pmu.pebs)
+	if (!x86_pmu.ds_pebs)
 		return arr;
 
 	/*
@@ -5447,7 +5447,7 @@ static __init void intel_clovertown_quirk(void)
 	 * these chips.
	 */
	pr_warn("PEBS disabled due to CPU errata\n");
-	x86_pmu.pebs = 0;
+	x86_pmu.ds_pebs = 0;
	x86_pmu.pebs_constraints = NULL;
 }
 
@@ -5945,7 +5945,7 @@ tsx_is_visible(struct kobject *kobj, struct attribute *attr, int i)
 static umode_t
 pebs_is_visible(struct kobject *kobj, struct attribute *attr, int i)
 {
-	return x86_pmu.pebs ? attr->mode : 0;
+	return x86_pmu.ds_pebs ? attr->mode : 0;
 }
 
 static umode_t
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 86fa6d8c45cf..e8a06c8486af 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -624,7 +624,7 @@ static int alloc_pebs_buffer(int cpu)
 	int max, node = cpu_to_node(cpu);
 	void *buffer, *insn_buff, *cea;
 
-	if (!x86_pmu.pebs)
+	if (!x86_pmu.ds_pebs)
 		return 0;
 
 	buffer = dsalloc_pages(bsiz, GFP_KERNEL, cpu);
@@ -659,7 +659,7 @@ static void release_pebs_buffer(int cpu)
 	struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu);
 	void *cea;
 
-	if (!x86_pmu.pebs)
+	if (!x86_pmu.ds_pebs)
 		return;
 
 	kfree(per_cpu(insn_buffer, cpu));
@@ -734,7 +734,7 @@ void release_ds_buffers(void)
 {
 	int cpu;
 
-	if (!x86_pmu.bts && !x86_pmu.pebs)
+	if (!x86_pmu.bts && !x86_pmu.ds_pebs)
 		return;
 
 	for_each_possible_cpu(cpu)
@@ -763,13 +763,13 @@ void reserve_ds_buffers(void)
 	x86_pmu.bts_active = 0;
 	x86_pmu.pebs_active = 0;
 
-	if (!x86_pmu.bts && !x86_pmu.pebs)
+	if (!x86_pmu.bts && !x86_pmu.ds_pebs)
 		return;
 
 	if (!x86_pmu.bts)
 		bts_err = 1;
 
-	if (!x86_pmu.pebs)
+	if (!x86_pmu.ds_pebs)
 		pebs_err = 1;
 
 	for_each_possible_cpu(cpu) {
@@ -805,7 +805,7 @@ void reserve_ds_buffers(void)
 	if (x86_pmu.bts && !bts_err)
 		x86_pmu.bts_active = 1;
 
-	if (x86_pmu.pebs && !pebs_err)
+	if (x86_pmu.ds_pebs && !pebs_err)
 		x86_pmu.pebs_active = 1;
 
 	for_each_possible_cpu(cpu) {
@@ -2661,12 +2661,12 @@ void __init intel_pebs_init(void)
 	if (!boot_cpu_has(X86_FEATURE_DTES64))
 		return;
 
-	x86_pmu.pebs = boot_cpu_has(X86_FEATURE_PEBS);
+	x86_pmu.ds_pebs = boot_cpu_has(X86_FEATURE_PEBS);
 	x86_pmu.pebs_buffer_size = PEBS_BUFFER_SIZE;
 	if (x86_pmu.version <= 4)
 		x86_pmu.pebs_no_isolation = 1;
 
-	if (x86_pmu.pebs) {
+	if (x86_pmu.ds_pebs) {
 		char pebs_type = x86_pmu.intel_cap.pebs_trap ?  '+' : '-';
 		char *pebs_qual = "";
 		int format = x86_pmu.intel_cap.pebs_format;
@@ -2750,7 +2750,7 @@ void __init intel_pebs_init(void)
 
 		default:
 			pr_cont("no PEBS fmt%d%c, ", format, pebs_type);
-			x86_pmu.pebs = 0;
+			x86_pmu.ds_pebs = 0;
 		}
 	}
 }
@@ -2759,7 +2759,7 @@ void perf_restore_debug_store(void)
 {
 	struct debug_store *ds = __this_cpu_read(cpu_hw_events.ds);
 
-	if (!x86_pmu.bts && !x86_pmu.pebs)
+	if (!x86_pmu.bts && !x86_pmu.ds_pebs)
 		return;
 
 	wrmsrl(MSR_IA32_DS_AREA, (unsigned long)ds);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index e15c2d0dbb27..d5b7f5605e1e 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -888,7 +888,7 @@ struct x86_pmu {
	 */
	unsigned int	bts			:1,
			bts_active		:1,
-			pebs			:1,
+			ds_pebs			:1,
			pebs_active		:1,
			pebs_broken		:1,
			pebs_prec_dist		:1,
-- 
2.40.1
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org
Subject: [PATCH 06/20] perf/x86/intel: Initialize architectural PEBS
Date: Thu, 23 Jan 2025 14:07:07 +0000
Message-Id: <20250123140721.2496639-7-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>

arch-PEBS leverages the CPUID.23H.4/5 sub-leaves to enumerate the supported arch-PEBS capabilities and counter bitmaps. This patch parses these two sub-leaves and initializes the arch-PEBS capabilities and corresponding structures.

Since the IA32_PEBS_ENABLE and MSR_PEBS_DATA_CFG MSRs no longer exist with arch-PEBS, also avoid accessing these MSRs when arch-PEBS is supported.
Signed-off-by: Dapeng Mi --- arch/x86/events/core.c | 21 +++++++++++++----- arch/x86/events/intel/core.c | 20 ++++++++++++++++- arch/x86/events/intel/ds.c | 36 ++++++++++++++++++++++++++----- arch/x86/events/perf_event.h | 25 ++++++++++++++++++--- arch/x86/include/asm/perf_event.h | 7 ++++++ 5 files changed, 95 insertions(+), 14 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 7b6430e5a77b..c36cc606bd19 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -549,14 +549,22 @@ static inline int precise_br_compat(struct perf_event= *event) return m =3D=3D b; } =20 -int x86_pmu_max_precise(void) +int x86_pmu_max_precise(struct pmu *pmu) { int precise =3D 0; =20 - /* Support for constant skid */ if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) { - precise++; + /* arch PEBS */ + if (x86_pmu.arch_pebs) { + precise =3D 2; + if (hybrid(pmu, arch_pebs_cap).pdists) + precise++; + + return precise; + } =20 + /* legacy PEBS - support for constant skid */ + precise++; /* Support for IP fixup */ if (x86_pmu.lbr_nr || x86_pmu.intel_cap.pebs_format >=3D 2) precise++; @@ -564,13 +572,14 @@ int x86_pmu_max_precise(void) if (x86_pmu.pebs_prec_dist) precise++; } + return precise; } =20 int x86_pmu_hw_config(struct perf_event *event) { if (event->attr.precise_ip) { - int precise =3D x86_pmu_max_precise(); + int precise =3D x86_pmu_max_precise(event->pmu); =20 if (event->attr.precise_ip > precise) return -EOPNOTSUPP; @@ -2615,7 +2624,9 @@ static ssize_t max_precise_show(struct device *cdev, struct device_attribute *attr, char *buf) { - return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu_max_precise()); + struct pmu *pmu =3D dev_get_drvdata(cdev); + + return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu_max_precise(pmu)); } =20 static DEVICE_ATTR_RO(max_precise); diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index 0063afa0ddac..dc49dcf9b705 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -4973,6 
+4973,21 @@ static void update_pmu_cap(struct pmu *pmu) hybrid(pmu, fixed_cntr_mask64) =3D ebx; } =20 + /* Bits[5:4] should be set simultaneously if arch-PEBS is supported */ + if ((sub_bitmaps & ARCH_PERFMON_PEBS_LEAVES) =3D=3D ARCH_PERFMON_PEBS_LEA= VES) { + cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_PEBS_CAP_LEAF_BIT, + &eax, &ebx, &ecx, &edx); + hybrid(pmu, arch_pebs_cap).caps =3D (u64)ebx << 32; + + cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_PEBS_COUNTER_LEAF_BIT, + &eax, &ebx, &ecx, &edx); + hybrid(pmu, arch_pebs_cap).counters =3D ((u64)ecx << 32) | eax; + hybrid(pmu, arch_pebs_cap).pdists =3D ((u64)edx << 32) | ebx; + } else { + WARN_ON(x86_pmu.arch_pebs =3D=3D 1); + x86_pmu.arch_pebs =3D 0; + } + if (!intel_pmu_broken_perf_cap()) { /* Perf Metric (Bit 15) and PEBS via PT (Bit 16) are hybrid enumeration = */ rdmsrl(MSR_IA32_PERF_CAPABILITIES, hybrid(pmu, intel_cap).capabilities); @@ -5945,7 +5960,7 @@ tsx_is_visible(struct kobject *kobj, struct attribute= *attr, int i) static umode_t pebs_is_visible(struct kobject *kobj, struct attribute *attr, int i) { - return x86_pmu.ds_pebs ? attr->mode : 0; + return intel_pmu_has_pebs() ? 
attr->mode : 0; } =20 static umode_t @@ -7387,6 +7402,9 @@ __init int intel_pmu_init(void) if (!is_hybrid() && boot_cpu_has(X86_FEATURE_ARCH_PERFMON_EXT)) update_pmu_cap(NULL); =20 + if (x86_pmu.arch_pebs) + pr_cont("Architectural PEBS, "); + intel_pmu_check_counters_mask(&x86_pmu.cntr_mask64, &x86_pmu.fixed_cntr_mask64, &x86_pmu.intel_ctrl); diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index e8a06c8486af..1b33a6a60584 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1537,6 +1537,9 @@ void intel_pmu_pebs_enable(struct perf_event *event) =20 cpuc->pebs_enabled |=3D 1ULL << hwc->idx; =20 + if (x86_pmu.arch_pebs) + return; + if ((event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT) && (x86_pmu.version < 5= )) cpuc->pebs_enabled |=3D 1ULL << (hwc->idx + 32); else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST) @@ -1606,6 +1609,11 @@ void intel_pmu_pebs_disable(struct perf_event *event) =20 cpuc->pebs_enabled &=3D ~(1ULL << hwc->idx); =20 + hwc->config |=3D ARCH_PERFMON_EVENTSEL_INT; + + if (x86_pmu.arch_pebs) + return; + if ((event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT) && (x86_pmu.version < 5)) cpuc->pebs_enabled &=3D ~(1ULL << (hwc->idx + 32)); @@ -1616,15 +1624,13 @@ void intel_pmu_pebs_disable(struct perf_event *even= t) =20 if (cpuc->enabled) wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled); - - hwc->config |=3D ARCH_PERFMON_EVENTSEL_INT; } =20 void intel_pmu_pebs_enable_all(void) { struct cpu_hw_events *cpuc =3D this_cpu_ptr(&cpu_hw_events); =20 - if (cpuc->pebs_enabled) + if (!x86_pmu.arch_pebs && cpuc->pebs_enabled) wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled); } =20 @@ -1632,7 +1638,7 @@ void intel_pmu_pebs_disable_all(void) { struct cpu_hw_events *cpuc =3D this_cpu_ptr(&cpu_hw_events); =20 - if (cpuc->pebs_enabled) + if (!x86_pmu.arch_pebs && cpuc->pebs_enabled) __intel_pmu_pebs_disable_all(); } =20 @@ -2649,11 +2655,23 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs= *iregs, struct perf_sample_d } } =20 
+static void __init intel_arch_pebs_init(void) +{ + /* + * Current hybrid platforms always both support arch-PEBS or not + * on all kinds of cores. So directly set x86_pmu.arch_pebs flag + * if boot cpu supports arch-PEBS. + */ + x86_pmu.arch_pebs =3D 1; + x86_pmu.pebs_buffer_size =3D PEBS_BUFFER_SIZE; + x86_pmu.pebs_capable =3D ~0ULL; +} + /* * PEBS probe and setup */ =20 -void __init intel_pebs_init(void) +static void __init intel_ds_pebs_init(void) { /* * No support for 32bit formats @@ -2755,6 +2773,14 @@ void __init intel_pebs_init(void) } } =20 +void __init intel_pebs_init(void) +{ + if (x86_pmu.intel_cap.pebs_format =3D=3D 0xf) + intel_arch_pebs_init(); + else + intel_ds_pebs_init(); +} + void perf_restore_debug_store(void) { struct debug_store *ds =3D __this_cpu_read(cpu_hw_events.ds); diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index d5b7f5605e1e..85cb36ad5520 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -707,6 +707,12 @@ enum atom_native_id { skt_native_id =3D 0x3, /* Skymont */ }; =20 +struct arch_pebs_cap { + u64 caps; + u64 counters; + u64 pdists; +}; + struct x86_hybrid_pmu { struct pmu pmu; const char *name; @@ -742,6 +748,8 @@ struct x86_hybrid_pmu { mid_ack :1, enabled_ack :1; =20 + struct arch_pebs_cap arch_pebs_cap; + u64 pebs_data_source[PERF_PEBS_DATA_SOURCE_MAX]; }; =20 @@ -884,7 +892,7 @@ struct x86_pmu { union perf_capabilities intel_cap; =20 /* - * Intel DebugStore bits + * Intel DebugStore and PEBS bits */ unsigned int bts :1, bts_active :1, @@ -895,7 +903,8 @@ struct x86_pmu { pebs_no_tlb :1, pebs_no_isolation :1, pebs_block :1, - pebs_ept :1; + pebs_ept :1, + arch_pebs :1; int pebs_record_size; int pebs_buffer_size; u64 pebs_events_mask; @@ -907,6 +916,11 @@ struct x86_pmu { u64 rtm_abort_event; u64 pebs_capable; =20 + /* + * Intel Architectural PEBS + */ + struct arch_pebs_cap arch_pebs_cap; + /* * Intel LBR */ @@ -1196,7 +1210,7 @@ int x86_reserve_hardware(void); =20 void 
x86_release_hardware(void); =20 -int x86_pmu_max_precise(void); +int x86_pmu_max_precise(struct pmu *pmu); =20 void hw_perf_lbr_event_destroy(struct perf_event *event); =20 @@ -1766,6 +1780,11 @@ static inline int intel_pmu_max_num_pebs(struct pmu = *pmu) return fls((u32)hybrid(pmu, pebs_events_mask)); } =20 +static inline bool intel_pmu_has_pebs(void) +{ + return x86_pmu.ds_pebs || x86_pmu.arch_pebs; +} + #else /* CONFIG_CPU_SUP_INTEL */ =20 static inline void reserve_ds_buffers(void) diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 71e2ae021374..00ffb9933aba 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -198,6 +198,13 @@ union cpuid10_edx { #define ARCH_PERFMON_EXT_EQ 0x2 #define ARCH_PERFMON_NUM_COUNTER_LEAF_BIT 0x1 #define ARCH_PERFMON_NUM_COUNTER_LEAF BIT(ARCH_PERFMON_NUM_COUNTER_LEAF_B= IT) +#define ARCH_PERFMON_PEBS_CAP_LEAF_BIT 0x4 +#define ARCH_PERFMON_PEBS_CAP_LEAF BIT(ARCH_PERFMON_PEBS_CAP_LEAF_BIT) +#define ARCH_PERFMON_PEBS_COUNTER_LEAF_BIT 0x5 +#define ARCH_PERFMON_PEBS_COUNTER_LEAF BIT(ARCH_PERFMON_PEBS_COUNTER_LEAF= _BIT) + +#define ARCH_PERFMON_PEBS_LEAVES (ARCH_PERFMON_PEBS_CAP_LEAF | \ + ARCH_PERFMON_PEBS_COUNTER_LEAF) =20 /* * Intel Architectural LBR CPUID detection/enumeration details: --=20 2.40.1 From nobody Wed Feb 11 05:49:46 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8FA8C1CEACB; Thu, 23 Jan 2025 06:20:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737613235; cv=none; b=qigeDnnFPkbr28yqa4kV/jhDs1yZ4slhow4YSJMD4lab7pOH6eo4d+KlBjg2S8Y8MESjEZx8uzS+eDEtHmixokdW1IV3lDs4DHvKG5GnWVhNaI2Gz9Tu565tlg9cgYcPHKkwliIymCjuZQjCii2xK+2mINyQv8Vd+WN69TDq/Ec= 
From: Dapeng Mi
Subject: [PATCH 07/20] perf/x86/intel/ds: Factor out common PEBS processing code to functions
Date: Thu, 23 Jan 2025 14:07:08 +0000
Message-Id: <20250123140721.2496639-8-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>

Besides some PEBS record layout differences, arch-PEBS can share most of the PEBS record processing code with adaptive PEBS. Thus, factor this common processing code out into independent inline functions so it can be reused by the subsequent arch-PEBS handler.
Suggested-by: Kan Liang Signed-off-by: Dapeng Mi --- arch/x86/events/intel/ds.c | 80 ++++++++++++++++++++++++++------------ 1 file changed, 55 insertions(+), 25 deletions(-) diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 1b33a6a60584..be190cb03ef8 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -2587,6 +2587,54 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs = *iregs, struct perf_sample_d } } =20 +static inline void __intel_pmu_handle_pebs_record(struct pt_regs *iregs, + struct pt_regs *regs, + struct perf_sample_data *data, + void *at, u64 pebs_status, + short *counts, void **last, + setup_fn setup_sample) +{ + struct cpu_hw_events *cpuc =3D this_cpu_ptr(&cpu_hw_events); + struct perf_event *event; + int bit; + + for_each_set_bit(bit, (unsigned long *)&pebs_status, X86_PMC_IDX_MAX) { + event =3D cpuc->events[bit]; + + if (WARN_ON_ONCE(!event) || + WARN_ON_ONCE(!event->attr.precise_ip)) + continue; + + if (counts[bit]++) + __intel_pmu_pebs_event(event, iregs, regs, data, + last[bit], setup_sample); + + last[bit] =3D at; + } +} + +static inline void +__intel_pmu_handle_last_pebs_record(struct pt_regs *iregs, struct pt_regs = *regs, + struct perf_sample_data *data, u64 mask, + short *counts, void **last, + setup_fn setup_sample) +{ + struct cpu_hw_events *cpuc =3D this_cpu_ptr(&cpu_hw_events); + struct perf_event *event; + int bit; + + for_each_set_bit(bit, (unsigned long *)&mask, X86_PMC_IDX_MAX) { + if (!counts[bit]) + continue; + + event =3D cpuc->events[bit]; + + __intel_pmu_pebs_last_event(event, iregs, regs, data, last[bit], + counts[bit], setup_sample); + } + +} + static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sa= mple_data *data) { short counts[INTEL_PMC_IDX_FIXED + MAX_FIXED_PEBS_EVENTS] =3D {}; @@ -2596,9 +2644,7 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *= iregs, struct perf_sample_d struct x86_perf_regs perf_regs; struct pt_regs *regs =3D &perf_regs.regs; 
struct pebs_basic *basic; - struct perf_event *event; void *base, *at, *top; - int bit; u64 mask; =20 if (!x86_pmu.pebs_active) @@ -2611,6 +2657,7 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *= iregs, struct perf_sample_d =20 mask =3D hybrid(cpuc->pmu, pebs_events_mask) | (hybrid(cpuc->pmu, fixed_cntr_mask64) << INTEL_PMC_IDX_FIXED); + mask &=3D cpuc->pebs_enabled; =20 if (unlikely(base >=3D top)) { intel_pmu_pebs_event_update_no_drain(cpuc, X86_PMC_IDX_MAX); @@ -2628,31 +2675,14 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs= *iregs, struct perf_sample_d if (basic->format_size !=3D cpuc->pebs_record_size) continue; =20 - pebs_status =3D basic->applicable_counters & cpuc->pebs_enabled & mask; - for_each_set_bit(bit, (unsigned long *)&pebs_status, X86_PMC_IDX_MAX) { - event =3D cpuc->events[bit]; - - if (WARN_ON_ONCE(!event) || - WARN_ON_ONCE(!event->attr.precise_ip)) - continue; - - if (counts[bit]++) { - __intel_pmu_pebs_event(event, iregs, regs, data, last[bit], - setup_pebs_adaptive_sample_data); - } - last[bit] =3D at; - } + pebs_status =3D mask & basic->applicable_counters; + __intel_pmu_handle_pebs_record(iregs, regs, data, at, + pebs_status, counts, last, + setup_pebs_adaptive_sample_data); } =20 - for_each_set_bit(bit, (unsigned long *)&mask, X86_PMC_IDX_MAX) { - if (!counts[bit]) - continue; - - event =3D cpuc->events[bit]; - - __intel_pmu_pebs_last_event(event, iregs, regs, data, last[bit], - counts[bit], setup_pebs_adaptive_sample_data); - } + __intel_pmu_handle_last_pebs_record(iregs, regs, data, mask, counts, last, + setup_pebs_adaptive_sample_data); } =20 static void __init intel_arch_pebs_init(void) --=20 2.40.1 From nobody Wed Feb 11 05:49:46 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B46ED1CF7A2; Thu, 23 Jan 2025 06:20:37 +0000 (UTC) 
From: Dapeng Mi
Subject: [PATCH 08/20] perf/x86/intel: Process arch-PEBS records or record fragments
Date: Thu, 23 Jan 2025 14:07:09 +0000
Message-Id: <20250123140721.2496639-9-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>

A significant difference from adaptive PEBS is that arch-PEBS supports record fragments: an arch-PEBS record can be split into several independent fragments, each carrying its own arch-PEBS header. This patch defines the architectural PEBS record layout structures and adds helpers to process arch-PEBS records and fragments.
Only legacy PEBS groups like basic, GPR, XMM and LBR groups are supported in this patch, the new added YMM/ZMM/OPMASK vector registers capturing would be supported in subsequent patches. Signed-off-by: Dapeng Mi --- arch/x86/events/intel/core.c | 9 ++ arch/x86/events/intel/ds.c | 219 ++++++++++++++++++++++++++++++ arch/x86/include/asm/msr-index.h | 6 + arch/x86/include/asm/perf_event.h | 100 ++++++++++++++ 4 files changed, 334 insertions(+) diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index dc49dcf9b705..d73d899d6b02 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -3114,6 +3114,15 @@ static int handle_pmi_common(struct pt_regs *regs, u= 64 status) wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled); } =20 + /* + * Arch PEBS sets bit 54 in the global status register + */ + if (__test_and_clear_bit(GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT, + (unsigned long *)&status)) { + handled++; + x86_pmu.drain_pebs(regs, &data); + } + /* * Intel PT */ diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index be190cb03ef8..680637d63679 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -2222,6 +2222,153 @@ static void setup_pebs_adaptive_sample_data(struct = perf_event *event, format_group); } =20 +static inline bool arch_pebs_record_continued(struct arch_pebs_header *hea= der) +{ + /* Continue bit or null PEBS record indicates fragment follows. 
*/ + return header->cont || !(header->format & GENMASK_ULL(63, 16)); +} + +static void setup_arch_pebs_sample_data(struct perf_event *event, + struct pt_regs *iregs, void *__pebs, + struct perf_sample_data *data, + struct pt_regs *regs) +{ + struct cpu_hw_events *cpuc =3D this_cpu_ptr(&cpu_hw_events); + struct arch_pebs_header *header =3D NULL; + struct arch_pebs_aux *meminfo =3D NULL; + struct arch_pebs_gprs *gprs =3D NULL; + struct x86_perf_regs *perf_regs; + void *next_record; + void *at =3D __pebs; + u64 sample_type; + + if (at =3D=3D NULL) + return; + + perf_regs =3D container_of(regs, struct x86_perf_regs, regs); + perf_regs->xmm_regs =3D NULL; + + sample_type =3D event->attr.sample_type; + perf_sample_data_init(data, 0, event->hw.last_period); + data->period =3D event->hw.last_period; + + /* + * We must however always use iregs for the unwinder to stay sane; the + * record BP,SP,IP can point into thin air when the record is from a + * previous PMI context or an (I)RET happened between the record and + * PMI. + */ + if (sample_type & PERF_SAMPLE_CALLCHAIN) + perf_sample_save_callchain(data, event, iregs); + + *regs =3D *iregs; + +again: + header =3D at; + next_record =3D at + sizeof(struct arch_pebs_header); + if (header->basic) { + struct arch_pebs_basic *basic =3D next_record; + + /* The ip in basic is EventingIP */ + set_linear_ip(regs, basic->ip); + regs->flags =3D PERF_EFLAGS_EXACT; + setup_pebs_time(event, data, basic->tsc); + + if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT) + data->weight.var3_w =3D basic->valid ? basic->retire : 0; + + next_record =3D basic + 1; + } + + /* + * The record for MEMINFO is in front of GP + * But PERF_SAMPLE_TRANSACTION needs gprs->ax. + * Save the pointer here but process later. 
+ */ + if (header->aux) { + meminfo =3D next_record; + next_record =3D meminfo + 1; + } + + if (header->gpr) { + gprs =3D next_record; + next_record =3D gprs + 1; + + if (event->attr.precise_ip < 2) { + set_linear_ip(regs, gprs->ip); + regs->flags &=3D ~PERF_EFLAGS_EXACT; + } + + if (sample_type & PERF_SAMPLE_REGS_INTR) + adaptive_pebs_save_regs(regs, (struct pebs_gprs *)gprs); + } + + if (header->aux) { + if (sample_type & PERF_SAMPLE_WEIGHT_TYPE) { + u16 latency =3D meminfo->cache_latency; + u64 tsx_latency =3D intel_get_tsx_weight(meminfo->tsx_tuning); + + data->weight.var2_w =3D meminfo->instr_latency; + + if (sample_type & PERF_SAMPLE_WEIGHT) + data->weight.full =3D latency ?: tsx_latency; + else + data->weight.var1_dw =3D latency ?: (u32)tsx_latency; + data->sample_flags |=3D PERF_SAMPLE_WEIGHT_TYPE; + } + + if (sample_type & PERF_SAMPLE_DATA_SRC) { + data->data_src.val =3D get_data_src(event, meminfo->aux); + data->sample_flags |=3D PERF_SAMPLE_DATA_SRC; + } + + if (sample_type & PERF_SAMPLE_ADDR_TYPE) { + data->addr =3D meminfo->address; + data->sample_flags |=3D PERF_SAMPLE_ADDR; + } + + if (sample_type & PERF_SAMPLE_TRANSACTION) { + data->txn =3D intel_get_tsx_transaction(meminfo->tsx_tuning, + gprs ? gprs->ax : 0); + data->sample_flags |=3D PERF_SAMPLE_TRANSACTION; + } + } + + if (header->xmm) { + struct arch_pebs_xmm *xmm; + + next_record +=3D sizeof(struct arch_pebs_xer_header); + + xmm =3D next_record; + perf_regs->xmm_regs =3D xmm->xmm; + next_record =3D xmm + 1; + } + + if (header->lbr) { + struct arch_pebs_lbr_header *lbr_header =3D next_record; + struct lbr_entry *lbr; + int num_lbr; + + next_record =3D lbr_header + 1; + lbr =3D next_record; + + num_lbr =3D header->lbr =3D=3D ARCH_PEBS_LBR_NUM_VAR ? 
lbr_header->depth= : + header->lbr * ARCH_PEBS_BASE_LBR_ENTRIES; + next_record +=3D num_lbr * sizeof(struct lbr_entry); + + if (has_branch_stack(event)) { + intel_pmu_store_pebs_lbrs(lbr); + intel_pmu_lbr_save_brstack(data, cpuc, event); + } + } + + /* Parse followed fragments if there are. */ + if (arch_pebs_record_continued(header)) { + at =3D at + header->size; + goto again; + } +} + static inline void * get_next_pebs_record_by_bit(void *base, void *top, int bit) { @@ -2685,6 +2832,77 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs = *iregs, struct perf_sample_d setup_pebs_adaptive_sample_data); } =20 +static void intel_pmu_drain_arch_pebs(struct pt_regs *iregs, + struct perf_sample_data *data) +{ + short counts[INTEL_PMC_IDX_FIXED + MAX_FIXED_PEBS_EVENTS] =3D {}; + void *last[INTEL_PMC_IDX_FIXED + MAX_FIXED_PEBS_EVENTS]; + struct cpu_hw_events *cpuc =3D this_cpu_ptr(&cpu_hw_events); + union arch_pebs_index index; + struct x86_perf_regs perf_regs; + struct pt_regs *regs =3D &perf_regs.regs; + void *base, *at, *top; + u64 mask; + + rdmsrl(MSR_IA32_PEBS_INDEX, index.full); + + if (unlikely(!index.split.wr)) { + intel_pmu_pebs_event_update_no_drain(cpuc, X86_PMC_IDX_MAX); + return; + } + + base =3D cpuc->ds_pebs_vaddr; + top =3D (void *)((u64)cpuc->ds_pebs_vaddr + + (index.split.wr << ARCH_PEBS_INDEX_WR_SHIFT)); + + mask =3D hybrid(cpuc->pmu, arch_pebs_cap).counters & cpuc->pebs_enabled; + + if (!iregs) + iregs =3D &dummy_iregs; + + /* Process all but the last event for each counter. 
 	 */
+	for (at = base; at < top;) {
+		struct arch_pebs_header *header;
+		struct arch_pebs_basic *basic;
+		u64 pebs_status;
+
+		header = at;
+
+		if (WARN_ON_ONCE(!header->size))
+			break;
+
+		/* 1st fragment or single record must have basic group */
+		if (!header->basic) {
+			at += header->size;
+			continue;
+		}
+
+		basic = at + sizeof(struct arch_pebs_header);
+		pebs_status = mask & basic->applicable_counters;
+		__intel_pmu_handle_pebs_record(iregs, regs, data, at,
+					       pebs_status, counts, last,
+					       setup_arch_pebs_sample_data);
+
+		/* Skip non-last fragments */
+		while (arch_pebs_record_continued(header)) {
+			if (!header->size)
+				break;
+			at += header->size;
+			header = at;
+		}
+
+		/* Skip last fragment or the single record */
+		at += header->size;
+	}
+
+	__intel_pmu_handle_last_pebs_record(iregs, regs, data, mask, counts,
+					    last, setup_arch_pebs_sample_data);
+
+	index.split.wr = 0;
+	index.split.full = 0;
+	wrmsrl(MSR_IA32_PEBS_INDEX, index.full);
+}
+
 static void __init intel_arch_pebs_init(void)
 {
 	/*
@@ -2694,6 +2912,7 @@ static void __init intel_arch_pebs_init(void)
 	 */
 	x86_pmu.arch_pebs = 1;
 	x86_pmu.pebs_buffer_size = PEBS_BUFFER_SIZE;
+	x86_pmu.drain_pebs = intel_pmu_drain_arch_pebs;
 	x86_pmu.pebs_capable = ~0ULL;
 }
 
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3ae84c3b8e6d..59d3a050985e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -312,6 +312,12 @@
 #define PERF_CAP_PEBS_MASK	(PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
				 PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE)
 
+/* Arch PEBS */
+#define MSR_IA32_PEBS_BASE		0x000003f4
+#define MSR_IA32_PEBS_INDEX		0x000003f5
+#define ARCH_PEBS_OFFSET_MASK		0x7fffff
+#define ARCH_PEBS_INDEX_WR_SHIFT	4
+
 #define MSR_IA32_RTIT_CTL		0x00000570
 #define RTIT_CTL_TRACEEN		BIT(0)
 #define RTIT_CTL_CYCLEACC		BIT(1)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 00ffb9933aba..d0a3a13b8dae 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -412,6 +412,8 @@ static inline bool is_topdown_idx(int idx)
 #define GLOBAL_STATUS_LBRS_FROZEN	BIT_ULL(GLOBAL_STATUS_LBRS_FROZEN_BIT)
 #define GLOBAL_STATUS_TRACE_TOPAPMI_BIT	55
 #define GLOBAL_STATUS_TRACE_TOPAPMI	BIT_ULL(GLOBAL_STATUS_TRACE_TOPAPMI_BIT)
+#define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT	54
+#define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD	BIT_ULL(GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT)
 #define GLOBAL_STATUS_PERF_METRICS_OVF_BIT	48
 
 #define GLOBAL_CTRL_EN_PERF_METRICS	48
@@ -473,6 +475,104 @@ struct pebs_xmm {
 	u64 xmm[16*2];	/* two entries for each register */
 };
 
+/*
+ * Arch PEBS
+ */
+union arch_pebs_index {
+	struct {
+		u64 rsvd:4,
+		    wr:23,
+		    rsvd2:4,
+		    full:1,
+		    en:1,
+		    rsvd3:3,
+		    thresh:23,
+		    rsvd4:5;
+	} split;
+	u64 full;
+};
+
+struct arch_pebs_header {
+	union {
+		u64 format;
+		struct {
+			u64 size:16,	/* Record size */
+			    rsvd:14,
+			    mode:1,	/* 64BIT_MODE */
+			    cont:1,
+			    rsvd2:3,
+			    cntr:5,
+			    lbr:2,
+			    rsvd3:7,
+			    xmm:1,
+			    ymmh:1,
+			    rsvd4:2,
+			    opmask:1,
+			    zmmh:1,
+			    h16zmm:1,
+			    rsvd5:5,
+			    gpr:1,
+			    aux:1,
+			    basic:1;
+		};
+	};
+	u64 rsvd6;
+};
+
+struct arch_pebs_basic {
+	u64 ip;
+	u64 applicable_counters;
+	u64 tsc;
+	u64 retire	:16,	/* Retire Latency */
+	    valid	:1,
+	    rsvd	:47;
+	u64 rsvd2;
+	u64 rsvd3;
+};
+
+struct arch_pebs_aux {
+	u64 address;
+	u64 rsvd;
+	u64 rsvd2;
+	u64 rsvd3;
+	u64 rsvd4;
+	u64 aux;
+	u64 instr_latency	:16,
+	    pad2		:16,
+	    cache_latency	:16,
+	    pad3		:16;
+	u64 tsx_tuning;
+};
+
+struct arch_pebs_gprs {
+	u64 flags, ip, ax, cx, dx, bx, sp, bp, si, di;
+	u64 r8, r9, r10, r11, r12, r13, r14, r15, ssp;
+	u64 rsvd;
+};
+
+struct arch_pebs_xer_header {
+	u64 xstate;
+	u64 rsvd;
+};
+
+struct arch_pebs_xmm {
+	u64 xmm[16*2];	/* two entries for each register */
+};
+
+#define ARCH_PEBS_LBR_NAN	0x0
+#define ARCH_PEBS_LBR_NUM_8	0x1
+#define ARCH_PEBS_LBR_NUM_16	0x2
+#define ARCH_PEBS_LBR_NUM_VAR	0x3
+#define ARCH_PEBS_BASE_LBR_ENTRIES	8
+struct arch_pebs_lbr_header {
+	u64 rsvd;
+	u64 ctl;
+	u64 depth;
+	u64 ler_from;
+	u64 ler_to;
+	u64 ler_info;
+};
+
 /*
  * AMD Extended Performance Monitoring and Debug cpuid feature detection
  */
-- 
2.40.1

From nobody Wed Feb 11 05:49:46 2026
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi
Subject: [PATCH 09/20] perf/x86/intel: Factor out common functions to process PEBS groups
Date: Thu, 23 Jan 2025 14:07:10 +0000
Message-Id: <20250123140721.2496639-10-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>
Adaptive PEBS and arch-PEBS share much of the code that processes PEBS
groups, such as the basic, GPR and meminfo groups. Factor this shared
code out into common helper functions to avoid duplication.

Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/ds.c | 239 ++++++++++++++++++-------------------
 1 file changed, 119 insertions(+), 120 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 680637d63679..dce2b6ee8bd1 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2061,6 +2061,91 @@ static inline void __setup_pebs_counter_group(struct cpu_hw_events *cpuc,
 
 #define PEBS_LATENCY_MASK	0xffff
 
+static inline void __setup_perf_sample_data(struct perf_event *event,
+					    struct pt_regs *iregs,
+					    struct perf_sample_data *data)
+{
+	perf_sample_data_init(data, 0, event->hw.last_period);
+	data->period = event->hw.last_period;
+
+	/*
+	 * We must however always use iregs for the unwinder to stay sane; the
+	 * record BP,SP,IP can point into thin air when the record is from a
+	 * previous PMI context or an (I)RET happened between the record and
+	 * PMI.
+	 */
+	perf_sample_save_callchain(data, event, iregs);
+}
+
+static inline void __setup_pebs_basic_group(struct perf_event *event,
+					    struct pt_regs *regs,
+					    struct perf_sample_data *data,
+					    u64 sample_type, u64 ip,
+					    u64 tsc, u16 retire)
+{
+	/* The ip in basic is EventingIP */
+	set_linear_ip(regs, ip);
+	regs->flags = PERF_EFLAGS_EXACT;
+	setup_pebs_time(event, data, tsc);
+
+	if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT)
+		data->weight.var3_w = retire;
+}
+
+static inline void __setup_pebs_gpr_group(struct perf_event *event,
+					  struct pt_regs *regs,
+					  struct pebs_gprs *gprs,
+					  u64 sample_type)
+{
+	if (event->attr.precise_ip < 2) {
+		set_linear_ip(regs, gprs->ip);
+		regs->flags &= ~PERF_EFLAGS_EXACT;
+	}
+
+	if (sample_type & PERF_SAMPLE_REGS_INTR)
+		adaptive_pebs_save_regs(regs, gprs);
+}
+
+static inline void __setup_pebs_meminfo_group(struct perf_event *event,
+					      struct perf_sample_data *data,
+					      u64 sample_type, u64 latency,
+					      u16 instr_latency, u64 address,
+					      u64 aux, u64 tsx_tuning, u64 ax)
+{
+	if (sample_type & PERF_SAMPLE_WEIGHT_TYPE) {
+		u64 tsx_latency = intel_get_tsx_weight(tsx_tuning);
+
+		data->weight.var2_w = instr_latency;
+
+		/*
+		 * Although meminfo::latency is defined as a u64,
+		 * only the lower 32 bits include the valid data
+		 * in practice on Ice Lake and earlier platforms.
+		 */
+		if (sample_type & PERF_SAMPLE_WEIGHT)
+			data->weight.full = latency ?: tsx_latency;
+		else
+			data->weight.var1_dw = (u32)latency ?: tsx_latency;
+
+		data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
+	}
+
+	if (sample_type & PERF_SAMPLE_DATA_SRC) {
+		data->data_src.val = get_data_src(event, aux);
+		data->sample_flags |= PERF_SAMPLE_DATA_SRC;
+	}
+
+	if (sample_type & PERF_SAMPLE_ADDR_TYPE) {
+		data->addr = address;
+		data->sample_flags |= PERF_SAMPLE_ADDR;
+	}
+
+	if (sample_type & PERF_SAMPLE_TRANSACTION) {
+		data->txn = intel_get_tsx_transaction(tsx_tuning, ax);
+		data->sample_flags |= PERF_SAMPLE_TRANSACTION;
+	}
+}
+
 /*
  * With adaptive PEBS the layout depends on what fields are configured.
  */
@@ -2070,12 +2155,14 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
 					    struct pt_regs *regs)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	u64 sample_type = event->attr.sample_type;
 	struct pebs_basic *basic = __pebs;
 	void *next_record = basic + 1;
-	u64 sample_type, format_group;
 	struct pebs_meminfo *meminfo = NULL;
 	struct pebs_gprs *gprs = NULL;
 	struct x86_perf_regs *perf_regs;
+	u64 format_group;
+	u16 retire;
 
 	if (basic == NULL)
 		return;
@@ -2083,32 +2170,17 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
 	perf_regs = container_of(regs, struct x86_perf_regs, regs);
 	perf_regs->xmm_regs = NULL;
 
-	sample_type = event->attr.sample_type;
 	format_group = basic->format_group;
-	perf_sample_data_init(data, 0, event->hw.last_period);
-	data->period = event->hw.last_period;
 
-	setup_pebs_time(event, data, basic->tsc);
-
-	/*
-	 * We must however always use iregs for the unwinder to stay sane; the
-	 * record BP,SP,IP can point into thin air when the record is from a
-	 * previous PMI context or an (I)RET happened between the record and
-	 * PMI.
-	 */
-	perf_sample_save_callchain(data, event, iregs);
+	__setup_perf_sample_data(event, iregs, data);
 
 	*regs = *iregs;
-	/* The ip in basic is EventingIP */
-	set_linear_ip(regs, basic->ip);
-	regs->flags = PERF_EFLAGS_EXACT;
 
-	if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT) {
-		if (x86_pmu.flags & PMU_FL_RETIRE_LATENCY)
-			data->weight.var3_w = basic->retire_latency;
-		else
-			data->weight.var3_w = 0;
-	}
+	/* basic group */
+	retire = x86_pmu.flags & PMU_FL_RETIRE_LATENCY ?
+		 basic->retire_latency : 0;
+	__setup_pebs_basic_group(event, regs, data, sample_type,
+				 basic->ip, basic->tsc, retire);
 
 	/*
 	 * The record for MEMINFO is in front of GP
@@ -2124,54 +2196,20 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
 		gprs = next_record;
 		next_record = gprs + 1;
 
-		if (event->attr.precise_ip < 2) {
-			set_linear_ip(regs, gprs->ip);
-			regs->flags &= ~PERF_EFLAGS_EXACT;
-		}
-
-		if (sample_type & PERF_SAMPLE_REGS_INTR)
-			adaptive_pebs_save_regs(regs, gprs);
+		__setup_pebs_gpr_group(event, regs, gprs, sample_type);
 	}
 
 	if (format_group & PEBS_DATACFG_MEMINFO) {
-		if (sample_type & PERF_SAMPLE_WEIGHT_TYPE) {
-			u64 latency = x86_pmu.flags & PMU_FL_INSTR_LATENCY ?
-				      meminfo->cache_latency : meminfo->mem_latency;
-
-			if (x86_pmu.flags & PMU_FL_INSTR_LATENCY)
-				data->weight.var2_w = meminfo->instr_latency;
-
-			/*
-			 * Although meminfo::latency is defined as a u64,
-			 * only the lower 32 bits include the valid data
-			 * in practice on Ice Lake and earlier platforms.
-			 */
-			if (sample_type & PERF_SAMPLE_WEIGHT) {
-				data->weight.full = latency ?:
-					intel_get_tsx_weight(meminfo->tsx_tuning);
-			} else {
-				data->weight.var1_dw = (u32)latency ?:
-					intel_get_tsx_weight(meminfo->tsx_tuning);
-			}
-
-			data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
-		}
-
-		if (sample_type & PERF_SAMPLE_DATA_SRC) {
-			data->data_src.val = get_data_src(event, meminfo->aux);
-			data->sample_flags |= PERF_SAMPLE_DATA_SRC;
-		}
+		u64 latency = x86_pmu.flags & PMU_FL_INSTR_LATENCY ?
+			      meminfo->cache_latency : meminfo->mem_latency;
+		u64 instr_latency = x86_pmu.flags & PMU_FL_INSTR_LATENCY ?
+				    meminfo->instr_latency : 0;
+		u64 ax = gprs ? gprs->ax : 0;
 
-		if (sample_type & PERF_SAMPLE_ADDR_TYPE) {
-			data->addr = meminfo->address;
-			data->sample_flags |= PERF_SAMPLE_ADDR;
-		}
-
-		if (sample_type & PERF_SAMPLE_TRANSACTION) {
-			data->txn = intel_get_tsx_transaction(meminfo->tsx_tuning,
-							      gprs ? gprs->ax : 0);
-			data->sample_flags |= PERF_SAMPLE_TRANSACTION;
-		}
+		__setup_pebs_meminfo_group(event, data, sample_type, latency,
+					   instr_latency, meminfo->address,
+					   meminfo->aux, meminfo->tsx_tuning,
+					   ax);
 	}
 
 	if (format_group & PEBS_DATACFG_XMMS) {
@@ -2234,13 +2272,13 @@ static void setup_arch_pebs_sample_data(struct perf_event *event,
 					struct pt_regs *regs)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	u64 sample_type = event->attr.sample_type;
 	struct arch_pebs_header *header = NULL;
 	struct arch_pebs_aux *meminfo = NULL;
 	struct arch_pebs_gprs *gprs = NULL;
 	struct x86_perf_regs *perf_regs;
 	void *next_record;
 	void *at = __pebs;
-	u64 sample_type;
 
 	if (at == NULL)
 		return;
@@ -2248,18 +2286,7 @@ static void setup_arch_pebs_sample_data(struct perf_event *event,
 	perf_regs = container_of(regs, struct x86_perf_regs, regs);
 	perf_regs->xmm_regs = NULL;
 
-	sample_type = event->attr.sample_type;
-	perf_sample_data_init(data, 0, event->hw.last_period);
-	data->period = event->hw.last_period;
-
-	/*
-	 * We must however always use iregs for the unwinder to stay sane; the
-	 * record BP,SP,IP can point into thin air when the record is from a
-	 * previous PMI context or an (I)RET happened between the record and
-	 * PMI.
-	 */
-	if (sample_type & PERF_SAMPLE_CALLCHAIN)
-		perf_sample_save_callchain(data, event, iregs);
+	__setup_perf_sample_data(event, iregs, data);
 
 	*regs = *iregs;
 
@@ -2268,16 +2295,14 @@ static void setup_arch_pebs_sample_data(struct perf_event *event,
 	next_record = at + sizeof(struct arch_pebs_header);
 	if (header->basic) {
 		struct arch_pebs_basic *basic = next_record;
+		u16 retire = 0;
 
-		/* The ip in basic is EventingIP */
-		set_linear_ip(regs, basic->ip);
-		regs->flags = PERF_EFLAGS_EXACT;
-		setup_pebs_time(event, data, basic->tsc);
+		next_record = basic + 1;
 
 		if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT)
-			data->weight.var3_w = basic->valid ? basic->retire : 0;
-
-		next_record = basic + 1;
+			retire = basic->valid ? basic->retire : 0;
+		__setup_pebs_basic_group(event, regs, data, sample_type,
+					 basic->ip, basic->tsc, retire);
 	}
 
 	/*
@@ -2294,44 +2319,18 @@ static void setup_arch_pebs_sample_data(struct perf_event *event,
 		gprs = next_record;
 		next_record = gprs + 1;
 
-		if (event->attr.precise_ip < 2) {
-			set_linear_ip(regs, gprs->ip);
-			regs->flags &= ~PERF_EFLAGS_EXACT;
-		}
-
-		if (sample_type & PERF_SAMPLE_REGS_INTR)
-			adaptive_pebs_save_regs(regs, (struct pebs_gprs *)gprs);
+		__setup_pebs_gpr_group(event, regs, (struct pebs_gprs *)gprs,
+				       sample_type);
 	}
 
 	if (header->aux) {
-		if (sample_type & PERF_SAMPLE_WEIGHT_TYPE) {
-			u16 latency = meminfo->cache_latency;
-			u64 tsx_latency = intel_get_tsx_weight(meminfo->tsx_tuning);
+		u64 ax = gprs ? gprs->ax : 0;
 
-			data->weight.var2_w = meminfo->instr_latency;
-
-			if (sample_type & PERF_SAMPLE_WEIGHT)
-				data->weight.full = latency ?: tsx_latency;
-			else
-				data->weight.var1_dw = latency ?: (u32)tsx_latency;
-			data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
-		}
-
-		if (sample_type & PERF_SAMPLE_DATA_SRC) {
-			data->data_src.val = get_data_src(event, meminfo->aux);
-			data->sample_flags |= PERF_SAMPLE_DATA_SRC;
-		}
-
-		if (sample_type & PERF_SAMPLE_ADDR_TYPE) {
-			data->addr = meminfo->address;
-			data->sample_flags |= PERF_SAMPLE_ADDR;
-		}
-
-		if (sample_type & PERF_SAMPLE_TRANSACTION) {
-			data->txn = intel_get_tsx_transaction(meminfo->tsx_tuning,
-							      gprs ? gprs->ax : 0);
-			data->sample_flags |= PERF_SAMPLE_TRANSACTION;
-		}
+		__setup_pebs_meminfo_group(event, data, sample_type,
+					   meminfo->cache_latency,
+					   meminfo->instr_latency,
+					   meminfo->address, meminfo->aux,
+					   meminfo->tsx_tuning, ax);
 	}
 
 	if (header->xmm) {
-- 
2.40.1

From nobody Wed Feb 11 05:49:46 2026
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi
Subject: [PATCH 10/20] perf/x86/intel: Allocate arch-PEBS buffer and initialize PEBS_BASE MSR
Date: Thu, 23 Jan 2025 14:07:11 +0000
Message-Id: <20250123140721.2496639-11-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>

Arch-PEBS introduces a new MSR, IA32_PEBS_BASE, which holds the physical
address of the arch-PEBS buffer. Allocate the arch-PEBS buffer and then
initialize the IA32_PEBS_BASE MSR with the buffer's physical address.
Co-developed-by: Kan Liang
Signed-off-by: Kan Liang
Signed-off-by: Dapeng Mi
---
 arch/x86/events/core.c          |   4 +-
 arch/x86/events/intel/core.c    |   4 +-
 arch/x86/events/intel/ds.c      | 112 ++++++++++++++++++++------------
 arch/x86/events/perf_event.h    |  16 ++---
 arch/x86/include/asm/intel_ds.h |   3 +-
 5 files changed, 84 insertions(+), 55 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index c36cc606bd19..f40b03adb5c7 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -411,7 +411,7 @@ int x86_reserve_hardware(void)
 		if (!reserve_pmc_hardware()) {
 			err = -EBUSY;
 		} else {
-			reserve_ds_buffers();
+			reserve_bts_pebs_buffers();
 			reserve_lbr_buffers();
 		}
 	}
@@ -427,7 +427,7 @@ void x86_release_hardware(void)
 {
 	if (atomic_dec_and_mutex_lock(&pmc_refcount, &pmc_reserve_mutex)) {
 		release_pmc_hardware();
-		release_ds_buffers();
+		release_bts_pebs_buffers();
 		release_lbr_buffers();
 		mutex_unlock(&pmc_reserve_mutex);
 	}
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index d73d899d6b02..7775e1e1c1e9 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5122,7 +5122,7 @@ static void intel_pmu_cpu_starting(int cpu)
 	if (is_hybrid() && !init_hybrid_pmu(cpu))
 		return;
 
-	init_debug_store_on_cpu(cpu);
+	init_pebs_buf_on_cpu(cpu);
 	/*
 	 * Deal with CPUs that don't clear their LBRs on power-up.
 	 */
@@ -5216,7 +5216,7 @@ static void free_excl_cntrs(struct cpu_hw_events *cpuc)
 
 static void intel_pmu_cpu_dying(int cpu)
 {
-	fini_debug_store_on_cpu(cpu);
+	fini_pebs_buf_on_cpu(cpu);
 }
 
 void intel_cpuc_finish(struct cpu_hw_events *cpuc)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index dce2b6ee8bd1..2f2c6b7c801b 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -545,26 +545,6 @@ struct pebs_record_skl {
 	u64 tsc;
 };
 
-void init_debug_store_on_cpu(int cpu)
-{
-	struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
-
-	if (!ds)
-		return;
-
-	wrmsr_on_cpu(cpu, MSR_IA32_DS_AREA,
-		     (u32)((u64)(unsigned long)ds),
-		     (u32)((u64)(unsigned long)ds >> 32));
-}
-
-void fini_debug_store_on_cpu(int cpu)
-{
-	if (!per_cpu(cpu_hw_events, cpu).ds)
-		return;
-
-	wrmsr_on_cpu(cpu, MSR_IA32_DS_AREA, 0, 0);
-}
-
 static DEFINE_PER_CPU(void *, insn_buffer);
 
 static void ds_update_cea(void *cea, void *addr, size_t size, pgprot_t prot)
@@ -624,13 +604,18 @@ static int alloc_pebs_buffer(int cpu)
 	int max, node = cpu_to_node(cpu);
 	void *buffer, *insn_buff, *cea;
 
-	if (!x86_pmu.ds_pebs)
+	if (!intel_pmu_has_pebs())
 		return 0;
 
-	buffer = dsalloc_pages(bsiz, GFP_KERNEL, cpu);
+	buffer = dsalloc_pages(bsiz, preemptible() ? GFP_KERNEL : GFP_ATOMIC, cpu);
 	if (unlikely(!buffer))
 		return -ENOMEM;
 
+	if (x86_pmu.arch_pebs) {
+		hwev->pebs_vaddr = buffer;
+		return 0;
+	}
+
 	/*
 	 * HSW+ already provides us the eventing ip; no need to allocate this
 	 * buffer then.
@@ -643,7 +628,7 @@ static int alloc_pebs_buffer(int cpu)
 		}
 		per_cpu(insn_buffer, cpu) = insn_buff;
 	}
-	hwev->ds_pebs_vaddr = buffer;
+	hwev->pebs_vaddr = buffer;
 	/* Update the cpu entry area mapping */
 	cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.pebs_buffer;
 	ds->pebs_buffer_base = (unsigned long) cea;
@@ -659,17 +644,20 @@ static void release_pebs_buffer(int cpu)
 	struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu);
 	void *cea;
 
-	if (!x86_pmu.ds_pebs)
+	if (!intel_pmu_has_pebs())
 		return;
 
-	kfree(per_cpu(insn_buffer, cpu));
-	per_cpu(insn_buffer, cpu) = NULL;
+	if (x86_pmu.ds_pebs) {
+		kfree(per_cpu(insn_buffer, cpu));
+		per_cpu(insn_buffer, cpu) = NULL;
 
-	/* Clear the fixmap */
-	cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.pebs_buffer;
-	ds_clear_cea(cea, x86_pmu.pebs_buffer_size);
-	dsfree_pages(hwev->ds_pebs_vaddr, x86_pmu.pebs_buffer_size);
-	hwev->ds_pebs_vaddr = NULL;
+		/* Clear the fixmap */
+		cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.pebs_buffer;
+		ds_clear_cea(cea, x86_pmu.pebs_buffer_size);
+	}
+
+	dsfree_pages(hwev->pebs_vaddr, x86_pmu.pebs_buffer_size);
+	hwev->pebs_vaddr = NULL;
 }
 
 static int alloc_bts_buffer(int cpu)
@@ -730,11 +718,11 @@ static void release_ds_buffer(int cpu)
 	per_cpu(cpu_hw_events, cpu).ds = NULL;
 }
 
-void release_ds_buffers(void)
+void release_bts_pebs_buffers(void)
 {
 	int cpu;
 
-	if (!x86_pmu.bts && !x86_pmu.ds_pebs)
+	if (!x86_pmu.bts && !intel_pmu_has_pebs())
 		return;
 
 	for_each_possible_cpu(cpu)
@@ -746,7 +734,7 @@ void release_ds_buffers(void)
 	 * observe cpu_hw_events.ds and not program the DS_AREA when
 	 * they come up.
 	 */
-		fini_debug_store_on_cpu(cpu);
+		fini_pebs_buf_on_cpu(cpu);
 	}
 
 	for_each_possible_cpu(cpu) {
@@ -755,7 +743,7 @@ void release_ds_buffers(void)
 	}
 }
 
-void reserve_ds_buffers(void)
+void reserve_bts_pebs_buffers(void)
 {
 	int bts_err = 0, pebs_err = 0;
 	int cpu;
@@ -763,19 +751,20 @@ void reserve_ds_buffers(void)
 	x86_pmu.bts_active = 0;
 	x86_pmu.pebs_active = 0;
 
-	if (!x86_pmu.bts && !x86_pmu.ds_pebs)
+	if (!x86_pmu.bts && !intel_pmu_has_pebs())
 		return;
 
 	if (!x86_pmu.bts)
 		bts_err = 1;
 
-	if (!x86_pmu.ds_pebs)
+	if (!intel_pmu_has_pebs())
 		pebs_err = 1;
 
 	for_each_possible_cpu(cpu) {
 		if (alloc_ds_buffer(cpu)) {
 			bts_err = 1;
-			pebs_err = 1;
+			if (x86_pmu.ds_pebs)
+				pebs_err = 1;
 		}
 
 		if (!bts_err && alloc_bts_buffer(cpu))
@@ -805,7 +794,7 @@ void reserve_ds_buffers(void)
 	if (x86_pmu.bts && !bts_err)
 		x86_pmu.bts_active = 1;
 
-	if (x86_pmu.ds_pebs && !pebs_err)
+	if (intel_pmu_has_pebs() && !pebs_err)
 		x86_pmu.pebs_active = 1;
 
 	for_each_possible_cpu(cpu) {
 		/*
 		 * Ignores wrmsr_on_cpu() errors for offline CPUs they
 		 * will get this call through intel_pmu_cpu_starting().
 		 */
-		init_debug_store_on_cpu(cpu);
+		init_pebs_buf_on_cpu(cpu);
 	}
 }
 
+void init_pebs_buf_on_cpu(int cpu)
+{
+	struct cpu_hw_events *cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
+
+	if (x86_pmu.arch_pebs) {
+		u64 arch_pebs_base;
+
+		if (!cpuc->pebs_vaddr)
+			return;
+
+		/*
+		 * 4KB-aligned pointer of the output buffer
+		 * (__alloc_pages_node() return page aligned address)
+		 * Buffer Size = 4KB * 2^SIZE
+		 * contiguous physical buffer (__alloc_pages_node() with order)
+		 */
+		arch_pebs_base = virt_to_phys(cpuc->pebs_vaddr) | PEBS_BUFFER_SHIFT;
+
+		wrmsr_on_cpu(cpu, MSR_IA32_PEBS_BASE,
+			     (u32)arch_pebs_base,
+			     (u32)(arch_pebs_base >> 32));
+	} else if (cpuc->ds) {
+		/* legacy PEBS */
+		wrmsr_on_cpu(cpu, MSR_IA32_DS_AREA,
+			     (u32)((u64)(unsigned long)cpuc->ds),
+			     (u32)((u64)(unsigned long)cpuc->ds >> 32));
+	}
+}
+
+void fini_pebs_buf_on_cpu(int cpu)
+{
+	struct cpu_hw_events *cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
+
+	if (x86_pmu.arch_pebs)
+		wrmsr_on_cpu(cpu, MSR_IA32_PEBS_BASE, 0, 0);
+	else if (cpuc->ds)
+		wrmsr_on_cpu(cpu, MSR_IA32_DS_AREA, 0, 0);
+}
+
 /*
  * BTS
  */
@@ -2850,8 +2878,8 @@ static void intel_pmu_drain_arch_pebs(struct pt_regs *iregs,
 		return;
 	}
 
-	base = cpuc->ds_pebs_vaddr;
-	top = (void *)((u64)cpuc->ds_pebs_vaddr +
+	base = cpuc->pebs_vaddr;
+	top = (void *)((u64)cpuc->pebs_vaddr +
 		       (index.split.wr << ARCH_PEBS_INDEX_WR_SHIFT));
 
 	mask = hybrid(cpuc->pmu, arch_pebs_cap).counters & cpuc->pebs_enabled;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 85cb36ad5520..a3c4374fe7f3 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -266,11 +266,11 @@ struct cpu_hw_events {
 	int			is_fake;
 
 	/*
-	 * Intel DebugStore bits
+	 * Intel DebugStore/PEBS bits
 	 */
 	struct debug_store	*ds;
-	void			*ds_pebs_vaddr;
 	void			*ds_bts_vaddr;
+	void			*pebs_vaddr;
 	u64			pebs_enabled;
 	int			n_pebs;
 	int			n_large_pebs;
@@ -1594,13 +1594,13 @@ extern void intel_cpuc_finish(struct cpu_hw_events *cpuc);
 
 int intel_pmu_init(void);
 
-void init_debug_store_on_cpu(int cpu);
+void init_pebs_buf_on_cpu(int cpu);
 
-void fini_debug_store_on_cpu(int cpu);
+void fini_pebs_buf_on_cpu(int cpu);
 
-void release_ds_buffers(void);
+void release_bts_pebs_buffers(void);
 
-void reserve_ds_buffers(void);
+void reserve_bts_pebs_buffers(void);
 
 void release_lbr_buffers(void);
 
@@ -1787,11 +1787,11 @@ static inline bool intel_pmu_has_pebs(void)
 
 #else /* CONFIG_CPU_SUP_INTEL */
 
-static inline void reserve_ds_buffers(void)
+static inline void reserve_bts_pebs_buffers(void)
 {
 }
 
-static inline void release_ds_buffers(void)
+static inline void release_bts_pebs_buffers(void)
 {
 }
 
diff --git a/arch/x86/include/asm/intel_ds.h b/arch/x86/include/asm/intel_ds.h
index 5dbeac48a5b9..023c2883f9f3 100644
--- a/arch/x86/include/asm/intel_ds.h
+++ b/arch/x86/include/asm/intel_ds.h
@@ -4,7 +4,8 @@
 #include
 
 #define BTS_BUFFER_SIZE		(PAGE_SIZE << 4)
-#define PEBS_BUFFER_SIZE	(PAGE_SIZE << 4)
+#define PEBS_BUFFER_SHIFT	4
+#define PEBS_BUFFER_SIZE	(PAGE_SIZE << PEBS_BUFFER_SHIFT)
 
 /* The maximal number of PEBS events: */
 #define MAX_PEBS_EVENTS_FMT4	8
-- 
2.40.1

From nobody Wed Feb 11 05:49:46 2026
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi
Subject: [PATCH 11/20] perf/x86/intel: Setup PEBS constraints based on counter & pdist map
Date: Thu, 23 Jan 2025 14:07:12 +0000
Message-Id: <20250123140721.2496639-12-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>

arch-PEBS provides CPUID leaves to enumerate which counters support PEBS
sampling and which support precise-distribution PEBS sampling. PEBS
constraints can therefore be configured dynamically from these counter
and precise-distribution bitmaps instead of being defined statically.
Signed-off-by: Dapeng Mi --- arch/x86/events/intel/core.c | 20 ++++++++++++++++++++ arch/x86/events/intel/ds.c | 1 + 2 files changed, 21 insertions(+) diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index 7775e1e1c1e9..0f1be36113fa 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -3728,6 +3728,7 @@ intel_get_event_constraints(struct cpu_hw_events *cpu= c, int idx, struct perf_event *event) { struct event_constraint *c1, *c2; + struct pmu *pmu =3D event->pmu; =20 c1 =3D cpuc->event_constraint[idx]; =20 @@ -3754,6 +3755,25 @@ intel_get_event_constraints(struct cpu_hw_events *cp= uc, int idx, c2->weight =3D hweight64(c2->idxmsk64); } =20 + if (x86_pmu.arch_pebs && event->attr.precise_ip) { + u64 pebs_cntrs_mask; + u64 cntrs_mask; + + if (event->attr.precise_ip >=3D 3) + pebs_cntrs_mask =3D hybrid(pmu, arch_pebs_cap).pdists; + else + pebs_cntrs_mask =3D hybrid(pmu, arch_pebs_cap).counters; + + cntrs_mask =3D hybrid(pmu, fixed_cntr_mask64) << INTEL_PMC_IDX_FIXED | + hybrid(pmu, cntr_mask64); + + if (pebs_cntrs_mask !=3D cntrs_mask) { + c2 =3D dyn_constraint(cpuc, c2, idx); + c2->idxmsk64 &=3D pebs_cntrs_mask; + c2->weight =3D hweight64(c2->idxmsk64); + } + } + return c2; } =20 diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 2f2c6b7c801b..a573ce0e576a 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -2941,6 +2941,7 @@ static void __init intel_arch_pebs_init(void) x86_pmu.pebs_buffer_size =3D PEBS_BUFFER_SIZE; x86_pmu.drain_pebs =3D intel_pmu_drain_arch_pebs; x86_pmu.pebs_capable =3D ~0ULL; + x86_pmu.flags |=3D PMU_FL_PEBS_ALL; } =20 /* --=20 2.40.1 From nobody Wed Feb 11 05:49:46 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0D2441D54D6; Thu, 23 Jan 2025 06:20:52 +0000 (UTC) 
From: Dapeng Mi <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
	Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi
Subject: [PATCH 12/20] perf/x86/intel: Setup PEBS data configuration and enable legacy groups
Date: Thu, 23 Jan 2025 14:07:13 +0000
Message-Id: <20250123140721.2496639-13-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>
References: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>

Unlike legacy PEBS, arch-PEBS provides per-counter PEBS data
configuration by programming the IA32_PMC_GPx/FXx_CFG_C MSRs. Obtain
the PEBS data configuration from the event attributes, write it to the
corresponding IA32_PMC_GPx/FXx_CFG_C MSR, and enable the corresponding
PEBS groups.
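The CFG_C value construction can be sketched in isolation: each requested PEBS data group is translated into its ARCH_PEBS_* enable bit, gated by the CPU's capability mask, plus the global enable bit and the 32-bit reload value (`-sample_period`). Bit positions are taken from the patch's msr-index.h additions; the helper itself is a hypothetical stand-in for `intel_pmu_enable_event_ext()`:

```c
#include <assert.h>
#include <stdint.h>

#define PEBS_DATACFG_MEMINFO	(1ULL << 0)
#define PEBS_DATACFG_GP		(1ULL << 1)
#define PEBS_DATACFG_XMMS	(1ULL << 2)
#define PEBS_DATACFG_LBRS	(1ULL << 3)

#define ARCH_PEBS_RELOAD	0xffffffffULL
#define ARCH_PEBS_LBR		(0x3ULL << 40)
#define ARCH_PEBS_VECR_XMM	(1ULL << 49)
#define ARCH_PEBS_GPR		(1ULL << 61)
#define ARCH_PEBS_AUX		(1ULL << 62)
#define ARCH_PEBS_EN		(1ULL << 63)

/*
 * Sketch only: build the per-counter CFG_C value from the adaptive
 * pebs_data_cfg request, masked by the hardware capability bits.
 */
static uint64_t build_cfg_c(uint64_t pebs_data_cfg, uint64_t caps,
			    uint64_t sample_period)
{
	uint64_t ext = ARCH_PEBS_EN;

	/* reload value: two's complement of the period, 32 bits */
	ext |= (0ULL - sample_period) & ARCH_PEBS_RELOAD;

	if (pebs_data_cfg & PEBS_DATACFG_MEMINFO)
		ext |= ARCH_PEBS_AUX & caps;
	if (pebs_data_cfg & PEBS_DATACFG_GP)
		ext |= ARCH_PEBS_GPR & caps;
	if (pebs_data_cfg & PEBS_DATACFG_XMMS)
		ext |= ARCH_PEBS_VECR_XMM & caps;
	if (pebs_data_cfg & PEBS_DATACFG_LBRS)
		ext |= ARCH_PEBS_LBR & caps;

	return ext;
}
```

Gating every group bit with `& caps` is what lets the same code run on parts that implement only a subset of the record groups.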
Co-developed-by: Kan Liang
Signed-off-by: Kan Liang
Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/core.c     | 127 +++++++++++++++++++++++++++++++
 arch/x86/events/intel/ds.c       |  17 +++++
 arch/x86/events/perf_event.h     |  15 ++++
 arch/x86/include/asm/intel_ds.h  |   7 ++
 arch/x86/include/asm/msr-index.h |  10 +++
 5 files changed, 176 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 0f1be36113fa..cb88ae60de8e 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2558,6 +2558,39 @@ static void intel_pmu_disable_fixed(struct perf_event *event)
 	cpuc->fixed_ctrl_val &= ~mask;
 }
 
+static inline void __intel_pmu_update_event_ext(int idx, u64 ext)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	u32 msr = idx < INTEL_PMC_IDX_FIXED ?
+		  x86_pmu_cfg_c_addr(idx, true) :
+		  x86_pmu_cfg_c_addr(idx - INTEL_PMC_IDX_FIXED, false);
+
+	cpuc->cfg_c_val[idx] = ext;
+	wrmsrl(msr, ext);
+}
+
+static void intel_pmu_disable_event_ext(struct perf_event *event)
+{
+	if (!x86_pmu.arch_pebs)
+		return;
+
+	/*
+	 * Only clear CFG_C MSR for PEBS counter group events,
+	 * it avoids the HW counter's value to be added into
+	 * other PEBS records incorrectly after PEBS counter
+	 * group events are disabled.
+	 *
+	 * For other events, it's unnecessary to clear CFG_C MSRs
+	 * since CFG_C doesn't take effect if counter is in
+	 * disabled state. That helps to reduce the WRMSR overhead
+	 * in context switches.
+	 */
+	if (!is_pebs_counter_event_group(event))
+		return;
+
+	__intel_pmu_update_event_ext(event->hw.idx, 0);
+}
+
 static void intel_pmu_disable_event(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
@@ -2566,9 +2599,12 @@ static void intel_pmu_disable_event(struct perf_event *event)
 	switch (idx) {
 	case 0 ... INTEL_PMC_IDX_FIXED - 1:
 		intel_clear_masks(event, idx);
+		intel_pmu_disable_event_ext(event);
 		x86_pmu_disable_event(event);
 		break;
 	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
+		intel_pmu_disable_event_ext(event);
+		fallthrough;
 	case INTEL_PMC_IDX_METRIC_BASE ... INTEL_PMC_IDX_METRIC_END:
 		intel_pmu_disable_fixed(event);
 		break;
@@ -2888,6 +2924,66 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
 	cpuc->fixed_ctrl_val |= bits;
 }
 
+static void intel_pmu_enable_event_ext(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+	union arch_pebs_index cached, index;
+	struct arch_pebs_cap cap;
+	u64 ext = 0;
+
+	if (!x86_pmu.arch_pebs)
+		return;
+
+	cap = hybrid(cpuc->pmu, arch_pebs_cap);
+
+	if (event->attr.precise_ip) {
+		u64 pebs_data_cfg = intel_get_arch_pebs_data_config(event);
+
+		ext |= ARCH_PEBS_EN;
+		ext |= (-hwc->sample_period) & ARCH_PEBS_RELOAD;
+
+		if (pebs_data_cfg && cap.caps) {
+			if (pebs_data_cfg & PEBS_DATACFG_MEMINFO)
+				ext |= ARCH_PEBS_AUX & cap.caps;
+
+			if (pebs_data_cfg & PEBS_DATACFG_GP)
+				ext |= ARCH_PEBS_GPR & cap.caps;
+
+			if (pebs_data_cfg & PEBS_DATACFG_XMMS)
+				ext |= ARCH_PEBS_VECR_XMM & cap.caps;
+
+			if (pebs_data_cfg & PEBS_DATACFG_LBRS)
+				ext |= ARCH_PEBS_LBR & cap.caps;
+		}
+
+		if (cpuc->n_pebs == cpuc->n_large_pebs)
+			index.split.thresh = ARCH_PEBS_THRESH_MUL;
+		else
+			index.split.thresh = ARCH_PEBS_THRESH_SINGLE;
+
+		rdmsrl(MSR_IA32_PEBS_INDEX, cached.full);
+		if (index.split.thresh != cached.split.thresh || !cached.split.en) {
+			if (cached.split.thresh == ARCH_PEBS_THRESH_MUL &&
+			    cached.split.wr > 0) {
+				/*
+				 * Large PEBS was enabled.
+				 * Drain PEBS buffer before applying the single PEBS.
+				 */
+				intel_pmu_drain_pebs_buffer();
+			} else {
+				index.split.wr = 0;
+				index.split.full = 0;
+				index.split.en = 1;
+				wrmsrl(MSR_IA32_PEBS_INDEX, index.full);
+			}
+		}
+	}
+
+	if (cpuc->cfg_c_val[hwc->idx] != ext)
+		__intel_pmu_update_event_ext(hwc->idx, ext);
+}
+
 static void intel_pmu_enable_event(struct perf_event *event)
 {
 	u64 enable_mask = ARCH_PERFMON_EVENTSEL_ENABLE;
@@ -2902,9 +2998,12 @@ static void intel_pmu_enable_event(struct perf_event *event)
 		if (branch_sample_counters(event))
 			enable_mask |= ARCH_PERFMON_EVENTSEL_BR_CNTR;
 		intel_set_masks(event, idx);
+		intel_pmu_enable_event_ext(event);
 		__x86_pmu_enable_event(hwc, enable_mask);
 		break;
 	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
+		intel_pmu_enable_event_ext(event);
+		fallthrough;
 	case INTEL_PMC_IDX_METRIC_BASE ... INTEL_PMC_IDX_METRIC_END:
 		intel_pmu_enable_fixed(event);
 		break;
@@ -4984,6 +5083,29 @@ static inline bool intel_pmu_broken_perf_cap(void)
 	return false;
 }
 
+static inline void __intel_update_pmu_caps(struct pmu *pmu)
+{
+	struct pmu *dest_pmu = pmu ? pmu : x86_get_pmu(smp_processor_id());
+
+	if (hybrid(pmu, arch_pebs_cap).caps & ARCH_PEBS_VECR_XMM)
+		dest_pmu->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
+}
+
+static inline void __intel_update_large_pebs_flags(struct pmu *pmu)
+{
+	u64 caps = hybrid(pmu, arch_pebs_cap).caps;
+
+	x86_pmu.large_pebs_flags |= PERF_SAMPLE_TIME;
+	if (caps & ARCH_PEBS_LBR)
+		x86_pmu.large_pebs_flags |= PERF_SAMPLE_BRANCH_STACK;
+
+	if (!(caps & ARCH_PEBS_AUX))
+		x86_pmu.large_pebs_flags &= ~PERF_SAMPLE_DATA_SRC;
+	if (!(caps & ARCH_PEBS_GPR))
+		x86_pmu.large_pebs_flags &=
+			~(PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER);
+}
+
 static void update_pmu_cap(struct pmu *pmu)
 {
 	unsigned int sub_bitmaps, eax, ebx, ecx, edx;
@@ -5012,6 +5134,9 @@ static void update_pmu_cap(struct pmu *pmu)
 			    &eax, &ebx, &ecx, &edx);
 		hybrid(pmu, arch_pebs_cap).counters = ((u64)ecx << 32) | eax;
 		hybrid(pmu, arch_pebs_cap).pdists = ((u64)edx << 32) | ebx;
+
+		__intel_update_pmu_caps(pmu);
+		__intel_update_large_pebs_flags(pmu);
 	} else {
 		WARN_ON(x86_pmu.arch_pebs == 1);
 		x86_pmu.arch_pebs = 0;
@@ -5178,6 +5303,8 @@ static void intel_pmu_cpu_starting(int cpu)
 		}
 	}
 
+	__intel_update_pmu_caps(cpuc->pmu);
+
 	if (!cpuc->shared_regs)
 		return;
 
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index a573ce0e576a..5d8c5c8d5e24 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1492,6 +1492,18 @@ pebs_update_state(bool needed_cb, struct cpu_hw_events *cpuc,
 	}
 }
 
+u64 intel_get_arch_pebs_data_config(struct perf_event *event)
+{
+	u64 pebs_data_cfg = 0;
+
+	if (WARN_ON(event->hw.idx < 0 || event->hw.idx >= X86_PMC_IDX_MAX))
+		return 0;
+
+	pebs_data_cfg |= pebs_update_adaptive_cfg(event);
+
+	return pebs_data_cfg;
+}
+
 void intel_pmu_pebs_add(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -2927,6 +2939,11 @@ static void intel_pmu_drain_arch_pebs(struct pt_regs *iregs,
 
 	index.split.wr = 0;
 	index.split.full = 0;
+	index.split.en = 1;
+	if (cpuc->n_pebs == cpuc->n_large_pebs)
+		index.split.thresh = ARCH_PEBS_THRESH_MUL;
+	else
+		index.split.thresh = ARCH_PEBS_THRESH_SINGLE;
 	wrmsrl(MSR_IA32_PEBS_INDEX, index.full);
 }
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index a3c4374fe7f3..3acb03a5c214 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -286,6 +286,9 @@ struct cpu_hw_events {
 	u64			fixed_ctrl_val;
 	u64			active_fixed_ctrl_val;
 
+	/* Cached CFG_C values */
+	u64			cfg_c_val[X86_PMC_IDX_MAX];
+
 	/*
 	 * Intel LBR bits
 	 */
@@ -1194,6 +1197,14 @@ static inline unsigned int x86_pmu_fixed_ctr_addr(int index)
 				  x86_pmu.addr_offset(index, false) : index);
 }
 
+static inline unsigned int x86_pmu_cfg_c_addr(int index, bool gp)
+{
+	u32 base = gp ? MSR_IA32_PMC_V6_GP0_CFG_C : MSR_IA32_PMC_V6_FX0_CFG_C;
+
+	return base + (x86_pmu.addr_offset ? x86_pmu.addr_offset(index, false) :
+					     index * MSR_IA32_PMC_V6_STEP);
+}
+
 static inline int x86_pmu_rdpmc_index(int index)
 {
 	return x86_pmu.rdpmc_index ? x86_pmu.rdpmc_index(index) : index;
@@ -1615,6 +1626,8 @@ void intel_pmu_disable_bts(void);
 
 int intel_pmu_drain_bts_buffer(void);
 
+void intel_pmu_drain_pebs_buffer(void);
+
 u64 grt_latency_data(struct perf_event *event, u64 status);
 
 u64 cmt_latency_data(struct perf_event *event, u64 status);
@@ -1748,6 +1761,8 @@ void intel_pmu_pebs_data_source_cmt(void);
 
 void intel_pmu_pebs_data_source_lnl(void);
 
+u64 intel_get_arch_pebs_data_config(struct perf_event *event);
+
 int intel_pmu_setup_lbr_filter(struct perf_event *event);
 
 void intel_pt_interrupt(void);
diff --git a/arch/x86/include/asm/intel_ds.h b/arch/x86/include/asm/intel_ds.h
index 023c2883f9f3..7bb80c993bef 100644
--- a/arch/x86/include/asm/intel_ds.h
+++ b/arch/x86/include/asm/intel_ds.h
@@ -7,6 +7,13 @@
 #define PEBS_BUFFER_SHIFT	4
 #define PEBS_BUFFER_SIZE	(PAGE_SIZE << PEBS_BUFFER_SHIFT)
 
+/*
+ * The largest PEBS record could consume a page, ensure
+ * a record at least can be written after triggering PMI.
+ */
+#define ARCH_PEBS_THRESH_MUL	((PEBS_BUFFER_SIZE - PAGE_SIZE) >> PEBS_BUFFER_SHIFT)
+#define ARCH_PEBS_THRESH_SINGLE	1
+
 /* The maximal number of PEBS events: */
 #define MAX_PEBS_EVENTS_FMT4	8
 #define MAX_PEBS_EVENTS		32
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 59d3a050985e..a3fad7e910eb 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -318,6 +318,14 @@
 #define ARCH_PEBS_OFFSET_MASK		0x7fffff
 #define ARCH_PEBS_INDEX_WR_SHIFT	4
 
+#define ARCH_PEBS_RELOAD	0xffffffff
+#define ARCH_PEBS_LBR_SHIFT	40
+#define ARCH_PEBS_LBR		(0x3ull << ARCH_PEBS_LBR_SHIFT)
+#define ARCH_PEBS_VECR_XMM	BIT_ULL(49)
+#define ARCH_PEBS_GPR		BIT_ULL(61)
+#define ARCH_PEBS_AUX		BIT_ULL(62)
+#define ARCH_PEBS_EN		BIT_ULL(63)
+
 #define MSR_IA32_RTIT_CTL	0x00000570
 #define RTIT_CTL_TRACEEN	BIT(0)
 #define RTIT_CTL_CYCLEACC	BIT(1)
@@ -597,7 +605,9 @@
 /* V6 PMON MSR range */
 #define MSR_IA32_PMC_V6_GP0_CTR		0x1900
 #define MSR_IA32_PMC_V6_GP0_CFG_A	0x1901
+#define MSR_IA32_PMC_V6_GP0_CFG_C	0x1903
 #define MSR_IA32_PMC_V6_FX0_CTR		0x1980
+#define MSR_IA32_PMC_V6_FX0_CFG_C	0x1983
 #define MSR_IA32_PMC_V6_STEP		4
 
 /* KeyID partitioning between MKTME and TDX */
-- 
2.40.1

From nobody Wed Feb 11 05:49:46 2026
X-CSE-MsgGUID: I4xZK48LSGeIxiRCy7Ek3A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,199,1725346800"; d="scan'208";a="112334656" Received: from emr.sh.intel.com ([10.112.229.56]) by orviesa003.jf.intel.com with ESMTP; 22 Jan 2025 22:20:52 -0800 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Ian Rogers , Adrian Hunter , Alexander Shishkin , Kan Liang , Andi Kleen , Eranian Stephane Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi , Dapeng Mi Subject: [PATCH 13/20] perf/x86/intel: Add SSP register support for arch-PEBS Date: Thu, 23 Jan 2025 14:07:14 +0000 Message-Id: <20250123140721.2496639-14-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com> References: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Arch-PEBS supports to capture SSP register in GPR group. This patch supports to read and output this register. SSP is for shadow stacks. Signed-off-by: Dapeng Mi --- arch/x86/events/core.c | 10 ++++++++++ arch/x86/events/intel/ds.c | 3 +++ arch/x86/include/asm/perf_event.h | 1 + arch/x86/include/uapi/asm/perf_regs.h | 3 ++- arch/x86/kernel/perf_regs.c | 5 +++++ 5 files changed, 21 insertions(+), 1 deletion(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index f40b03adb5c7..7ed80f01f15d 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -646,6 +646,16 @@ int x86_pmu_hw_config(struct perf_event *event) return -EINVAL; } =20 + /* sample_regs_user never support SSP register. 
*/ + if (unlikely(event->attr.sample_regs_user & BIT_ULL(PERF_REG_X86_SSP))) + return -EINVAL; + + if (unlikely(event->attr.sample_regs_intr & BIT_ULL(PERF_REG_X86_SSP))) { + /* Only arch-PEBS supports to capture SSP register. */ + if (!x86_pmu.arch_pebs || !event->attr.precise_ip) + return -EINVAL; + } + /* sample_regs_user never support XMM registers */ if (unlikely(event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK)) return -EINVAL; diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 5d8c5c8d5e24..a7e101f6f2d6 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -2209,6 +2209,7 @@ static void setup_pebs_adaptive_sample_data(struct pe= rf_event *event, =20 perf_regs =3D container_of(regs, struct x86_perf_regs, regs); perf_regs->xmm_regs =3D NULL; + perf_regs->ssp =3D 0; =20 format_group =3D basic->format_group; =20 @@ -2325,6 +2326,7 @@ static void setup_arch_pebs_sample_data(struct perf_e= vent *event, =20 perf_regs =3D container_of(regs, struct x86_perf_regs, regs); perf_regs->xmm_regs =3D NULL; + perf_regs->ssp =3D 0; =20 __setup_perf_sample_data(event, iregs, data); =20 @@ -2361,6 +2363,7 @@ static void setup_arch_pebs_sample_data(struct perf_e= vent *event, =20 __setup_pebs_gpr_group(event, regs, (struct pebs_gprs *)gprs, sample_type); + perf_regs->ssp =3D gprs->ssp; } =20 if (header->aux) { diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index d0a3a13b8dae..cca8a0d68cbc 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -671,6 +671,7 @@ extern void perf_events_lapic_init(void); struct pt_regs; struct x86_perf_regs { struct pt_regs regs; + u64 ssp; u64 *xmm_regs; }; =20 diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/= asm/perf_regs.h index 7c9d2bb3833b..2e88fdebd259 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -27,9 +27,10 @@ enum perf_event_x86_regs { 
PERF_REG_X86_R13, PERF_REG_X86_R14, PERF_REG_X86_R15, + PERF_REG_X86_SSP, /* These are the limits for the GPRs. */ PERF_REG_X86_32_MAX =3D PERF_REG_X86_GS + 1, - PERF_REG_X86_64_MAX =3D PERF_REG_X86_R15 + 1, + PERF_REG_X86_64_MAX =3D PERF_REG_X86_SSP + 1, =20 /* These all need two bits set because they are 128bit */ PERF_REG_X86_XMM0 =3D 32, diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index 624703af80a1..4b15c7488ec1 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -54,6 +54,8 @@ static unsigned int pt_regs_offset[PERF_REG_X86_MAX] =3D { PT_REGS_OFFSET(PERF_REG_X86_R13, r13), PT_REGS_OFFSET(PERF_REG_X86_R14, r14), PT_REGS_OFFSET(PERF_REG_X86_R15, r15), + /* The pt_regs struct does not store Shadow stack pointer. */ + (unsigned int) -1, #endif }; =20 @@ -68,6 +70,9 @@ u64 perf_reg_value(struct pt_regs *regs, int idx) return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0]; } =20 + if (idx =3D=3D PERF_REG_X86_SSP) + return perf_regs->ssp; + if (WARN_ON_ONCE(idx >=3D ARRAY_SIZE(pt_regs_offset))) return 0; =20 --=20 2.40.1 From nobody Wed Feb 11 05:49:46 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7D1621D63D0; Thu, 23 Jan 2025 06:20:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737613261; cv=none; b=k1xtXmjDXatEpXXlpL/8IByZS6Bn5FUzk1UrKt68Cvy05ExoHwuq54oxUzLbBpzMWGLqruhY389ov2A2KMHxWaCH6n+/c+oGCKqcEyOigFjcsoZGa6rNWxNr6cp4i9DrjupyzF2hjhQRIernN+iYXUHzFN7B+CQAW0rZ6o4JEl4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737613261; c=relaxed/simple; bh=k1nX0IQ8iIrSSRgZ/Mtbl7lrtf9QpbxEsHzeK5gUJxs=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
From: Dapeng Mi <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
	Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi
Subject: [PATCH 14/20] perf/x86/intel: Add counter group support for arch-PEBS
Date: Thu, 23 Jan 2025 14:07:15 +0000
Message-Id: <20250123140721.2496639-15-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>
References: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>

Based on the previous adaptive PEBS counter snapshot support, add
counter group support for architectural PEBS. Since arch-PEBS shares
the same counter group layout with adaptive PEBS, directly reuse the
__setup_pebs_counter_group() helper to process arch-PEBS counter
groups.
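The counter-group fragment parsed by the drain path is sized from its header: one u64 per set bit in the GP and fixed counter bitmaps, plus two u64s of metrics data when the metrics tag matches. A standalone sketch of that accounting (the value of `INTEL_CNTR_METRICS` below is a placeholder for illustration, not taken from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder value for illustration only. */
#define INTEL_CNTR_METRICS 0x3u

static int popcount32(uint32_t v)
{
	int n = 0;

	while (v) {
		v &= v - 1;	/* clear lowest set bit */
		n++;
	}
	return n;
}

/*
 * Number of u64 payload words that follow an arch_pebs_cntr_header
 * (cntr/fixed are the counter bitmaps from the header), i.e. how far
 * the drain path advances next_record past the fragment.
 */
static unsigned int cntr_group_payload_words(uint32_t cntr, uint32_t fixed,
					     uint32_t metrics)
{
	unsigned int nr = popcount32(cntr) + popcount32(fixed);

	if (metrics == INTEL_CNTR_METRICS)
		nr += 2;	/* two extra words of metrics data */
	return nr;
}
```

This mirrors the `nr = hweight32(cntr->cntr) + hweight32(cntr->fixed)` computation in the patch's `setup_arch_pebs_sample_data()` hunk.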
Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/core.c     | 36 +++++++++++++++++++++++++++++--
 arch/x86/events/intel/ds.c       | 27 ++++++++++++++++++++++-
 arch/x86/events/perf_event.h     |  2 ++
 arch/x86/include/asm/msr-index.h |  6 ++++++
 arch/x86/include/asm/perf_event.h | 13 ++++++++---
 5 files changed, 78 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index cb88ae60de8e..9c5b44a73ca2 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2955,6 +2955,17 @@ static void intel_pmu_enable_event_ext(struct perf_event *event)
 
 			if (pebs_data_cfg & PEBS_DATACFG_LBRS)
 				ext |= ARCH_PEBS_LBR & cap.caps;
+
+			if (pebs_data_cfg &
+			    (PEBS_DATACFG_CNTR_MASK << PEBS_DATACFG_CNTR_SHIFT))
+				ext |= ARCH_PEBS_CNTR_GP & cap.caps;
+
+			if (pebs_data_cfg &
+			    (PEBS_DATACFG_FIX_MASK << PEBS_DATACFG_FIX_SHIFT))
+				ext |= ARCH_PEBS_CNTR_FIXED & cap.caps;
+
+			if (pebs_data_cfg & PEBS_DATACFG_METRICS)
+				ext |= ARCH_PEBS_CNTR_METRICS & cap.caps;
 		}
 
 		if (cpuc->n_pebs == cpuc->n_large_pebs)
@@ -2980,6 +2991,9 @@ static void intel_pmu_enable_event_ext(struct perf_event *event)
 		}
 	}
 
+	if (is_pebs_counter_event_group(event))
+		ext |= ARCH_PEBS_CNTR_ALLOW;
+
 	if (cpuc->cfg_c_val[hwc->idx] != ext)
 		__intel_pmu_update_event_ext(hwc->idx, ext);
 }
@@ -4131,6 +4145,20 @@ static inline bool intel_pmu_has_cap(struct perf_event *event, int idx)
 	return test_bit(idx, (unsigned long *)&intel_cap->capabilities);
 }
 
+static inline bool intel_pmu_has_pebs_counter_group(struct pmu *pmu)
+{
+	u64 caps;
+
+	if (x86_pmu.intel_cap.pebs_format >= 6 && x86_pmu.intel_cap.pebs_baseline)
+		return true;
+
+	caps = hybrid(pmu, arch_pebs_cap).caps;
+	if (x86_pmu.arch_pebs && (caps & ARCH_PEBS_CNTR_MASK))
+		return true;
+
+	return false;
+}
+
 static int intel_pmu_hw_config(struct perf_event *event)
 {
 	int ret = x86_pmu_hw_config(event);
@@ -4243,8 +4271,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	}
 
 	if ((event->attr.sample_type & PERF_SAMPLE_READ) &&
-	    (x86_pmu.intel_cap.pebs_format >= 6) &&
-	    x86_pmu.intel_cap.pebs_baseline &&
+	    intel_pmu_has_pebs_counter_group(event->pmu) &&
 	    is_sampling_event(event) &&
 	    event->attr.precise_ip)
 		event->group_leader->hw.flags |= PERF_X86_EVENT_PEBS_CNTR;
@@ -5089,6 +5116,9 @@ static inline void __intel_update_pmu_caps(struct pmu *pmu)
 
 	if (hybrid(pmu, arch_pebs_cap).caps & ARCH_PEBS_VECR_XMM)
 		dest_pmu->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
+
+	if (hybrid(pmu, arch_pebs_cap).caps & ARCH_PEBS_CNTR_MASK)
+		x86_pmu.late_setup = intel_pmu_late_setup;
 }
 
 static inline void __intel_update_large_pebs_flags(struct pmu *pmu)
@@ -5098,6 +5128,8 @@ static inline void __intel_update_large_pebs_flags(struct pmu *pmu)
 	x86_pmu.large_pebs_flags |= PERF_SAMPLE_TIME;
 	if (caps & ARCH_PEBS_LBR)
 		x86_pmu.large_pebs_flags |= PERF_SAMPLE_BRANCH_STACK;
+	if (caps & ARCH_PEBS_CNTR_MASK)
+		x86_pmu.large_pebs_flags |= PERF_SAMPLE_READ;
 
 	if (!(caps & ARCH_PEBS_AUX))
 		x86_pmu.large_pebs_flags &= ~PERF_SAMPLE_DATA_SRC;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index a7e101f6f2d6..32a44e3571cb 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1383,7 +1383,7 @@ static void __intel_pmu_pebs_update_cfg(struct perf_event *event,
 }
 
 
-static void intel_pmu_late_setup(void)
+void intel_pmu_late_setup(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct perf_event *event;
@@ -1494,13 +1494,20 @@ pebs_update_state(bool needed_cb, struct cpu_hw_events *cpuc,
 
 u64 intel_get_arch_pebs_data_config(struct perf_event *event)
 {
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	u64 pebs_data_cfg = 0;
+	u64 cntr_mask;
 
 	if (WARN_ON(event->hw.idx < 0 || event->hw.idx >= X86_PMC_IDX_MAX))
 		return 0;
 
 	pebs_data_cfg |= pebs_update_adaptive_cfg(event);
 
+	cntr_mask = (PEBS_DATACFG_CNTR_MASK << PEBS_DATACFG_CNTR_SHIFT) |
+		    (PEBS_DATACFG_FIX_MASK << PEBS_DATACFG_FIX_SHIFT) |
+		    PEBS_DATACFG_CNTR | PEBS_DATACFG_METRICS;
+	pebs_data_cfg |= cpuc->pebs_data_cfg & cntr_mask;
+
 	return pebs_data_cfg;
 }
 
@@ -2404,6 +2411,24 @@ static void setup_arch_pebs_sample_data(struct perf_event *event,
 		}
 	}
 
+	if (header->cntr) {
+		struct arch_pebs_cntr_header *cntr = next_record;
+		unsigned int nr;
+
+		next_record += sizeof(struct arch_pebs_cntr_header);
+
+		if (is_pebs_counter_event_group(event)) {
+			__setup_pebs_counter_group(cpuc, event,
+				(struct pebs_cntr_header *)cntr, next_record);
+			data->sample_flags |= PERF_SAMPLE_READ;
+		}
+
+		nr = hweight32(cntr->cntr) + hweight32(cntr->fixed);
+		if (cntr->metrics == INTEL_CNTR_METRICS)
+			nr += 2;
+		next_record += nr * sizeof(u64);
+	}
+
 	/* Parse followed fragments if there are. */
 	if (arch_pebs_record_continued(header)) {
 		at = at + header->size;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 3acb03a5c214..ce8757cb229c 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1688,6 +1688,8 @@ void intel_pmu_drain_pebs_buffer(void);
 
 void intel_pmu_store_pebs_lbrs(struct lbr_entry *lbr);
 
+void intel_pmu_late_setup(void);
+
 void intel_pebs_init(void);
 
 void intel_pmu_lbr_save_brstack(struct perf_sample_data *data,
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index a3fad7e910eb..6235df132ee0 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -319,12 +319,18 @@
 #define ARCH_PEBS_INDEX_WR_SHIFT	4
 
 #define ARCH_PEBS_RELOAD	0xffffffff
+#define ARCH_PEBS_CNTR_ALLOW	BIT_ULL(35)
+#define ARCH_PEBS_CNTR_GP	BIT_ULL(36)
+#define ARCH_PEBS_CNTR_FIXED	BIT_ULL(37)
+#define ARCH_PEBS_CNTR_METRICS	BIT_ULL(38)
 #define ARCH_PEBS_LBR_SHIFT	40
 #define ARCH_PEBS_LBR		(0x3ull << ARCH_PEBS_LBR_SHIFT)
 #define ARCH_PEBS_VECR_XMM	BIT_ULL(49)
 #define ARCH_PEBS_GPR		BIT_ULL(61)
 #define ARCH_PEBS_AUX		BIT_ULL(62)
 #define ARCH_PEBS_EN		BIT_ULL(63)
+#define ARCH_PEBS_CNTR_MASK	(ARCH_PEBS_CNTR_GP | ARCH_PEBS_CNTR_FIXED | \
+				 ARCH_PEBS_CNTR_METRICS)
 
 #define MSR_IA32_RTIT_CTL	0x00000570
 #define RTIT_CTL_TRACEEN	BIT(0)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index cca8a0d68cbc..a38d791cd0c2 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -137,16 +137,16 @@
 #define ARCH_PERFMON_EVENTS_COUNT	7
 
 #define PEBS_DATACFG_MEMINFO	BIT_ULL(0)
-#define PEBS_DATACFG_GP	BIT_ULL(1)
+#define PEBS_DATACFG_GP		BIT_ULL(1)
 #define PEBS_DATACFG_XMMS	BIT_ULL(2)
 #define PEBS_DATACFG_LBRS	BIT_ULL(3)
-#define PEBS_DATACFG_LBR_SHIFT	24
 #define PEBS_DATACFG_CNTR	BIT_ULL(4)
+#define PEBS_DATACFG_METRICS	BIT_ULL(5)
+#define PEBS_DATACFG_LBR_SHIFT	24
 #define PEBS_DATACFG_CNTR_SHIFT	32
 #define PEBS_DATACFG_CNTR_MASK	GENMASK_ULL(15, 0)
 #define PEBS_DATACFG_FIX_SHIFT	48
 #define PEBS_DATACFG_FIX_MASK	GENMASK_ULL(7, 0)
-#define PEBS_DATACFG_METRICS	BIT_ULL(5)
 
 /* Steal the highest bit of pebs_data_cfg for SW usage */
 #define PEBS_UPDATE_DS_SW	BIT_ULL(63)
@@ -573,6 +573,13 @@ struct arch_pebs_lbr_header {
 	u64 ler_info;
 };
 
+struct arch_pebs_cntr_header {
+	u32 cntr;
+	u32 fixed;
+	u32 metrics;
+	u32 reserved;
+};
+
 /*
  * AMD Extended Performance Monitoring and Debug cpuid feature detection
  */
-- 
2.40.1

From nobody Wed Feb 11 05:49:46 2026
From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Ian Rogers , Adrian Hunter , Alexander Shishkin , Kan Liang , Andi Kleen , Eranian Stephane Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi , Dapeng Mi Subject: [PATCH 15/20] perf/core: Support to capture higher width vector registers Date: Thu, 23 Jan 2025 14:07:16 +0000 Message-Id: <20250123140721.2496639-16-dapeng1.mi@linux.intel.com> In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com> References: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com> Arch-PEBS supports capturing more vector registers, such as the OPMASK/YMM/ZMM registers, in addition to the XMM registers. Extend the PERF_SAMPLE_REGS_INTR attribute to support capturing these wider vector registers. Add the array sample_regs_intr_ext[] to struct perf_event_attr to record the user-configured extended register bitmap, and add a helper perf_reg_ext_validate() to check whether these registers are supported on a given PMU. This patch only adds the common perf/core support; the x86/intel-specific support is added in the next patch.
Co-developed-by: Kan Liang Signed-off-by: Kan Liang Signed-off-by: Dapeng Mi --- arch/arm/kernel/perf_regs.c | 6 ++ arch/arm64/kernel/perf_regs.c | 6 ++ arch/csky/kernel/perf_regs.c | 5 ++ arch/loongarch/kernel/perf_regs.c | 5 ++ arch/mips/kernel/perf_regs.c | 5 ++ arch/powerpc/perf/perf_regs.c | 5 ++ arch/riscv/kernel/perf_regs.c | 5 ++ arch/s390/kernel/perf_regs.c | 5 ++ arch/x86/include/asm/perf_event.h | 4 ++ arch/x86/include/uapi/asm/perf_regs.h | 83 ++++++++++++++++++++++++++- arch/x86/kernel/perf_regs.c | 50 +++++++++++++++- include/linux/perf_event.h | 2 + include/linux/perf_regs.h | 10 ++++ include/uapi/linux/perf_event.h | 10 ++++ kernel/events/core.c | 53 ++++++++++++++++- 15 files changed, 249 insertions(+), 5 deletions(-) diff --git a/arch/arm/kernel/perf_regs.c b/arch/arm/kernel/perf_regs.c index 0529f90395c9..86b2002d0846 100644 --- a/arch/arm/kernel/perf_regs.c +++ b/arch/arm/kernel/perf_regs.c @@ -37,3 +37,9 @@ void perf_get_regs_user(struct perf_regs *regs_user, regs_user->regs = task_pt_regs(current); regs_user->abi = perf_reg_abi(current); } + +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c index b4eece3eb17d..1c91fd3530d5 100644 --- a/arch/arm64/kernel/perf_regs.c +++ b/arch/arm64/kernel/perf_regs.c @@ -104,3 +104,9 @@ void perf_get_regs_user(struct perf_regs *regs_user, regs_user->regs = task_pt_regs(current); regs_user->abi = perf_reg_abi(current); } + +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + diff --git a/arch/csky/kernel/perf_regs.c b/arch/csky/kernel/perf_regs.c index 09b7f88a2d6a..d2e2af0bf1ad 100644 --- a/arch/csky/kernel/perf_regs.c +++ b/arch/csky/kernel/perf_regs.c @@ -26,6 +26,11 @@ int perf_reg_validate(u64 mask) return 0; } +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + u64 perf_reg_abi(struct task_struct
*task) { return PERF_SAMPLE_REGS_ABI_32; diff --git a/arch/loongarch/kernel/perf_regs.c b/arch/loongarch/kernel/perf_regs.c index 263ac4ab5af6..e1df67e3fab4 100644 --- a/arch/loongarch/kernel/perf_regs.c +++ b/arch/loongarch/kernel/perf_regs.c @@ -34,6 +34,11 @@ int perf_reg_validate(u64 mask) return 0; } +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + u64 perf_reg_value(struct pt_regs *regs, int idx) { if (WARN_ON_ONCE((u32)idx >= PERF_REG_LOONGARCH_MAX)) diff --git a/arch/mips/kernel/perf_regs.c b/arch/mips/kernel/perf_regs.c index e686780d1647..bbb5f25b9191 100644 --- a/arch/mips/kernel/perf_regs.c +++ b/arch/mips/kernel/perf_regs.c @@ -37,6 +37,11 @@ int perf_reg_validate(u64 mask) return 0; } +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + u64 perf_reg_value(struct pt_regs *regs, int idx) { long v; diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c index 350dccb0143c..d919c628aee3 100644 --- a/arch/powerpc/perf/perf_regs.c +++ b/arch/powerpc/perf/perf_regs.c @@ -132,6 +132,11 @@ int perf_reg_validate(u64 mask) return 0; } +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + u64 perf_reg_abi(struct task_struct *task) { if (is_tsk_32bit_task(task)) diff --git a/arch/riscv/kernel/perf_regs.c b/arch/riscv/kernel/perf_regs.c index fd304a248de6..5beb60544c9a 100644 --- a/arch/riscv/kernel/perf_regs.c +++ b/arch/riscv/kernel/perf_regs.c @@ -26,6 +26,11 @@ int perf_reg_validate(u64 mask) return 0; } +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + u64 perf_reg_abi(struct task_struct *task) { #if __riscv_xlen == 64 diff --git a/arch/s390/kernel/perf_regs.c b/arch/s390/kernel/perf_regs.c index a6b058ee4a36..9247573229b0 100644 --- a/arch/s390/kernel/perf_regs.c +++ b/arch/s390/kernel/perf_regs.c @@ -42,6 +42,11 @@ int perf_reg_validate(u64 mask) return
0; } +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + u64 perf_reg_abi(struct task_struct *task) { if (test_tsk_thread_flag(task, TIF_31BIT)) diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h index a38d791cd0c2..54125b344b2b 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -680,6 +680,10 @@ struct x86_perf_regs { struct pt_regs regs; u64 ssp; u64 *xmm_regs; + u64 *opmask_regs; + u64 *ymmh_regs; + u64 **zmmh_regs; + u64 **h16zmm_regs; }; extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs); diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/asm/perf_regs.h index 2e88fdebd259..6651e5af448d 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -32,7 +32,7 @@ enum perf_event_x86_regs { PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1, PERF_REG_X86_64_MAX = PERF_REG_X86_SSP + 1, - /* These all need two bits set because they are 128bit */ + /* These all need two bits set because they are 128 bits */ PERF_REG_X86_XMM0 = 32, PERF_REG_X86_XMM1 = 34, PERF_REG_X86_XMM2 = 36, @@ -52,6 +52,87 @@ enum perf_event_x86_regs { /* These include both GPRs and XMMX registers */ PERF_REG_X86_XMM_MAX = PERF_REG_X86_XMM15 + 2, + + /* + * YMM upper bits need two bits set because they are 128 bits.
+ * PERF_REG_X86_YMMH0 = 64 + */ + PERF_REG_X86_YMMH0 = PERF_REG_X86_XMM_MAX, + PERF_REG_X86_YMMH1 = PERF_REG_X86_YMMH0 + 2, + PERF_REG_X86_YMMH2 = PERF_REG_X86_YMMH1 + 2, + PERF_REG_X86_YMMH3 = PERF_REG_X86_YMMH2 + 2, + PERF_REG_X86_YMMH4 = PERF_REG_X86_YMMH3 + 2, + PERF_REG_X86_YMMH5 = PERF_REG_X86_YMMH4 + 2, + PERF_REG_X86_YMMH6 = PERF_REG_X86_YMMH5 + 2, + PERF_REG_X86_YMMH7 = PERF_REG_X86_YMMH6 + 2, + PERF_REG_X86_YMMH8 = PERF_REG_X86_YMMH7 + 2, + PERF_REG_X86_YMMH9 = PERF_REG_X86_YMMH8 + 2, + PERF_REG_X86_YMMH10 = PERF_REG_X86_YMMH9 + 2, + PERF_REG_X86_YMMH11 = PERF_REG_X86_YMMH10 + 2, + PERF_REG_X86_YMMH12 = PERF_REG_X86_YMMH11 + 2, + PERF_REG_X86_YMMH13 = PERF_REG_X86_YMMH12 + 2, + PERF_REG_X86_YMMH14 = PERF_REG_X86_YMMH13 + 2, + PERF_REG_X86_YMMH15 = PERF_REG_X86_YMMH14 + 2, + PERF_REG_X86_YMMH_MAX = PERF_REG_X86_YMMH15 + 2, + + /* + * ZMM0-15 upper bits need four bits set because they are 256 bits + * PERF_REG_X86_ZMMH0 = 96 + */ + PERF_REG_X86_ZMMH0 = PERF_REG_X86_YMMH_MAX, + PERF_REG_X86_ZMMH1 = PERF_REG_X86_ZMMH0 + 4, + PERF_REG_X86_ZMMH2 = PERF_REG_X86_ZMMH1 + 4, + PERF_REG_X86_ZMMH3 = PERF_REG_X86_ZMMH2 + 4, + PERF_REG_X86_ZMMH4 = PERF_REG_X86_ZMMH3 + 4, + PERF_REG_X86_ZMMH5 = PERF_REG_X86_ZMMH4 + 4, + PERF_REG_X86_ZMMH6 = PERF_REG_X86_ZMMH5 + 4, + PERF_REG_X86_ZMMH7 = PERF_REG_X86_ZMMH6 + 4, + PERF_REG_X86_ZMMH8 = PERF_REG_X86_ZMMH7 + 4, + PERF_REG_X86_ZMMH9 = PERF_REG_X86_ZMMH8 + 4, + PERF_REG_X86_ZMMH10 = PERF_REG_X86_ZMMH9 + 4, + PERF_REG_X86_ZMMH11 = PERF_REG_X86_ZMMH10 + 4, + PERF_REG_X86_ZMMH12 = PERF_REG_X86_ZMMH11 + 4, + PERF_REG_X86_ZMMH13 = PERF_REG_X86_ZMMH12 + 4, + PERF_REG_X86_ZMMH14 = PERF_REG_X86_ZMMH13 + 4, + PERF_REG_X86_ZMMH15 = PERF_REG_X86_ZMMH14 + 4, + PERF_REG_X86_ZMMH_MAX = PERF_REG_X86_ZMMH15 + 4, + + /* + * ZMM16-31 need eight bits set because they are 512 bits + * PERF_REG_X86_ZMM16 = 160 + */ + PERF_REG_X86_ZMM16 = PERF_REG_X86_ZMMH_MAX, +
PERF_REG_X86_ZMM17 = PERF_REG_X86_ZMM16 + 8, + PERF_REG_X86_ZMM18 = PERF_REG_X86_ZMM17 + 8, + PERF_REG_X86_ZMM19 = PERF_REG_X86_ZMM18 + 8, + PERF_REG_X86_ZMM20 = PERF_REG_X86_ZMM19 + 8, + PERF_REG_X86_ZMM21 = PERF_REG_X86_ZMM20 + 8, + PERF_REG_X86_ZMM22 = PERF_REG_X86_ZMM21 + 8, + PERF_REG_X86_ZMM23 = PERF_REG_X86_ZMM22 + 8, + PERF_REG_X86_ZMM24 = PERF_REG_X86_ZMM23 + 8, + PERF_REG_X86_ZMM25 = PERF_REG_X86_ZMM24 + 8, + PERF_REG_X86_ZMM26 = PERF_REG_X86_ZMM25 + 8, + PERF_REG_X86_ZMM27 = PERF_REG_X86_ZMM26 + 8, + PERF_REG_X86_ZMM28 = PERF_REG_X86_ZMM27 + 8, + PERF_REG_X86_ZMM29 = PERF_REG_X86_ZMM28 + 8, + PERF_REG_X86_ZMM30 = PERF_REG_X86_ZMM29 + 8, + PERF_REG_X86_ZMM31 = PERF_REG_X86_ZMM30 + 8, + PERF_REG_X86_ZMM_MAX = PERF_REG_X86_ZMM31 + 8, + + /* + * OPMASK Registers + * PERF_REG_X86_OPMASK0 = 288 + */ + PERF_REG_X86_OPMASK0 = PERF_REG_X86_ZMM_MAX, + PERF_REG_X86_OPMASK1 = PERF_REG_X86_OPMASK0 + 1, + PERF_REG_X86_OPMASK2 = PERF_REG_X86_OPMASK1 + 1, + PERF_REG_X86_OPMASK3 = PERF_REG_X86_OPMASK2 + 1, + PERF_REG_X86_OPMASK4 = PERF_REG_X86_OPMASK3 + 1, + PERF_REG_X86_OPMASK5 = PERF_REG_X86_OPMASK4 + 1, + PERF_REG_X86_OPMASK6 = PERF_REG_X86_OPMASK5 + 1, + PERF_REG_X86_OPMASK7 = PERF_REG_X86_OPMASK6 + 1, + + PERF_REG_X86_VEC_MAX = PERF_REG_X86_OPMASK7 + 1, }; #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index 4b15c7488ec1..1447cd341868 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -59,12 +59,41 @@ static unsigned int pt_regs_offset[PERF_REG_X86_MAX] = { #endif }; -u64 perf_reg_value(struct pt_regs *regs, int idx) +static u64 perf_reg_ext_value(struct pt_regs *regs, int idx) { struct x86_perf_regs *perf_regs; + perf_regs = container_of(regs, struct x86_perf_regs, regs); + + switch (idx) { + case PERF_REG_X86_YMMH0 ...
PERF_REG_X86_YMMH_MAX - 1: + idx -= PERF_REG_X86_YMMH0; + return !perf_regs->ymmh_regs ? 0 : perf_regs->ymmh_regs[idx]; + case PERF_REG_X86_ZMMH0 ... PERF_REG_X86_ZMMH_MAX - 1: + idx -= PERF_REG_X86_ZMMH0; + return !perf_regs->zmmh_regs ? 0 : perf_regs->zmmh_regs[idx / 4][idx % 4]; + case PERF_REG_X86_ZMM16 ... PERF_REG_X86_ZMM_MAX - 1: + idx -= PERF_REG_X86_ZMM16; + return !perf_regs->h16zmm_regs ? 0 : perf_regs->h16zmm_regs[idx / 8][idx % 8]; + case PERF_REG_X86_OPMASK0 ... PERF_REG_X86_OPMASK7: + idx -= PERF_REG_X86_OPMASK0; + return !perf_regs->opmask_regs ? 0 : perf_regs->opmask_regs[idx]; + default: + WARN_ON_ONCE(1); + break; + } + + return 0; +} + +u64 perf_reg_value(struct pt_regs *regs, int idx) +{ + struct x86_perf_regs *perf_regs = container_of(regs, struct x86_perf_regs, regs); + + if (idx >= PERF_REG_EXTENDED_OFFSET) + return perf_reg_ext_value(regs, idx); + if (idx >= PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) { - perf_regs = container_of(regs, struct x86_perf_regs, regs); if (!perf_regs->xmm_regs) return 0; return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0]; @@ -100,6 +129,11 @@ int perf_reg_validate(u64 mask) return 0; } +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + u64 perf_reg_abi(struct task_struct *task) { return PERF_SAMPLE_REGS_ABI_32; @@ -125,6 +159,18 @@ int perf_reg_validate(u64 mask) return 0; } +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + if (!mask || !size || size > PERF_NUM_EXT_REGS) + return -EINVAL; + + if (find_last_bit(mask, size) > + (PERF_REG_X86_VEC_MAX - PERF_REG_EXTENDED_OFFSET)) + return -EINVAL; + + return 0; +} + u64 perf_reg_abi(struct task_struct *task) { if (!user_64bit_mode(task_pt_regs(task))) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 2d07bc1193f3..3612ef66f86c 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -301,6 +301,7 @@ struct
perf_event_pmu_context; #define PERF_PMU_CAP_AUX_OUTPUT 0x0080 #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100 #define PERF_PMU_CAP_AUX_PAUSE 0x0200 +#define PERF_PMU_CAP_MORE_EXT_REGS 0x0400 /** * pmu::scope @@ -1389,6 +1390,7 @@ static inline void perf_clear_branch_entry_bitfields(struct perf_branch_entry *b br->reserved = 0; } +extern bool has_more_extended_regs(struct perf_event *event); extern void perf_output_sample(struct perf_output_handle *handle, struct perf_event_header *header, struct perf_sample_data *data, diff --git a/include/linux/perf_regs.h b/include/linux/perf_regs.h index f632c5725f16..aa4dfb5af552 100644 --- a/include/linux/perf_regs.h +++ b/include/linux/perf_regs.h @@ -9,6 +9,8 @@ struct perf_regs { struct pt_regs *regs; }; +#define PERF_REG_EXTENDED_OFFSET 64 + #ifdef CONFIG_HAVE_PERF_REGS #include @@ -21,6 +23,8 @@ int perf_reg_validate(u64 mask); u64 perf_reg_abi(struct task_struct *task); void perf_get_regs_user(struct perf_regs *regs_user, struct pt_regs *regs); +int perf_reg_ext_validate(unsigned long *mask, unsigned int size); + #else #define PERF_REG_EXTENDED_MASK 0 @@ -35,6 +39,12 @@ static inline int perf_reg_validate(u64 mask) return mask ?
-ENOSYS : 0; } +static inline int perf_reg_ext_validate(unsigned long *mask, + unsigned int size) +{ + return -EINVAL; +} + static inline u64 perf_reg_abi(struct task_struct *task) { return PERF_SAMPLE_REGS_ABI_NONE; diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 0524d541d4e3..575cd653291c 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -379,6 +379,10 @@ enum perf_event_read_format { #define PERF_ATTR_SIZE_VER6 120 /* add: aux_sample_size */ #define PERF_ATTR_SIZE_VER7 128 /* add: sig_data */ #define PERF_ATTR_SIZE_VER8 136 /* add: config3 */ +#define PERF_ATTR_SIZE_VER9 168 /* add: sample_regs_intr_ext[PERF_EXT_REGS_ARRAY_SIZE] */ + +#define PERF_EXT_REGS_ARRAY_SIZE 4 +#define PERF_NUM_EXT_REGS (PERF_EXT_REGS_ARRAY_SIZE * 64) /* * Hardware event_id to monitor via a performance monitoring event: @@ -531,6 +535,12 @@ struct perf_event_attr { __u64 sig_data; __u64 config3; /* extension of config2 */ + + /* + * Extension sets of regs to dump for each sample. + * See asm/perf_regs.h for details.
+ */ + __u64 sample_regs_intr_ext[PERF_EXT_REGS_ARRAY_SIZE]; }; /* diff --git a/kernel/events/core.c b/kernel/events/core.c index 0f8c55990783..0da480b5e025 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7081,6 +7081,21 @@ perf_output_sample_regs(struct perf_output_handle *handle, } } +static void +perf_output_sample_regs_ext(struct perf_output_handle *handle, + struct pt_regs *regs, + unsigned long *mask, + unsigned int size) +{ + int bit; + u64 val; + + for_each_set_bit(bit, mask, size) { + val = perf_reg_value(regs, bit + PERF_REG_EXTENDED_OFFSET); + perf_output_put(handle, val); + } +} + static void perf_sample_regs_user(struct perf_regs *regs_user, struct pt_regs *regs) { @@ -7509,6 +7524,13 @@ static void perf_output_read(struct perf_output_handle *handle, perf_output_read_one(handle, event, enabled, running); } +inline bool has_more_extended_regs(struct perf_event *event) +{ + return !!bitmap_weight( + (unsigned long *)event->attr.sample_regs_intr_ext, + PERF_NUM_EXT_REGS); +} + void perf_output_sample(struct perf_output_handle *handle, struct perf_event_header *header, struct perf_sample_data *data, @@ -7666,6 +7688,12 @@ void perf_output_sample(struct perf_output_handle *handle, perf_output_sample_regs(handle, data->regs_intr.regs, mask); + if (has_more_extended_regs(event)) { + perf_output_sample_regs_ext( + handle, data->regs_intr.regs, + (unsigned long *)event->attr.sample_regs_intr_ext, + PERF_NUM_EXT_REGS); + } } } @@ -7980,6 +8008,12 @@ void perf_prepare_sample(struct perf_sample_data *data, u64 mask = event->attr.sample_regs_intr; size += hweight64(mask) * sizeof(u64); + + if (has_more_extended_regs(event)) { + size += bitmap_weight( + (unsigned long *)event->attr.sample_regs_intr_ext, + PERF_NUM_EXT_REGS) * sizeof(u64); + } } data->dyn_size += size; @@ -11991,6 +12025,10 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event) has_extended_regs(event)) ret =
-EOPNOTSUPP; + if (!(pmu->capabilities & PERF_PMU_CAP_MORE_EXT_REGS) && + has_more_extended_regs(event)) + ret = -EOPNOTSUPP; + if (pmu->capabilities & PERF_PMU_CAP_NO_EXCLUDE && event_has_any_exclude_flag(event)) ret = -EINVAL; @@ -12561,8 +12599,19 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr, if (!attr->sample_max_stack) attr->sample_max_stack = sysctl_perf_event_max_stack; - if (attr->sample_type & PERF_SAMPLE_REGS_INTR) - ret = perf_reg_validate(attr->sample_regs_intr); + if (attr->sample_type & PERF_SAMPLE_REGS_INTR) { + if (attr->sample_regs_intr != 0) + ret = perf_reg_validate(attr->sample_regs_intr); + if (ret) + return ret; + if (!!bitmap_weight((unsigned long *)attr->sample_regs_intr_ext, + PERF_NUM_EXT_REGS)) + ret = perf_reg_ext_validate( + (unsigned long *)attr->sample_regs_intr_ext, + PERF_NUM_EXT_REGS); + if (ret) + return ret; + } #ifndef CONFIG_CGROUP_PERF if (attr->sample_type & PERF_SAMPLE_CGROUP) -- 2.40.1 From nobody Wed Feb 11 05:49:46 2026
From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Ian Rogers , Adrian Hunter , Alexander Shishkin , Kan Liang , Andi Kleen , Eranian Stephane Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi , Dapeng Mi Subject: [PATCH 16/20] perf/x86/intel: Support arch-PEBS vector registers group capturing Date: Thu, 23 Jan 2025 14:07:17 +0000 Message-Id: <20250123140721.2496639-17-dapeng1.mi@linux.intel.com> In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com> References: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com> Add x86/intel-specific vector register (VECR) group capturing for arch-PEBS. Enable the corresponding VECR group bits in the GPx_CFG_C/FX0_CFG_C MSRs if the user configures these vector registers in the perf_event_attr bitmap, and parse the VECR groups in arch-PEBS records. Currently, capturing vector registers is supported only by PEBS-based sampling; the PMU driver returns an error if PMI-based sampling tries to capture these vector registers.
Co-developed-by: Kan Liang Signed-off-by: Kan Liang Signed-off-by: Dapeng Mi --- arch/x86/events/core.c | 59 ++++++++++++++++++++++ arch/x86/events/intel/core.c | 15 ++++++ arch/x86/events/intel/ds.c | 82 ++++++++++++++++++++++++++++--- arch/x86/include/asm/msr-index.h | 6 +++ arch/x86/include/asm/perf_event.h | 20 ++++++++ 5 files changed, 175 insertions(+), 7 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 7ed80f01f15d..f17a8c9c6391 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -576,6 +576,39 @@ int x86_pmu_max_precise(struct pmu *pmu) return precise; } +static bool has_vec_regs(struct perf_event *event, int start, int end) +{ + /* -1 to subtract PERF_REG_EXTENDED_OFFSET */ + int idx = start / 64 - 1; + int s = start % 64; + int e = end % 64; + + return event->attr.sample_regs_intr_ext[idx] & GENMASK_ULL(e, s); +} + +static inline bool has_ymmh_regs(struct perf_event *event) +{ + return has_vec_regs(event, PERF_REG_X86_YMMH0, PERF_REG_X86_YMMH15 + 1); +} + +static inline bool has_zmmh_regs(struct perf_event *event) +{ + return has_vec_regs(event, PERF_REG_X86_ZMMH0, PERF_REG_X86_ZMMH7 + 3) || + has_vec_regs(event, PERF_REG_X86_ZMMH8, PERF_REG_X86_ZMMH15 + 3); +} + +static inline bool has_h16zmm_regs(struct perf_event *event) +{ + return has_vec_regs(event, PERF_REG_X86_ZMM16, PERF_REG_X86_ZMM19 + 7) || + has_vec_regs(event, PERF_REG_X86_ZMM20, PERF_REG_X86_ZMM27 + 7) || + has_vec_regs(event, PERF_REG_X86_ZMM28, PERF_REG_X86_ZMM31 + 7); +} + +static inline bool has_opmask_regs(struct perf_event *event) +{ + return has_vec_regs(event, PERF_REG_X86_OPMASK0, PERF_REG_X86_OPMASK7); +} + int x86_pmu_hw_config(struct perf_event *event) { if (event->attr.precise_ip) { @@ -671,6 +704,32 @@ int x86_pmu_hw_config(struct perf_event *event) return -EINVAL; } + /* + * Architectural PEBS supports capturing more vector registers besides + * XMM registers, like YMM, OPMASK and ZMM registers.
+ */ + if (unlikely(has_more_extended_regs(event))) { + u64 caps = hybrid(event->pmu, arch_pebs_cap).caps; + + if (!(event->pmu->capabilities & PERF_PMU_CAP_MORE_EXT_REGS)) + return -EINVAL; + + if (has_opmask_regs(event) && !(caps & ARCH_PEBS_VECR_OPMASK)) + return -EINVAL; + + if (has_ymmh_regs(event) && !(caps & ARCH_PEBS_VECR_YMM)) + return -EINVAL; + + if (has_zmmh_regs(event) && !(caps & ARCH_PEBS_VECR_ZMMH)) + return -EINVAL; + + if (has_h16zmm_regs(event) && !(caps & ARCH_PEBS_VECR_H16ZMM)) + return -EINVAL; + + if (!event->attr.precise_ip) + return -EINVAL; + } + return x86_setup_perfctr(event); } diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index 9c5b44a73ca2..0c828a42b1ad 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -2953,6 +2953,18 @@ static void intel_pmu_enable_event_ext(struct perf_event *event) if (pebs_data_cfg & PEBS_DATACFG_XMMS) ext |= ARCH_PEBS_VECR_XMM & cap.caps; + if (pebs_data_cfg & PEBS_DATACFG_YMMS) + ext |= ARCH_PEBS_VECR_YMM & cap.caps; + + if (pebs_data_cfg & PEBS_DATACFG_OPMASKS) + ext |= ARCH_PEBS_VECR_OPMASK & cap.caps; + + if (pebs_data_cfg & PEBS_DATACFG_ZMMHS) + ext |= ARCH_PEBS_VECR_ZMMH & cap.caps; + + if (pebs_data_cfg & PEBS_DATACFG_H16ZMMS) + ext |= ARCH_PEBS_VECR_H16ZMM & cap.caps; + if (pebs_data_cfg & PEBS_DATACFG_LBRS) ext |= ARCH_PEBS_LBR & cap.caps; @@ -5117,6 +5129,9 @@ static inline void __intel_update_pmu_caps(struct pmu *pmu) if (hybrid(pmu, arch_pebs_cap).caps & ARCH_PEBS_VECR_XMM) dest_pmu->capabilities |= PERF_PMU_CAP_EXTENDED_REGS; + if (hybrid(pmu, arch_pebs_cap).caps & ARCH_PEBS_VECR_EXT) + dest_pmu->capabilities |= PERF_PMU_CAP_MORE_EXT_REGS; + if (hybrid(pmu, arch_pebs_cap).caps & ARCH_PEBS_CNTR_MASK) x86_pmu.late_setup = intel_pmu_late_setup; } diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 32a44e3571cb..fc5716b257d7 100644 --- a/arch/x86/events/intel/ds.c +++
b/arch/x86/events/intel/ds.c @@ -1413,6 +1413,7 @@ static u64 pebs_update_adaptive_cfg(struct perf_event *event) u64 sample_type = attr->sample_type; u64 pebs_data_cfg = 0; bool gprs, tsx_weight; + int bit = 0; if (!(sample_type & ~(PERF_SAMPLE_IP|PERF_SAMPLE_TIME)) && attr->precise_ip > 1) @@ -1437,9 +1438,37 @@ static u64 pebs_update_adaptive_cfg(struct perf_event *event) if (gprs || (attr->precise_ip < 2) || tsx_weight) pebs_data_cfg |= PEBS_DATACFG_GP; - if ((sample_type & PERF_SAMPLE_REGS_INTR) && - (attr->sample_regs_intr & PERF_REG_EXTENDED_MASK)) - pebs_data_cfg |= PEBS_DATACFG_XMMS; + if (sample_type & PERF_SAMPLE_REGS_INTR) { + if (attr->sample_regs_intr & PERF_REG_EXTENDED_MASK) + pebs_data_cfg |= PEBS_DATACFG_XMMS; + + for_each_set_bit_from(bit, + (unsigned long *)event->attr.sample_regs_intr_ext, + PERF_NUM_EXT_REGS) { + switch (bit + PERF_REG_EXTENDED_OFFSET) { + case PERF_REG_X86_OPMASK0 ... PERF_REG_X86_OPMASK7: + pebs_data_cfg |= PEBS_DATACFG_OPMASKS; + bit = PERF_REG_X86_YMMH0 - + PERF_REG_EXTENDED_OFFSET - 1; + break; + case PERF_REG_X86_YMMH0 ... PERF_REG_X86_ZMMH0 - 1: + pebs_data_cfg |= PEBS_DATACFG_YMMS; + bit = PERF_REG_X86_ZMMH0 - + PERF_REG_EXTENDED_OFFSET - 1; + break; + case PERF_REG_X86_ZMMH0 ... PERF_REG_X86_ZMM16 - 1: + pebs_data_cfg |= PEBS_DATACFG_ZMMHS; + bit = PERF_REG_X86_ZMM16 - + PERF_REG_EXTENDED_OFFSET - 1; + break; + case PERF_REG_X86_ZMM16 ...
PERF_REG_X86_ZMM_MAX - 1: + pebs_data_cfg |= PEBS_DATACFG_H16ZMMS; + bit = PERF_REG_X86_ZMM_MAX - + PERF_REG_EXTENDED_OFFSET - 1; + break; + } + } + } if (sample_type & PERF_SAMPLE_BRANCH_STACK) { /* @@ -2216,6 +2245,10 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event, perf_regs = container_of(regs, struct x86_perf_regs, regs); perf_regs->xmm_regs = NULL; + perf_regs->ymmh_regs = NULL; + perf_regs->opmask_regs = NULL; + perf_regs->zmmh_regs = NULL; + perf_regs->h16zmm_regs = NULL; perf_regs->ssp = 0; format_group = basic->format_group; @@ -2333,6 +2366,10 @@ static void setup_arch_pebs_sample_data(struct perf_event *event, perf_regs = container_of(regs, struct x86_perf_regs, regs); perf_regs->xmm_regs = NULL; + perf_regs->ymmh_regs = NULL; + perf_regs->opmask_regs = NULL; + perf_regs->zmmh_regs = NULL; + perf_regs->h16zmm_regs = NULL; perf_regs->ssp = 0; __setup_perf_sample_data(event, iregs, data); @@ -2383,14 +2420,45 @@ static void setup_arch_pebs_sample_data(struct perf_event *event, meminfo->tsx_tuning, ax); } - if (header->xmm) { + if (header->xmm || header->ymmh || header->opmask || + header->zmmh || header->h16zmm) { struct arch_pebs_xmm *xmm; + struct arch_pebs_ymmh *ymmh; + struct arch_pebs_zmmh *zmmh; + struct arch_pebs_h16zmm *h16zmm; + struct arch_pebs_opmask *opmask; next_record += sizeof(struct arch_pebs_xer_header); - xmm = next_record; - perf_regs->xmm_regs = xmm->xmm; - next_record = xmm + 1; + if (header->xmm) { + xmm = next_record; + perf_regs->xmm_regs = xmm->xmm; + next_record = xmm + 1; + } + + if (header->ymmh) { + ymmh = next_record; + perf_regs->ymmh_regs = ymmh->ymmh; + next_record = ymmh + 1; + } + + if (header->opmask) { + opmask = next_record; + perf_regs->opmask_regs = opmask->opmask; + next_record = opmask + 1; + } + + if (header->zmmh) { + zmmh = next_record; + perf_regs->zmmh_regs = (u64 **)zmmh->zmmh; +
next_record =3D zmmh + 1; + } + + if (header->h16zmm) { + h16zmm =3D next_record; + perf_regs->h16zmm_regs =3D (u64 **)h16zmm->h16zmm; + next_record =3D h16zmm + 1; + } } =20 if (header->lbr) { diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-in= dex.h index 6235df132ee0..e017ee8556e5 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -326,6 +326,12 @@ #define ARCH_PEBS_LBR_SHIFT 40 #define ARCH_PEBS_LBR (0x3ull << ARCH_PEBS_LBR_SHIFT) #define ARCH_PEBS_VECR_XMM BIT_ULL(49) +#define ARCH_PEBS_VECR_YMM BIT_ULL(50) +#define ARCH_PEBS_VECR_OPMASK BIT_ULL(53) +#define ARCH_PEBS_VECR_ZMMH BIT_ULL(54) +#define ARCH_PEBS_VECR_H16ZMM BIT_ULL(55) +#define ARCH_PEBS_VECR_EXT_SHIFT 50 +#define ARCH_PEBS_VECR_EXT (0x3full << ARCH_PEBS_VECR_EXT_SHIFT) #define ARCH_PEBS_GPR BIT_ULL(61) #define ARCH_PEBS_AUX BIT_ULL(62) #define ARCH_PEBS_EN BIT_ULL(63) diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 54125b344b2b..79368ece2bf9 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -142,6 +142,10 @@ #define PEBS_DATACFG_LBRS BIT_ULL(3) #define PEBS_DATACFG_CNTR BIT_ULL(4) #define PEBS_DATACFG_METRICS BIT_ULL(5) +#define PEBS_DATACFG_YMMS BIT_ULL(6) +#define PEBS_DATACFG_OPMASKS BIT_ULL(7) +#define PEBS_DATACFG_ZMMHS BIT_ULL(8) +#define PEBS_DATACFG_H16ZMMS BIT_ULL(9) #define PEBS_DATACFG_LBR_SHIFT 24 #define PEBS_DATACFG_CNTR_SHIFT 32 #define PEBS_DATACFG_CNTR_MASK GENMASK_ULL(15, 0) @@ -559,6 +563,22 @@ struct arch_pebs_xmm { u64 xmm[16*2]; /* two entries for each register */ }; =20 +struct arch_pebs_ymmh { + u64 ymmh[16*2]; /* two entries for each register */ +}; + +struct arch_pebs_opmask { + u64 opmask[8]; +}; + +struct arch_pebs_zmmh { + u64 zmmh[16][4]; /* four entries for each register */ +}; + +struct arch_pebs_h16zmm { + u64 h16zmm[16][8]; /* eight entries for each register */ +}; + #define ARCH_PEBS_LBR_NAN 0x0 #define 
ARCH_PEBS_LBR_NUM_8 0x1 #define ARCH_PEBS_LBR_NUM_16 0x2 --=20 2.40.1 From nobody Wed Feb 11 05:49:46 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 581851E2309; Thu, 23 Jan 2025 06:21:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737613271; cv=none; b=ovXi8xuYKa9fDxUw6mpSt/j5HE27VHlywp9DW0EWgByVNulBQiGuI97WZg/ZbPPJ2DfOpYXhp0yalwnUKHTc9sHloqjje40uZrpKXd9TmPFeJwSanz8hMhG7fXuo3z9m4cOlrQtMgWLDevP6GzNONyfarP7ndnknLdCPRjdFZPE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737613271; c=relaxed/simple; bh=ioaDTrDSThkgc8vgR2LRPC4o2BbK4SxoBXm+O75Bby8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=jw9hjgug5GOYjn6xiIuwJLCNPfikCRodIeyA+Mnsa65EWcpGZPqrZG4EI5078MXuQ51Iv+vg45RqGfoOORco+DSDUmlbTiavBsefjEfRwbinPYSxdyBcUdOjV0L83PQwuoFQLEXipwlOMEY+DJrjvzSR5/2OuGRGgfWm3gOTw/Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=QG/KwjYc; arc=none smtp.client-ip=198.175.65.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="QG/KwjYc" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1737613271; x=1769149271; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
bh=ioaDTrDSThkgc8vgR2LRPC4o2BbK4SxoBXm+O75Bby8=; b=QG/KwjYcch07qN/M4ns8mYWq6sfAO1GxSPl4IqZcXx7r7gyscGt53wSr cdaD6MA2+SW8NOkDTQjoGULvaFYbRMtZOS6BB1TJASvCMxOgeLaN2UdOy 0vrf5Ut9LT8n2xog+9mn4DZrIFVY2GZqzv+M04R8hfKD5f6A/izV78d0j ojIGPWonMdj5sRjYOIE5D+UDSIh01OmDNyadOyKH/lfWu+Z77NFKFuk8k Hw4kW6E9XOKxrnQ1uEXEZlhVJTLaHPAEhQGnXPU2PfG6JPgeczE2LRxbt QcFnYjBwLyXMj9ykCElhn/Bn8OtV8fbFXL2yO813tS3wlEVjMUF2O1My8 g==; X-CSE-ConnectionGUID: 93dawb6RQ2uazyuiI4V70w== X-CSE-MsgGUID: v64EyHVSTdCJdUO6Kkb6dQ== X-IronPort-AV: E=McAfee;i="6700,10204,11323"; a="55513227" X-IronPort-AV: E=Sophos;i="6.13,227,1732608000"; d="scan'208";a="55513227" Received: from orviesa003.jf.intel.com ([10.64.159.143]) by orvoesa102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jan 2025 22:21:11 -0800 X-CSE-ConnectionGUID: oHuru5bCQwyRKOTSxBEBGg== X-CSE-MsgGUID: MiVpjJGKTFe6Of+PFT4RhQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,199,1725346800"; d="scan'208";a="112334781" Received: from emr.sh.intel.com ([10.112.229.56]) by orviesa003.jf.intel.com with ESMTP; 22 Jan 2025 22:21:06 -0800 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Ian Rogers , Adrian Hunter , Alexander Shishkin , Kan Liang , Andi Kleen , Eranian Stephane Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi , Dapeng Mi Subject: [PATCH 17/20] perf tools: Support to show SSP register Date: Thu, 23 Jan 2025 14:07:18 +0000 Message-Id: <20250123140721.2496639-18-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com> References: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add SSP register support. 
Signed-off-by: Dapeng Mi
Reviewed-by: Ian Rogers
---
 tools/arch/x86/include/uapi/asm/perf_regs.h    | 4 +++-
 tools/perf/arch/x86/util/perf_regs.c           | 2 ++
 tools/perf/util/intel-pt.c                     | 2 +-
 tools/perf/util/perf-regs-arch/perf_regs_x86.c | 2 ++
 4 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/arch/x86/include/uapi/asm/perf_regs.h b/tools/arch/x86/include/uapi/asm/perf_regs.h
index 7c9d2bb3833b..158e353070c3 100644
--- a/tools/arch/x86/include/uapi/asm/perf_regs.h
+++ b/tools/arch/x86/include/uapi/asm/perf_regs.h
@@ -27,9 +27,11 @@ enum perf_event_x86_regs {
 	PERF_REG_X86_R13,
 	PERF_REG_X86_R14,
 	PERF_REG_X86_R15,
+	PERF_REG_X86_SSP,
 	/* These are the limits for the GPRs. */
 	PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
-	PERF_REG_X86_64_MAX = PERF_REG_X86_R15 + 1,
+	PERF_REG_X86_64_MAX = PERF_REG_X86_SSP + 1,
+	PERF_REG_INTEL_PT_MAX = PERF_REG_X86_R15 + 1,
 
 	/* These all need two bits set because they are 128bit */
 	PERF_REG_X86_XMM0 = 32,
diff --git a/tools/perf/arch/x86/util/perf_regs.c b/tools/perf/arch/x86/util/perf_regs.c
index 12fd93f04802..9f492568f3b4 100644
--- a/tools/perf/arch/x86/util/perf_regs.c
+++ b/tools/perf/arch/x86/util/perf_regs.c
@@ -36,6 +36,8 @@ static const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG(R14, PERF_REG_X86_R14),
 	SMPL_REG(R15, PERF_REG_X86_R15),
 #endif
+	SMPL_REG(SSP, PERF_REG_X86_SSP),
+
 	SMPL_REG2(XMM0, PERF_REG_X86_XMM0),
 	SMPL_REG2(XMM1, PERF_REG_X86_XMM1),
 	SMPL_REG2(XMM2, PERF_REG_X86_XMM2),
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 30be6dfe09eb..86196275c1e7 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -2139,7 +2139,7 @@ static u64 *intel_pt_add_gp_regs(struct regs_dump *intr_regs, u64 *pos,
 	u32 bit;
 	int i;
 
-	for (i = 0, bit = 1; i < PERF_REG_X86_64_MAX; i++, bit <<= 1) {
+	for (i = 0, bit = 1; i < PERF_REG_INTEL_PT_MAX; i++, bit <<= 1) {
 		/* Get the PEBS gp_regs array index */
 		int n = pebs_gp_regs[i] - 1;
 
diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
index 708954a9d35d..9a909f02bc04 100644
--- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c
+++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
@@ -54,6 +54,8 @@ const char *__perf_reg_name_x86(int id)
 		return "R14";
 	case PERF_REG_X86_R15:
 		return "R15";
+	case PERF_REG_X86_SSP:
+		return "ssp";
 
 #define XMM(x) \
 	case PERF_REG_X86_XMM ## x: \
-- 
2.40.1

From nobody Wed Feb 11 05:49:46 2026
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
    Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
    Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi
Subject: [PATCH 18/20] perf tools: Support to capture more vector registers (common part)
Date: Thu, 23 Jan 2025 14:07:19 +0000
Message-Id: <20250123140721.2496639-19-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com>

Intel architectural PEBS supports capturing more vector registers, such
as the OPMASK/YMM/ZMM registers, in addition to the already supported
XMM registers. Capturing the arch-PEBS vector registers (VECR) in the
Intel core PMU driver was added by previous patches. This patch adds
the perf tool's part of the support.

In detail, add support for the new sample_regs_intr_ext register
selector in perf_event_attr. This 32-byte bitmap selects the new
OPMASK, YMMH, ZMMH and ZMM register groups in VECR. Update perf regs to
introduce the new registers.

This patch only introduces the common support; the x86/Intel specific
support is added in the next patch.

Co-developed-by: Kan Liang
Signed-off-by: Kan Liang
Signed-off-by: Dapeng Mi
---
 tools/include/uapi/linux/perf_event.h      | 13 +++++++++
 tools/perf/arch/arm/util/perf_regs.c       |  5 +---
 tools/perf/arch/arm64/util/perf_regs.c     |  5 +---
 tools/perf/arch/csky/util/perf_regs.c      |  5 +---
 tools/perf/arch/loongarch/util/perf_regs.c |  5 +---
 tools/perf/arch/mips/util/perf_regs.c      |  5 +---
 tools/perf/arch/powerpc/util/perf_regs.c   |  9 ++++---
 tools/perf/arch/riscv/util/perf_regs.c     |  5 +---
 tools/perf/arch/s390/util/perf_regs.c      |  5 +---
 tools/perf/arch/x86/util/perf_regs.c       |  9 ++++---
 tools/perf/builtin-script.c                | 19 ++++++++++---
 tools/perf/util/evsel.c                    | 14 +++++++---
 tools/perf/util/parse-regs-options.c       | 23 +++++++++-------
 tools/perf/util/perf_regs.c                |  5 ----
 tools/perf/util/perf_regs.h                | 18 +++++++++++--
 tools/perf/util/record.h                   |  2 +-
 tools/perf/util/sample.h                   |  6 ++++-
 tools/perf/util/session.c                  | 31 +++++++++++++---------
 tools/perf/util/synthetic-events.c         |  7 +++--
 19 files changed, 116 insertions(+), 75 deletions(-)
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 4842c36fdf80..02d8f55f6247 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -379,6 +379,13 @@ enum perf_event_read_format {
 #define PERF_ATTR_SIZE_VER6	120	/* add: aux_sample_size */
 #define PERF_ATTR_SIZE_VER7	128	/* add: sig_data */
 #define PERF_ATTR_SIZE_VER8	136	/* add: config3 */
+#define PERF_ATTR_SIZE_VER9	168	/* add: sample_regs_intr_ext[PERF_EXT_REGS_ARRAY_SIZE] */
+
+#define PERF_EXT_REGS_ARRAY_SIZE	4
+#define PERF_NUM_EXT_REGS	(PERF_EXT_REGS_ARRAY_SIZE * 64)
+
+#define PERF_NUM_INTR_REGS	(PERF_EXT_REGS_ARRAY_SIZE + 1)
+#define PERF_NUM_INTR_REGS_SIZE	((PERF_NUM_INTR_REGS) * 64)
 
 /*
  * Hardware event_id to monitor via a performance monitoring event:
@@ -522,6 +529,12 @@ struct perf_event_attr {
 	__u64	sig_data;
 
 	__u64	config3; /* extension of config2 */
+
+	/*
+	 * Extension sets of regs to dump for each sample.
+	 * See asm/perf_regs.h for details.
+	 */
+	__u64 sample_regs_intr_ext[PERF_EXT_REGS_ARRAY_SIZE];
 };
 
 /*
diff --git a/tools/perf/arch/arm/util/perf_regs.c b/tools/perf/arch/arm/util/perf_regs.c
index f94a0210c7b7..3a3c2779efd4 100644
--- a/tools/perf/arch/arm/util/perf_regs.c
+++ b/tools/perf/arch/arm/util/perf_regs.c
@@ -6,10 +6,7 @@ static const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG_END
 };
 
-uint64_t arch__intr_reg_mask(void)
-{
-	return PERF_REGS_MASK;
-}
+void arch__intr_reg_mask(unsigned long *mask) {}
 
 uint64_t arch__user_reg_mask(void)
 {
diff --git a/tools/perf/arch/arm64/util/perf_regs.c b/tools/perf/arch/arm64/util/perf_regs.c
index 09308665e28a..754bb8423733 100644
--- a/tools/perf/arch/arm64/util/perf_regs.c
+++ b/tools/perf/arch/arm64/util/perf_regs.c
@@ -140,10 +140,7 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op)
 	return SDT_ARG_VALID;
 }
 
-uint64_t arch__intr_reg_mask(void)
-{
-	return PERF_REGS_MASK;
-}
+void arch__intr_reg_mask(unsigned long *mask) {}
 
 uint64_t arch__user_reg_mask(void)
 {
diff --git a/tools/perf/arch/csky/util/perf_regs.c b/tools/perf/arch/csky/util/perf_regs.c
index 6b1665f41180..9d132150ecb6 100644
--- a/tools/perf/arch/csky/util/perf_regs.c
+++ b/tools/perf/arch/csky/util/perf_regs.c
@@ -6,10 +6,7 @@ static const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG_END
 };
 
-uint64_t arch__intr_reg_mask(void)
-{
-	return PERF_REGS_MASK;
-}
+void arch__intr_reg_mask(unsigned long *mask) {}
 
 uint64_t arch__user_reg_mask(void)
 {
diff --git a/tools/perf/arch/loongarch/util/perf_regs.c b/tools/perf/arch/loongarch/util/perf_regs.c
index f94a0210c7b7..3a3c2779efd4 100644
--- a/tools/perf/arch/loongarch/util/perf_regs.c
+++ b/tools/perf/arch/loongarch/util/perf_regs.c
@@ -6,10 +6,7 @@ static const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG_END
 };
 
-uint64_t arch__intr_reg_mask(void)
-{
-	return PERF_REGS_MASK;
-}
+void arch__intr_reg_mask(unsigned long *mask) {}
 
 uint64_t arch__user_reg_mask(void)
 {
diff --git a/tools/perf/arch/mips/util/perf_regs.c b/tools/perf/arch/mips/util/perf_regs.c
index 6b1665f41180..9d132150ecb6 100644
--- a/tools/perf/arch/mips/util/perf_regs.c
+++ b/tools/perf/arch/mips/util/perf_regs.c
@@ -6,10 +6,7 @@ static const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG_END
 };
 
-uint64_t arch__intr_reg_mask(void)
-{
-	return PERF_REGS_MASK;
-}
+void arch__intr_reg_mask(unsigned long *mask) {}
 
 uint64_t arch__user_reg_mask(void)
 {
diff --git a/tools/perf/arch/powerpc/util/perf_regs.c b/tools/perf/arch/powerpc/util/perf_regs.c
index e8e6e6fc6f17..08ab9ed692fb 100644
--- a/tools/perf/arch/powerpc/util/perf_regs.c
+++ b/tools/perf/arch/powerpc/util/perf_regs.c
@@ -186,7 +186,7 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op)
 	return SDT_ARG_VALID;
 }
 
-uint64_t arch__intr_reg_mask(void)
+void arch__intr_reg_mask(unsigned long *mask)
 {
 	struct perf_event_attr attr = {
 		.type			= PERF_TYPE_HARDWARE,
@@ -198,7 +198,9 @@ uint64_t arch__intr_reg_mask(void)
 	};
 	int fd;
 	u32 version;
-	u64 extended_mask = 0, mask = PERF_REGS_MASK;
+	u64 extended_mask = 0;
+
+	*(u64 *)mask = PERF_REGS_MASK;
 
 	/*
 	 * Get the PVR value to set the extended
@@ -223,9 +225,8 @@ uint64_t arch__intr_reg_mask(void)
 	fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
 	if (fd != -1) {
 		close(fd);
-		mask |= extended_mask;
+		*(u64 *)mask |= extended_mask;
 	}
-	return mask;
 }
 
 uint64_t arch__user_reg_mask(void)
diff --git a/tools/perf/arch/riscv/util/perf_regs.c b/tools/perf/arch/riscv/util/perf_regs.c
index 6b1665f41180..9d132150ecb6 100644
--- a/tools/perf/arch/riscv/util/perf_regs.c
+++ b/tools/perf/arch/riscv/util/perf_regs.c
@@ -6,10 +6,7 @@ static const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG_END
 };
 
-uint64_t arch__intr_reg_mask(void)
-{
-	return PERF_REGS_MASK;
-}
+void arch__intr_reg_mask(unsigned long *mask) {}
 
 uint64_t arch__user_reg_mask(void)
 {
diff --git a/tools/perf/arch/s390/util/perf_regs.c b/tools/perf/arch/s390/util/perf_regs.c
index 6b1665f41180..9d132150ecb6 100644
--- a/tools/perf/arch/s390/util/perf_regs.c
+++ b/tools/perf/arch/s390/util/perf_regs.c
@@ -6,10 +6,7 @@ static const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG_END
 };
 
-uint64_t arch__intr_reg_mask(void)
-{
-	return PERF_REGS_MASK;
-}
+void arch__intr_reg_mask(unsigned long *mask) {}
 
 uint64_t arch__user_reg_mask(void)
 {
diff --git a/tools/perf/arch/x86/util/perf_regs.c b/tools/perf/arch/x86/util/perf_regs.c
index 9f492568f3b4..52f08498d005 100644
--- a/tools/perf/arch/x86/util/perf_regs.c
+++ b/tools/perf/arch/x86/util/perf_regs.c
@@ -283,7 +283,7 @@ const struct sample_reg *arch__sample_reg_masks(void)
 	return sample_reg_masks;
 }
 
-uint64_t arch__intr_reg_mask(void)
+void arch__intr_reg_mask(unsigned long *mask)
 {
 	struct perf_event_attr attr = {
 		.type			= PERF_TYPE_HARDWARE,
@@ -295,6 +295,9 @@ uint64_t arch__intr_reg_mask(void)
 		.exclude_kernel		= 1,
 	};
 	int fd;
+
+	*(u64 *)mask = PERF_REGS_MASK;
+
 	/*
 	 * In an unnamed union, init it here to build on older gcc versions
 	 */
@@ -320,10 +323,8 @@ uint64_t arch__intr_reg_mask(void)
 	fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
 	if (fd != -1) {
 		close(fd);
-		return (PERF_REG_EXTENDED_MASK | PERF_REGS_MASK);
+		*(u64 *)mask |= PERF_REG_EXTENDED_MASK;
 	}
-
-	return PERF_REGS_MASK;
 }
 
 uint64_t arch__user_reg_mask(void)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 9e47905f75a6..66d3923e4040 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -704,10 +704,11 @@ static int perf_session__check_output_opt(struct perf_session *session)
 }
 
 static int perf_sample__fprintf_regs(struct regs_dump *regs, uint64_t mask, const char *arch,
-				     FILE *fp)
+				     unsigned long *mask_ext, FILE *fp)
 {
 	unsigned i = 0, r;
 	int printed = 0;
+	u64 val;
 
 	if (!regs || !regs->regs)
 		return 0;
@@ -715,7 +716,15 @@ static int perf_sample__fprintf_regs(struct regs_dump *regs, uint64_t mask, const
 	printed += fprintf(fp, " ABI:%" PRIu64 " ", regs->abi);
 
 	for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) {
-		u64 val = regs->regs[i++];
+		val = regs->regs[i++];
+		printed += fprintf(fp, "%5s:0x%"PRIx64" ", perf_reg_name(r, arch), val);
+	}
+
+	if (!mask_ext)
+		return printed;
+
+	for_each_set_bit(r, mask_ext, PERF_NUM_EXT_REGS) {
+		val = regs->regs[i++];
 		printed += fprintf(fp, "%5s:0x%"PRIx64" ", perf_reg_name(r, arch), val);
 	}
 
@@ -776,14 +785,16 @@ static int perf_sample__fprintf_iregs(struct perf_sample *sample,
 				      struct perf_event_attr *attr, const char *arch, FILE *fp)
 {
 	return perf_sample__fprintf_regs(&sample->intr_regs,
-					 attr->sample_regs_intr, arch, fp);
+					 attr->sample_regs_intr, arch,
+					 (unsigned long *)attr->sample_regs_intr_ext,
+					 fp);
 }
 
 static int perf_sample__fprintf_uregs(struct perf_sample *sample,
 				      struct perf_event_attr *attr, const char *arch, FILE *fp)
 {
 	return perf_sample__fprintf_regs(&sample->user_regs,
-					 attr->sample_regs_user, arch, fp);
+					 attr->sample_regs_user, arch, NULL, fp);
 }
 
 static int perf_sample__fprintf_start(struct perf_script *script,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f745723d486b..297b960ac446 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1314,9 +1314,11 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
 	if (callchain && callchain->enabled && !evsel->no_aux_samples)
 		evsel__config_callchain(evsel, opts, callchain);
 
-	if (opts->sample_intr_regs && !evsel->no_aux_samples &&
-	    !evsel__is_dummy_event(evsel)) {
-		attr->sample_regs_intr = opts->sample_intr_regs;
+	if (bitmap_weight(opts->sample_intr_regs, PERF_NUM_INTR_REGS_SIZE) &&
+	    !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) {
+		attr->sample_regs_intr = opts->sample_intr_regs[0];
+		memcpy(attr->sample_regs_intr_ext, &opts->sample_intr_regs[1],
+		       PERF_NUM_EXT_REGS / 8);
 		evsel__set_sample_bit(evsel, REGS_INTR);
 	}
 
@@ -3097,10 +3099,16 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 
 	if (data->intr_regs.abi != PERF_SAMPLE_REGS_ABI_NONE) {
 		u64 mask = evsel->core.attr.sample_regs_intr;
+		unsigned long *mask_ext =
+			(unsigned long *)evsel->core.attr.sample_regs_intr_ext;
+		u64 *intr_regs_mask;
 
 		sz = hweight64(mask) * sizeof(u64);
+		sz += bitmap_weight(mask_ext, PERF_NUM_EXT_REGS) * sizeof(u64);
 		OVERFLOW_CHECK(array, sz, max_size);
 		data->intr_regs.mask = mask;
+		intr_regs_mask = (u64 *)&data->intr_regs.mask_ext;
+		memcpy(&intr_regs_mask[1], mask_ext, PERF_NUM_EXT_REGS / 8);
 		data->intr_regs.regs = (u64 *)array;
 		array = (void *)array + sz;
 	}
diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c
index cda1c620968e..666c2a172ef2 100644
--- a/tools/perf/util/parse-regs-options.c
+++ b/tools/perf/util/parse-regs-options.c
@@ -12,11 +12,13 @@
 static int
 __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 {
+	unsigned int size = intr ? PERF_NUM_INTR_REGS * 64 : 64;
 	uint64_t *mode = (uint64_t *)opt->value;
 	const struct sample_reg *r = NULL;
 	char *s, *os = NULL, *p;
 	int ret = -1;
-	uint64_t mask;
+	DECLARE_BITMAP(mask, size);
+	DECLARE_BITMAP(mask_tmp, size);
 
 	if (unset)
 		return 0;
@@ -24,13 +26,14 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	/*
 	 * cannot set it twice
 	 */
-	if (*mode)
+	if (bitmap_weight((unsigned long *)mode, size))
 		return -1;
 
+	bitmap_zero(mask, size);
 	if (intr)
-		mask = arch__intr_reg_mask();
+		arch__intr_reg_mask(mask);
 	else
-		mask = arch__user_reg_mask();
+		*(uint64_t *)mask = arch__user_reg_mask();
 
 	/* str may be NULL in case no arg is passed to -I */
 	if (str) {
@@ -47,7 +50,8 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 			if (!strcmp(s, "?")) {
 				fprintf(stderr, "available registers: ");
 				for (r = arch__sample_reg_masks(); r->name; r++) {
-					if (r->mask & mask)
+					bitmap_and(mask_tmp, mask, r->mask_ext, size);
+					if (bitmap_weight(mask_tmp, size))
 						fprintf(stderr, "%s ", r->name);
 				}
 				fputc('\n', stderr);
@@ -55,7 +59,8 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 				goto error;
 			}
 			for (r = arch__sample_reg_masks(); r->name; r++) {
-				if ((r->mask & mask) && !strcasecmp(s, r->name))
+				bitmap_and(mask_tmp, mask, r->mask_ext, size);
+				if (bitmap_weight(mask_tmp, size) && !strcasecmp(s, r->name))
 					break;
 			}
 			if (!r || !r->name) {
@@ -64,7 +69,7 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 				goto error;
 			}
 
-			*mode |= r->mask;
+			bitmap_or((unsigned long *)mode, (unsigned long *)mode, r->mask_ext, size);
 
 			if (!p)
 				break;
@@ -75,8 +80,8 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	ret = 0;
 
 	/* default to all possible regs */
-	if (*mode == 0)
-		*mode = mask;
+	if (!bitmap_weight((unsigned long *)mode, size))
+		bitmap_or((unsigned long *)mode, (unsigned long *)mode, mask, size);
 error:
 	free(os);
 	return ret;
diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
index 44b90bbf2d07..b36eafc10e84 100644
--- a/tools/perf/util/perf_regs.c
+++ b/tools/perf/util/perf_regs.c
@@ -11,11 +11,6 @@ int __weak arch_sdt_arg_parse_op(char *old_op __maybe_unused,
 	return SDT_ARG_SKIP;
 }
 
-uint64_t __weak arch__intr_reg_mask(void)
-{
-	return 0;
-}
-
 uint64_t __weak arch__user_reg_mask(void)
 {
 	return 0;
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
index f2d0736d65cc..5018b8d040ee 100644
--- a/tools/perf/util/perf_regs.h
+++ b/tools/perf/util/perf_regs.h
@@ -4,18 +4,32 @@
 
 #include
 #include
+#include
+#include
+#include "util/record.h"
 
 struct regs_dump;
 
 struct sample_reg {
 	const char *name;
-	uint64_t mask;
+	union {
+		uint64_t mask;
+		DECLARE_BITMAP(mask_ext, PERF_NUM_INTR_REGS * 64);
+	};
 };
 
 #define SMPL_REG_MASK(b) (1ULL << (b))
 #define SMPL_REG(n, b) { .name = #n, .mask = SMPL_REG_MASK(b) }
 #define SMPL_REG2_MASK(b) (3ULL << (b))
 #define SMPL_REG2(n, b) { .name = #n, .mask = SMPL_REG2_MASK(b) }
+#define SMPL_REG_EXT(n, b) \
+	{ .name = #n, .mask_ext[b / __BITS_PER_LONG] = 0x1ULL << (b % __BITS_PER_LONG) }
+#define SMPL_REG2_EXT(n, b) \
+	{ .name = #n, .mask_ext[b / __BITS_PER_LONG] = 0x3ULL << (b % __BITS_PER_LONG) }
+#define SMPL_REG4_EXT(n, b) \
+	{ .name = #n, .mask_ext[b / __BITS_PER_LONG] = 0xfULL << (b % __BITS_PER_LONG) }
+#define SMPL_REG8_EXT(n, b) \
+	{ .name = #n, .mask_ext[b / __BITS_PER_LONG] = 0xffULL << (b % __BITS_PER_LONG) }
 #define SMPL_REG_END { .name = NULL }
 
 enum {
@@ -24,7 +38,7 @@ enum {
 };
 
 int arch_sdt_arg_parse_op(char *old_op, char **new_op);
-uint64_t arch__intr_reg_mask(void);
+void arch__intr_reg_mask(unsigned long *mask);
 uint64_t arch__user_reg_mask(void);
 const struct sample_reg *arch__sample_reg_masks(void);
 
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index a6566134e09e..16e44a640e57 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -57,7 +57,7 @@ struct record_opts {
 	unsigned int  auxtrace_mmap_pages;
 	unsigned int  user_freq;
 	u64	      branch_stack;
-	u64	      sample_intr_regs;
+	u64	      sample_intr_regs[PERF_NUM_INTR_REGS];
 	u64	      sample_user_regs;
 	u64	      default_interval;
 	u64	      user_interval;
diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
index 70b2c3135555..98c9c4260de6 100644
--- a/tools/perf/util/sample.h
+++ b/tools/perf/util/sample.h
@@ -4,13 +4,17 @@
 
 #include
 #include
+#include
 
 /* number of register is bound by the number of bits in regs_dump::mask (64) */
 #define PERF_SAMPLE_REGS_CACHE_SIZE (8 * sizeof(u64))
 
 struct regs_dump {
 	u64 abi;
-	u64 mask;
+	union {
+		u64 mask;
+		DECLARE_BITMAP(mask_ext, PERF_NUM_INTR_REGS * 64);
+	};
 	u64 *regs;
 
 	/* Cached values/mask filled by first register access. */
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 507e6cba9545..995f5c2963bc 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -909,12 +909,13 @@ static void branch_stack__printf(struct perf_sample *sample,
 	}
 }
 
-static void regs_dump__printf(u64 mask, u64 *regs, const char *arch)
+static void regs_dump__printf(bool intr, struct regs_dump *regs, const char *arch)
 {
+	unsigned int size = intr ? PERF_NUM_INTR_REGS * 64 : 64;
 	unsigned rid, i = 0;
 
-	for_each_set_bit(rid, (unsigned long *) &mask, sizeof(mask) * 8) {
-		u64 val = regs[i++];
+	for_each_set_bit(rid, regs->mask_ext, size) {
+		u64 val = regs->regs[i++];
 
 		printf(".... %-5s 0x%016" PRIx64 "\n",
 		       perf_reg_name(rid, arch), val);
@@ -935,16 +936,22 @@ static inline const char *regs_dump_abi(struct regs_dump *d)
 	return regs_abi[d->abi];
 }
 
-static void regs__printf(const char *type, struct regs_dump *regs, const char *arch)
+static void regs__printf(bool intr, struct regs_dump *regs, const char *arch)
 {
-	u64 mask = regs->mask;
+	if (intr) {
+		u64 *mask = (u64 *)&regs->mask_ext;
 
-	printf("... %s regs: mask 0x%" PRIx64 " ABI %s\n",
-	       type,
-	       mask,
-	       regs_dump_abi(regs));
+		printf("... intr regs: mask 0x");
+		for (int i = 0; i < PERF_NUM_INTR_REGS; i++)
+			printf("%" PRIx64 "", mask[i]);
+		printf(" ABI %s\n", regs_dump_abi(regs));
+	} else {
+		printf("... user regs: mask 0x%" PRIx64 " ABI %s\n",
+		       regs->mask,
+		       regs_dump_abi(regs));
+	}
 
-	regs_dump__printf(mask, regs->regs, arch);
+	regs_dump__printf(intr, regs, arch);
 }
 
 static void regs_user__printf(struct perf_sample *sample, const char *arch)
@@ -952,7 +959,7 @@ static void regs_user__printf(struct perf_sample *sample, const char *arch)
 	struct regs_dump *user_regs = &sample->user_regs;
 
 	if (user_regs->regs)
-		regs__printf("user", user_regs, arch);
+		regs__printf(false, user_regs, arch);
 }
 
 static void regs_intr__printf(struct perf_sample *sample, const char *arch)
@@ -960,7 +967,7 @@ static void regs_intr__printf(struct perf_sample *sample, const char *arch)
 	struct regs_dump *intr_regs = &sample->intr_regs;
 
 	if (intr_regs->regs)
-		regs__printf("intr", intr_regs, arch);
+		regs__printf(true, intr_regs, arch);
 }
 
 static void stack_user__printf(struct stack_dump *dump)
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index a58444c4aed1..35c5d58aa45f 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1538,7 +1539,9 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 	if (type & PERF_SAMPLE_REGS_INTR) {
 		if (sample->intr_regs.abi) {
result +=3D sizeof(u64); - sz =3D hweight64(sample->intr_regs.mask) * sizeof(u64); + sz =3D bitmap_weight(sample->intr_regs.mask_ext, + PERF_NUM_INTR_REGS * 64) * + sizeof(u64); result +=3D sz; } else { result +=3D sizeof(u64); @@ -1741,7 +1743,8 @@ int perf_event__synthesize_sample(union perf_event *e= vent, u64 type, u64 read_fo if (type & PERF_SAMPLE_REGS_INTR) { if (sample->intr_regs.abi) { *array++ =3D sample->intr_regs.abi; - sz =3D hweight64(sample->intr_regs.mask) * sizeof(u64); + sz =3D bitmap_weight(sample->intr_regs.mask_ext, + PERF_NUM_INTR_REGS * 64) * sizeof(u64); memcpy(array, sample->intr_regs.regs, sz); array =3D (void *)array + sz; } else { --=20 2.40.1 From nobody Wed Feb 11 05:49:46 2026 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Ian Rogers , Adrian Hunter , Alexander Shishkin , Kan Liang , Andi Kleen , Eranian Stephane Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi , Dapeng Mi Subject: [PATCH 19/20] perf tools:
Support to capture more vector registers (x86/Intel part) Date: Thu, 23 Jan 2025 14:07:20 +0000 Message-Id: <20250123140721.2496639-20-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com> References: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Intel architectural PEBS supports capturing more vector registers, such as the OPMASK/YMM/ZMM registers, in addition to the already supported XMM registers. This patch adds the Intel-specific support for capturing these new vector registers in perf tools. Besides that, add SSP to perf regs. SSP is stored in the general register group and is selected by sample_regs_intr. Co-developed-by: Kan Liang Signed-off-by: Kan Liang Signed-off-by: Dapeng Mi --- tools/arch/x86/include/uapi/asm/perf_regs.h | 83 +++++++++++++++- tools/perf/arch/x86/util/perf_regs.c | 99 +++++++++++++++++++ .../perf/util/perf-regs-arch/perf_regs_x86.c | 88 +++++++++++++++++ 3 files changed, 269 insertions(+), 1 deletion(-) diff --git a/tools/arch/x86/include/uapi/asm/perf_regs.h b/tools/arch/x86/i= nclude/uapi/asm/perf_regs.h index 158e353070c3..f723e8bf9963 100644 --- a/tools/arch/x86/include/uapi/asm/perf_regs.h +++ b/tools/arch/x86/include/uapi/asm/perf_regs.h @@ -33,7 +33,7 @@ enum perf_event_x86_regs { PERF_REG_X86_64_MAX =3D PERF_REG_X86_SSP + 1, PERF_REG_INTEL_PT_MAX =3D PERF_REG_X86_R15 + 1, =20 - /* These all need two bits set because they are 128bit */ + /* These all need two bits set because they are 128 bits */ PERF_REG_X86_XMM0 =3D 32, PERF_REG_X86_XMM1 =3D 34, PERF_REG_X86_XMM2 =3D 36, @@ -53,6 +53,87 @@ enum perf_event_x86_regs { =20 /* These include both GPRs and XMMX registers */ PERF_REG_X86_XMM_MAX =3D PERF_REG_X86_XMM15 + 2, + + /* + * YMM upper bits need two bits set because they are
128 bits. + * PERF_REG_X86_YMMH0 =3D 64 + */ + PERF_REG_X86_YMMH0 =3D PERF_REG_X86_XMM_MAX, + PERF_REG_X86_YMMH1 =3D PERF_REG_X86_YMMH0 + 2, + PERF_REG_X86_YMMH2 =3D PERF_REG_X86_YMMH1 + 2, + PERF_REG_X86_YMMH3 =3D PERF_REG_X86_YMMH2 + 2, + PERF_REG_X86_YMMH4 =3D PERF_REG_X86_YMMH3 + 2, + PERF_REG_X86_YMMH5 =3D PERF_REG_X86_YMMH4 + 2, + PERF_REG_X86_YMMH6 =3D PERF_REG_X86_YMMH5 + 2, + PERF_REG_X86_YMMH7 =3D PERF_REG_X86_YMMH6 + 2, + PERF_REG_X86_YMMH8 =3D PERF_REG_X86_YMMH7 + 2, + PERF_REG_X86_YMMH9 =3D PERF_REG_X86_YMMH8 + 2, + PERF_REG_X86_YMMH10 =3D PERF_REG_X86_YMMH9 + 2, + PERF_REG_X86_YMMH11 =3D PERF_REG_X86_YMMH10 + 2, + PERF_REG_X86_YMMH12 =3D PERF_REG_X86_YMMH11 + 2, + PERF_REG_X86_YMMH13 =3D PERF_REG_X86_YMMH12 + 2, + PERF_REG_X86_YMMH14 =3D PERF_REG_X86_YMMH13 + 2, + PERF_REG_X86_YMMH15 =3D PERF_REG_X86_YMMH14 + 2, + PERF_REG_X86_YMMH_MAX =3D PERF_REG_X86_YMMH15 + 2, + + /* + * ZMM0-15 upper bits need four bits set because they are 256 bits + * PERF_REG_X86_ZMMH0 =3D 96 + */ + PERF_REG_X86_ZMMH0 =3D PERF_REG_X86_YMMH_MAX, + PERF_REG_X86_ZMMH1 =3D PERF_REG_X86_ZMMH0 + 4, + PERF_REG_X86_ZMMH2 =3D PERF_REG_X86_ZMMH1 + 4, + PERF_REG_X86_ZMMH3 =3D PERF_REG_X86_ZMMH2 + 4, + PERF_REG_X86_ZMMH4 =3D PERF_REG_X86_ZMMH3 + 4, + PERF_REG_X86_ZMMH5 =3D PERF_REG_X86_ZMMH4 + 4, + PERF_REG_X86_ZMMH6 =3D PERF_REG_X86_ZMMH5 + 4, + PERF_REG_X86_ZMMH7 =3D PERF_REG_X86_ZMMH6 + 4, + PERF_REG_X86_ZMMH8 =3D PERF_REG_X86_ZMMH7 + 4, + PERF_REG_X86_ZMMH9 =3D PERF_REG_X86_ZMMH8 + 4, + PERF_REG_X86_ZMMH10 =3D PERF_REG_X86_ZMMH9 + 4, + PERF_REG_X86_ZMMH11 =3D PERF_REG_X86_ZMMH10 + 4, + PERF_REG_X86_ZMMH12 =3D PERF_REG_X86_ZMMH11 + 4, + PERF_REG_X86_ZMMH13 =3D PERF_REG_X86_ZMMH12 + 4, + PERF_REG_X86_ZMMH14 =3D PERF_REG_X86_ZMMH13 + 4, + PERF_REG_X86_ZMMH15 =3D PERF_REG_X86_ZMMH14 + 4, + PERF_REG_X86_ZMMH_MAX =3D PERF_REG_X86_ZMMH15 + 4, + + /* + * ZMM16-31 need eight bits set because they are 512 bits + * PERF_REG_X86_ZMM16 =3D 160 + */ + PERF_REG_X86_ZMM16 =3D PERF_REG_X86_ZMMH_MAX, + 
PERF_REG_X86_ZMM17 =3D PERF_REG_X86_ZMM16 + 8, + PERF_REG_X86_ZMM18 =3D PERF_REG_X86_ZMM17 + 8, + PERF_REG_X86_ZMM19 =3D PERF_REG_X86_ZMM18 + 8, + PERF_REG_X86_ZMM20 =3D PERF_REG_X86_ZMM19 + 8, + PERF_REG_X86_ZMM21 =3D PERF_REG_X86_ZMM20 + 8, + PERF_REG_X86_ZMM22 =3D PERF_REG_X86_ZMM21 + 8, + PERF_REG_X86_ZMM23 =3D PERF_REG_X86_ZMM22 + 8, + PERF_REG_X86_ZMM24 =3D PERF_REG_X86_ZMM23 + 8, + PERF_REG_X86_ZMM25 =3D PERF_REG_X86_ZMM24 + 8, + PERF_REG_X86_ZMM26 =3D PERF_REG_X86_ZMM25 + 8, + PERF_REG_X86_ZMM27 =3D PERF_REG_X86_ZMM26 + 8, + PERF_REG_X86_ZMM28 =3D PERF_REG_X86_ZMM27 + 8, + PERF_REG_X86_ZMM29 =3D PERF_REG_X86_ZMM28 + 8, + PERF_REG_X86_ZMM30 =3D PERF_REG_X86_ZMM29 + 8, + PERF_REG_X86_ZMM31 =3D PERF_REG_X86_ZMM30 + 8, + PERF_REG_X86_ZMM_MAX =3D PERF_REG_X86_ZMM31 + 8, + + /* + * OPMASK Registers + * PERF_REG_X86_OPMASK0 =3D 288 + */ + PERF_REG_X86_OPMASK0 =3D PERF_REG_X86_ZMM_MAX, + PERF_REG_X86_OPMASK1 =3D PERF_REG_X86_OPMASK0 + 1, + PERF_REG_X86_OPMASK2 =3D PERF_REG_X86_OPMASK1 + 1, + PERF_REG_X86_OPMASK3 =3D PERF_REG_X86_OPMASK2 + 1, + PERF_REG_X86_OPMASK4 =3D PERF_REG_X86_OPMASK3 + 1, + PERF_REG_X86_OPMASK5 =3D PERF_REG_X86_OPMASK4 + 1, + PERF_REG_X86_OPMASK6 =3D PERF_REG_X86_OPMASK5 + 1, + PERF_REG_X86_OPMASK7 =3D PERF_REG_X86_OPMASK6 + 1, + + PERF_REG_X86_VEC_MAX =3D PERF_REG_X86_OPMASK7 + 1, }; =20 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) diff --git a/tools/perf/arch/x86/util/perf_regs.c b/tools/perf/arch/x86/uti= l/perf_regs.c index 52f08498d005..e233e6fe2c72 100644 --- a/tools/perf/arch/x86/util/perf_regs.c +++ b/tools/perf/arch/x86/util/perf_regs.c @@ -54,6 +54,67 @@ static const struct sample_reg sample_reg_masks[] =3D { SMPL_REG2(XMM13, PERF_REG_X86_XMM13), SMPL_REG2(XMM14, PERF_REG_X86_XMM14), SMPL_REG2(XMM15, PERF_REG_X86_XMM15), + + SMPL_REG2_EXT(YMMH0, PERF_REG_X86_YMMH0), + SMPL_REG2_EXT(YMMH1, PERF_REG_X86_YMMH1), + SMPL_REG2_EXT(YMMH2, PERF_REG_X86_YMMH2), + SMPL_REG2_EXT(YMMH3, PERF_REG_X86_YMMH3), + 
SMPL_REG2_EXT(YMMH4, PERF_REG_X86_YMMH4), + SMPL_REG2_EXT(YMMH5, PERF_REG_X86_YMMH5), + SMPL_REG2_EXT(YMMH6, PERF_REG_X86_YMMH6), + SMPL_REG2_EXT(YMMH7, PERF_REG_X86_YMMH7), + SMPL_REG2_EXT(YMMH8, PERF_REG_X86_YMMH8), + SMPL_REG2_EXT(YMMH9, PERF_REG_X86_YMMH9), + SMPL_REG2_EXT(YMMH10, PERF_REG_X86_YMMH10), + SMPL_REG2_EXT(YMMH11, PERF_REG_X86_YMMH11), + SMPL_REG2_EXT(YMMH12, PERF_REG_X86_YMMH12), + SMPL_REG2_EXT(YMMH13, PERF_REG_X86_YMMH13), + SMPL_REG2_EXT(YMMH14, PERF_REG_X86_YMMH14), + SMPL_REG2_EXT(YMMH15, PERF_REG_X86_YMMH15), + + SMPL_REG4_EXT(ZMMH0, PERF_REG_X86_ZMMH0), + SMPL_REG4_EXT(ZMMH1, PERF_REG_X86_ZMMH1), + SMPL_REG4_EXT(ZMMH2, PERF_REG_X86_ZMMH2), + SMPL_REG4_EXT(ZMMH3, PERF_REG_X86_ZMMH3), + SMPL_REG4_EXT(ZMMH4, PERF_REG_X86_ZMMH4), + SMPL_REG4_EXT(ZMMH5, PERF_REG_X86_ZMMH5), + SMPL_REG4_EXT(ZMMH6, PERF_REG_X86_ZMMH6), + SMPL_REG4_EXT(ZMMH7, PERF_REG_X86_ZMMH7), + SMPL_REG4_EXT(ZMMH8, PERF_REG_X86_ZMMH8), + SMPL_REG4_EXT(ZMMH9, PERF_REG_X86_ZMMH9), + SMPL_REG4_EXT(ZMMH10, PERF_REG_X86_ZMMH10), + SMPL_REG4_EXT(ZMMH11, PERF_REG_X86_ZMMH11), + SMPL_REG4_EXT(ZMMH12, PERF_REG_X86_ZMMH12), + SMPL_REG4_EXT(ZMMH13, PERF_REG_X86_ZMMH13), + SMPL_REG4_EXT(ZMMH14, PERF_REG_X86_ZMMH14), + SMPL_REG4_EXT(ZMMH15, PERF_REG_X86_ZMMH15), + + SMPL_REG8_EXT(ZMM16, PERF_REG_X86_ZMM16), + SMPL_REG8_EXT(ZMM17, PERF_REG_X86_ZMM17), + SMPL_REG8_EXT(ZMM18, PERF_REG_X86_ZMM18), + SMPL_REG8_EXT(ZMM19, PERF_REG_X86_ZMM19), + SMPL_REG8_EXT(ZMM20, PERF_REG_X86_ZMM20), + SMPL_REG8_EXT(ZMM21, PERF_REG_X86_ZMM21), + SMPL_REG8_EXT(ZMM22, PERF_REG_X86_ZMM22), + SMPL_REG8_EXT(ZMM23, PERF_REG_X86_ZMM23), + SMPL_REG8_EXT(ZMM24, PERF_REG_X86_ZMM24), + SMPL_REG8_EXT(ZMM25, PERF_REG_X86_ZMM25), + SMPL_REG8_EXT(ZMM26, PERF_REG_X86_ZMM26), + SMPL_REG8_EXT(ZMM27, PERF_REG_X86_ZMM27), + SMPL_REG8_EXT(ZMM28, PERF_REG_X86_ZMM28), + SMPL_REG8_EXT(ZMM29, PERF_REG_X86_ZMM29), + SMPL_REG8_EXT(ZMM30, PERF_REG_X86_ZMM30), + SMPL_REG8_EXT(ZMM31, PERF_REG_X86_ZMM31), + + SMPL_REG_EXT(OPMASK0, 
PERF_REG_X86_OPMASK0), + SMPL_REG_EXT(OPMASK1, PERF_REG_X86_OPMASK1), + SMPL_REG_EXT(OPMASK2, PERF_REG_X86_OPMASK2), + SMPL_REG_EXT(OPMASK3, PERF_REG_X86_OPMASK3), + SMPL_REG_EXT(OPMASK4, PERF_REG_X86_OPMASK4), + SMPL_REG_EXT(OPMASK5, PERF_REG_X86_OPMASK5), + SMPL_REG_EXT(OPMASK6, PERF_REG_X86_OPMASK6), + SMPL_REG_EXT(OPMASK7, PERF_REG_X86_OPMASK7), + SMPL_REG_END }; =20 @@ -283,6 +344,32 @@ const struct sample_reg *arch__sample_reg_masks(void) return sample_reg_masks; } =20 +static void check_intr_reg_ext_mask(struct perf_event_attr *attr, int idx, + u64 fmask, unsigned long *mask) +{ + u64 src_mask[PERF_NUM_INTR_REGS] =3D { 0 }; + int fd; + + attr->sample_regs_intr =3D 0; + attr->sample_regs_intr_ext[idx] =3D fmask; + src_mask[idx + 1] =3D fmask; + + fd =3D sys_perf_event_open(attr, 0, -1, -1, 0); + if (fd !=3D -1) { + close(fd); + bitmap_or(mask, mask, (unsigned long *)src_mask, + PERF_NUM_INTR_REGS * 64); + } +} + +#define PERF_REG_EXTENDED_YMMH_MASK GENMASK_ULL(31, 0) +#define PERF_REG_EXTENDED_ZMMH_1ST_MASK GENMASK_ULL(63, 32) +#define PERF_REG_EXTENDED_ZMMH_2ND_MASK GENMASK_ULL(31, 0) +#define PERF_REG_EXTENDED_ZMM_1ST_MASK GENMASK_ULL(63, 32) +#define PERF_REG_EXTENDED_ZMM_2ND_MASK GENMASK_ULL(63, 0) +#define PERF_REG_EXTENDED_ZMM_3RD_MASK GENMASK_ULL(31, 0) +#define PERF_REG_EXTENDED_OPMASK_MASK GENMASK_ULL(39, 32) + void arch__intr_reg_mask(unsigned long *mask) { struct perf_event_attr attr =3D { @@ -325,6 +412,18 @@ void arch__intr_reg_mask(unsigned long *mask) close(fd); *(u64 *)mask |=3D PERF_REG_EXTENDED_MASK; } + + /* Check YMMH regs */ + check_intr_reg_ext_mask(&attr, 0, PERF_REG_EXTENDED_YMMH_MASK, mask); + /* Check ZMMH0-15 regs */ + check_intr_reg_ext_mask(&attr, 0, PERF_REG_EXTENDED_ZMMH_1ST_MASK, mask); + check_intr_reg_ext_mask(&attr, 1, PERF_REG_EXTENDED_ZMMH_2ND_MASK, mask); + /* Check ZMM16-31 regs */ + check_intr_reg_ext_mask(&attr, 1, PERF_REG_EXTENDED_ZMM_1ST_MASK, mask); + check_intr_reg_ext_mask(&attr, 2,
PERF_REG_EXTENDED_ZMM_2ND_MASK, mask); + check_intr_reg_ext_mask(&attr, 3, PERF_REG_EXTENDED_ZMM_3RD_MASK, mask); + /* Check OPMASK regs */ + check_intr_reg_ext_mask(&attr, 3, PERF_REG_EXTENDED_OPMASK_MASK, mask); } =20 uint64_t arch__user_reg_mask(void) diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/ut= il/perf-regs-arch/perf_regs_x86.c index 9a909f02bc04..c926046ebddc 100644 --- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c +++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c @@ -78,6 +78,94 @@ const char *__perf_reg_name_x86(int id) XMM(14) XMM(15) #undef XMM + +#define YMMH(x) \ + case PERF_REG_X86_YMMH ## x: \ + case PERF_REG_X86_YMMH ## x + 1: \ + return "YMMH" #x; + YMMH(0) + YMMH(1) + YMMH(2) + YMMH(3) + YMMH(4) + YMMH(5) + YMMH(6) + YMMH(7) + YMMH(8) + YMMH(9) + YMMH(10) + YMMH(11) + YMMH(12) + YMMH(13) + YMMH(14) + YMMH(15) +#undef YMMH + +#define ZMMH(x) \ + case PERF_REG_X86_ZMMH ## x: \ + case PERF_REG_X86_ZMMH ## x + 1: \ + case PERF_REG_X86_ZMMH ## x + 2: \ + case PERF_REG_X86_ZMMH ## x + 3: \ + return "ZMMH" #x; + ZMMH(0) + ZMMH(1) + ZMMH(2) + ZMMH(3) + ZMMH(4) + ZMMH(5) + ZMMH(6) + ZMMH(7) + ZMMH(8) + ZMMH(9) + ZMMH(10) + ZMMH(11) + ZMMH(12) + ZMMH(13) + ZMMH(14) + ZMMH(15) +#undef ZMMH + +#define ZMM(x) \ + case PERF_REG_X86_ZMM ## x: \ + case PERF_REG_X86_ZMM ## x + 1: \ + case PERF_REG_X86_ZMM ## x + 2: \ + case PERF_REG_X86_ZMM ## x + 3: \ + case PERF_REG_X86_ZMM ## x + 4: \ + case PERF_REG_X86_ZMM ## x + 5: \ + case PERF_REG_X86_ZMM ## x + 6: \ + case PERF_REG_X86_ZMM ## x + 7: \ + return "ZMM" #x; + ZMM(16) + ZMM(17) + ZMM(18) + ZMM(19) + ZMM(20) + ZMM(21) + ZMM(22) + ZMM(23) + ZMM(24) + ZMM(25) + ZMM(26) + ZMM(27) + ZMM(28) + ZMM(29) + ZMM(30) + ZMM(31) +#undef ZMM + +#define OPMASK(x) \ + case PERF_REG_X86_OPMASK ## x: \ + return "opmask" #x; + + OPMASK(0) + OPMASK(1) + OPMASK(2) + OPMASK(3) + OPMASK(4) + OPMASK(5) + OPMASK(6) + OPMASK(7) +#undef OPMASK default: return NULL; } --=20 2.40.1 From nobody Wed Feb
11 05:49:46 2026 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Ian Rogers , Adrian Hunter , Alexander Shishkin , Kan Liang , Andi Kleen , Eranian Stephane Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi , Dapeng Mi Subject: [PATCH 20/20] perf tools/tests: Add vector registers PEBS sampling test Date: Thu, 23 Jan 2025 14:07:21 +0000 Message-Id: <20250123140721.2496639-21-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com> References: <20250123140721.2496639-1-dapeng1.mi@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Current adaptive PEBS supports capturing some vector registers, such as the XMM registers, and arch-PEBS supports capturing wider vector registers
like YMM and ZMM registers. This patch adds a perf test case to verify that these vector registers can be captured correctly. Suggested-by: Kan Liang Signed-off-by: Dapeng Mi --- tools/perf/tests/shell/record.sh | 55 ++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/reco= rd.sh index 0fc7a909ae9b..521eaa1972f9 100755 --- a/tools/perf/tests/shell/record.sh +++ b/tools/perf/tests/shell/record.sh @@ -116,6 +116,60 @@ test_register_capture() { echo "Register capture test [Success]" } =20 +test_vec_register_capture() { + echo "Vector register capture test" + if ! perf record -o /dev/null --quiet -e instructions:p true 2> /dev/null + then + echo "Vector register capture test [Skipped missing event]" + return + fi + if ! perf record --intr-regs=3D\? 2>&1 | grep -q 'XMM0' + then + echo "Vector register capture test [Skipped missing XMM registers]" + return + fi + if ! perf record -o - --intr-regs=3Dxmm0 -e instructions:p \ + -c 100000 ${testprog} 2> /dev/null \ + | perf script -F ip,sym,iregs -i - 2> /dev/null \ + | grep -q "XMM0:" + then + echo "Vector register capture test [Failed missing XMM output]" + err=3D1 + return + fi + echo "Vector register (XMM) capture test [Success]" + if ! perf record --intr-regs=3D\? 2>&1 | grep -q 'YMMH0' + then + echo "Vector register capture test [Skipped missing YMM registers]" + return + fi + if ! perf record -o - --intr-regs=3Dymmh0 -e instructions:p \ + -c 100000 ${testprog} 2> /dev/null \ + | perf script -F ip,sym,iregs -i - 2> /dev/null \ + | grep -q "YMMH0:" + then + echo "Vector register capture test [Failed missing YMMH output]" + err=3D1 + return + fi + echo "Vector register (YMM) capture test [Success]" + if ! perf record --intr-regs=3D\? 2>&1 | grep -q 'ZMMH0' + then + echo "Vector register capture test [Skipped missing ZMM registers]" + return + fi + if ! perf record -o - --intr-regs=3Dzmmh0 -e instructions:p \ + -c 100000 ${testprog} 2> /dev/null \ + | perf script -F ip,sym,iregs -i - 2> /dev/null \ + | grep -q "ZMMH0:" + then + echo "Vector register capture test [Failed missing ZMMH output]" + err=3D1 + return + fi + echo "Vector register (ZMM) capture test [Success]" +} + test_system_wide() { echo "Basic --system-wide mode test" if ! perf record -aB --synth=3Dno -o "${perfdata}" ${testprog} 2> /dev/n= ull @@ -303,6 +357,7 @@ fi =20 test_per_thread test_register_capture +test_vec_register_capture test_system_wide test_workload test_branch_counter --=20 2.40.1