From: Dapeng Mi <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
 Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
 Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 Dapeng Mi
Subject: [Patch v3 01/22] perf/x86/intel: Add Panther Lake support
Date: Tue, 15 Apr 2025 11:44:07 +0000
Message-Id: <20250415114428.341182-2-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
From: Kan Liang

From the PMU's perspective, Panther Lake is similar to the previous
generation Lunar Lake. Both are hybrid platforms with e-cores and
p-cores. The key differences are the ARCH PEBS feature and several new
events. ARCH PEBS is supported in the following patches; the new events
will be supported later in the perf tool.

Share the code path with Lunar Lake and only update the name.

Signed-off-by: Kan Liang
---
 arch/x86/events/intel/core.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c6f69ce3b2b3..f107dd826c11 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -7572,8 +7572,17 @@ __init int intel_pmu_init(void)
 		name = "meteorlake_hybrid";
 		break;
 
+	case INTEL_PANTHERLAKE_L:
+		pr_cont("Pantherlake Hybrid events, ");
+		name = "pantherlake_hybrid";
+		goto lnl_common;
+
 	case INTEL_LUNARLAKE_M:
 	case INTEL_ARROWLAKE:
+		pr_cont("Lunarlake Hybrid events, ");
+		name = "lunarlake_hybrid";
+
+	lnl_common:
 		intel_pmu_init_hybrid(hybrid_big_small);
 
 		x86_pmu.pebs_latency_data = lnl_latency_data;
@@ -7595,8 +7604,6 @@ __init int intel_pmu_init(void)
 		intel_pmu_init_skt(&pmu->pmu);
 
 		intel_pmu_pebs_data_source_lnl();
-		pr_cont("Lunarlake Hybrid events, ");
-		name = "lunarlake_hybrid";
 		break;
 
 	case INTEL_ARROWLAKE_H:
-- 
2.40.1
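
The hunk above relies on C's ability to jump to a label that lives inside
another case of the same switch. A standalone sketch of that "customize
first, then fall into the shared tail" pattern (the model values and the
shared-init placeholder here are made up for illustration, not the
kernel's code):

#include <stdio.h>

enum model { LUNARLAKE, ARROWLAKE, PANTHERLAKE, OTHER };

static const char *pmu_init(enum model m)
{
	const char *name = "unknown";

	switch (m) {
	case PANTHERLAKE:
		name = "pantherlake_hybrid";
		goto lnl_common;	/* reuse the Lunar Lake tail below */

	case LUNARLAKE:
	case ARROWLAKE:
		name = "lunarlake_hybrid";

	lnl_common:
		/* the shared hybrid init sequence would run here */
		break;

	default:
		break;
	}
	return name;
}

int main(void)
{
	printf("%s\n", pmu_init(PANTHERLAKE));	/* pantherlake_hybrid */
	printf("%s\n", pmu_init(ARROWLAKE));	/* lunarlake_hybrid */
	return 0;
}

The label keeps a single copy of the common init while still letting
each model print its own name first.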

From: Dapeng Mi <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
 Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
 Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 Dapeng Mi
Subject: [Patch v3 02/22] perf/x86/intel: Add PMU support for Clearwater Forest
Date: Tue, 15 Apr 2025 11:44:08 +0000
Message-Id: <20250415114428.341182-3-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>

From the PMU's perspective, Clearwater Forest is similar to the previous
generation Sierra Forest. The key differences are the ARCH PEBS feature
and the three newly added fixed counters for the topdown L1 metrics
events. ARCH PEBS is supported in the following patches.

This patch adds support for the basic perfmon features and the three
new fixed counters.
Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/core.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index f107dd826c11..adc0187a81a0 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2224,6 +2224,18 @@ static struct extra_reg intel_cmt_extra_regs[] __read_mostly = {
 	EVENT_EXTRA_END
 };
 
+EVENT_ATTR_STR(topdown-fe-bound,	td_fe_bound_skt,	"event=0x9c,umask=0x01");
+EVENT_ATTR_STR(topdown-retiring,	td_retiring_skt,	"event=0xc2,umask=0x02");
+EVENT_ATTR_STR(topdown-be-bound,	td_be_bound_skt,	"event=0xa4,umask=0x02");
+
+static struct attribute *skt_events_attrs[] = {
+	EVENT_PTR(td_fe_bound_skt),
+	EVENT_PTR(td_retiring_skt),
+	EVENT_PTR(td_bad_spec_cmt),
+	EVENT_PTR(td_be_bound_skt),
+	NULL,
+};
+
 #define KNL_OT_L2_HITE		BIT_ULL(19) /* Other Tile L2 Hit */
 #define KNL_OT_L2_HITF		BIT_ULL(20) /* Other Tile L2 Hit */
 #define KNL_MCDRAM_LOCAL	BIT_ULL(21)
@@ -7142,6 +7154,18 @@ __init int intel_pmu_init(void)
 		name = "crestmont";
 		break;
 
+	case INTEL_ATOM_DARKMONT_X:
+		intel_pmu_init_skt(NULL);
+		intel_pmu_pebs_data_source_cmt();
+		x86_pmu.pebs_latency_data = cmt_latency_data;
+		x86_pmu.get_event_constraints = cmt_get_event_constraints;
+		td_attr = skt_events_attrs;
+		mem_attr = grt_mem_attrs;
+		extra_attr = cmt_format_attr;
+		pr_cont("Darkmont events, ");
+		name = "darkmont";
+		break;
+
 	case INTEL_WESTMERE:
 	case INTEL_WESTMERE_EP:
 	case INTEL_WESTMERE_EX:
-- 
2.40.1
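
For reference, the "event=…,umask=…" strings above encode into the
architectural PERFEVTSEL layout, where the event select occupies bits
0-7 and the unit mask bits 8-15. A small sketch of that mapping
(illustrative only; the kernel's sysfs attributes and the perf tool
parser handle this for real):

#include <stdio.h>

/* PERFEVTSEL: event select in bits 0-7, unit mask in bits 8-15 */
static unsigned long long evsel_config(unsigned event, unsigned umask)
{
	return (unsigned long long)event | ((unsigned long long)umask << 8);
}

int main(void)
{
	printf("topdown-fe-bound: 0x%03llx\n", evsel_config(0x9c, 0x01));
	printf("topdown-retiring: 0x%03llx\n", evsel_config(0xc2, 0x02));
	printf("topdown-be-bound: 0x%03llx\n", evsel_config(0xa4, 0x02));
	return 0;	/* prints 0x19c, 0x2c2, 0x2a4 */
}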

From: Dapeng Mi <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
 Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
 Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 Dapeng Mi
Subject: [Patch v3 03/22] perf/x86/intel: Parse CPUID archPerfmonExt leaves for non-hybrid CPUs
Date: Tue, 15 Apr 2025 11:44:09 +0000
Message-Id: <20250415114428.341182-4-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>

The CPUID archPerfmonExt (0x23) leaves enumerate the CPU-level PMU
capabilities on non-hybrid processors as well, so parse the
archPerfmonExt leaves on non-hybrid processors too. Architectural PEBS
leverages archPerfmonExt sub-leaves 0x4 and 0x5 to enumerate its PEBS
capabilities. This patch is a precursor of the subsequent arch-PEBS
enabling patches.
Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/core.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index adc0187a81a0..c7937b872348 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5271,7 +5271,7 @@ static inline bool intel_pmu_broken_perf_cap(void)
 	return false;
 }
 
-static void update_pmu_cap(struct x86_hybrid_pmu *pmu)
+static void update_pmu_cap(struct pmu *pmu)
 {
 	unsigned int cntr, fixed_cntr, ecx, edx;
 	union cpuid35_eax eax;
@@ -5280,30 +5280,30 @@ static void update_pmu_cap(struct x86_hybrid_pmu *pmu)
 	cpuid(ARCH_PERFMON_EXT_LEAF, &eax.full, &ebx.full, &ecx, &edx);
 
 	if (ebx.split.umask2)
-		pmu->config_mask |= ARCH_PERFMON_EVENTSEL_UMASK2;
+		hybrid(pmu, config_mask) |= ARCH_PERFMON_EVENTSEL_UMASK2;
 	if (ebx.split.eq)
-		pmu->config_mask |= ARCH_PERFMON_EVENTSEL_EQ;
+		hybrid(pmu, config_mask) |= ARCH_PERFMON_EVENTSEL_EQ;
 
 	if (eax.split.cntr_subleaf) {
 		cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_NUM_COUNTER_LEAF,
 			    &cntr, &fixed_cntr, &ecx, &edx);
-		pmu->cntr_mask64 = cntr;
-		pmu->fixed_cntr_mask64 = fixed_cntr;
+		hybrid(pmu, cntr_mask64) = cntr;
+		hybrid(pmu, fixed_cntr_mask64) = fixed_cntr;
 	}
 
 	if (eax.split.acr_subleaf) {
 		cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_ACR_LEAF,
 			    &cntr, &fixed_cntr, &ecx, &edx);
 		/* The mask of the counters which can be reloaded */
-		pmu->acr_cntr_mask64 = cntr | ((u64)fixed_cntr << INTEL_PMC_IDX_FIXED);
+		hybrid(pmu, acr_cntr_mask64) = cntr | ((u64)fixed_cntr << INTEL_PMC_IDX_FIXED);
 
 		/* The mask of the counters which can cause a reload of reloadable counters */
-		pmu->acr_cause_mask64 = ecx | ((u64)edx << INTEL_PMC_IDX_FIXED);
+		hybrid(pmu, acr_cause_mask64) = ecx | ((u64)edx << INTEL_PMC_IDX_FIXED);
 	}
 
 	if (!intel_pmu_broken_perf_cap()) {
 		/* Perf Metric (Bit 15) and PEBS via PT (Bit 16) are hybrid enumeration */
-		rdmsrl(MSR_IA32_PERF_CAPABILITIES, pmu->intel_cap.capabilities);
+		rdmsrl(MSR_IA32_PERF_CAPABILITIES, hybrid(pmu, intel_cap).capabilities);
 	}
 }
 
@@ -5390,7 +5390,7 @@ static bool init_hybrid_pmu(int cpu)
 		goto end;
 
 	if (this_cpu_has(X86_FEATURE_ARCH_PERFMON_EXT))
-		update_pmu_cap(pmu);
+		update_pmu_cap(&pmu->pmu);
 
 	intel_pmu_check_hybrid_pmus(pmu);
 
@@ -6899,6 +6899,7 @@ __init int intel_pmu_init(void)
 
 	x86_pmu.pebs_events_mask	= intel_pmu_pebs_mask(x86_pmu.cntr_mask64);
 	x86_pmu.pebs_capable		= PEBS_COUNTER_MASK;
+	x86_pmu.config_mask		= X86_RAW_EVENT_MASK;
 
 	/*
	 * Quirk: v2 perfmon does not report fixed-purpose events, so
@@ -7715,6 +7716,18 @@ __init int intel_pmu_init(void)
 		x86_pmu.attr_update = hybrid_attr_update;
 	}
 
+	/*
+	 * The archPerfmonExt (0x23) includes an enhanced enumeration of
+	 * PMU architectural features with a per-core view. For non-hybrid,
+	 * each core has the same PMU capabilities. It's good enough to
+	 * update the x86_pmu from the booting CPU. For hybrid, the x86_pmu
+	 * is used to keep the common capabilities. Still keep the values
+	 * from the leaf 0xa. The core specific update will be done later
+	 * when a new type is online.
+	 */
+	if (!is_hybrid() && boot_cpu_has(X86_FEATURE_ARCH_PERFMON_EXT))
+		update_pmu_cap(NULL);
+
 	intel_pmu_check_counters_mask(&x86_pmu.cntr_mask64,
 				      &x86_pmu.fixed_cntr_mask64,
 				      &x86_pmu.intel_ctrl);
-- 
2.40.1
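
A userspace sketch of the same enumeration update_pmu_cap() performs,
using the sub-leaf numbering from the patch (that EAX bit 1 of sub-leaf
0 advertises the counters sub-leaf is an assumption carried over from
the kernel's cpuid35_eax layout):

#include <stdio.h>
#include <cpuid.h>

#define ARCH_PERFMON_EXT_LEAF		0x23
#define ARCH_PERFMON_NUM_COUNTER_LEAF	0x1

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* sub-leaf 0 advertises which other sub-leaves are valid */
	if (!__get_cpuid_count(ARCH_PERFMON_EXT_LEAF, 0,
			       &eax, &ebx, &ecx, &edx))
		return 1;	/* leaf 0x23 not enumerated */

	if (eax & (1u << 1)) {	/* counters sub-leaf (cntr_subleaf) */
		unsigned int cntr, fixed_cntr;

		__get_cpuid_count(ARCH_PERFMON_EXT_LEAF,
				  ARCH_PERFMON_NUM_COUNTER_LEAF,
				  &cntr, &fixed_cntr, &ecx, &edx);
		printf("GP counter mask:    %#x\n", cntr);
		printf("fixed counter mask: %#x\n", fixed_cntr);
	}
	return 0;
}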

From: Dapeng Mi <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
 Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
 Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 Dapeng Mi
Subject: [Patch v3 04/22] perf/x86/intel: Decouple BTS initialization from PEBS initialization
Date: Tue, 15 Apr 2025 11:44:10 +0000
Message-Id: <20250415114428.341182-5-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>

Move the x86_pmu.bts flag initialization from intel_ds_init() into
bts_init(), and rename intel_ds_init() to intel_pebs_init(), since with
the x86_pmu.bts initialization removed it now initializes only PEBS.

It's safe to move the x86_pmu.bts initialization into bts_init() since
all users of the x86_pmu.bts flag run after bts_init() has executed.

Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/bts.c  | 6 +++++-
 arch/x86/events/intel/core.c | 2 +-
 arch/x86/events/intel/ds.c   | 5 ++---
 arch/x86/events/perf_event.h | 2 +-
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index 16bc89c8023b..9560f693fac0 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -599,7 +599,11 @@ static void bts_event_read(struct perf_event *event)
 
 static __init int bts_init(void)
 {
-	if (!boot_cpu_has(X86_FEATURE_DTES64) || !x86_pmu.bts)
+	if (!boot_cpu_has(X86_FEATURE_DTES64))
+		return -ENODEV;
+
+	x86_pmu.bts = boot_cpu_has(X86_FEATURE_BTS);
+	if (!x86_pmu.bts)
 		return -ENODEV;
 
 	if (boot_cpu_has(X86_FEATURE_PTI)) {
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c7937b872348..16049ba63135 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6928,7 +6928,7 @@ __init int intel_pmu_init(void)
 	if (boot_cpu_has(X86_FEATURE_ARCH_LBR))
 		intel_pmu_arch_lbr_init();
 
-	intel_ds_init();
+	intel_pebs_init();
 
 	x86_add_quirk(intel_arch_events_quirk); /* Install first, so it runs last */
 
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index fcf9c5b26cab..d894cf3f631e 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2651,10 +2651,10 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
 }
 
 /*
- * BTS, PEBS probe and setup
+ * PEBS probe and setup
 */
 
-void __init intel_ds_init(void)
+void __init intel_pebs_init(void)
 {
 	/*
	 * No support for 32bit formats
@@ -2662,7 +2662,6 @@ void __init intel_pebs_init(void)
 	if (!boot_cpu_has(X86_FEATURE_DTES64))
 		return;
 
-	x86_pmu.bts  = boot_cpu_has(X86_FEATURE_BTS);
 	x86_pmu.pebs = boot_cpu_has(X86_FEATURE_PEBS);
 	x86_pmu.pebs_buffer_size = PEBS_BUFFER_SIZE;
 	if (x86_pmu.version <= 4)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 46bbb503aca1..ac6743e392ad 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1673,7 +1673,7 @@ void intel_pmu_drain_pebs_buffer(void);
 
 void intel_pmu_store_pebs_lbrs(struct lbr_entry *lbr);
 
-void intel_ds_init(void);
+void intel_pebs_init(void);
 
 void intel_pmu_lbr_save_brstack(struct perf_sample_data *data,
 				struct cpu_hw_events *cpuc,
-- 
2.40.1
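
A compile-and-run model of the invariant the commit message relies on:
after this patch the flag is assigned inside bts_init() itself, and
every consumer of x86_pmu.bts runs after bts_init(). This is a
userspace illustration, not the kernel's initcall machinery:

#include <stdbool.h>
#include <stdio.h>

static struct { bool bts; bool pebs; } x86_pmu;

/* early PMU setup: now touches PEBS only */
static void intel_pebs_init(void) { x86_pmu.pebs = true; }

/* later initcall: probes and owns the BTS flag itself */
static int bts_init(void)
{
	x86_pmu.bts = true;	/* was done in intel_ds_init() before */
	return 0;
}

int main(void)
{
	intel_pebs_init();
	bts_init();
	/* any reader of x86_pmu.bts only runs from here on */
	printf("bts=%d pebs=%d\n", x86_pmu.bts, x86_pmu.pebs);
	return 0;
}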

From: Dapeng Mi <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
 Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
 Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 Dapeng Mi
Subject: [Patch v3 05/22] perf/x86/intel: Rename x86_pmu.pebs to x86_pmu.ds_pebs
Date: Tue, 15 Apr 2025 11:44:11 +0000
Message-Id: <20250415114428.341182-6-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
Since architectural PEBS is introduced in subsequent patches, rename
x86_pmu.pebs to x86_pmu.ds_pebs to distinguish it from the upcoming
architectural PEBS.

Besides, restrict the reserve_ds_buffers() helper to the legacy
DS-based PEBS so it can't corrupt the pebs_active flag or incorrectly
release the PEBS buffer for arch-PEBS, since a later patch reuses these
flags and the alloc/release_pebs_buffer() helpers for arch-PEBS.

Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/core.c |  6 +++---
 arch/x86/events/intel/ds.c   | 32 ++++++++++++++++++--------------
 arch/x86/events/perf_event.h |  2 +-
 3 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 16049ba63135..7bbc7a740242 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4584,7 +4584,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 		.guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
 	};
 
-	if (!x86_pmu.pebs)
+	if (!x86_pmu.ds_pebs)
 		return arr;
 
 	/*
@@ -5764,7 +5764,7 @@ static __init void intel_clovertown_quirk(void)
 	 * these chips.
	 */
 	pr_warn("PEBS disabled due to CPU errata\n");
-	x86_pmu.pebs = 0;
+	x86_pmu.ds_pebs = 0;
 	x86_pmu.pebs_constraints = NULL;
 }
 
@@ -6252,7 +6252,7 @@ tsx_is_visible(struct kobject *kobj, struct attribute *attr, int i)
 static umode_t
 pebs_is_visible(struct kobject *kobj, struct attribute *attr, int i)
 {
-	return x86_pmu.pebs ? attr->mode : 0;
+	return x86_pmu.ds_pebs ? attr->mode : 0;
 }
 
 static umode_t
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index d894cf3f631e..1d6b3fa6a8eb 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -624,7 +624,7 @@ static int alloc_pebs_buffer(int cpu)
 	int max, node = cpu_to_node(cpu);
 	void *buffer, *insn_buff, *cea;
 
-	if (!x86_pmu.pebs)
+	if (!x86_pmu.ds_pebs)
 		return 0;
 
 	buffer = dsalloc_pages(bsiz, GFP_KERNEL, cpu);
@@ -659,7 +659,7 @@ static void release_pebs_buffer(int cpu)
 	struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu);
 	void *cea;
 
-	if (!x86_pmu.pebs)
+	if (!x86_pmu.ds_pebs)
 		return;
 
 	kfree(per_cpu(insn_buffer, cpu));
@@ -734,7 +734,7 @@ void release_ds_buffers(void)
 {
 	int cpu;
 
-	if (!x86_pmu.bts && !x86_pmu.pebs)
+	if (!x86_pmu.bts && !x86_pmu.ds_pebs)
 		return;
 
 	for_each_possible_cpu(cpu)
@@ -750,7 +750,8 @@ void release_ds_buffers(void)
 	}
 
 	for_each_possible_cpu(cpu) {
-		release_pebs_buffer(cpu);
+		if (x86_pmu.ds_pebs)
+			release_pebs_buffer(cpu);
 		release_bts_buffer(cpu);
 	}
 }
@@ -761,15 +762,17 @@ void reserve_ds_buffers(void)
 	int cpu;
 
 	x86_pmu.bts_active = 0;
-	x86_pmu.pebs_active = 0;
 
-	if (!x86_pmu.bts && !x86_pmu.pebs)
+	if (x86_pmu.ds_pebs)
+		x86_pmu.pebs_active = 0;
+
+	if (!x86_pmu.bts && !x86_pmu.ds_pebs)
 		return;
 
 	if (!x86_pmu.bts)
 		bts_err = 1;
 
-	if (!x86_pmu.pebs)
+	if (!x86_pmu.ds_pebs)
 		pebs_err = 1;
 
 	for_each_possible_cpu(cpu) {
@@ -781,7 +784,8 @@ void reserve_ds_buffers(void)
 		if (!bts_err && alloc_bts_buffer(cpu))
 			bts_err = 1;
 
-		if (!pebs_err && alloc_pebs_buffer(cpu))
+		if (x86_pmu.ds_pebs && !pebs_err &&
+		    alloc_pebs_buffer(cpu))
 			pebs_err = 1;
 
 		if (bts_err && pebs_err)
@@ -793,7 +797,7 @@ void reserve_ds_buffers(void)
 			release_bts_buffer(cpu);
 	}
 
-	if (pebs_err) {
+	if (x86_pmu.ds_pebs && pebs_err) {
 		for_each_possible_cpu(cpu)
 			release_pebs_buffer(cpu);
 	}
@@ -805,7 +809,7 @@ void reserve_ds_buffers(void)
 	if (x86_pmu.bts && !bts_err)
 		x86_pmu.bts_active = 1;
 
-	if (x86_pmu.pebs && !pebs_err)
+	if (x86_pmu.ds_pebs && !pebs_err)
 		x86_pmu.pebs_active = 1;
 
 	for_each_possible_cpu(cpu) {
@@ -2662,12 +2666,12 @@ void __init intel_pebs_init(void)
 	if (!boot_cpu_has(X86_FEATURE_DTES64))
 		return;
 
-	x86_pmu.pebs = boot_cpu_has(X86_FEATURE_PEBS);
+	x86_pmu.ds_pebs = boot_cpu_has(X86_FEATURE_PEBS);
 	x86_pmu.pebs_buffer_size = PEBS_BUFFER_SIZE;
 	if (x86_pmu.version <= 4)
 		x86_pmu.pebs_no_isolation = 1;
 
-	if (x86_pmu.pebs) {
+	if (x86_pmu.ds_pebs) {
 		char pebs_type = x86_pmu.intel_cap.pebs_trap ?  '+' : '-';
 		char *pebs_qual = "";
 		int format = x86_pmu.intel_cap.pebs_format;
@@ -2759,7 +2763,7 @@ void __init intel_pebs_init(void)
 
 		default:
 			pr_cont("no PEBS fmt%d%c, ", format, pebs_type);
-			x86_pmu.pebs = 0;
+			x86_pmu.ds_pebs = 0;
 		}
 	}
 }
@@ -2768,7 +2772,7 @@ void perf_restore_debug_store(void)
 {
 	struct debug_store *ds = __this_cpu_read(cpu_hw_events.ds);
 
-	if (!x86_pmu.bts && !x86_pmu.pebs)
+	if (!x86_pmu.bts && !x86_pmu.ds_pebs)
 		return;
 
 	wrmsrl(MSR_IA32_DS_AREA, (unsigned long)ds);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index ac6743e392ad..2ef407d0a7e2 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -898,7 +898,7 @@ struct x86_pmu {
	 */
 	unsigned int	bts			:1,
 			bts_active		:1,
-			pebs			:1,
+			ds_pebs			:1,
 			pebs_active		:1,
 			pebs_broken		:1,
 			pebs_prec_dist		:1,
-- 
2.40.1
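
The reserve_ds_buffers() changes above follow a reserve-with-rollback
shape: attempt a per-CPU allocation only when the legacy DS flag owns
the resource, and unwind everything on the first failure. A reduced
single-resource sketch of that shape (userspace, illustrative; buffer
size and CPU count are arbitrary):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define NCPUS 4

static void *pebs_buf[NCPUS];
static bool ds_pebs = true;	/* legacy DS-based PEBS present */
static bool pebs_active;

static void reserve_pebs_buffers(void)
{
	bool err = !ds_pebs;
	int cpu;

	for (cpu = 0; cpu < NCPUS && !err; cpu++) {
		pebs_buf[cpu] = malloc(4096);
		if (!pebs_buf[cpu])
			err = true;
	}

	if (err) {		/* roll back every CPU on any failure */
		for (cpu = 0; cpu < NCPUS; cpu++) {
			free(pebs_buf[cpu]);
			pebs_buf[cpu] = NULL;
		}
	}

	/* only legacy DS PEBS may flip pebs_active, as in the patch */
	if (ds_pebs && !err)
		pebs_active = true;
}

int main(void)
{
	reserve_pebs_buffers();
	printf("pebs_active=%d\n", pebs_active);
	return 0;
}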

From: Dapeng Mi <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
 Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
 Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 Dapeng Mi
Subject: [Patch v3 06/22] perf/x86/intel: Introduce pairs of PEBS static calls
Date: Tue, 15 Apr 2025 11:44:12 +0000
Message-Id: <20250415114428.341182-7-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>

Arch-PEBS retires the IA32_PEBS_ENABLE and MSR_PEBS_DATA_CFG MSRs, so
intel_pmu_pebs_enable/disable() and intel_pmu_pebs_enable/disable_all()
don't need to be called for arch-PEBS. To keep the code clean,
introduce the static calls x86_pmu_pebs_enable/disable() and
x86_pmu_pebs_enable/disable_all() instead of adding an
"x86_pmu.arch_pebs" check directly in these helpers.

Suggested-by: Peter Zijlstra
Signed-off-by: Dapeng Mi
---
 arch/x86/events/core.c       | 10 ++++++++++
 arch/x86/events/intel/core.c |  8 ++++----
 arch/x86/events/intel/ds.c   |  5 +++++
 arch/x86/events/perf_event.h |  8 ++++++++
 4 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index cae213296a63..995df8f392b6 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -95,6 +95,11 @@ DEFINE_STATIC_CALL_NULL(x86_pmu_filter, *x86_pmu.filter);
 
 DEFINE_STATIC_CALL_NULL(x86_pmu_late_setup, *x86_pmu.late_setup);
 
+DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_enable, *x86_pmu.pebs_enable);
+DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_disable, *x86_pmu.pebs_disable);
+DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_enable_all, *x86_pmu.pebs_enable_all);
+DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_disable_all, *x86_pmu.pebs_disable_all);
+
 /*
  * This one is magic, it will get called even when PMU init fails (because
  * there is no PMU), in which case it should simply return NULL.
@@ -2049,6 +2054,11 @@ static void x86_pmu_static_call_update(void)
 	static_call_update(x86_pmu_filter, x86_pmu.filter);
 
 	static_call_update(x86_pmu_late_setup, x86_pmu.late_setup);
+
+	static_call_update(x86_pmu_pebs_enable, x86_pmu.pebs_enable);
+	static_call_update(x86_pmu_pebs_disable, x86_pmu.pebs_disable);
+	static_call_update(x86_pmu_pebs_enable_all, x86_pmu.pebs_enable_all);
+	static_call_update(x86_pmu_pebs_disable_all, x86_pmu.pebs_disable_all);
 }
 
 static void _x86_pmu_read(struct perf_event *event)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 7bbc7a740242..cd6329207311 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2306,7 +2306,7 @@ static __always_inline void __intel_pmu_disable_all(bool bts)
 static __always_inline void intel_pmu_disable_all(void)
 {
 	__intel_pmu_disable_all(true);
-	intel_pmu_pebs_disable_all();
+	static_call_cond(x86_pmu_pebs_disable_all)();
 	intel_pmu_lbr_disable_all();
 }
 
@@ -2338,7 +2338,7 @@ static void __intel_pmu_enable_all(int added, bool pmi)
 
 static void intel_pmu_enable_all(int added)
 {
-	intel_pmu_pebs_enable_all();
+	static_call_cond(x86_pmu_pebs_enable_all)();
 	__intel_pmu_enable_all(added, false);
 }
 
@@ -2595,7 +2595,7 @@ static void intel_pmu_disable_event(struct perf_event *event)
	 * so we don't trigger the event without PEBS bit set.
	 */
 	if (unlikely(event->attr.precise_ip))
-		intel_pmu_pebs_disable(event);
+		static_call(x86_pmu_pebs_disable)(event);
 }
 
 static void intel_pmu_assign_event(struct perf_event *event, int idx)
@@ -2948,7 +2948,7 @@ static void intel_pmu_enable_event(struct perf_event *event)
 	int idx = hwc->idx;
 
 	if (unlikely(event->attr.precise_ip))
-		intel_pmu_pebs_enable(event);
+		static_call(x86_pmu_pebs_enable)(event);
 
 	switch (idx) {
 	case 0 ... INTEL_PMC_IDX_FIXED - 1:
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 1d6b3fa6a8eb..e216622b94dc 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2679,6 +2679,11 @@ void __init intel_pebs_init(void)
 		if (format < 4)
 			x86_pmu.intel_cap.pebs_baseline = 0;
 
+		x86_pmu.pebs_enable = intel_pmu_pebs_enable;
+		x86_pmu.pebs_disable = intel_pmu_pebs_disable;
+		x86_pmu.pebs_enable_all = intel_pmu_pebs_enable_all;
+		x86_pmu.pebs_disable_all = intel_pmu_pebs_disable_all;
+
 		switch (format) {
 		case 0:
 			pr_cont("PEBS fmt0%c, ", pebs_type);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 2ef407d0a7e2..d201e6ac2ede 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -808,6 +808,10 @@ struct x86_pmu {
 	int		(*hw_config)(struct perf_event *event);
 	int		(*schedule_events)(struct cpu_hw_events *cpuc, int n, int *assign);
 	void		(*late_setup)(void);
+	void		(*pebs_enable)(struct perf_event *event);
+	void		(*pebs_disable)(struct perf_event *event);
+	void		(*pebs_enable_all)(void);
+	void		(*pebs_disable_all)(void);
 	unsigned	eventsel;
 	unsigned	perfctr;
 	unsigned	fixedctr;
@@ -1120,6 +1124,10 @@ DECLARE_STATIC_CALL(x86_pmu_set_period, *x86_pmu.set_period);
 DECLARE_STATIC_CALL(x86_pmu_update,   *x86_pmu.update);
 DECLARE_STATIC_CALL(x86_pmu_drain_pebs,  *x86_pmu.drain_pebs);
 DECLARE_STATIC_CALL(x86_pmu_late_setup,  *x86_pmu.late_setup);
+DECLARE_STATIC_CALL(x86_pmu_pebs_enable,  *x86_pmu.pebs_enable);
+DECLARE_STATIC_CALL(x86_pmu_pebs_disable,  *x86_pmu.pebs_disable);
+DECLARE_STATIC_CALL(x86_pmu_pebs_enable_all,  *x86_pmu.pebs_enable_all);
+DECLARE_STATIC_CALL(x86_pmu_pebs_disable_all,  *x86_pmu.pebs_disable_all);
 
 static __always_inline struct x86_perf_task_context_opt *task_context_opt(void *ctx)
 {
-- 
2.40.1
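
Behaviorally, a static call declared with DEFINE_STATIC_CALL_NULL() and
invoked via static_call_cond() acts like a function pointer whose NULL
default is silently skipped until static_call_update() installs a
backend (in the kernel the call sites are binary-patched, so no pointer
load actually happens). A userspace model of just that behavior:

#include <stdio.h>

static void (*pebs_enable_all)(void);		/* NULL by default */

static void legacy_pebs_enable_all(void) { puts("legacy PEBS enabled"); }

/* stand-in for static_call_cond(x86_pmu_pebs_enable_all)() */
static void call_pebs_enable_all(void)
{
	if (pebs_enable_all)	/* arch-PEBS leaves it NULL: a no-op */
		pebs_enable_all();
}

int main(void)
{
	call_pebs_enable_all();			/* no-op: arch-PEBS case */
	pebs_enable_all = legacy_pebs_enable_all; /* intel_pebs_init() path */
	call_pebs_enable_all();			/* "legacy PEBS enabled" */
	return 0;
}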

From: Dapeng Mi <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
 Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
 Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 Dapeng Mi
Subject: [Patch v3 07/22] perf/x86/intel: Initialize architectural PEBS
Date: Tue, 15 Apr 2025 11:44:13 +0000
Message-Id: <20250415114428.341182-8-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>

arch-PEBS leverages the CPUID.23H.4/5 sub-leaves to enumerate the
supported arch-PEBS capabilities and counter bitmaps. This patch parses
these two sub-leaves and initializes the arch-PEBS capabilities and the
corresponding structures.

Since the IA32_PEBS_ENABLE and MSR_PEBS_DATA_CFG MSRs no longer exist
for arch-PEBS, arch-PEBS doesn't need to manipulate them. Thus add a
simple pair of __intel_pmu_pebs_enable/disable() callbacks for
arch-PEBS.
Signed-off-by: Dapeng Mi
---
 arch/x86/events/core.c            | 21 ++++++++++---
 arch/x86/events/intel/core.c      | 46 ++++++++++++++++++---------
 arch/x86/events/intel/ds.c        | 52 ++++++++++++++++++++++++++-----
 arch/x86/events/perf_event.h      | 25 +++++++++++++--
 arch/x86/include/asm/perf_event.h |  7 ++++-
 5 files changed, 120 insertions(+), 31 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 995df8f392b6..9c205a8a4fa6 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -553,14 +553,22 @@ static inline int precise_br_compat(struct perf_event *event)
 	return m == b;
 }
 
-int x86_pmu_max_precise(void)
+int x86_pmu_max_precise(struct pmu *pmu)
 {
 	int precise = 0;
 
-	/* Support for constant skid */
 	if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
-		precise++;
+		/* arch PEBS */
+		if (x86_pmu.arch_pebs) {
+			precise = 2;
+			if (hybrid(pmu, arch_pebs_cap).pdists)
+				precise++;
+
+			return precise;
+		}
 
+		/* legacy PEBS - support for constant skid */
+		precise++;
 		/* Support for IP fixup */
 		if (x86_pmu.lbr_nr || x86_pmu.intel_cap.pebs_format >= 2)
 			precise++;
@@ -568,13 +576,14 @@ int x86_pmu_max_precise(void)
 		if (x86_pmu.pebs_prec_dist)
 			precise++;
 	}
+
 	return precise;
 }
 
 int x86_pmu_hw_config(struct perf_event *event)
 {
 	if (event->attr.precise_ip) {
-		int precise = x86_pmu_max_precise();
+		int precise = x86_pmu_max_precise(event->pmu);
 
 		if (event->attr.precise_ip > precise)
 			return -EOPNOTSUPP;
@@ -2626,7 +2635,9 @@ static ssize_t max_precise_show(struct device *cdev,
 				  struct device_attribute *attr,
 				  char *buf)
 {
-	return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu_max_precise());
+	struct pmu *pmu = dev_get_drvdata(cdev);
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu_max_precise(pmu));
 }
 
 static DEVICE_ATTR_RO(max_precise);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index cd6329207311..09e2a23f9bcc 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5273,34 +5273,49 @@ static inline bool intel_pmu_broken_perf_cap(void)
 
 static void update_pmu_cap(struct pmu *pmu)
 {
-	unsigned int cntr, fixed_cntr, ecx, edx;
-	union cpuid35_eax eax;
-	union cpuid35_ebx ebx;
+	unsigned int eax, ebx, ecx, edx;
+	union cpuid35_eax eax_0;
+	union cpuid35_ebx ebx_0;
 
-	cpuid(ARCH_PERFMON_EXT_LEAF, &eax.full, &ebx.full, &ecx, &edx);
+	cpuid(ARCH_PERFMON_EXT_LEAF, &eax_0.full, &ebx_0.full, &ecx, &edx);
 
-	if (ebx.split.umask2)
+	if (ebx_0.split.umask2)
 		hybrid(pmu, config_mask) |= ARCH_PERFMON_EVENTSEL_UMASK2;
-	if (ebx.split.eq)
+	if (ebx_0.split.eq)
 		hybrid(pmu, config_mask) |= ARCH_PERFMON_EVENTSEL_EQ;
 
-	if (eax.split.cntr_subleaf) {
+	if (eax_0.split.cntr_subleaf) {
 		cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_NUM_COUNTER_LEAF,
-			    &cntr, &fixed_cntr, &ecx, &edx);
-		hybrid(pmu, cntr_mask64) = cntr;
-		hybrid(pmu, fixed_cntr_mask64) = fixed_cntr;
+			    &eax, &ebx, &ecx, &edx);
+		hybrid(pmu, cntr_mask64) = eax;
+		hybrid(pmu, fixed_cntr_mask64) = ebx;
 	}
 
-	if (eax.split.acr_subleaf) {
+	if (eax_0.split.acr_subleaf) {
 		cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_ACR_LEAF,
-			    &cntr, &fixed_cntr, &ecx, &edx);
+			    &eax, &ebx, &ecx, &edx);
 		/* The mask of the counters which can be reloaded */
-		hybrid(pmu, acr_cntr_mask64) = cntr | ((u64)fixed_cntr << INTEL_PMC_IDX_FIXED);
+		hybrid(pmu, acr_cntr_mask64) = eax | ((u64)ebx << INTEL_PMC_IDX_FIXED);
 
 		/* The mask of the counters which can cause a reload of reloadable counters */
 		hybrid(pmu, acr_cause_mask64) = ecx | ((u64)edx << INTEL_PMC_IDX_FIXED);
 	}
 
+	/* Bits[5:4] should be set simultaneously if arch-PEBS is supported */
+	if (eax_0.split.pebs_caps_subleaf && eax_0.split.pebs_cnts_subleaf) {
+		cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_PEBS_CAP_LEAF,
+			    &eax, &ebx, &ecx, &edx);
+		hybrid(pmu, arch_pebs_cap).caps = (u64)ebx << 32;
+
+		cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_PEBS_COUNTER_LEAF,
+			    &eax, &ebx, &ecx, &edx);
+		hybrid(pmu, arch_pebs_cap).counters = ((u64)ecx << 32) | eax;
+		hybrid(pmu, arch_pebs_cap).pdists = ((u64)edx << 32) | ebx;
+	} else {
+		WARN_ON(x86_pmu.arch_pebs == 1);
+		x86_pmu.arch_pebs = 0;
+	}
+
 	if (!intel_pmu_broken_perf_cap()) {
 		/* Perf Metric (Bit 15) and PEBS via PT (Bit 16) are hybrid enumeration */
 		rdmsrl(MSR_IA32_PERF_CAPABILITIES, hybrid(pmu, intel_cap).capabilities);
@@ -6252,7 +6267,7 @@ tsx_is_visible(struct kobject *kobj, struct attribute *attr, int i)
 static umode_t
 pebs_is_visible(struct kobject *kobj, struct attribute *attr, int i)
 {
-	return x86_pmu.ds_pebs ? attr->mode : 0;
+	return intel_pmu_has_pebs() ? attr->mode : 0;
 }
 
 static umode_t
@@ -7728,6 +7743,9 @@ __init int intel_pmu_init(void)
 	if (!is_hybrid() && boot_cpu_has(X86_FEATURE_ARCH_PERFMON_EXT))
 		update_pmu_cap(NULL);
 
+	if (x86_pmu.arch_pebs)
+		pr_cont("Architectural PEBS, ");
+
 	intel_pmu_check_counters_mask(&x86_pmu.cntr_mask64,
 				      &x86_pmu.fixed_cntr_mask64,
 				      &x86_pmu.intel_ctrl);
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index e216622b94dc..4597b5c48d8a 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1530,6 +1530,15 @@ static inline void intel_pmu_drain_large_pebs(struct cpu_hw_events *cpuc)
 		intel_pmu_drain_pebs_buffer();
 }
 
+static void __intel_pmu_pebs_enable(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+
+	hwc->config &= ~ARCH_PERFMON_EVENTSEL_INT;
+	cpuc->pebs_enabled |= 1ULL << hwc->idx;
+}
+
 void intel_pmu_pebs_enable(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -1538,9 +1547,7 @@ void intel_pmu_pebs_enable(struct perf_event *event)
 	struct debug_store *ds = cpuc->ds;
 	unsigned int idx = hwc->idx;
 
-	hwc->config &= ~ARCH_PERFMON_EVENTSEL_INT;
-
-	cpuc->pebs_enabled |= 1ULL << hwc->idx;
+	__intel_pmu_pebs_enable(event);
 
 	if ((event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT) && (x86_pmu.version < 5))
 		cpuc->pebs_enabled |= 1ULL << (hwc->idx + 32);
@@ -1602,14 +1609,22 @@ void intel_pmu_pebs_del(struct perf_event *event)
 	pebs_update_state(needed_cb, cpuc, event, false);
 }
 
-void intel_pmu_pebs_disable(struct perf_event *event)
+static void __intel_pmu_pebs_disable(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;
 
 	intel_pmu_drain_large_pebs(cpuc);
-
 	cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
+	hwc->config |= ARCH_PERFMON_EVENTSEL_INT;
+}
+
+void intel_pmu_pebs_disable(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+
+	__intel_pmu_pebs_disable(event);
 
 	if ((event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT) &&
 	    (x86_pmu.version < 5))
@@ -1621,8 +1636,6 @@ void intel_pmu_pebs_disable(struct perf_event *event)
 
 	if (cpuc->enabled)
 		wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
-
-	hwc->config |= ARCH_PERFMON_EVENTSEL_INT;
 }
 
 void intel_pmu_pebs_enable_all(void)
@@ -2654,11 +2667,26 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
 	}
 }
 
+static void __init intel_arch_pebs_init(void)
+{
+	/*
+	 * Current hybrid platforms always support arch-PEBS either on
+	 * all kinds of cores or on none of them. So directly set the
+	 * x86_pmu.arch_pebs flag if the boot CPU supports arch-PEBS.
+	 */
+	x86_pmu.arch_pebs = 1;
+	x86_pmu.pebs_buffer_size = PEBS_BUFFER_SIZE;
+	x86_pmu.pebs_capable = ~0ULL;
+
+	x86_pmu.pebs_enable = __intel_pmu_pebs_enable;
+	x86_pmu.pebs_disable = __intel_pmu_pebs_disable;
+}
+
 /*
  * PEBS probe and setup
  */
 
-void __init intel_pebs_init(void)
+static void __init intel_ds_pebs_init(void)
 {
 	/*
	 * No support for 32bit formats
@@ -2773,6 +2801,14 @@ void __init intel_pebs_init(void)
 	}
 }
 
+void __init intel_pebs_init(void)
+{
+	if (x86_pmu.intel_cap.pebs_format == 0xf)
+		intel_arch_pebs_init();
+	else
+		intel_ds_pebs_init();
+}
+
 void perf_restore_debug_store(void)
 {
 	struct debug_store *ds = __this_cpu_read(cpu_hw_events.ds);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index d201e6ac2ede..23ffad67a927 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -700,6 +700,12 @@ enum hybrid_pmu_type {
 	hybrid_big_small_tiny = hybrid_big | hybrid_small_tiny,
 };
 
+struct arch_pebs_cap {
+	u64 caps;
+	u64 counters;
+	u64 pdists;
+};
+
 struct x86_hybrid_pmu {
 	struct pmu			pmu;
 	const char			*name;
@@ -744,6 +750,8 @@ struct x86_hybrid_pmu {
 					mid_ack		:1,
 					enabled_ack	:1;
 
+	struct arch_pebs_cap		arch_pebs_cap;
+
 	u64				pebs_data_source[PERF_PEBS_DATA_SOURCE_MAX];
 };
 
@@ -898,7 +906,7 @@ struct x86_pmu {
 	union perf_capabilities intel_cap;
 
 	/*
-	 * Intel DebugStore bits
+	 * Intel DebugStore and PEBS bits
	 */
 	unsigned int	bts			:1,
 			bts_active		:1,
@@ -909,7 +917,8 @@ struct x86_pmu {
 			pebs_no_tlb		:1,
 			pebs_no_isolation	:1,
 			pebs_block		:1,
-			pebs_ept		:1;
+			pebs_ept		:1,
+			arch_pebs		:1;
 	int		pebs_record_size;
 	int		pebs_buffer_size;
 	u64		pebs_events_mask;
@@ -921,6 +930,11 @@ struct x86_pmu {
 	u64		rtm_abort_event;
 	u64		pebs_capable;
 
+	/*
+	 * Intel Architectural PEBS
+	 */
+	struct arch_pebs_cap arch_pebs_cap;
+
 	/*
	 * Intel LBR
	 */
@@ -1209,7 +1223,7 @@ int x86_reserve_hardware(void);
 
 void x86_release_hardware(void);
 
-int x86_pmu_max_precise(void);
+int x86_pmu_max_precise(struct pmu *pmu);
 
 void hw_perf_lbr_event_destroy(struct perf_event *event);
 
@@ -1784,6 +1798,11 @@ static inline int intel_pmu_max_num_pebs(struct pmu *pmu)
 	return fls((u32)hybrid(pmu, pebs_events_mask));
 }
 
+static inline bool intel_pmu_has_pebs(void)
+{
+	return x86_pmu.ds_pebs || x86_pmu.arch_pebs;
+}
+
 #else /* CONFIG_CPU_SUP_INTEL */
 
 static inline void reserve_ds_buffers(void)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 70d1d94aca7e..7fca9494aae9 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -196,6 +196,8 @@ union cpuid10_edx {
 #define ARCH_PERFMON_EXT_LEAF			0x00000023
 #define ARCH_PERFMON_NUM_COUNTER_LEAF		0x1
 #define ARCH_PERFMON_ACR_LEAF			0x2
+#define ARCH_PERFMON_PEBS_CAP_LEAF		0x4
+#define ARCH_PERFMON_PEBS_COUNTER_LEAF		0x5
 
 union cpuid35_eax {
 	struct {
@@ -206,7 +208,10 @@ union cpuid35_eax {
 		unsigned int	acr_subleaf:1;
 		/* Events Sub-Leaf */
 		unsigned int	events_subleaf:1;
-		unsigned int	reserved:28;
+		/* arch-PEBS Sub-Leaves */
+		unsigned int	pebs_caps_subleaf:1;
+		unsigned int	pebs_cnts_subleaf:1;
+		unsigned int	reserved:26;
 	} split;
 	unsigned int	full;
 };
-- 
2.40.1
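
The new x86_pmu_max_precise() ceiling can be summarized numerically:
arch-PEBS starts at 2 (:pp) and gains a third level (:ppp) only when
CPUID.23H.5 reports a non-zero PDIST counter mask. A sketch of that
arithmetic (the legacy branch is condensed here; the real code gates
each increment on LBR, PEBS format, and prec_dist support):

#include <stdio.h>

static int max_precise(int arch_pebs, unsigned long long pdists)
{
	if (arch_pebs)
		return 2 + (pdists != 0);

	/* legacy: constant skid + IP fixup + precise distribution */
	return 3;
}

int main(void)
{
	printf("arch-PEBS, no PDIST: %d\n", max_precise(1, 0));	/* 2 */
	printf("arch-PEBS, PDIST:    %d\n", max_precise(1, 0xff));	/* 3 */
	return 0;
}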
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
	Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Dapeng Mi, Dapeng Mi
Subject: [Patch v3 08/22] perf/x86/intel/ds: Factor out PEBS record processing code to functions
Date: Tue, 15 Apr 2025 11:44:14 +0000
Message-Id: <20250415114428.341182-9-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>

Besides some PEBS record layout differences, arch-PEBS can share most of
the PEBS record processing code with adaptive PEBS. Thus, factor this
common processing code out into independent inline functions, so it can
be reused by the subsequent arch-PEBS handler.

Suggested-by: Kan Liang
Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/ds.c | 80 ++++++++++++++++++++++++++------------
 1 file changed, 55 insertions(+), 25 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 4597b5c48d8a..22831ef003d0 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2599,6 +2599,54 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_data *data)
 	}
 }
 
+static inline void __intel_pmu_handle_pebs_record(struct pt_regs *iregs,
+						  struct pt_regs *regs,
+						  struct perf_sample_data *data,
+						  void *at, u64 pebs_status,
+						  short *counts, void **last,
+						  setup_fn setup_sample)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct perf_event *event;
+	int bit;
+
+	for_each_set_bit(bit, (unsigned long *)&pebs_status, X86_PMC_IDX_MAX) {
+		event = cpuc->events[bit];
+
+		if (WARN_ON_ONCE(!event) ||
+		    WARN_ON_ONCE(!event->attr.precise_ip))
+			continue;
+
+		if (counts[bit]++)
+			__intel_pmu_pebs_event(event, iregs, regs, data,
+					       last[bit], setup_sample);
+
+		last[bit] = at;
+	}
+}
+
+static inline void
+__intel_pmu_handle_last_pebs_record(struct pt_regs *iregs, struct pt_regs *regs,
+				    struct perf_sample_data *data, u64 mask,
+				    short *counts, void **last,
+				    setup_fn setup_sample)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct perf_event *event;
+	int bit;
+
+	for_each_set_bit(bit, (unsigned long *)&mask, X86_PMC_IDX_MAX) {
+		if (!counts[bit])
+			continue;
+
+		event = cpuc->events[bit];
+
+		__intel_pmu_pebs_last_event(event, iregs, regs, data, last[bit],
+					    counts[bit], setup_sample);
+	}
+
+}
+
 static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_data *data)
 {
 	short counts[INTEL_PMC_IDX_FIXED + MAX_FIXED_PEBS_EVENTS] = {};
@@ -2608,9 +2656,7 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_data *data)
 	struct x86_perf_regs perf_regs;
 	struct pt_regs *regs = &perf_regs.regs;
 	struct pebs_basic *basic;
-	struct perf_event *event;
 	void *base, *at, *top;
-	int bit;
 	u64 mask;
 
 	if (!x86_pmu.pebs_active)
@@ -2623,6 +2669,7 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_data *data)
 
 	mask = hybrid(cpuc->pmu, pebs_events_mask) |
 	       (hybrid(cpuc->pmu, fixed_cntr_mask64) << INTEL_PMC_IDX_FIXED);
+	mask &= cpuc->pebs_enabled;
 
 	if (unlikely(base >= top)) {
 		intel_pmu_pebs_event_update_no_drain(cpuc, X86_PMC_IDX_MAX);
@@ -2640,31 +2687,14 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_data *data)
 		if (basic->format_size != cpuc->pebs_record_size)
 			continue;
 
-		pebs_status = basic->applicable_counters & cpuc->pebs_enabled & mask;
-		for_each_set_bit(bit, (unsigned long *)&pebs_status, X86_PMC_IDX_MAX) {
-			event = cpuc->events[bit];
-
-			if (WARN_ON_ONCE(!event) ||
-			    WARN_ON_ONCE(!event->attr.precise_ip))
-				continue;
-
-			if (counts[bit]++) {
-				__intel_pmu_pebs_event(event, iregs, regs, data, last[bit],
						       setup_pebs_adaptive_sample_data);
-			}
-
-			last[bit] = at;
-		}
+		pebs_status = mask & basic->applicable_counters;
+		__intel_pmu_handle_pebs_record(iregs, regs, data, at,
+					       pebs_status, counts, last,
+					       setup_pebs_adaptive_sample_data);
 	}
 
-	for_each_set_bit(bit, (unsigned long *)&mask, X86_PMC_IDX_MAX) {
-		if (!counts[bit])
-			continue;
-
-		event = cpuc->events[bit];
-
-		__intel_pmu_pebs_last_event(event, iregs, regs, data, last[bit],
-					    counts[bit], setup_pebs_adaptive_sample_data);
-	}
+	__intel_pmu_handle_last_pebs_record(iregs, regs, data, mask, counts, last,
+					    setup_pebs_adaptive_sample_data);
 }
 
 static void __init intel_arch_pebs_init(void)
-- 
2.40.1

From nobody Fri Dec 19 16:05:42 2025
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
	Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Dapeng Mi, Dapeng Mi
Subject: [Patch v3 09/22] perf/x86/intel/ds: Factor out PEBS group processing code to functions
Date: Tue, 15 Apr 2025 11:44:15 +0000
Message-Id: <20250415114428.341182-10-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>

Adaptive PEBS and arch-PEBS share a lot of code for processing PEBS
groups, such as the basic, GPR and meminfo groups. Extract this shared
code into generic functions to avoid duplication.

Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/ds.c | 172 ++++++++++++++++++++++---------------
 1 file changed, 105 insertions(+), 67 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 22831ef003d0..6c872bf2e916 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2073,6 +2073,91 @@ static inline void __setup_pebs_counter_group(struct cpu_hw_events *cpuc,
 
 #define PEBS_LATENCY_MASK			0xffff
 
+static inline void __setup_perf_sample_data(struct perf_event *event,
+					    struct pt_regs *iregs,
+					    struct perf_sample_data *data)
+{
+	perf_sample_data_init(data, 0, event->hw.last_period);
+	data->period = event->hw.last_period;
+
+	/*
+	 * We must however always use iregs for the unwinder to stay sane; the
+	 * record BP,SP,IP can point into thin air when the record is from a
+	 * previous PMI context or an (I)RET happened between the record and
+	 * PMI.
+	 */
+	perf_sample_save_callchain(data, event, iregs);
+}
+
+static inline void __setup_pebs_basic_group(struct perf_event *event,
+					    struct pt_regs *regs,
+					    struct perf_sample_data *data,
+					    u64 sample_type, u64 ip,
+					    u64 tsc, u16 retire)
+{
+	/* The ip in basic is EventingIP */
+	set_linear_ip(regs, ip);
+	regs->flags = PERF_EFLAGS_EXACT;
+	setup_pebs_time(event, data, tsc);
+
+	if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT)
+		data->weight.var3_w = retire;
+}
+
+static inline void __setup_pebs_gpr_group(struct perf_event *event,
+					  struct pt_regs *regs,
+					  struct pebs_gprs *gprs,
+					  u64 sample_type)
+{
+	if (event->attr.precise_ip < 2) {
+		set_linear_ip(regs, gprs->ip);
+		regs->flags &= ~PERF_EFLAGS_EXACT;
+	}
+
+	if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER))
+		adaptive_pebs_save_regs(regs, gprs);
+}
+
+static inline void __setup_pebs_meminfo_group(struct perf_event *event,
+					      struct perf_sample_data *data,
+					      u64 sample_type, u64 latency,
+					      u16 instr_latency, u64 address,
+					      u64 aux, u64 tsx_tuning, u64 ax)
+{
+	if (sample_type & PERF_SAMPLE_WEIGHT_TYPE) {
+		u64 tsx_latency = intel_get_tsx_weight(tsx_tuning);
+
+		data->weight.var2_w = instr_latency;
+
+		/*
+		 * Although meminfo::latency is defined as a u64,
+		 * only the lower 32 bits include the valid data
+		 * in practice on Ice Lake and earlier platforms.
+		 */
+		if (sample_type & PERF_SAMPLE_WEIGHT)
+			data->weight.full = latency ?: tsx_latency;
+		else
+			data->weight.var1_dw = (u32)latency ?: tsx_latency;
+
+		data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
+	}
+
+	if (sample_type & PERF_SAMPLE_DATA_SRC) {
+		data->data_src.val = get_data_src(event, aux);
+		data->sample_flags |= PERF_SAMPLE_DATA_SRC;
+	}
+
+	if (sample_type & PERF_SAMPLE_ADDR_TYPE) {
+		data->addr = address;
+		data->sample_flags |= PERF_SAMPLE_ADDR;
+	}
+
+	if (sample_type & PERF_SAMPLE_TRANSACTION) {
+		data->txn = intel_get_tsx_transaction(tsx_tuning, ax);
+		data->sample_flags |= PERF_SAMPLE_TRANSACTION;
+	}
+}
+
 /*
  * With adaptive PEBS the layout depends on what fields are configured.
  */
@@ -2082,12 +2167,14 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
 					    struct pt_regs *regs)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	u64 sample_type = event->attr.sample_type;
 	struct pebs_basic *basic = __pebs;
 	void *next_record = basic + 1;
-	u64 sample_type, format_group;
 	struct pebs_meminfo *meminfo = NULL;
 	struct pebs_gprs *gprs = NULL;
 	struct x86_perf_regs *perf_regs;
+	u64 format_group;
+	u16 retire;
 
 	if (basic == NULL)
 		return;
@@ -2095,32 +2182,17 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
 	perf_regs = container_of(regs, struct x86_perf_regs, regs);
 	perf_regs->xmm_regs = NULL;
 
-	sample_type = event->attr.sample_type;
 	format_group = basic->format_group;
-	perf_sample_data_init(data, 0, event->hw.last_period);
-	data->period = event->hw.last_period;
 
-	setup_pebs_time(event, data, basic->tsc);
-
-	/*
-	 * We must however always use iregs for the unwinder to stay sane; the
-	 * record BP,SP,IP can point into thin air when the record is from a
-	 * previous PMI context or an (I)RET happened between the record and
-	 * PMI.
-	 */
-	perf_sample_save_callchain(data, event, iregs);
+	__setup_perf_sample_data(event, iregs, data);
 
 	*regs = *iregs;
-	/* The ip in basic is EventingIP */
-	set_linear_ip(regs, basic->ip);
-	regs->flags = PERF_EFLAGS_EXACT;
 
-	if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT) {
-		if (x86_pmu.flags & PMU_FL_RETIRE_LATENCY)
-			data->weight.var3_w = basic->retire_latency;
-		else
-			data->weight.var3_w = 0;
-	}
+	/* basic group */
+	retire = x86_pmu.flags & PMU_FL_RETIRE_LATENCY ?
+		 basic->retire_latency : 0;
+	__setup_pebs_basic_group(event, regs, data, sample_type,
+				 basic->ip, basic->tsc, retire);
 
 	/*
 	 * The record for MEMINFO is in front of GP
@@ -2136,54 +2208,20 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
 		gprs = next_record;
 		next_record = gprs + 1;
 
-		if (event->attr.precise_ip < 2) {
-			set_linear_ip(regs, gprs->ip);
-			regs->flags &= ~PERF_EFLAGS_EXACT;
-		}
-
-		if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER))
-			adaptive_pebs_save_regs(regs, gprs);
+		__setup_pebs_gpr_group(event, regs, gprs, sample_type);
 	}
 
 	if (format_group & PEBS_DATACFG_MEMINFO) {
-		if (sample_type & PERF_SAMPLE_WEIGHT_TYPE) {
-			u64 latency = x86_pmu.flags & PMU_FL_INSTR_LATENCY ?
-				      meminfo->cache_latency : meminfo->mem_latency;
-
-			if (x86_pmu.flags & PMU_FL_INSTR_LATENCY)
-				data->weight.var2_w = meminfo->instr_latency;
-
-			/*
-			 * Although meminfo::latency is defined as a u64,
-			 * only the lower 32 bits include the valid data
-			 * in practice on Ice Lake and earlier platforms.
-			 */
-			if (sample_type & PERF_SAMPLE_WEIGHT) {
-				data->weight.full = latency ?:
-					intel_get_tsx_weight(meminfo->tsx_tuning);
-			} else {
-				data->weight.var1_dw = (u32)latency ?:
-					intel_get_tsx_weight(meminfo->tsx_tuning);
-			}
-
-			data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
-		}
-
-		if (sample_type & PERF_SAMPLE_DATA_SRC) {
-			data->data_src.val = get_data_src(event, meminfo->aux);
-			data->sample_flags |= PERF_SAMPLE_DATA_SRC;
-		}
-
-		if (sample_type & PERF_SAMPLE_ADDR_TYPE) {
-			data->addr = meminfo->address;
-			data->sample_flags |= PERF_SAMPLE_ADDR;
-		}
-
-		if (sample_type & PERF_SAMPLE_TRANSACTION) {
-			data->txn = intel_get_tsx_transaction(meminfo->tsx_tuning,
-							      gprs ? gprs->ax : 0);
-			data->sample_flags |= PERF_SAMPLE_TRANSACTION;
-		}
+		u64 latency = x86_pmu.flags & PMU_FL_INSTR_LATENCY ?
+			      meminfo->cache_latency : meminfo->mem_latency;
+		u64 instr_latency = x86_pmu.flags & PMU_FL_INSTR_LATENCY ?
+				    meminfo->instr_latency : 0;
+		u64 ax = gprs ? gprs->ax : 0;
+
+		__setup_pebs_meminfo_group(event, data, sample_type, latency,
+					   instr_latency, meminfo->address,
+					   meminfo->aux, meminfo->tsx_tuning,
+					   ax);
 	}
 
 	if (format_group & PEBS_DATACFG_XMMS) {
-- 
2.40.1

From nobody Fri Dec 19 16:05:42 2025
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
	Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Dapeng Mi, Dapeng Mi
Subject: [Patch v3 10/22] perf/x86/intel: Process arch-PEBS records or record fragments
Date: Tue, 15 Apr 2025 11:44:16 +0000
Message-Id: <20250415114428.341182-11-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>

A significant difference from adaptive PEBS is that an arch-PEBS record
supports fragments: a record can be split into several independent
fragments, each of which carries its own arch-PEBS header.

This patch defines the architectural PEBS record layout structures and
adds helpers to process arch-PEBS records or fragments. Only legacy PEBS
groups like the basic, GPR, XMM and LBR groups are supported in this
patch; capturing the newly added YMM/ZMM/OPMASK vector registers will be
supported in subsequent patches.
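As a rough illustration of the fragment walk (a minimal sketch, not part
of the kernel code; struct frag_hdr below is a simplified stand-in for
struct arch_pebs_header, keeping only the size and continue fields the
walk needs):

	#include <stdint.h>

	/*
	 * Simplified stand-in for struct arch_pebs_header: size occupies
	 * bits 15:0 and the continue bit is bit 31, matching the layout
	 * added by this patch; everything else is collapsed into rsvd/rest.
	 */
	struct frag_hdr {
		uint64_t size : 16;	/* whole fragment size, header included */
		uint64_t rsvd : 15;
		uint64_t cont : 1;	/* set: another fragment of this record follows */
		uint64_t rest : 32;
	};

	/* Advance past one logical record, i.e. all of its fragments. */
	static void *skip_record(void *at)
	{
		unsigned char *p = at;
		struct frag_hdr *h = at;

		while (h->cont) {
			p += h->size;		/* step to the next fragment */
			h = (struct frag_hdr *)p;
		}
		return p + h->size;		/* skip the last fragment too */
	}

The real drain loop below additionally treats a null record (format bits
63:16 all zero) as a continuation, see arch_pebs_record_continued().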
Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/core.c      |  15 ++-
 arch/x86/events/intel/ds.c        | 180 ++++++++++++++++++++++++++++++
 arch/x86/include/asm/msr-index.h  |   6 +
 arch/x86/include/asm/perf_event.h | 100 +++++++++++++++++
 4 files changed, 300 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 09e2a23f9bcc..0f911e974e02 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3216,6 +3216,19 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 		status &= ~GLOBAL_STATUS_PERF_METRICS_OVF_BIT;
 	}
 
+	/*
+	 * Arch PEBS sets bit 54 in the global status register
+	 */
+	if (__test_and_clear_bit(GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT,
+				 (unsigned long *)&status)) {
+		handled++;
+		static_call(x86_pmu_drain_pebs)(regs, &data);
+
+		if (cpuc->events[INTEL_PMC_IDX_FIXED_SLOTS] &&
+		    is_pebs_counter_event_group(cpuc->events[INTEL_PMC_IDX_FIXED_SLOTS]))
+			status &= ~GLOBAL_STATUS_PERF_METRICS_OVF_BIT;
+	}
+
 	/*
 	 * Intel PT
 	 */
@@ -3270,7 +3283,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 		 * The PEBS buffer has to be drained before handling the A-PMI
 		 */
 		if (is_pebs_counter_event_group(event))
-			x86_pmu.drain_pebs(regs, &data);
+			static_call(x86_pmu_drain_pebs)(regs, &data);
 
 		last_period = event->hw.last_period;
 
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 6c872bf2e916..ed0bccb04b95 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2272,6 +2272,114 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
 		format_group);
 }
 
+static inline bool arch_pebs_record_continued(struct arch_pebs_header *header)
+{
+	/* Continue bit or null PEBS record indicates fragment follows. */
+	return header->cont || !(header->format & GENMASK_ULL(63, 16));
+}
+
+static void setup_arch_pebs_sample_data(struct perf_event *event,
+					struct pt_regs *iregs, void *__pebs,
+					struct perf_sample_data *data,
+					struct pt_regs *regs)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	u64 sample_type = event->attr.sample_type;
+	struct arch_pebs_header *header = NULL;
+	struct arch_pebs_aux *meminfo = NULL;
+	struct arch_pebs_gprs *gprs = NULL;
+	struct x86_perf_regs *perf_regs;
+	void *next_record;
+	void *at = __pebs;
+
+	if (at == NULL)
+		return;
+
+	perf_regs = container_of(regs, struct x86_perf_regs, regs);
+	perf_regs->xmm_regs = NULL;
+
+	__setup_perf_sample_data(event, iregs, data);
+
+	*regs = *iregs;
+
+again:
+	header = at;
+	next_record = at + sizeof(struct arch_pebs_header);
+	if (header->basic) {
+		struct arch_pebs_basic *basic = next_record;
+		u16 retire = 0;
+
+		next_record = basic + 1;
+
+		if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT)
+			retire = basic->valid ? basic->retire : 0;
+		__setup_pebs_basic_group(event, regs, data, sample_type,
+					 basic->ip, basic->tsc, retire);
+	}
+
+	/*
+	 * The record for MEMINFO is in front of GP
+	 * But PERF_SAMPLE_TRANSACTION needs gprs->ax.
+	 * Save the pointer here but process later.
+	 */
+	if (header->aux) {
+		meminfo = next_record;
+		next_record = meminfo + 1;
+	}
+
+	if (header->gpr) {
+		gprs = next_record;
+		next_record = gprs + 1;
+
+		__setup_pebs_gpr_group(event, regs, (struct pebs_gprs *)gprs,
+				       sample_type);
+	}
+
+	if (header->aux) {
+		u64 ax = gprs ? gprs->ax : 0;
+
+		__setup_pebs_meminfo_group(event, data, sample_type,
+					   meminfo->cache_latency,
+					   meminfo->instr_latency,
+					   meminfo->address, meminfo->aux,
+					   meminfo->tsx_tuning, ax);
+	}
+
+	if (header->xmm) {
+		struct arch_pebs_xmm *xmm;
+
+		next_record += sizeof(struct arch_pebs_xer_header);
+
+		xmm = next_record;
+		perf_regs->xmm_regs = xmm->xmm;
+		next_record = xmm + 1;
+	}
+
+	if (header->lbr) {
+		struct arch_pebs_lbr_header *lbr_header = next_record;
+		struct lbr_entry *lbr;
+		int num_lbr;
+
+		next_record = lbr_header + 1;
+		lbr = next_record;
+
+		num_lbr = header->lbr == ARCH_PEBS_LBR_NUM_VAR ? lbr_header->depth :
+				header->lbr * ARCH_PEBS_BASE_LBR_ENTRIES;
+		next_record += num_lbr * sizeof(struct lbr_entry);
+
+		if (has_branch_stack(event)) {
+			intel_pmu_store_pebs_lbrs(lbr);
+			intel_pmu_lbr_save_brstack(data, cpuc, event);
+		}
+	}
+
+	/* Parse followed fragments if there are. */
+	if (arch_pebs_record_continued(header)) {
+		at = at + header->size;
+		goto again;
+	}
+}
+
 static inline void *
 get_next_pebs_record_by_bit(void *base, void *top, int bit)
 {
@@ -2735,6 +2843,77 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_data *data)
 					    setup_pebs_adaptive_sample_data);
 }
 
+static void intel_pmu_drain_arch_pebs(struct pt_regs *iregs,
+				      struct perf_sample_data *data)
+{
+	short counts[INTEL_PMC_IDX_FIXED + MAX_FIXED_PEBS_EVENTS] = {};
+	void *last[INTEL_PMC_IDX_FIXED + MAX_FIXED_PEBS_EVENTS];
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	union arch_pebs_index index;
+	struct x86_perf_regs perf_regs;
+	struct pt_regs *regs = &perf_regs.regs;
+	void *base, *at, *top;
+	u64 mask;
+
+	rdmsrl(MSR_IA32_PEBS_INDEX, index.full);
+
+	if (unlikely(!index.split.wr)) {
+		intel_pmu_pebs_event_update_no_drain(cpuc, X86_PMC_IDX_MAX);
+		return;
+	}
+
+	base = cpuc->ds_pebs_vaddr;
+	top = (void *)((u64)cpuc->ds_pebs_vaddr +
+		       (index.split.wr << ARCH_PEBS_INDEX_WR_SHIFT));
+
+	mask = hybrid(cpuc->pmu, arch_pebs_cap).counters & cpuc->pebs_enabled;
+
+	if (!iregs)
+		iregs = &dummy_iregs;
+
+	/* Process all but the last event for each counter. */
+	for (at = base; at < top;) {
+		struct arch_pebs_header *header;
+		struct arch_pebs_basic *basic;
+		u64 pebs_status;
+
+		header = at;
+
+		if (WARN_ON_ONCE(!header->size))
+			break;
+
+		/* 1st fragment or single record must have basic group */
+		if (!header->basic) {
+			at += header->size;
+			continue;
+		}
+
+		basic = at + sizeof(struct arch_pebs_header);
+		pebs_status = mask & basic->applicable_counters;
+		__intel_pmu_handle_pebs_record(iregs, regs, data, at,
+					       pebs_status, counts, last,
+					       setup_arch_pebs_sample_data);
+
+		/* Skip non-last fragments */
+		while (arch_pebs_record_continued(header)) {
+			if (!header->size)
+				break;
+			at += header->size;
+			header = at;
+		}
+
+		/* Skip last fragment or the single record */
+		at += header->size;
+	}
+
+	__intel_pmu_handle_last_pebs_record(iregs, regs, data, mask, counts,
+					    last, setup_arch_pebs_sample_data);
+
+	index.split.wr = 0;
+	index.split.full = 0;
+	wrmsrl(MSR_IA32_PEBS_INDEX, index.full);
+}
+
 static void __init intel_arch_pebs_init(void)
 {
 	/*
@@ -2744,6 +2923,7 @@ static void __init intel_arch_pebs_init(void)
 	 */
 	x86_pmu.arch_pebs = 1;
 	x86_pmu.pebs_buffer_size = PEBS_BUFFER_SIZE;
+	x86_pmu.drain_pebs = intel_pmu_drain_arch_pebs;
 	x86_pmu.pebs_capable = ~0ULL;
 
 	x86_pmu.pebs_enable = __intel_pmu_pebs_enable;
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 53da787b9326..d77048df8e72 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -314,6 +314,12 @@
 #define PERF_CAP_PEBS_MASK	(PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
				 PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE)
 
+/* Arch PEBS */
+#define MSR_IA32_PEBS_BASE		0x000003f4
+#define MSR_IA32_PEBS_INDEX		0x000003f5
+#define ARCH_PEBS_OFFSET_MASK		0x7fffff
+#define ARCH_PEBS_INDEX_WR_SHIFT	4
+
 #define MSR_IA32_RTIT_CTL		0x00000570
 #define RTIT_CTL_TRACEEN		BIT(0)
 #define RTIT_CTL_CYCLEACC		BIT(1)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 7fca9494aae9..7f9d8e6577f0 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -433,6 +433,8 @@ static inline bool is_topdown_idx(int idx)
 #define GLOBAL_STATUS_LBRS_FROZEN		BIT_ULL(GLOBAL_STATUS_LBRS_FROZEN_BIT)
 #define GLOBAL_STATUS_TRACE_TOPAPMI_BIT		55
 #define GLOBAL_STATUS_TRACE_TOPAPMI		BIT_ULL(GLOBAL_STATUS_TRACE_TOPAPMI_BIT)
+#define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT	54
+#define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD	BIT_ULL(GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT)
 #define GLOBAL_STATUS_PERF_METRICS_OVF_BIT	48
 
 #define GLOBAL_CTRL_EN_PERF_METRICS		48
@@ -503,6 +505,104 @@ struct pebs_cntr_header {
 
 #define INTEL_CNTR_METRICS		0x3
 
+/*
+ * Arch PEBS
+ */
+union arch_pebs_index {
+	struct {
+		u64 rsvd:4,
+		    wr:23,
+		    rsvd2:4,
+		    full:1,
+		    en:1,
+		    rsvd3:3,
+		    thresh:23,
+		    rsvd4:5;
+	} split;
+	u64 full;
+};
+
+struct arch_pebs_header {
+	union {
+		u64 format;
+		struct {
+			u64 size:16,	/* Record size */
+			    rsvd:14,
+			    mode:1,	/* 64BIT_MODE */
+			    cont:1,
+			    rsvd2:3,
+			    cntr:5,
+			    lbr:2,
+			    rsvd3:7,
+			    xmm:1,
+			    ymmh:1,
+			    rsvd4:2,
+			    opmask:1,
+			    zmmh:1,
+			    h16zmm:1,
+			    rsvd5:5,
+			    gpr:1,
+			    aux:1,
+			    basic:1;
+		};
+	};
+	u64 rsvd6;
+};
+
+struct arch_pebs_basic {
+	u64 ip;
+	u64 applicable_counters;
+	u64 tsc;
+	u64 retire	:16,	/* Retire Latency */
+	    valid	:1,
+	    rsvd	:47;
+	u64 rsvd2;
+	u64 rsvd3;
+};
+
+struct arch_pebs_aux {
+	u64 address;
+	u64 rsvd;
+	u64 rsvd2;
+	u64 rsvd3;
+	u64 rsvd4;
+	u64 aux;
+	u64 instr_latency	:16,
+	    pad2		:16,
+	    cache_latency	:16,
+	    pad3		:16;
tsx_tuning; +}; + +struct arch_pebs_gprs { + u64 flags, ip, ax, cx, dx, bx, sp, bp, si, di; + u64 r8, r9, r10, r11, r12, r13, r14, r15, ssp; + u64 rsvd; +}; + +struct arch_pebs_xer_header { + u64 xstate; + u64 rsvd; +}; + +struct arch_pebs_xmm { + u64 xmm[16*2]; /* two entries for each register */ +}; + +#define ARCH_PEBS_LBR_NAN 0x0 +#define ARCH_PEBS_LBR_NUM_8 0x1 +#define ARCH_PEBS_LBR_NUM_16 0x2 +#define ARCH_PEBS_LBR_NUM_VAR 0x3 +#define ARCH_PEBS_BASE_LBR_ENTRIES 8 +struct arch_pebs_lbr_header { + u64 rsvd; + u64 ctl; + u64 depth; + u64 ler_from; + u64 ler_to; + u64 ler_info; +}; + /* * AMD Extended Performance Monitoring and Debug cpuid feature detection */ --=20 2.40.1 From nobody Fri Dec 19 16:05:42 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 72DC6289356; Tue, 15 Apr 2025 08:23:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.21 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744705432; cv=none; b=FR2s4b9SczuCu9jgieMQlCzUrXSIwYZ/5D5SC7rpdoSKIZLeqNIk7bzUvsL342hAZd4hFeZXKUjV/8SPEvCGNkFaUkIVtA/FCVWQcLgSBrn6JHfqWm8q2P+n6D/CJ5mD0ZmItpmd8x9vVomp8X2XHQIAmTmWdapUhn1UQiFEQrM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744705432; c=relaxed/simple; bh=HUTZPECpWhsPs2UA980gXyymbBaboD8UG/qcrOuGyLA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=olqbCSo9cxGOEQREr6SroyS+UpWNnIdDgNIGgP7O//WFSAL2Geq1jR/8aT31ayS9Uv4h9ohoPqu755SPREcC9IEWQe9TeKZYXZwDODdeWpeY8J6fumzCV4DJHO5pOaRV394NKCjNU/rIlcT+bg+QbIjlTKCGSJiYheRngTbNwoA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=RTGlx2FO; arc=none smtp.client-ip=198.175.65.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="RTGlx2FO" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1744705431; x=1776241431; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=HUTZPECpWhsPs2UA980gXyymbBaboD8UG/qcrOuGyLA=; b=RTGlx2FOdkUA6hLIlBNRPETF9CW2Rv+42HZZmwrMZY74+plkGLIDXfLA fQDavf2CgbVxM3X2LpAt2XS9Y4/sIHX31ldIfOQO2L2ZRHas1e1P5vCcF 0Jv4msOBQD4HvUzh/rvuBAPus2PbB7YLKvAmLv4jQzGm+oKqUh++8rxC3 Kn5M5ERyHrpErH9fN9Ia/mAOntIzQS1IyopY0dQKb6S2cJtx6EaMJE2Ov tlhSzXwy6x3X1kFnRCyvcxyUWHXRs/az43eoXCWUIrYqbbKRKTmajmIi2 +/h8uWHA2lm75V0ZUC8C4G00njsZbqTgXCcaay0At7VzgJ+ZAngNfWETe A==; X-CSE-ConnectionGUID: jfF1EftPQTeNMXcHWWUpGQ== X-CSE-MsgGUID: 0WxtPk86Tgi7Q1eM0xvWLw== X-IronPort-AV: E=McAfee;i="6700,10204,11403"; a="46116014" X-IronPort-AV: E=Sophos;i="6.15,213,1739865600"; d="scan'208";a="46116014" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by orvoesa113.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Apr 2025 01:23:50 -0700 X-CSE-ConnectionGUID: F0aH9UugRzWXWC8rHWnqtw== X-CSE-MsgGUID: OIVObpRCR+SADueXGKk8GQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.15,213,1739865600"; 
d="scan'208";a="130055610" Received: from emr.sh.intel.com ([10.112.229.56]) by fmviesa007.fm.intel.com with ESMTP; 15 Apr 2025 01:23:46 -0700 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Ian Rogers , Adrian Hunter , Alexander Shishkin , Kan Liang , Andi Kleen , Eranian Stephane Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi , Dapeng Mi Subject: [Patch v3 11/22] perf/x86/intel: Allocate arch-PEBS buffer and initialize PEBS_BASE MSR Date: Tue, 15 Apr 2025 11:44:17 +0000 Message-Id: <20250415114428.341182-12-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com> References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Arch-PEBS introduces a new MSR IA32_PEBS_BASE to store the arch-PEBS buffer physical address. This patch allocates arch-PEBS buffer and then initialize IA32_PEBS_BASE MSR with the buffer physical address. Co-developed-by: Kan Liang Signed-off-by: Kan Liang Signed-off-by: Dapeng Mi --- arch/x86/events/intel/core.c | 2 + arch/x86/events/intel/ds.c | 69 ++++++++++++++++++++++++++------- arch/x86/events/perf_event.h | 7 +++- arch/x86/include/asm/intel_ds.h | 3 +- 4 files changed, 66 insertions(+), 15 deletions(-) diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index 0f911e974e02..e0be6be50936 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -5448,6 +5448,7 @@ static void intel_pmu_cpu_starting(int cpu) return; =20 init_debug_store_on_cpu(cpu); + init_arch_pebs_buf_on_cpu(cpu); /* * Deal with CPUs that don't clear their LBRs on power-up, and that may * even boot with LBRs enabled. @@ -5545,6 +5546,7 @@ static void free_excl_cntrs(struct cpu_hw_events *cpu= c) static void intel_pmu_cpu_dying(int cpu) { fini_debug_store_on_cpu(cpu); + fini_arch_pebs_buf_on_cpu(cpu); } =20 void intel_cpuc_finish(struct cpu_hw_events *cpuc) diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index ed0bccb04b95..7437a52ba5f0 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -624,13 +624,18 @@ static int alloc_pebs_buffer(int cpu) int max, node =3D cpu_to_node(cpu); void *buffer, *insn_buff, *cea; =20 - if (!x86_pmu.ds_pebs) + if (!intel_pmu_has_pebs()) return 0; =20 - buffer =3D dsalloc_pages(bsiz, GFP_KERNEL, cpu); + buffer =3D dsalloc_pages(bsiz, preemptible() ? GFP_KERNEL : GFP_ATOMIC, c= pu); if (unlikely(!buffer)) return -ENOMEM; =20 + if (x86_pmu.arch_pebs) { + hwev->pebs_vaddr =3D buffer; + return 0; + } + /* * HSW+ already provides us the eventing ip; no need to allocate this * buffer then. 
@@ -643,7 +648,7 @@ static int alloc_pebs_buffer(int cpu)
 		}
 		per_cpu(insn_buffer, cpu) = insn_buff;
 	}
-	hwev->ds_pebs_vaddr = buffer;
+	hwev->pebs_vaddr = buffer;
 	/* Update the cpu entry area mapping */
 	cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.pebs_buffer;
 	ds->pebs_buffer_base = (unsigned long) cea;
@@ -659,17 +664,20 @@ static void release_pebs_buffer(int cpu)
 	struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu);
 	void *cea;
 
-	if (!x86_pmu.ds_pebs)
+	if (!intel_pmu_has_pebs())
 		return;
 
-	kfree(per_cpu(insn_buffer, cpu));
-	per_cpu(insn_buffer, cpu) = NULL;
+	if (x86_pmu.ds_pebs) {
+		kfree(per_cpu(insn_buffer, cpu));
+		per_cpu(insn_buffer, cpu) = NULL;
 
-	/* Clear the fixmap */
-	cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.pebs_buffer;
-	ds_clear_cea(cea, x86_pmu.pebs_buffer_size);
-	dsfree_pages(hwev->ds_pebs_vaddr, x86_pmu.pebs_buffer_size);
-	hwev->ds_pebs_vaddr = NULL;
+		/* Clear the fixmap */
+		cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.pebs_buffer;
+		ds_clear_cea(cea, x86_pmu.pebs_buffer_size);
+	}
+
+	dsfree_pages(hwev->pebs_vaddr, x86_pmu.pebs_buffer_size);
+	hwev->pebs_vaddr = NULL;
 }
 
 static int alloc_bts_buffer(int cpu)
@@ -822,6 +830,41 @@ void reserve_ds_buffers(void)
 	}
 }
 
+void init_arch_pebs_buf_on_cpu(int cpu)
+{
+	struct cpu_hw_events *cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
+	u64 arch_pebs_base;
+
+	if (!x86_pmu.arch_pebs)
+		return;
+
+	if (alloc_pebs_buffer(cpu) < 0 || !cpuc->pebs_vaddr) {
+		WARN(1, "Fail to allocate PEBS buffer on CPU %d\n", cpu);
+		x86_pmu.pebs_active = 0;
+		return;
+	}
+
+	/*
+	 * 4KB-aligned pointer of the output buffer
+	 * (__alloc_pages_node() return page aligned address)
+	 * Buffer Size = 4KB * 2^SIZE
+	 * contiguous physical buffer (__alloc_pages_node() with order)
+	 */
+	arch_pebs_base = virt_to_phys(cpuc->pebs_vaddr) | PEBS_BUFFER_SHIFT;
+	wrmsr_on_cpu(cpu, MSR_IA32_PEBS_BASE, (u32)arch_pebs_base,
+		     (u32)(arch_pebs_base >> 32));
+	x86_pmu.pebs_active = 1;
+}
+
+void fini_arch_pebs_buf_on_cpu(int cpu)
+{
+	if (!x86_pmu.arch_pebs)
+		return;
+
+	release_pebs_buffer(cpu);
+	wrmsr_on_cpu(cpu, MSR_IA32_PEBS_BASE, 0, 0);
+}
+
 /*
  * BTS
  */
@@ -2862,8 +2905,8 @@ static void intel_pmu_drain_arch_pebs(struct pt_regs *iregs,
 		return;
 	}
 
-	base = cpuc->ds_pebs_vaddr;
-	top = (void *)((u64)cpuc->ds_pebs_vaddr +
+	base = cpuc->pebs_vaddr;
+	top = (void *)((u64)cpuc->pebs_vaddr +
 		       (index.split.wr << ARCH_PEBS_INDEX_WR_SHIFT));
 
 	mask = hybrid(cpuc->pmu, arch_pebs_cap).counters & cpuc->pebs_enabled;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 23ffad67a927..d93d4c7a9876 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -275,8 +275,9 @@ struct cpu_hw_events {
 	 * Intel DebugStore bits
 	 */
 	struct debug_store	*ds;
-	void			*ds_pebs_vaddr;
 	void			*ds_bts_vaddr;
+	/* DS based PEBS or arch-PEBS buffer address */
+	void			*pebs_vaddr;
 	u64			pebs_enabled;
 	int			n_pebs;
 	int			n_large_pebs;
@@ -1610,6 +1611,10 @@ extern void intel_cpuc_finish(struct cpu_hw_events *cpuc);
 
 int intel_pmu_init(void);
 
+void init_arch_pebs_buf_on_cpu(int cpu);
+
+void fini_arch_pebs_buf_on_cpu(int cpu);
+
 void init_debug_store_on_cpu(int cpu);
 
 void fini_debug_store_on_cpu(int cpu);
diff --git a/arch/x86/include/asm/intel_ds.h b/arch/x86/include/asm/intel_ds.h
index 5dbeac48a5b9..023c2883f9f3 100644
--- a/arch/x86/include/asm/intel_ds.h
+++ b/arch/x86/include/asm/intel_ds.h
@@ -4,7 +4,8 @@
 #include <linux/percpu-defs.h>
 
 #define BTS_BUFFER_SIZE		(PAGE_SIZE << 4)
-#define PEBS_BUFFER_SIZE	(PAGE_SIZE << 4)
+#define PEBS_BUFFER_SHIFT	4
+#define PEBS_BUFFER_SIZE	(PAGE_SIZE << PEBS_BUFFER_SHIFT)
 
 /* The maximal number of PEBS events: */
 #define MAX_PEBS_EVENTS_FMT4	8
 #define MAX_PEBS_EVENTS		32
-- 
2.40.1

From nobody Fri Dec 19 16:05:42 2025
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
	Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Dapeng Mi, Dapeng Mi
Subject: [Patch v3 12/22] perf/x86/intel: Update dyn_constraint based on PEBS event precise level
Date: Tue, 15 Apr 2025 11:44:18 +0000
Message-Id: <20250415114428.341182-13-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>

Arch-PEBS provides CPUID leaves to enumerate which counters support PEBS
sampling and precise-distribution PEBS sampling. Thus the PEBS
constraints should be configured dynamically, based on these counter and
precise-distribution bitmaps, instead of being defined statically.

Update the event's dyn_constraint based on the PEBS event's precise
level.

Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/core.c | 9 +++++++++
 arch/x86/events/intel/ds.c   | 1 +
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index e0be6be50936..265b5e4baf73 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4252,6 +4252,8 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	}
 
 	if (event->attr.precise_ip) {
+		struct arch_pebs_cap pebs_cap = hybrid(event->pmu, arch_pebs_cap);
+
 		if ((event->attr.config & INTEL_ARCH_EVENT_MASK) == INTEL_FIXED_VLBR_EVENT)
 			return -EINVAL;
 
@@ -4265,6 +4267,13 @@ static int intel_pmu_hw_config(struct perf_event *event)
 		}
 		if (x86_pmu.pebs_aliases)
 			x86_pmu.pebs_aliases(event);
+
+		if (x86_pmu.arch_pebs) {
+			u64 cntr_mask = event->attr.precise_ip >= 3 ?
+					pebs_cap.pdists : pebs_cap.counters;
+			if (cntr_mask != hybrid(event->pmu, intel_ctrl))
+				event->hw.dyn_constraint = cntr_mask;
+		}
 	}
 
 	if (needs_branch_stack(event)) {
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 7437a52ba5f0..757d97c05d8f 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2968,6 +2968,7 @@ static void __init intel_arch_pebs_init(void)
 	x86_pmu.pebs_buffer_size = PEBS_BUFFER_SIZE;
 	x86_pmu.drain_pebs = intel_pmu_drain_arch_pebs;
 	x86_pmu.pebs_capable = ~0ULL;
+	x86_pmu.flags |= PMU_FL_PEBS_ALL;
 
 	x86_pmu.pebs_enable = __intel_pmu_pebs_enable;
 	x86_pmu.pebs_disable = __intel_pmu_pebs_disable;
-- 
2.40.1

From nobody Fri Dec 19 16:05:43 2025
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
	Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Dapeng Mi, Dapeng Mi
Subject: [Patch v3 13/22] perf/x86/intel: Setup PEBS data configuration and enable legacy groups
Date: Tue, 15 Apr 2025 11:44:19 +0000
Message-Id: <20250415114428.341182-14-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>

Unlike legacy PEBS, arch-PEBS provides per-counter PEBS data
configuration by programming the IA32_PMC_GPx/FXx_CFG_C MSRs.

This patch obtains the PEBS data configuration from the event attributes
and then writes it to the IA32_PMC_GPx/FXx_CFG_C MSRs, enabling the
corresponding PEBS groups.
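As a rough sketch of the mapping this patch implements (a hypothetical
standalone helper mirroring the PEBS_DATACFG_* to ARCH_PEBS_* translation
added by this series; the macro values below restate the definitions from
this patch and the existing adaptive-PEBS data-config bits):

	#define ARCH_PEBS_EN		(1ULL << 63)
	#define ARCH_PEBS_AUX		(1ULL << 62)
	#define ARCH_PEBS_GPR		(1ULL << 61)
	#define ARCH_PEBS_VECR_XMM	(1ULL << 49)
	#define ARCH_PEBS_LBR		(3ULL << 40)

	#define PEBS_DATACFG_MEMINFO	(1ULL << 0)
	#define PEBS_DATACFG_GP		(1ULL << 1)
	#define PEBS_DATACFG_XMMS	(1ULL << 2)
	#define PEBS_DATACFG_LBRS	(1ULL << 3)

	/* Translate an adaptive-PEBS data config into a CFG_C value,
	 * masked by the CPU's enumerated arch-PEBS capabilities. */
	static unsigned long long cfg_c_value(unsigned long long pebs_data_cfg,
					      unsigned long long caps)
	{
		unsigned long long ext = ARCH_PEBS_EN;

		if (pebs_data_cfg & PEBS_DATACFG_MEMINFO)
			ext |= ARCH_PEBS_AUX & caps;
		if (pebs_data_cfg & PEBS_DATACFG_GP)
			ext |= ARCH_PEBS_GPR & caps;
		if (pebs_data_cfg & PEBS_DATACFG_XMMS)
			ext |= ARCH_PEBS_VECR_XMM & caps;
		if (pebs_data_cfg & PEBS_DATACFG_LBRS)
			ext |= ARCH_PEBS_LBR & caps;

		return ext;	/* value written to IA32_PMC_GPx/FXx_CFG_C */
	}

The actual kernel code additionally caches the last written CFG_C value
per counter to avoid redundant WRMSRs, see intel_pmu_enable_event_ext()
below.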
Co-developed-by: Kan Liang
Signed-off-by: Kan Liang
Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/core.c     | 127 +++++++++++++++++++++++++++++++
 arch/x86/events/intel/ds.c       |  17 +++++
 arch/x86/events/perf_event.h     |  12 +++
 arch/x86/include/asm/intel_ds.h  |   7 ++
 arch/x86/include/asm/msr-index.h |   8 ++
 5 files changed, 171 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 265b5e4baf73..ae7f5dfee041 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2562,6 +2562,39 @@ static void intel_pmu_disable_fixed(struct perf_event *event)
 	cpuc->fixed_ctrl_val &= ~mask;
 }
 
+static inline void __intel_pmu_update_event_ext(int idx, u64 ext)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	u32 msr = idx < INTEL_PMC_IDX_FIXED ?
+		  x86_pmu_cfg_c_addr(idx, true) :
+		  x86_pmu_cfg_c_addr(idx - INTEL_PMC_IDX_FIXED, false);
+
+	cpuc->cfg_c_val[idx] = ext;
+	wrmsrl(msr, ext);
+}
+
+static void intel_pmu_disable_event_ext(struct perf_event *event)
+{
+	if (!x86_pmu.arch_pebs)
+		return;
+
+	/*
+	 * Only clear CFG_C MSR for PEBS counter group events,
+	 * it avoids the HW counter's value to be added into
+	 * other PEBS records incorrectly after PEBS counter
+	 * group events are disabled.
+	 *
+	 * For other events, it's unnecessary to clear CFG_C MSRs
+	 * since CFG_C doesn't take effect if counter is in
+	 * disabled state. That helps to reduce the WRMSR overhead
+	 * in context switches.
+	 */
+	if (!is_pebs_counter_event_group(event))
+		return;
+
+	__intel_pmu_update_event_ext(event->hw.idx, 0);
+}
+
 static void intel_pmu_disable_event(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
@@ -2570,9 +2603,12 @@ static void intel_pmu_disable_event(struct perf_event *event)
 	switch (idx) {
 	case 0 ... INTEL_PMC_IDX_FIXED - 1:
 		intel_clear_masks(event, idx);
+		intel_pmu_disable_event_ext(event);
 		x86_pmu_disable_event(event);
 		break;
 	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
+		intel_pmu_disable_event_ext(event);
+		fallthrough;
 	case INTEL_PMC_IDX_METRIC_BASE ... INTEL_PMC_IDX_METRIC_END:
 		intel_pmu_disable_fixed(event);
 		break;
@@ -2941,6 +2977,67 @@ static void intel_pmu_enable_acr(struct perf_event *event)
 
 DEFINE_STATIC_CALL_NULL(intel_pmu_enable_acr_event, intel_pmu_enable_acr);
 
+static void intel_pmu_enable_event_ext(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+	union arch_pebs_index cached, index;
+	struct arch_pebs_cap cap;
+	u64 ext = 0;
+
+	if (!x86_pmu.arch_pebs)
+		return;
+
+	cap = hybrid(cpuc->pmu, arch_pebs_cap);
+
+	if (event->attr.precise_ip) {
+		u64 pebs_data_cfg = intel_get_arch_pebs_data_config(event);
+
+		ext |= ARCH_PEBS_EN;
+		if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD)
+			ext |= (-hwc->sample_period) & ARCH_PEBS_RELOAD;
+
+		if (pebs_data_cfg && cap.caps) {
+			if (pebs_data_cfg & PEBS_DATACFG_MEMINFO)
+				ext |= ARCH_PEBS_AUX & cap.caps;
+
+			if (pebs_data_cfg & PEBS_DATACFG_GP)
+				ext |= ARCH_PEBS_GPR & cap.caps;
+
+			if (pebs_data_cfg & PEBS_DATACFG_XMMS)
+				ext |= ARCH_PEBS_VECR_XMM & cap.caps;
+
+			if (pebs_data_cfg & PEBS_DATACFG_LBRS)
+				ext |= ARCH_PEBS_LBR & cap.caps;
+		}
+
+		if (cpuc->n_pebs == cpuc->n_large_pebs)
+			index.split.thresh = ARCH_PEBS_THRESH_MUL;
+		else
+			index.split.thresh = ARCH_PEBS_THRESH_SINGLE;
+
+		rdmsrl(MSR_IA32_PEBS_INDEX, cached.full);
+		if (index.split.thresh != cached.split.thresh || !cached.split.en) {
+			if (cached.split.thresh == ARCH_PEBS_THRESH_MUL &&
+			    cached.split.wr > 0) {
+				/*
+				 * Large PEBS was enabled.
+				 * Drain PEBS buffer before applying the single PEBS.
+				 */
+				intel_pmu_drain_pebs_buffer();
+			} else {
+				index.split.wr = 0;
+				index.split.full = 0;
+				index.split.en = 1;
+				wrmsrl(MSR_IA32_PEBS_INDEX, index.full);
+			}
+		}
+	}
+
+	if (cpuc->cfg_c_val[hwc->idx] != ext)
+		__intel_pmu_update_event_ext(hwc->idx, ext);
+}
+
 static void intel_pmu_enable_event(struct perf_event *event)
 {
 	u64 enable_mask = ARCH_PERFMON_EVENTSEL_ENABLE;
@@ -2956,10 +3053,12 @@ static void intel_pmu_enable_event(struct perf_event *event)
 			enable_mask |= ARCH_PERFMON_EVENTSEL_BR_CNTR;
 		intel_set_masks(event, idx);
 		static_call_cond(intel_pmu_enable_acr_event)(event);
+		intel_pmu_enable_event_ext(event);
 		__x86_pmu_enable_event(hwc, enable_mask);
 		break;
 	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
 		static_call_cond(intel_pmu_enable_acr_event)(event);
+		intel_pmu_enable_event_ext(event);
 		fallthrough;
 	case INTEL_PMC_IDX_METRIC_BASE ... INTEL_PMC_IDX_METRIC_END:
 		intel_pmu_enable_fixed(event);
@@ -5293,6 +5392,29 @@ static inline bool intel_pmu_broken_perf_cap(void)
 	return false;
 }
 
+static inline void __intel_update_pmu_caps(struct pmu *pmu)
+{
pmu : x86_get_pmu(smp_processor_id()); + + if (hybrid(pmu, arch_pebs_cap).caps & ARCH_PEBS_VECR_XMM) + dest_pmu->capabilities |=3D PERF_PMU_CAP_EXTENDED_REGS; +} + +static inline void __intel_update_large_pebs_flags(struct pmu *pmu) +{ + u64 caps =3D hybrid(pmu, arch_pebs_cap).caps; + + x86_pmu.large_pebs_flags |=3D PERF_SAMPLE_TIME; + if (caps & ARCH_PEBS_LBR) + x86_pmu.large_pebs_flags |=3D PERF_SAMPLE_BRANCH_STACK; + + if (!(caps & ARCH_PEBS_AUX)) + x86_pmu.large_pebs_flags &=3D ~PERF_SAMPLE_DATA_SRC; + if (!(caps & ARCH_PEBS_GPR)) + x86_pmu.large_pebs_flags &=3D + ~(PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER); +} + static void update_pmu_cap(struct pmu *pmu) { unsigned int eax, ebx, ecx, edx; @@ -5333,6 +5455,9 @@ static void update_pmu_cap(struct pmu *pmu) &eax, &ebx, &ecx, &edx); hybrid(pmu, arch_pebs_cap).counters =3D ((u64)ecx << 32) | eax; hybrid(pmu, arch_pebs_cap).pdists =3D ((u64)edx << 32) | ebx; + + __intel_update_pmu_caps(pmu); + __intel_update_large_pebs_flags(pmu); } else { WARN_ON(x86_pmu.arch_pebs =3D=3D 1); x86_pmu.arch_pebs =3D 0; @@ -5496,6 +5621,8 @@ static void intel_pmu_cpu_starting(int cpu) } } =20 + __intel_update_pmu_caps(cpuc->pmu); + if (!cpuc->shared_regs) return; =20 diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 757d97c05d8f..6a138435092d 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1512,6 +1512,18 @@ pebs_update_state(bool needed_cb, struct cpu_hw_even= ts *cpuc, } } =20 +u64 intel_get_arch_pebs_data_config(struct perf_event *event) +{ + u64 pebs_data_cfg =3D 0; + + if (WARN_ON(event->hw.idx < 0 || event->hw.idx >=3D X86_PMC_IDX_MAX)) + return 0; + + pebs_data_cfg |=3D pebs_update_adaptive_cfg(event); + + return pebs_data_cfg; +} + void intel_pmu_pebs_add(struct perf_event *event) { struct cpu_hw_events *cpuc =3D this_cpu_ptr(&cpu_hw_events); @@ -2954,6 +2966,11 @@ static void intel_pmu_drain_arch_pebs(struct pt_regs= *iregs, =20 index.split.wr =3D 0; index.split.full =3D 0; + index.split.en =3D 1; + if (cpuc->n_pebs =3D=3D cpuc->n_large_pebs) + index.split.thresh =3D ARCH_PEBS_THRESH_MUL; + else + index.split.thresh =3D ARCH_PEBS_THRESH_SINGLE; wrmsrl(MSR_IA32_PEBS_INDEX, index.full); } =20 diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index d93d4c7a9876..c6c2ab34e711 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -296,6 +296,8 @@ struct cpu_hw_events { /* Intel ACR configuration */ u64 acr_cfg_b[X86_PMC_IDX_MAX]; u64 acr_cfg_c[X86_PMC_IDX_MAX]; + /* Cached CFG_C values */ + u64 cfg_c_val[X86_PMC_IDX_MAX]; =20 /* * Intel LBR bits @@ -1208,6 +1210,14 @@ static inline unsigned int x86_pmu_fixed_ctr_addr(in= t index) x86_pmu.addr_offset(index, false) : index); } =20 +static inline unsigned int x86_pmu_cfg_c_addr(int index, bool gp) +{ + u32 base =3D gp ? MSR_IA32_PMC_V6_GP0_CFG_C : MSR_IA32_PMC_V6_FX0_CFG_C; + + return base + (x86_pmu.addr_offset ? x86_pmu.addr_offset(index, false) : + index * MSR_IA32_PMC_V6_STEP); +} + static inline int x86_pmu_rdpmc_index(int index) { return x86_pmu.rdpmc_index ? 
x86_pmu.rdpmc_index(index) : index; @@ -1771,6 +1781,8 @@ void intel_pmu_pebs_data_source_cmt(void); =20 void intel_pmu_pebs_data_source_lnl(void); =20 +u64 intel_get_arch_pebs_data_config(struct perf_event *event); + int intel_pmu_setup_lbr_filter(struct perf_event *event); =20 void intel_pt_interrupt(void); diff --git a/arch/x86/include/asm/intel_ds.h b/arch/x86/include/asm/intel_d= s.h index 023c2883f9f3..7bb80c993bef 100644 --- a/arch/x86/include/asm/intel_ds.h +++ b/arch/x86/include/asm/intel_ds.h @@ -7,6 +7,13 @@ #define PEBS_BUFFER_SHIFT 4 #define PEBS_BUFFER_SIZE (PAGE_SIZE << PEBS_BUFFER_SHIFT) =20 +/* + * The largest PEBS record could consume a page, ensure + * a record at least can be written after triggering PMI. + */ +#define ARCH_PEBS_THRESH_MUL ((PEBS_BUFFER_SIZE - PAGE_SIZE) >> PEBS_BUFFE= R_SHIFT) +#define ARCH_PEBS_THRESH_SINGLE 1 + /* The maximal number of PEBS events: */ #define MAX_PEBS_EVENTS_FMT4 8 #define MAX_PEBS_EVENTS 32 diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-in= dex.h index d77048df8e72..ea4f100dbd3c 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -320,6 +320,14 @@ #define ARCH_PEBS_OFFSET_MASK 0x7fffff #define ARCH_PEBS_INDEX_WR_SHIFT 4 =20 +#define ARCH_PEBS_RELOAD 0xffffffff +#define ARCH_PEBS_LBR_SHIFT 40 +#define ARCH_PEBS_LBR (0x3ull << ARCH_PEBS_LBR_SHIFT) +#define ARCH_PEBS_VECR_XMM BIT_ULL(49) +#define ARCH_PEBS_GPR BIT_ULL(61) +#define ARCH_PEBS_AUX BIT_ULL(62) +#define ARCH_PEBS_EN BIT_ULL(63) + #define MSR_IA32_RTIT_CTL 0x00000570 #define RTIT_CTL_TRACEEN BIT(0) #define RTIT_CTL_CYCLEACC BIT(1) --=20 2.40.1 From nobody Fri Dec 19 16:05:43 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A592328B505; Tue, 15 Apr 2025 08:24:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.21 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744705442; cv=none; b=L6VlDvbegPBEtSP2Doe0wRq5JBd9EChJSteYEh7I1hBe8K5e67VsqAZZuUnES1JEniKLn98wFyq8u0M9MuJMHxSUaB5OlIDnSspsATzjbjSFCZIGRgq2rhJP8cQ8Rr10QuEUQmxNV0w4eBYYIZpfcdbXDuIBiQgFf2FDkcU7Mis= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744705442; c=relaxed/simple; bh=UR2ZWqXLCM716ubu2TqQTb1sW52LUxKROanB7DXWBnA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=K1A/Y6mATTsOPtvYis5uIvw5pt/n/+iq7ko16i9HCTDJnaX2Zu+N2eO/faq90+4SKgG1C4i/YC+UfmppF1sT7fY62q95/tOOyERUVrJNizYdbpvrdLAwUOxWsUFYLhUrQk/JL/DNXd49LlM0X6titLYAl/HZu3AxPi/eVfbD2W0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=blg66+De; arc=none smtp.client-ip=198.175.65.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="blg66+De" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1744705441; x=1776241441; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding; bh=UR2ZWqXLCM716ubu2TqQTb1sW52LUxKROanB7DXWBnA=; b=blg66+DeJXRgw6KxyPtjEuoiKzN6g3mtgbhbgBb+JZyqFwKIBs4wKc4v iDnjIZZJBSgbeSVV0IOoUIuEdINbLjvHz5S3WUwmQZgYE//ld0XUM6aAg DDBIpuptKECyaX8xxWwMQhi2EeTRtn2WwhD2eHc7AsNLt8B8bFfdnzG5r 39tlcDC10pYwJQr2arPa3NqOGmAY9EwEjr+Wbl+TMCqm54OlEBs7s9HZy zlwn9EWLfreTKL+3Tx64vPyf45L8QeH0Gh7/UDJOZ32JzA0W/k1yWmpFB fSxDiqWXv2joSYMukP/2LzkHmW/SPn2ZZAETAi0yrRmh9P0tqp9dFUdAB Q==; X-CSE-ConnectionGUID: eWdvMmysQQmuvgIpFrx0Ng== X-CSE-MsgGUID: JdIm6KLJTv+qacc9iYA33w== X-IronPort-AV: E=McAfee;i="6700,10204,11403"; a="46116061" X-IronPort-AV: E=Sophos;i="6.15,213,1739865600"; d="scan'208";a="46116061" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by orvoesa113.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Apr 2025 01:24:01 -0700 X-CSE-ConnectionGUID: 5U8VaAI1QQavTb4ssZHeVg== X-CSE-MsgGUID: GxDNzjUfQ4SRds8tgVKu7Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.15,213,1739865600"; d="scan'208";a="130055632" Received: from emr.sh.intel.com ([10.112.229.56]) by fmviesa007.fm.intel.com with ESMTP; 15 Apr 2025 01:23:57 -0700 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Ian Rogers , Adrian Hunter , Alexander Shishkin , Kan Liang , Andi Kleen , Eranian Stephane Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi , Dapeng Mi Subject: [Patch v3 14/22] perf/x86/intel: Add counter group support for arch-PEBS Date: Tue, 15 Apr 2025 11:44:20 +0000 Message-Id: <20250415114428.341182-15-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com> References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Based on the previous adaptive PEBS counter snapshot support, add counter group support for architectural PEBS. Since arch-PEBS shares the same counter group layout with adaptive PEBS, directly reuse the __setup_pebs_counter_group() helper to process arch-PEBS counter groups.
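For reference, the counter payload that follows an arch_pebs_cntr_header in a record is sized as in this illustrative sketch; cntr_group_payload_words() is not a helper added by this series, just a restatement of the record-walking logic in the setup_arch_pebs_sample_data() hunk below (cntr and fixed are bitmaps of the captured GP and fixed counters):

/* Number of u64 payload words following the counter header. */
static unsigned int cntr_group_payload_words(struct arch_pebs_cntr_header *cntr)
{
	unsigned int nr =3D hweight32(cntr->cntr) + hweight32(cntr->fixed);

	if (cntr->metrics =3D=3D INTEL_CNTR_METRICS)
		nr +=3D 2;	/* PERF_METRICS is exported as two u64 values */
	return nr;
}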
Signed-off-by: Dapeng Mi --- arch/x86/events/intel/core.c | 38 ++++++++++++++++++++++++++++--- arch/x86/events/intel/ds.c | 29 ++++++++++++++++++++--- arch/x86/include/asm/msr-index.h | 6 +++++ arch/x86/include/asm/perf_event.h | 13 ++++++++--- 4 files changed, 77 insertions(+), 9 deletions(-) diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index ae7f5dfee041..d543ed052743 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -3009,6 +3009,17 @@ static void intel_pmu_enable_event_ext(struct perf_e= vent *event) =20 if (pebs_data_cfg & PEBS_DATACFG_LBRS) ext |=3D ARCH_PEBS_LBR & cap.caps; + + if (pebs_data_cfg & + (PEBS_DATACFG_CNTR_MASK << PEBS_DATACFG_CNTR_SHIFT)) + ext |=3D ARCH_PEBS_CNTR_GP & cap.caps; + + if (pebs_data_cfg & + (PEBS_DATACFG_FIX_MASK << PEBS_DATACFG_FIX_SHIFT)) + ext |=3D ARCH_PEBS_CNTR_FIXED & cap.caps; + + if (pebs_data_cfg & PEBS_DATACFG_METRICS) + ext |=3D ARCH_PEBS_CNTR_METRICS & cap.caps; } =20 if (cpuc->n_pebs =3D=3D cpuc->n_large_pebs) @@ -3034,6 +3045,9 @@ static void intel_pmu_enable_event_ext(struct perf_ev= ent *event) } } =20 + if (is_pebs_counter_event_group(event)) + ext |=3D ARCH_PEBS_CNTR_ALLOW; + if (cpuc->cfg_c_val[hwc->idx] !=3D ext) __intel_pmu_update_event_ext(hwc->idx, ext); } @@ -4318,6 +4332,20 @@ static bool intel_pmu_is_acr_group(struct perf_event= *event) return false; } =20 +static inline bool intel_pmu_has_pebs_counter_group(struct pmu *pmu) +{ + u64 caps; + + if (x86_pmu.intel_cap.pebs_format >=3D 6 && x86_pmu.intel_cap.pebs_baseli= ne) + return true; + + caps =3D hybrid(pmu, arch_pebs_cap).caps; + if (x86_pmu.arch_pebs && (caps & ARCH_PEBS_CNTR_MASK)) + return true; + + return false; +} + static inline void intel_pmu_set_acr_cntr_constr(struct perf_event *event, u64 *cause_mask, int *num) { @@ -4464,8 +4492,7 @@ static int intel_pmu_hw_config(struct perf_event *eve= nt) } =20 if ((event->attr.sample_type & PERF_SAMPLE_READ) && - (x86_pmu.intel_cap.pebs_format >=3D 6) && - x86_pmu.intel_cap.pebs_baseline && + intel_pmu_has_pebs_counter_group(event->pmu) && is_sampling_event(event) && event->attr.precise_ip) event->group_leader->hw.flags |=3D PERF_X86_EVENT_PEBS_CNTR; @@ -5407,6 +5434,8 @@ static inline void __intel_update_large_pebs_flags(st= ruct pmu *pmu) x86_pmu.large_pebs_flags |=3D PERF_SAMPLE_TIME; if (caps & ARCH_PEBS_LBR) x86_pmu.large_pebs_flags |=3D PERF_SAMPLE_BRANCH_STACK; + if (caps & ARCH_PEBS_CNTR_MASK) + x86_pmu.large_pebs_flags |=3D PERF_SAMPLE_READ; =20 if (!(caps & ARCH_PEBS_AUX)) x86_pmu.large_pebs_flags &=3D ~PERF_SAMPLE_DATA_SRC; @@ -7108,8 +7137,11 @@ __init int intel_pmu_init(void) * Many features on and after V6 require dynamic constraint, * e.g., Arch PEBS, ACR. 
*/ - if (version >=3D 6) + if (version >=3D 6) { x86_pmu.flags |=3D PMU_FL_DYN_CONSTRAINT; + x86_pmu.late_setup =3D intel_pmu_late_setup; + } + /* * Install the hw-cache-events table: */ diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 6a138435092d..19b51b4d0d94 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1514,13 +1514,20 @@ pebs_update_state(bool needed_cb, struct cpu_hw_eve= nts *cpuc, =20 u64 intel_get_arch_pebs_data_config(struct perf_event *event) { + struct cpu_hw_events *cpuc =3D this_cpu_ptr(&cpu_hw_events); u64 pebs_data_cfg =3D 0; + u64 cntr_mask; =20 if (WARN_ON(event->hw.idx < 0 || event->hw.idx >=3D X86_PMC_IDX_MAX)) return 0; =20 pebs_data_cfg |=3D pebs_update_adaptive_cfg(event); =20 + cntr_mask =3D (PEBS_DATACFG_CNTR_MASK << PEBS_DATACFG_CNTR_SHIFT) | + (PEBS_DATACFG_FIX_MASK << PEBS_DATACFG_FIX_SHIFT) | + PEBS_DATACFG_CNTR | PEBS_DATACFG_METRICS; + pebs_data_cfg |=3D cpuc->pebs_data_cfg & cntr_mask; + return pebs_data_cfg; } =20 @@ -2428,6 +2435,24 @@ static void setup_arch_pebs_sample_data(struct perf_= event *event, } } =20 + if (header->cntr) { + struct arch_pebs_cntr_header *cntr =3D next_record; + unsigned int nr; + + next_record +=3D sizeof(struct arch_pebs_cntr_header); + + if (is_pebs_counter_event_group(event)) { + __setup_pebs_counter_group(cpuc, event, + (struct pebs_cntr_header *)cntr, next_record); + data->sample_flags |=3D PERF_SAMPLE_READ; + } + + nr =3D hweight32(cntr->cntr) + hweight32(cntr->fixed); + if (cntr->metrics =3D=3D INTEL_CNTR_METRICS) + nr +=3D 2; + next_record +=3D nr * sizeof(u64); + } + /* Parse followed fragments if there are. */ if (arch_pebs_record_continued(header)) { at =3D at + header->size; @@ -3057,10 +3082,8 @@ static void __init intel_ds_pebs_init(void) break; =20 case 6: - if (x86_pmu.intel_cap.pebs_baseline) { + if (x86_pmu.intel_cap.pebs_baseline) x86_pmu.large_pebs_flags |=3D PERF_SAMPLE_READ; - x86_pmu.late_setup =3D intel_pmu_late_setup; - } fallthrough; case 5: x86_pmu.pebs_ept =3D 1; diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-in= dex.h index ea4f100dbd3c..c971ac09d881 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -321,12 +321,18 @@ #define ARCH_PEBS_INDEX_WR_SHIFT 4 =20 #define ARCH_PEBS_RELOAD 0xffffffff +#define ARCH_PEBS_CNTR_ALLOW BIT_ULL(35) +#define ARCH_PEBS_CNTR_GP BIT_ULL(36) +#define ARCH_PEBS_CNTR_FIXED BIT_ULL(37) +#define ARCH_PEBS_CNTR_METRICS BIT_ULL(38) #define ARCH_PEBS_LBR_SHIFT 40 #define ARCH_PEBS_LBR (0x3ull << ARCH_PEBS_LBR_SHIFT) #define ARCH_PEBS_VECR_XMM BIT_ULL(49) #define ARCH_PEBS_GPR BIT_ULL(61) #define ARCH_PEBS_AUX BIT_ULL(62) #define ARCH_PEBS_EN BIT_ULL(63) +#define ARCH_PEBS_CNTR_MASK (ARCH_PEBS_CNTR_GP | ARCH_PEBS_CNTR_FIXED | \ + ARCH_PEBS_CNTR_METRICS) =20 #define MSR_IA32_RTIT_CTL 0x00000570 #define RTIT_CTL_TRACEEN BIT(0) diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 7f9d8e6577f0..4e5adbc7baea 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -137,16 +137,16 @@ #define ARCH_PERFMON_EVENTS_COUNT 7 =20 #define PEBS_DATACFG_MEMINFO BIT_ULL(0) -#define PEBS_DATACFG_GP BIT_ULL(1) +#define PEBS_DATACFG_GP BIT_ULL(1) #define PEBS_DATACFG_XMMS BIT_ULL(2) #define PEBS_DATACFG_LBRS BIT_ULL(3) -#define PEBS_DATACFG_LBR_SHIFT 24 #define PEBS_DATACFG_CNTR BIT_ULL(4) +#define PEBS_DATACFG_METRICS BIT_ULL(5) +#define PEBS_DATACFG_LBR_SHIFT 24 #define PEBS_DATACFG_CNTR_SHIFT 32 #define 
PEBS_DATACFG_CNTR_MASK GENMASK_ULL(15, 0) #define PEBS_DATACFG_FIX_SHIFT 48 #define PEBS_DATACFG_FIX_MASK GENMASK_ULL(7, 0) -#define PEBS_DATACFG_METRICS BIT_ULL(5) =20 /* Steal the highest bit of pebs_data_cfg for SW usage */ #define PEBS_UPDATE_DS_SW BIT_ULL(63) @@ -603,6 +603,13 @@ struct arch_pebs_lbr_header { u64 ler_info; }; =20 +struct arch_pebs_cntr_header { + u32 cntr; + u32 fixed; + u32 metrics; + u32 reserved; +}; + /* * AMD Extended Performance Monitoring and Debug cpuid feature detection */ --=20 2.40.1 From nobody Fri Dec 19 16:05:43 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1380328BAA7; Tue, 15 Apr 2025 08:24:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.21 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744705445; cv=none; b=bJCuiZtOLYVleUOX/vcvR+/Ut/76e92s/U6ZToe1iOCsy3M4m+kwswYzdmwgrVfBAcPjs57+u4K7FWEL7rdz++GwVpiJvVZgN6IOesb/KjoWSg/1Uz81TNdnTBu4+UEPFfwAfTqsMYXKymXcrm3G1h9ahK6vrpAp4ZrhNrx3ovo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744705445; c=relaxed/simple; bh=KU+fwd9ys/2H/rWcdMTGYWyS/WENx0jFbK3PxdMIHpk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=EM/OBePyBRTCC1Y1AtYDeV8Zf7HjKXV8ixtKZgK65t9eX19LWKJwZoiH2CdIl6Yqss/kI7wa+iLocq3XeqMfdsLF2I0BeOUvWiXoXnIa27h2EYg6jZejVCgmxL24yx01iTu0lyiyHuIMI6S5bAioGqhUJwXq9CuGXhkJ7OaYJPA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=hwWe6fjA; arc=none smtp.client-ip=198.175.65.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="hwWe6fjA" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1744705444; x=1776241444; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KU+fwd9ys/2H/rWcdMTGYWyS/WENx0jFbK3PxdMIHpk=; b=hwWe6fjAItDxxhuXzrHP7F0LZuLWf8RqVsrJlIoRKnsa0/O3DDPN/zUZ 7j7HmF74gvhMI//vMvdIVNoflCg+xDLpjChgGIzcjPX71nK1zzABENpz9 sz0vLcFN1e8UrklOVDa49oih5Hj/ByL/6ET2A/52GsriDe+TFVnMfRbCb J2YH1wiskjDZtFiPu+YnakncHvIO1JGAviqKi4fJOsZPRjM/iccU+pvtM 6LX5jFp+7Moy3cwOc7+mW5yZ6vwpBw46QhYtRMMWztluKyGBr3ckSC2Ne bHhZxQl1rt6X/eeefTlT2Zk/RxxT8NUjSzLzalPVm3vciAjKjmsIeR/gG A==; X-CSE-ConnectionGUID: Dp3EhExIT7SzQ3KBOhQH/A== X-CSE-MsgGUID: Q5c94PtISUqsPaxD1T7Ewg== X-IronPort-AV: E=McAfee;i="6700,10204,11403"; a="46116075" X-IronPort-AV: E=Sophos;i="6.15,213,1739865600"; d="scan'208";a="46116075" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by orvoesa113.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Apr 2025 01:24:04 -0700 X-CSE-ConnectionGUID: xFm4KFhhTC6Aa4j6YVjYeA== X-CSE-MsgGUID: 1F64bVbgTPS13tRPJfTOYA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.15,213,1739865600"; d="scan'208";a="130055641" Received: from emr.sh.intel.com ([10.112.229.56]) by fmviesa007.fm.intel.com with ESMTP; 15 Apr 2025 01:24:00 -0700 From: Dapeng Mi To: Peter Zijlstra , Ingo 
Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Ian Rogers , Adrian Hunter , Alexander Shishkin , Kan Liang , Andi Kleen , Eranian Stephane Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi , Dapeng Mi Subject: [Patch v3 15/22] perf/x86/intel: Support SSP register capturing for arch-PEBS Date: Tue, 15 Apr 2025 11:44:21 +0000 Message-Id: <20250415114428.341182-16-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com> References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Arch-PEBS supports capturing the shadow stack pointer (SSP) register in the GPR group. This patch adds support for capturing and outputting the SSP register at interrupt or in user space; capturing SSP in user space requires the 'exclude_kernel' attribute to be set, which avoids unintentionally capturing the kernel-space SSP register. Signed-off-by: Dapeng Mi --- arch/x86/events/core.c | 15 +++++++++++++++ arch/x86/events/intel/core.c | 3 ++- arch/x86/events/intel/ds.c | 9 +++++++-- arch/x86/events/perf_event.h | 4 ++++ arch/x86/include/asm/perf_event.h | 1 + arch/x86/include/uapi/asm/perf_regs.h | 4 +++- arch/x86/kernel/perf_regs.c | 7 +++++++ 7 files changed, 39 insertions(+), 4 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 9c205a8a4fa6..0ccbe8385c7f 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -650,6 +650,21 @@ int x86_pmu_hw_config(struct perf_event *event) return -EINVAL; } =20 + if (unlikely(event->attr.sample_regs_user & BIT_ULL(PERF_REG_X86_SSP))) { + /* Only arch-PEBS supports to capture SSP register. */ + if (!x86_pmu.arch_pebs || !event->attr.precise_ip) + return -EINVAL; + /* Only user space is allowed to capture. */ + if (!event->attr.exclude_kernel) + return -EINVAL; + } + + if (unlikely(event->attr.sample_regs_intr & BIT_ULL(PERF_REG_X86_SSP))) { + /* Only arch-PEBS supports to capture SSP register. */ + if (!x86_pmu.arch_pebs || !event->attr.precise_ip) + return -EINVAL; + } + /* sample_regs_user never support XMM registers */ if (unlikely(event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK)) return -EINVAL; diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index d543ed052743..b6416535f84d 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -4151,12 +4151,13 @@ static void intel_pebs_aliases_skl(struct perf_even= t *event) static unsigned long intel_pmu_large_pebs_flags(struct perf_event *event) { unsigned long flags =3D x86_pmu.large_pebs_flags; + u64 gprs_mask =3D x86_pmu.arch_pebs ?
ARCH_PEBS_GP_REGS : PEBS_GP_REGS; =20 if (event->attr.use_clockid) flags &=3D ~PERF_SAMPLE_TIME; if (!event->attr.exclude_kernel) flags &=3D ~PERF_SAMPLE_REGS_USER; - if (event->attr.sample_regs_user & ~PEBS_GP_REGS) + if (event->attr.sample_regs_user & ~gprs_mask) flags &=3D ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR); return flags; } diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 19b51b4d0d94..91a093cba11f 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1431,6 +1431,7 @@ static u64 pebs_update_adaptive_cfg(struct perf_event= *event) u64 sample_type =3D attr->sample_type; u64 pebs_data_cfg =3D 0; bool gprs, tsx_weight; + u64 gprs_mask; =20 if (!(sample_type & ~(PERF_SAMPLE_IP|PERF_SAMPLE_TIME)) && attr->precise_ip > 1) @@ -1445,10 +1446,11 @@ static u64 pebs_update_adaptive_cfg(struct perf_eve= nt *event) * + precise_ip < 2 for the non event IP * + For RTM TSX weight we need GPRs for the abort code. */ + gprs_mask =3D x86_pmu.arch_pebs ? ARCH_PEBS_GP_REGS : PEBS_GP_REGS; gprs =3D ((sample_type & PERF_SAMPLE_REGS_INTR) && - (attr->sample_regs_intr & PEBS_GP_REGS)) || + (attr->sample_regs_intr & gprs_mask)) || ((sample_type & PERF_SAMPLE_REGS_USER) && - (attr->sample_regs_user & PEBS_GP_REGS)); + (attr->sample_regs_user & gprs_mask)); =20 tsx_weight =3D (sample_type & PERF_SAMPLE_WEIGHT_TYPE) && ((attr->config & INTEL_ARCH_EVENT_MASK) =3D=3D @@ -2243,6 +2245,7 @@ static void setup_pebs_adaptive_sample_data(struct pe= rf_event *event, =20 perf_regs =3D container_of(regs, struct x86_perf_regs, regs); perf_regs->xmm_regs =3D NULL; + perf_regs->ssp =3D 0; =20 format_group =3D basic->format_group; =20 @@ -2359,6 +2362,7 @@ static void setup_arch_pebs_sample_data(struct perf_e= vent *event, =20 perf_regs =3D container_of(regs, struct x86_perf_regs, regs); perf_regs->xmm_regs =3D NULL; + perf_regs->ssp =3D 0; =20 __setup_perf_sample_data(event, iregs, data); =20 @@ -2395,6 +2399,7 @@ static void setup_arch_pebs_sample_data(struct perf_e= vent *event, =20 __setup_pebs_gpr_group(event, regs, (struct pebs_gprs *)gprs, sample_type); + perf_regs->ssp =3D gprs->ssp; } =20 if (header->aux) { diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index c6c2ab34e711..6a8804a75de9 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -175,6 +175,10 @@ struct amd_nb { (1ULL << PERF_REG_X86_R14) | \ (1ULL << PERF_REG_X86_R15)) =20 +#define ARCH_PEBS_GP_REGS \ + (PEBS_GP_REGS | \ + (1ULL << PERF_REG_X86_SSP)) + /* * Per register state. */ diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 4e5adbc7baea..ba382361b13f 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -704,6 +704,7 @@ extern void perf_events_lapic_init(void); struct pt_regs; struct x86_perf_regs { struct pt_regs regs; + u64 ssp; u64 *xmm_regs; }; =20 diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/= asm/perf_regs.h index 7c9d2bb3833b..f9c5b16b1882 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -27,9 +27,11 @@ enum perf_event_x86_regs { PERF_REG_X86_R13, PERF_REG_X86_R14, PERF_REG_X86_R15, + /* arch-PEBS supports to capture shadow stack pointer (SSP) */ + PERF_REG_X86_SSP, /* These are the limits for the GPRs. 
*/ PERF_REG_X86_32_MAX =3D PERF_REG_X86_GS + 1, - PERF_REG_X86_64_MAX =3D PERF_REG_X86_R15 + 1, + PERF_REG_X86_64_MAX =3D PERF_REG_X86_SSP + 1, =20 /* These all need two bits set because they are 128bit */ PERF_REG_X86_XMM0 =3D 32, diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index 624703af80a1..985bd616200e 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -54,6 +54,8 @@ static unsigned int pt_regs_offset[PERF_REG_X86_MAX] =3D { PT_REGS_OFFSET(PERF_REG_X86_R13, r13), PT_REGS_OFFSET(PERF_REG_X86_R14, r14), PT_REGS_OFFSET(PERF_REG_X86_R15, r15), + /* The pt_regs struct does not store Shadow stack pointer. */ + (unsigned int) -1, #endif }; =20 @@ -68,6 +70,11 @@ u64 perf_reg_value(struct pt_regs *regs, int idx) return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0]; } =20 + if (idx =3D=3D PERF_REG_X86_SSP) { + perf_regs =3D container_of(regs, struct x86_perf_regs, regs); + return perf_regs->ssp; + } + if (WARN_ON_ONCE(idx >=3D ARRAY_SIZE(pt_regs_offset))) return 0; =20 --=20 2.40.1 From nobody Fri Dec 19 16:05:43 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BE74028BABB; Tue, 15 Apr 2025 08:24:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.21 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744705449; cv=none; b=Ch+j2pS5BKr7JMyaRSYpL+g0l32He1BZiXGgrTUqqAsx99B4oHIl1kNPfojG4pWTxVExwQ+KipTq3xX7FLvjXIbAi2tsOMuPeOUjkdyzW2jUZyWxk2x30oFizcF8xeX3cojreaq7yEw/KnenjuvR1bUC8TqYl+HQ0Jm7Cc57pIE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744705449; c=relaxed/simple; bh=qYqUfma8FkYSXAkd2OIGwxjfQdBIy2QD/gjcseBGXfs=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=fV6ZDkeFTM3Guh9Wvqy+IGeuJna25DXXe1H72qVnxiLcIH92ztVQlN1Gl/643nQ0nU66tYoEoBGnMbxE5TX78FLK2NMwj1XHKgBcjJeWC7/mU052q/eEvvZYi3reEcLbQv0JcFVQ2ppAuDD+QHoJPBZ/19O7t1Jx2EdLaMf7d8Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=LH7rSAHw; arc=none smtp.client-ip=198.175.65.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="LH7rSAHw" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1744705448; x=1776241448; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qYqUfma8FkYSXAkd2OIGwxjfQdBIy2QD/gjcseBGXfs=; b=LH7rSAHwb64chVEdN8VccMw98CAVMVBdDeTVmbqaBFVCjKrVGDPRT6f2 GRc+jH+sqyDvl1yfcnDTunj2IX3puyOCkp+rUJwFypPk6Q7M5HAkwtCs6 EV1i70t9Smoawyj9lYP9b/xCK8lLOPkIBIQ8dCyq5aovIsdhr4XhWcMSO 4kZHhESYxm742BVl5+sXbBZpJaiGnfgQbyMsy7CTmltpuv6x6YTEOFm8y uIHOzs7olDe2Lm4vKMeNdWtc1O6Itv9TanKuysRNohAev5xkHr4l/kO3h 7OXVQPb/yRy54uRQ95YnJxEdVH/tYzPe+eeTkloCdwwVpfHkPRbqOl3Ip Q==; X-CSE-ConnectionGUID: p+/75zFnRaKjVehzLZkEtw== X-CSE-MsgGUID: 2ANaFblFShOWZeOVCuyaUg== X-IronPort-AV: E=McAfee;i="6700,10204,11403"; a="46116086" X-IronPort-AV: 
E=Sophos;i="6.15,213,1739865600"; d="scan'208";a="46116086" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by orvoesa113.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Apr 2025 01:24:08 -0700 X-CSE-ConnectionGUID: fZKp7LkPRtqkIZYkazDZlw== X-CSE-MsgGUID: SjiNQv7oSvqjdapuLZJDuA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.15,213,1739865600"; d="scan'208";a="130055653" Received: from emr.sh.intel.com ([10.112.229.56]) by fmviesa007.fm.intel.com with ESMTP; 15 Apr 2025 01:24:04 -0700 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Ian Rogers , Adrian Hunter , Alexander Shishkin , Kan Liang , Andi Kleen , Eranian Stephane Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi , Dapeng Mi Subject: [Patch v3 16/22] perf/core: Support to capture higher width vector registers Date: Tue, 15 Apr 2025 11:44:22 +0000 Message-Id: <20250415114428.341182-17-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com> References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Arch-PEBS supports capturing more vector registers, such as OPMASK/YMM/ZMM registers, besides XMM registers. This patch extends the PERF_SAMPLE_REGS_INTR and PERF_SAMPLE_REGS_USER attributes to support capturing these new vector registers at interrupt and in user space. The arrays sample_regs_intr/user_ext[] are added into the perf_event_attr structure to record the user-configured extended register bitmap, and a helper perf_reg_ext_validate() is added to validate whether these registers are supported on a specific PMU. Furthermore, to leave enough space for more GPRs (like R16 ~ R31 introduced by APX) in the future, directly extend the array size to 7. This patch just adds the common perf/core support; the x86/intel specific support is added in the next patch.
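As an illustration of the extended uAPI (a minimal user-space sketch, not code from this series), sampling YMM0 at interrupt means setting its four bits in sample_regs_intr_ext[]: bit N of the extended arrays corresponds to register index N + 64 (PERF_REG_EXTENDED_OFFSET), and PERF_REG_X86_YMM0 is 128:

struct perf_event_attr attr =3D {};
int bit =3D PERF_REG_X86_YMM0 - 64;	/* extended bitmaps start at reg index 64 */
int i;

attr.size =3D PERF_ATTR_SIZE_VER9;	/* 168: includes the new _ext arrays */
attr.sample_type |=3D PERF_SAMPLE_REGS_INTR;
/* A YMM register spans 4 bits in the extended bitmap. */
for (i =3D 0; i < 4; i++, bit++)
	attr.sample_regs_intr_ext[bit / 64] |=3D 1ULL << (bit % 64);

The kernel validates such bits via perf_reg_ext_validate() and rejects them with -EOPNOTSUPP when the PMU lacks PERF_PMU_CAP_MORE_EXT_REGS.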
Co-developed-by: Kan Liang Signed-off-by: Kan Liang Signed-off-by: Dapeng Mi --- arch/arm/kernel/perf_regs.c | 6 ++ arch/arm64/kernel/perf_regs.c | 6 ++ arch/csky/kernel/perf_regs.c | 5 ++ arch/loongarch/kernel/perf_regs.c | 5 ++ arch/mips/kernel/perf_regs.c | 5 ++ arch/powerpc/perf/perf_regs.c | 5 ++ arch/riscv/kernel/perf_regs.c | 5 ++ arch/s390/kernel/perf_regs.c | 5 ++ arch/x86/include/asm/perf_event.h | 4 ++ arch/x86/include/uapi/asm/perf_regs.h | 79 ++++++++++++++++++++- arch/x86/kernel/perf_regs.c | 64 ++++++++++++++++- include/linux/perf_event.h | 4 ++ include/linux/perf_regs.h | 10 +++ include/uapi/linux/perf_event.h | 11 +++ kernel/events/core.c | 98 +++++++++++++++++++++++++-- 15 files changed, 304 insertions(+), 8 deletions(-) diff --git a/arch/arm/kernel/perf_regs.c b/arch/arm/kernel/perf_regs.c index 0529f90395c9..86b2002d0846 100644 --- a/arch/arm/kernel/perf_regs.c +++ b/arch/arm/kernel/perf_regs.c @@ -37,3 +37,9 @@ void perf_get_regs_user(struct perf_regs *regs_user, regs_user->regs =3D task_pt_regs(current); regs_user->abi =3D perf_reg_abi(current); } + +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c index b4eece3eb17d..1c91fd3530d5 100644 --- a/arch/arm64/kernel/perf_regs.c +++ b/arch/arm64/kernel/perf_regs.c @@ -104,3 +104,9 @@ void perf_get_regs_user(struct perf_regs *regs_user, regs_user->regs =3D task_pt_regs(current); regs_user->abi =3D perf_reg_abi(current); } + +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + diff --git a/arch/csky/kernel/perf_regs.c b/arch/csky/kernel/perf_regs.c index 09b7f88a2d6a..d2e2af0bf1ad 100644 --- a/arch/csky/kernel/perf_regs.c +++ b/arch/csky/kernel/perf_regs.c @@ -26,6 +26,11 @@ int perf_reg_validate(u64 mask) return 0; } =20 +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + u64 perf_reg_abi(struct task_struct *task) { return PERF_SAMPLE_REGS_ABI_32; diff --git a/arch/loongarch/kernel/perf_regs.c b/arch/loongarch/kernel/perf= _regs.c index 263ac4ab5af6..e1df67e3fab4 100644 --- a/arch/loongarch/kernel/perf_regs.c +++ b/arch/loongarch/kernel/perf_regs.c @@ -34,6 +34,11 @@ int perf_reg_validate(u64 mask) return 0; } =20 +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + u64 perf_reg_value(struct pt_regs *regs, int idx) { if (WARN_ON_ONCE((u32)idx >=3D PERF_REG_LOONGARCH_MAX)) diff --git a/arch/mips/kernel/perf_regs.c b/arch/mips/kernel/perf_regs.c index e686780d1647..bbb5f25b9191 100644 --- a/arch/mips/kernel/perf_regs.c +++ b/arch/mips/kernel/perf_regs.c @@ -37,6 +37,11 @@ int perf_reg_validate(u64 mask) return 0; } =20 +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + u64 perf_reg_value(struct pt_regs *regs, int idx) { long v; diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c index 350dccb0143c..d919c628aee3 100644 --- a/arch/powerpc/perf/perf_regs.c +++ b/arch/powerpc/perf/perf_regs.c @@ -132,6 +132,11 @@ int perf_reg_validate(u64 mask) return 0; } =20 +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + u64 perf_reg_abi(struct task_struct *task) { if (is_tsk_32bit_task(task)) diff --git a/arch/riscv/kernel/perf_regs.c b/arch/riscv/kernel/perf_regs.c index fd304a248de6..5beb60544c9a 100644 --- a/arch/riscv/kernel/perf_regs.c +++ b/arch/riscv/kernel/perf_regs.c @@ -26,6 +26,11 @@ int 
perf_reg_validate(u64 mask) return 0; } =20 +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + u64 perf_reg_abi(struct task_struct *task) { #if __riscv_xlen =3D=3D 64 diff --git a/arch/s390/kernel/perf_regs.c b/arch/s390/kernel/perf_regs.c index a6b058ee4a36..9247573229b0 100644 --- a/arch/s390/kernel/perf_regs.c +++ b/arch/s390/kernel/perf_regs.c @@ -42,6 +42,11 @@ int perf_reg_validate(u64 mask) return 0; } =20 +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + u64 perf_reg_abi(struct task_struct *task) { if (test_tsk_thread_flag(task, TIF_31BIT)) diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index ba382361b13f..560eb218868c 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -706,6 +706,10 @@ struct x86_perf_regs { struct pt_regs regs; u64 ssp; u64 *xmm_regs; + u64 *opmask_regs; + u64 *ymmh_regs; + u64 *zmmh_regs; + u64 *h16zmm_regs; }; =20 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs); diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/= asm/perf_regs.h index f9c5b16b1882..5e2d9796b2cc 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -33,7 +33,7 @@ enum perf_event_x86_regs { PERF_REG_X86_32_MAX =3D PERF_REG_X86_GS + 1, PERF_REG_X86_64_MAX =3D PERF_REG_X86_SSP + 1, =20 - /* These all need two bits set because they are 128bit */ + /* These all need two bits set because they are 128 bits */ PERF_REG_X86_XMM0 =3D 32, PERF_REG_X86_XMM1 =3D 34, PERF_REG_X86_XMM2 =3D 36, @@ -53,6 +53,83 @@ enum perf_event_x86_regs { =20 /* These include both GPRs and XMMX registers */ PERF_REG_X86_XMM_MAX =3D PERF_REG_X86_XMM15 + 2, + + /* Leave bits[127:64] for other GP registers, like R16 ~ R31.*/ + + /* + * Each YMM register need 4 bits to represent because they are 256 bits. 
+ * PERF_REG_X86_YMMH0 =3D 128 + */ + PERF_REG_X86_YMM0 =3D 128, + PERF_REG_X86_YMM1 =3D PERF_REG_X86_YMM0 + 4, + PERF_REG_X86_YMM2 =3D PERF_REG_X86_YMM1 + 4, + PERF_REG_X86_YMM3 =3D PERF_REG_X86_YMM2 + 4, + PERF_REG_X86_YMM4 =3D PERF_REG_X86_YMM3 + 4, + PERF_REG_X86_YMM5 =3D PERF_REG_X86_YMM4 + 4, + PERF_REG_X86_YMM6 =3D PERF_REG_X86_YMM5 + 4, + PERF_REG_X86_YMM7 =3D PERF_REG_X86_YMM6 + 4, + PERF_REG_X86_YMM8 =3D PERF_REG_X86_YMM7 + 4, + PERF_REG_X86_YMM9 =3D PERF_REG_X86_YMM8 + 4, + PERF_REG_X86_YMM10 =3D PERF_REG_X86_YMM9 + 4, + PERF_REG_X86_YMM11 =3D PERF_REG_X86_YMM10 + 4, + PERF_REG_X86_YMM12 =3D PERF_REG_X86_YMM11 + 4, + PERF_REG_X86_YMM13 =3D PERF_REG_X86_YMM12 + 4, + PERF_REG_X86_YMM14 =3D PERF_REG_X86_YMM13 + 4, + PERF_REG_X86_YMM15 =3D PERF_REG_X86_YMM14 + 4, + PERF_REG_X86_YMM_MAX =3D PERF_REG_X86_YMM15 + 4, + + /* + * Each ZMM register needs 8 bits to represent because they are 512 bits + * PERF_REG_X86_ZMMH0 =3D 192 + */ + PERF_REG_X86_ZMM0 =3D PERF_REG_X86_YMM_MAX, + PERF_REG_X86_ZMM1 =3D PERF_REG_X86_ZMM0 + 8, + PERF_REG_X86_ZMM2 =3D PERF_REG_X86_ZMM1 + 8, + PERF_REG_X86_ZMM3 =3D PERF_REG_X86_ZMM2 + 8, + PERF_REG_X86_ZMM4 =3D PERF_REG_X86_ZMM3 + 8, + PERF_REG_X86_ZMM5 =3D PERF_REG_X86_ZMM4 + 8, + PERF_REG_X86_ZMM6 =3D PERF_REG_X86_ZMM5 + 8, + PERF_REG_X86_ZMM7 =3D PERF_REG_X86_ZMM6 + 8, + PERF_REG_X86_ZMM8 =3D PERF_REG_X86_ZMM7 + 8, + PERF_REG_X86_ZMM9 =3D PERF_REG_X86_ZMM8 + 8, + PERF_REG_X86_ZMM10 =3D PERF_REG_X86_ZMM9 + 8, + PERF_REG_X86_ZMM11 =3D PERF_REG_X86_ZMM10 + 8, + PERF_REG_X86_ZMM12 =3D PERF_REG_X86_ZMM11 + 8, + PERF_REG_X86_ZMM13 =3D PERF_REG_X86_ZMM12 + 8, + PERF_REG_X86_ZMM14 =3D PERF_REG_X86_ZMM13 + 8, + PERF_REG_X86_ZMM15 =3D PERF_REG_X86_ZMM14 + 8, + PERF_REG_X86_ZMM16 =3D PERF_REG_X86_ZMM15 + 8, + PERF_REG_X86_ZMM17 =3D PERF_REG_X86_ZMM16 + 8, + PERF_REG_X86_ZMM18 =3D PERF_REG_X86_ZMM17 + 8, + PERF_REG_X86_ZMM19 =3D PERF_REG_X86_ZMM18 + 8, + PERF_REG_X86_ZMM20 =3D PERF_REG_X86_ZMM19 + 8, + PERF_REG_X86_ZMM21 =3D PERF_REG_X86_ZMM20 + 8, + PERF_REG_X86_ZMM22 =3D PERF_REG_X86_ZMM21 + 8, + PERF_REG_X86_ZMM23 =3D PERF_REG_X86_ZMM22 + 8, + PERF_REG_X86_ZMM24 =3D PERF_REG_X86_ZMM23 + 8, + PERF_REG_X86_ZMM25 =3D PERF_REG_X86_ZMM24 + 8, + PERF_REG_X86_ZMM26 =3D PERF_REG_X86_ZMM25 + 8, + PERF_REG_X86_ZMM27 =3D PERF_REG_X86_ZMM26 + 8, + PERF_REG_X86_ZMM28 =3D PERF_REG_X86_ZMM27 + 8, + PERF_REG_X86_ZMM29 =3D PERF_REG_X86_ZMM28 + 8, + PERF_REG_X86_ZMM30 =3D PERF_REG_X86_ZMM29 + 8, + PERF_REG_X86_ZMM31 =3D PERF_REG_X86_ZMM30 + 8, + PERF_REG_X86_ZMM_MAX =3D PERF_REG_X86_ZMM31 + 8, + + /* + * OPMASK Registers + * PERF_REG_X86_OPMASK0 =3D 448 + */ + PERF_REG_X86_OPMASK0 =3D PERF_REG_X86_ZMM_MAX, + PERF_REG_X86_OPMASK1 =3D PERF_REG_X86_OPMASK0 + 1, + PERF_REG_X86_OPMASK2 =3D PERF_REG_X86_OPMASK1 + 1, + PERF_REG_X86_OPMASK3 =3D PERF_REG_X86_OPMASK2 + 1, + PERF_REG_X86_OPMASK4 =3D PERF_REG_X86_OPMASK3 + 1, + PERF_REG_X86_OPMASK5 =3D PERF_REG_X86_OPMASK4 + 1, + PERF_REG_X86_OPMASK6 =3D PERF_REG_X86_OPMASK5 + 1, + PERF_REG_X86_OPMASK7 =3D PERF_REG_X86_OPMASK6 + 1, + + PERF_REG_X86_VEC_MAX =3D PERF_REG_X86_OPMASK7 + 1, }; =20 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index 985bd616200e..466ccd67ea99 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -59,12 +59,55 @@ static unsigned int pt_regs_offset[PERF_REG_X86_MAX] = =3D { #endif }; =20 +static u64 perf_reg_ext_value(struct pt_regs *regs, int idx) +{ + struct x86_perf_regs *perf_regs =3D 
container_of(regs, struct x86_perf_re= gs, regs); + u64 data; + int mod; + + switch (idx) { + case PERF_REG_X86_YMM0 ... PERF_REG_X86_YMM_MAX - 1: + idx -=3D PERF_REG_X86_YMM0; + mod =3D idx % 4; + if (mod < 2) + data =3D !perf_regs->xmm_regs ? 0 : perf_regs->xmm_regs[idx / 4 + mod]; + else + data =3D !perf_regs->ymmh_regs ? 0 : perf_regs->ymmh_regs[idx / 4 + mod= - 2]; + return data; + case PERF_REG_X86_ZMM0 ... PERF_REG_X86_ZMM16 - 1: + idx -=3D PERF_REG_X86_ZMM0; + mod =3D idx % 8; + if (mod < 4) { + if (mod < 2) + data =3D !perf_regs->xmm_regs ? 0 : perf_regs->xmm_regs[idx / 8 + mod]; + else + data =3D !perf_regs->ymmh_regs ? 0 : perf_regs->ymmh_regs[idx / 8 + mo= d - 2]; + } else { + data =3D !perf_regs->zmmh_regs ? 0 : perf_regs->zmmh_regs[idx / 8 + mod= - 4]; + } + return data; + case PERF_REG_X86_ZMM16 ... PERF_REG_X86_ZMM_MAX - 1: + idx -=3D PERF_REG_X86_ZMM16; + return !perf_regs->h16zmm_regs ? 0 : perf_regs->h16zmm_regs[idx]; + case PERF_REG_X86_OPMASK0 ... PERF_REG_X86_OPMASK7: + idx -=3D PERF_REG_X86_OPMASK0; + return !perf_regs->opmask_regs ? 0 : perf_regs->opmask_regs[idx]; + default: + WARN_ON_ONCE(1); + break; + } + + return 0; +} + u64 perf_reg_value(struct pt_regs *regs, int idx) { - struct x86_perf_regs *perf_regs; + struct x86_perf_regs *perf_regs =3D container_of(regs, struct x86_perf_re= gs, regs); + + if (idx >=3D PERF_REG_EXTENDED_OFFSET) + return perf_reg_ext_value(regs, idx); =20 if (idx >=3D PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) { - perf_regs =3D container_of(regs, struct x86_perf_regs, regs); if (!perf_regs->xmm_regs) return 0; return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0]; @@ -102,6 +145,11 @@ int perf_reg_validate(u64 mask) return 0; } =20 +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + return -EINVAL; +} + u64 perf_reg_abi(struct task_struct *task) { return PERF_SAMPLE_REGS_ABI_32; @@ -127,6 +175,18 @@ int perf_reg_validate(u64 mask) return 0; } =20 +int perf_reg_ext_validate(unsigned long *mask, unsigned int size) +{ + if (!mask || !size || size > PERF_NUM_EXT_REGS) + return -EINVAL; + + if (find_last_bit(mask, size) > + (PERF_REG_X86_VEC_MAX - PERF_REG_EXTENDED_OFFSET)) + return -EINVAL; + + return 0; +} + u64 perf_reg_abi(struct task_struct *task) { if (!user_64bit_mode(task_pt_regs(task))) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 947ad12dfdbe..5a33c5a0e4e4 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -303,6 +303,7 @@ struct perf_event_pmu_context; #define PERF_PMU_CAP_AUX_OUTPUT 0x0080 #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100 #define PERF_PMU_CAP_AUX_PAUSE 0x0200 +#define PERF_PMU_CAP_MORE_EXT_REGS 0x0400 =20 /** * pmu::scope @@ -1424,6 +1425,9 @@ static inline void perf_clear_branch_entry_bitfields(= struct perf_branch_entry *b br->reserved =3D 0; } =20 +extern bool has_more_extended_intr_regs(struct perf_event *event); +extern bool has_more_extended_user_regs(struct perf_event *event); +extern bool has_more_extended_regs(struct perf_event *event); extern void perf_output_sample(struct perf_output_handle *handle, struct perf_event_header *header, struct perf_sample_data *data, diff --git a/include/linux/perf_regs.h b/include/linux/perf_regs.h index f632c5725f16..aa4dfb5af552 100644 --- a/include/linux/perf_regs.h +++ b/include/linux/perf_regs.h @@ -9,6 +9,8 @@ struct perf_regs { struct pt_regs *regs; }; =20 +#define PERF_REG_EXTENDED_OFFSET 64 + #ifdef CONFIG_HAVE_PERF_REGS #include =20 @@ -21,6 +23,8 @@ int perf_reg_validate(u64 mask); u64 
perf_reg_abi(struct task_struct *task); void perf_get_regs_user(struct perf_regs *regs_user, struct pt_regs *regs); +int perf_reg_ext_validate(unsigned long *mask, unsigned int size); + #else =20 #define PERF_REG_EXTENDED_MASK 0 @@ -35,6 +39,12 @@ static inline int perf_reg_validate(u64 mask) return mask ? -ENOSYS : 0; } =20 +static inline int perf_reg_ext_validate(unsigned long *mask, + unsigned int size) +{ + return -EINVAL; +} + static inline u64 perf_reg_abi(struct task_struct *task) { return PERF_SAMPLE_REGS_ABI_NONE; diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_even= t.h index 5fc753c23734..78aae0464a54 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -379,6 +379,10 @@ enum perf_event_read_format { #define PERF_ATTR_SIZE_VER6 120 /* add: aux_sample_size */ #define PERF_ATTR_SIZE_VER7 128 /* add: sig_data */ #define PERF_ATTR_SIZE_VER8 136 /* add: config3 */ +#define PERF_ATTR_SIZE_VER9 168 /* add: sample_regs_intr_ext[PERF_EXT_REGS= _ARRAY_SIZE] */ + +#define PERF_EXT_REGS_ARRAY_SIZE 7 +#define PERF_NUM_EXT_REGS (PERF_EXT_REGS_ARRAY_SIZE * 64) =20 /* * Hardware event_id to monitor via a performance monitoring event: @@ -533,6 +537,13 @@ struct perf_event_attr { __u64 sig_data; =20 __u64 config3; /* extension of config2 */ + + /* + * Extension sets of regs to dump for each sample. + * See asm/perf_regs.h for details. + */ + __u64 sample_regs_intr_ext[PERF_EXT_REGS_ARRAY_SIZE]; + __u64 sample_regs_user_ext[PERF_EXT_REGS_ARRAY_SIZE]; }; =20 /* diff --git a/kernel/events/core.c b/kernel/events/core.c index 2eb9cd5d86a1..ebf3be1a6e47 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7345,6 +7345,21 @@ perf_output_sample_regs(struct perf_output_handle *h= andle, } } =20 +static void +perf_output_sample_regs_ext(struct perf_output_handle *handle, + struct pt_regs *regs, + unsigned long *mask, + unsigned int size) +{ + int bit; + u64 val; + + for_each_set_bit(bit, mask, size) { + val =3D perf_reg_value(regs, bit + PERF_REG_EXTENDED_OFFSET); + perf_output_put(handle, val); + } +} + static void perf_sample_regs_user(struct perf_regs *regs_user, struct pt_regs *regs) { @@ -7773,6 +7788,26 @@ static void perf_output_read(struct perf_output_hand= le *handle, perf_output_read_one(handle, event, enabled, running); } =20 +inline bool has_more_extended_intr_regs(struct perf_event *event) +{ + return !!bitmap_weight( + (unsigned long *)event->attr.sample_regs_intr_ext, + PERF_NUM_EXT_REGS); +} + +inline bool has_more_extended_user_regs(struct perf_event *event) +{ + return !!bitmap_weight( + (unsigned long *)event->attr.sample_regs_user_ext, + PERF_NUM_EXT_REGS); +} + +inline bool has_more_extended_regs(struct perf_event *event) +{ + return has_more_extended_intr_regs(event) || + has_more_extended_user_regs(event); +} + void perf_output_sample(struct perf_output_handle *handle, struct perf_event_header *header, struct perf_sample_data *data, @@ -7898,6 +7933,12 @@ void perf_output_sample(struct perf_output_handle *h= andle, perf_output_sample_regs(handle, data->regs_user.regs, mask); + if (has_more_extended_user_regs(event)) { + perf_output_sample_regs_ext( + handle, data->regs_user.regs, + (unsigned long *)event->attr.sample_regs_user_ext, + PERF_NUM_EXT_REGS); + } } } =20 @@ -7930,6 +7971,12 @@ void perf_output_sample(struct perf_output_handle *h= andle, perf_output_sample_regs(handle, data->regs_intr.regs, mask); + if (has_more_extended_intr_regs(event)) { + perf_output_sample_regs_ext( + handle, data->regs_intr.regs, + 
(unsigned long *)event->attr.sample_regs_intr_ext, + PERF_NUM_EXT_REGS); + } } } =20 @@ -8181,6 +8228,12 @@ void perf_prepare_sample(struct perf_sample_data *da= ta, if (data->regs_user.regs) { u64 mask =3D event->attr.sample_regs_user; size +=3D hweight64(mask) * sizeof(u64); + + if (has_more_extended_user_regs(event)) { + size +=3D bitmap_weight( + (unsigned long *)event->attr.sample_regs_user_ext, + PERF_NUM_EXT_REGS) * sizeof(u64); + } } =20 data->dyn_size +=3D size; @@ -8244,6 +8297,12 @@ void perf_prepare_sample(struct perf_sample_data *da= ta, u64 mask =3D event->attr.sample_regs_intr; =20 size +=3D hweight64(mask) * sizeof(u64); + + if (has_more_extended_intr_regs(event)) { + size +=3D bitmap_weight( + (unsigned long *)event->attr.sample_regs_intr_ext, + PERF_NUM_EXT_REGS) * sizeof(u64); + } } =20 data->dyn_size +=3D size; @@ -12496,6 +12555,12 @@ static int perf_try_init_event(struct pmu *pmu, st= ruct perf_event *event) goto err_destroy; } =20 + if (!(pmu->capabilities & PERF_PMU_CAP_MORE_EXT_REGS) && + has_more_extended_regs(event)) { + ret =3D -EOPNOTSUPP; + goto err_destroy; + } + if (pmu->capabilities & PERF_PMU_CAP_NO_EXCLUDE && event_has_any_exclude_flag(event)) { ret =3D -EINVAL; @@ -13028,9 +13093,19 @@ static int perf_copy_attr(struct perf_event_attr _= _user *uattr, } =20 if (attr->sample_type & PERF_SAMPLE_REGS_USER) { - ret =3D perf_reg_validate(attr->sample_regs_user); - if (ret) - return ret; + if (attr->sample_regs_user !=3D 0) { + ret =3D perf_reg_validate(attr->sample_regs_user); + if (ret) + return ret; + } + if (!!bitmap_weight((unsigned long *)attr->sample_regs_user_ext, + PERF_NUM_EXT_REGS)) { + ret =3D perf_reg_ext_validate( + (unsigned long *)attr->sample_regs_user_ext, + PERF_NUM_EXT_REGS); + if (ret) + return ret; + } } =20 if (attr->sample_type & PERF_SAMPLE_STACK_USER) { @@ -13051,8 +13126,21 @@ static int perf_copy_attr(struct perf_event_attr _= _user *uattr, if (!attr->sample_max_stack) attr->sample_max_stack =3D sysctl_perf_event_max_stack; =20 - if (attr->sample_type & PERF_SAMPLE_REGS_INTR) - ret =3D perf_reg_validate(attr->sample_regs_intr); + if (attr->sample_type & PERF_SAMPLE_REGS_INTR) { + if (attr->sample_regs_intr !=3D 0) { + ret =3D perf_reg_validate(attr->sample_regs_intr); + if (ret) + return ret; + } + if (!!bitmap_weight((unsigned long *)attr->sample_regs_intr_ext, + PERF_NUM_EXT_REGS)) { + ret =3D perf_reg_ext_validate( + (unsigned long *)attr->sample_regs_intr_ext, + PERF_NUM_EXT_REGS); + if (ret) + return ret; + } + } =20 #ifndef CONFIG_CGROUP_PERF if (attr->sample_type & PERF_SAMPLE_CGROUP) --=20 2.40.1 From nobody Fri Dec 19 16:05:43 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 672B521348; Tue, 15 Apr 2025 08:24:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.21 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744705453; cv=none; b=e43KiTSwtvzdGHjaKlYLqXNg3sU8YcWNlVMJC2lAwBM6oJMCP2vbYmxN/5vWkz8an7mrXJhAt3tMRGqJ/0a6ATmP0yz8V4xcBPfuV4s4xRiB56jMswjbX64wEHXqP+MNLsH3K4oFvWfTohKnjRuoDStJYA9e5jsQdulsQA+8qgE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744705453; c=relaxed/simple; bh=SDnBVN5iRq8yfd8DXONLFO4QH16pPwUoYjJQqkRNOxQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=gdHsih4srMt+MZU/HKcDgwe/Za8CmunFXg+2qxsUchukzASAcMXQM3nc1zUKChz6CZo7MXOJovtaHEL4gFZcbENFAqZOagNi0APn43RhYm5Dswihg8Y1EKBgg2MgvWmKLobA/o/vqBM4S9/rzzDbU3KzPK0ZDxGbx5M+b7Qh1Vo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=kzdAmdBc; arc=none smtp.client-ip=198.175.65.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="kzdAmdBc" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1744705452; x=1776241452; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SDnBVN5iRq8yfd8DXONLFO4QH16pPwUoYjJQqkRNOxQ=; b=kzdAmdBczlxAm6uLTRFjDyBitwlv83wG3frN5YkSdJdxIINeysIxMefB 1zFMLgMYBufvGp1fpzIg7W0it6dNU7BFl9aTgPiyjvsEsuxOkyMfrWuQL HCqoUB6fFZlOhcYF7KB1vv+e79KxwSKibIXNj3kr5XPZv3DQpqN/N/bDx F3arC8+XTh1PN9lP3m7U+I5zdFZzI9l6zuIEzKQxlw7pdQZUhrwD3sIzz zMPes5IxyPw7j4QfG77VImVDbG9d/gK7/iwIH8DeyFHZGpQG9Sy+d8vlT s1o+QMYI1TtJvIxLCVH8cholf44Ngy0QaGDPccuNI+ScwxZu/KRp0P5tB Q==; X-CSE-ConnectionGUID: /wi+RzIpQDW01ky0S5yVAw== X-CSE-MsgGUID: Ze8bV6y4TOSE9ka/at6yfQ== X-IronPort-AV: E=McAfee;i="6700,10204,11403"; a="46116102" X-IronPort-AV: E=Sophos;i="6.15,213,1739865600"; d="scan'208";a="46116102" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by orvoesa113.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Apr 2025 01:24:12 -0700 X-CSE-ConnectionGUID: vpqzoh33R2CedmuJU2OW7Q== X-CSE-MsgGUID: pgzA67D4TY6SOkPhOtiaIQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.15,213,1739865600"; d="scan'208";a="130055663" Received: from emr.sh.intel.com ([10.112.229.56]) by fmviesa007.fm.intel.com with ESMTP; 15 Apr 2025 01:24:07 -0700 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Ian Rogers , Adrian Hunter , Alexander Shishkin , Kan Liang , Andi Kleen , Eranian Stephane Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi , Dapeng Mi Subject: [Patch v3 17/22] perf/x86/intel: Support arch-PEBS vector registers group capturing Date: Tue, 15 Apr 2025 11:44:23 +0000 Message-Id: <20250415114428.341182-18-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com> References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add x86/intel specific vector register (VECR) group capturing for arch-PEBS. Enable the corresponding VECR group bits in the GPx_CFG_C/FX0_CFG_C MSRs if users configure these vector register bitmaps in perf_event_attr, and parse the VECR group in arch-PEBS records. Currently vector register capturing is only supported by PEBS-based sampling; the PMU driver returns an error if PMI-based sampling tries to capture these vector registers.
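For illustration, the decomposition performed by pebs_get_ext_reg_data_cfg() below can be summarized by this sketch (want_low_zmm/want_high_zmm are hypothetical flags, not variables in this series): a low ZMM register is reconstructed from three record fragments, so requesting ZMM0 ~ ZMM15 pulls in the XMM (bits 127:0), YMMH (bits 255:128) and ZMMH (bits 511:256) groups, while ZMM16 ~ ZMM31 are exported whole:

u64 pebs_data_cfg =3D 0;

if (want_low_zmm)	/* ZMM0 ... ZMM15 */
	pebs_data_cfg |=3D PEBS_DATACFG_ZMMHS | PEBS_DATACFG_YMMHS |
			  PEBS_DATACFG_XMMS;
if (want_high_zmm)	/* ZMM16 ... ZMM31: full 512-bit values */
	pebs_data_cfg |=3D PEBS_DATACFG_H16ZMMS;

ZMM16 ~ ZMM31 have no XMM/YMM aliases, which is why they get their own H16ZMM group.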
Co-developed-by: Kan Liang Signed-off-by: Kan Liang Signed-off-by: Dapeng Mi --- arch/x86/events/core.c | 90 +++++++++++++++++++++++++++++- arch/x86/events/intel/core.c | 15 +++++ arch/x86/events/intel/ds.c | 93 ++++++++++++++++++++++++++++--- arch/x86/include/asm/msr-index.h | 6 ++ arch/x86/include/asm/perf_event.h | 20 +++++++ 5 files changed, 214 insertions(+), 10 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 0ccbe8385c7f..16f019ff44f1 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -580,6 +580,73 @@ int x86_pmu_max_precise(struct pmu *pmu) return precise; } =20 +static bool has_vec_regs(struct perf_event *event, bool user, + int start, int end) +{ + int idx =3D (start - PERF_REG_EXTENDED_OFFSET) / 64; + int s =3D start % 64; + int e =3D end % 64; + u64 regs_mask; + + if (user) + regs_mask =3D event->attr.sample_regs_user_ext[idx]; + else + regs_mask =3D event->attr.sample_regs_intr_ext[idx]; + + return regs_mask & GENMASK_ULL(e, s); +} + +static inline bool has_ymm_regs(struct perf_event *event, bool user) +{ + return has_vec_regs(event, user, PERF_REG_X86_YMM0, PERF_REG_X86_YMM_MAX = - 1); +} + +static inline bool has_zmm_regs(struct perf_event *event, bool user) +{ + return has_vec_regs(event, user, PERF_REG_X86_ZMM0, PERF_REG_X86_ZMM8 - 1= ) || + has_vec_regs(event, user, PERF_REG_X86_ZMM8, PERF_REG_X86_ZMM16 - = 1); +} + +static inline bool has_h16zmm_regs(struct perf_event *event, bool user) +{ + return has_vec_regs(event, user, PERF_REG_X86_ZMM16, PERF_REG_X86_ZMM24 -= 1) || + has_vec_regs(event, user, PERF_REG_X86_ZMM24, PERF_REG_X86_ZMM_MAX= - 1); +} + +static inline bool has_opmask_regs(struct perf_event *event, bool user) +{ + return has_vec_regs(event, user, PERF_REG_X86_OPMASK0, PERF_REG_X86_OPMAS= K7); +} + +static bool ext_vec_regs_supported(struct perf_event *event, bool user) +{ + u64 caps =3D hybrid(event->pmu, arch_pebs_cap).caps; + + if (!(event->pmu->capabilities & PERF_PMU_CAP_MORE_EXT_REGS)) + return false; + + if (has_opmask_regs(event, user) && !(caps & ARCH_PEBS_VECR_OPMASK)) + return false; + + if (has_ymm_regs(event, user) && !(caps & ARCH_PEBS_VECR_YMMH)) + return false; + + if (has_zmm_regs(event, user) && !(caps & ARCH_PEBS_VECR_ZMMH)) + return false; + + if (has_h16zmm_regs(event, user) && !(caps & ARCH_PEBS_VECR_H16ZMM)) + return false; + + if (!event->attr.precise_ip) + return false; + + /* Only user space sampling is allowed for extended vector registers. */ + if (user && !event->attr.exclude_kernel) + return false; + + return true; +} + int x86_pmu_hw_config(struct perf_event *event) { if (event->attr.precise_ip) { @@ -665,9 +732,12 @@ int x86_pmu_hw_config(struct perf_event *event) return -EINVAL; } =20 - /* sample_regs_user never support XMM registers */ - if (unlikely(event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK)) - return -EINVAL; + if (unlikely(event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK)) { + /* Only user space sampling is allowed for XMM registers. */ + if (!event->attr.exclude_kernel) + return -EINVAL; + } + /* * Besides the general purpose registers, XMM registers may * be collected in PEBS on some platforms, e.g. Icelake @@ -680,6 +750,20 @@ int x86_pmu_hw_config(struct perf_event *event) return -EINVAL; } =20 + /* + * Architectural PEBS supports to capture more vector registers besides + * XMM registers, like YMM, OPMASK and ZMM registers. 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index b6416535f84d..9bd77974d83b 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3007,6 +3007,18 @@ static void intel_pmu_enable_event_ext(struct perf_event *event)
 	if (pebs_data_cfg & PEBS_DATACFG_XMMS)
 		ext |= ARCH_PEBS_VECR_XMM & cap.caps;
 
+	if (pebs_data_cfg & PEBS_DATACFG_YMMHS)
+		ext |= ARCH_PEBS_VECR_YMMH & cap.caps;
+
+	if (pebs_data_cfg & PEBS_DATACFG_OPMASKS)
+		ext |= ARCH_PEBS_VECR_OPMASK & cap.caps;
+
+	if (pebs_data_cfg & PEBS_DATACFG_ZMMHS)
+		ext |= ARCH_PEBS_VECR_ZMMH & cap.caps;
+
+	if (pebs_data_cfg & PEBS_DATACFG_H16ZMMS)
+		ext |= ARCH_PEBS_VECR_H16ZMM & cap.caps;
+
 	if (pebs_data_cfg & PEBS_DATACFG_LBRS)
 		ext |= ARCH_PEBS_LBR & cap.caps;
 
@@ -5426,6 +5438,9 @@ static inline void __intel_update_pmu_caps(struct pmu *pmu)
 
 	if (hybrid(pmu, arch_pebs_cap).caps & ARCH_PEBS_VECR_XMM)
 		dest_pmu->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
+
+	if (hybrid(pmu, arch_pebs_cap).caps & ARCH_PEBS_VECR_EXT)
+		dest_pmu->capabilities |= PERF_PMU_CAP_MORE_EXT_REGS;
 }
 
 static inline void __intel_update_large_pebs_flags(struct pmu *pmu)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 91a093cba11f..26220bfbe885 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1425,6 +1425,34 @@ void intel_pmu_pebs_late_setup(struct cpu_hw_events *cpuc)
 	 PERF_SAMPLE_TRANSACTION | \
 	 PERF_SAMPLE_DATA_PAGE_SIZE)
 
+static u64 pebs_get_ext_reg_data_cfg(unsigned long *ext_reg)
+{
+	u64 pebs_data_cfg = 0;
+	int bit;
+
+	for_each_set_bit(bit, ext_reg, PERF_NUM_EXT_REGS) {
+		switch (bit + PERF_REG_EXTENDED_OFFSET) {
+		case PERF_REG_X86_OPMASK0 ... PERF_REG_X86_OPMASK7:
+			pebs_data_cfg |= PEBS_DATACFG_OPMASKS;
+			break;
+		case PERF_REG_X86_YMM0 ... PERF_REG_X86_YMM_MAX - 1:
+			pebs_data_cfg |= PEBS_DATACFG_YMMHS | PEBS_DATACFG_XMMS;
+			break;
+		case PERF_REG_X86_ZMM0 ... PERF_REG_X86_ZMM16 - 1:
+			pebs_data_cfg |= PEBS_DATACFG_ZMMHS | PEBS_DATACFG_YMMHS |
+					 PEBS_DATACFG_XMMS;
+			break;
+		case PERF_REG_X86_ZMM16 ... PERF_REG_X86_ZMM_MAX - 1:
+			pebs_data_cfg |= PEBS_DATACFG_H16ZMMS;
+			break;
+		default:
+			break;
+		}
+	}
+
+	return pebs_data_cfg;
+}
+
 static u64 pebs_update_adaptive_cfg(struct perf_event *event)
 {
 	struct perf_event_attr *attr = &event->attr;
@@ -1459,9 +1487,21 @@ static u64 pebs_update_adaptive_cfg(struct perf_event *event)
 	if (gprs || (attr->precise_ip < 2) || tsx_weight)
 		pebs_data_cfg |= PEBS_DATACFG_GP;
 
-	if ((sample_type & PERF_SAMPLE_REGS_INTR) &&
-	    (attr->sample_regs_intr & PERF_REG_EXTENDED_MASK))
-		pebs_data_cfg |= PEBS_DATACFG_XMMS;
+	if (sample_type & PERF_SAMPLE_REGS_INTR) {
+		if (attr->sample_regs_intr & PERF_REG_EXTENDED_MASK)
+			pebs_data_cfg |= PEBS_DATACFG_XMMS;
+
+		pebs_data_cfg |= pebs_get_ext_reg_data_cfg(
+			(unsigned long *)event->attr.sample_regs_intr_ext);
+	}
+
+	if (sample_type & PERF_SAMPLE_REGS_USER) {
+		if (attr->sample_regs_user & PERF_REG_EXTENDED_MASK)
+			pebs_data_cfg |= PEBS_DATACFG_XMMS;
+
+		pebs_data_cfg |= pebs_get_ext_reg_data_cfg(
+			(unsigned long *)event->attr.sample_regs_user_ext);
+	}
 
 	if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
 		/*
@@ -2245,6 +2285,10 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
 
 	perf_regs = container_of(regs, struct x86_perf_regs, regs);
 	perf_regs->xmm_regs = NULL;
+	perf_regs->ymmh_regs = NULL;
+	perf_regs->opmask_regs = NULL;
+	perf_regs->zmmh_regs = NULL;
+	perf_regs->h16zmm_regs = NULL;
 	perf_regs->ssp = 0;
 
 	format_group = basic->format_group;
@@ -2362,6 +2406,10 @@ static void setup_arch_pebs_sample_data(struct perf_event *event,
 
 	perf_regs = container_of(regs, struct x86_perf_regs, regs);
 	perf_regs->xmm_regs = NULL;
+	perf_regs->ymmh_regs = NULL;
+	perf_regs->opmask_regs = NULL;
+	perf_regs->zmmh_regs = NULL;
+	perf_regs->h16zmm_regs = NULL;
 	perf_regs->ssp = 0;
 
 	__setup_perf_sample_data(event, iregs, data);
@@ -2412,14 +2460,45 @@ static void setup_arch_pebs_sample_data(struct perf_event *event,
 				       meminfo->tsx_tuning, ax);
 	}
 
-	if (header->xmm) {
+	if (header->xmm || header->ymmh || header->opmask ||
+	    header->zmmh || header->h16zmm) {
 		struct arch_pebs_xmm *xmm;
+		struct arch_pebs_ymmh *ymmh;
+		struct arch_pebs_zmmh *zmmh;
+		struct arch_pebs_h16zmm *h16zmm;
+		struct arch_pebs_opmask *opmask;
 
 		next_record += sizeof(struct arch_pebs_xer_header);
 
-		xmm = next_record;
-		perf_regs->xmm_regs = xmm->xmm;
-		next_record = xmm + 1;
+		if (header->xmm) {
+			xmm = next_record;
+			perf_regs->xmm_regs = xmm->xmm;
+			next_record = xmm + 1;
+		}
+
+		if (header->ymmh) {
+			ymmh = next_record;
+			perf_regs->ymmh_regs = ymmh->ymmh;
+			next_record = ymmh + 1;
+		}
+
+		if (header->opmask) {
+			opmask = next_record;
+			perf_regs->opmask_regs = opmask->opmask;
+			next_record = opmask + 1;
+		}
+
+		if (header->zmmh) {
+			zmmh = next_record;
+			perf_regs->zmmh_regs = zmmh->zmmh;
+			next_record = zmmh + 1;
+		}
+
+		if (header->h16zmm) {
+			h16zmm = next_record;
+			perf_regs->h16zmm_regs = h16zmm->h16zmm;
+			next_record = h16zmm + 1;
+		}
 	}
 
 	if (header->lbr) {
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index c971ac09d881..93193eb6ff94 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -328,6 +328,12 @@
 #define ARCH_PEBS_LBR_SHIFT	40
 #define ARCH_PEBS_LBR		(0x3ull << ARCH_PEBS_LBR_SHIFT)
 #define ARCH_PEBS_VECR_XMM	BIT_ULL(49)
+#define ARCH_PEBS_VECR_YMMH	BIT_ULL(50)
+#define ARCH_PEBS_VECR_OPMASK	BIT_ULL(53)
+#define ARCH_PEBS_VECR_ZMMH	BIT_ULL(54)
+#define ARCH_PEBS_VECR_H16ZMM	BIT_ULL(55)
+#define ARCH_PEBS_VECR_EXT_SHIFT	50
+#define ARCH_PEBS_VECR_EXT	(0x3full << ARCH_PEBS_VECR_EXT_SHIFT)
 #define ARCH_PEBS_GPR		BIT_ULL(61)
 #define ARCH_PEBS_AUX		BIT_ULL(62)
 #define ARCH_PEBS_EN		BIT_ULL(63)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 560eb218868c..a7b2548bf7b4 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -142,6 +142,10 @@
 #define PEBS_DATACFG_LBRS	BIT_ULL(3)
 #define PEBS_DATACFG_CNTR	BIT_ULL(4)
 #define PEBS_DATACFG_METRICS	BIT_ULL(5)
+#define PEBS_DATACFG_YMMHS	BIT_ULL(6)
+#define PEBS_DATACFG_OPMASKS	BIT_ULL(7)
+#define PEBS_DATACFG_ZMMHS	BIT_ULL(8)
+#define PEBS_DATACFG_H16ZMMS	BIT_ULL(9)
 #define PEBS_DATACFG_LBR_SHIFT	24
 #define PEBS_DATACFG_CNTR_SHIFT	32
 #define PEBS_DATACFG_CNTR_MASK	GENMASK_ULL(15, 0)
@@ -589,6 +593,22 @@ struct arch_pebs_xmm {
 	u64 xmm[16*2];	/* two entries for each register */
 };
 
+struct arch_pebs_ymmh {
+	u64 ymmh[16*2];	/* two entries for each register */
+};
+
+struct arch_pebs_opmask {
+	u64 opmask[8];
+};
+
+struct arch_pebs_zmmh {
+	u64 zmmh[16*4];	/* four entries for each register */
+};
+
+struct arch_pebs_h16zmm {
+	u64 h16zmm[16*8];	/* eight entries for each register */
+};
+
 #define ARCH_PEBS_LBR_NAN	0x0
 #define ARCH_PEBS_LBR_NUM_8	0x1
 #define ARCH_PEBS_LBR_NUM_16	0x2
-- 
2.40.1
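One consequence of the arch_pebs_* layouts above is worth spelling out. A
small stand-alone sketch (sizes derived from the struct definitions in this
patch) computing the worst-case per-record VECR payload:

	#include <stdio.h>
	#include <stddef.h>
	#include <stdint.h>

	int main(void)
	{
		/* Entry counts follow the arch_pebs_* structs above. */
		size_t xmm    = 16 * 2 * sizeof(uint64_t); /*  256 bytes */
		size_t ymmh   = 16 * 2 * sizeof(uint64_t); /*  256 bytes */
		size_t opmask =  8     * sizeof(uint64_t); /*   64 bytes */
		size_t zmmh   = 16 * 4 * sizeof(uint64_t); /*  512 bytes */
		size_t h16zmm = 16 * 8 * sizeof(uint64_t); /* 1024 bytes */

		/* All groups enabled: 2112 bytes of vector state per record. */
		printf("%zu\n", xmm + ymmh + opmask + zmmh + h16zmm);
		return 0;
	}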
From nobody Fri Dec 19 16:05:43 2025
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi
Subject: [Patch v3 18/22] perf tools: Support to show SSP register
Date: Tue, 15 Apr 2025 11:44:24 +0000
Message-Id: <20250415114428.341182-19-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>

Add support for showing the shadow stack pointer (SSP) register.

Reviewed-by: Ian Rogers
Signed-off-by: Dapeng Mi
---
 tools/arch/x86/include/uapi/asm/perf_regs.h    | 7 ++++++-
 tools/perf/arch/x86/util/perf_regs.c           | 2 ++
 tools/perf/util/intel-pt.c                     | 2 +-
 tools/perf/util/perf-regs-arch/perf_regs_x86.c | 2 ++
 4 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/arch/x86/include/uapi/asm/perf_regs.h b/tools/arch/x86/include/uapi/asm/perf_regs.h
index 7c9d2bb3833b..1c7ab5af5cc1 100644
--- a/tools/arch/x86/include/uapi/asm/perf_regs.h
+++ b/tools/arch/x86/include/uapi/asm/perf_regs.h
@@ -27,9 +27,14 @@ enum perf_event_x86_regs {
 	PERF_REG_X86_R13,
 	PERF_REG_X86_R14,
 	PERF_REG_X86_R15,
+	/* arch-PEBS supports capturing the shadow stack pointer (SSP). */
+	PERF_REG_X86_SSP,
 	/* These are the limits for the GPRs. */
 	PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
-	PERF_REG_X86_64_MAX = PERF_REG_X86_R15 + 1,
+	/* PERF_REG_X86_64_MAX is used generally, for PEBS, etc. */
+	PERF_REG_X86_64_MAX = PERF_REG_X86_SSP + 1,
+	/* PERF_REG_INTEL_PT_MAX ignores the SSP register. */
+	PERF_REG_INTEL_PT_MAX = PERF_REG_X86_R15 + 1,
 
 	/* These all need two bits set because they are 128bit */
 	PERF_REG_X86_XMM0 = 32,
diff --git a/tools/perf/arch/x86/util/perf_regs.c b/tools/perf/arch/x86/util/perf_regs.c
index 12fd93f04802..9f492568f3b4 100644
--- a/tools/perf/arch/x86/util/perf_regs.c
+++ b/tools/perf/arch/x86/util/perf_regs.c
@@ -36,6 +36,8 @@ static const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG(R14, PERF_REG_X86_R14),
 	SMPL_REG(R15, PERF_REG_X86_R15),
 #endif
+	SMPL_REG(SSP, PERF_REG_X86_SSP),
+
 	SMPL_REG2(XMM0, PERF_REG_X86_XMM0),
 	SMPL_REG2(XMM1, PERF_REG_X86_XMM1),
 	SMPL_REG2(XMM2, PERF_REG_X86_XMM2),
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 4e8a9b172fbc..ad23973c9075 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -2179,7 +2179,7 @@ static u64 *intel_pt_add_gp_regs(struct regs_dump *intr_regs, u64 *pos,
 	u32 bit;
 	int i;
 
-	for (i = 0, bit = 1; i < PERF_REG_X86_64_MAX; i++, bit <<= 1) {
+	for (i = 0, bit = 1; i < PERF_REG_INTEL_PT_MAX; i++, bit <<= 1) {
 		/* Get the PEBS gp_regs array index */
 		int n = pebs_gp_regs[i] - 1;
 
diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
index 708954a9d35d..c0e95215b577 100644
--- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c
+++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
@@ -54,6 +54,8 @@ const char *__perf_reg_name_x86(int id)
 		return "R14";
 	case PERF_REG_X86_R15:
 		return "R15";
+	case PERF_REG_X86_SSP:
+		return "SSP";
 
 #define XMM(x) \
 	case PERF_REG_X86_XMM ## x: \
-- 
2.40.1
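With the SMPL_REG(SSP, ...) entry above, one would expect "ssp" to become
selectable like any other general register, e.g. "perf record -e cycles:p
--intr-regs=ssp" (an illustrative command, not taken from this patch). Since
SSP is inserted right after R15 (index 23 in the existing enum), it should
land at index 24; a hypothetical consumer-side check under that assumption:

	#include <stdbool.h>
	#include <stdint.h>

	/* Assumed value: R15 is 23 in the existing enum, so SSP becomes 24. */
	#define PERF_REG_X86_SSP	24

	static bool wants_ssp(uint64_t sample_regs)
	{
		return sample_regs & (1ULL << PERF_REG_X86_SSP);
	}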
From nobody Fri Dec 19 16:05:43 2025
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi
Subject: [Patch v3 19/22] perf tools: Enhance arch__intr/user_reg_mask() helpers
Date: Tue, 15 Apr 2025 11:44:25 +0000
Message-Id: <20250415114428.341182-20-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>

Arch-PEBS supports capturing wider vector registers, such as the YMM/ZMM
registers, but the "uint64_t" return value of these two helpers is not
wide enough to represent the newly added registers. Enhance the two
helpers to take an "unsigned long" pointer instead, so that they can
return more bits through it.

Currently only sample_intr_regs supports the newly added vector
registers, but change arch__user_reg_mask() as well for the sake of
consistency.
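A caller-side sketch of the changed contract may help; note that
DECLARE_BITMAP(), PERF_SAMPLE_REGS_NUM and the extended bits are only
defined by later patches in this series, so this is a forward-looking
assumption rather than code buildable against this patch alone:

	#include <linux/bitmap.h>
	#include <stdint.h>

	void example(void)
	{
		/* Before: uint64_t mask = arch__intr_reg_mask(); */
		DECLARE_BITMAP(mask, PERF_SAMPLE_REGS_NUM);

		bitmap_zero(mask, PERF_SAMPLE_REGS_NUM);
		arch__intr_reg_mask(mask);

		/* The low 64 bits keep the meaning of the old return value. */
		if (*(uint64_t *)mask & PERF_REG_EXTENDED_MASK) {
			/* XMM sampling is available. */
		}
	}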
Signed-off-by: Dapeng Mi
---
 tools/perf/arch/arm/util/perf_regs.c       |  8 ++++----
 tools/perf/arch/arm64/util/perf_regs.c     | 11 ++++++-----
 tools/perf/arch/csky/util/perf_regs.c      |  8 ++++----
 tools/perf/arch/loongarch/util/perf_regs.c |  8 ++++----
 tools/perf/arch/mips/util/perf_regs.c      |  8 ++++----
 tools/perf/arch/powerpc/util/perf_regs.c   | 17 +++++++++--------
 tools/perf/arch/riscv/util/perf_regs.c     |  8 ++++----
 tools/perf/arch/s390/util/perf_regs.c      |  8 ++++----
 tools/perf/arch/x86/util/perf_regs.c       | 13 +++++++------
 tools/perf/util/evsel.c                    |  6 ++++--
 tools/perf/util/parse-regs-options.c       |  6 +++---
 tools/perf/util/perf_regs.c                |  8 ++++----
 tools/perf/util/perf_regs.h                |  4 ++--
 13 files changed, 59 insertions(+), 54 deletions(-)

diff --git a/tools/perf/arch/arm/util/perf_regs.c b/tools/perf/arch/arm/util/perf_regs.c
index f94a0210c7b7..14f18d518c96 100644
--- a/tools/perf/arch/arm/util/perf_regs.c
+++ b/tools/perf/arch/arm/util/perf_regs.c
@@ -6,14 +6,14 @@ static const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG_END
 };
 
-uint64_t arch__intr_reg_mask(void)
+void arch__intr_reg_mask(unsigned long *mask)
 {
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
 
-uint64_t arch__user_reg_mask(void)
+void arch__user_reg_mask(unsigned long *mask)
 {
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
 
 const struct sample_reg *arch__sample_reg_masks(void)
diff --git a/tools/perf/arch/arm64/util/perf_regs.c b/tools/perf/arch/arm64/util/perf_regs.c
index 09308665e28a..9bcf4755290c 100644
--- a/tools/perf/arch/arm64/util/perf_regs.c
+++ b/tools/perf/arch/arm64/util/perf_regs.c
@@ -140,12 +140,12 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op)
 	return SDT_ARG_VALID;
 }
 
-uint64_t arch__intr_reg_mask(void)
+void arch__intr_reg_mask(unsigned long *mask)
 {
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
 
-uint64_t arch__user_reg_mask(void)
+void arch__user_reg_mask(unsigned long *mask)
 {
 	struct perf_event_attr attr = {
 		.type = PERF_TYPE_HARDWARE,
@@ -170,10 +170,11 @@ uint64_t arch__user_reg_mask(void)
 		fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
 		if (fd != -1) {
 			close(fd);
-			return attr.sample_regs_user;
+			*(uint64_t *)mask = attr.sample_regs_user;
+			return;
 		}
 	}
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
 
 const struct sample_reg *arch__sample_reg_masks(void)
diff --git a/tools/perf/arch/csky/util/perf_regs.c b/tools/perf/arch/csky/util/perf_regs.c
index 6b1665f41180..56c84fc91aff 100644
--- a/tools/perf/arch/csky/util/perf_regs.c
+++ b/tools/perf/arch/csky/util/perf_regs.c
@@ -6,14 +6,14 @@ static const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG_END
 };
 
-uint64_t arch__intr_reg_mask(void)
+void arch__intr_reg_mask(unsigned long *mask)
 {
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
 
-uint64_t arch__user_reg_mask(void)
+void arch__user_reg_mask(unsigned long *mask)
 {
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
 
 const struct sample_reg *arch__sample_reg_masks(void)
diff --git a/tools/perf/arch/loongarch/util/perf_regs.c b/tools/perf/arch/loongarch/util/perf_regs.c
index f94a0210c7b7..14f18d518c96 100644
--- a/tools/perf/arch/loongarch/util/perf_regs.c
+++ b/tools/perf/arch/loongarch/util/perf_regs.c
@@ -6,14 +6,14 @@ static const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG_END
 };
 
-uint64_t arch__intr_reg_mask(void)
+void arch__intr_reg_mask(unsigned long *mask)
 {
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
 
-uint64_t arch__user_reg_mask(void)
+void arch__user_reg_mask(unsigned long *mask)
 {
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
 
 const struct sample_reg *arch__sample_reg_masks(void)
diff --git a/tools/perf/arch/mips/util/perf_regs.c b/tools/perf/arch/mips/util/perf_regs.c
index 6b1665f41180..56c84fc91aff 100644
--- a/tools/perf/arch/mips/util/perf_regs.c
+++ b/tools/perf/arch/mips/util/perf_regs.c
@@ -6,14 +6,14 @@ static const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG_END
 };
 
-uint64_t arch__intr_reg_mask(void)
+void arch__intr_reg_mask(unsigned long *mask)
 {
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
 
-uint64_t arch__user_reg_mask(void)
+void arch__user_reg_mask(unsigned long *mask)
 {
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
 
 const struct sample_reg *arch__sample_reg_masks(void)
diff --git a/tools/perf/arch/powerpc/util/perf_regs.c b/tools/perf/arch/powerpc/util/perf_regs.c
index bd36cfd420a2..e5d042305030 100644
--- a/tools/perf/arch/powerpc/util/perf_regs.c
+++ b/tools/perf/arch/powerpc/util/perf_regs.c
@@ -187,7 +187,7 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op)
 	return SDT_ARG_VALID;
 }
 
-uint64_t arch__intr_reg_mask(void)
+void arch__intr_reg_mask(unsigned long *mask)
 {
 	struct perf_event_attr attr = {
 		.type = PERF_TYPE_HARDWARE,
@@ -199,7 +199,7 @@ uint64_t arch__intr_reg_mask(void)
 	};
 	int fd;
 	u32 version;
-	u64 extended_mask = 0, mask = PERF_REGS_MASK;
+	u64 extended_mask = 0;
 
 	/*
 	 * Get the PVR value to set the extended
@@ -210,8 +210,10 @@ uint64_t arch__intr_reg_mask(void)
 		extended_mask = PERF_REG_PMU_MASK_300;
 	else if ((version == PVR_POWER10) || (version == PVR_POWER11))
 		extended_mask = PERF_REG_PMU_MASK_31;
-	else
-		return mask;
+	else {
+		*(u64 *)mask = PERF_REGS_MASK;
+		return;
+	}
 
 	attr.sample_regs_intr = extended_mask;
 	attr.sample_period = 1;
@@ -224,14 +226,13 @@ uint64_t arch__intr_reg_mask(void)
 	fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
 	if (fd != -1) {
 		close(fd);
-		mask |= extended_mask;
+		*(u64 *)mask = PERF_REGS_MASK | extended_mask;
 	}
-	return mask;
 }
 
-uint64_t arch__user_reg_mask(void)
+void arch__user_reg_mask(unsigned long *mask)
 {
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
 
 const struct sample_reg *arch__sample_reg_masks(void)
diff --git a/tools/perf/arch/riscv/util/perf_regs.c b/tools/perf/arch/riscv/util/perf_regs.c
index 6b1665f41180..56c84fc91aff 100644
--- a/tools/perf/arch/riscv/util/perf_regs.c
+++ b/tools/perf/arch/riscv/util/perf_regs.c
@@ -6,14 +6,14 @@ static const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG_END
 };
 
-uint64_t arch__intr_reg_mask(void)
+void arch__intr_reg_mask(unsigned long *mask)
 {
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
 
-uint64_t arch__user_reg_mask(void)
+void arch__user_reg_mask(unsigned long *mask)
 {
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
 
 const struct sample_reg *arch__sample_reg_masks(void)
diff --git a/tools/perf/arch/s390/util/perf_regs.c b/tools/perf/arch/s390/util/perf_regs.c
index 6b1665f41180..56c84fc91aff 100644
--- a/tools/perf/arch/s390/util/perf_regs.c
+++ b/tools/perf/arch/s390/util/perf_regs.c
@@ -6,14 +6,14 @@ static const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG_END
 };
 
-uint64_t arch__intr_reg_mask(void)
+void arch__intr_reg_mask(unsigned long *mask)
 {
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
 
-uint64_t arch__user_reg_mask(void)
+void arch__user_reg_mask(unsigned long *mask)
 {
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
 
 const struct sample_reg *arch__sample_reg_masks(void)
diff --git a/tools/perf/arch/x86/util/perf_regs.c b/tools/perf/arch/x86/util/perf_regs.c
index 9f492568f3b4..5b163f0a651a 100644
--- a/tools/perf/arch/x86/util/perf_regs.c
+++ b/tools/perf/arch/x86/util/perf_regs.c
@@ -283,7 +283,7 @@ const struct sample_reg *arch__sample_reg_masks(void)
 	return sample_reg_masks;
 }
 
-uint64_t arch__intr_reg_mask(void)
+void arch__intr_reg_mask(unsigned long *mask)
 {
 	struct perf_event_attr attr = {
 		.type			= PERF_TYPE_HARDWARE,
@@ -295,6 +295,9 @@ uint64_t arch__intr_reg_mask(void)
 		.exclude_kernel		= 1,
 	};
 	int fd;
+
+	*(u64 *)mask = PERF_REGS_MASK;
+
 	/*
 	 * In an unnamed union, init it here to build on older gcc versions
 	 */
@@ -320,13 +323,11 @@ uint64_t arch__intr_reg_mask(void)
 	fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
 	if (fd != -1) {
 		close(fd);
-		return (PERF_REG_EXTENDED_MASK | PERF_REGS_MASK);
+		*(u64 *)mask = PERF_REG_EXTENDED_MASK | PERF_REGS_MASK;
 	}
-
-	return PERF_REGS_MASK;
 }
 
-uint64_t arch__user_reg_mask(void)
+void arch__user_reg_mask(unsigned long *mask)
 {
-	return PERF_REGS_MASK;
+	*(uint64_t *)mask = PERF_REGS_MASK;
 }
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 1974395492d7..6e71187d6a93 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1056,17 +1056,19 @@ static void __evsel__config_callchain(struct evsel *evsel, struct record_opts *o
 	if (param->record_mode == CALLCHAIN_DWARF) {
 		if (!function) {
 			const char *arch = perf_env__arch(evsel__env(evsel));
+			uint64_t mask = 0;
 
+			arch__user_reg_mask((unsigned long *)&mask);
 			evsel__set_sample_bit(evsel, REGS_USER);
 			evsel__set_sample_bit(evsel, STACK_USER);
 			if (opts->sample_user_regs &&
-			    DWARF_MINIMAL_REGS(arch) != arch__user_reg_mask()) {
+			    DWARF_MINIMAL_REGS(arch) != mask) {
 				attr->sample_regs_user |= DWARF_MINIMAL_REGS(arch);
 				pr_warning("WARNING: The use of --call-graph=dwarf may require all the user registers, "
 					   "specifying a subset with --user-regs may render DWARF unwinding unreliable, "
 					   "so the minimal registers set (IP, SP) is explicitly forced.\n");
 			} else {
-				attr->sample_regs_user |= arch__user_reg_mask();
+				attr->sample_regs_user |= mask;
 			}
 			attr->sample_stack_user = param->dump_size;
 			attr->exclude_callchain_user = 1;
diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c
index cda1c620968e..3dcd8dc4f81b 100644
--- a/tools/perf/util/parse-regs-options.c
+++ b/tools/perf/util/parse-regs-options.c
@@ -16,7 +16,7 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	const struct sample_reg *r = NULL;
 	char *s, *os = NULL, *p;
 	int ret = -1;
-	uint64_t mask;
+	uint64_t mask = 0;
 
 	if (unset)
 		return 0;
@@ -28,9 +28,9 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 		return -1;
 
 	if (intr)
-		mask = arch__intr_reg_mask();
+		arch__intr_reg_mask((unsigned long *)&mask);
 	else
-		mask = arch__user_reg_mask();
+		arch__user_reg_mask((unsigned long *)&mask);
 
 	/* str may be NULL in case no arg is passed to -I */
 	if (str) {
diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
index 44b90bbf2d07..7a96290fd1e6 100644
--- a/tools/perf/util/perf_regs.c
+++ b/tools/perf/util/perf_regs.c
@@ -11,14 +11,14 @@ int __weak arch_sdt_arg_parse_op(char *old_op
 __maybe_unused,
 	return SDT_ARG_SKIP;
 }
 
-uint64_t __weak arch__intr_reg_mask(void)
+void __weak arch__intr_reg_mask(unsigned long *mask)
 {
-	return 0;
+	*(uint64_t *)mask = 0;
 }
 
-uint64_t __weak arch__user_reg_mask(void)
+void __weak arch__user_reg_mask(unsigned long *mask)
 {
-	return 0;
+	*(uint64_t *)mask = 0;
 }
 
 static const struct sample_reg sample_reg_masks[] = {
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
index f2d0736d65cc..316d280e5cd7 100644
--- a/tools/perf/util/perf_regs.h
+++ b/tools/perf/util/perf_regs.h
@@ -24,8 +24,8 @@ enum {
 };
 
 int arch_sdt_arg_parse_op(char *old_op, char **new_op);
-uint64_t arch__intr_reg_mask(void);
-uint64_t arch__user_reg_mask(void);
+void arch__intr_reg_mask(unsigned long *mask);
+void arch__user_reg_mask(unsigned long *mask);
 const struct sample_reg *arch__sample_reg_masks(void);
 
 const char *perf_reg_name(int id, const char *arch);
-- 
2.40.1
From nobody Fri Dec 19 16:05:43 2025
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi
Subject: [Patch v3 20/22] perf tools: Enhance sample_regs_user/intr to capture more registers
Date: Tue, 15 Apr 2025 11:44:26 +0000
Message-Id: <20250415114428.341182-21-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>

Intel architectural PEBS supports capturing more vector registers, such
as the OPMASK/YMM/ZMM registers, besides the already supported XMM
registers. The arch-PEBS vector register (VECR) capturing in the perf
core/PMU driver (Intel) has been supported by the previous patches.
This patch adds the perf tool side of the support.

In detail, add support for the new sample_regs_intr/user_ext register
selectors in perf_event_attr. These extended bitmaps are used to select
the new register groups OPMASK, YMMH, ZMMH and ZMM in VECR. Update perf
regs to introduce the new registers.

This patch only introduces the generic support; the x86/Intel-specific
support is added in the next patch.

Co-developed-by: Kan Liang
Signed-off-by: Kan Liang
Signed-off-by: Dapeng Mi
---
 tools/include/uapi/linux/perf_event.h | 14 +++++++++++++
 tools/perf/builtin-script.c           | 23 +++++++++++++++-----
 tools/perf/util/evsel.c               | 30 ++++++++++++++++++-------
 tools/perf/util/parse-regs-options.c  | 23 ++++++++++++--------
 tools/perf/util/perf_regs.h           | 16 +++++++++++++-
 tools/perf/util/record.h              |  4 ++--
 tools/perf/util/sample.h              |  6 +++++-
 tools/perf/util/session.c             | 29 +++++++++++++++-----------
 tools/perf/util/synthetic-events.c    | 12 +++++++----
 9 files changed, 116 insertions(+), 41 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 0524d541d4e3..f19370f9bd78 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -379,6 +379,13 @@ enum perf_event_read_format {
 #define PERF_ATTR_SIZE_VER6	120	/* add: aux_sample_size */
 #define PERF_ATTR_SIZE_VER7	128	/* add: sig_data */
 #define PERF_ATTR_SIZE_VER8	136	/* add: config3 */
+#define PERF_ATTR_SIZE_VER9	168	/* add: sample_regs_intr_ext[PERF_EXT_REGS_ARRAY_SIZE] */
+
+#define PERF_EXT_REGS_ARRAY_SIZE	7
+#define PERF_NUM_EXT_REGS	(PERF_EXT_REGS_ARRAY_SIZE * 64)
+
+#define PERF_SAMPLE_ARRAY_SIZE	(PERF_EXT_REGS_ARRAY_SIZE + 1)
+#define PERF_SAMPLE_REGS_NUM	((PERF_SAMPLE_ARRAY_SIZE) * 64)
 
 /*
  * Hardware event_id to monitor via a performance monitoring event:
@@ -531,6 +538,13 @@ struct perf_event_attr {
 	__u64	sig_data;
 
 	__u64	config3; /* extension of config2 */
+
+	/*
+	 * Extension sets of regs to dump for each sample.
+	 * See asm/perf_regs.h for details.
+	 */
+	__u64 sample_regs_intr_ext[PERF_EXT_REGS_ARRAY_SIZE];
+	__u64 sample_regs_user_ext[PERF_EXT_REGS_ARRAY_SIZE];
 };
 
 /*
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 9b16df881af8..c41d9ccdaa9d 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -722,21 +722,32 @@ static int perf_session__check_output_opt(struct perf_session *session)
 }
 
 static int perf_sample__fprintf_regs(struct regs_dump *regs, uint64_t mask, const char *arch,
-				     FILE *fp)
+				     unsigned long *mask_ext, FILE *fp)
 {
+	unsigned int mask_size = sizeof(mask) * 8;
 	unsigned i = 0, r;
 	int printed = 0;
+	u64 val;
 
	if (!regs || !regs->regs)
 		return 0;
 
 	printed += fprintf(fp, " ABI:%" PRIu64 " ", regs->abi);
 
-	for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) {
-		u64 val = regs->regs[i++];
+	for_each_set_bit(r, (unsigned long *)&mask, mask_size) {
+		val = regs->regs[i++];
 		printed += fprintf(fp, "%5s:0x%"PRIx64" ", perf_reg_name(r, arch), val);
 	}
 
+	if (!mask_ext)
+		return printed;
+
+	for_each_set_bit(r, mask_ext, PERF_NUM_EXT_REGS) {
+		val = regs->regs[i++];
+		printed += fprintf(fp, "%5s:0x%"PRIx64" ",
+				   perf_reg_name(r + mask_size, arch), val);
+	}
+
 	return printed;
 }
 
@@ -797,7 +808,8 @@ static int perf_sample__fprintf_iregs(struct perf_sample *sample,
 		return 0;
 
 	return perf_sample__fprintf_regs(perf_sample__intr_regs(sample),
-					 attr->sample_regs_intr, arch, fp);
+					 attr->sample_regs_intr, arch,
+					 (unsigned long *)attr->sample_regs_intr_ext, fp);
 }
 
 static int perf_sample__fprintf_uregs(struct perf_sample *sample,
@@ -807,7 +819,8 @@ static int perf_sample__fprintf_uregs(struct perf_sample *sample,
 		return 0;
 
 	return perf_sample__fprintf_regs(perf_sample__user_regs(sample),
-					 attr->sample_regs_user, arch, fp);
+					 attr->sample_regs_user, arch,
+					 (unsigned long *)attr->sample_regs_user_ext, fp);
 }
 
 static int perf_sample__fprintf_start(struct perf_script *script,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 6e71187d6a93..4e4389e16369 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1061,7 +1061,7 @@ static void __evsel__config_callchain(struct evsel *evsel, struct record_opts *o
 			arch__user_reg_mask((unsigned long *)&mask);
 			evsel__set_sample_bit(evsel, REGS_USER);
 			evsel__set_sample_bit(evsel, STACK_USER);
-			if (opts->sample_user_regs &&
+			if (bitmap_weight(opts->sample_user_regs, PERF_SAMPLE_REGS_NUM) &&
 			    DWARF_MINIMAL_REGS(arch) != mask) {
 				attr->sample_regs_user |= DWARF_MINIMAL_REGS(arch);
 				pr_warning("WARNING: The use of --call-graph=dwarf may require all the user registers, "
@@ -1397,15 +1397,19 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
 	if (callchain && callchain->enabled && !evsel->no_aux_samples)
 		evsel__config_callchain(evsel, opts, callchain);
 
-	if (opts->sample_intr_regs && !evsel->no_aux_samples &&
-	    !evsel__is_dummy_event(evsel)) {
-		attr->sample_regs_intr = opts->sample_intr_regs;
+	if (bitmap_weight(opts->sample_intr_regs, PERF_SAMPLE_REGS_NUM) &&
+	    !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) {
+		attr->sample_regs_intr = opts->sample_intr_regs[0];
+		memcpy(attr->sample_regs_intr_ext, &opts->sample_intr_regs[1],
+		       PERF_NUM_EXT_REGS / 8);
 		evsel__set_sample_bit(evsel, REGS_INTR);
 	}
 
-	if (opts->sample_user_regs && !evsel->no_aux_samples &&
-	    !evsel__is_dummy_event(evsel)) {
-		attr->sample_regs_user |= opts->sample_user_regs;
+	if (bitmap_weight(opts->sample_user_regs, PERF_SAMPLE_REGS_NUM) &&
+	    !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) {
+		attr->sample_regs_user |= opts->sample_user_regs[0];
+		memcpy(attr->sample_regs_user_ext, &opts->sample_user_regs[1],
+		       PERF_NUM_EXT_REGS / 8);
 		evsel__set_sample_bit(evsel, REGS_USER);
 	}
 
@@ -3198,10 +3202,16 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 
 		if (regs->abi) {
 			u64 mask = evsel->core.attr.sample_regs_user;
+			unsigned long *mask_ext =
+				(unsigned long *)evsel->core.attr.sample_regs_user_ext;
+			u64 *user_regs_mask;
 
 			sz = hweight64(mask) * sizeof(u64);
+			sz += bitmap_weight(mask_ext, PERF_NUM_EXT_REGS) * sizeof(u64);
 			OVERFLOW_CHECK(array, sz, max_size);
 			regs->mask = mask;
+			user_regs_mask = (u64 *)regs->mask_ext;
+			memcpy(&user_regs_mask[1], mask_ext, PERF_NUM_EXT_REGS);
 			regs->regs = (u64 *)array;
 			array = (void *)array + sz;
 		}
@@ -3255,10 +3265,16 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 
 		if (regs->abi != PERF_SAMPLE_REGS_ABI_NONE) {
 			u64 mask = evsel->core.attr.sample_regs_intr;
+			unsigned long *mask_ext =
+				(unsigned long *)evsel->core.attr.sample_regs_intr_ext;
+			u64 *intr_regs_mask;
 
 			sz = hweight64(mask) * sizeof(u64);
+			sz += bitmap_weight(mask_ext, PERF_NUM_EXT_REGS) * sizeof(u64);
 			OVERFLOW_CHECK(array, sz, max_size);
 			regs->mask = mask;
+			intr_regs_mask = (u64 *)regs->mask_ext;
+			memcpy(&intr_regs_mask[1], mask_ext, PERF_NUM_EXT_REGS);
 			regs->regs = (u64 *)array;
 			array = (void *)array + sz;
 		}
diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c
index 3dcd8dc4f81b..42b176705ccf 100644
--- a/tools/perf/util/parse-regs-options.c
+++ b/tools/perf/util/parse-regs-options.c
@@ -12,11 +12,13 @@
 static int
 __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 {
+	unsigned int size = PERF_SAMPLE_REGS_NUM;
 	uint64_t *mode = (uint64_t *)opt->value;
 	const struct sample_reg *r = NULL;
 	char *s, *os = NULL, *p;
 	int ret = -1;
-	uint64_t mask = 0;
+	DECLARE_BITMAP(mask, size);
+	DECLARE_BITMAP(mask_tmp, size);
 
 	if (unset)
 		return 0;
@@ -24,13 +26,14 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	/*
 	 * cannot set it twice
 	 */
-	if (*mode)
+	if (bitmap_weight((unsigned long *)mode, size))
 		return -1;
 
+	bitmap_zero(mask, size);
 	if (intr)
-		arch__intr_reg_mask((unsigned long *)&mask);
+		arch__intr_reg_mask(mask);
 	else
-		arch__user_reg_mask((unsigned long *)&mask);
+		arch__user_reg_mask(mask);
 
 	/* str may be NULL in case no arg is passed to -I */
 	if (str) {
@@ -47,7 +50,8 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 			if (!strcmp(s, "?")) {
 				fprintf(stderr, "available registers: ");
 				for (r = arch__sample_reg_masks(); r->name; r++) {
-					if (r->mask & mask)
+					bitmap_and(mask_tmp, mask, r->mask_ext, size);
+					if (bitmap_weight(mask_tmp, size))
 						fprintf(stderr, "%s ", r->name);
 				}
 				fputc('\n', stderr);
@@ -55,7 +59,8 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 				goto error;
 			}
 			for (r = arch__sample_reg_masks(); r->name; r++) {
-				if ((r->mask & mask) && !strcasecmp(s, r->name))
+				bitmap_and(mask_tmp, mask, r->mask_ext, size);
+				if (bitmap_weight(mask_tmp, size) && !strcasecmp(s, r->name))
 					break;
 			}
 			if (!r || !r->name) {
@@ -64,7 +69,7 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 				goto error;
 			}
 
-			*mode |= r->mask;
+			bitmap_or((unsigned long *)mode, (unsigned long *)mode, r->mask_ext, size);
 
 			if (!p)
 				break;
@@ -75,8 +80,8 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	ret = 0;
 
 	/* default to all possible regs */
-	if (*mode == 0)
-		*mode = mask;
+	if (!bitmap_weight((unsigned long *)mode, size))
+		bitmap_or((unsigned long *)mode, (unsigned long *)mode, mask, size);
 error:
 	free(os);
 	return ret;
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
index 316d280e5cd7..d60a74623a0f 100644
--- a/tools/perf/util/perf_regs.h
+++ b/tools/perf/util/perf_regs.h
@@ -4,18 +4,32 @@
 
 #include
 #include
+#include
+#include
+#include "util/record.h"
 
 struct regs_dump;
 
 struct sample_reg {
 	const char *name;
-	uint64_t mask;
+	union {
+		uint64_t mask;
+		DECLARE_BITMAP(mask_ext, PERF_SAMPLE_REGS_NUM);
+	};
 };
 
 #define SMPL_REG_MASK(b) (1ULL << (b))
 #define SMPL_REG(n, b) { .name = #n, .mask = SMPL_REG_MASK(b) }
 #define SMPL_REG2_MASK(b) (3ULL << (b))
 #define SMPL_REG2(n, b) { .name = #n, .mask = SMPL_REG2_MASK(b) }
+#define SMPL_REG_EXT(n, b) \
+	{ .name = #n, .mask_ext[b / __BITS_PER_LONG] = 0x1ULL << (b % __BITS_PER_LONG) }
+#define SMPL_REG2_EXT(n, b) \
+	{ .name = #n, .mask_ext[b / __BITS_PER_LONG] = 0x3ULL << (b % __BITS_PER_LONG) }
+#define SMPL_REG4_EXT(n, b) \
+	{ .name = #n, .mask_ext[b / __BITS_PER_LONG] = 0xfULL << (b % __BITS_PER_LONG) }
+#define SMPL_REG8_EXT(n, b) \
+	{ .name = #n, .mask_ext[b / __BITS_PER_LONG] = 0xffULL << (b % __BITS_PER_LONG) }
 #define SMPL_REG_END { .name = NULL }
 
 enum {
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index a6566134e09e..2741bbbc2794 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -57,8 +57,8 @@ struct record_opts {
 	unsigned int auxtrace_mmap_pages;
 	unsigned int user_freq;
 	u64	     branch_stack;
-	u64	     sample_intr_regs;
-	u64	     sample_user_regs;
+	u64	     sample_intr_regs[PERF_SAMPLE_ARRAY_SIZE];
+	u64	     sample_user_regs[PERF_SAMPLE_ARRAY_SIZE];
 	u64	     default_interval;
 	u64	     user_interval;
 	size_t	     auxtrace_snapshot_size;
diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
index 0e96240052e9..82db52aeae4d 100644
--- a/tools/perf/util/sample.h
+++ b/tools/perf/util/sample.h
@@ -4,13 +4,17 @@
 
 #include
 #include
+#include
 
 /* number of register is bound by the number of bits in regs_dump::mask (64) */
 #define PERF_SAMPLE_REGS_CACHE_SIZE (8 * sizeof(u64))
 
 struct regs_dump {
 	u64 abi;
-	u64 mask;
+	union {
+		u64 mask;
+		DECLARE_BITMAP(mask_ext, PERF_SAMPLE_REGS_NUM);
+	};
 	u64 *regs;
 
 	/* Cached values/mask filled by first register access. */
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 60fb9997ea0d..54db3f36d962 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -910,12 +910,13 @@ static void branch_stack__printf(struct perf_sample *sample,
 	}
 }
 
-static void regs_dump__printf(u64 mask, u64 *regs, const char *arch)
+static void regs_dump__printf(struct regs_dump *regs, const char *arch)
 {
+	unsigned int size = PERF_SAMPLE_REGS_NUM;
 	unsigned rid, i = 0;
 
-	for_each_set_bit(rid, (unsigned long *) &mask, sizeof(mask) * 8) {
-		u64 val = regs[i++];
+	for_each_set_bit(rid, regs->mask_ext, size) {
+		u64 val = regs->regs[i++];
 
 		printf(".... %-5s 0x%016" PRIx64 "\n",
 		       perf_reg_name(rid, arch), val);
 	}
 }
@@ -936,16 +937,20 @@ static inline const char *regs_dump_abi(struct regs_dump *d)
 	return regs_abi[d->abi];
 }
 
-static void regs__printf(const char *type, struct regs_dump *regs, const char *arch)
+static void regs__printf(bool intr, struct regs_dump *regs, const char *arch)
 {
-	u64 mask = regs->mask;
+	u64 *mask = (u64 *)&regs->mask_ext;
 
-	printf("... %s regs: mask 0x%" PRIx64 " ABI %s\n",
-	       type,
-	       mask,
-	       regs_dump_abi(regs));
+	if (intr)
+		printf("... intr regs: mask 0x");
+	else
+		printf("... user regs: mask 0x");
+
+	for (int i = 0; i < PERF_SAMPLE_ARRAY_SIZE; i++)
+		printf("%" PRIx64 "", mask[i]);
+	printf(" ABI %s\n", regs_dump_abi(regs));
 
-	regs_dump__printf(mask, regs->regs, arch);
+	regs_dump__printf(regs, arch);
 }
 
 static void regs_user__printf(struct perf_sample *sample, const char *arch)
@@ -958,7 +963,7 @@ static void regs_user__printf(struct perf_sample *sample, const char *arch)
 	user_regs = perf_sample__user_regs(sample);
 
 	if (user_regs->regs)
-		regs__printf("user", user_regs, arch);
+		regs__printf(false, user_regs, arch);
 }
 
 static void regs_intr__printf(struct perf_sample *sample, const char *arch)
@@ -971,7 +976,7 @@ static void regs_intr__printf(struct perf_sample *sample, const char *arch)
 	intr_regs = perf_sample__intr_regs(sample);
 
 	if (intr_regs->regs)
-		regs__printf("intr", intr_regs, arch);
+		regs__printf(true, intr_regs, arch);
 }
 
 static void stack_user__printf(struct stack_dump *dump)
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 2fc4d0537840..2706b92c9a80 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1512,7 +1512,8 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 	if (type & PERF_SAMPLE_REGS_USER) {
 		if (sample->user_regs && sample->user_regs->abi) {
 			result += sizeof(u64);
-			sz = hweight64(sample->user_regs->mask) * sizeof(u64);
+			sz = bitmap_weight(sample->user_regs->mask_ext,
+					   PERF_SAMPLE_REGS_NUM) * sizeof(u64);
 			result += sz;
 		} else {
 			result += sizeof(u64);
@@ -1540,7 +1541,8 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 	if (type & PERF_SAMPLE_REGS_INTR) {
 		if (sample->intr_regs && sample->intr_regs->abi) {
 			result += sizeof(u64);
-			sz = hweight64(sample->intr_regs->mask) * sizeof(u64);
+			sz = bitmap_weight(sample->intr_regs->mask_ext,
+					   PERF_SAMPLE_REGS_NUM) * sizeof(u64);
 			result += sz;
 		} else {
 			result += sizeof(u64);
@@ -1711,7 +1713,8 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 	if (type & PERF_SAMPLE_REGS_USER) {
 		if (sample->user_regs && sample->user_regs->abi) {
 			*array++ = sample->user_regs->abi;
-			sz = hweight64(sample->user_regs->mask) * sizeof(u64);
+			sz = bitmap_weight(sample->user_regs->mask_ext,
+					   PERF_SAMPLE_REGS_NUM) * sizeof(u64);
 			memcpy(array, sample->user_regs->regs, sz);
 			array = (void *)array + sz;
 		} else {
@@ -1747,7 +1750,8 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 	if (type & PERF_SAMPLE_REGS_INTR) {
 		if (sample->intr_regs && sample->intr_regs->abi) {
 			*array++ = sample->intr_regs->abi;
-			sz = hweight64(sample->intr_regs->mask) * sizeof(u64);
+			sz = bitmap_weight(sample->intr_regs->mask_ext,
+					   PERF_SAMPLE_REGS_NUM) * sizeof(u64);
 			memcpy(array, sample->intr_regs->regs, sz);
 			array = (void *)array + sz;
 		} else {
-- 
2.40.1
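Restating the attr plumbing above outside the diff: word 0 of the
record_opts bitmap feeds the legacy sample_regs_intr field, and words 1..7
feed sample_regs_intr_ext[]. A minimal sketch of that split (the constants
are copied from the uapi hunk above; the function name is made up for
illustration):

	#include <string.h>
	#include <stdint.h>

	#define PERF_EXT_REGS_ARRAY_SIZE	7
	#define PERF_SAMPLE_ARRAY_SIZE		(PERF_EXT_REGS_ARRAY_SIZE + 1)

	/* Mirrors what evsel__config() does with opts->sample_intr_regs. */
	static void fill_attr_regs(uint64_t *sample_regs_intr,
				   uint64_t sample_regs_intr_ext[PERF_EXT_REGS_ARRAY_SIZE],
				   const uint64_t opts_regs[PERF_SAMPLE_ARRAY_SIZE])
	{
		*sample_regs_intr = opts_regs[0];
		memcpy(sample_regs_intr_ext, &opts_regs[1],
		       PERF_EXT_REGS_ARRAY_SIZE * sizeof(uint64_t));
	}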
From nobody Fri Dec 19 16:05:43 2025
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Dapeng Mi
Subject: [Patch v3 21/22] perf tools: Support to capture more vector registers (x86/Intel)
Date: Tue, 15 Apr 2025 11:44:27 +0000
Message-Id: <20250415114428.341182-22-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
Intel architectural PEBS supports capturing more vector registers, such
as the OPMASK/YMM/ZMM registers, besides the already supported XMM
registers. This patch adds the x86/Intel-specific perf tools support for
capturing these new vector registers.

Co-developed-by: Kan Liang
Signed-off-by: Kan Liang
Signed-off-by: Dapeng Mi
---
 tools/arch/x86/include/uapi/asm/perf_regs.h    |  79 ++++++++++-
 tools/perf/arch/x86/util/perf_regs.c           | 129 +++++++++++++++++-
 tools/perf/util/perf-regs-arch/perf_regs_x86.c |  82 +++++++++++
 3 files changed, 285 insertions(+), 5 deletions(-)

diff --git a/tools/arch/x86/include/uapi/asm/perf_regs.h b/tools/arch/x86/include/uapi/asm/perf_regs.h
index 1c7ab5af5cc1..c05c6ec127c8 100644
--- a/tools/arch/x86/include/uapi/asm/perf_regs.h
+++ b/tools/arch/x86/include/uapi/asm/perf_regs.h
@@ -36,7 +36,7 @@ enum perf_event_x86_regs {
 	/* PERF_REG_INTEL_PT_MAX ignores the SSP register. */
 	PERF_REG_INTEL_PT_MAX = PERF_REG_X86_R15 + 1,
 
-	/* These all need two bits set because they are 128bit */
+	/* These all need two bits set because they are 128 bits */
 	PERF_REG_X86_XMM0 = 32,
 	PERF_REG_X86_XMM1 = 34,
 	PERF_REG_X86_XMM2 = 36,
@@ -56,6 +56,83 @@ enum perf_event_x86_regs {
 
 	/* These include both GPRs and XMMX registers */
 	PERF_REG_X86_XMM_MAX = PERF_REG_X86_XMM15 + 2,
+
+	/* Leave bits[127:64] for other GP registers, like R16 ~ R31. */
+
+	/*
+	 * Each YMM register needs 4 bits to represent because they are 256 bits.
+	 * PERF_REG_X86_YMMH0 = 128
+	 */
+	PERF_REG_X86_YMM0	= 128,
+	PERF_REG_X86_YMM1	= PERF_REG_X86_YMM0 + 4,
+	PERF_REG_X86_YMM2	= PERF_REG_X86_YMM1 + 4,
+	PERF_REG_X86_YMM3	= PERF_REG_X86_YMM2 + 4,
+	PERF_REG_X86_YMM4	= PERF_REG_X86_YMM3 + 4,
+	PERF_REG_X86_YMM5	= PERF_REG_X86_YMM4 + 4,
+	PERF_REG_X86_YMM6	= PERF_REG_X86_YMM5 + 4,
+	PERF_REG_X86_YMM7	= PERF_REG_X86_YMM6 + 4,
+	PERF_REG_X86_YMM8	= PERF_REG_X86_YMM7 + 4,
+	PERF_REG_X86_YMM9	= PERF_REG_X86_YMM8 + 4,
+	PERF_REG_X86_YMM10	= PERF_REG_X86_YMM9 + 4,
+	PERF_REG_X86_YMM11	= PERF_REG_X86_YMM10 + 4,
+	PERF_REG_X86_YMM12	= PERF_REG_X86_YMM11 + 4,
+	PERF_REG_X86_YMM13	= PERF_REG_X86_YMM12 + 4,
+	PERF_REG_X86_YMM14	= PERF_REG_X86_YMM13 + 4,
+	PERF_REG_X86_YMM15	= PERF_REG_X86_YMM14 + 4,
+	PERF_REG_X86_YMM_MAX	= PERF_REG_X86_YMM15 + 4,
+
+	/*
+	 * Each ZMM register needs 8 bits to represent because they are 512 bits.
+	 * PERF_REG_X86_ZMMH0 = 192
+	 */
+	PERF_REG_X86_ZMM0	= PERF_REG_X86_YMM_MAX,
+	PERF_REG_X86_ZMM1	= PERF_REG_X86_ZMM0 + 8,
+	PERF_REG_X86_ZMM2	= PERF_REG_X86_ZMM1 + 8,
+	PERF_REG_X86_ZMM3	= PERF_REG_X86_ZMM2 + 8,
+	PERF_REG_X86_ZMM4	= PERF_REG_X86_ZMM3 + 8,
+	PERF_REG_X86_ZMM5	= PERF_REG_X86_ZMM4 + 8,
+	PERF_REG_X86_ZMM6	= PERF_REG_X86_ZMM5 + 8,
+	PERF_REG_X86_ZMM7	= PERF_REG_X86_ZMM6 + 8,
+	PERF_REG_X86_ZMM8	= PERF_REG_X86_ZMM7 + 8,
+	PERF_REG_X86_ZMM9	= PERF_REG_X86_ZMM8 + 8,
+	PERF_REG_X86_ZMM10	= PERF_REG_X86_ZMM9 + 8,
+	PERF_REG_X86_ZMM11	= PERF_REG_X86_ZMM10 + 8,
+	PERF_REG_X86_ZMM12	= PERF_REG_X86_ZMM11 + 8,
+	PERF_REG_X86_ZMM13	= PERF_REG_X86_ZMM12 + 8,
+	PERF_REG_X86_ZMM14	= PERF_REG_X86_ZMM13 + 8,
+	PERF_REG_X86_ZMM15	= PERF_REG_X86_ZMM14 + 8,
+	PERF_REG_X86_ZMM16	= PERF_REG_X86_ZMM15 + 8,
+	PERF_REG_X86_ZMM17	= PERF_REG_X86_ZMM16 + 8,
+	PERF_REG_X86_ZMM18	= PERF_REG_X86_ZMM17 + 8,
+	PERF_REG_X86_ZMM19	= PERF_REG_X86_ZMM18 + 8,
+	PERF_REG_X86_ZMM20	= PERF_REG_X86_ZMM19 + 8,
+	PERF_REG_X86_ZMM21	= PERF_REG_X86_ZMM20 + 8,
+	PERF_REG_X86_ZMM22	= PERF_REG_X86_ZMM21 + 8,
+	PERF_REG_X86_ZMM23	= PERF_REG_X86_ZMM22 + 8,
+	PERF_REG_X86_ZMM24	= PERF_REG_X86_ZMM23 + 8,
+	PERF_REG_X86_ZMM25	= PERF_REG_X86_ZMM24 + 8,
+	PERF_REG_X86_ZMM26	= PERF_REG_X86_ZMM25 + 8,
+	PERF_REG_X86_ZMM27	= PERF_REG_X86_ZMM26 + 8,
+	PERF_REG_X86_ZMM28	= PERF_REG_X86_ZMM27 + 8,
+	PERF_REG_X86_ZMM29	= PERF_REG_X86_ZMM28 + 8,
+	PERF_REG_X86_ZMM30	= PERF_REG_X86_ZMM29 + 8,
+	PERF_REG_X86_ZMM31	= PERF_REG_X86_ZMM30 + 8,
+	PERF_REG_X86_ZMM_MAX	= PERF_REG_X86_ZMM31 + 8,
+
+	/*
+	 * OPMASK Registers
+	 * PERF_REG_X86_OPMASK0 = 448
+	 */
+	PERF_REG_X86_OPMASK0	= PERF_REG_X86_ZMM_MAX,
+	PERF_REG_X86_OPMASK1	= PERF_REG_X86_OPMASK0 + 1,
+	PERF_REG_X86_OPMASK2	= PERF_REG_X86_OPMASK1 + 1,
+	PERF_REG_X86_OPMASK3	= PERF_REG_X86_OPMASK2 + 1,
+	PERF_REG_X86_OPMASK4	= PERF_REG_X86_OPMASK3 + 1,
+	PERF_REG_X86_OPMASK5	= PERF_REG_X86_OPMASK4 + 1,
+	PERF_REG_X86_OPMASK6	= PERF_REG_X86_OPMASK5 + 1,
+	PERF_REG_X86_OPMASK7	= PERF_REG_X86_OPMASK6 + 1,
+
+	PERF_REG_X86_VEC_MAX	= PERF_REG_X86_OPMASK7 + 1,
 };
 
 #define PERF_REG_EXTENDED_MASK	(~((1ULL << PERF_REG_X86_XMM0) - 1))
diff --git a/tools/perf/arch/x86/util/perf_regs.c b/tools/perf/arch/x86/util/perf_regs.c
index 9f492568f3b4..bade6c64770c 100644
--- a/tools/perf/arch/x86/util/perf_regs.c
+++ b/tools/perf/arch/x86/util/perf_regs.c
@@ -54,6 +54,66 @@ static const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG2(XMM13, PERF_REG_X86_XMM13),
 	SMPL_REG2(XMM14, PERF_REG_X86_XMM14),
 	SMPL_REG2(XMM15, PERF_REG_X86_XMM15),
+
+	SMPL_REG4_EXT(YMM0, PERF_REG_X86_YMM0),
+	SMPL_REG4_EXT(YMM1, PERF_REG_X86_YMM1),
+	SMPL_REG4_EXT(YMM2, PERF_REG_X86_YMM2),
+	SMPL_REG4_EXT(YMM3, PERF_REG_X86_YMM3),
+	SMPL_REG4_EXT(YMM4, PERF_REG_X86_YMM4),
+	SMPL_REG4_EXT(YMM5, PERF_REG_X86_YMM5),
+	SMPL_REG4_EXT(YMM6, PERF_REG_X86_YMM6),
+	SMPL_REG4_EXT(YMM7, PERF_REG_X86_YMM7),
+	SMPL_REG4_EXT(YMM8, PERF_REG_X86_YMM8),
+	SMPL_REG4_EXT(YMM9, PERF_REG_X86_YMM9),
+	SMPL_REG4_EXT(YMM10, PERF_REG_X86_YMM10),
+	SMPL_REG4_EXT(YMM11, PERF_REG_X86_YMM11),
+	SMPL_REG4_EXT(YMM12, PERF_REG_X86_YMM12),
+	SMPL_REG4_EXT(YMM13, PERF_REG_X86_YMM13),
+	SMPL_REG4_EXT(YMM14, PERF_REG_X86_YMM14),
+	SMPL_REG4_EXT(YMM15, PERF_REG_X86_YMM15),
+
+	SMPL_REG8_EXT(ZMM0, PERF_REG_X86_ZMM0),
+	SMPL_REG8_EXT(ZMM1, PERF_REG_X86_ZMM1),
+	SMPL_REG8_EXT(ZMM2, PERF_REG_X86_ZMM2),
+	SMPL_REG8_EXT(ZMM3, PERF_REG_X86_ZMM3),
+	SMPL_REG8_EXT(ZMM4, PERF_REG_X86_ZMM4),
+	SMPL_REG8_EXT(ZMM5, PERF_REG_X86_ZMM5),
+	SMPL_REG8_EXT(ZMM6, PERF_REG_X86_ZMM6),
+	SMPL_REG8_EXT(ZMM7, PERF_REG_X86_ZMM7),
+	SMPL_REG8_EXT(ZMM8, PERF_REG_X86_ZMM8),
+	SMPL_REG8_EXT(ZMM9, PERF_REG_X86_ZMM9),
+	SMPL_REG8_EXT(ZMM10, PERF_REG_X86_ZMM10),
+	SMPL_REG8_EXT(ZMM11, PERF_REG_X86_ZMM11),
+	SMPL_REG8_EXT(ZMM12, PERF_REG_X86_ZMM12),
+	SMPL_REG8_EXT(ZMM13, PERF_REG_X86_ZMM13),
+	SMPL_REG8_EXT(ZMM14, PERF_REG_X86_ZMM14),
+	SMPL_REG8_EXT(ZMM15, PERF_REG_X86_ZMM15),
+	SMPL_REG8_EXT(ZMM16, PERF_REG_X86_ZMM16),
+	SMPL_REG8_EXT(ZMM17, PERF_REG_X86_ZMM17),
+	SMPL_REG8_EXT(ZMM18, PERF_REG_X86_ZMM18),
+	SMPL_REG8_EXT(ZMM19, PERF_REG_X86_ZMM19),
+	SMPL_REG8_EXT(ZMM20, PERF_REG_X86_ZMM20),
+	SMPL_REG8_EXT(ZMM21, PERF_REG_X86_ZMM21),
+	SMPL_REG8_EXT(ZMM22, PERF_REG_X86_ZMM22),
+	SMPL_REG8_EXT(ZMM23, PERF_REG_X86_ZMM23),
+	SMPL_REG8_EXT(ZMM24, PERF_REG_X86_ZMM24),
+	SMPL_REG8_EXT(ZMM25, PERF_REG_X86_ZMM25),
@@ -283,13 +343,59 @@ const struct sample_reg *arch__sample_reg_masks(void)
 	return sample_reg_masks;
 }
 
-void arch__intr_reg_mask(unsigned long *mask)
+static void check_ext2_regs_mask(struct perf_event_attr *attr, bool user,
+				 int idx, u64 fmask, unsigned long *mask)
+{
+	u64 reg_mask[PERF_SAMPLE_ARRAY_SIZE] = { 0 };
+	int fd;
+
+	if (user) {
+		attr->sample_regs_user = 0;
+		attr->sample_regs_user_ext[idx] = fmask;
+	} else {
+		attr->sample_regs_intr = 0;
+		attr->sample_regs_intr_ext[idx] = fmask;
+	}
+
+	/* reg_mask[] also covers the sample_regs_intr regs, so the index needs a +1. */
+	reg_mask[idx + 1] = fmask;
+
+	fd = sys_perf_event_open(attr, 0, -1, -1, 0);
+	if (fd != -1) {
+		close(fd);
+		bitmap_or(mask, mask, (unsigned long *)reg_mask,
+			  PERF_SAMPLE_REGS_NUM);
+	}
+}
+
+#define PERF_REG_EXTENDED_YMM_MASK	GENMASK_ULL(63, 0)
+#define PERF_REG_EXTENDED_ZMM_MASK	GENMASK_ULL(63, 0)
+#define PERF_REG_EXTENDED_OPMASK_MASK	GENMASK_ULL(7, 0)
+
+static void get_ext2_regs_mask(struct perf_event_attr *attr, bool user,
+			       unsigned long *mask)
+{
+	event_attr_init(attr);
+
+	/* Check YMM regs, bits 128 ~ 191. */
+	check_ext2_regs_mask(attr, user, 1, PERF_REG_EXTENDED_YMM_MASK, mask);
+	/* Check ZMM 0-7 regs, bits 192 ~ 255. */
+	check_ext2_regs_mask(attr, user, 2, PERF_REG_EXTENDED_ZMM_MASK, mask);
+	/* Check ZMM 8-15 regs, bits 256 ~ 319. */
+	check_ext2_regs_mask(attr, user, 3, PERF_REG_EXTENDED_ZMM_MASK, mask);
+	/* Check ZMM 16-23 regs, bits 320 ~ 383. */
+	check_ext2_regs_mask(attr, user, 4, PERF_REG_EXTENDED_ZMM_MASK, mask);
+	/* Check ZMM 24-31 regs, bits 384 ~ 447. */
+	check_ext2_regs_mask(attr, user, 5, PERF_REG_EXTENDED_ZMM_MASK, mask);
+	/* Check OPMASK regs, bits 448 ~ 455. */
+	check_ext2_regs_mask(attr, user, 6, PERF_REG_EXTENDED_OPMASK_MASK, mask);
+}
+
+static void arch__get_reg_mask(unsigned long *mask, bool user)
 {
 	struct perf_event_attr attr = {
 		.type			= PERF_TYPE_HARDWARE,
 		.config			= PERF_COUNT_HW_CPU_CYCLES,
-		.sample_type		= PERF_SAMPLE_REGS_INTR,
-		.sample_regs_intr	= PERF_REG_EXTENDED_MASK,
 		.precise_ip		= 1,
 		.disabled		= 1,
 		.exclude_kernel		= 1,
@@ -298,6 +404,14 @@ void arch__intr_reg_mask(unsigned long *mask)
 
 	*(u64 *)mask = PERF_REGS_MASK;
 
+	if (user) {
+		attr.sample_type = PERF_SAMPLE_REGS_USER;
+		attr.sample_regs_user = PERF_REG_EXTENDED_MASK;
+	} else {
+		attr.sample_type = PERF_SAMPLE_REGS_INTR;
+		attr.sample_regs_intr = PERF_REG_EXTENDED_MASK;
+	}
+
 	/*
 	 * In an unnamed union, init it here to build on older gcc versions
 	 */
@@ -325,9 +439,16 @@ void arch__intr_reg_mask(unsigned long *mask)
 		close(fd);
 		*(u64 *)mask = PERF_REG_EXTENDED_MASK | PERF_REGS_MASK;
 	}
+
+	get_ext2_regs_mask(&attr, user, mask);
+}
+
+void arch__intr_reg_mask(unsigned long *mask)
+{
+	arch__get_reg_mask(mask, false);
 }
 
 void arch__user_reg_mask(unsigned long *mask)
 {
-	*(uint64_t *)mask = PERF_REGS_MASK;
+	arch__get_reg_mask(mask, true);
 }
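The helpers above generalize perf's long-standing trial-open probing: request a chunk of registers on a throwaway cycles event and keep whatever the kernel accepts. A minimal standalone version of the pre-existing XMM probe, which builds against current UAPI headers, is sketched below; the new sample_regs_*_ext[] words are probed the same way once this series' headers are in place. Whether the open succeeds naturally depends on the CPU and kernel at hand.

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
	struct perf_event_attr attr;
	long fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_type = PERF_SAMPLE_REGS_INTR;
	attr.sample_regs_intr = ~((1ULL << 32) - 1);	/* XMM0..XMM15, bits 32..63 */
	attr.precise_ip = 1;
	attr.disabled = 1;
	attr.exclude_kernel = 1;

	/* If the kernel accepts the attr, XMM capture is supported. */
	fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd >= 0) {
		printf("XMM register capture accepted by this kernel/CPU\n");
		close(fd);
	} else {
		printf("XMM register capture rejected; only GPRs available\n");
	}
	return 0;
}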
diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
index c0e95215b577..eb1e3d716f27 100644
--- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c
+++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
@@ -78,6 +78,88 @@ const char *__perf_reg_name_x86(int id)
 	XMM(14)
 	XMM(15)
 #undef XMM
+
+#define YMM(x)				\
+	case PERF_REG_X86_YMM ## x:	\
+	case PERF_REG_X86_YMM ## x + 1:	\
+	case PERF_REG_X86_YMM ## x + 2:	\
+	case PERF_REG_X86_YMM ## x + 3:	\
+		return "YMM" #x;
+	YMM(0)
+	YMM(1)
+	YMM(2)
+	YMM(3)
+	YMM(4)
+	YMM(5)
+	YMM(6)
+	YMM(7)
+	YMM(8)
+	YMM(9)
+	YMM(10)
+	YMM(11)
+	YMM(12)
+	YMM(13)
+	YMM(14)
+	YMM(15)
+#undef YMM
+
+#define ZMM(x)				\
+	case PERF_REG_X86_ZMM ## x:	\
+	case PERF_REG_X86_ZMM ## x + 1:	\
+	case PERF_REG_X86_ZMM ## x + 2:	\
+	case PERF_REG_X86_ZMM ## x + 3:	\
+	case PERF_REG_X86_ZMM ## x + 4:	\
+	case PERF_REG_X86_ZMM ## x + 5:	\
+	case PERF_REG_X86_ZMM ## x + 6:	\
+	case PERF_REG_X86_ZMM ## x + 7:	\
+		return "ZMM" #x;
+	ZMM(0)
+	ZMM(1)
+	ZMM(2)
+	ZMM(3)
+	ZMM(4)
+	ZMM(5)
+	ZMM(6)
+	ZMM(7)
+	ZMM(8)
+	ZMM(9)
+	ZMM(10)
+	ZMM(11)
+	ZMM(12)
+	ZMM(13)
+	ZMM(14)
+	ZMM(15)
+	ZMM(16)
+	ZMM(17)
+	ZMM(18)
+	ZMM(19)
+	ZMM(20)
+	ZMM(21)
+	ZMM(22)
+	ZMM(23)
+	ZMM(24)
+	ZMM(25)
+	ZMM(26)
+	ZMM(27)
+	ZMM(28)
+	ZMM(29)
+	ZMM(30)
+	ZMM(31)
+#undef ZMM
+
+#define OPMASK(x)			\
+	case PERF_REG_X86_OPMASK ## x:	\
+		return "opmask" #x;
+
+	OPMASK(0)
+	OPMASK(1)
+	OPMASK(2)
+	OPMASK(3)
+	OPMASK(4)
+	OPMASK(5)
+	OPMASK(6)
+	OPMASK(7)
+#undef OPMASK
 	default:
 		return NULL;
 	}
-- 
2.40.1
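Since every bit index that a register occupies resolves to the same printable name, the macro-generated case labels above are equivalent to plain integer arithmetic over the enum constants. A condensed, illustrative-only equivalent:

#include <stdio.h>

/* Mirrors the switch above: map any bit index back to its register name. */
static const char *vec_reg_name(int id)
{
	static char buf[16];

	if (id >= 128 && id < 192)		/* YMM0..YMM15, 4 bits apiece */
		snprintf(buf, sizeof(buf), "YMM%d", (id - 128) / 4);
	else if (id >= 192 && id < 448)		/* ZMM0..ZMM31, 8 bits apiece */
		snprintf(buf, sizeof(buf), "ZMM%d", (id - 192) / 8);
	else if (id >= 448 && id < 456)		/* opmask0..opmask7, 1 bit each */
		snprintf(buf, sizeof(buf), "opmask%d", id - 448);
	else
		return NULL;
	return buf;
}

int main(void)
{
	printf("%s\n", vec_reg_name(130));	/* bit 130 -> "YMM0" */
	printf("%s\n", vec_reg_name(200));	/* bit 200 -> "ZMM1" */
	printf("%s\n", vec_reg_name(449));	/* bit 449 -> "opmask1" */
	return 0;
}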
From nobody Fri Dec 19 16:05:43 2025
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
 Ian Rogers, Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
 Eranian Stephane
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 Dapeng Mi, Dapeng Mi
Subject: [Patch v3 22/22] perf tools/tests: Add vector registers PEBS
 sampling test
Date: Tue, 15 Apr 2025 11:44:28 +0000
Message-Id: <20250415114428.341182-23-dapeng1.mi@linux.intel.com>
In-Reply-To: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
References: <20250415114428.341182-1-dapeng1.mi@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

The current adaptive PEBS can capture some vector registers, such as the
XMM registers, while arch-PEBS can additionally capture wider vector
registers, such as the YMM and ZMM registers. Add a perf test case to
verify that these vector registers are captured correctly.
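The test added below automates the following manual check; whether YMM0/ZMM0 are advertised at all depends on the CPU and on a kernel carrying the arch-PEBS patches:

  # Does perf advertise YMM0 on this system?
  perf record --intr-regs=\? 2>&1 | grep YMM0

  # If so, do YMM values actually appear in PEBS samples?
  perf record -o - --intr-regs=ymm0 -e instructions:p -c 100000 true \
      2> /dev/null | perf script -F ip,sym,iregs -i - | grep 'YMM0:'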
Suggested-by: Kan Liang
Signed-off-by: Dapeng Mi
---
 tools/perf/tests/shell/record.sh | 55 ++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh
index ba8d873d3ca7..d85aab09902b 100755
--- a/tools/perf/tests/shell/record.sh
+++ b/tools/perf/tests/shell/record.sh
@@ -116,6 +116,60 @@ test_register_capture() {
   echo "Register capture test [Success]"
 }
 
+test_vec_register_capture() {
+  echo "Vector register capture test"
+  if ! perf record -o /dev/null --quiet -e instructions:p true 2> /dev/null
+  then
+    echo "Vector register capture test [Skipped missing event]"
+    return
+  fi
+  if ! perf record --intr-regs=\? 2>&1 | grep -q 'XMM0'
+  then
+    echo "Vector register capture test [Skipped missing XMM registers]"
+    return
+  fi
+  if ! perf record -o - --intr-regs=xmm0 -e instructions:p \
+       -c 100000 ${testprog} 2> /dev/null \
+       | perf script -F ip,sym,iregs -i - 2> /dev/null \
+       | grep -q "XMM0:"
+  then
+    echo "Vector register capture test [Failed missing XMM output]"
+    err=1
+    return
+  fi
+  echo "Vector register (XMM) capture test [Success]"
+  if ! perf record --intr-regs=\? 2>&1 | grep -q 'YMM0'
+  then
+    echo "Vector register capture test [Skipped missing YMM registers]"
+    return
+  fi
+  if ! perf record -o - --intr-regs=ymm0 -e instructions:p \
+       -c 100000 ${testprog} 2> /dev/null \
+       | perf script -F ip,sym,iregs -i - 2> /dev/null \
+       | grep -q "YMM0:"
+  then
+    echo "Vector register capture test [Failed missing YMM output]"
+    err=1
+    return
+  fi
+  echo "Vector register (YMM) capture test [Success]"
+  if ! perf record --intr-regs=\? 2>&1 | grep -q 'ZMM0'
+  then
+    echo "Vector register capture test [Skipped missing ZMM registers]"
+    return
+  fi
+  if ! perf record -o - --intr-regs=zmm0 -e instructions:p \
+       -c 100000 ${testprog} 2> /dev/null \
+       | perf script -F ip,sym,iregs -i - 2> /dev/null \
+       | grep -q "ZMM0:"
+  then
+    echo "Vector register capture test [Failed missing ZMM output]"
+    err=1
+    return
+  fi
+  echo "Vector register (ZMM) capture test [Success]"
+}
+
 test_system_wide() {
   echo "Basic --system-wide mode test"
   if ! perf record -aB --synth=no -o "${perfdata}" ${testprog} 2> /dev/null
@@ -318,6 +372,7 @@ fi
 
 test_per_thread
 test_register_capture
+test_vec_register_capture
 test_system_wide
 test_workload
 test_branch_counter
-- 
2.40.1
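As a usage note, once the series is applied the new check runs alongside the other perf record shell tests; assuming perf test's usual substring matching, something like:

  cd tools/perf
  ./perf test -v record      # selects tests whose name matches "record"
  # or invoke the script directly:
  ./tests/shell/record.sh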