From nobody Wed Oct 8 14:18:30 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 788A525A2C4 for ; Fri, 15 Aug 2025 21:35:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293748; cv=none; b=UYK0iAH/nVcUC/AAiwrvOYwI7qyX6kYVM0Vr0htnBbbYDfMn/HS0YgZPctfNc7zL6PpD1Wf5tp8+H7XYHwoJ8Th3C3fWG2tD6iqv77/mWbkoVgEBTAoyOzleb6YmF3X3uMXA7I/5jJX0nZouBPev15f0uLWHQGtd4jpjP4o8tIw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293748; c=relaxed/simple; bh=ilZZDLKFtvjxpxoBhhniohVxQ7x9krgeaddVpbJsspM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=sipwnUBDJHMFZahnZMVpydUnbQroZncWdVAA6vY3ZmCqWVxs1ZzSQN8so/Y9A6UxDZ2QA6sw/Eg75o8mBkUufLfDZ3ipFsRMCcc4K7CkBkrx5IrizE6toSHPbM67oIesKvipk/emb96X3nTFyREB3GfJq/r9pKt1yD12V+RCHvc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=QwFiHrZU; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="QwFiHrZU" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293747; x=1786829747; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ilZZDLKFtvjxpxoBhhniohVxQ7x9krgeaddVpbJsspM=; 
b=QwFiHrZUy35tlZvHfiJ1OXVzPCJ8rWKyrK18rwGcAl8mqRdNeC7IeA5S 3ST1Az9mw4Cwn2sTmNAsD+Lo6Xk7WZ/w1ZBxU+8Lu6vQtzGGFDdkmqGF3 QkzyBRYH6IMMNPrSOInckZFCzAEk6RCAUHP7ZVPOl2m7gzWwTEId/Otti R8ynvCF1ddKUSfTMlIH7XlDmCuKNkBTA510S3nzezwyyk6r9Apfxyfyug Fui5UK8ZmVLJTr1nmDgwm93nepYhINoG63I1V0Olw72AThloFO4m+Onl0 sG7JACazefgVKc+4vm5C9TWSEajmV717xB8XSTb4qCw+tY0GqvO9tYYtC w==; X-CSE-ConnectionGUID: r0f6a4NXQriBNhtdmIs6YQ== X-CSE-MsgGUID: +ctSlu5eRDmuZingnXWbVQ== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707362" X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707362" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:45 -0700 X-CSE-ConnectionGUID: kOxS3FTYTzWG+txuTn9ZSg== X-CSE-MsgGUID: KqKz784DRy2CtXob8CBayA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319577" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:45 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang Subject: [PATCH V3 01/17] perf/x86: Use x86_perf_regs in the x86 nmi handler Date: Fri, 15 Aug 2025 14:34:19 -0700 Message-Id: <20250815213435.1702022-2-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com> References: <20250815213435.1702022-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 
quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Kan Liang

More and more regs will be supported in the overflow, e.g., more vector
registers, SSP, etc. The generic pt_regs struct cannot store all of
them. Use an x86-specific x86_perf_regs instead.

The struct pt_regs *regs is still passed to x86_pmu_handle_irq(). There
is no functional change for the existing code.

AMD IBS's NMI handler doesn't utilize the static call
x86_pmu_handle_irq(). The x86_perf_regs struct doesn't apply to AMD
IBS. It can be added separately later when AMD IBS supports more regs.

Signed-off-by: Kan Liang
---
 arch/x86/events/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 7610f26dfbd9..64a7a8aa2e38 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1752,6 +1752,7 @@ void perf_events_lapic_init(void)
 static int
 perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
 {
+	struct x86_perf_regs x86_regs;
 	u64 start_clock;
 	u64 finish_clock;
 	int ret;
@@ -1764,7 +1765,8 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
 		return NMI_DONE;
 
 	start_clock = sched_clock();
-	ret = static_call(x86_pmu_handle_irq)(regs);
+	x86_regs.regs = *regs;
+	ret = static_call(x86_pmu_handle_irq)(&x86_regs.regs);
 	finish_clock = sched_clock();
 
 	perf_sample_event_took(finish_clock - start_clock);
-- 
2.38.1

From nobody Wed Oct 8 14:18:30 2025
Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0527E25B687 for ; Fri, 15 Aug 2025 21:35:47 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293749; cv=none;
b=RKBbBVB1IqDwG7oEutxl7hbkQL3JFHD8UukO7kkvsR5le85LWOgm4sOXGjBcxmebyBMAt16kFimCiMJjTvEgToFPoVjtu7EXCyHGC36v2UApL6kxrzwhyCQwGh8jI9wCMSU8KV4cbLBxEstO4lhKcjJ7eDBoJdAFTBUsUiukRM8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293749; c=relaxed/simple; bh=jiUU/CO/dIeSSSoaGHlhX1BWyie8LcKytE6z9fFLDBo=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Xim7ANP4Ff42JIwogfr56FqowMpG7yEkF0j5/JuWe70cHy6SE5osjZVv8JBhEJoypFSKLA/8x9KjxwkRfjlds00EUFfpRo/lJCIvZss0fHw82a/0PICDO2KQAjvfEaywOKEBjMQ0lnZMKmwik1yez2ZtYn5IuEQOp3gQkm+B/4Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Op1Q0zA9; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Op1Q0zA9" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293748; x=1786829748; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=jiUU/CO/dIeSSSoaGHlhX1BWyie8LcKytE6z9fFLDBo=; b=Op1Q0zA9E4sMohziGdpRoT9cHS16GBEYXDveheGB7P6B0qPzGtI6SV2W 1GQ/bGcbXUyEFd7UXUO3qNlL1MtzUbfaC4aFABLVkLxXc0A+yzxBZ4Ety rJxRqi7gztvsn6sYcbxKz9gnoKsgKtDX34wF8trR0lLVSzBdr4FMzyCDW +6uAvhl8UsuDb2al9PE+YPO/3EnEELkJ9cub0EcvQ/iY2BEyYt0/v6mi2 IzPlS2XOiHtAO8J3TCCEL/RvkxBIQfPltaSO3BQ+wX2Sstkwc4hiEc1Pr SHDuCNBgorOT4R+GJ06mLuEIC750cvqBfd628o0BjYod1LGzokMSEUJ2K A==; X-CSE-ConnectionGUID: T5iajYg+Q1KFD/nhY2shag== X-CSE-MsgGUID: KdKzDMtgQoqxetB400E/HQ== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707370" X-IronPort-AV: 
E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707370"
Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:45 -0700
X-CSE-ConnectionGUID: 9JtFepHNQLqqGoDuZssjgg==
X-CSE-MsgGUID: kf71pVFEQxWyoJcTQFTKJA==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319581"
Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:46 -0700
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang
Subject: [PATCH V3 02/17] perf/x86: Setup the regs data
Date: Fri, 15 Aug 2025 14:34:20 -0700
Message-Id: <20250815213435.1702022-3-kan.liang@linux.intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com>
References: <20250815213435.1702022-1-kan.liang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Kan Liang

The current code relies on the generic code to set up the regs data. It
will not work well when more regs are introduced.

Introduce an x86-specific x86_pmu_setup_regs_data(). For now, it is the
same as the generic code. More x86-specific code will be added later
when the new regs are introduced.
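
Illustration only (not part of the patch): a minimal userspace sketch of the size accounting that x86_pmu_setup_regs_data() does for each of REGS_USER and REGS_INTR. It assumes a GCC/Clang toolchain, where __builtin_popcountll() stands in for the kernel's hweight64(); regs_sample_size() is a hypothetical helper name, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the dyn_size accounting; not kernel code.
 * One u64 is always reserved for the ABI word. If a register set is
 * available, one u64 is added per bit set in the requested register mask.
 */
static size_t regs_sample_size(uint64_t sample_regs_mask, int have_regs)
{
	size_t size = sizeof(uint64_t);	/* ABI word */

	if (have_regs)
		size += (size_t)__builtin_popcountll(sample_regs_mask) *
			sizeof(uint64_t);

	return size;
}
```

With three registers requested, for example, the sample grows by the ABI word plus three u64 values; with no register set available (the kernel-thread case for REGS_USER), only the ABI word is accounted.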
Signed-off-by: Kan Liang
---
 arch/x86/events/core.c       | 32 ++++++++++++++++++++++++++++++++
 arch/x86/events/intel/ds.c   |  4 +++-
 arch/x86/events/perf_event.h |  4 ++++
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 64a7a8aa2e38..c601ad761534 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1685,6 +1685,38 @@ static void x86_pmu_del(struct perf_event *event, int flags)
 	static_call_cond(x86_pmu_del)(event);
 }
 
+void x86_pmu_setup_regs_data(struct perf_event *event,
+			     struct perf_sample_data *data,
+			     struct pt_regs *regs)
+{
+	u64 sample_type = event->attr.sample_type;
+
+	if (sample_type & PERF_SAMPLE_REGS_USER) {
+		if (user_mode(regs)) {
+			data->regs_user.abi = perf_reg_abi(current);
+			data->regs_user.regs = regs;
+		} else if (!(current->flags & PF_KTHREAD)) {
+			perf_get_regs_user(&data->regs_user, regs);
+		} else {
+			data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
+			data->regs_user.regs = NULL;
+		}
+		data->dyn_size += sizeof(u64);
+		if (data->regs_user.regs)
+			data->dyn_size += hweight64(event->attr.sample_regs_user) * sizeof(u64);
+		data->sample_flags |= PERF_SAMPLE_REGS_USER;
+	}
+
+	if (sample_type & PERF_SAMPLE_REGS_INTR) {
+		data->regs_intr.regs = regs;
+		data->regs_intr.abi = perf_reg_abi(current);
+		data->dyn_size += sizeof(u64);
+		if (data->regs_intr.regs)
+			data->dyn_size += hweight64(event->attr.sample_regs_intr) * sizeof(u64);
+		data->sample_flags |= PERF_SAMPLE_REGS_INTR;
+	}
+}
+
 int x86_pmu_handle_irq(struct pt_regs *regs)
 {
 	struct perf_sample_data data;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index c0b7ac1c7594..e67d8a03ddfe 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2126,8 +2126,10 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
 			regs->flags &= ~PERF_EFLAGS_EXACT;
 		}
 
-		if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER))
+		if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) {
 			adaptive_pebs_save_regs(regs, gprs);
+			x86_pmu_setup_regs_data(event, data, regs);
+		}
 	}
 
 	if (format_group & PEBS_DATACFG_MEMINFO) {
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 2b969386dcdd..12682a059608 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1278,6 +1278,10 @@ void x86_pmu_enable_event(struct perf_event *event);
 
 int x86_pmu_handle_irq(struct pt_regs *regs);
 
+void x86_pmu_setup_regs_data(struct perf_event *event,
+			     struct perf_sample_data *data,
+			     struct pt_regs *regs);
+
 void x86_pmu_show_pmu_cap(struct pmu *pmu);
 
 static inline int x86_pmu_num_counters(struct pmu *pmu)
-- 
2.38.1

From nobody Wed Oct 8 14:18:30 2025
Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 55ADD25CC75 for ; Fri, 15 Aug 2025 21:35:48 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293749; cv=none; b=edItErgSfiJKrpH20yLdjzJC412JA9CszpDjmaQiRxnae98fbVO0o1H7AhK48gMDvbJxcV/2k8WQ9JudmJ3dQHZDJHh0Xw4nUGhCcn7nolplXNQxA2ups3jQ5p3U9qcfh+XBqyUIAYsz0KGMmouGC2VHE2HqbIb/9oJ2W62XVBQ=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293749; c=relaxed/simple; bh=LCO5Fdp7rj+IB5i/Rg82/lrS2S+GA3QhWc5ciizjtWI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=NtVMu5mS3rmmLzI+pOh2wrIbxyN29RdmxuMU6e10C2iVPK/fTfMUi6IwLY0ZIkQT/CrEFIZyCY8PcoF3/iIZjObWbrP9ib92qQpFxDjFb0oYAQaHiLuONSDtNBytSL3MG2nU0f0QcOttBR0b8KoB+jEsPFE3jDwZVwZXRPdnZwA=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com;
dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Fhq/RApK; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Fhq/RApK" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293748; x=1786829748; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LCO5Fdp7rj+IB5i/Rg82/lrS2S+GA3QhWc5ciizjtWI=; b=Fhq/RApKL0191ePIxivFU4Z64PEf8bVbvQxk2wAMBl6+G3mO6Cn2am82 7fampwbRQ5a5jfmXmBzF7UuAMSjVMmUQGBlwZ/8hSToG82uMJUkICJLkl hybcQr/UQr1g+Xz849w8EUwkh+XF2aWE72NtKrEXV7ViMsLv/gWXjfMVX qre1zEWcRCN0GJ1j1JpCsmQ768PdWJnjm4hW7JR7zLt/w/dx9ZBysTxMn sp9qKSH2geA7lGOxsCEEc4vlx+rdvPrWlpQScl96mPs/C8H83hIlUevbg fSlE+i5ImXvt/ge11NKNx7aMH+KU/KQDMDsFVwJs2MexKMnSQtHX3nNtF g==; X-CSE-ConnectionGUID: CkYXCqX6Qm6T3QkOwBxqgA== X-CSE-MsgGUID: KuwZuOEgTY+d+L8Hh2gbtg== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707378" X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707378" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:45 -0700 X-CSE-ConnectionGUID: RBXpkExGSnq3UqtoDcj4vA== X-CSE-MsgGUID: AE3FeR0SRMGlkMDdVjEyXQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319584" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:46 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, 
alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang
Subject: [PATCH V3 03/17] x86/fpu/xstate: Add xsaves_nmi
Date: Fri, 15 Aug 2025 14:34:21 -0700
Message-Id: <20250815213435.1702022-4-kan.liang@linux.intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com>
References: <20250815213435.1702022-1-kan.liang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Kan Liang

There is a hardware feature (Intel PEBS XMMs group) that can handle
XSAVE "snapshots" from random running code. This just provides another
XSAVE data source at a random time.

Add an interface to retrieve the actual register contents at the time
the NMI hit.

The interface is different from the other FPU interfaces. The other
mechanisms that deal with xstate try to get something coherent. But
this interface is *in*coherent. There's no telling what was in the
registers when an NMI hits. It writes whatever was in the registers when
the NMI hit. It's the invoker's responsibility to make sure the contents
are properly filtered before exposing them to the end user.

Support for the supervisor state components is required and the
compacted storage format is preferred, so XSAVES is used.
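
Illustration only (not part of the patch): the helper in the diff that follows hands the 64-bit feature mask to XSAVES as two 32-bit halves and documents a 64-byte alignment requirement for the buffer. A minimal userspace sketch of those two preconditions is shown here; the struct and function names are hypothetical, and nothing in it executes XSAVES.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two halves of a feature mask. */
struct xsave_mask_halves {
	uint32_t lo;	/* low 32 bits, as in (u32)mask */
	uint32_t hi;	/* high 32 bits, as in (u32)(mask >> 32) */
};

/* Split a 64-bit feature mask the same way the XSTATE_OP() call site does. */
static struct xsave_mask_halves split_xsave_mask(uint64_t mask)
{
	struct xsave_mask_halves h = {
		.lo = (uint32_t)mask,
		.hi = (uint32_t)(mask >> 32),
	};

	return h;
}

/* Check the documented requirement that the buffer is 64-byte aligned. */
static bool is_xsave_aligned(const void *buf)
{
	return ((uintptr_t)buf & 63) == 0;
}
```

A caller-side check like is_xsave_aligned() is only a sanity test; in the series itself the alignment is expected to come from how the buffer is allocated.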
Suggested-by: Dave Hansen
Signed-off-by: Kan Liang
---
 arch/x86/include/asm/fpu/xstate.h |  1 +
 arch/x86/kernel/fpu/xstate.c      | 30 ++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index b308a76afbb7..0c8b9251c29f 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -107,6 +107,7 @@ int xfeature_size(int xfeature_nr);
 
 void xsaves(struct xregs_state *xsave, u64 mask);
 void xrstors(struct xregs_state *xsave, u64 mask);
+void xsaves_nmi(struct xregs_state *xsave, u64 mask);
 
 int xfd_enable_feature(u64 xfd_err);
 
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 9aa9ac8399ae..8602683fcb12 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1448,6 +1448,36 @@ void xrstors(struct xregs_state *xstate, u64 mask)
 	WARN_ON_ONCE(err);
 }
 
+/**
+ * xsaves_nmi - Save selected components to a kernel xstate buffer in NMI
+ * @xstate:	Pointer to the buffer
+ * @mask:	Feature mask to select the components to save
+ *
+ * The @xstate buffer must be 64 byte aligned.
+ *
+ * Caution: The interface is different from the other interfaces of FPU.
+ * The other mechanisms that deal with xstate try to get something coherent.
+ * But this interface is *in*coherent. There's no telling what was in the
+ * registers when a NMI hits. It writes whatever was in the registers when
+ * the NMI hit.
+ * The only user for the interface is perf_event. There is already a
+ * hardware feature (See Intel PEBS XMMs group), which can handle XSAVE
+ * "snapshots" from random code running. This just provides another XSAVE
+ * data source at a random time.
+ * This function can only be invoked in an NMI. It returns the *ACTUAL*
+ * register contents when the NMI hit.
+ */
+void xsaves_nmi(struct xregs_state *xstate, u64 mask)
+{
+	int err;
+
+	if (!in_nmi())
+		return;
+
+	XSTATE_OP(XSAVES, xstate, (u32)mask, (u32)(mask >> 32), err);
+	WARN_ON_ONCE(err);
+}
+
 #if IS_ENABLED(CONFIG_KVM)
 void fpstate_clear_xstate_component(struct fpstate *fpstate, unsigned int xfeature)
 {
-- 
2.38.1

From nobody Wed Oct 8 14:18:30 2025
Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EF85C25F7B5 for ; Fri, 15 Aug 2025 21:35:48 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293750; cv=none; b=SxPTP82UkkXSsT45Hgl14KfXhaMHzvg29qyNb3tCJ1cJic+h1mBOYTqkLjglr33ZDmKLaantWhytatETVME0WrG2QrS6Htway2BHRhJoOeVrLXh3LJjIPAPTczlPtPvCBUDXNnyI9KRPNHNyoR+MHJoJWsDN0WUV0ZKXXqmJ5Xw=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293750; c=relaxed/simple; bh=BnOLccNSR36i9lhzHqgvKp4kyeZTq6EoNER9hwWqf/c=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=mD4IdIuW8gY93+BPAu1hbcgAP/yS4xWslUknd+jZIxkmFoy0z5uPgVLxtkqcg+UvLiuSMuPoG1NGeWQtujzk4IjCgePvccJ2ujcf4Se73dGXXzfEonfC09MPcnZsFAROMo9+/QAMxNouMsePrZVpTFgiWub/zgUYrZ6RRSPXEOY=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=dHOw3uvv; arc=none smtp.client-ip=198.175.65.13
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com
Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
header.b="dHOw3uvv" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293749; x=1786829749; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=BnOLccNSR36i9lhzHqgvKp4kyeZTq6EoNER9hwWqf/c=; b=dHOw3uvvD3KoNOiy0GJ6jhs1O0WFPK7EokzgomPtO6GFMM2PK/eXTkqP yPM2DEcFOiG7njCGJUXmgAbVpLiquDwhDURbmjILbYuHKqMb6AxI+IEaR m6a/jFW6XRddFn3KnTmGnP1rATSBI2LgCwjpimXoZdf8CRMgwAbX6mlXE eSOlK0naP7J+ouOOwTQE/o5hWMhdaxwAxWmYq5p/lzBx9cVsQrCE/qB6/ sFAsjcgdqsiaxi7PgaI0ikKGxxpmzk4SbmkkQ5SPqOCYFQ1XaMVrOb2rG 2gc32MdPfUgoQcP6wxtXSN9R+6pcd7U9r+mfEL2LqYTMlH1PKf/L1TTfy A==; X-CSE-ConnectionGUID: +uHpw3LPQCiW7+Gb6PbVDw== X-CSE-MsgGUID: rY9xASAuRsi9aTOzDQQA1g== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707386" X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707386" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:46 -0700 X-CSE-ConnectionGUID: qG9ii2WORjGsqV+e1+v0zw== X-CSE-MsgGUID: 8fr4+kCvSB+GRCNq8obEQQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319588" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:46 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang Subject: [PATCH V3 04/17] perf: Move has_extended_regs() to header file Date: Fri, 15 Aug 2025 14:34:22 -0700 Message-Id: <20250815213435.1702022-5-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 
In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com>
References: <20250815213435.1702022-1-kan.liang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Kan Liang

The function will also be used in the arch-specific code. Move it to
the header file and rename it to event_has_extended_regs() to follow the
naming convention of the existing functions.

No functional change.

Signed-off-by: Kan Liang
---
 include/linux/perf_event.h | 8 ++++++++
 kernel/events/core.c       | 8 +-------
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ec9d96025683..444b162f3f92 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1526,6 +1526,14 @@ perf_event__output_id_sample(struct perf_event *event,
 extern void
 perf_log_lost_samples(struct perf_event *event, u64 lost);
 
+static inline bool event_has_extended_regs(struct perf_event *event)
+{
+	struct perf_event_attr *attr = &event->attr;
+
+	return (attr->sample_regs_user & PERF_REG_EXTENDED_MASK) ||
+	       (attr->sample_regs_intr & PERF_REG_EXTENDED_MASK);
+}
+
 static inline bool event_has_any_exclude_flag(struct perf_event *event)
 {
 	struct perf_event_attr *attr = &event->attr;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0db36b2b2448..95a7b6f5af09 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12527,12 +12527,6 @@ int perf_pmu_unregister(struct pmu *pmu)
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);
 
-static inline bool has_extended_regs(struct perf_event *event)
-{
-	return (event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK) ||
-	       (event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK);
-}
-
 static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
 {
 	struct perf_event_context *ctx = NULL;
@@ -12567,7 +12561,7 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
 		goto err_pmu;
 
 	if (!(pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS) &&
-	    has_extended_regs(event)) {
+	    event_has_extended_regs(event)) {
 		ret = -EOPNOTSUPP;
 		goto err_destroy;
 	}
-- 
2.38.1

From nobody Wed Oct 8 14:18:30 2025
Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C83B5263C8E for ; Fri, 15 Aug 2025 21:35:49 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293753; cv=none; b=SEF308Py7nC2YHdY/5cduTS+WUZZakKW3viVoN0kOxVKpA+kKFRUMlOYzUJfk7U/B3K9k76/TT/3dIKpmjoOLWJNIHJPHx/s8SEw7hyd7O+FuECgHYFxffm67WebMICqyRkrm0IhjUBxUse/5I8BflW8GgYRnw8AEPOvQ9PF2ck=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293753; c=relaxed/simple; bh=OdsJzI8AoRa0uBrHbrllbXEbNLO0NsFCOL9QXfWDtWk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=j4pp28Ubuo+32em/EkBCxxjwZVPGqaGkhF94BKP/blijb/1AkjjwrIdISF9C4D1COpjlFA8aErLtdY6N9D0HOwaXRB7Calqv82u2GxF7qjIyDGVgQJETMdkah4xVH9JVsk+st+2iKcBvupa+tGHvyoMpdtiODjfNmj2q8BY7PpA=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=DhLuV1r2; arc=none smtp.client-ip=198.175.65.13
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com
Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="DhLuV1r2"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com;
q=dns/txt; s=Intel; t=1755293749; x=1786829749; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=OdsJzI8AoRa0uBrHbrllbXEbNLO0NsFCOL9QXfWDtWk=; b=DhLuV1r24c4XiZnpcYLIK/nf6mIyElbz/hSTPQ1hlErn5ktsdmpvWt8+ sctwNgl/8yzMl8pVreZzhCJJIGyh9ECRVTtb7WNpqk4Sm5WNRLboPFDRf S0icz5wvYdfbO9zGNopbBM3DOgdQVFKnBLnre77IFqtUNwmk3drf1iLEC 89eCtPvdQ6idRLTI8qEoPbkjp2/TK1jl46WsT+N1xUm54RIyjDDvdKPGU YNSV2/J0F3L6djNqcPqfKTVRjWjSYEmUZwdGwN1Y6LXtbJmSFIUhtINhu RJ5pkSwQ0mAQ64HW37sgkIrJ+oZMxGpe7hIKUShkhOA9BT9Vfy8aY2Pxn g==; X-CSE-ConnectionGUID: lcxiyuq+SuSAn13HhwYMbw== X-CSE-MsgGUID: XfMvE5EASX6fwGDIjFVVjQ== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707394" X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707394" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:46 -0700 X-CSE-ConnectionGUID: QqqqU16CTr6G1diA/23yVw== X-CSE-MsgGUID: g1YkZflqSzOmt4xzLEMRaA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319592" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:46 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang Subject: [PATCH V3 05/17] perf/x86: Support XMM register for non-PEBS and REGS_USER Date: Fri, 15 Aug 2025 14:34:23 -0700 Message-Id: <20250815213435.1702022-6-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com> References: 
<20250815213435.1702022-1-kan.liang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Kan Liang

Collecting the XMM registers in a PEBS record has been supported since
Icelake, but non-PEBS events don't support the feature. It's possible
to retrieve the XMM registers from XSAVE for non-PEBS events. Add it to
make the feature complete.

To utilize XSAVE, a 64-byte aligned buffer is required. Add a per-CPU
ext_regs_buf to store the vector registers. The size of the buffer is
~2K. kzalloc_node() is used because there's a _guarantee_ that all
kmalloc()'s with powers of 2 are naturally aligned and also 64-byte
aligned.

Extend the support to both REGS_USER and REGS_INTR. For REGS_USER,
perf_get_regs_user() returns the regs from task_pt_regs(current), which
is a struct pt_regs. They need to be copied into the per-CPU struct
x86_perf_regs x86_user_regs.

For PEBS, the HW support is still preferred. The XMM registers should be
retrieved from the PEBS records.

More vector registers could be supported later. Add ext_regs_mask to
track the supported vector register groups.
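
Illustration only (not part of the patch): a minimal userspace sketch of why the REGS_USER path copies the bare pt_regs into an x86_perf_regs wrapper. The setup code recovers the wrapper from the pt_regs pointer with container_of(), which is only valid if the pointer really is embedded in a wrapper. All types and names here are hypothetical stand-ins with invented field layouts; container_of_sketch() imitates the kernel macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct pt_regs and struct x86_perf_regs. */
struct fake_pt_regs {
	uint64_t ip;
	uint64_t sp;
};

struct fake_x86_perf_regs {
	struct fake_pt_regs regs;	/* embedded, as in the real wrapper */
	uint64_t *xmm_regs;		/* extra state the bare regs cannot hold */
};

/* Userspace imitation of the kernel's container_of(). */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the wrapper from a pointer to its embedded regs member. */
static struct fake_x86_perf_regs *to_perf_regs(struct fake_pt_regs *regs)
{
	return container_of_sketch(regs, struct fake_x86_perf_regs, regs);
}

/*
 * Copy bare regs into a wrapper and return a pointer to the embedded copy,
 * so that to_perf_regs() on the result is well defined.
 */
static struct fake_pt_regs *wrap_user_regs(struct fake_x86_perf_regs *dst,
					   const struct fake_pt_regs *src)
{
	dst->regs = *src;
	dst->xmm_regs = NULL;
	return &dst->regs;
}
```

Calling to_perf_regs() directly on a bare struct fake_pt_regs that is not embedded in a wrapper would read past the object, which is the situation the copy avoids.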
Signed-off-by: Kan Liang
---
 arch/x86/events/core.c            | 127 +++++++++++++++++++++++++-----
 arch/x86/events/intel/core.c      |  27 +++++++
 arch/x86/events/intel/ds.c        |  10 ++-
 arch/x86/events/perf_event.h      |   9 ++-
 arch/x86/include/asm/fpu/xstate.h |   2 +
 arch/x86/include/asm/perf_event.h |   5 +-
 arch/x86/kernel/fpu/xstate.c      |   2 +-
 7 files changed, 157 insertions(+), 25 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index c601ad761534..f27c58f4c815 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -406,6 +406,61 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event *event)
 	return x86_pmu_extra_regs(val, event);
 }
 
+static DEFINE_PER_CPU(struct xregs_state *, ext_regs_buf);
+
+static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
+{
+	struct xregs_state *xsave = per_cpu(ext_regs_buf, smp_processor_id());
+	u64 valid_mask = x86_pmu.ext_regs_mask & mask;
+
+	if (WARN_ON_ONCE(!xsave))
+		return;
+
+	xsaves_nmi(xsave, valid_mask);
+
+	/* Filtered by what XSAVE really gives */
+	valid_mask &= xsave->header.xfeatures;
+
+	if (valid_mask & XFEATURE_MASK_SSE)
+		perf_regs->xmm_space = xsave->i387.xmm_space;
+}
+
+static void release_ext_regs_buffers(void)
+{
+	int cpu;
+
+	if (!x86_pmu.ext_regs_mask)
+		return;
+
+	for_each_possible_cpu(cpu) {
+		kfree(per_cpu(ext_regs_buf, cpu));
+		per_cpu(ext_regs_buf, cpu) = NULL;
+	}
+}
+
+static void reserve_ext_regs_buffers(void)
+{
+	unsigned int size;
+	int cpu;
+
+	if (!x86_pmu.ext_regs_mask)
+		return;
+
+	size = xstate_calculate_size(x86_pmu.ext_regs_mask, true);
+
+	for_each_possible_cpu(cpu) {
+		per_cpu(ext_regs_buf, cpu) = kzalloc_node(size, GFP_KERNEL,
+							  cpu_to_node(cpu));
+		if (!per_cpu(ext_regs_buf, cpu))
+			goto err;
+	}
+
+	return;
+
+err:
+	release_ext_regs_buffers();
+}
+
 int x86_reserve_hardware(void)
 {
 	int err = 0;
@@ -418,6 +473,7 @@ int x86_reserve_hardware(void)
 			} else {
 				reserve_ds_buffers();
 				reserve_lbr_buffers();
+				reserve_ext_regs_buffers();
 			}
 		}
 		if (!err)
@@ -434,6 +490,7 @@ void x86_release_hardware(void)
 		release_pmc_hardware();
 		release_ds_buffers();
 		release_lbr_buffers();
+		release_ext_regs_buffers();
 		mutex_unlock(&pmc_reserve_mutex);
 	}
 }
@@ -642,21 +699,18 @@ int x86_pmu_hw_config(struct perf_event *event)
 			return -EINVAL;
 	}
 
-	/* sample_regs_user never support XMM registers */
-	if (unlikely(event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK))
-		return -EINVAL;
-	/*
-	 * Besides the general purpose registers, XMM registers may
-	 * be collected in PEBS on some platforms, e.g. Icelake
-	 */
-	if (unlikely(event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK)) {
-		if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS))
-			return -EINVAL;
-
-		if (!event->attr.precise_ip)
-			return -EINVAL;
+	if (event->attr.sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) {
+		/*
+		 * Besides the general purpose registers, XMM registers may
+		 * be collected as well.
+		 */
+		if (event_has_extended_regs(event)) {
+			if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS))
+				return -EINVAL;
+			if (!(x86_pmu.ext_regs_mask & XFEATURE_MASK_SSE))
+				return -EINVAL;
+		}
 	}
-
 	return x86_setup_perfctr(event);
 }
 
@@ -1685,25 +1739,51 @@ static void x86_pmu_del(struct perf_event *event, int flags)
 	static_call_cond(x86_pmu_del)(event);
 }
 
+static DEFINE_PER_CPU(struct x86_perf_regs, x86_user_regs);
+
+static struct x86_perf_regs *
+x86_pmu_perf_get_regs_user(struct perf_sample_data *data,
+			   struct pt_regs *regs)
+{
+	struct x86_perf_regs *x86_regs_user = this_cpu_ptr(&x86_user_regs);
+	struct perf_regs regs_user;
+
+	perf_get_regs_user(&regs_user, regs);
+	data->regs_user.abi = regs_user.abi;
+	if (regs_user.regs) {
+		x86_regs_user->regs = *regs_user.regs;
+		data->regs_user.regs = &x86_regs_user->regs;
+	} else
+		data->regs_user.regs = NULL;
+	return x86_regs_user;
+}
+
 void x86_pmu_setup_regs_data(struct perf_event *event,
 			     struct perf_sample_data *data,
-			     struct pt_regs *regs)
+			     struct pt_regs *regs,
+			     u64 ignore_mask)
 {
-	u64 sample_type = event->attr.sample_type;
+	struct x86_perf_regs *perf_regs = container_of(regs, struct x86_perf_regs, regs);
+	struct perf_event_attr *attr = &event->attr;
+	u64 sample_type = attr->sample_type;
+	u64 mask = 0;
+
+	if (!(attr->sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)))
+		return;
 
 	if (sample_type & PERF_SAMPLE_REGS_USER) {
 		if (user_mode(regs)) {
 			data->regs_user.abi = perf_reg_abi(current);
 			data->regs_user.regs = regs;
 		} else if (!(current->flags & PF_KTHREAD)) {
-			perf_get_regs_user(&data->regs_user, regs);
+			perf_regs = x86_pmu_perf_get_regs_user(data, regs);
 		} else {
 			data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
 			data->regs_user.regs = NULL;
 		}
 		data->dyn_size += sizeof(u64);
 		if (data->regs_user.regs)
-			data->dyn_size += hweight64(event->attr.sample_regs_user) * sizeof(u64);
+			data->dyn_size += hweight64(attr->sample_regs_user) * sizeof(u64);
 		data->sample_flags |= PERF_SAMPLE_REGS_USER;
 	}
 
@@ -1712,9 +1792,18 @@ void x86_pmu_setup_regs_data(struct perf_event *event,
 		data->regs_intr.abi = perf_reg_abi(current);
 		data->dyn_size += sizeof(u64);
 		if (data->regs_intr.regs)
-			data->dyn_size += hweight64(event->attr.sample_regs_intr) * sizeof(u64);
+			data->dyn_size += hweight64(attr->sample_regs_intr) * sizeof(u64);
 		data->sample_flags |= PERF_SAMPLE_REGS_INTR;
 	}
+
+	if (event_has_extended_regs(event)) {
+		perf_regs->xmm_regs = NULL;
+		mask |= XFEATURE_MASK_SSE;
+	}
+
+	mask &= ~ignore_mask;
+	if (mask)
+		x86_pmu_get_ext_regs(perf_regs, mask);
 }
 
 int x86_pmu_handle_irq(struct pt_regs *regs)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c2fb729c270e..bd16f91dea1c 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3284,6 +3284,8 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 		if (has_branch_stack(event))
intel_pmu_lbr_save_brstack(&data, cpuc, event); =20 + x86_pmu_setup_regs_data(event, &data, regs, 0); + perf_event_overflow(event, &data, regs); } =20 @@ -5272,6 +5274,29 @@ static inline bool intel_pmu_broken_perf_cap(void) return false; } =20 +static void intel_extended_regs_init(struct pmu *pmu) +{ + /* + * Extend the vector registers support to non-PEBS. + * The feature is limited to newer Intel machines with + * PEBS V4+ or archPerfmonExt (0x23) enabled for now. + * In theory, the vector registers can be retrieved as + * long as the CPU supports. The support for the old + * generations may be added later if there is a + * requirement. + * Only support the extension when XSAVES is available. + */ + if (!boot_cpu_has(X86_FEATURE_XSAVES)) + return; + + if (!boot_cpu_has(X86_FEATURE_XMM) || + !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL)) + return; + + x86_pmu.ext_regs_mask |=3D XFEATURE_MASK_SSE; + x86_get_pmu(smp_processor_id())->capabilities |=3D PERF_PMU_CAP_EXTENDED_= REGS; +} + static void update_pmu_cap(struct pmu *pmu) { unsigned int cntr, fixed_cntr, ecx, edx; @@ -5306,6 +5331,8 @@ static void update_pmu_cap(struct pmu *pmu) /* Perf Metric (Bit 15) and PEBS via PT (Bit 16) are hybrid enumeration = */ rdmsrq(MSR_IA32_PERF_CAPABILITIES, hybrid(pmu, intel_cap).capabilities); } + + intel_extended_regs_init(pmu); } =20 static void intel_pmu_check_hybrid_pmus(struct x86_hybrid_pmu *pmu) diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index e67d8a03ddfe..9cdece014ac0 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1415,8 +1415,7 @@ static u64 pebs_update_adaptive_cfg(struct perf_event= *event) if (gprs || (attr->precise_ip < 2) || tsx_weight) pebs_data_cfg |=3D PEBS_DATACFG_GP; =20 - if ((sample_type & PERF_SAMPLE_REGS_INTR) && - (attr->sample_regs_intr & PERF_REG_EXTENDED_MASK)) + if (event_has_extended_regs(event)) pebs_data_cfg |=3D PEBS_DATACFG_XMMS; =20 if (sample_type & PERF_SAMPLE_BRANCH_STACK) { @@ -2127,8 
+2126,12 @@ static void setup_pebs_adaptive_sample_data(struct p= erf_event *event, } =20 if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) { + u64 mask =3D 0; + adaptive_pebs_save_regs(regs, gprs); - x86_pmu_setup_regs_data(event, data, regs); + if (format_group & PEBS_DATACFG_XMMS) + mask |=3D XFEATURE_MASK_SSE; + x86_pmu_setup_regs_data(event, data, regs, mask); } } =20 @@ -2755,6 +2758,7 @@ void __init intel_pebs_init(void) x86_pmu.flags |=3D PMU_FL_PEBS_ALL; x86_pmu.pebs_capable =3D ~0ULL; pebs_qual =3D "-baseline"; + x86_pmu.ext_regs_mask |=3D XFEATURE_MASK_SSE; x86_get_pmu(smp_processor_id())->capabilities |=3D PERF_PMU_CAP_EXTEND= ED_REGS; } else { /* Only basic record supported */ diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 12682a059608..7bf24842b1dc 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -992,6 +992,12 @@ struct x86_pmu { struct extra_reg *extra_regs; unsigned int flags; =20 + /* + * Extended regs, e.g., vector registers + * Utilize the same format as the XFEATURE_MASK_* + */ + u64 ext_regs_mask; + /* * Intel host/guest support (KVM) */ @@ -1280,7 +1286,8 @@ int x86_pmu_handle_irq(struct pt_regs *regs); =20 void x86_pmu_setup_regs_data(struct perf_event *event, struct perf_sample_data *data, - struct pt_regs *regs); + struct pt_regs *regs, + u64 ignore_mask); =20 void x86_pmu_show_pmu_cap(struct pmu *pmu); =20 diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/x= state.h index 0c8b9251c29f..58bbdf9226d1 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -109,6 +109,8 @@ void xsaves(struct xregs_state *xsave, u64 mask); void xrstors(struct xregs_state *xsave, u64 mask); void xsaves_nmi(struct xregs_state *xsave, u64 mask); =20 +unsigned int xstate_calculate_size(u64 xfeatures, bool compacted); + int xfd_enable_feature(u64 xfd_err); =20 #ifdef CONFIG_X86_64 diff --git a/arch/x86/include/asm/perf_event.h 
b/arch/x86/include/asm/perf_= event.h index 70d1d94aca7e..f36f04bc95f1 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -592,7 +592,10 @@ extern void perf_events_lapic_init(void); struct pt_regs; struct x86_perf_regs { struct pt_regs regs; - u64 *xmm_regs; + union { + u64 *xmm_regs; + u32 *xmm_space; /* for xsaves */ + }; }; =20 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs); diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 8602683fcb12..4747b29608cd 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -583,7 +583,7 @@ static bool __init check_xstate_against_struct(int nr) return true; } =20 -static unsigned int xstate_calculate_size(u64 xfeatures, bool compacted) +unsigned int xstate_calculate_size(u64 xfeatures, bool compacted) { unsigned int topmost =3D fls64(xfeatures) - 1; unsigned int offset, i; --=20 2.38.1 From nobody Wed Oct 8 14:18:30 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D960326B973 for ; Fri, 15 Aug 2025 21:35:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293751; cv=none; b=A/gVHPiHE8myuJ8z4zfLZBvXRsnejmEc8KGBMZ8KPoRR7VYdJh9SDWKwsSOr5VF+uO5T3V60YsOsAbBz6/IMHFOvEexBx2KWCV6K66ZJe/ZvLGx/gKTBcpZkOORkGujy38OBNcKudxnPp3wvKvzvVShJOBz1CLm4m9jWjYGKNoc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293751; c=relaxed/simple; bh=ldklq7t2qlJCP7YTem0KDZBrLnN7Ju8p02sHCq11DV8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=Ceax0MCnnDRZdbv1R4ElzaSkhc6eZym8N3PC+niCB7ejhazgN5VK47S51H4mAPucZWtlLhEXE8F3cAUvP1BqFm40mGaodolA5Fv6V+k2vY1HGEZHa6xXkIu+xaEeD7MGka4Lde7OsDTNDYcRBF0PEqYOY9ZYsuULB9gDKR6PoQo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=LOVqZnCX; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="LOVqZnCX" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293749; x=1786829749; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ldklq7t2qlJCP7YTem0KDZBrLnN7Ju8p02sHCq11DV8=; b=LOVqZnCXLJmXXeQO4rpALLJx0xoP42EkWfY/ecyuSgZjoI9jcbNsviyO ZGgVNFBRJR/Ultmtrc3nHJfZ0WPo7YGrhZYt+/tItMSQlBDPXNets/6tb 4ez8iI1gSd1UWcPphiFOtyMOgLXPkMxWgQLx7gj3lCGvHqI5/XsqEukPr t2I4g2dw4WAOLHRs0c8HKYyks+WN8sBmI5OJRM5rnQUSdHRSyIhqkO+ba AhJTDfruI7ITMnwPpA1j6bCfJIxU4mtLH6wEOoKyW+D0WDeJ0udAGikBK pF2+prYT0za8omAirHFXIL6wLtO6O03vwecge0uLMI+/eX7gacl6JkevX w==; X-CSE-ConnectionGUID: 9sIcmtWdTgWD1B9Y+rS9kA== X-CSE-MsgGUID: DVLsoA2DQ1m765xZGJmw4Q== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707402" X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707402" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:46 -0700 X-CSE-ConnectionGUID: KhJt+R09SFuCB/TnpKnyxw== X-CSE-MsgGUID: dcRfe1X9TFiIm9tMgUXceA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319595" Received: from 
kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:46 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang Subject: [PATCH V3 06/17] perf: Support SIMD registers Date: Fri, 15 Aug 2025 14:34:24 -0700 Message-Id: <20250815213435.1702022-7-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com> References: <20250815213435.1702022-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

From: Kan Liang

Users may be interested in the SIMD registers in a sample while profiling. The current sample_regs_XXX doesn't have enough space for all SIMD registers. Add a set of sample_simd_{pred,vec}_reg_* fields to struct perf_event_attr to define the set of SIMD registers to dump on samples.

X86 currently supports the XMM registers in sample_regs_XXX. To use the new SIMD register configuration method, sample_simd_regs_enabled should always be set. If it is set, the XMM space in sample_regs_XXX is reserved for other usage.

The SIMD registers are wider than 64 bits, so a new output format is introduced. The number and width of the SIMD registers are dumped first, followed by the register values. The number and width are the same as the user's configuration now.
If, for some reason (e.g., ARM) they are different, an ARCH-specific perf_output_sample_simd_regs can be implemented later separately. Add a new ABI, PERF_SAMPLE_REGS_ABI_SIMD, to indicate the new format. The enum perf_sample_regs_abi becomes a bitmap now. There should be no impact on the existing tool, since the version and bitmap are the same for 1 and 2. Add three new __weak functions to retrieve the number of available registers, validate the configuration of the SIMD registers, and retrieve the SIMD registers. The ARCH-specific functions will be implemented in the following patches. Add a new flag PERF_PMU_CAP_SIMD_REGS to indicate that the PMU has the capability to support SIMD registers dumping. Error out if the sample_simd_{pred,vec}_reg_* mistakenly set for a PMU that doesn't have the capability. Suggested-by: Peter Zijlstra (Intel) Signed-off-by: Kan Liang --- include/linux/perf_event.h | 13 ++++ include/linux/perf_regs.h | 9 +++ include/uapi/linux/perf_event.h | 47 +++++++++++++-- kernel/events/core.c | 101 +++++++++++++++++++++++++++++++- 4 files changed, 162 insertions(+), 8 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 444b162f3f92..205361b7de2e 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -305,6 +305,7 @@ struct perf_event_pmu_context; #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100 #define PERF_PMU_CAP_AUX_PAUSE 0x0200 #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400 +#define PERF_PMU_CAP_SIMD_REGS 0x0800 =20 /** * pmu::scope @@ -1526,6 +1527,18 @@ perf_event__output_id_sample(struct perf_event *even= t, extern void perf_log_lost_samples(struct perf_event *event, u64 lost); =20 +static inline bool event_has_simd_regs(struct perf_event *event) +{ + struct perf_event_attr *attr =3D &event->attr; + + return attr->sample_simd_regs_enabled !=3D 0 || + attr->sample_simd_pred_reg_intr !=3D 0 || + attr->sample_simd_pred_reg_user !=3D 0 || + attr->sample_simd_vec_reg_qwords !=3D 0 || + 
attr->sample_simd_vec_reg_intr !=3D 0 || + attr->sample_simd_vec_reg_user !=3D 0; +} + static inline bool event_has_extended_regs(struct perf_event *event) { struct perf_event_attr *attr =3D &event->attr; diff --git a/include/linux/perf_regs.h b/include/linux/perf_regs.h index f632c5725f16..0172682b18fd 100644 --- a/include/linux/perf_regs.h +++ b/include/linux/perf_regs.h @@ -9,6 +9,15 @@ struct perf_regs { struct pt_regs *regs; }; =20 +int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, + u16 pred_qwords, u32 pred_mask); +u64 perf_simd_reg_value(struct pt_regs *regs, int idx, + u16 qwords_idx, bool pred); +void perf_simd_reg_check(struct pt_regs *regs, + u64 mask, u16 *nr_vectors, u16 *vec_qwords, + u16 pred_mask, u16 *nr_pred, u16 *pred_qwords); + + #ifdef CONFIG_HAVE_PERF_REGS #include =20 diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_even= t.h index 78a362b80027..2e9b16acbed6 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -313,9 +313,10 @@ enum { * Values to determine ABI of the registers dump. */ enum perf_sample_regs_abi { - PERF_SAMPLE_REGS_ABI_NONE =3D 0, - PERF_SAMPLE_REGS_ABI_32 =3D 1, - PERF_SAMPLE_REGS_ABI_64 =3D 2, + PERF_SAMPLE_REGS_ABI_NONE =3D 0x00, + PERF_SAMPLE_REGS_ABI_32 =3D 0x01, + PERF_SAMPLE_REGS_ABI_64 =3D 0x02, + PERF_SAMPLE_REGS_ABI_SIMD =3D 0x04, }; =20 /* @@ -382,6 +383,7 @@ enum perf_event_read_format { #define PERF_ATTR_SIZE_VER6 120 /* Add: aux_sample_size */ #define PERF_ATTR_SIZE_VER7 128 /* Add: sig_data */ #define PERF_ATTR_SIZE_VER8 136 /* Add: config3 */ +#define PERF_ATTR_SIZE_VER9 168 /* Add: sample_simd_{pred,vec}_reg_* */ =20 /* * 'struct perf_event_attr' contains various attributes that define @@ -543,6 +545,25 @@ struct perf_event_attr { __u64 sig_data; =20 __u64 config3; /* extension of config2 */ + + + /* + * Defines set of SIMD registers to dump on samples. 
+ * The sample_simd_regs_enabled !=3D0 implies the + * set of SIMD registers is used to config all SIMD registers. + * If !sample_simd_regs_enabled, sample_regs_XXX may be used to + * config some SIMD registers on X86. + */ + union { + __u16 sample_simd_regs_enabled; + __u16 sample_simd_pred_reg_qwords; + }; + __u32 sample_simd_pred_reg_intr; + __u32 sample_simd_pred_reg_user; + __u16 sample_simd_vec_reg_qwords; + __u64 sample_simd_vec_reg_intr; + __u64 sample_simd_vec_reg_user; + __u32 __reserved_4; }; =20 /* @@ -1016,7 +1037,15 @@ enum perf_event_type { * } && PERF_SAMPLE_BRANCH_STACK * * { u64 abi; # enum perf_sample_regs_abi - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER + * u64 regs[weight(mask)]; + * struct { + * u16 nr_vectors; + * u16 vector_qwords; + * u16 nr_pred; + * u16 pred_qwords; + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords]; + * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD) + * } && PERF_SAMPLE_REGS_USER * * { u64 size; * char data[size]; @@ -1043,7 +1072,15 @@ enum perf_event_type { * { u64 data_src; } && PERF_SAMPLE_DATA_SRC * { u64 transaction; } && PERF_SAMPLE_TRANSACTION * { u64 abi; # enum perf_sample_regs_abi - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR + * u64 regs[weight(mask)]; + * struct { + * u16 nr_vectors; + * u16 vector_qwords; + * u16 nr_pred; + * u16 pred_qwords; + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords]; + * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD) + * } && PERF_SAMPLE_REGS_INTR * { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR * { u64 cgroup;} && PERF_SAMPLE_CGROUP * { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE diff --git a/kernel/events/core.c b/kernel/events/core.c index 95a7b6f5af09..dd8cf3c7fb7a 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7408,6 +7408,47 @@ perf_output_sample_regs(struct perf_output_handle *h= andle, } } =20 +static void +perf_output_sample_simd_regs(struct perf_output_handle *handle, + struct perf_event *event, + struct pt_regs *regs, 
+ u64 mask, u16 pred_mask) +{ + u16 pred_qwords =3D event->attr.sample_simd_pred_reg_qwords; + u16 vec_qwords =3D event->attr.sample_simd_vec_reg_qwords; + u16 nr_pred =3D hweight16(pred_mask); + u16 nr_vectors =3D hweight64(mask); + int bit; + u64 val; + u16 i; + + /* Get the number of available regs */ + perf_simd_reg_check(regs, mask, &nr_vectors, &vec_qwords, + pred_mask, &nr_pred, &pred_qwords); + + perf_output_put(handle, nr_vectors); + perf_output_put(handle, vec_qwords); + perf_output_put(handle, nr_pred); + perf_output_put(handle, pred_qwords); + + if (nr_vectors) { + for_each_set_bit(bit, (unsigned long *)&mask, sizeof(mask) * BITS_PER_BY= TE) { + for (i =3D 0; i < vec_qwords; i++) { + val =3D perf_simd_reg_value(regs, bit, i, false); + perf_output_put(handle, val); + } + } + } + if (nr_pred) { + for_each_set_bit(bit, (unsigned long *)&pred_mask, sizeof(pred_mask) * B= ITS_PER_BYTE) { + for (i =3D 0; i < pred_qwords; i++) { + val =3D perf_simd_reg_value(regs, bit, i, true); + perf_output_put(handle, val); + } + } + } +} + static void perf_sample_regs_user(struct perf_regs *regs_user, struct pt_regs *regs) { @@ -7429,6 +7470,25 @@ static void perf_sample_regs_intr(struct perf_regs *= regs_intr, regs_intr->abi =3D perf_reg_abi(current); } =20 +int __weak perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, + u16 pred_qwords, u32 pred_mask) +{ + return vec_qwords || vec_mask || pred_qwords || pred_mask ? -ENOSYS : 0; +} + +u64 __weak perf_simd_reg_value(struct pt_regs *regs, int idx, + u16 qwords_idx, bool pred) +{ + return 0; +} + +void __weak perf_simd_reg_check(struct pt_regs *regs, + u64 mask, u16 *nr_vectors, u16 *vec_qwords, + u16 pred_mask, u16 *nr_pred, u16 *pred_qwords) +{ + *nr_vectors =3D 0; + *nr_pred =3D 0; +} =20 /* * Get remaining task size from user stack pointer. 
@@ -7961,10 +8021,17 @@ void perf_output_sample(struct perf_output_handle *= handle, perf_output_put(handle, abi); =20 if (abi) { - u64 mask =3D event->attr.sample_regs_user; + struct perf_event_attr *attr =3D &event->attr; + u64 mask =3D attr->sample_regs_user; perf_output_sample_regs(handle, data->regs_user.regs, mask); + if (abi & PERF_SAMPLE_REGS_ABI_SIMD) { + perf_output_sample_simd_regs(handle, event, + data->regs_user.regs, + attr->sample_simd_vec_reg_user, + attr->sample_simd_pred_reg_user); + } } } =20 @@ -7992,11 +8059,18 @@ void perf_output_sample(struct perf_output_handle *= handle, perf_output_put(handle, abi); =20 if (abi) { - u64 mask =3D event->attr.sample_regs_intr; + struct perf_event_attr *attr =3D &event->attr; + u64 mask =3D attr->sample_regs_intr; =20 perf_output_sample_regs(handle, data->regs_intr.regs, mask); + if (abi & PERF_SAMPLE_REGS_ABI_SIMD) { + perf_output_sample_simd_regs(handle, event, + data->regs_intr.regs, + attr->sample_simd_vec_reg_intr, + attr->sample_simd_pred_reg_intr); + } } } =20 @@ -12560,6 +12634,12 @@ static int perf_try_init_event(struct pmu *pmu, st= ruct perf_event *event) if (ret) goto err_pmu; =20 + if (!(pmu->capabilities & PERF_PMU_CAP_SIMD_REGS) && + event_has_simd_regs(event)) { + ret =3D -EOPNOTSUPP; + goto err_destroy; + } + if (!(pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS) && event_has_extended_regs(event)) { ret =3D -EOPNOTSUPP; @@ -13101,6 +13181,12 @@ static int perf_copy_attr(struct perf_event_attr _= _user *uattr, ret =3D perf_reg_validate(attr->sample_regs_user); if (ret) return ret; + ret =3D perf_simd_reg_validate(attr->sample_simd_vec_reg_qwords, + attr->sample_simd_vec_reg_user, + attr->sample_simd_pred_reg_qwords, + attr->sample_simd_pred_reg_user); + if (ret) + return ret; } =20 if (attr->sample_type & PERF_SAMPLE_STACK_USER) { @@ -13121,8 +13207,17 @@ static int perf_copy_attr(struct perf_event_attr _= _user *uattr, if (!attr->sample_max_stack) attr->sample_max_stack =3D 
sysctl_perf_event_max_stack; =20 - if (attr->sample_type & PERF_SAMPLE_REGS_INTR) + if (attr->sample_type & PERF_SAMPLE_REGS_INTR) { ret =3D perf_reg_validate(attr->sample_regs_intr); + if (ret) + return ret; + ret =3D perf_simd_reg_validate(attr->sample_simd_vec_reg_qwords, + attr->sample_simd_vec_reg_intr, + attr->sample_simd_pred_reg_qwords, + attr->sample_simd_pred_reg_intr); + if (ret) + return ret; + } =20 #ifndef CONFIG_CGROUP_PERF if (attr->sample_type & PERF_SAMPLE_CGROUP) --=20 2.38.1 From nobody Wed Oct 8 14:18:30 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 77FF627F01B for ; Fri, 15 Aug 2025 21:35:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293752; cv=none; b=S0+JnsT4h1zky815ibSXgYlJo36M0iIMm4O4SAGW2vY+nY3TtYJBi/vDrx3llii4mBVdSp1ZECdJatKhqPAJLw76Cn7UeZPUFRzivWXGr5YJTON6qPeuzv9QqIaSOyXhBfWFgE/VD+a9waggUQk1j1neRx8jgFPHUyI0MrjKhDA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293752; c=relaxed/simple; bh=YgpTfTUmFFvdAT5UJK9nUhKQ5ZCDVu2ZiRZaDeutoLU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=ZH5mUXhljIh2CGZch9lCsM4Gcy4kmRiJd+6TiTy/2c2P/xihqyM2QfqUZp6+5hAn6/wwIGxgXz1rerjHHpCzvVvu4GAexGjA/xC40IWa+M07HUYyrn+yLYwl30Kg6Y1UQGb0pOQukU23ve63EAbax2Bjt3HheXc0BTuRPUVQGVw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=KkQ0oUb5; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: 
smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="KkQ0oUb5" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293750; x=1786829750; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=YgpTfTUmFFvdAT5UJK9nUhKQ5ZCDVu2ZiRZaDeutoLU=; b=KkQ0oUb5Gq6Zq9mwRg9fI5IZBhrvlDisknBnfxV4uKsLi1aCUi1MnXO2 qxDd5q1LTlbRZZrDnYtMUeBB3ugf22AiUhhpEl1O7rNGnYG8vhBoPruOY 21q35kURbuw26KQAmxe0ggfd8tXhJnQZGwPwRk6GoYrUuUxhMRZmsmnGE Ztq1QANqPX26dtwRFzD3iYVOhpxbA7PPuS5K/GkQ3Sh3T9HUKNUIeO3Ti t78VMCGhLFweD9WAJN52NljnnKurLjTJ0X1BnUoMrdhPJibFUSUPt52DN zlcZaYmYZ4Wsj9tubJisvrCEV3ErSZ56yt+GEFcrafRylAmTW+nafWkgo g==; X-CSE-ConnectionGUID: 5O64e/nxS7ep/WGsiMtaow== X-CSE-MsgGUID: gD+K0hV9Ro+C3B08t9rTMw== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707410" X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707410" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:46 -0700 X-CSE-ConnectionGUID: GXvbfZqmRHSMXmDd4TE77A== X-CSE-MsgGUID: QRlPCb2zT/W9GPDZ8HV/bA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319599" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:47 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang Subject: [PATCH V3 07/17] 
perf/x86: Move XMM to sample_simd_vec_regs Date: Fri, 15 Aug 2025 14:34:25 -0700 Message-Id: <20250815213435.1702022-8-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com> References: <20250815213435.1702022-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

From: Kan Liang

XMM0-15 are SIMD registers. Move them from sample_regs to sample_simd_vec_regs. Reject access to the extended space of sample_regs if the new sample_simd_vec_regs is used.

perf_reg_value() requires the abi to understand the layout of sample_regs. Add the abi information to struct x86_perf_regs.

Implement the X86-specific perf_simd_reg_validate() to validate the SIMD register configuration from the user tool. Only XMM0-15 are supported now. More registers will be added in the following patches.

Implement the X86-specific perf_simd_reg_value() to retrieve the XMM value.
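The validation rules described above can be paraphrased in user space roughly as below. This is a sketch, not the kernel implementation: simd_reg_validate_sketch() and the SKETCH_* constants are hypothetical names that mirror PERF_X86_XMM_QWORDS and PERF_X86_SIMD_VEC_REGS_MAX from this patch. It assumes, as in the uAPI layout of the earlier patch in this series, that pred_qwords shares storage with sample_simd_regs_enabled, so a zero value means the new interface is not in use.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_XMM_QWORDS	2u	/* one XMM register is 128 bits = 2 qwords */
#define SKETCH_VEC_REGS_MAX	16u	/* XMM0-15 */
#define SKETCH_VEC_MASK		((UINT64_C(1) << SKETCH_VEC_REGS_MAX) - 1)

/* Hypothetical user-space paraphrase of the x86 SIMD config check. */
static int simd_reg_validate_sketch(uint16_t vec_qwords, uint64_t vec_mask,
				    uint16_t pred_qwords, uint32_t pred_mask)
{
	/* Zero means the sample_simd_* interface is not used at all. */
	if (!pred_qwords)
		return 0;

	if (!vec_qwords) {
		/* A register mask without a width makes no sense. */
		if (vec_mask)
			return -EINVAL;
	} else {
		/* Only the XMM width is accepted for now. */
		if (vec_qwords != SKETCH_XMM_QWORDS)
			return -EINVAL;
		/* Only XMM0-15 may be selected. */
		if (vec_mask & ~SKETCH_VEC_MASK)
			return -EINVAL;
	}

	/* Predicate registers are not supported on x86 here. */
	if (pred_mask)
		return -EINVAL;

	return 0;
}
```

For example, a request for all sixteen XMM registers (vec_qwords = 2, vec_mask = 0xffff, pred_qwords = 1, pred_mask = 0) passes, while a wider vec_qwords, a mask bit above bit 15, or any predicate mask is rejected with -EINVAL.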
Signed-off-by: Kan Liang --- arch/x86/events/core.c | 38 ++++++++++++++++- arch/x86/events/intel/ds.c | 2 +- arch/x86/events/perf_event.h | 12 ++++++ arch/x86/include/asm/perf_event.h | 1 + arch/x86/include/uapi/asm/perf_regs.h | 6 +++ arch/x86/kernel/perf_regs.c | 61 ++++++++++++++++++++++++++- 6 files changed, 117 insertions(+), 3 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index f27c58f4c815..1789b91c95c6 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -709,6 +709,22 @@ int x86_pmu_hw_config(struct perf_event *event) return -EINVAL; if (!(x86_pmu.ext_regs_mask & XFEATURE_MASK_SSE)) return -EINVAL; + if (event->attr.sample_simd_regs_enabled) + return -EINVAL; + } + + if (event_has_simd_regs(event)) { + if (!(event->pmu->capabilities & PERF_PMU_CAP_SIMD_REGS)) + return -EINVAL; + /* Not require any vector registers but set width */ + if (event->attr.sample_simd_vec_reg_qwords && + !event->attr.sample_simd_vec_reg_intr && + !event->attr.sample_simd_vec_reg_user) + return -EINVAL; + /* The vector registers set is not supported */ + if (event->attr.sample_simd_vec_reg_qwords >=3D PERF_X86_XMM_QWORDS && + !(x86_pmu.ext_regs_mask & XFEATURE_MASK_SSE)) + return -EINVAL; } } return x86_setup_perfctr(event); @@ -1784,6 +1800,16 @@ void x86_pmu_setup_regs_data(struct perf_event *even= t, data->dyn_size +=3D sizeof(u64); if (data->regs_user.regs) data->dyn_size +=3D hweight64(attr->sample_regs_user) * sizeof(u64); + if (attr->sample_simd_regs_enabled && data->regs_user.abi) { + /* num and qwords of vector and pred registers */ + data->dyn_size +=3D sizeof(u64); + /* data[] */ + data->dyn_size +=3D hweight64(attr->sample_simd_vec_reg_user) * + sizeof(u64) * + attr->sample_simd_vec_reg_qwords; + data->regs_user.abi |=3D PERF_SAMPLE_REGS_ABI_SIMD; + } + perf_regs->abi =3D data->regs_user.abi; data->sample_flags |=3D PERF_SAMPLE_REGS_USER; } =20 @@ -1793,10 +1819,20 @@ void x86_pmu_setup_regs_data(struct perf_event *eve= nt, 
data->dyn_size +=3D sizeof(u64); if (data->regs_intr.regs) data->dyn_size +=3D hweight64(attr->sample_regs_intr) * sizeof(u64); + if (attr->sample_simd_regs_enabled && data->regs_intr.abi) { + /* num and qwords of vector and pred registers */ + data->dyn_size +=3D sizeof(u64); + /* data[] */ + data->dyn_size +=3D hweight64(attr->sample_simd_vec_reg_intr) * + sizeof(u64) * + attr->sample_simd_vec_reg_qwords; + data->regs_intr.abi |=3D PERF_SAMPLE_REGS_ABI_SIMD; + } + perf_regs->abi =3D data->regs_intr.abi; data->sample_flags |=3D PERF_SAMPLE_REGS_INTR; } =20 - if (event_has_extended_regs(event)) { + if (event_needs_xmm(event)) { perf_regs->xmm_regs =3D NULL; mask |=3D XFEATURE_MASK_SSE; } diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 9cdece014ac0..4887f6ea7dde 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1415,7 +1415,7 @@ static u64 pebs_update_adaptive_cfg(struct perf_event= *event) if (gprs || (attr->precise_ip < 2) || tsx_weight) pebs_data_cfg |=3D PEBS_DATACFG_GP; =20 - if (event_has_extended_regs(event)) + if (event_needs_xmm(event)) pebs_data_cfg |=3D PEBS_DATACFG_XMMS; =20 if (sample_type & PERF_SAMPLE_BRANCH_STACK) { diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 7bf24842b1dc..6f22ed718a75 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -133,6 +133,18 @@ static inline bool is_acr_event_group(struct perf_even= t *event) return check_leader_group(event->group_leader, PERF_X86_EVENT_ACR); } =20 +static inline bool event_needs_xmm(struct perf_event *event) +{ + if (event->attr.sample_simd_regs_enabled && + event->attr.sample_simd_vec_reg_qwords >=3D PERF_X86_XMM_QWORDS) + return true; + + if (!event->attr.sample_simd_regs_enabled && + event_has_extended_regs(event)) + return true; + return false; +} + struct amd_nb { int nb_id; /* NorthBridge id */ int refcnt; /* reference count */ diff --git a/arch/x86/include/asm/perf_event.h 
b/arch/x86/include/asm/perf_= event.h index f36f04bc95f1..538219c59979 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -592,6 +592,7 @@ extern void perf_events_lapic_init(void); struct pt_regs; struct x86_perf_regs { struct pt_regs regs; + u64 abi; union { u64 *xmm_regs; u32 *xmm_space; /* for xsaves */ diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/= asm/perf_regs.h index 7c9d2bb3833b..bd8af802f757 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -55,4 +55,10 @@ enum perf_event_x86_regs { =20 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) =20 +#define PERF_X86_SIMD_VEC_REGS_MAX 16 +#define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1= , 0) + +#define PERF_X86_XMM_QWORDS 2 +#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_XMM_QWORDS + #endif /* _ASM_X86_PERF_REGS_H */ diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index 624703af80a1..397357c5896b 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -57,12 +57,27 @@ static unsigned int pt_regs_offset[PERF_REG_X86_MAX] = =3D { #endif }; =20 +void perf_simd_reg_check(struct pt_regs *regs, + u64 mask, u16 *nr_vectors, u16 *vec_qwords, + u16 pred_mask, u16 *nr_pred, u16 *pred_qwords) +{ + struct x86_perf_regs *perf_regs =3D container_of(regs, struct x86_perf_re= gs, regs); + + if (*vec_qwords >=3D PERF_X86_XMM_QWORDS && !perf_regs->xmm_regs) + *nr_vectors =3D 0; + + *nr_pred =3D 0; +} + u64 perf_reg_value(struct pt_regs *regs, int idx) { struct x86_perf_regs *perf_regs; =20 if (idx >=3D PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) { perf_regs =3D container_of(regs, struct x86_perf_regs, regs); + /* SIMD registers are moved to dedicated sample_simd_vec_reg */ + if (perf_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) + return 0; if (!perf_regs->xmm_regs) return 0; return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0]; @@ -74,6 
+89,49 @@ u64 perf_reg_value(struct pt_regs *regs, int idx) return regs_get_register(regs, pt_regs_offset[idx]); } =20 +u64 perf_simd_reg_value(struct pt_regs *regs, int idx, + u16 qwords_idx, bool pred) +{ + struct x86_perf_regs *perf_regs =3D container_of(regs, struct x86_perf_re= gs, regs); + + if (pred) + return 0; + + if (WARN_ON_ONCE(idx >=3D PERF_X86_SIMD_VEC_REGS_MAX || + qwords_idx >=3D PERF_X86_SIMD_QWORDS_MAX)) + return 0; + + if (qwords_idx < PERF_X86_XMM_QWORDS) { + if (!perf_regs->xmm_regs) + return 0; + return perf_regs->xmm_regs[idx * PERF_X86_XMM_QWORDS + qwords_idx]; + } + + return 0; +} + +int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, + u16 pred_qwords, u32 pred_mask) +{ + /* pred_qwords implies sample_simd_{pred,vec}_reg_* are supported */ + if (!pred_qwords) + return 0; + + if (!vec_qwords) { + if (vec_mask) + return -EINVAL; + } else { + if (vec_qwords !=3D PERF_X86_XMM_QWORDS) + return -EINVAL; + if (vec_mask & ~PERF_X86_SIMD_VEC_MASK) + return -EINVAL; + } + if (pred_mask) + return -EINVAL; + + return 0; +} + #define PERF_REG_X86_RESERVED (((1ULL << PERF_REG_X86_XMM0) - 1) & \ ~((1ULL << PERF_REG_X86_MAX) - 1)) =20 @@ -114,7 +172,8 @@ void perf_get_regs_user(struct perf_regs *regs_user, =20 int perf_reg_validate(u64 mask) { - if (!mask || (mask & (REG_NOSUPPORT | PERF_REG_X86_RESERVED))) + /* The mask could be 0 if only the SIMD registers are interested */ + if (mask & (REG_NOSUPPORT | PERF_REG_X86_RESERVED)) return -EINVAL; =20 return 0; --=20 2.38.1 From nobody Wed Oct 8 14:18:30 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BA7BC2BF3FB for ; Fri, 15 Aug 2025 21:35:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293753; 
cv=none; b=SokwHzD0M9uUtGHtArq58LEDRhKNbNBcHd5tBbBYFUaT7JF2e4qfRlqbLa4ai9b5CRZ4fQWY0+2SEAsGTxRDiB8VIknaLwjL57GBQkYSyT+lYoWqRuGIeqm15DE+6CgmrfG9B6BQolYQOsoCnmuOVqYvZWY2mI+3aNEPiF3pOf0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293753; c=relaxed/simple; bh=KnrEIK0Hu7VGXUlHrHcXxvD5+aCqJwlBbWbx/FuwhxU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=uasOFuhNM7SR3dA9kULxFTcegOrZ+Hcv+dRVq4l/jqehdOjS/biedp7DAZpUKomy4hIdg73ZLGI39RJ9vlCMckJeVMCg+QqR+zWgz+Fsf3faOC8Qo5g7wDcrfvN+Dlt/Q0VyxF37cGc+bPoTc1yiJV5KcIDCtAdhZPASwf9BErU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=RbSSIO1G; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="RbSSIO1G" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293751; x=1786829751; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KnrEIK0Hu7VGXUlHrHcXxvD5+aCqJwlBbWbx/FuwhxU=; b=RbSSIO1GJoPLulfWODXlUncua5avehvEDyLeACDoGUyJzC047doya5y1 NEQtMO7I+xTV1mi9n9f9ZcZPWvk5L2DoHg5i1UKx8j/KcpbLEZsUk29cz RvRDAd+WQnkiulyN9FHI4M5WhFyKo7luhrdrRjwoy7/x10hcwRJ77anD4 Hwo1mQ5vlEK8q/WqRSp92cJpOsRECOk3J7tPBIAvT0vpsL3shmdldHCt5 9IozzDcHtP9aqpDUntQsZ9IYWZQz0yUOH6h7Jlvu6l9FyGlEU44nZldSY IFJhyIuPULWGUOzaXRAzShMWYqmRWBYw/eVN8TXR/NyXADLj2ctT38+9G Q==; X-CSE-ConnectionGUID: E7txgLHxRe+L1MDk7GQ4GQ== X-CSE-MsgGUID: ECGx4OYQRjuLYvCOtN84tA== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707418" X-IronPort-AV: 
E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707418"
Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:46 -0700
X-CSE-ConnectionGUID: 2wKQkEqkSc2v/v+xxhLsYg==
X-CSE-MsgGUID: 83dP9wF/S+WlwTjF94FlKA==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319603"
Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:47 -0700
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang
Subject: [PATCH V3 08/17] perf/x86: Add YMM into sample_simd_vec_regs
Date: Fri, 15 Aug 2025 14:34:26 -0700
Message-Id: <20250815213435.1702022-9-kan.liang@linux.intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com>
References: <20250815213435.1702022-1-kan.liang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Kan Liang

Each of YMM0-15 is composed of an XMM low half and a YMMH high half.
Retrieving the complete value requires two XSAVE state components (SSE
and YMM). Internally, XMM and YMMH are stored in different structures,
which follow the XSAVE format, but the output dumps each YMM register
as a whole.

A sample_simd_vec_reg_qwords value of 4 selects YMM.
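
For illustration only (this is not code from the patch), the userspace sketch below models how one YMM register could be stitched together from the two separately stored areas. The buffer parameters and the helpers ymm_qword() and ymm_read() are hypothetical names; the index arithmetic is assumed to follow perf_simd_reg_value() as extended by this patch, and a missing area is assumed to read back as 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widths in qwords, mirroring the uapi constants added by this series. */
#define XMM_QWORDS   2
#define YMM_QWORDS   4
#define YMMH_QWORDS  (YMM_QWORDS / 2)
#define NR_YMM       16

/*
 * Hypothetical model of the per-qword lookup for YMM0-15.
 * xmm holds the low 128 bits of each register, ymmh the high 128 bits.
 * Either pointer may be NULL when that area was not captured; the
 * lookup then yields 0.
 */
static uint64_t ymm_qword(const uint64_t *xmm, const uint64_t *ymmh,
			  unsigned int idx, unsigned int qword)
{
	assert(idx < NR_YMM && qword < YMM_QWORDS);

	if (qword < XMM_QWORDS)
		return xmm ? xmm[idx * XMM_QWORDS + qword] : 0;

	return ymmh ? ymmh[idx * YMMH_QWORDS + qword - XMM_QWORDS] : 0;
}

/* Copy one complete YMM register (4 qwords, low to high) into out[]. */
static void ymm_read(const uint64_t *xmm, const uint64_t *ymmh,
		     unsigned int idx, uint64_t out[YMM_QWORDS])
{
	for (unsigned int q = 0; q < YMM_QWORDS; q++)
		out[q] = ymm_qword(xmm, ymmh, idx, q);
}
```

Calling ymm_read(xmm, ymmh, 3, out) fills out[0..1] from the XMM area and out[2..3] from the YMMH area, which is the "dump as a whole" behaviour the message describes.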
Signed-off-by: Kan Liang --- arch/x86/events/core.c | 13 +++++++++++++ arch/x86/include/asm/perf_event.h | 4 ++++ arch/x86/include/uapi/asm/perf_regs.h | 4 +++- arch/x86/kernel/perf_regs.c | 10 +++++++++- 4 files changed, 29 insertions(+), 2 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 1789b91c95c6..aebd4e56dff1 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -423,6 +423,9 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *= perf_regs, u64 mask) =20 if (valid_mask & XFEATURE_MASK_SSE) perf_regs->xmm_space =3D xsave->i387.xmm_space; + + if (valid_mask & XFEATURE_MASK_YMM) + perf_regs->ymmh =3D get_xsave_addr(xsave, XFEATURE_YMM); } =20 static void release_ext_regs_buffers(void) @@ -725,6 +728,9 @@ int x86_pmu_hw_config(struct perf_event *event) if (event->attr.sample_simd_vec_reg_qwords >=3D PERF_X86_XMM_QWORDS && !(x86_pmu.ext_regs_mask & XFEATURE_MASK_SSE)) return -EINVAL; + if (event->attr.sample_simd_vec_reg_qwords >=3D PERF_X86_YMM_QWORDS && + !(x86_pmu.ext_regs_mask & XFEATURE_MASK_YMM)) + return -EINVAL; } } return x86_setup_perfctr(event); @@ -1837,6 +1843,13 @@ void x86_pmu_setup_regs_data(struct perf_event *even= t, mask |=3D XFEATURE_MASK_SSE; } =20 + if (attr->sample_simd_regs_enabled) { + if (attr->sample_simd_vec_reg_qwords >=3D PERF_X86_YMM_QWORDS) { + perf_regs->ymmh_regs =3D NULL; + mask |=3D XFEATURE_MASK_YMM; + } + } + mask &=3D ~ignore_mask; if (mask) x86_pmu_get_ext_regs(perf_regs, mask); diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 538219c59979..81e3143fd91a 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -597,6 +597,10 @@ struct x86_perf_regs { u64 *xmm_regs; u32 *xmm_space; /* for xsaves */ }; + union { + u64 *ymmh_regs; + struct ymmh_struct *ymmh; + }; }; =20 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs); diff --git a/arch/x86/include/uapi/asm/perf_regs.h 
b/arch/x86/include/uapi/= asm/perf_regs.h index bd8af802f757..feb3e8f80761 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -59,6 +59,8 @@ enum perf_event_x86_regs { #define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1= , 0) =20 #define PERF_X86_XMM_QWORDS 2 -#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_XMM_QWORDS +#define PERF_X86_YMM_QWORDS 4 +#define PERF_X86_YMMH_QWORDS (PERF_X86_YMM_QWORDS / 2) +#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_YMM_QWORDS =20 #endif /* _ASM_X86_PERF_REGS_H */ diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index 397357c5896b..d94bc687e4bf 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -66,6 +66,9 @@ void perf_simd_reg_check(struct pt_regs *regs, if (*vec_qwords >=3D PERF_X86_XMM_QWORDS && !perf_regs->xmm_regs) *nr_vectors =3D 0; =20 + if (*vec_qwords >=3D PERF_X86_YMM_QWORDS && !perf_regs->xmm_regs) + *vec_qwords =3D PERF_X86_XMM_QWORDS; + *nr_pred =3D 0; } =20 @@ -105,6 +108,10 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx, if (!perf_regs->xmm_regs) return 0; return perf_regs->xmm_regs[idx * PERF_X86_XMM_QWORDS + qwords_idx]; + } else if (qwords_idx < PERF_X86_YMM_QWORDS) { + if (!perf_regs->ymmh_regs) + return 0; + return perf_regs->ymmh_regs[idx * PERF_X86_YMMH_QWORDS + qwords_idx - PE= RF_X86_XMM_QWORDS]; } =20 return 0; @@ -121,7 +128,8 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, if (vec_mask) return -EINVAL; } else { - if (vec_qwords !=3D PERF_X86_XMM_QWORDS) + if (vec_qwords !=3D PERF_X86_XMM_QWORDS && + vec_qwords !=3D PERF_X86_YMM_QWORDS) return -EINVAL; if (vec_mask & ~PERF_X86_SIMD_VEC_MASK) return -EINVAL; --=20 2.38.1 From nobody Wed Oct 8 14:18:30 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 
4AFA52C0F84 for ; Fri, 15 Aug 2025 21:35:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293754; cv=none; b=P1tJtFMh8P5MF9da5esuUHgcYELuSmAc54Rh3TksfClu1qacXl72a+JQOcdkjwyvTIk7X63b6/5FTghMT0lPW+jh9ZNlqbJcQP96S6MMR2a6xAftggWB8VKzazY7+aME21PnvWvchMbFAdHKeCgHuzFxz0VQdYYMNBc8XFtolCk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293754; c=relaxed/simple; bh=65QRemEGqKZDYP445qWHAGhZ6ykDc+ZyhHM+PxueYOY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=WlPIDfIsPWsIbJyXARBV27YKEf8TlGt/rbgB3UZGyCKEIwbFbmH4tHvHc++gK4zzfwUjOECkdbKB9goSAUZsnGW/uTQ7F/KglSfoY6pJvbFD5yzMCDnzgxWOC3ZXS6lgGSXKsY37Ev48FJs3c2/Bk3uH9Fs9BapVGlAuM5yZklQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=kcCjYJOU; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="kcCjYJOU" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293752; x=1786829752; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=65QRemEGqKZDYP445qWHAGhZ6ykDc+ZyhHM+PxueYOY=; b=kcCjYJOUF+jZxAHrRcW8ln3q9YGMP/4jLmBl5GXHR6ziUNYv11o/4T3R SeirgKnPVbNrqroTA+pKYjcrg/KbttnLT7kBSMMNODNaF0UEnHijG+pb5 oejwob/Xa+mXUVy3iOzFgMlqb/FG7tzWS5HqM9bVHUCQi/ID2uJejty0u VyAUWTChqyZa83OggdSqqLia33OIAqnCGqRlTj+5m3UXpAQvKg22j8GPm Hny0D3L6B6sWVqrUA3EJWfDSp2uWBawuC31abXMrC/zVrVt3DI62WFZ2I 
JHe/G1wc44aVbnnP9osH2UCFZ9/XGci6nlyN2+PV5WTvC+hNCsm0P+DjD g==;
X-CSE-ConnectionGUID: a4y0fO1VSHyzSFLYUe8a8w==
X-CSE-MsgGUID: QNhhX6WPTyy40GNOKUhXJA==
X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707427"
X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707427"
Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:46 -0700
X-CSE-ConnectionGUID: 9blzD+GUR/KJ53FOz+Y9UA==
X-CSE-MsgGUID: 0msfRi0BSq+IXIuwVJsKjA==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319606"
Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:47 -0700
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang
Subject: [PATCH V3 09/17] perf/x86: Add ZMM into sample_simd_vec_regs
Date: Fri, 15 Aug 2025 14:34:27 -0700
Message-Id: <20250815213435.1702022-10-kan.liang@linux.intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com>
References: <20250815213435.1702022-1-kan.liang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Kan Liang

Each of ZMM0-15 is composed of XMM, YMMH, and ZMMH. Retrieving the
complete value requires three XSAVE state components.

ZMM16-31, YMM16-31, and XMM16-31 are also supported; they require only
the Hi16_ZMM XSAVE state component.
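
As a rough sketch of the component selection described above (not code from the patch), the helper below mirrors the checks that x86_pmu_setup_regs_data() performs once this patch is applied: the requested width picks the components for registers 0-15, and any bit at or above 16 in the vector mask adds Hi16_ZMM. The FEAT_* names and simd_vec_components() are hypothetical; the bit positions are assumed to match the XCR0 numbering of the XSAVE state components.

```c
#include <assert.h>
#include <stdint.h>

/* XSAVE state-component bits, numbered as in XCR0 (names are illustrative). */
#define FEAT_SSE        (1u << 1)	/* XMM0-15 */
#define FEAT_YMM        (1u << 2)	/* upper halves of YMM0-15 */
#define FEAT_ZMM_HI256  (1u << 6)	/* upper halves of ZMM0-15 */
#define FEAT_HI16_ZMM   (1u << 7)	/* full ZMM16-31 */

#define XMM_QWORDS   2
#define YMM_QWORDS   4
#define ZMM_QWORDS   8
#define H16ZMM_BASE  16

/*
 * Return the set of XSAVE components to capture for a request of
 * 'qwords' qwords per register and the register bitmap 'vec_mask'.
 */
static uint32_t simd_vec_components(unsigned int qwords, uint64_t vec_mask)
{
	uint32_t feats = 0;

	/* Only the three architectural widths are valid requests. */
	assert(qwords == XMM_QWORDS || qwords == YMM_QWORDS ||
	       qwords == ZMM_QWORDS);

	if (qwords >= XMM_QWORDS)
		feats |= FEAT_SSE;
	if (qwords >= YMM_QWORDS)
		feats |= FEAT_YMM;
	if (qwords >= ZMM_QWORDS)
		feats |= FEAT_ZMM_HI256;

	/* Any register numbered 16 or above lives in the Hi16_ZMM area. */
	if (vec_mask >> H16ZMM_BASE)
		feats |= FEAT_HI16_ZMM;

	return feats;
}
```

For example, a ZMM-wide request for registers 0-15 (mask 0xffff) selects the SSE, YMM and ZMM_Hi256 components, while adding any of registers 16-31 to the mask also selects Hi16_ZMM.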
Internally, the XMM, YMMH, ZMMH, and Hi16_ZMM states are stored in
different structures, which follow the XSAVE format, but the output
dumps each ZMM register, or each Hi16 XMM/YMM/ZMM register, as a whole.

A sample_simd_vec_reg_qwords value of 8 selects ZMM.

Signed-off-by: Kan Liang
---
 arch/x86/events/core.c                | 20 ++++++++++++++++++++
 arch/x86/include/asm/perf_event.h     |  8 ++++++++
 arch/x86/include/uapi/asm/perf_regs.h |  8 ++++++--
 arch/x86/kernel/perf_regs.c           | 19 ++++++++++++++++++-
 4 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index aebd4e56dff1..85b739fe1693 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -426,6 +426,10 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
 
 	if (valid_mask & XFEATURE_MASK_YMM)
 		perf_regs->ymmh = get_xsave_addr(xsave, XFEATURE_YMM);
+	if (valid_mask & XFEATURE_MASK_ZMM_Hi256)
+		perf_regs->zmmh = get_xsave_addr(xsave, XFEATURE_ZMM_Hi256);
+	if (valid_mask & XFEATURE_MASK_Hi16_ZMM)
+		perf_regs->h16zmm = get_xsave_addr(xsave, XFEATURE_Hi16_ZMM);
 }
 
 static void release_ext_regs_buffers(void)
@@ -731,6 +735,13 @@ int x86_pmu_hw_config(struct perf_event *event)
 			if (event->attr.sample_simd_vec_reg_qwords >= PERF_X86_YMM_QWORDS &&
 			    !(x86_pmu.ext_regs_mask & XFEATURE_MASK_YMM))
 				return -EINVAL;
+			if (event->attr.sample_simd_vec_reg_qwords >= PERF_X86_ZMM_QWORDS &&
+			    !(x86_pmu.ext_regs_mask & XFEATURE_MASK_ZMM_Hi256))
+				return -EINVAL;
+			if ((fls64(event->attr.sample_simd_vec_reg_intr) > PERF_X86_H16ZMM_BASE ||
+			     fls64(event->attr.sample_simd_vec_reg_user) > PERF_X86_H16ZMM_BASE) &&
+			    !(x86_pmu.ext_regs_mask & XFEATURE_MASK_Hi16_ZMM))
+				return -EINVAL;
 		}
 	}
 	return x86_setup_perfctr(event);
@@ -1848,6 +1859,15 @@ void x86_pmu_setup_regs_data(struct perf_event *event,
 			perf_regs->ymmh_regs = NULL;
 			mask |= XFEATURE_MASK_YMM;
 		}
+		if (attr->sample_simd_vec_reg_qwords >= PERF_X86_ZMM_QWORDS) {
+			perf_regs->zmmh_regs = NULL;
+			mask |= XFEATURE_MASK_ZMM_Hi256;
+		}
+		if
(fls64(attr->sample_simd_vec_reg_intr) > PERF_X86_H16ZMM_BASE || + fls64(attr->sample_simd_vec_reg_user) > PERF_X86_H16ZMM_BASE) { + perf_regs->h16zmm_regs =3D NULL; + mask |=3D XFEATURE_MASK_Hi16_ZMM; + } } =20 mask &=3D ~ignore_mask; diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 81e3143fd91a..2d78bd9649bd 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -601,6 +601,14 @@ struct x86_perf_regs { u64 *ymmh_regs; struct ymmh_struct *ymmh; }; + union { + u64 *zmmh_regs; + struct avx_512_zmm_uppers_state *zmmh; + }; + union { + u64 *h16zmm_regs; + struct avx_512_hi16_state *h16zmm; + }; }; =20 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs); diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/= asm/perf_regs.h index feb3e8f80761..f74e3ba65be2 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -55,12 +55,16 @@ enum perf_event_x86_regs { =20 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) =20 -#define PERF_X86_SIMD_VEC_REGS_MAX 16 +#define PERF_X86_SIMD_VEC_REGS_MAX 32 #define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1= , 0) =20 +#define PERF_X86_H16ZMM_BASE 16 + #define PERF_X86_XMM_QWORDS 2 #define PERF_X86_YMM_QWORDS 4 #define PERF_X86_YMMH_QWORDS (PERF_X86_YMM_QWORDS / 2) -#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_YMM_QWORDS +#define PERF_X86_ZMM_QWORDS 8 +#define PERF_X86_ZMMH_QWORDS (PERF_X86_ZMM_QWORDS / 2) +#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_ZMM_QWORDS =20 #endif /* _ASM_X86_PERF_REGS_H */ diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index d94bc687e4bf..f04c44d3d356 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -69,6 +69,12 @@ void perf_simd_reg_check(struct pt_regs *regs, if (*vec_qwords >=3D PERF_X86_YMM_QWORDS && !perf_regs->xmm_regs) *vec_qwords =3D PERF_X86_XMM_QWORDS; =20 + 
if (*vec_qwords >=3D PERF_X86_ZMM_QWORDS && !perf_regs->zmmh_regs) + *vec_qwords =3D PERF_X86_YMM_QWORDS; + + if (*nr_vectors > PERF_X86_H16ZMM_BASE && !perf_regs->h16zmm_regs) + *nr_vectors =3D PERF_X86_H16ZMM_BASE; + *nr_pred =3D 0; } =20 @@ -104,6 +110,12 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx, qwords_idx >=3D PERF_X86_SIMD_QWORDS_MAX)) return 0; =20 + if (idx >=3D PERF_X86_H16ZMM_BASE) { + if (!perf_regs->h16zmm_regs) + return 0; + return perf_regs->h16zmm_regs[idx * PERF_X86_ZMM_QWORDS + qwords_idx]; + } + if (qwords_idx < PERF_X86_XMM_QWORDS) { if (!perf_regs->xmm_regs) return 0; @@ -112,6 +124,10 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx, if (!perf_regs->ymmh_regs) return 0; return perf_regs->ymmh_regs[idx * PERF_X86_YMMH_QWORDS + qwords_idx - PE= RF_X86_XMM_QWORDS]; + } else if (qwords_idx < PERF_X86_ZMM_QWORDS) { + if (!perf_regs->zmmh_regs) + return 0; + return perf_regs->zmmh_regs[idx * PERF_X86_ZMMH_QWORDS + qwords_idx - PE= RF_X86_YMM_QWORDS]; } =20 return 0; @@ -129,7 +145,8 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, return -EINVAL; } else { if (vec_qwords !=3D PERF_X86_XMM_QWORDS && - vec_qwords !=3D PERF_X86_YMM_QWORDS) + vec_qwords !=3D PERF_X86_YMM_QWORDS && + vec_qwords !=3D PERF_X86_ZMM_QWORDS) return -EINVAL; if (vec_mask & ~PERF_X86_SIMD_VEC_MASK) return -EINVAL; --=20 2.38.1 From nobody Wed Oct 8 14:18:30 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1F5992D3229 for ; Fri, 15 Aug 2025 21:35:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293754; cv=none; 
b=PNeIfbnX799+MIOXMrBKY7EavwUTrxXbxsmx6+7SHsnbm4SHyg3JhRIaWUknQvjmzLXbdyT4gOMpv7fMnuBqghPnMWsrAbJfceDl9+WrwdXslfxKAJYC78md3Hj6xf7h5g2YM96D/zykqLH7tr1URv7k4Z4ShBtJvy3RtBdqxn8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293754; c=relaxed/simple; bh=kDddHfsUPu43BOSmzJScDNiehhqgbeN1X7dbmj3yB3g=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Rg5MCbG3WId7g4Nia1Lhh2UWdXXhuGJUuEfZ1VaBII3PPVGYevuXKFzt4yBopQQTAODeBjcfaD15qZkNTrLUQ26vQ1NThrPvSS12up70PvQ9odxRtzpRfj3+H1BUMWw85eLvAVTPtkblXuqg12i5Mqbfn6/LVHklSSqgC3KEc+0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Jz58FO/9; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Jz58FO/9" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293753; x=1786829753; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=kDddHfsUPu43BOSmzJScDNiehhqgbeN1X7dbmj3yB3g=; b=Jz58FO/9gQjHHRDFli/z+ORTannYPj8XkPQ8kwUXiudYTAfZLWsy7kf7 ogvnQF8N//ft52qD0FIY7c2t4eTMKOLIVZkFUgiSNhdVTXuHUvLkQuFx5 bLwU1L7fE7IhuQo182i1N2X9KipDOqXbS0c8V1hFQJqETO1xQvbY84KHj +tKMFraVDU02s++cWD/DpOYGKYqoNLrnfb+rkdMWeHCkdk1Wv7CfIiaW5 ZTyE0IAD1LhUJVc6mLq7NIYVAMIW7iS1dJKFC4w02Gf6Bk3f+6RAqhb5D GjSoHyRHryiWkJ8QEz18vY+NUrA1UUl2drv7kacsJ9MTjonhqiHNE23Z5 A==; X-CSE-ConnectionGUID: m98A3gFPTcyGco/heCggKA== X-CSE-MsgGUID: MYiSbFmlS6yBNCPAA/3Mog== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707435" X-IronPort-AV: 
E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707435"
Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:47 -0700
X-CSE-ConnectionGUID: gJJwc7bMTzuw7prbtLVEWQ==
X-CSE-MsgGUID: uCss7RiJQ3i2MPVecBu3AQ==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319610"
Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:47 -0700
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org
Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang
Subject: [PATCH V3 10/17] perf/x86: Add OPMASK into sample_simd_pred_reg
Date: Fri, 15 Aug 2025 14:34:28 -0700
Message-Id: <20250815213435.1702022-11-kan.liang@linux.intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com>
References: <20250815213435.1702022-1-kan.liang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Kan Liang

The OPMASK registers are the SIMD predicate registers. Add them to
sample_simd_pred_reg. Each OPMASK register is 1 qword wide, and there
are 8 registers.
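
To make the two numbers above concrete, here is a minimal userspace sketch (not code from the patch) of the predicate-register checks and the resulting sample payload size. The helpers opmask_sample_bytes() and opmask_value() and the -1 error convention are hypothetical; the limits are assumed to match PERF_X86_SIMD_PRED_REGS_MAX and PERF_X86_OPMASK_QWORDS from this patch.

```c
#include <assert.h>
#include <stdint.h>

#define OPMASK_REGS    8	/* k0-k7 */
#define OPMASK_QWORDS  1	/* each mask register is one qword wide */

/*
 * Return the number of payload bytes the selected OPMASK registers add
 * to a sample, or -1 if the request is not valid for x86.
 */
static int opmask_sample_bytes(uint32_t pred_mask, unsigned int pred_qwords)
{
	unsigned int nr = 0;

	if (pred_qwords != OPMASK_QWORDS)
		return -1;
	if (pred_mask & ~((1u << OPMASK_REGS) - 1))
		return -1;

	/* Count the selected registers (clears the lowest set bit each pass). */
	for (uint32_t m = pred_mask; m; m &= m - 1)
		nr++;

	return (int)(nr * pred_qwords * sizeof(uint64_t));
}

/* Read one OPMASK register; a missing OPMASK area is assumed to read as 0. */
static uint64_t opmask_value(const uint64_t *opmask, unsigned int idx)
{
	assert(idx < OPMASK_REGS);

	return opmask ? opmask[idx] : 0;
}
```

Selecting all eight registers (mask 0xff) with a width of 1 qword adds 64 bytes per sample; a mask bit above bit 7 or any other width is rejected.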
Signed-off-by: Kan Liang --- arch/x86/events/core.c | 13 +++++++++++++ arch/x86/include/asm/perf_event.h | 4 ++++ arch/x86/include/uapi/asm/perf_regs.h | 3 +++ arch/x86/kernel/perf_regs.c | 18 ++++++++++++++---- 4 files changed, 34 insertions(+), 4 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 85b739fe1693..1fa550efcdfa 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -430,6 +430,8 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *= perf_regs, u64 mask) perf_regs->zmmh =3D get_xsave_addr(xsave, XFEATURE_ZMM_Hi256); if (valid_mask & XFEATURE_MASK_Hi16_ZMM) perf_regs->h16zmm =3D get_xsave_addr(xsave, XFEATURE_Hi16_ZMM); + if (valid_mask & XFEATURE_MASK_OPMASK) + perf_regs->opmask =3D get_xsave_addr(xsave, XFEATURE_OPMASK); } =20 static void release_ext_regs_buffers(void) @@ -1824,6 +1826,9 @@ void x86_pmu_setup_regs_data(struct perf_event *event, data->dyn_size +=3D hweight64(attr->sample_simd_vec_reg_user) * sizeof(u64) * attr->sample_simd_vec_reg_qwords; + data->dyn_size +=3D hweight32(attr->sample_simd_pred_reg_user) * + sizeof(u64) * + attr->sample_simd_pred_reg_qwords; data->regs_user.abi |=3D PERF_SAMPLE_REGS_ABI_SIMD; } perf_regs->abi =3D data->regs_user.abi; @@ -1843,6 +1848,9 @@ void x86_pmu_setup_regs_data(struct perf_event *event, data->dyn_size +=3D hweight64(attr->sample_simd_vec_reg_intr) * sizeof(u64) * attr->sample_simd_vec_reg_qwords; + data->dyn_size +=3D hweight32(attr->sample_simd_pred_reg_intr) * + sizeof(u64) * + attr->sample_simd_pred_reg_qwords; data->regs_intr.abi |=3D PERF_SAMPLE_REGS_ABI_SIMD; } perf_regs->abi =3D data->regs_intr.abi; @@ -1868,6 +1876,11 @@ void x86_pmu_setup_regs_data(struct perf_event *even= t, perf_regs->h16zmm_regs =3D NULL; mask |=3D XFEATURE_MASK_Hi16_ZMM; } + if (attr->sample_simd_pred_reg_intr || + attr->sample_simd_pred_reg_user) { + perf_regs->opmask_regs =3D NULL; + mask |=3D XFEATURE_MASK_OPMASK; + } } =20 mask &=3D ~ignore_mask; diff --git 
a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 2d78bd9649bd..dda677022882 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -609,6 +609,10 @@ struct x86_perf_regs { u64 *h16zmm_regs; struct avx_512_hi16_state *h16zmm; }; + union { + u64 *opmask_regs; + struct avx_512_opmask_state *opmask; + }; }; =20 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs); diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/= asm/perf_regs.h index f74e3ba65be2..dd7bd1dd8d39 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -55,11 +55,14 @@ enum perf_event_x86_regs { =20 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) =20 +#define PERF_X86_SIMD_PRED_REGS_MAX 8 +#define PERF_X86_SIMD_PRED_MASK GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, = 0) #define PERF_X86_SIMD_VEC_REGS_MAX 32 #define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1= , 0) =20 #define PERF_X86_H16ZMM_BASE 16 =20 +#define PERF_X86_OPMASK_QWORDS 1 #define PERF_X86_XMM_QWORDS 2 #define PERF_X86_YMM_QWORDS 4 #define PERF_X86_YMMH_QWORDS (PERF_X86_YMM_QWORDS / 2) diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index f04c44d3d356..5e815f806605 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -75,7 +75,8 @@ void perf_simd_reg_check(struct pt_regs *regs, if (*nr_vectors > PERF_X86_H16ZMM_BASE && !perf_regs->h16zmm_regs) *nr_vectors =3D PERF_X86_H16ZMM_BASE; =20 - *nr_pred =3D 0; + if (*nr_pred && !perf_regs->opmask_regs) + *nr_pred =3D 0; } =20 u64 perf_reg_value(struct pt_regs *regs, int idx) @@ -103,8 +104,14 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx, { struct x86_perf_regs *perf_regs =3D container_of(regs, struct x86_perf_re= gs, regs); =20 - if (pred) - return 0; + if (pred) { + if (WARN_ON_ONCE(idx >=3D PERF_X86_SIMD_PRED_REGS_MAX || + qwords_idx >=3D 
PERF_X86_OPMASK_QWORDS)) + return 0; + if (!perf_regs->opmask_regs) + return 0; + return perf_regs->opmask_regs[idx]; + } =20 if (WARN_ON_ONCE(idx >=3D PERF_X86_SIMD_VEC_REGS_MAX || qwords_idx >=3D PERF_X86_SIMD_QWORDS_MAX)) @@ -151,7 +158,10 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mas= k, if (vec_mask & ~PERF_X86_SIMD_VEC_MASK) return -EINVAL; } - if (pred_mask) + + if (pred_qwords !=3D PERF_X86_OPMASK_QWORDS) + return -EINVAL; + if (pred_mask & ~PERF_X86_SIMD_PRED_MASK) return -EINVAL; =20 return 0; --=20 2.38.1 From nobody Wed Oct 8 14:18:30 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 894412E2DFD for ; Fri, 15 Aug 2025 21:35:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293755; cv=none; b=uSRD1ihx4C/RjO/ZBgSRqRkHuKrJ+WFYQkGWhuzbuWHh9qCyzH8ch7KILUud2pP88BlSj1pXS5/vwKVP4wAAy18/MRzRiyEhkD1GM2BcNR7rBh/3cpH3KOm+c7x+V+c0GuTHjqY+bFHI7IX9VpzcsH8dF0DX9D99XsB573wcZV0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293755; c=relaxed/simple; bh=9a37j8SIObA/Y7POlaW8vXb6NHK1dLDqle8KAAgUZ3k=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=DKKRVBxUwEaUzUyVcdNCfpzcfKJkkzpFZ7MmVsW90AQoO7NuI4bmf414fUZMyAOVlPZtSuR+70U/fINQ6FJvlXo+u0ZQC8aGtorjF4aQ41vkcrcxJWX6nEAISNpPDcCvM9R2hynD12EV1AwwDx9MfnipW3Bb/EqROb5aBvqiG+M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=BH+FmtjE; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) 
header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="BH+FmtjE" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293753; x=1786829753; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=9a37j8SIObA/Y7POlaW8vXb6NHK1dLDqle8KAAgUZ3k=; b=BH+FmtjE8ghVupsNLf6oeQMOM6ObMiavU4XmozSIqtmPF5neO0+/bXns Kh7EZxCMVNZbIhkxT987mywfHrzSFWDq30uZIRswf7mUO0upY9LdW6cul cMf0aV4rnVOYxW4DDEMynxTADG/BXYtRMwiL5624hWa8YkAoMMcHeVS9Z 7z1DjoFx4lhZSGuC+ogIfqHhBFPoY/6oPxhUVHrsz/t8KiCogqJXRj4RF sZ0+WL0xadvm82ZcEjDWDnLSYg/GIYmJh2BPJLPtc1PBti6LqKWyU/qxL LDs7oxfNPkxRE0GInU3jnIa1doxiZgcHArXYFWWQFiOHy2vASfYOZUI+4 w==; X-CSE-ConnectionGUID: ZfWidEzyS8uatj5s6mcAZg== X-CSE-MsgGUID: 1XXewvshTtevpXAgcJY2Qg== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707443" X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707443" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:47 -0700 X-CSE-ConnectionGUID: IAAsYWyuTTmHC7EZkihtkg== X-CSE-MsgGUID: /RNl79gVScGnFn6ICizPng== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319614" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:47 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, 
eranian@google.com, Kan Liang Subject: [PATCH V3 11/17] perf/x86: Add eGPRs into sample_regs Date: Fri, 15 Aug 2025 14:34:29 -0700 Message-Id: <20250815213435.1702022-12-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com> References: <20250815213435.1702022-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang The eGPRs are only supported when the new SIMD register configuration method is used, which moves the XMM registers to sample_simd_vec_regs, so their space in sample_regs can be reclaimed for the eGPRs. The eGPRs are retrieved by XSAVE. Only support the eGPRs for X86_64. Suggested-by: Peter Zijlstra (Intel) Signed-off-by: Kan Liang --- arch/x86/events/core.c | 39 +++++++++++++++++++++------ arch/x86/include/asm/perf_event.h | 4 +++ arch/x86/include/uapi/asm/perf_regs.h | 26 ++++++++++++++++-- arch/x86/kernel/perf_regs.c | 31 ++++++++++----------- 4 files changed, 75 insertions(+), 25 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 1fa550efcdfa..f816290defc1 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -432,6 +432,8 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *= perf_regs, u64 mask) perf_regs->h16zmm =3D get_xsave_addr(xsave, XFEATURE_Hi16_ZMM); if (valid_mask & XFEATURE_MASK_OPMASK) perf_regs->opmask =3D get_xsave_addr(xsave, XFEATURE_OPMASK); + if (valid_mask & XFEATURE_MASK_APX) + perf_regs->egpr =3D get_xsave_addr(xsave, XFEATURE_APX); } =20 static void release_ext_regs_buffers(void) @@ -709,17 +711,33 @@ int x86_pmu_hw_config(struct perf_event *event) } =20 if (event->attr.sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_U= SER)) { - /* - * Besides the general purpose registers, XMM registers may - * be collected as well. 
- */ - if (event_has_extended_regs(event)) { - if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS)) + if (event->attr.sample_simd_regs_enabled) { + u64 reserved =3D ~GENMASK_ULL(PERF_REG_X86_64_MAX - 1, 0); + + if (!(event->pmu->capabilities & PERF_PMU_CAP_SIMD_REGS)) return -EINVAL; - if (!(x86_pmu.ext_regs_mask & XFEATURE_MASK_SSE)) + /* + * The XMM space in the perf_event_x86_regs is reclaimed + * for eGPRs and other general registers. + */ + if (event->attr.sample_regs_user & reserved || + event->attr.sample_regs_intr & reserved) return -EINVAL; - if (event->attr.sample_simd_regs_enabled) + if ((event->attr.sample_regs_user & PERF_X86_EGPRS_MASK || + event->attr.sample_regs_intr & PERF_X86_EGPRS_MASK) && + !(x86_pmu.ext_regs_mask & XFEATURE_MASK_APX)) return -EINVAL; + } else { + /* + * Besides the general purpose registers, XMM registers may + * be collected as well. + */ + if (event_has_extended_regs(event)) { + if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS)) + return -EINVAL; + if (!(x86_pmu.ext_regs_mask & XFEATURE_MASK_SSE)) + return -EINVAL; + } } =20 if (event_has_simd_regs(event)) { @@ -1881,6 +1899,11 @@ void x86_pmu_setup_regs_data(struct perf_event *even= t, perf_regs->opmask_regs =3D NULL; mask |=3D XFEATURE_MASK_OPMASK; } + if (attr->sample_regs_user & PERF_X86_EGPRS_MASK || + attr->sample_regs_intr & PERF_X86_EGPRS_MASK) { + perf_regs->egpr_regs =3D NULL; + mask |=3D XFEATURE_MASK_APX; + } } =20 mask &=3D ~ignore_mask; diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index dda677022882..4400cb66bc8e 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -613,6 +613,10 @@ struct x86_perf_regs { u64 *opmask_regs; struct avx_512_opmask_state *opmask; }; + union { + u64 *egpr_regs; + struct apx_state *egpr; + }; }; =20 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs); diff --git a/arch/x86/include/uapi/asm/perf_regs.h 
b/arch/x86/include/uapi/= asm/perf_regs.h index dd7bd1dd8d39..cd0f6804debf 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -27,11 +27,31 @@ enum perf_event_x86_regs { PERF_REG_X86_R13, PERF_REG_X86_R14, PERF_REG_X86_R15, + /* Extended GPRs (EGPRs) */ + PERF_REG_X86_R16, + PERF_REG_X86_R17, + PERF_REG_X86_R18, + PERF_REG_X86_R19, + PERF_REG_X86_R20, + PERF_REG_X86_R21, + PERF_REG_X86_R22, + PERF_REG_X86_R23, + PERF_REG_X86_R24, + PERF_REG_X86_R25, + PERF_REG_X86_R26, + PERF_REG_X86_R27, + PERF_REG_X86_R28, + PERF_REG_X86_R29, + PERF_REG_X86_R30, + PERF_REG_X86_R31, /* These are the limits for the GPRs. */ PERF_REG_X86_32_MAX =3D PERF_REG_X86_GS + 1, - PERF_REG_X86_64_MAX =3D PERF_REG_X86_R15 + 1, + PERF_REG_X86_64_MAX =3D PERF_REG_X86_R31 + 1, =20 - /* These all need two bits set because they are 128bit */ + /* + * These all need two bits set because they are 128bit. + * These are only available when !PERF_SAMPLE_REGS_ABI_SIMD + */ PERF_REG_X86_XMM0 =3D 32, PERF_REG_X86_XMM1 =3D 34, PERF_REG_X86_XMM2 =3D 36, @@ -55,6 +75,8 @@ enum perf_event_x86_regs { =20 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) =20 +#define PERF_X86_EGPRS_MASK GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R1= 6) + #define PERF_X86_SIMD_PRED_REGS_MAX 8 #define PERF_X86_SIMD_PRED_MASK GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, = 0) #define PERF_X86_SIMD_VEC_REGS_MAX 32 diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index 5e815f806605..b6e50194ff3e 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -83,14 +83,22 @@ u64 perf_reg_value(struct pt_regs *regs, int idx) { struct x86_perf_regs *perf_regs; =20 - if (idx >=3D PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) { + if (idx > PERF_REG_X86_R15) { perf_regs =3D container_of(regs, struct x86_perf_regs, regs); - /* SIMD registers are moved to dedicated sample_simd_vec_reg */ - if (perf_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) - 
return 0; - if (!perf_regs->xmm_regs) - return 0; - return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0]; + + if (perf_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) { + if (idx <=3D PERF_REG_X86_R31) { + if (!perf_regs->egpr_regs) + return 0; + return perf_regs->egpr_regs[idx - PERF_REG_X86_R16]; + } + } else { + if (idx >=3D PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) { + if (!perf_regs->xmm_regs) + return 0; + return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0]; + } + } } =20 if (WARN_ON_ONCE(idx >=3D ARRAY_SIZE(pt_regs_offset))) @@ -171,14 +179,7 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mas= k, ~((1ULL << PERF_REG_X86_MAX) - 1)) =20 #ifdef CONFIG_X86_32 -#define REG_NOSUPPORT ((1ULL << PERF_REG_X86_R8) | \ - (1ULL << PERF_REG_X86_R9) | \ - (1ULL << PERF_REG_X86_R10) | \ - (1ULL << PERF_REG_X86_R11) | \ - (1ULL << PERF_REG_X86_R12) | \ - (1ULL << PERF_REG_X86_R13) | \ - (1ULL << PERF_REG_X86_R14) | \ - (1ULL << PERF_REG_X86_R15)) +#define REG_NOSUPPORT GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R8) =20 int perf_reg_validate(u64 mask) { --=20 2.38.1 From nobody Wed Oct 8 14:18:30 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 13B4B2F9C35 for ; Fri, 15 Aug 2025 21:35:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293755; cv=none; b=SH8cgyIfOQVVsJKO1ivvdreqi8hDMfQMhtIDBWCy+DIt8iEPrwFtzSpFWL0JnIpm4cLb97SiEcCr1hJpYMGlf4l3b3VYl3F2U+zSFoFRsZVwEi43d7oo7apXtSzcCre3N5oLCm3IB4cSZF/OKJvK8yXGEzWZgM+gp9xnJ3JQAPM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293755; c=relaxed/simple; bh=zK00kmfRlCHwVjHg2vItvive4icCCJY9j1QesyHgFw4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=me/+qI2dMl/IQhCNgDQ1PloZofWm3bgA5DBRcNKggwmZoe5HaXP2LG7NcqK+llc8Y0CGUGBD7zRsZtWzUdAH22zhwqc00gn5ef7xuWDTGM3Wgwo3VkoD43XEry8TXF5pprhs9E6Et7cHKrvNvugbTo5RcQYyL94ZAIVAM9oP2/s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=LHs+FLxs; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="LHs+FLxs" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293754; x=1786829754; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zK00kmfRlCHwVjHg2vItvive4icCCJY9j1QesyHgFw4=; b=LHs+FLxs61lKJTF1Nf6YP6a7Jo+ZzXYMizAm/36OU6xCGluRbfdlarry gqF7sLYT1+Ch3allf5sXg2rN3OjcbxUoNN0t5qkWHkgerdDXGBjon3ztr gJ0akukXp7WWah5biKF2zPeYNPkyYntC5qx1MRbYSSpCv2EBcB49Fwiai DLrrjH+hIHLK2UmbL93TmjUvRrbvPPbb2KdiW5P/1/Fx9tE57NEmz25nr COAlncSSs3z3E5eN5ypvPMeew29TrnAxx4oxArU0TUU6qNw15KNfI37I1 GhNlsRlvaVvHviUkbuX/uQyvDEEZJJcEl/EUSSH166wCY6NQKR3xvGGYh A==; X-CSE-ConnectionGUID: p3CQqAaUSDKVyxTmzu5tww== X-CSE-MsgGUID: ExwD0ZgMSBSPPio87jeLtw== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707451" X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707451" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:47 -0700 X-CSE-ConnectionGUID: O5sb1IS4QQu6/APtYWUvQQ== X-CSE-MsgGUID: z8utoMZ8RMK+oNYMNWJ/Cg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319617" Received: from 
kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:48 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang Subject: [PATCH V3 12/17] perf/x86: Add SSP into sample_regs Date: Fri, 15 Aug 2025 14:34:30 -0700 Message-Id: <20250815213435.1702022-13-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com> References: <20250815213435.1702022-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang The SSP is only supported when the new SIMD registers configuration method is used, which moves the XMM to sample_simd_vec_regs. So the space can be reclaimed for the SSP. The SSP is retrieved by XSAVE. Only support the SSP for X86_64. 
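
For illustration only, the mask checks this change relies on can be sketched in plain C as below. This is a rough sketch and not the kernel code: every SKETCH_* name is hypothetical, the register indices are the ones implied by the enumerator order in this series (treat them as assumptions), and have_apx / have_cet stand in for the XFEATURE_MASK_APX and XFEATURE_MASK_CET_USER tests against x86_pmu.ext_regs_mask.

```c
#include <assert.h>
#include <stdint.h>

/* Local bit helpers; the kernel provides its own BIT_ULL/GENMASK_ULL. */
#define SKETCH_BIT_ULL(n)        (1ULL << (n))
#define SKETCH_GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/* Register indices implied by the enum order in the series (assumed). */
#define SKETCH_REG_R16   24   /* first eGPR, right after R15 (23) */
#define SKETCH_REG_R31   39   /* last eGPR */
#define SKETCH_REG_SSP   41   /* enumerator that follows PERF_REG_X86_64_MAX (40) */
#define SKETCH_REG_XMM0  32   /* legacy XMM0 index, two bits per register */

#define SKETCH_EGPRS_MASK  SKETCH_GENMASK_ULL(SKETCH_REG_R31, SKETCH_REG_R16)
#define SKETCH_LEGACY_XMM  (~(SKETCH_BIT_ULL(SKETCH_REG_XMM0) - 1))
#define SKETCH_RESERVED    (~SKETCH_GENMASK_ULL(SKETCH_REG_SSP, 0))

/*
 * Returns 0 when the sample_regs mask is acceptable under the SIMD
 * configuration method, -1 otherwise (the kernel returns -EINVAL).
 */
static int sketch_validate_simd_abi(uint64_t mask, int have_apx, int have_cet)
{
	/* Bits above the last miscellaneous register are reserved. */
	if (mask & SKETCH_RESERVED)
		return -1;
	/* eGPRs need the APX xfeature. */
	if ((mask & SKETCH_EGPRS_MASK) && !have_apx)
		return -1;
	/* SSP needs the CET user xfeature. */
	if ((mask & SKETCH_BIT_ULL(SKETCH_REG_SSP)) && !have_cet)
		return -1;
	return 0;
}

/* Nonzero when the SSP bit sits in the range the legacy XMM bits used. */
static int sketch_ssp_overlaps_legacy_xmm(void)
{
	return (SKETCH_BIT_ULL(SKETCH_REG_SSP) & SKETCH_LEGACY_XMM) ? 1 : 0;
}
```

The overlap helper shows why the SSP bit can only be offered once the XMM registers have moved out of sample_regs: under the legacy layout the same bit position would be read as part of an XMM register.
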
Signed-off-by: Kan Liang --- arch/x86/events/core.c | 14 +++++++++++++- arch/x86/include/asm/perf_event.h | 4 ++++ arch/x86/include/uapi/asm/perf_regs.h | 3 +++ arch/x86/kernel/perf_regs.c | 8 +++++++- 4 files changed, 27 insertions(+), 2 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index f816290defc1..b0c8b24975cb 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -434,6 +434,8 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *= perf_regs, u64 mask) perf_regs->opmask =3D get_xsave_addr(xsave, XFEATURE_OPMASK); if (valid_mask & XFEATURE_MASK_APX) perf_regs->egpr =3D get_xsave_addr(xsave, XFEATURE_APX); + if (valid_mask & XFEATURE_MASK_CET_USER) + perf_regs->cet =3D get_xsave_addr(xsave, XFEATURE_CET_USER); } =20 static void release_ext_regs_buffers(void) @@ -712,7 +714,7 @@ int x86_pmu_hw_config(struct perf_event *event) =20 if (event->attr.sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_U= SER)) { if (event->attr.sample_simd_regs_enabled) { - u64 reserved =3D ~GENMASK_ULL(PERF_REG_X86_64_MAX - 1, 0); + u64 reserved =3D ~GENMASK_ULL(PERF_REG_MISC_MAX - 1, 0); =20 if (!(event->pmu->capabilities & PERF_PMU_CAP_SIMD_REGS)) return -EINVAL; @@ -727,6 +729,11 @@ int x86_pmu_hw_config(struct perf_event *event) event->attr.sample_regs_intr & PERF_X86_EGPRS_MASK) && !(x86_pmu.ext_regs_mask & XFEATURE_MASK_APX)) return -EINVAL; + if ((event->attr.sample_regs_user & BIT_ULL(PERF_REG_X86_SSP) || + event->attr.sample_regs_intr & BIT_ULL(PERF_REG_X86_SSP)) && + !(x86_pmu.ext_regs_mask & XFEATURE_MASK_CET_USER)) + return -EINVAL; + } else { /* * Besides the general purpose registers, XMM registers may @@ -1904,6 +1911,11 @@ void x86_pmu_setup_regs_data(struct perf_event *even= t, perf_regs->egpr_regs =3D NULL; mask |=3D XFEATURE_MASK_APX; } + if (attr->sample_regs_user & BIT_ULL(PERF_REG_X86_SSP) || + attr->sample_regs_intr & BIT_ULL(PERF_REG_X86_SSP)) { + perf_regs->cet_regs =3D NULL; + mask |=3D 
XFEATURE_MASK_CET_USER; + } } =20 mask &=3D ~ignore_mask; diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 4400cb66bc8e..28ddff38d232 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -617,6 +617,10 @@ struct x86_perf_regs { u64 *egpr_regs; struct apx_state *egpr; }; + union { + u64 *cet_regs; + struct cet_user_state *cet; + }; }; =20 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs); diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/= asm/perf_regs.h index cd0f6804debf..4d88cb18acb9 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -48,6 +48,9 @@ enum perf_event_x86_regs { PERF_REG_X86_32_MAX =3D PERF_REG_X86_GS + 1, PERF_REG_X86_64_MAX =3D PERF_REG_X86_R31 + 1, =20 + PERF_REG_X86_SSP, + PERF_REG_MISC_MAX =3D PERF_REG_X86_SSP + 1, + /* * These all need two bits set because they are 128bit. * These are only available when !PERF_SAMPLE_REGS_ABI_SIMD diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index b6e50194ff3e..d579fa3223c0 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -92,6 +92,11 @@ u64 perf_reg_value(struct pt_regs *regs, int idx) return 0; return perf_regs->egpr_regs[idx - PERF_REG_X86_R16]; } + if (idx =3D=3D PERF_REG_X86_SSP) { + if (!perf_regs->cet_regs) + return 0; + return perf_regs->cet_regs[1]; + } } else { if (idx >=3D PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) { if (!perf_regs->xmm_regs) @@ -179,7 +184,8 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, ~((1ULL << PERF_REG_X86_MAX) - 1)) =20 #ifdef CONFIG_X86_32 -#define REG_NOSUPPORT GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R8) +#define REG_NOSUPPORT (GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R8) | \ + BIT_ULL(PERF_REG_X86_SSP)) =20 int perf_reg_validate(u64 mask) { --=20 2.38.1 From nobody Wed Oct 8 14:18:30 2025 Received: from mgamail.intel.com 
(mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D5C7B304BD2 for ; Fri, 15 Aug 2025 21:35:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293756; cv=none; b=kf6ot/bJZAr8Ok/oErzXCn1Oiq/ssWZ5ErtB+I6mgE7S9NToHqC2FnSaY0IIxd5msK3AaKhFo4Y3PHSlbYAfMLTgXxoToKhHciAqDiwtC0ZCywhbEqhgckTMfPOAcUlAqb9XPp1k4q6iVYV7WN4ZkOqAbszcygCfbDmozrk7vJc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293756; c=relaxed/simple; bh=WT+twm8Uf99k768qpKBOGZ1/ZhOFLPhGU/CEE7PFnPw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=u7xZFRHn5rAkHajhhni8vUKKFU8LVqhtVl2AQJN6oE/kORsV6bqlp68e/GKklIDRaczuOsZCekhBLgfa3kckg3huZpEjiyvyhLzld1wkQnM3fCnDVhWAfbw1QynU7F05r1tjCmOOdhOXQdE5MnkxIDBKtnGA2HJ1aMCtNg4L8LU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=NFRbLJIe; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="NFRbLJIe" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293754; x=1786829754; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=WT+twm8Uf99k768qpKBOGZ1/ZhOFLPhGU/CEE7PFnPw=; b=NFRbLJIeSoWFZ1PkQrGffxrb+FhR/B1gxKGdIKhCRdktaj+ILxYYl8t9 
XPzyCJ6djOAzHtiZNSYcV39uF7qMZlL3yvRhR9F4BZ0mU3MfCbokLlTEA otW/kapHHZl8cliGCNGxOcKu1bUOGFkOALceYfTKJnHJwsnCwD9ZvQDxG pRzodOppGz5mVmp3kY57Oo8jVGU1GTKMIGEBsYVOK5o9uZgstH/rEsLle aLJDroFdFC3Fh+gVZhozDgGgzy++1hVpgJyGjGZBhJFThnliB1IOi7C9m tYskdRQhb++KqOEd24JRIWKDckSU0q4WDTRWJcYyNjX/eLKhZ90HMkFQs g==; X-CSE-ConnectionGUID: /3To17/KSLiJyG0DqsCCTQ== X-CSE-MsgGUID: pWbkKZiBRQOV++LKnol6nw== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707459" X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707459" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:47 -0700 X-CSE-ConnectionGUID: lhaK5i9lQli2SWhlr0ynvA== X-CSE-MsgGUID: +GXMbuktRMqMnyURUgzQLw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319621" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:48 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang Subject: [PATCH V3 13/17] perf/x86/intel: Enable PERF_PMU_CAP_SIMD_REGS Date: Fri, 15 Aug 2025 14:34:31 -0700 Message-Id: <20250815213435.1702022-14-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com> References: <20250815213435.1702022-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang 
Enable PERF_PMU_CAP_SIMD_REGS if there is XSAVES support for YMM, ZMM, OPMASK, eGPRs, or SSP. Disable large PEBS for these registers since PEBS HW doesn't support them yet. Signed-off-by: Kan Liang --- arch/x86/events/intel/core.c | 46 ++++++++++++++++++++++++++++++++++-- 1 file changed, 44 insertions(+), 2 deletions(-) diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index bd16f91dea1c..c09176400377 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -4033,8 +4033,30 @@ static unsigned long intel_pmu_large_pebs_flags(stru= ct perf_event *event) flags &=3D ~PERF_SAMPLE_TIME; if (!event->attr.exclude_kernel) flags &=3D ~PERF_SAMPLE_REGS_USER; - if (event->attr.sample_regs_user & ~PEBS_GP_REGS) - flags &=3D ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR); + if (event->attr.sample_simd_regs_enabled) { + u64 nolarge =3D PERF_X86_EGPRS_MASK | BIT_ULL(PERF_REG_X86_SSP); + + /* + * PEBS HW can only collect the XMM0-XMM15 for now. + * Disable large PEBS for other vector registers, predicate + * registers, eGPRs, and SSP. 
+ */ + if (event->attr.sample_regs_user & nolarge || + fls64(event->attr.sample_simd_vec_reg_user) > PERF_X86_H16ZMM_BASE || + event->attr.sample_simd_pred_reg_user) + flags &=3D ~PERF_SAMPLE_REGS_USER; + + if (event->attr.sample_regs_intr & nolarge || + fls64(event->attr.sample_simd_vec_reg_intr) > PERF_X86_H16ZMM_BASE || + event->attr.sample_simd_pred_reg_intr) + flags &=3D ~PERF_SAMPLE_REGS_INTR; + + if (event->attr.sample_simd_vec_reg_qwords > PERF_X86_XMM_QWORDS) + flags &=3D ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR); + } else { + if (event->attr.sample_regs_user & ~PEBS_GP_REGS) + flags &=3D ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR); + } return flags; } =20 @@ -5295,6 +5317,26 @@ static void intel_extended_regs_init(struct pmu *pmu) =20 x86_pmu.ext_regs_mask |=3D XFEATURE_MASK_SSE; x86_get_pmu(smp_processor_id())->capabilities |=3D PERF_PMU_CAP_EXTENDED_= REGS; + + if (boot_cpu_has(X86_FEATURE_AVX) && + cpu_has_xfeatures(XFEATURE_MASK_YMM, NULL)) + x86_pmu.ext_regs_mask |=3D XFEATURE_MASK_YMM; + if (boot_cpu_has(X86_FEATURE_APX) && + cpu_has_xfeatures(XFEATURE_MASK_APX, NULL)) + x86_pmu.ext_regs_mask |=3D XFEATURE_MASK_APX; + if (boot_cpu_has(X86_FEATURE_AVX512F)) { + if (cpu_has_xfeatures(XFEATURE_MASK_OPMASK, NULL)) + x86_pmu.ext_regs_mask |=3D XFEATURE_MASK_OPMASK; + if (cpu_has_xfeatures(XFEATURE_MASK_ZMM_Hi256, NULL)) + x86_pmu.ext_regs_mask |=3D XFEATURE_MASK_ZMM_Hi256; + if (cpu_has_xfeatures(XFEATURE_MASK_Hi16_ZMM, NULL)) + x86_pmu.ext_regs_mask |=3D XFEATURE_MASK_Hi16_ZMM; + } + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + x86_pmu.ext_regs_mask |=3D XFEATURE_MASK_CET_USER; + + if (x86_pmu.ext_regs_mask !=3D XFEATURE_MASK_SSE) + x86_get_pmu(smp_processor_id())->capabilities |=3D PERF_PMU_CAP_SIMD_REG= S; } =20 static void update_pmu_cap(struct pmu *pmu) --=20 2.38.1 From nobody Wed Oct 8 14:18:30 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 
bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 46E793090F5 for ; Fri, 15 Aug 2025 21:35:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293756; cv=none; b=BmCNHRm8M+Sumw40vi7S+muran2s5lJXMwxrlOFmo44QaqZPzSBeSlgx/ony0j0K4+IOGy5tWjFSt/i/7DRS0IdmDJkEsiL2Uu1YsTK7LphWCio9sPThYoanzjW92powz9kB7Doq9e+zCszrES6SCogL3PcqMc7M8gzwIFgGL5E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293756; c=relaxed/simple; bh=LhzVdwsZBUx/g6yYRHBqACDVKMw4oc5Q0hwku9mvBbI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=FtpfqofXJfHuhtsavVTiAnpxa93QI9am7Yf713yXaN55k6vbxdTK2MzVpuim1tNh2tLrmNeuIH3SNeOwgy3+a/5MGVuAjpGWfEzWZgiYL9o85v98WPd4Gei2gu2bi22/PAOK6ev9+fSU1+eThtjQkTqZCLn2YGjqkIrhV+rBrBY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=NJnxo00t; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="NJnxo00t" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293755; x=1786829755; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LhzVdwsZBUx/g6yYRHBqACDVKMw4oc5Q0hwku9mvBbI=; b=NJnxo00tn9jyZ2Y+aEAgc/GRwfFXBBS1XGNqak5LbP8c16/MtONPJxkV CKH663EL7AtFjKJD9ZgT5nmIXUrd6l35YTfdXEO8DXvcUS9WHEBq1A2xY o+RF6Qn8iRzq8/7nMbK5cW3BZIvkHSr3zvdhXlyRkGIoZpjS8i8Z5E26Z 
vYnX+1qJ/v82qQS3eR52peyFfy4JnhDI4jkX+CpmVXe5w8N9MRc7Jtkd8 ozMJ3912n/CklpxMu5fMcLsl0HdyIPY/iQEb+GoOXIuGvvAPdqSkob2i7 iETxYPxqsFDrxJEpVZ5ZttIL+yp/ZnyWjLvvLFzeW//VMoyIJJI4EMkzl g==; X-CSE-ConnectionGUID: 1fAi0ZdOT7qAqqhPJXH07A== X-CSE-MsgGUID: goPDCqSTQBuJcrglQx9BiA== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707467" X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707467" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:47 -0700 X-CSE-ConnectionGUID: i86zaBbSTBqHF6mU9C0KsQ== X-CSE-MsgGUID: bFNnfVn7S4KUuaTFbPwqRw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319624" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:48 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang Subject: [POC PATCH 14/17] perf/x86/regs: Only support legacy regs for the PT and PERF_REGS_MASK for now Date: Fri, 15 Aug 2025 14:34:32 -0700 Message-Id: <20250815213435.1702022-15-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com> References: <20250815213435.1702022-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang The PERF_REG_X86_64_MAX is going to be updated to support more regs, e.g., eGPRs. 
However, the Intel PT code and PERF_REGS_MASK are not converted in this POC. Use PERF_REG_X86_R15 + 1 in place of PERF_REG_X86_64_MAX there, so that both keep covering only the legacy registers. Signed-off-by: Kan Liang Acked-by: Adrian Hunter --- tools/perf/arch/x86/include/perf_regs.h | 2 +- tools/perf/util/intel-pt.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/perf/arch/x86/include/perf_regs.h b/tools/perf/arch/x86/= include/perf_regs.h index f209ce2c1dd9..793fb597b03f 100644 --- a/tools/perf/arch/x86/include/perf_regs.h +++ b/tools/perf/arch/x86/include/perf_regs.h @@ -17,7 +17,7 @@ void perf_regs_load(u64 *regs); (1ULL << PERF_REG_X86_ES) | \ (1ULL << PERF_REG_X86_FS) | \ (1ULL << PERF_REG_X86_GS)) -#define PERF_REGS_MASK (((1ULL << PERF_REG_X86_64_MAX) - 1) & ~REG_NOSUPPO= RT) +#define PERF_REGS_MASK (((1ULL << (PERF_REG_X86_R15 + 1)) - 1) & ~REG_NOSU= PPORT) #define PERF_SAMPLE_REGS_ABI PERF_SAMPLE_REGS_ABI_64 #endif =20 diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index 9b1011fe4826..a9585524f2e1 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -2181,7 +2181,7 @@ static u64 *intel_pt_add_gp_regs(struct regs_dump *in= tr_regs, u64 *pos, u32 bit; int i; =20 - for (i =3D 0, bit =3D 1; i < PERF_REG_X86_64_MAX; i++, bit <<=3D 1) { + for (i =3D 0, bit =3D 1; i < PERF_REG_X86_R15 + 1; i++, bit <<=3D 1) { /* Get the PEBS gp_regs array index */ int n =3D pebs_gp_regs[i] - 1; =20 --=20 2.38.1 From nobody Wed Oct 8 14:18:30 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C302430DEB3 for ; Fri, 15 Aug 2025 21:35:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293757; cv=none; 
b=tW0jLFq+8cYBkyoEVD7qtEwvztzugwLqPZEDtNAnx71Bw65lPQmXDkbW8MSoWvHwIRZh/bQF98vTEwAwxp9xGcet5hsHdXOqGtv/fRLPBrR+z4JfJfMIIcmrgyI7QT3mrIY1IfO7SAD+zzoqqkv1AGHmfePMRQXh+/iQs9TKNtc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293757; c=relaxed/simple; bh=LEL8MGhUoPZ/fqndiEjBZP5i9VDoC6vzBAO0lfEvRt8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=nfk/+vd9q0uPSlZso/Dm3l3CPGDa/TU+ZfU8UbVUpfaAJ5NqluWVHemwDh5DXMCfuOKQBLS2ltE+NE/KaGcS0NcorHkWqIq7C6VnGsdufTS8XGE/DOkVp216PfFf/pMC3XtymjmEZQ51d+Eekq3oiGnqcyyqYCLRPCJQqwg0Zvk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=GkwQPbkA; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="GkwQPbkA" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293755; x=1786829755; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LEL8MGhUoPZ/fqndiEjBZP5i9VDoC6vzBAO0lfEvRt8=; b=GkwQPbkAUCilqPrRwWGkgMjrccybmeibUwQ9qlSq1jtIrQfb3QG4nb5F koZAJSGaYVVt47PLNlKQKyR8T4k0GPvDJfTS2bXUZmLnsjZNemYreVAnk XGKgmMNv1m0ANpGOJoU1aEVpxw2VFBK52x1FOlmp22o0r/DSh3MhnPRlv h2b8kc7CeEVHC9j5EZ0QCXsvYpvv/2UWLaT11sxFJ3jmLkQoEiwethXoN NonhKOWbJTbXspYkVevDVKNhKCSwZDU+RMuRTaYKnP9JxpvsLjUyQdim1 dJN/kqzjZ7LNJGBIHWz/VlLj49R8nSLhS16rt9WI8O4SqS9ZWWetoFHFg A==; X-CSE-ConnectionGUID: 4wz9L0QbTcGuVVhYRY8Z3A== X-CSE-MsgGUID: s/dRsVy5QSOteTAOlwrW9Q== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707475" X-IronPort-AV: 
E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707475" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:48 -0700 X-CSE-ConnectionGUID: bHxZXO0TSl2mt5U+bz31TA== X-CSE-MsgGUID: KALjkDNCQ5+xW61iZabpwg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319627" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:48 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang Subject: [POC PATCH 15/17] tools headers: Sync with the kernel sources Date: Fri, 15 Aug 2025 14:34:33 -0700 Message-Id: <20250815213435.1702022-16-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com> References: <20250815213435.1702022-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang Update include/uapi/linux/perf_event.h and arch/x86/include/uapi/asm/perf_regs.h to support extended regs. 
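
As a hedged illustration of how a tool might consume the synced definitions, the sketch below decodes a register-dump ABI value and converts a qword count into a byte width. The SKETCH_* names are hypothetical copies of the header values; treating the ABI value as a bit field is an assumption based on the `abi & PERF_SAMPLE_REGS_ABI_SIMD` test in perf_reg_value(), and nothing here is taken from the perf tool itself.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the synced headers (names are local to this sketch). */
#define SKETCH_ABI_32    0x01
#define SKETCH_ABI_64    0x02
#define SKETCH_ABI_SIMD  0x04

#define SKETCH_XMM_QWORDS 2
#define SKETCH_YMM_QWORDS 4
#define SKETCH_ZMM_QWORDS 8

/* Nonzero when the dump was produced with the SIMD register layout. */
static int sketch_abi_is_simd(uint64_t abi)
{
	return (abi & SKETCH_ABI_SIMD) ? 1 : 0;
}

/* Strip the SIMD flag to recover the 32-bit or 64-bit base ABI. */
static uint64_t sketch_abi_base(uint64_t abi)
{
	return abi & ~(uint64_t)SKETCH_ABI_SIMD;
}

/* Width in bytes of one vector register for a given qword count. */
static unsigned int sketch_vec_bytes(unsigned int qwords)
{
	return qwords * 8u;
}
```

With these helpers, a 64-bit dump that carries the SIMD flag reports the SIMD layout while still resolving to the 64-bit base ABI, and the XMM, YMM and ZMM qword counts map to 16, 32 and 64 bytes.
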
Signed-off-by: Kan Liang --- tools/arch/x86/include/uapi/asm/perf_regs.h | 44 ++++++++++++++++++- tools/include/uapi/linux/perf_event.h | 47 ++++++++++++++++++--- 2 files changed, 84 insertions(+), 7 deletions(-) diff --git a/tools/arch/x86/include/uapi/asm/perf_regs.h b/tools/arch/x86/i= nclude/uapi/asm/perf_regs.h index 7c9d2bb3833b..4d88cb18acb9 100644 --- a/tools/arch/x86/include/uapi/asm/perf_regs.h +++ b/tools/arch/x86/include/uapi/asm/perf_regs.h @@ -27,11 +27,34 @@ enum perf_event_x86_regs { PERF_REG_X86_R13, PERF_REG_X86_R14, PERF_REG_X86_R15, + /* Extended GPRs (EGPRs) */ + PERF_REG_X86_R16, + PERF_REG_X86_R17, + PERF_REG_X86_R18, + PERF_REG_X86_R19, + PERF_REG_X86_R20, + PERF_REG_X86_R21, + PERF_REG_X86_R22, + PERF_REG_X86_R23, + PERF_REG_X86_R24, + PERF_REG_X86_R25, + PERF_REG_X86_R26, + PERF_REG_X86_R27, + PERF_REG_X86_R28, + PERF_REG_X86_R29, + PERF_REG_X86_R30, + PERF_REG_X86_R31, /* These are the limits for the GPRs. */ PERF_REG_X86_32_MAX =3D PERF_REG_X86_GS + 1, - PERF_REG_X86_64_MAX =3D PERF_REG_X86_R15 + 1, + PERF_REG_X86_64_MAX =3D PERF_REG_X86_R31 + 1, =20 - /* These all need two bits set because they are 128bit */ + PERF_REG_X86_SSP, + PERF_REG_MISC_MAX =3D PERF_REG_X86_SSP + 1, + + /* + * These all need two bits set because they are 128bit. 
+ * These are only available when !PERF_SAMPLE_REGS_ABI_SIMD + */ PERF_REG_X86_XMM0 =3D 32, PERF_REG_X86_XMM1 =3D 34, PERF_REG_X86_XMM2 =3D 36, @@ -55,4 +78,21 @@ enum perf_event_x86_regs { =20 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) =20 +#define PERF_X86_EGPRS_MASK GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R1= 6) + +#define PERF_X86_SIMD_PRED_REGS_MAX 8 +#define PERF_X86_SIMD_PRED_MASK GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, = 0) +#define PERF_X86_SIMD_VEC_REGS_MAX 32 +#define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1= , 0) + +#define PERF_X86_H16ZMM_BASE 16 + +#define PERF_X86_OPMASK_QWORDS 1 +#define PERF_X86_XMM_QWORDS 2 +#define PERF_X86_YMM_QWORDS 4 +#define PERF_X86_YMMH_QWORDS (PERF_X86_YMM_QWORDS / 2) +#define PERF_X86_ZMM_QWORDS 8 +#define PERF_X86_ZMMH_QWORDS (PERF_X86_ZMM_QWORDS / 2) +#define PERF_X86_SIMD_QWORDS_MAX PERF_X86_ZMM_QWORDS + #endif /* _ASM_X86_PERF_REGS_H */ diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/lin= ux/perf_event.h index 78a362b80027..2e9b16acbed6 100644 --- a/tools/include/uapi/linux/perf_event.h +++ b/tools/include/uapi/linux/perf_event.h @@ -313,9 +313,10 @@ enum { * Values to determine ABI of the registers dump. 
*/ enum perf_sample_regs_abi { - PERF_SAMPLE_REGS_ABI_NONE =3D 0, - PERF_SAMPLE_REGS_ABI_32 =3D 1, - PERF_SAMPLE_REGS_ABI_64 =3D 2, + PERF_SAMPLE_REGS_ABI_NONE =3D 0x00, + PERF_SAMPLE_REGS_ABI_32 =3D 0x01, + PERF_SAMPLE_REGS_ABI_64 =3D 0x02, + PERF_SAMPLE_REGS_ABI_SIMD =3D 0x04, }; =20 /* @@ -382,6 +383,7 @@ enum perf_event_read_format { #define PERF_ATTR_SIZE_VER6 120 /* Add: aux_sample_size */ #define PERF_ATTR_SIZE_VER7 128 /* Add: sig_data */ #define PERF_ATTR_SIZE_VER8 136 /* Add: config3 */ +#define PERF_ATTR_SIZE_VER9 168 /* Add: sample_simd_{pred,vec}_reg_* */ =20 /* * 'struct perf_event_attr' contains various attributes that define @@ -543,6 +545,25 @@ struct perf_event_attr { __u64 sig_data; =20 __u64 config3; /* extension of config2 */ + + + /* + * Defines set of SIMD registers to dump on samples. + * The sample_simd_regs_enabled !=3D0 implies the + * set of SIMD registers is used to config all SIMD registers. + * If !sample_simd_regs_enabled, sample_regs_XXX may be used to + * config some SIMD registers on X86. 
+ */ + union { + __u16 sample_simd_regs_enabled; + __u16 sample_simd_pred_reg_qwords; + }; + __u32 sample_simd_pred_reg_intr; + __u32 sample_simd_pred_reg_user; + __u16 sample_simd_vec_reg_qwords; + __u64 sample_simd_vec_reg_intr; + __u64 sample_simd_vec_reg_user; + __u32 __reserved_4; }; =20 /* @@ -1016,7 +1037,15 @@ enum perf_event_type { * } && PERF_SAMPLE_BRANCH_STACK * * { u64 abi; # enum perf_sample_regs_abi - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER + * u64 regs[weight(mask)]; + * struct { + * u16 nr_vectors; + * u16 vector_qwords; + * u16 nr_pred; + * u16 pred_qwords; + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords]; + * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD) + * } && PERF_SAMPLE_REGS_USER * * { u64 size; * char data[size]; @@ -1043,7 +1072,15 @@ enum perf_event_type { * { u64 data_src; } && PERF_SAMPLE_DATA_SRC * { u64 transaction; } && PERF_SAMPLE_TRANSACTION * { u64 abi; # enum perf_sample_regs_abi - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR + * u64 regs[weight(mask)]; + * struct { + * u16 nr_vectors; + * u16 vector_qwords; + * u16 nr_pred; + * u16 pred_qwords; + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords]; + * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD) + * } && PERF_SAMPLE_REGS_INTR * { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR * { u64 cgroup;} && PERF_SAMPLE_CGROUP * { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE --=20 2.38.1 From nobody Wed Oct 8 14:18:30 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AA9FA30DEDA for ; Fri, 15 Aug 2025 21:35:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293759; cv=none; 
b=qzMl+VgUq9j1qZa6UgFldfNVh/pbkkycEtooMVoSvXtIeYFcbWi7Pa9ItSheQz6G1Aj7dOdQJgaqb26MCt6WEaZzzos9q8IMgJjt2YwLfiy4/g8XnwAXx9JH/yYSnQYTHDVHl25eJpR6UGBiKshuGfRM3hIAmIN8tahMrkeCYDk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293759; c=relaxed/simple; bh=/3YdOgnYd6GHMZ2BququByk6yNTsCgpGYwQ2+816BFs=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Tg78tFP7BotweoefoYNL+VPORmshLLdpJ4Uttr2z/pJgzcuur/nIzylT5sKEhEKj2InwiZPcfk+a1LBewZROAKUzYeIv70dA4hP9vSodSX98pZrasD5l6cT8pIVSScAwNcFQzQBwHzB2aYVbqAsGHWVACn8O04x0xJjg/GCyqqM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=efWb0df2; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="efWb0df2" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293756; x=1786829756; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=/3YdOgnYd6GHMZ2BququByk6yNTsCgpGYwQ2+816BFs=; b=efWb0df2Oprgw4a5UYMY7aaWHONXxIi9UM1W1Jno7uMSgwwhFpPxW7I6 Xd58bQ9dR1VxIJTrraNO1PfV/IMX7KWfJ3eNxUkAwApjlTy4I+T6gCz6H yiJr2pQgenDTWI2/fLdqqwb56YnPyhM21+UT/QYTo003EP2B6pMhVpaS9 5aTs5jxklEpEpmdjq/vILLNK9HSM9dLizuzE9RMqDxutbDMK0JazS4V1p V8xpk8xj41o1I73kT2rbnsIPYvuExfh9EIQo/KRE7mCH9jzKjrbVjH52L S+9E7EZ5aFw+B3QR9Li694FLwUY1wb9N6QhVd7rxnilwdkwNRjCNB2sI0 g==; X-CSE-ConnectionGUID: 3AFS5RWsRXa0m9ZZlaAajA== X-CSE-MsgGUID: mZGkSqKASLKNBf4uvjyvSw== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707483" X-IronPort-AV: 
E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707483" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:48 -0700 X-CSE-ConnectionGUID: Gv298BkmRpOKOksqR8t+2g== X-CSE-MsgGUID: dFBDLjlVTbmBh9F+9y/BDw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319630" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:48 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang Subject: [POC PATCH 16/17] perf parse-regs: Support the new SIMD format Date: Fri, 15 Aug 2025 14:34:34 -0700 Message-Id: <20250815213435.1702022-17-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com> References: <20250815213435.1702022-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang Add has_cap_simd_regs() to check whether the new SIMD format is available. If it is, get the supported mask and qwords. Add several __weak functions that return the qwords and mask for the vector and pred registers. The vector and pred registers can only be collected as a whole, and only the superset is collected. For example, with -I XMM,YMM, only all 16 YMMs are collected. Examples: $perf record -I?
available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 SSP XMM0-31 YMM0-31 ZMM0-31 OPMASK0-7 $perf record --user-regs=3D? available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 SSP XMM0-31 YMM0-31 ZMM0-31 OPMASK0-7 Signed-off-by: Kan Liang --- tools/perf/arch/x86/util/perf_regs.c | 257 +++++++++++++++++++++- tools/perf/util/evsel.c | 25 +++ tools/perf/util/parse-regs-options.c | 60 ++++- tools/perf/util/perf_event_attr_fprintf.c | 6 + tools/perf/util/perf_regs.c | 29 +++ tools/perf/util/perf_regs.h | 13 +- tools/perf/util/record.h | 6 + 7 files changed, 381 insertions(+), 15 deletions(-) diff --git a/tools/perf/arch/x86/util/perf_regs.c b/tools/perf/arch/x86/uti= l/perf_regs.c index 12fd93f04802..78027df1af9a 100644 --- a/tools/perf/arch/x86/util/perf_regs.c +++ b/tools/perf/arch/x86/util/perf_regs.c @@ -13,6 +13,49 @@ #include "../../../util/pmu.h" #include "../../../util/pmus.h" =20 +static const struct sample_reg sample_reg_masks_ext[] =3D { + SMPL_REG(AX, PERF_REG_X86_AX), + SMPL_REG(BX, PERF_REG_X86_BX), + SMPL_REG(CX, PERF_REG_X86_CX), + SMPL_REG(DX, PERF_REG_X86_DX), + SMPL_REG(SI, PERF_REG_X86_SI), + SMPL_REG(DI, PERF_REG_X86_DI), + SMPL_REG(BP, PERF_REG_X86_BP), + SMPL_REG(SP, PERF_REG_X86_SP), + SMPL_REG(IP, PERF_REG_X86_IP), + SMPL_REG(FLAGS, PERF_REG_X86_FLAGS), + SMPL_REG(CS, PERF_REG_X86_CS), + SMPL_REG(SS, PERF_REG_X86_SS), +#ifdef HAVE_ARCH_X86_64_SUPPORT + SMPL_REG(R8, PERF_REG_X86_R8), + SMPL_REG(R9, PERF_REG_X86_R9), + SMPL_REG(R10, PERF_REG_X86_R10), + SMPL_REG(R11, PERF_REG_X86_R11), + SMPL_REG(R12, PERF_REG_X86_R12), + SMPL_REG(R13, PERF_REG_X86_R13), + SMPL_REG(R14, PERF_REG_X86_R14), + SMPL_REG(R15, PERF_REG_X86_R15), + SMPL_REG(R16, PERF_REG_X86_R16), + SMPL_REG(R17, PERF_REG_X86_R17), + SMPL_REG(R18, PERF_REG_X86_R18), + SMPL_REG(R19, PERF_REG_X86_R19), + SMPL_REG(R20, PERF_REG_X86_R20), + SMPL_REG(R21, PERF_REG_X86_R21), + SMPL_REG(R22, PERF_REG_X86_R22), + 
SMPL_REG(R23, PERF_REG_X86_R23), + SMPL_REG(R24, PERF_REG_X86_R24), + SMPL_REG(R25, PERF_REG_X86_R25), + SMPL_REG(R26, PERF_REG_X86_R26), + SMPL_REG(R27, PERF_REG_X86_R27), + SMPL_REG(R28, PERF_REG_X86_R28), + SMPL_REG(R29, PERF_REG_X86_R29), + SMPL_REG(R30, PERF_REG_X86_R30), + SMPL_REG(R31, PERF_REG_X86_R31), + SMPL_REG(SSP, PERF_REG_X86_SSP), +#endif + SMPL_REG_END +}; + static const struct sample_reg sample_reg_masks[] =3D { SMPL_REG(AX, PERF_REG_X86_AX), SMPL_REG(BX, PERF_REG_X86_BX), @@ -276,27 +319,159 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_o= p) return SDT_ARG_VALID; } =20 +static bool support_simd_reg(u64 sample_type, u16 qwords, u64 mask, bool p= red) +{ + struct perf_event_attr attr =3D { + .type =3D PERF_TYPE_HARDWARE, + .config =3D PERF_COUNT_HW_CPU_CYCLES, + .sample_type =3D sample_type, + .disabled =3D 1, + .exclude_kernel =3D 1, + .sample_simd_regs_enabled =3D 1, + }; + int fd; + + attr.sample_period =3D 1; + + if (!pred) { + attr.sample_simd_vec_reg_qwords =3D qwords; + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) + attr.sample_simd_vec_reg_intr =3D mask; + else + attr.sample_simd_vec_reg_user =3D mask; + } else { + attr.sample_simd_pred_reg_qwords =3D PERF_X86_OPMASK_QWORDS; + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) + attr.sample_simd_pred_reg_intr =3D PERF_X86_SIMD_PRED_MASK; + else + attr.sample_simd_pred_reg_user =3D PERF_X86_SIMD_PRED_MASK; + } + + if (perf_pmus__num_core_pmus() > 1) { + struct perf_pmu *pmu =3D NULL; + __u64 type =3D PERF_TYPE_RAW; + + /* + * The same register set is supported among different hybrid PMUs. + * Only check the first available one. 
+ */ + while ((pmu =3D perf_pmus__scan_core(pmu)) !=3D NULL) { + type =3D pmu->type; + break; + } + attr.config |=3D type << PERF_PMU_TYPE_SHIFT; + } + + event_attr_init(&attr); + + fd =3D sys_perf_event_open(&attr, 0, -1, -1, 0); + if (fd !=3D -1) { + close(fd); + return true; + } + + return false; +} + +static uint64_t intr_simd_mask, user_simd_mask, pred_mask; +static u16 intr_simd_qwords, user_simd_qwords, pred_qwords; + +static bool get_simd_reg_mask(u64 sample_type) +{ + u64 mask =3D GENMASK_ULL(PERF_X86_H16ZMM_BASE - 1, 0); + u16 qwords =3D PERF_X86_ZMM_QWORDS; + + if (support_simd_reg(sample_type, qwords, mask, false)) { + if (support_simd_reg(sample_type, qwords, PERF_X86_SIMD_VEC_MASK, false)) + mask =3D PERF_X86_SIMD_VEC_MASK; + } else { + qwords =3D PERF_X86_YMM_QWORDS; + if (!support_simd_reg(sample_type, qwords, mask, false)) { + qwords =3D PERF_X86_XMM_QWORDS; + if (!support_simd_reg(sample_type, qwords, mask, false)) { + qwords =3D 0; + mask =3D 0; + } + } + } + + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) { + intr_simd_mask =3D mask; + intr_simd_qwords =3D qwords; + } else { + user_simd_mask =3D mask; + user_simd_qwords =3D qwords; + } + + if (support_simd_reg(sample_type, qwords, mask, true)) { + pred_mask =3D PERF_X86_SIMD_PRED_MASK; + pred_qwords =3D PERF_X86_OPMASK_QWORDS; + } + + return true; +} + +static bool has_cap_simd_regs(void) +{ + static bool has_cap_simd_regs; + static bool cached; + + if (cached) + return has_cap_simd_regs; + + cached =3D true; + has_cap_simd_regs =3D get_simd_reg_mask(PERF_SAMPLE_REGS_INTR); + has_cap_simd_regs |=3D get_simd_reg_mask(PERF_SAMPLE_REGS_USER); + + return has_cap_simd_regs; +} + const struct sample_reg *arch__sample_reg_masks(void) { + if (has_cap_simd_regs()) + return sample_reg_masks_ext; return sample_reg_masks; } =20 -uint64_t arch__intr_reg_mask(void) +static const struct sample_reg sample_simd_reg_masks_empty[] =3D { + SMPL_REG_END +}; + +static const struct sample_reg sample_simd_reg_masks[] 
=3D { + SMPL_REG(XMM, 1), + SMPL_REG(YMM, 2), + SMPL_REG(ZMM, 3), + SMPL_REG(OPMASK, 32), + SMPL_REG_END +}; + +const struct sample_reg *arch__sample_simd_reg_masks(void) +{ + if (has_cap_simd_regs()) + return sample_simd_reg_masks; + return sample_simd_reg_masks_empty; +} + +static uint64_t __arch__reg_mask(u64 sample_type, u64 mask, bool has_simd_= regs) { struct perf_event_attr attr =3D { - .type =3D PERF_TYPE_HARDWARE, - .config =3D PERF_COUNT_HW_CPU_CYCLES, - .sample_type =3D PERF_SAMPLE_REGS_INTR, - .sample_regs_intr =3D PERF_REG_EXTENDED_MASK, - .precise_ip =3D 1, - .disabled =3D 1, - .exclude_kernel =3D 1, + .type =3D PERF_TYPE_HARDWARE, + .config =3D PERF_COUNT_HW_CPU_CYCLES, + .sample_type =3D sample_type, + .precise_ip =3D 1, + .disabled =3D 1, + .exclude_kernel =3D 1, + .sample_simd_regs_enabled =3D has_simd_regs, }; int fd; /* * In an unnamed union, init it here to build on older gcc versions */ attr.sample_period =3D 1; + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) + attr.sample_regs_intr =3D mask; + else + attr.sample_regs_user =3D mask; =20 if (perf_pmus__num_core_pmus() > 1) { struct perf_pmu *pmu =3D NULL; @@ -318,13 +493,73 @@ uint64_t arch__intr_reg_mask(void) fd =3D sys_perf_event_open(&attr, 0, -1, -1, 0); if (fd !=3D -1) { close(fd); - return (PERF_REG_EXTENDED_MASK | PERF_REGS_MASK); + return mask; } =20 - return PERF_REGS_MASK; + return 0; +} + +uint64_t arch__intr_reg_mask(void) +{ + uint64_t mask =3D PERF_REGS_MASK; + + if (has_cap_simd_regs()) { + mask |=3D __arch__reg_mask(PERF_SAMPLE_REGS_INTR, + GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16), + true); + mask |=3D __arch__reg_mask(PERF_SAMPLE_REGS_INTR, + BIT_ULL(PERF_REG_X86_SSP), + true); + } else + mask |=3D __arch__reg_mask(PERF_SAMPLE_REGS_INTR, PERF_REG_EXTENDED_MASK= , false); + + return mask; } =20 uint64_t arch__user_reg_mask(void) { - return PERF_REGS_MASK; + uint64_t mask =3D PERF_REGS_MASK; + + if (has_cap_simd_regs()) { + mask |=3D 
__arch__reg_mask(PERF_SAMPLE_REGS_USER, + GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16), + true); + mask |=3D __arch__reg_mask(PERF_SAMPLE_REGS_USER, + BIT_ULL(PERF_REG_X86_SSP), + true); + } + + return mask; +} + +uint64_t arch__intr_simd_reg_mask(u16 *qwords) +{ + if (!has_cap_simd_regs()) + return 0; + *qwords =3D intr_simd_qwords; + return intr_simd_mask; +} + +uint64_t arch__user_simd_reg_mask(u16 *qwords) +{ + if (!has_cap_simd_regs()) + return 0; + *qwords =3D user_simd_qwords; + return user_simd_mask; +} + +uint64_t arch__intr_pred_reg_mask(u16 *qwords) +{ + if (!has_cap_simd_regs()) + return 0; + *qwords =3D pred_qwords; + return pred_mask; +} + +uint64_t arch__user_pred_reg_mask(u16 *qwords) +{ + if (!has_cap_simd_regs()) + return 0; + *qwords =3D pred_qwords; + return pred_mask; } diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index d55482f094bf..af6e1c843fc5 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1402,12 +1402,37 @@ void evsel__config(struct evsel *evsel, struct reco= rd_opts *opts, evsel__set_sample_bit(evsel, REGS_INTR); } =20 + if ((opts->sample_intr_vec_regs || opts->sample_intr_pred_regs) && + !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) { + /* The pred qwords is to implies the set of SIMD registers is used */ + if (opts->sample_pred_regs_qwords) + attr->sample_simd_pred_reg_qwords =3D opts->sample_pred_regs_qwords; + else + attr->sample_simd_pred_reg_qwords =3D 1; + attr->sample_simd_vec_reg_intr =3D opts->sample_intr_vec_regs; + attr->sample_simd_vec_reg_qwords =3D opts->sample_vec_regs_qwords; + attr->sample_simd_pred_reg_intr =3D opts->sample_intr_pred_regs; + evsel__set_sample_bit(evsel, REGS_INTR); + } + if (opts->sample_user_regs && !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) { attr->sample_regs_user |=3D opts->sample_user_regs; evsel__set_sample_bit(evsel, REGS_USER); } =20 + if ((opts->sample_user_vec_regs || opts->sample_user_pred_regs) && + !evsel->no_aux_samples 
&& !evsel__is_dummy_event(evsel)) { + if (opts->sample_pred_regs_qwords) + attr->sample_simd_pred_reg_qwords =3D opts->sample_pred_regs_qwords; + else + attr->sample_simd_pred_reg_qwords =3D 1; + attr->sample_simd_vec_reg_user =3D opts->sample_user_vec_regs; + attr->sample_simd_vec_reg_qwords =3D opts->sample_vec_regs_qwords; + attr->sample_simd_pred_reg_user =3D opts->sample_user_pred_regs; + evsel__set_sample_bit(evsel, REGS_USER); + } + if (target__has_cpu(&opts->target) || opts->sample_cpu) evsel__set_sample_bit(evsel, CPU); =20 diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-r= egs-options.c index cda1c620968e..27266038352f 100644 --- a/tools/perf/util/parse-regs-options.c +++ b/tools/perf/util/parse-regs-options.c @@ -4,20 +4,26 @@ #include #include #include +#include #include "util/debug.h" #include #include "util/perf_regs.h" #include "util/parse-regs-options.h" +#include "record.h" =20 static int __parse_regs(const struct option *opt, const char *str, int unset, bool in= tr) { uint64_t *mode =3D (uint64_t *)opt->value; const struct sample_reg *r =3D NULL; + u16 simd_qwords, pred_qwords; + u64 simd_mask, pred_mask; + struct record_opts *opts; char *s, *os =3D NULL, *p; int ret =3D -1; uint64_t mask; =20 + if (unset) return 0; =20 @@ -27,10 +33,17 @@ __parse_regs(const struct option *opt, const char *str,= int unset, bool intr) if (*mode) return -1; =20 - if (intr) + if (intr) { + opts =3D container_of(opt->value, struct record_opts, sample_intr_regs); mask =3D arch__intr_reg_mask(); - else + simd_mask =3D arch__intr_simd_reg_mask(&simd_qwords); + pred_mask =3D arch__intr_pred_reg_mask(&pred_qwords); + } else { + opts =3D container_of(opt->value, struct record_opts, sample_user_regs); mask =3D arch__user_reg_mask(); + simd_mask =3D arch__user_simd_reg_mask(&simd_qwords); + pred_mask =3D arch__user_pred_reg_mask(&pred_qwords); + } =20 /* str may be NULL in case no arg is passed to -I */ if (str) { @@ -50,10 +63,51 @@ 
__parse_regs(const struct option *opt, const char *str,= int unset, bool intr) if (r->mask & mask) fprintf(stderr, "%s ", r->name); } + for (r =3D arch__sample_simd_reg_masks(); r->name; r++) { + if (pred_qwords =3D=3D r->qwords.pred) { + fprintf(stderr, "%s0-%d ", r->name, fls64(pred_mask) - 1); + continue; + } + if (simd_qwords >=3D r->mask) + fprintf(stderr, "%s0-%d ", r->name, fls64(simd_mask) - 1); + } + fputc('\n', stderr); /* just printing available regs */ goto error; } + + if (simd_mask || pred_mask) { + u16 vec_regs_qwords =3D 0, pred_regs_qwords =3D 0; + + for (r =3D arch__sample_simd_reg_masks(); r->name; r++) { + if (!strcasecmp(s, r->name)) { + vec_regs_qwords =3D r->qwords.vec; + pred_regs_qwords =3D r->qwords.pred; + break; + } + } + + /* Just need the highest qwords */ + if (vec_regs_qwords > opts->sample_vec_regs_qwords) { + opts->sample_vec_regs_qwords =3D vec_regs_qwords; + if (intr) + opts->sample_intr_vec_regs =3D simd_mask; + else + opts->sample_user_vec_regs =3D simd_mask; + } + if (pred_regs_qwords > opts->sample_pred_regs_qwords) { + opts->sample_pred_regs_qwords =3D pred_regs_qwords; + if (intr) + opts->sample_intr_pred_regs =3D pred_mask; + else + opts->sample_user_pred_regs =3D pred_mask; + } + + if (r->name) + goto next; + } + for (r =3D arch__sample_reg_masks(); r->name; r++) { if ((r->mask & mask) && !strcasecmp(s, r->name)) break; @@ -65,7 +119,7 @@ __parse_regs(const struct option *opt, const char *str, = int unset, bool intr) } =20 *mode |=3D r->mask; - +next: if (!p) break; =20 diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/pe= rf_event_attr_fprintf.c index 66b666d9ce64..fb0366d050cf 100644 --- a/tools/perf/util/perf_event_attr_fprintf.c +++ b/tools/perf/util/perf_event_attr_fprintf.c @@ -360,6 +360,12 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_eve= nt_attr *attr, PRINT_ATTRf(aux_start_paused, p_unsigned); PRINT_ATTRf(aux_pause, p_unsigned); PRINT_ATTRf(aux_resume, p_unsigned); + 
PRINT_ATTRf(sample_simd_pred_reg_qwords, p_unsigned); + PRINT_ATTRf(sample_simd_pred_reg_intr, p_hex); + PRINT_ATTRf(sample_simd_pred_reg_user, p_hex); + PRINT_ATTRf(sample_simd_vec_reg_qwords, p_unsigned); + PRINT_ATTRf(sample_simd_vec_reg_intr, p_hex); + PRINT_ATTRf(sample_simd_vec_reg_user, p_hex); =20 return ret; } diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c index 44b90bbf2d07..0744c77b4ac8 100644 --- a/tools/perf/util/perf_regs.c +++ b/tools/perf/util/perf_regs.c @@ -21,6 +21,30 @@ uint64_t __weak arch__user_reg_mask(void) return 0; } =20 +uint64_t __weak arch__intr_simd_reg_mask(u16 *qwords) +{ + *qwords =3D 0; + return 0; +} + +uint64_t __weak arch__user_simd_reg_mask(u16 *qwords) +{ + *qwords =3D 0; + return 0; +} + +uint64_t __weak arch__intr_pred_reg_mask(u16 *qwords) +{ + *qwords =3D 0; + return 0; +} + +uint64_t __weak arch__user_pred_reg_mask(u16 *qwords) +{ + *qwords =3D 0; + return 0; +} + static const struct sample_reg sample_reg_masks[] =3D { SMPL_REG_END }; @@ -30,6 +54,11 @@ const struct sample_reg * __weak arch__sample_reg_masks(= void) return sample_reg_masks; } =20 +const struct sample_reg * __weak arch__sample_simd_reg_masks(void) +{ + return sample_reg_masks; +} + const char *perf_reg_name(int id, const char *arch) { const char *reg_name =3D NULL; diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h index f2d0736d65cc..b932caa73a8a 100644 --- a/tools/perf/util/perf_regs.h +++ b/tools/perf/util/perf_regs.h @@ -9,7 +9,13 @@ struct regs_dump; =20 struct sample_reg { const char *name; - uint64_t mask; + union { + struct { + uint32_t vec; + uint32_t pred; + } qwords; + uint64_t mask; + }; }; =20 #define SMPL_REG_MASK(b) (1ULL << (b)) @@ -27,6 +33,11 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op); uint64_t arch__intr_reg_mask(void); uint64_t arch__user_reg_mask(void); const struct sample_reg *arch__sample_reg_masks(void); +const struct sample_reg *arch__sample_simd_reg_masks(void); +uint64_t 
arch__intr_simd_reg_mask(u16 *qwords); +uint64_t arch__user_simd_reg_mask(u16 *qwords); +uint64_t arch__intr_pred_reg_mask(u16 *qwords); +uint64_t arch__user_pred_reg_mask(u16 *qwords); =20 const char *perf_reg_name(int id, const char *arch); int perf_reg_value(u64 *valp, struct regs_dump *regs, int id); diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h index ea3a6c4657ee..825ffb4cc53f 100644 --- a/tools/perf/util/record.h +++ b/tools/perf/util/record.h @@ -59,7 +59,13 @@ struct record_opts { unsigned int user_freq; u64 branch_stack; u64 sample_intr_regs; + u64 sample_intr_vec_regs; u64 sample_user_regs; + u64 sample_user_vec_regs; + u16 sample_pred_regs_qwords; + u16 sample_vec_regs_qwords; + u16 sample_intr_pred_regs; + u16 sample_user_pred_regs; u64 default_interval; u64 user_interval; size_t auxtrace_snapshot_size; --=20 2.38.1 From nobody Wed Oct 8 14:18:30 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EDB8230F524 for ; Fri, 15 Aug 2025 21:35:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293758; cv=none; b=IAhVFplh69r+aDXifBxJH+grYwfwapaAW++Nx/XI5RsS9oEFsLe/m7Ezrj3zuA2XZiSMslMcvciqKUCR47ccSYmR9kWWAV7gMcTL7f8HHCsqp8EzIFQEPMd2gJCtcrIWic9nsUyLvS+CiJw+epZzky7ThBM8OVQiHtPhtE0M/U0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755293758; c=relaxed/simple; bh=yKA5f4nWjm9us8MphiAr5foByb9+nVvSKrYfvmpgLsI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=LCdOdf8Uf/yjwMfrZIg7WP6d1/zrpx3xwbcpxDX/dY/axVC5XAKyBb8Rq3v5/dwp4c9Xo5VlR5n1SIyyEIIy8RFiNKcjBDctqIjlfT9oFhZfGqAbm8QYuHTwO68CR9RNY8an07TQEAsoO0fUVWwQ2cZ+cEnBieeNlUua3yqujxU= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=O6nS4xcj; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="O6nS4xcj" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1755293757; x=1786829757; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yKA5f4nWjm9us8MphiAr5foByb9+nVvSKrYfvmpgLsI=; b=O6nS4xcjHtVDpPInoTlgSd+432dK+3D29IM9K3e64punGcu3K06fd4wR xhJB3eAF5nvvHkEZi2P/XIQ9R9M6UvFdtm6Dis3MZUS7dfJzWob9uJqrl RKgtWnWuA7hQhXVdtPRdVu+aN5eK9I8ckz4WAjRuuT3O8T8RJvLPYrcJT exAMLC+5fKrfdV6SN4b1mOKQPe5MjTLfIuFIRwM9kFMvUg8TSlOtB3Uxm kggOYTGa9QDCgWAv1fshUsEhqkEsHD/4R2xx8UGLEER/eRcn4I1YS83dU 3CsaT73xAJ1nyaYgMT5d5T/JpUUh37ZQK4w2OZryRjWaIMPfds4s7x1w/ A==; X-CSE-ConnectionGUID: owI+Iw3LQQSZZku++8ZqAw== X-CSE-MsgGUID: aTafiSO1QEmjbSP01iR8Rg== X-IronPort-AV: E=McAfee;i="6800,10657,11523"; a="68707491" X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="68707491" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Aug 2025 14:35:48 -0700 X-CSE-ConnectionGUID: kpBkzUdeSf2MBdEaCCZ5Bw== X-CSE-MsgGUID: 0HwQCP69Q4+cBkjWiFa6Tw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.17,293,1747724400"; d="scan'208";a="166319633" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by orviesa010.jf.intel.com with ESMTP; 15 Aug 2025 14:35:48 -0700 From: kan.liang@linux.intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, 
tglx@linutronix.de, dave.hansen@linux.intel.com, irogers@google.com, adrian.hunter@intel.com, jolsa@kernel.org, alexander.shishkin@linux.intel.com, linux-kernel@vger.kernel.org Cc: dapeng1.mi@linux.intel.com, ak@linux.intel.com, zide.chen@intel.com, mark.rutland@arm.com, broonie@kernel.org, ravi.bangoria@amd.com, eranian@google.com, Kan Liang Subject: [POC PATCH 17/17] perf regs: Support the PERF_SAMPLE_REGS_ABI_SIMD Date: Fri, 15 Aug 2025 14:34:35 -0700 Message-Id: <20250815213435.1702022-18-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20250815213435.1702022-1-kan.liang@linux.intel.com> References: <20250815213435.1702022-1-kan.liang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang Support the new PERF_SAMPLE_REGS_ABI_SIMD. Dump the data to perf report -D. Only the superset of the vector registers is displayed for now. Example: $perf record -e cycles:p -IXMM,YMM,OPMASK,SSP ./test $perf report -D ... ... 237538985992962 0x454d0 [0x480]: PERF_RECORD_SAMPLE(IP, 0x1): 179370/179370: 0xffffffff969627fc period: 124999 addr: 0 ... intr regs: mask 0x20000000000 ABI 64-bit .... SSP 0x0000000000000000 ... SIMD ABI nr_vectors 32 vector_qwords 4 nr_pred 8 pred_qwords 1 .... YMM [0] 0x0000000000004000 .... YMM [0] 0x000055e828695270 .... YMM [0] 0x0000000000000000 .... YMM [0] 0x0000000000000000 .... YMM [1] 0x000055e8286990e0 .... YMM [1] 0x000055e828698dd0 .... YMM [1] 0x0000000000000000 .... YMM [1] 0x0000000000000000 ... ... .... YMM [31] 0x0000000000000000 .... YMM [31] 0x0000000000000000 .... YMM [31] 0x0000000000000000 .... YMM [31] 0x0000000000000000 .... OPMASK[0] 0x0000000000100221 .... OPMASK[1] 0x0000000000000020 .... OPMASK[2] 0x000000007fffffff .... OPMASK[3] 0x0000000000000000 .... OPMASK[4] 0x0000000000000000 .... 
OPMASK[5] 0x0000000000000000
.... OPMASK[6] 0x0000000000000000
.... OPMASK[7] 0x0000000000000000
... ...

Signed-off-by: Kan Liang
---
 tools/perf/util/evsel.c                       | 18 ++++++
 .../perf/util/perf-regs-arch/perf_regs_x86.c  | 45 ++++++++++++++
 tools/perf/util/sample.h                      | 10 +++
 tools/perf/util/session.c                     | 62 ++++++++++++++++---
 4 files changed, 127 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 7cfb0aab5dd9..e0c0ebfafc23 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -3233,6 +3233,15 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 			regs->mask = mask;
 			regs->regs = (u64 *)array;
 			array = (void *)array + sz;
+
+			if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
+				regs->config = *(u64 *)array;
+				array = (void *)array + sizeof(u64);
+				regs->data = (u64 *)array;
+				sz = (regs->nr_vectors * regs->vector_qwords + regs->nr_pred * regs->pred_qwords) * sizeof(u64);
+				OVERFLOW_CHECK(array, sz, max_size);
+				array = (void *)array + sz;
+			}
 		}
 	}
 
@@ -3290,6 +3299,15 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 			regs->mask = mask;
 			regs->regs = (u64 *)array;
 			array = (void *)array + sz;
+
+			if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
+				regs->config = *(u64 *)array;
+				array = (void *)array + sizeof(u64);
+				regs->data = (u64 *)array;
+				sz = (regs->nr_vectors * regs->vector_qwords + regs->nr_pred * regs->pred_qwords) * sizeof(u64);
+				OVERFLOW_CHECK(array, sz, max_size);
+				array = (void *)array + sz;
+			}
 		}
 	}
 
diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
index 708954a9d35d..b494f4504052 100644
--- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c
+++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
@@ -5,6 +5,51 @@
 
 const char *__perf_reg_name_x86(int id)
 {
+	u16 qwords;
+
+	if (id > PERF_REG_X86_R15 && arch__intr_simd_reg_mask(&qwords)) {
+		switch (id) {
+		case PERF_REG_X86_R16:
+			return "R16";
+		case PERF_REG_X86_R17:
+			return "R17";
+		case PERF_REG_X86_R18:
+			return "R18";
+		case PERF_REG_X86_R19:
+			return "R19";
+		case PERF_REG_X86_R20:
+			return "R20";
+		case PERF_REG_X86_R21:
+			return "R21";
+		case PERF_REG_X86_R22:
+			return "R22";
+		case PERF_REG_X86_R23:
+			return "R23";
+		case PERF_REG_X86_R24:
+			return "R24";
+		case PERF_REG_X86_R25:
+			return "R25";
+		case PERF_REG_X86_R26:
+			return "R26";
+		case PERF_REG_X86_R27:
+			return "R27";
+		case PERF_REG_X86_R28:
+			return "R28";
+		case PERF_REG_X86_R29:
+			return "R29";
+		case PERF_REG_X86_R30:
+			return "R30";
+		case PERF_REG_X86_R31:
+			return "R31";
+		case PERF_REG_X86_SSP:
+			return "SSP";
+		default:
+			return NULL;
+		}
+
+		return NULL;
+	}
+
 	switch (id) {
 	case PERF_REG_X86_AX:
 		return "AX";
diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
index 0e96240052e9..36ac4519014b 100644
--- a/tools/perf/util/sample.h
+++ b/tools/perf/util/sample.h
@@ -12,6 +12,16 @@ struct regs_dump {
 	u64 abi;
 	u64 mask;
 	u64 *regs;
+	union {
+		u64 config;
+		struct {
+			u16 nr_vectors;
+			u16 vector_qwords;
+			u16 nr_pred;
+			u16 pred_qwords;
+		};
+	};
+	u64 *data;
 
 	/* Cached values/mask filled by first register access. */
 	u64 cache_regs[PERF_SAMPLE_REGS_CACHE_SIZE];
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index a320672c264e..6f931abe2050 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -922,18 +922,62 @@ static void regs_dump__printf(u64 mask, u64 *regs, const char *arch)
 	}
 }
 
-static const char *regs_abi[] = {
-	[PERF_SAMPLE_REGS_ABI_NONE] = "none",
-	[PERF_SAMPLE_REGS_ABI_32] = "32-bit",
-	[PERF_SAMPLE_REGS_ABI_64] = "64-bit",
-};
+static void simd_regs_dump__printf(struct regs_dump *regs)
+{
+	const char *name = "unknown";
+	const struct sample_reg *r;
+	int i, idx = 0;
+
+	if (!(regs->abi & PERF_SAMPLE_REGS_ABI_SIMD))
+		return;
+
+	printf("... SIMD ABI nr_vectors %d vector_qwords %d nr_pred %d pred_qwords %d\n",
+	       regs->nr_vectors, regs->vector_qwords,
+	       regs->nr_pred, regs->pred_qwords);
+
+	for (r = arch__sample_simd_reg_masks(); r->name; r++) {
+		if (regs->vector_qwords == r->qwords.vec) {
+			name = r->name;
+			break;
+		}
+	}
+
+	for (i = 0; i < regs->nr_vectors; i++) {
+		printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+		printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+		if (regs->vector_qwords > 2) {
+			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+		}
+		if (regs->vector_qwords > 4) {
+			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+		}
+	}
+
+	name = "unknown";
+	for (r = arch__sample_simd_reg_masks(); r->name; r++) {
+		if (r->qwords.pred && regs->pred_qwords == r->qwords.pred) {
+			name = r->name;
+			break;
+		}
+	}
+	for (i = 0; i < regs->nr_pred; i++)
+		printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]);
+}
 
 static inline const char *regs_dump_abi(struct regs_dump *d)
 {
-	if (d->abi > PERF_SAMPLE_REGS_ABI_64)
-		return "unknown";
+	if (!d->abi)
+		return "none";
+	if (d->abi & PERF_SAMPLE_REGS_ABI_32)
+		return "32-bit";
+	else if (d->abi & PERF_SAMPLE_REGS_ABI_64)
+		return "64-bit";
 
-	return regs_abi[d->abi];
+	return "unknown";
 }
 
 static void regs__printf(const char *type, struct regs_dump *regs, const char *arch)
@@ -946,6 +990,8 @@ static void regs__printf(const char *type, struct regs_dump *regs, const char *a
 	       regs_dump_abi(regs));
 
 	regs_dump__printf(mask, regs->regs, arch);
+
+	simd_regs_dump__printf(regs);
 }
 
 static void regs_user__printf(struct perf_sample *sample, const char *arch)
-- 
2.38.1