From nobody Sat Feb 7 14:28:38 2026
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8])
	by smtp.subspace.kernel.org (Postfix) with ESMTPS;
	Wed, 3 Dec 2025 06:58:05 +0000 (UTC)
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Thomas Gleixner, Dave Hansen, Ian Rogers, Adrian Hunter, Jiri Olsa,
	Alexander Shishkin, Andi Kleen, Eranian Stephane
Cc: Mark Rutland, broonie@kernel.org, Ravi Bangoria,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao
Subject: [Patch v5 01/19] perf: Eliminate duplicate arch-specific function
 definitions
Date: Wed, 3 Dec 2025 14:54:42 +0800
Message-Id: <20251203065500.2597594-2-dapeng1.mi@linux.intel.com>
In-Reply-To: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com>
References: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com>

Define common default __weak implementations of perf_reg_value(),
perf_reg_validate(), perf_reg_abi() and perf_get_regs_user(). This
eliminates the duplicated arch-specific definitions.

No functional change intended.

Signed-off-by: Dapeng Mi
---
 arch/arm/kernel/perf_regs.c       |  6 ------
 arch/arm64/kernel/perf_regs.c     |  6 ------
 arch/csky/kernel/perf_regs.c      |  6 ------
 arch/loongarch/kernel/perf_regs.c |  6 ------
 arch/mips/kernel/perf_regs.c      |  6 ------
 arch/parisc/kernel/perf_regs.c    |  6 ------
 arch/riscv/kernel/perf_regs.c     |  6 ------
 arch/x86/kernel/perf_regs.c       |  6 ------
 include/linux/perf_regs.h         | 32 ++++++-------------------------
 kernel/events/core.c              | 22 +++++++++++++++++++++
 10 files changed, 28 insertions(+), 74 deletions(-)

diff --git a/arch/arm/kernel/perf_regs.c b/arch/arm/kernel/perf_regs.c
index 0529f90395c9..d575a4c3ca56 100644
--- a/arch/arm/kernel/perf_regs.c
+++ b/arch/arm/kernel/perf_regs.c
@@ -31,9 +31,3 @@ u64 perf_reg_abi(struct task_struct *task)
 	return PERF_SAMPLE_REGS_ABI_32;
 }
 
-void perf_get_regs_user(struct perf_regs *regs_user,
-			struct pt_regs *regs)
-{
-	regs_user->regs = task_pt_regs(current);
-	regs_user->abi = perf_reg_abi(current);
-}
diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c
index b4eece3eb17d..70e2f13f587f 100644
--- a/arch/arm64/kernel/perf_regs.c
+++ b/arch/arm64/kernel/perf_regs.c
@@ -98,9 +98,3 @@ u64 perf_reg_abi(struct task_struct *task)
 	return PERF_SAMPLE_REGS_ABI_64;
 }
 
-void perf_get_regs_user(struct perf_regs *regs_user,
-			struct pt_regs *regs)
-{
-	regs_user->regs = task_pt_regs(current);
-	regs_user->abi = perf_reg_abi(current);
-}
diff --git a/arch/csky/kernel/perf_regs.c b/arch/csky/kernel/perf_regs.c
index 09b7f88a2d6a..94601f37b596 100644
--- a/arch/csky/kernel/perf_regs.c
+++ b/arch/csky/kernel/perf_regs.c
@@ -31,9 +31,3 @@ u64 perf_reg_abi(struct task_struct *task)
 	return PERF_SAMPLE_REGS_ABI_32;
 }
 
-void perf_get_regs_user(struct perf_regs *regs_user,
-			struct pt_regs *regs)
-{
-	regs_user->regs = task_pt_regs(current);
-	regs_user->abi = perf_reg_abi(current);
-}
diff --git a/arch/loongarch/kernel/perf_regs.c b/arch/loongarch/kernel/perf_regs.c
index 263ac4ab5af6..8dd604f01745 100644
--- a/arch/loongarch/kernel/perf_regs.c
+++ b/arch/loongarch/kernel/perf_regs.c
@@ -45,9 +45,3 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 	return regs->regs[idx];
 }
 
-void perf_get_regs_user(struct perf_regs *regs_user,
-			struct pt_regs *regs)
-{
-	regs_user->regs = task_pt_regs(current);
-	regs_user->abi = perf_reg_abi(current);
-}
diff --git a/arch/mips/kernel/perf_regs.c b/arch/mips/kernel/perf_regs.c
index e686780d1647..7736d3c5ebd2 100644
--- a/arch/mips/kernel/perf_regs.c
+++ b/arch/mips/kernel/perf_regs.c
@@ -60,9 +60,3 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 	return (s64)v; /* Sign extend if 32-bit. */
 }
 
-void perf_get_regs_user(struct perf_regs *regs_user,
-			struct pt_regs *regs)
-{
-	regs_user->regs = task_pt_regs(current);
-	regs_user->abi = perf_reg_abi(current);
-}
diff --git a/arch/parisc/kernel/perf_regs.c b/arch/parisc/kernel/perf_regs.c
index 68458e2f6197..87e6990569a7 100644
--- a/arch/parisc/kernel/perf_regs.c
+++ b/arch/parisc/kernel/perf_regs.c
@@ -53,9 +53,3 @@ u64 perf_reg_abi(struct task_struct *task)
 	return PERF_SAMPLE_REGS_ABI_64;
 }
 
-void perf_get_regs_user(struct perf_regs *regs_user,
-			struct pt_regs *regs)
-{
-	regs_user->regs = task_pt_regs(current);
-	regs_user->abi = perf_reg_abi(current);
-}
diff --git a/arch/riscv/kernel/perf_regs.c b/arch/riscv/kernel/perf_regs.c
index fd304a248de6..3bba8deababb 100644
--- a/arch/riscv/kernel/perf_regs.c
+++ b/arch/riscv/kernel/perf_regs.c
@@ -35,9 +35,3 @@ u64 perf_reg_abi(struct task_struct *task)
 #endif
 }
 
-void perf_get_regs_user(struct perf_regs *regs_user,
-			struct pt_regs *regs)
-{
-	regs_user->regs = task_pt_regs(current);
-	regs_user->abi = perf_reg_abi(current);
-}
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index 624703af80a1..81204cb7f723 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -100,12 +100,6 @@ u64 perf_reg_abi(struct task_struct *task)
 	return PERF_SAMPLE_REGS_ABI_32;
 }
 
-void perf_get_regs_user(struct perf_regs *regs_user,
-			struct pt_regs *regs)
-{
-	regs_user->regs = task_pt_regs(current);
-	regs_user->abi = perf_reg_abi(current);
-}
 #else /* CONFIG_X86_64 */
 #define REG_NOSUPPORT ((1ULL << PERF_REG_X86_DS) | \
 		       (1ULL << PERF_REG_X86_ES) | \
diff --git a/include/linux/perf_regs.h b/include/linux/perf_regs.h
index f632c5725f16..144bcc3ff19f 100644
--- a/include/linux/perf_regs.h
+++ b/include/linux/perf_regs.h
@@ -9,6 +9,12 @@ struct perf_regs {
 	struct pt_regs	*regs;
 };
 
+u64 perf_reg_value(struct pt_regs *regs, int idx);
+int perf_reg_validate(u64 mask);
+u64 perf_reg_abi(struct task_struct *task);
+void perf_get_regs_user(struct perf_regs *regs_user,
+			struct pt_regs *regs);
+
 #ifdef CONFIG_HAVE_PERF_REGS
 #include <asm/perf_regs.h>
 
@@ -16,35 +22,9 @@ struct perf_regs {
 #define PERF_REG_EXTENDED_MASK	0
 #endif
 
-u64 perf_reg_value(struct pt_regs *regs, int idx);
-int perf_reg_validate(u64 mask);
-u64 perf_reg_abi(struct task_struct *task);
-void perf_get_regs_user(struct perf_regs *regs_user,
-			struct pt_regs *regs);
 #else
 
 #define PERF_REG_EXTENDED_MASK	0
 
-static inline u64 perf_reg_value(struct pt_regs *regs, int idx)
-{
-	return 0;
-}
-
-static inline int perf_reg_validate(u64 mask)
-{
-	return mask ? -ENOSYS : 0;
-}
-
-static inline u64 perf_reg_abi(struct task_struct *task)
-{
-	return PERF_SAMPLE_REGS_ABI_NONE;
-}
-
-static inline void perf_get_regs_user(struct perf_regs *regs_user,
-				      struct pt_regs *regs)
-{
-	regs_user->regs = task_pt_regs(current);
-	regs_user->abi = perf_reg_abi(current);
-}
 #endif /* CONFIG_HAVE_PERF_REGS */
 #endif /* _LINUX_PERF_REGS_H */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f6a08c73f783..efc938c6a2be 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7431,6 +7431,28 @@ unsigned long perf_instruction_pointer(struct perf_event *event,
 	return perf_arch_instruction_pointer(regs);
 }
 
+u64 __weak perf_reg_value(struct pt_regs *regs, int idx)
+{
+	return 0;
+}
+
+int __weak perf_reg_validate(u64 mask)
+{
+	return mask ? -ENOSYS : 0;
+}
+
+u64 __weak perf_reg_abi(struct task_struct *task)
+{
+	return PERF_SAMPLE_REGS_ABI_NONE;
+}
+
+void __weak perf_get_regs_user(struct perf_regs *regs_user,
+			       struct pt_regs *regs)
+{
+	regs_user->regs = task_pt_regs(current);
+	regs_user->abi = perf_reg_abi(current);
+}
+
 static void perf_output_sample_regs(struct perf_output_handle *handle,
 				    struct pt_regs *regs, u64 mask)
-- 
2.34.1

From nobody Sat Feb 7 14:28:38 2026
From: Dapeng Mi
Subject: [Patch v5 02/19] perf/x86: Use x86_perf_regs in the x86 NMI handler
Date: Wed, 3 Dec 2025 14:54:43 +0800
Message-Id: <20251203065500.2597594-3-dapeng1.mi@linux.intel.com>
In-Reply-To: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com>

From: Kan Liang

More and more registers will be supported in the overflow handler,
e.g., more vector registers, SSP, etc. The generic pt_regs struct
cannot store all of them. Use an x86-specific x86_perf_regs struct
instead. The struct pt_regs *regs is still passed to
x86_pmu_handle_irq(), so there is no functional change for the
existing code.

AMD IBS's NMI handler doesn't use the static call x86_pmu_handle_irq(),
so the x86_perf_regs struct doesn't apply to AMD IBS. It can be added
separately later, once AMD IBS supports more registers.
Signed-off-by: Kan Liang
Signed-off-by: Dapeng Mi
---
 arch/x86/events/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 5d0d5e466c62..ef3bf8fbc97f 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1762,6 +1762,7 @@ void perf_events_lapic_init(void)
 static int
 perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
 {
+	struct x86_perf_regs x86_regs;
 	u64 start_clock;
 	u64 finish_clock;
 	int ret;
@@ -1774,7 +1775,8 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
 		return NMI_DONE;
 
 	start_clock = sched_clock();
-	ret = static_call(x86_pmu_handle_irq)(regs);
+	x86_regs.regs = *regs;
+	ret = static_call(x86_pmu_handle_irq)(&x86_regs.regs);
 	finish_clock = sched_clock();
 
 	perf_sample_event_took(finish_clock - start_clock);
-- 
2.34.1

From nobody Sat Feb 7 14:28:38 2026
From: Dapeng Mi
Subject: [Patch v5 03/19] perf/x86: Introduce x86-specific x86_pmu_setup_regs_data()
Date: Wed, 3 Dec 2025 14:54:44 +0800
Message-Id: <20251203065500.2597594-4-dapeng1.mi@linux.intel.com>
In-Reply-To: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com>

From: Kan Liang

The current perf/x86 implementation uses the generic functions
perf_sample_regs_user() and perf_sample_regs_intr() to set up the
register data for sampling records. While this approach works for
general-purpose registers, it falls short when adding sampling support
for the SIMD and APX eGPR registers on x86 platforms.

To address this, introduce the x86-specific function
x86_pmu_setup_regs_data() for setting up register data on x86
platforms. At present, x86_pmu_setup_regs_data() mirrors the logic of
the generic functions perf_sample_regs_user() and
perf_sample_regs_intr(). Subsequent patches will introduce
x86-specific enhancements.
Signed-off-by: Kan Liang
Signed-off-by: Dapeng Mi
---
 arch/x86/events/core.c       | 32 ++++++++++++++++++++++++++++++++
 arch/x86/events/intel/ds.c   |  9 ++++++---
 arch/x86/events/perf_event.h |  4 ++++
 3 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index ef3bf8fbc97f..dcdd2c2d68ee 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1695,6 +1695,38 @@ static void x86_pmu_del(struct perf_event *event, int flags)
 	static_call_cond(x86_pmu_del)(event);
 }
 
+void x86_pmu_setup_regs_data(struct perf_event *event,
+			     struct perf_sample_data *data,
+			     struct pt_regs *regs)
+{
+	u64 sample_type = event->attr.sample_type;
+
+	if (sample_type & PERF_SAMPLE_REGS_USER) {
+		if (user_mode(regs)) {
+			data->regs_user.abi = perf_reg_abi(current);
+			data->regs_user.regs = regs;
+		} else if (!(current->flags & PF_KTHREAD)) {
+			perf_get_regs_user(&data->regs_user, regs);
+		} else {
+			data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
+			data->regs_user.regs = NULL;
+		}
+		data->dyn_size += sizeof(u64);
+		if (data->regs_user.regs)
+			data->dyn_size += hweight64(event->attr.sample_regs_user) * sizeof(u64);
+		data->sample_flags |= PERF_SAMPLE_REGS_USER;
+	}
+
+	if (sample_type & PERF_SAMPLE_REGS_INTR) {
+		data->regs_intr.regs = regs;
+		data->regs_intr.abi = perf_reg_abi(current);
+		data->dyn_size += sizeof(u64);
+		if (data->regs_intr.regs)
+			data->dyn_size += hweight64(event->attr.sample_regs_intr) * sizeof(u64);
+		data->sample_flags |= PERF_SAMPLE_REGS_INTR;
+	}
+}
+
 int x86_pmu_handle_irq(struct pt_regs *regs)
 {
 	struct perf_sample_data data;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 2e170f2093ac..c7351f476d8c 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2180,6 +2180,7 @@ static inline void __setup_pebs_basic_group(struct perf_event *event,
 }
 
 static inline void __setup_pebs_gpr_group(struct perf_event *event,
+					  struct perf_sample_data *data,
 					  struct pt_regs *regs,
 					  struct pebs_gprs *gprs,
 					  u64 sample_type)
@@ -2189,8 +2190,10 @@ static inline void __setup_pebs_gpr_group(struct perf_event *event,
 		regs->flags &= ~PERF_EFLAGS_EXACT;
 	}
 
-	if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER))
+	if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) {
 		adaptive_pebs_save_regs(regs, gprs);
+		x86_pmu_setup_regs_data(event, data, regs);
+	}
 }
 
 static inline void __setup_pebs_meminfo_group(struct perf_event *event,
@@ -2283,7 +2286,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
 		gprs = next_record;
 		next_record = gprs + 1;
 
-		__setup_pebs_gpr_group(event, regs, gprs, sample_type);
+		__setup_pebs_gpr_group(event, data, regs, gprs, sample_type);
 	}
 
 	if (format_group & PEBS_DATACFG_MEMINFO) {
@@ -2407,7 +2410,7 @@ static void setup_arch_pebs_sample_data(struct perf_event *event,
 			gprs = next_record;
 			next_record = gprs + 1;
 
-			__setup_pebs_gpr_group(event, regs,
+			__setup_pebs_gpr_group(event, data, regs,
 					       (struct pebs_gprs *)gprs,
 					       sample_type);
 		}
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 3161ec0a3416..80e52e937638 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1294,6 +1294,10 @@ void x86_pmu_enable_event(struct perf_event *event);
 
 int x86_pmu_handle_irq(struct pt_regs *regs);
 
+void x86_pmu_setup_regs_data(struct perf_event *event,
+			     struct perf_sample_data *data,
+			     struct pt_regs *regs);
+
 void x86_pmu_show_pmu_cap(struct pmu *pmu);
 
 static inline int x86_pmu_num_counters(struct pmu *pmu)
-- 
2.34.1

From nobody Sat Feb 7 14:28:38 2026
From: Dapeng Mi
Subject: [Patch v5 04/19] x86/fpu/xstate: Add xsaves_nmi() helper
Date: Wed, 3 Dec 2025 14:54:45 +0800
Message-Id: <20251203065500.2597594-5-dapeng1.mi@linux.intel.com>
In-Reply-To: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com>

From: Kan Liang

Add xsaves_nmi() to save the supported xsave states in an NMI handler.
The function is similar to xsaves(), but should only be called within
an NMI handler. It returns the actual register contents at the moment
the NMI occurs.

Currently the perf subsystem is the sole user of this helper. It uses
the function to snapshot the SIMD (XMM/YMM/ZMM) and APX eGPR
registers, which will be added in subsequent patches.

Suggested-by: Dave Hansen
Signed-off-by: Kan Liang
Signed-off-by: Dapeng Mi
---
 arch/x86/include/asm/fpu/xstate.h |  1 +
 arch/x86/kernel/fpu/xstate.c      | 23 +++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 7a7dc9d56027..38fa8ff26559 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -110,6 +110,7 @@ int xfeature_size(int xfeature_nr);
 
 void xsaves(struct xregs_state *xsave, u64 mask);
 void xrstors(struct xregs_state *xsave, u64 mask);
+void xsaves_nmi(struct xregs_state *xsave, u64 mask);
 
 int xfd_enable_feature(u64 xfd_err);
 
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 28e4fd65c9da..e3b8afed8b2c 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1474,6 +1474,29 @@ void xrstors(struct xregs_state *xstate, u64 mask)
 	WARN_ON_ONCE(err);
 }
 
+/**
+ * xsaves_nmi - Save selected components to a kernel xstate buffer in NMI
+ * @xstate:	Pointer to the buffer
+ * @mask:	Feature mask to select the components to save
+ *
+ * This function is similar to xsaves(), but should only be called within
+ * an NMI handler. It returns the actual register contents at the moment
+ * the NMI occurs.
+ *
+ * Currently, the perf subsystem is the sole user of this helper. It uses
+ * the function to snapshot SIMD (XMM/YMM/ZMM) and APX eGPR registers.
+ */
+void xsaves_nmi(struct xregs_state *xstate, u64 mask)
+{
+	int err;
+
+	if (!in_nmi())
+		return;
+
+	XSTATE_OP(XSAVES, xstate, (u32)mask, (u32)(mask >> 32), err);
+	WARN_ON_ONCE(err);
+}
+
 #if IS_ENABLED(CONFIG_KVM)
 void fpstate_clear_xstate_component(struct fpstate *fpstate, unsigned int xfeature)
 {
-- 
2.34.1

From nobody Sat Feb 7 14:28:38 2026
From: Dapeng Mi
Subject: [Patch v5 05/19] perf: Move and rename has_extended_regs() for ARCH-specific use
Date: Wed, 3 Dec 2025 14:54:46 +0800
Message-Id: <20251203065500.2597594-6-dapeng1.mi@linux.intel.com>
In-Reply-To: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com>

From: Kan Liang

The has_extended_regs() function will be used in arch-specific code.
To facilitate this, move it to the header file perf_event.h.
Additionally, rename it to event_has_extended_regs(), which aligns
with the existing naming conventions.

No functional change intended.

Signed-off-by: Kan Liang
Signed-off-by: Dapeng Mi
---
 include/linux/perf_event.h | 8 ++++++++
 kernel/events/core.c       | 8 +-------
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 9870d768db4c..5153b70d09c8 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1526,6 +1526,14 @@ perf_event__output_id_sample(struct perf_event *event,
 
 extern void perf_log_lost_samples(struct perf_event *event, u64 lost);
 
+static inline bool event_has_extended_regs(struct perf_event *event)
+{
+	struct perf_event_attr *attr = &event->attr;
+
+	return (attr->sample_regs_user & PERF_REG_EXTENDED_MASK) ||
+	       (attr->sample_regs_intr & PERF_REG_EXTENDED_MASK);
+}
+
 static inline bool event_has_any_exclude_flag(struct perf_event *event)
 {
 	struct perf_event_attr *attr = &event->attr;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index efc938c6a2be..3e9c48fa2202 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12664,12 +12664,6 @@ int perf_pmu_unregister(struct pmu *pmu)
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);
 
-static inline bool has_extended_regs(struct perf_event *event)
-{
-	return (event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK) ||
-	       (event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK);
-}
-
 static int perf_try_init_event(struct pmu
*pmu, struct perf_event *event)
 {
 	struct perf_event_context *ctx = NULL;
@@ -12704,7 +12698,7 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
 		goto err_pmu;
 
 	if (!(pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS) &&
-	    has_extended_regs(event)) {
+	    event_has_extended_regs(event)) {
 		ret = -EOPNOTSUPP;
 		goto err_destroy;
 	}
-- 
2.34.1

From nobody Sat Feb  7 14:28:38 2026
From: Dapeng Mi
Subject: [Patch v5 06/19] perf/x86: Add support for XMM registers in non-PEBS and REGS_USER
Date: Wed, 3 Dec 2025 14:54:47 +0800
Message-Id: <20251203065500.2597594-7-dapeng1.mi@linux.intel.com>

From: Kan Liang

While collecting XMM registers in a PEBS record has been supported since
Icelake, non-PEBS events have lacked this feature. By leveraging the
XSAVES instruction, it is now possible to snapshot XMM registers for
non-PEBS events as well, completing the feature set.

To utilize the XSAVES instruction, a 64-byte-aligned buffer is required.
A per-CPU ext_regs_buf is added to store SIMD and other registers, with a
buffer size of approximately 2K. The buffer is allocated with
kzalloc_node(); since kmalloc() allocations of power-of-two sizes are
naturally aligned, this guarantees the required 64-byte alignment.

XMM sampling support is extended to both REGS_USER and REGS_INTR. For
REGS_USER, perf_get_regs_user() returns the registers from
task_pt_regs(current), which is a pt_regs structure. They need to be
copied to the user-space-specific x86_user_regs structure, since the
kernel may modify the pt_regs structure later. For PEBS, XMM registers
are retrieved from the PEBS records.

In cases where a userspace task is trapped in kernel mode (e.g., during a
syscall) when an NMI arrives, pt_regs information can still be retrieved
from task_pt_regs(). However, capturing SIMD and other XSAVE-based
registers in this scenario is challenging, so snapshots of these
registers are omitted in such cases. The reasons are:

- Profiling a userspace task that requires SIMD/eGPR registers typically
  involves NMIs hitting userspace, not kernel mode.
- Although it is possible to retrieve values when the TIF_NEED_FPU_LOAD
  flag is set, the complexity introduced to handle this uncommon case in
  the critical path is not justified.
- Additionally, checking the TIF_NEED_FPU_LOAD flag alone is insufficient. Some corner cases, such as an NMI occurring just after the flag switches but still in kernel mode, cannot be handled. Future support for additional vector registers is anticipated. An ext_regs_mask is added to track the supported vector register groups. Signed-off-by: Kan Liang Co-developed-by: Dapeng Mi Signed-off-by: Dapeng Mi --- arch/x86/events/core.c | 175 ++++++++++++++++++++++++++---- arch/x86/events/intel/core.c | 29 ++++- arch/x86/events/intel/ds.c | 20 ++-- arch/x86/events/perf_event.h | 11 +- arch/x86/include/asm/fpu/xstate.h | 2 + arch/x86/include/asm/perf_event.h | 5 +- arch/x86/kernel/fpu/xstate.c | 2 +- 7 files changed, 212 insertions(+), 32 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index dcdd2c2d68ee..0d33668b1927 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -406,6 +406,62 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct perf= _event *event) return x86_pmu_extra_regs(val, event); } =20 +static DEFINE_PER_CPU(struct xregs_state *, ext_regs_buf); + +static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask) +{ + struct xregs_state *xsave =3D per_cpu(ext_regs_buf, smp_processor_id()); + u64 valid_mask =3D x86_pmu.ext_regs_mask & mask; + + if (WARN_ON_ONCE(!xsave)) + return; + + xsaves_nmi(xsave, valid_mask); + + /* Filtered by what XSAVE really gives */ + valid_mask &=3D xsave->header.xfeatures; + + if (valid_mask & XFEATURE_MASK_SSE) + perf_regs->xmm_space =3D xsave->i387.xmm_space; +} + +static void release_ext_regs_buffers(void) +{ + int cpu; + + if (!x86_pmu.ext_regs_mask) + return; + + for_each_possible_cpu(cpu) { + kfree(per_cpu(ext_regs_buf, cpu)); + per_cpu(ext_regs_buf, cpu) =3D NULL; + } +} + +static void reserve_ext_regs_buffers(void) +{ + bool compacted =3D cpu_feature_enabled(X86_FEATURE_XCOMPACTED); + unsigned int size; + int cpu; + + if (!x86_pmu.ext_regs_mask) + return; + + size =3D 
xstate_calculate_size(x86_pmu.ext_regs_mask, compacted); + + for_each_possible_cpu(cpu) { + per_cpu(ext_regs_buf, cpu) =3D kzalloc_node(size, GFP_KERNEL, + cpu_to_node(cpu)); + if (!per_cpu(ext_regs_buf, cpu)) + goto err; + } + + return; + +err: + release_ext_regs_buffers(); +} + int x86_reserve_hardware(void) { int err =3D 0; @@ -418,6 +474,7 @@ int x86_reserve_hardware(void) } else { reserve_ds_buffers(); reserve_lbr_buffers(); + reserve_ext_regs_buffers(); } } if (!err) @@ -434,6 +491,7 @@ void x86_release_hardware(void) release_pmc_hardware(); release_ds_buffers(); release_lbr_buffers(); + release_ext_regs_buffers(); mutex_unlock(&pmc_reserve_mutex); } } @@ -651,19 +709,17 @@ int x86_pmu_hw_config(struct perf_event *event) return -EINVAL; } =20 - /* sample_regs_user never support XMM registers */ - if (unlikely(event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK)) - return -EINVAL; - /* - * Besides the general purpose registers, XMM registers may - * be collected in PEBS on some platforms, e.g. Icelake - */ - if (unlikely(event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK)) { - if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS)) - return -EINVAL; - - if (!event->attr.precise_ip) - return -EINVAL; + if (event->attr.sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_U= SER)) { + /* + * Besides the general purpose registers, XMM registers may + * be collected as well. 
+ */ + if (event_has_extended_regs(event)) { + if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS)) + return -EINVAL; + if (!event->attr.precise_ip) + return -EINVAL; + } } =20 return x86_setup_perfctr(event); @@ -1695,38 +1751,115 @@ static void x86_pmu_del(struct perf_event *event, = int flags) static_call_cond(x86_pmu_del)(event); } =20 -void x86_pmu_setup_regs_data(struct perf_event *event, - struct perf_sample_data *data, - struct pt_regs *regs) +static DEFINE_PER_CPU(struct x86_perf_regs, x86_user_regs); + +static struct x86_perf_regs * +x86_pmu_perf_get_regs_user(struct perf_sample_data *data, + struct pt_regs *regs) +{ + struct x86_perf_regs *x86_regs_user =3D this_cpu_ptr(&x86_user_regs); + struct perf_regs regs_user; + + perf_get_regs_user(®s_user, regs); + data->regs_user.abi =3D regs_user.abi; + if (regs_user.regs) { + x86_regs_user->regs =3D *regs_user.regs; + data->regs_user.regs =3D &x86_regs_user->regs; + } else + data->regs_user.regs =3D NULL; + return x86_regs_user; +} + +static bool x86_pmu_user_req_pt_regs_only(struct perf_event *event) { - u64 sample_type =3D event->attr.sample_type; + return !(event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK); +} + +inline void x86_pmu_clear_perf_regs(struct pt_regs *regs) +{ + struct x86_perf_regs *perf_regs =3D container_of(regs, struct x86_perf_re= gs, regs); + + perf_regs->xmm_regs =3D NULL; +} + +static void x86_pmu_setup_basic_regs_data(struct perf_event *event, + struct perf_sample_data *data, + struct pt_regs *regs) +{ + struct perf_event_attr *attr =3D &event->attr; + u64 sample_type =3D attr->sample_type; + struct x86_perf_regs *perf_regs; + + if (!(attr->sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)= )) + return; =20 if (sample_type & PERF_SAMPLE_REGS_USER) { + perf_regs =3D container_of(regs, struct x86_perf_regs, regs); + if (user_mode(regs)) { data->regs_user.abi =3D perf_reg_abi(current); data->regs_user.regs =3D regs; - } else if (!(current->flags & PF_KTHREAD)) { 
-			perf_get_regs_user(&data->regs_user, regs);
+		} else if (!(current->flags & PF_KTHREAD) &&
+			   x86_pmu_user_req_pt_regs_only(event)) {
+			/*
+			 * It cannot be guaranteed that the kernel will never
+			 * touch the registers outside of the pt_regs,
+			 * especially as more and more registers
+			 * (e.g., SIMD, eGPR) are added, so the live data
+			 * cannot be used.
+			 * Dump the registers when only pt_regs are required.
+			 */
+			perf_regs = x86_pmu_perf_get_regs_user(data, regs);
 		} else {
 			data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
 			data->regs_user.regs = NULL;
 		}
 		data->dyn_size += sizeof(u64);
 		if (data->regs_user.regs)
-			data->dyn_size += hweight64(event->attr.sample_regs_user) * sizeof(u64);
+			data->dyn_size += hweight64(attr->sample_regs_user) * sizeof(u64);
 		data->sample_flags |= PERF_SAMPLE_REGS_USER;
 	}
 
 	if (sample_type & PERF_SAMPLE_REGS_INTR) {
+		perf_regs = container_of(regs, struct x86_perf_regs, regs);
+
 		data->regs_intr.regs = regs;
 		data->regs_intr.abi = perf_reg_abi(current);
 		data->dyn_size += sizeof(u64);
 		if (data->regs_intr.regs)
-			data->dyn_size += hweight64(event->attr.sample_regs_intr) * sizeof(u64);
+			data->dyn_size += hweight64(attr->sample_regs_intr) * sizeof(u64);
 		data->sample_flags |= PERF_SAMPLE_REGS_INTR;
 	}
 }
 
+static void x86_pmu_sample_ext_regs(struct perf_event *event,
+				    struct pt_regs *regs,
+				    u64 ignore_mask)
+{
+	struct x86_perf_regs *perf_regs = container_of(regs, struct x86_perf_regs, regs);
+	u64 mask = 0;
+
+	if (event_has_extended_regs(event))
+		mask |= XFEATURE_MASK_SSE;
+
+	mask &= ~ignore_mask;
+	if (mask)
+		x86_pmu_get_ext_regs(perf_regs, mask);
+}
+
+void x86_pmu_setup_regs_data(struct perf_event *event,
+			     struct perf_sample_data *data,
+			     struct pt_regs *regs,
+			     u64 ignore_mask)
+{
+	x86_pmu_setup_basic_regs_data(event, data, regs);
+	/*
+	 * ignore_mask indicates the extended regs already sampled by PEBS,
+	 * which are unnecessary to sample again.
+ */ + x86_pmu_sample_ext_regs(event, regs, ignore_mask); +} + int x86_pmu_handle_irq(struct pt_regs *regs) { struct perf_sample_data data; diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index 81e6c8bcabde..b5c89e8eabb2 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -3410,6 +3410,9 @@ static int handle_pmi_common(struct pt_regs *regs, u6= 4 status) if (has_branch_stack(event)) intel_pmu_lbr_save_brstack(&data, cpuc, event); =20 + x86_pmu_clear_perf_regs(regs); + x86_pmu_setup_regs_data(event, &data, regs, 0); + perf_event_overflow(event, &data, regs); } =20 @@ -5619,8 +5622,30 @@ static inline void __intel_update_large_pebs_flags(s= truct pmu *pmu) } } =20 -#define counter_mask(_gp, _fixed) ((_gp) | ((u64)(_fixed) << INTEL_PMC_IDX= _FIXED)) +static void intel_extended_regs_init(struct pmu *pmu) +{ + /* + * Extend the vector registers support to non-PEBS. + * The feature is limited to newer Intel machines with + * PEBS V4+ or archPerfmonExt (0x23) enabled for now. + * In theory, the vector registers can be retrieved as + * long as the CPU supports. The support for the old + * generations may be added later if there is a + * requirement. + * Only support the extension when XSAVES is available. 
+ */ + if (!boot_cpu_has(X86_FEATURE_XSAVES)) + return; =20 + if (!boot_cpu_has(X86_FEATURE_XMM) || + !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL)) + return; + + x86_pmu.ext_regs_mask |=3D XFEATURE_MASK_SSE; + x86_get_pmu(smp_processor_id())->capabilities |=3D PERF_PMU_CAP_EXTENDED_= REGS; +} + +#define counter_mask(_gp, _fixed) ((_gp) | ((u64)(_fixed) << INTEL_PMC_IDX= _FIXED)) static void update_pmu_cap(struct pmu *pmu) { unsigned int eax, ebx, ecx, edx; @@ -5682,6 +5707,8 @@ static void update_pmu_cap(struct pmu *pmu) /* Perf Metric (Bit 15) and PEBS via PT (Bit 16) are hybrid enumeration = */ rdmsrq(MSR_IA32_PERF_CAPABILITIES, hybrid(pmu, intel_cap).capabilities); } + + intel_extended_regs_init(pmu); } =20 static void intel_pmu_check_hybrid_pmus(struct x86_hybrid_pmu *pmu) diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index c7351f476d8c..af462f69cd1c 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1473,8 +1473,7 @@ static u64 pebs_update_adaptive_cfg(struct perf_event= *event) if (gprs || (attr->precise_ip < 2) || tsx_weight) pebs_data_cfg |=3D PEBS_DATACFG_GP; =20 - if ((sample_type & PERF_SAMPLE_REGS_INTR) && - (attr->sample_regs_intr & PERF_REG_EXTENDED_MASK)) + if (event_has_extended_regs(event)) pebs_data_cfg |=3D PEBS_DATACFG_XMMS; =20 if (sample_type & PERF_SAMPLE_BRANCH_STACK) { @@ -2190,10 +2189,8 @@ static inline void __setup_pebs_gpr_group(struct per= f_event *event, regs->flags &=3D ~PERF_EFLAGS_EXACT; } =20 - if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) { + if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) adaptive_pebs_save_regs(regs, gprs); - x86_pmu_setup_regs_data(event, data, regs); - } } =20 static inline void __setup_pebs_meminfo_group(struct perf_event *event, @@ -2251,6 +2248,7 @@ static void setup_pebs_adaptive_sample_data(struct pe= rf_event *event, struct pebs_meminfo *meminfo =3D NULL; struct pebs_gprs *gprs =3D NULL; struct x86_perf_regs 
*perf_regs; + u64 ignore_mask =3D 0; u64 format_group; u16 retire; =20 @@ -2258,7 +2256,7 @@ static void setup_pebs_adaptive_sample_data(struct pe= rf_event *event, return; =20 perf_regs =3D container_of(regs, struct x86_perf_regs, regs); - perf_regs->xmm_regs =3D NULL; + x86_pmu_clear_perf_regs(regs); =20 format_group =3D basic->format_group; =20 @@ -2305,6 +2303,7 @@ static void setup_pebs_adaptive_sample_data(struct pe= rf_event *event, if (format_group & PEBS_DATACFG_XMMS) { struct pebs_xmm *xmm =3D next_record; =20 + ignore_mask |=3D XFEATURE_MASK_SSE; next_record =3D xmm + 1; perf_regs->xmm_regs =3D xmm->xmm; } @@ -2343,6 +2342,8 @@ static void setup_pebs_adaptive_sample_data(struct pe= rf_event *event, next_record +=3D nr * sizeof(u64); } =20 + x86_pmu_setup_regs_data(event, data, regs, ignore_mask); + WARN_ONCE(next_record !=3D __pebs + basic->format_size, "PEBS record size %u, expected %llu, config %llx\n", basic->format_size, @@ -2368,6 +2369,7 @@ static void setup_arch_pebs_sample_data(struct perf_e= vent *event, struct arch_pebs_aux *meminfo =3D NULL; struct arch_pebs_gprs *gprs =3D NULL; struct x86_perf_regs *perf_regs; + u64 ignore_mask =3D 0; void *next_record; void *at =3D __pebs; =20 @@ -2375,7 +2377,7 @@ static void setup_arch_pebs_sample_data(struct perf_e= vent *event, return; =20 perf_regs =3D container_of(regs, struct x86_perf_regs, regs); - perf_regs->xmm_regs =3D NULL; + x86_pmu_clear_perf_regs(regs); =20 __setup_perf_sample_data(event, iregs, data); =20 @@ -2430,6 +2432,7 @@ static void setup_arch_pebs_sample_data(struct perf_e= vent *event, =20 next_record +=3D sizeof(struct arch_pebs_xer_header); =20 + ignore_mask |=3D XFEATURE_MASK_SSE; xmm =3D next_record; perf_regs->xmm_regs =3D xmm->xmm; next_record =3D xmm + 1; @@ -2477,6 +2480,8 @@ static void setup_arch_pebs_sample_data(struct perf_e= vent *event, at =3D at + header->size; goto again; } + + x86_pmu_setup_regs_data(event, data, regs, ignore_mask); } =20 static inline void * @@ 
-3137,6 +3142,7 @@ static void __init intel_ds_pebs_init(void) x86_pmu.flags |=3D PMU_FL_PEBS_ALL; x86_pmu.pebs_capable =3D ~0ULL; pebs_qual =3D "-baseline"; + x86_pmu.ext_regs_mask |=3D XFEATURE_MASK_SSE; x86_get_pmu(smp_processor_id())->capabilities |=3D PERF_PMU_CAP_EXTEND= ED_REGS; } else { /* Only basic record supported */ diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 80e52e937638..3c470d79aa65 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -1009,6 +1009,12 @@ struct x86_pmu { struct extra_reg *extra_regs; unsigned int flags; =20 + /* + * Extended regs, e.g., vector registers + * Utilize the same format as the XFEATURE_MASK_* + */ + u64 ext_regs_mask; + /* * Intel host/guest support (KVM) */ @@ -1294,9 +1300,12 @@ void x86_pmu_enable_event(struct perf_event *event); =20 int x86_pmu_handle_irq(struct pt_regs *regs); =20 +void x86_pmu_clear_perf_regs(struct pt_regs *regs); + void x86_pmu_setup_regs_data(struct perf_event *event, struct perf_sample_data *data, - struct pt_regs *regs); + struct pt_regs *regs, + u64 ignore_mask); =20 void x86_pmu_show_pmu_cap(struct pmu *pmu); =20 diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/x= state.h index 38fa8ff26559..19dec5f0b1c7 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -112,6 +112,8 @@ void xsaves(struct xregs_state *xsave, u64 mask); void xrstors(struct xregs_state *xsave, u64 mask); void xsaves_nmi(struct xregs_state *xsave, u64 mask); =20 +unsigned int xstate_calculate_size(u64 xfeatures, bool compacted); + int xfd_enable_feature(u64 xfd_err); =20 #ifdef CONFIG_X86_64 diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 7276ba70c88a..3b368de9f803 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -704,7 +704,10 @@ extern void perf_events_lapic_init(void); struct pt_regs; struct x86_perf_regs { struct 
pt_regs	regs;
-	u64		*xmm_regs;
+	union {
+		u64	*xmm_regs;
+		u32	*xmm_space;	/* for xsaves */
+	};
 };
 
 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index e3b8afed8b2c..33142bccc075 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -586,7 +586,7 @@ static bool __init check_xstate_against_struct(int nr)
 	return true;
 }
 
-static unsigned int xstate_calculate_size(u64 xfeatures, bool compacted)
+unsigned int xstate_calculate_size(u64 xfeatures, bool compacted)
 {
 	unsigned int topmost = fls64(xfeatures) - 1;
 	unsigned int offset, i;
-- 
2.34.1

From nobody Sat Feb  7 14:28:38 2026
From: Dapeng Mi
Subject: [Patch v5 07/19] perf: Add sampling support for SIMD registers
Date: Wed, 3 Dec 2025 14:54:48 +0800
Message-Id: <20251203065500.2597594-8-dapeng1.mi@linux.intel.com>

From: Kan Liang

Users may be interested in sampling SIMD registers during profiling. The
current sample_regs_* fields do not have sufficient space for all SIMD
registers. To address this, new attribute fields
sample_simd_{pred,vec}_reg_* are added to struct perf_event_attr to
represent the SIMD registers that are expected to be sampled.

Currently, the perf/x86 code supports XMM registers in sample_regs_*. To
unify the configuration of SIMD registers and ensure a consistent method
for configuring XMM and other SIMD registers, a new event attribute
field, sample_simd_regs_enabled, is introduced. When
sample_simd_regs_enabled is set, it indicates that all SIMD registers,
including XMM, are represented by the newly introduced
sample_simd_{pred|vec}_reg_* fields. The original XMM space in
sample_regs_* is reserved for future use.

Since SIMD registers are wider than 64 bits, a new output format is
introduced. The number and width of SIMD registers are dumped first,
followed by the register values. The number and width are based on the
user's configuration. If they differ (e.g., on ARM), an ARCH-specific
perf_output_sample_simd_regs() function can be implemented separately.

A new ABI, PERF_SAMPLE_REGS_ABI_SIMD, is added to indicate the new
format. The enum perf_sample_regs_abi is now a bitmap.
This change should not impact existing tools, as the version and bitmap remain the same for values 1 and 2. Additionally, two new __weak functions are introduced: - perf_simd_reg_value(): Retrieves the value of the requested SIMD register. - perf_simd_reg_validate(): Validates the configuration of the SIMD registers. A new flag, PERF_PMU_CAP_SIMD_REGS, is added to indicate that the PMU supports SIMD register dumping. An error is generated if sample_simd_{pred|vec}_reg_* is mistakenly set for a PMU that does not support this capability. Suggested-by: Peter Zijlstra (Intel) Signed-off-by: Kan Liang Co-developed-by: Dapeng Mi Signed-off-by: Dapeng Mi --- include/linux/perf_event.h | 8 +++ include/linux/perf_regs.h | 4 ++ include/uapi/linux/perf_event.h | 45 ++++++++++++++-- kernel/events/core.c | 96 +++++++++++++++++++++++++++++++-- 4 files changed, 146 insertions(+), 7 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 5153b70d09c8..87d3bdbef30e 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -305,6 +305,7 @@ struct perf_event_pmu_context; #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100 #define PERF_PMU_CAP_AUX_PAUSE 0x0200 #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400 +#define PERF_PMU_CAP_SIMD_REGS 0x0800 =20 /** * pmu::scope @@ -1526,6 +1527,13 @@ perf_event__output_id_sample(struct perf_event *even= t, extern void perf_log_lost_samples(struct perf_event *event, u64 lost); =20 +static inline bool event_has_simd_regs(struct perf_event *event) +{ + struct perf_event_attr *attr =3D &event->attr; + + return attr->sample_simd_regs_enabled !=3D 0; +} + static inline bool event_has_extended_regs(struct perf_event *event) { struct perf_event_attr *attr =3D &event->attr; diff --git a/include/linux/perf_regs.h b/include/linux/perf_regs.h index 144bcc3ff19f..518f28c6a7d4 100644 --- a/include/linux/perf_regs.h +++ b/include/linux/perf_regs.h @@ -14,6 +14,10 @@ int perf_reg_validate(u64 mask); u64 perf_reg_abi(struct 
task_struct *task); void perf_get_regs_user(struct perf_regs *regs_user, struct pt_regs *regs); +int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, + u16 pred_qwords, u32 pred_mask); +u64 perf_simd_reg_value(struct pt_regs *regs, int idx, + u16 qwords_idx, bool pred); =20 #ifdef CONFIG_HAVE_PERF_REGS #include <asm/perf_regs.h> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_even= t.h index d292f96bc06f..f1474da32622 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -314,8 +314,9 @@ enum { */ enum perf_sample_regs_abi { PERF_SAMPLE_REGS_ABI_NONE =3D 0, - PERF_SAMPLE_REGS_ABI_32 =3D 1, - PERF_SAMPLE_REGS_ABI_64 =3D 2, + PERF_SAMPLE_REGS_ABI_32 =3D (1 << 0), + PERF_SAMPLE_REGS_ABI_64 =3D (1 << 1), + PERF_SAMPLE_REGS_ABI_SIMD =3D (1 << 2), }; =20 /* @@ -382,6 +383,7 @@ enum perf_event_read_format { #define PERF_ATTR_SIZE_VER6 120 /* Add: aux_sample_size */ #define PERF_ATTR_SIZE_VER7 128 /* Add: sig_data */ #define PERF_ATTR_SIZE_VER8 136 /* Add: config3 */ +#define PERF_ATTR_SIZE_VER9 168 /* Add: sample_simd_{pred,vec}_reg_* */ =20 /* * 'struct perf_event_attr' contains various attributes that define @@ -545,6 +547,25 @@ struct perf_event_attr { __u64 sig_data; =20 __u64 config3; /* extension of config2 */ + + + /* + * Defines the set of SIMD registers to dump on samples. + * A non-zero sample_simd_regs_enabled implies that the + * sample_simd_{pred,vec}_reg_* fields are used to configure + * all SIMD registers. + * If !sample_simd_regs_enabled, sample_regs_XXX may still be used to + * configure some SIMD registers on x86. 
+ */ + union { + __u16 sample_simd_regs_enabled; + __u16 sample_simd_pred_reg_qwords; + }; + __u32 sample_simd_pred_reg_intr; + __u32 sample_simd_pred_reg_user; + __u16 sample_simd_vec_reg_qwords; + __u64 sample_simd_vec_reg_intr; + __u64 sample_simd_vec_reg_user; + __u32 __reserved_4; }; =20 /* @@ -1018,7 +1039,15 @@ enum perf_event_type { * } && PERF_SAMPLE_BRANCH_STACK * * { u64 abi; # enum perf_sample_regs_abi - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER + * u64 regs[weight(mask)]; + * struct { + * u16 nr_vectors; + * u16 vector_qwords; + * u16 nr_pred; + * u16 pred_qwords; + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords]; + * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD) + * } && PERF_SAMPLE_REGS_USER * * { u64 size; * char data[size]; @@ -1045,7 +1074,15 @@ enum perf_event_type { * { u64 data_src; } && PERF_SAMPLE_DATA_SRC * { u64 transaction; } && PERF_SAMPLE_TRANSACTION * { u64 abi; # enum perf_sample_regs_abi - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR + * u64 regs[weight(mask)]; + * struct { + * u16 nr_vectors; + * u16 vector_qwords; + * u16 nr_pred; + * u16 pred_qwords; + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords]; + * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD) + * } && PERF_SAMPLE_REGS_INTR * { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR * { u64 cgroup;} && PERF_SAMPLE_CGROUP * { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE diff --git a/kernel/events/core.c b/kernel/events/core.c index 3e9c48fa2202..b19de038979e 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7469,6 +7469,50 @@ perf_output_sample_regs(struct perf_output_handle *h= andle, } } =20 +static void +perf_output_sample_simd_regs(struct perf_output_handle *handle, + struct perf_event *event, + struct pt_regs *regs, + u64 mask, u16 pred_mask) +{ + u16 pred_qwords =3D event->attr.sample_simd_pred_reg_qwords; + u16 vec_qwords =3D event->attr.sample_simd_vec_reg_qwords; + u64 pred_bitmap =3D pred_mask; + u64 bitmap =3D mask; + u16 
nr_vectors; + u16 nr_pred; + int bit; + u64 val; + u16 i; + + nr_vectors =3D hweight64(bitmap); + nr_pred =3D hweight64(pred_bitmap); + + perf_output_put(handle, nr_vectors); + perf_output_put(handle, vec_qwords); + perf_output_put(handle, nr_pred); + perf_output_put(handle, pred_qwords); + + if (nr_vectors) { + for_each_set_bit(bit, (unsigned long *)&bitmap, + sizeof(bitmap) * BITS_PER_BYTE) { + for (i =3D 0; i < vec_qwords; i++) { + val =3D perf_simd_reg_value(regs, bit, i, false); + perf_output_put(handle, val); + } + } + } + if (nr_pred) { + for_each_set_bit(bit, (unsigned long *)&pred_bitmap, + sizeof(pred_bitmap) * BITS_PER_BYTE) { + for (i =3D 0; i < pred_qwords; i++) { + val =3D perf_simd_reg_value(regs, bit, i, true); + perf_output_put(handle, val); + } + } + } +} + static void perf_sample_regs_user(struct perf_regs *regs_user, struct pt_regs *regs) { @@ -7490,6 +7534,17 @@ static void perf_sample_regs_intr(struct perf_regs *= regs_intr, regs_intr->abi =3D perf_reg_abi(current); } =20 +int __weak perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, + u16 pred_qwords, u32 pred_mask) +{ + return vec_qwords || vec_mask || pred_qwords || pred_mask ? -ENOSYS : 0; +} + +u64 __weak perf_simd_reg_value(struct pt_regs *regs, int idx, + u16 qwords_idx, bool pred) +{ + return 0; +} =20 /* * Get remaining task size from user stack pointer. 
@@ -8022,10 +8077,17 @@ void perf_output_sample(struct perf_output_handle *= handle, perf_output_put(handle, abi); =20 if (abi) { - u64 mask =3D event->attr.sample_regs_user; + struct perf_event_attr *attr =3D &event->attr; + u64 mask =3D attr->sample_regs_user; perf_output_sample_regs(handle, data->regs_user.regs, mask); + if (abi & PERF_SAMPLE_REGS_ABI_SIMD) { + perf_output_sample_simd_regs(handle, event, + data->regs_user.regs, + attr->sample_simd_vec_reg_user, + attr->sample_simd_pred_reg_user); + } } } =20 @@ -8053,11 +8115,18 @@ void perf_output_sample(struct perf_output_handle *= handle, perf_output_put(handle, abi); =20 if (abi) { - u64 mask =3D event->attr.sample_regs_intr; + struct perf_event_attr *attr =3D &event->attr; + u64 mask =3D attr->sample_regs_intr; =20 perf_output_sample_regs(handle, data->regs_intr.regs, mask); + if (abi & PERF_SAMPLE_REGS_ABI_SIMD) { + perf_output_sample_simd_regs(handle, event, + data->regs_intr.regs, + attr->sample_simd_vec_reg_intr, + attr->sample_simd_pred_reg_intr); + } } } =20 @@ -12697,6 +12766,12 @@ static int perf_try_init_event(struct pmu *pmu, st= ruct perf_event *event) if (ret) goto err_pmu; =20 + if (!(pmu->capabilities & PERF_PMU_CAP_SIMD_REGS) && + event_has_simd_regs(event)) { + ret =3D -EOPNOTSUPP; + goto err_destroy; + } + if (!(pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS) && event_has_extended_regs(event)) { ret =3D -EOPNOTSUPP; @@ -13238,6 +13313,12 @@ static int perf_copy_attr(struct perf_event_attr _= _user *uattr, ret =3D perf_reg_validate(attr->sample_regs_user); if (ret) return ret; + ret =3D perf_simd_reg_validate(attr->sample_simd_vec_reg_qwords, + attr->sample_simd_vec_reg_user, + attr->sample_simd_pred_reg_qwords, + attr->sample_simd_pred_reg_user); + if (ret) + return ret; } =20 if (attr->sample_type & PERF_SAMPLE_STACK_USER) { @@ -13258,8 +13339,17 @@ static int perf_copy_attr(struct perf_event_attr _= _user *uattr, if (!attr->sample_max_stack) attr->sample_max_stack =3D 
sysctl_perf_event_max_stack; =20 - if (attr->sample_type & PERF_SAMPLE_REGS_INTR) + if (attr->sample_type & PERF_SAMPLE_REGS_INTR) { ret =3D perf_reg_validate(attr->sample_regs_intr); + if (ret) + return ret; + ret =3D perf_simd_reg_validate(attr->sample_simd_vec_reg_qwords, + attr->sample_simd_vec_reg_intr, + attr->sample_simd_pred_reg_qwords, + attr->sample_simd_pred_reg_intr); + if (ret) + return ret; + } =20 #ifndef CONFIG_CGROUP_PERF if (attr->sample_type & PERF_SAMPLE_CGROUP) --=20 2.34.1 From nobody Sat Feb 7 14:28:38 2026 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Thomas Gleixner , Dave Hansen , Ian Rogers , Adrian Hunter , Jiri Olsa , Alexander Shishkin , Andi Kleen , Eranian Stephane Cc: Mark Rutland , broonie@kernel.org, Ravi Bangoria , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Kan Liang , Dapeng Mi Subject: [Patch v5 08/19] perf/x86: Enable XMM sampling using sample_simd_vec_reg_* fields Date: Wed, 3 Dec 2025 14:54:49 +0800 Message-Id: <20251203065500.2597594-9-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com> References: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang This patch adds support for sampling XMM registers using the sample_simd_vec_reg_* fields. When sample_simd_regs_enabled is set, the original XMM space in the sample_regs_* fields is treated as reserved. An -EINVAL error is reported to user space if any bit is set in the original XMM space while sample_simd_regs_enabled is set. The perf_reg_value() function requires ABI information to understand the layout of sample_regs. To accommodate this, a new abi field is introduced in struct x86_perf_regs to carry the ABI information. Additionally, the x86-specific perf_simd_reg_value() function is implemented to retrieve the XMM register values. 
Signed-off-by: Kan Liang Co-developed-by: Dapeng Mi Signed-off-by: Dapeng Mi --- arch/x86/events/core.c | 78 ++++++++++++++++++++++++++- arch/x86/events/intel/ds.c | 2 +- arch/x86/events/perf_event.h | 12 +++++ arch/x86/include/asm/perf_event.h | 1 + arch/x86/include/uapi/asm/perf_regs.h | 17 ++++++ arch/x86/kernel/perf_regs.c | 51 +++++++++++++++++- 6 files changed, 158 insertions(+), 3 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 0d33668b1927..8f7e7e81daaf 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -719,6 +719,22 @@ int x86_pmu_hw_config(struct perf_event *event) return -EINVAL; if (!event->attr.precise_ip) return -EINVAL; + if (event->attr.sample_simd_regs_enabled) + return -EINVAL; + } + + if (event_has_simd_regs(event)) { + if (!(event->pmu->capabilities & PERF_PMU_CAP_SIMD_REGS)) + return -EINVAL; + /* A register width is set but no vector registers are requested */ + if (event->attr.sample_simd_vec_reg_qwords && + !event->attr.sample_simd_vec_reg_intr && + !event->attr.sample_simd_vec_reg_user) + return -EINVAL; + /* The requested vector register set is not supported */ + if (event_needs_xmm(event) && + !(x86_pmu.ext_regs_mask & XFEATURE_MASK_SSE)) + return -EINVAL; } } =20 @@ -1760,6 +1776,7 @@ x86_pmu_perf_get_regs_user(struct perf_sample_data *d= ata, struct x86_perf_regs *x86_regs_user =3D this_cpu_ptr(&x86_user_regs); struct perf_regs regs_user; =20 + x86_regs_user->abi =3D PERF_SAMPLE_REGS_ABI_NONE; perf_get_regs_user(&regs_user, regs); data->regs_user.abi =3D regs_user.abi; if (regs_user.regs) { @@ -1772,9 +1789,26 @@ x86_pmu_perf_get_regs_user(struct perf_sample_data *= data, =20 static bool x86_pmu_user_req_pt_regs_only(struct perf_event *event) { + if (event->attr.sample_simd_regs_enabled) + return false; return !(event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK); } =20 +static inline void +x86_pmu_update_ext_regs_size(struct perf_event_attr *attr, + struct perf_sample_data *data, + struct pt_regs *regs, + u64 
mask, u16 pred_mask) +{ + u16 pred_qwords =3D attr->sample_simd_pred_reg_qwords; + u16 vec_qwords =3D attr->sample_simd_vec_reg_qwords; + u64 pred_bitmap =3D pred_mask; + u64 bitmap =3D mask; + + data->dyn_size +=3D (hweight64(bitmap) * vec_qwords + + hweight64(pred_bitmap) * pred_qwords) * sizeof(u64); +} + inline void x86_pmu_clear_perf_regs(struct pt_regs *regs) { struct x86_perf_regs *perf_regs =3D container_of(regs, struct x86_perf_re= gs, regs); @@ -1795,6 +1829,7 @@ static void x86_pmu_setup_basic_regs_data(struct perf= _event *event, =20 if (sample_type & PERF_SAMPLE_REGS_USER) { perf_regs =3D container_of(regs, struct x86_perf_regs, regs); + perf_regs->abi =3D PERF_SAMPLE_REGS_ABI_NONE; =20 if (user_mode(regs)) { data->regs_user.abi =3D perf_reg_abi(current); @@ -1817,17 +1852,24 @@ static void x86_pmu_setup_basic_regs_data(struct pe= rf_event *event, data->dyn_size +=3D sizeof(u64); if (data->regs_user.regs) data->dyn_size +=3D hweight64(attr->sample_regs_user) * sizeof(u64); + perf_regs->abi |=3D data->regs_user.abi; + if (attr->sample_simd_regs_enabled) + perf_regs->abi |=3D PERF_SAMPLE_REGS_ABI_SIMD; data->sample_flags |=3D PERF_SAMPLE_REGS_USER; } =20 if (sample_type & PERF_SAMPLE_REGS_INTR) { perf_regs =3D container_of(regs, struct x86_perf_regs, regs); + perf_regs->abi =3D PERF_SAMPLE_REGS_ABI_NONE; =20 data->regs_intr.regs =3D regs; data->regs_intr.abi =3D perf_reg_abi(current); data->dyn_size +=3D sizeof(u64); if (data->regs_intr.regs) data->dyn_size +=3D hweight64(attr->sample_regs_intr) * sizeof(u64); + perf_regs->abi |=3D data->regs_intr.abi; + if (attr->sample_simd_regs_enabled) + perf_regs->abi |=3D PERF_SAMPLE_REGS_ABI_SIMD; data->sample_flags |=3D PERF_SAMPLE_REGS_INTR; } } @@ -1839,7 +1881,7 @@ static void x86_pmu_sample_ext_regs(struct perf_event= *event, struct x86_perf_regs *perf_regs =3D container_of(regs, struct x86_perf_re= gs, regs); u64 mask =3D 0; =20 - if (event_has_extended_regs(event)) + if (event_needs_xmm(event)) mask |=3D 
XFEATURE_MASK_SSE; =20 mask &=3D ~ignore_mask; @@ -1847,6 +1889,39 @@ static void x86_pmu_sample_ext_regs(struct perf_even= t *event, x86_pmu_get_ext_regs(perf_regs, mask); } =20 +static void x86_pmu_setup_extended_regs_data(struct perf_event *event, + struct perf_sample_data *data, + struct pt_regs *regs) +{ + struct perf_event_attr *attr =3D &event->attr; + u64 sample_type =3D attr->sample_type; + + if (!attr->sample_simd_regs_enabled) + return; + + if (!(attr->sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)= )) + return; + + /* Update the data[] size */ + if (sample_type & PERF_SAMPLE_REGS_USER && data->regs_user.abi) { + /* num and qwords of vector and pred registers */ + data->dyn_size +=3D sizeof(u64); + data->regs_user.abi |=3D PERF_SAMPLE_REGS_ABI_SIMD; + x86_pmu_update_ext_regs_size(attr, data, data->regs_user.regs, + attr->sample_simd_vec_reg_user, + attr->sample_simd_pred_reg_user); + } + + if (sample_type & PERF_SAMPLE_REGS_INTR && data->regs_intr.abi) { + /* num and qwords of vector and pred registers */ + data->dyn_size +=3D sizeof(u64); + data->regs_intr.abi |=3D PERF_SAMPLE_REGS_ABI_SIMD; + x86_pmu_update_ext_regs_size(attr, data, data->regs_intr.regs, + attr->sample_simd_vec_reg_intr, + attr->sample_simd_pred_reg_intr); + } +} + void x86_pmu_setup_regs_data(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs, @@ -1858,6 +1933,7 @@ void x86_pmu_setup_regs_data(struct perf_event *event, * which is unnecessary to sample again. 
*/ x86_pmu_sample_ext_regs(event, regs, ignore_mask); + x86_pmu_setup_extended_regs_data(event, data, regs); } =20 int x86_pmu_handle_irq(struct pt_regs *regs) diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index af462f69cd1c..79cba323eeb1 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1473,7 +1473,7 @@ static u64 pebs_update_adaptive_cfg(struct perf_event= *event) if (gprs || (attr->precise_ip < 2) || tsx_weight) pebs_data_cfg |=3D PEBS_DATACFG_GP; =20 - if (event_has_extended_regs(event)) + if (event_needs_xmm(event)) pebs_data_cfg |=3D PEBS_DATACFG_XMMS; =20 if (sample_type & PERF_SAMPLE_BRANCH_STACK) { diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 3c470d79aa65..e5d8ad024553 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -133,6 +133,18 @@ static inline bool is_acr_event_group(struct perf_even= t *event) return check_leader_group(event->group_leader, PERF_X86_EVENT_ACR); } =20 +static inline bool event_needs_xmm(struct perf_event *event) +{ + if (event->attr.sample_simd_regs_enabled && + event->attr.sample_simd_vec_reg_qwords >=3D PERF_X86_XMM_QWORDS) + return true; + + if (!event->attr.sample_simd_regs_enabled && + event_has_extended_regs(event)) + return true; + return false; +} + struct amd_nb { int nb_id; /* NorthBridge id */ int refcnt; /* reference count */ diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 3b368de9f803..5d623805bf87 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -704,6 +704,7 @@ extern void perf_events_lapic_init(void); struct pt_regs; struct x86_perf_regs { struct pt_regs regs; + u64 abi; union { u64 *xmm_regs; u32 *xmm_space; /* for xsaves */ diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/= asm/perf_regs.h index 7c9d2bb3833b..c3862e5fdd6d 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ 
b/arch/x86/include/uapi/asm/perf_regs.h @@ -55,4 +55,21 @@ enum perf_event_x86_regs { =20 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) =20 +enum { + PERF_REG_X86_XMM, + PERF_REG_X86_MAX_SIMD_REGS, +}; + +enum { + PERF_X86_SIMD_XMM_REGS =3D 16, + PERF_X86_SIMD_VEC_REGS_MAX =3D PERF_X86_SIMD_XMM_REGS, +}; + +#define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1= , 0) + +enum { + PERF_X86_XMM_QWORDS =3D 2, + PERF_X86_SIMD_QWORDS_MAX =3D PERF_X86_XMM_QWORDS, +}; + #endif /* _ASM_X86_PERF_REGS_H */ diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index 81204cb7f723..9947a6b5c260 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -63,6 +63,9 @@ u64 perf_reg_value(struct pt_regs *regs, int idx) =20 if (idx >=3D PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) { perf_regs =3D container_of(regs, struct x86_perf_regs, regs); + /* SIMD registers are moved to dedicated sample_simd_vec_reg */ + if (perf_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) + return 0; if (!perf_regs->xmm_regs) return 0; return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0]; @@ -74,6 +77,51 @@ u64 perf_reg_value(struct pt_regs *regs, int idx) return regs_get_register(regs, pt_regs_offset[idx]); } =20 +u64 perf_simd_reg_value(struct pt_regs *regs, int idx, + u16 qwords_idx, bool pred) +{ + struct x86_perf_regs *perf_regs =3D + container_of(regs, struct x86_perf_regs, regs); + + if (pred) + return 0; + + if (WARN_ON_ONCE(idx >=3D PERF_X86_SIMD_VEC_REGS_MAX || + qwords_idx >=3D PERF_X86_SIMD_QWORDS_MAX)) + return 0; + + if (qwords_idx < PERF_X86_XMM_QWORDS) { + if (!perf_regs->xmm_regs) + return 0; + return perf_regs->xmm_regs[idx * PERF_X86_XMM_QWORDS + + qwords_idx]; + } + + return 0; +} + +int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, + u16 pred_qwords, u32 pred_mask) +{ + /* pred_qwords implies sample_simd_{pred,vec}_reg_* are supported */ + if (!pred_qwords) + return 0; + + if (!vec_qwords) { + if (vec_mask) 
+ return -EINVAL; + } else { + if (vec_qwords !=3D PERF_X86_XMM_QWORDS) + return -EINVAL; + if (vec_mask & ~PERF_X86_SIMD_VEC_MASK) + return -EINVAL; + } + if (pred_mask) + return -EINVAL; + + return 0; +} + #define PERF_REG_X86_RESERVED (((1ULL << PERF_REG_X86_XMM0) - 1) & \ ~((1ULL << PERF_REG_X86_MAX) - 1)) =20 @@ -108,7 +156,8 @@ u64 perf_reg_abi(struct task_struct *task) =20 int perf_reg_validate(u64 mask) { - if (!mask || (mask & (REG_NOSUPPORT | PERF_REG_X86_RESERVED))) + /* The mask could be 0 if only the SIMD registers are of interest */ + if (mask & (REG_NOSUPPORT | PERF_REG_X86_RESERVED)) return -EINVAL; =20 return 0; --=20 2.34.1 From nobody Sat Feb 7 14:28:38 2026 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Thomas Gleixner , Dave Hansen , Ian Rogers , Adrian Hunter , Jiri Olsa , Alexander Shishkin , Andi Kleen , Eranian Stephane Cc: Mark Rutland , broonie@kernel.org, Ravi Bangoria , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, 
Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Kan Liang , Dapeng Mi Subject: [Patch v5 09/19] perf/x86: Enable YMM sampling using sample_simd_vec_reg_* fields Date: Wed, 3 Dec 2025 14:54:50 +0800 Message-Id: <20251203065500.2597594-10-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com> References: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang This patch introduces support for sampling YMM registers via the sample_simd_vec_reg_* fields. Each YMM register consists of 4 u64 words, assembled from two halves: XMM (the lower 2 u64 words) and YMMH (the upper 2 u64 words). Although both XMM and YMMH data can be retrieved with a single xsaves instruction, they are stored in separate locations. The perf_simd_reg_value() function is responsible for assembling these halves into a complete YMM register for output to userspace. Additionally, sample_simd_vec_reg_qwords should be set to 4 to indicate YMM sampling. 
Signed-off-by: Kan Liang Co-developed-by: Dapeng Mi Signed-off-by: Dapeng Mi --- arch/x86/events/core.c | 9 +++++++++ arch/x86/events/perf_event.h | 9 +++++++++ arch/x86/include/asm/perf_event.h | 4 ++++ arch/x86/include/uapi/asm/perf_regs.h | 8 ++++++-- arch/x86/kernel/perf_regs.c | 8 +++++++- 5 files changed, 35 insertions(+), 3 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 8f7e7e81daaf..b1e62c061d9e 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -423,6 +423,9 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *= perf_regs, u64 mask) =20 if (valid_mask & XFEATURE_MASK_SSE) perf_regs->xmm_space =3D xsave->i387.xmm_space; + + if (valid_mask & XFEATURE_MASK_YMM) + perf_regs->ymmh =3D get_xsave_addr(xsave, XFEATURE_YMM); } =20 static void release_ext_regs_buffers(void) @@ -735,6 +738,9 @@ int x86_pmu_hw_config(struct perf_event *event) if (event_needs_xmm(event) && !(x86_pmu.ext_regs_mask & XFEATURE_MASK_SSE)) return -EINVAL; + if (event_needs_ymm(event) && + !(x86_pmu.ext_regs_mask & XFEATURE_MASK_YMM)) + return -EINVAL; } } =20 @@ -1814,6 +1820,7 @@ inline void x86_pmu_clear_perf_regs(struct pt_regs *r= egs) struct x86_perf_regs *perf_regs =3D container_of(regs, struct x86_perf_re= gs, regs); =20 perf_regs->xmm_regs =3D NULL; + perf_regs->ymmh_regs =3D NULL; } =20 static void x86_pmu_setup_basic_regs_data(struct perf_event *event, @@ -1883,6 +1890,8 @@ static void x86_pmu_sample_ext_regs(struct perf_event= *event, =20 if (event_needs_xmm(event)) mask |=3D XFEATURE_MASK_SSE; + if (event_needs_ymm(event)) + mask |=3D XFEATURE_MASK_YMM; =20 mask &=3D ~ignore_mask; if (mask) diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index e5d8ad024553..3d4577a1bb7d 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -145,6 +145,15 @@ static inline bool event_needs_xmm(struct perf_event *= event) return false; } =20 +static inline bool event_needs_ymm(struct perf_event 
*event) +{ + if (event->attr.sample_simd_regs_enabled && + event->attr.sample_simd_vec_reg_qwords >=3D PERF_X86_YMM_QWORDS) + return true; + + return false; +} + struct amd_nb { int nb_id; /* NorthBridge id */ int refcnt; /* reference count */ diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_= event.h index 5d623805bf87..25f5ae60f72f 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -709,6 +709,10 @@ struct x86_perf_regs { u64 *xmm_regs; u32 *xmm_space; /* for xsaves */ }; + union { + u64 *ymmh_regs; + struct ymmh_struct *ymmh; + }; }; =20 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs); diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/= asm/perf_regs.h index c3862e5fdd6d..4fd598785f6d 100644 --- a/arch/x86/include/uapi/asm/perf_regs.h +++ b/arch/x86/include/uapi/asm/perf_regs.h @@ -57,19 +57,23 @@ enum perf_event_x86_regs { =20 enum { PERF_REG_X86_XMM, + PERF_REG_X86_YMM, PERF_REG_X86_MAX_SIMD_REGS, }; =20 enum { PERF_X86_SIMD_XMM_REGS =3D 16, - PERF_X86_SIMD_VEC_REGS_MAX =3D PERF_X86_SIMD_XMM_REGS, + PERF_X86_SIMD_YMM_REGS =3D 16, + PERF_X86_SIMD_VEC_REGS_MAX =3D PERF_X86_SIMD_YMM_REGS, }; =20 #define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1= , 0) =20 enum { PERF_X86_XMM_QWORDS =3D 2, - PERF_X86_SIMD_QWORDS_MAX =3D PERF_X86_XMM_QWORDS, + PERF_X86_YMMH_QWORDS =3D 2, + PERF_X86_YMM_QWORDS =3D 4, + PERF_X86_SIMD_QWORDS_MAX =3D PERF_X86_YMM_QWORDS, }; =20 #endif /* _ASM_X86_PERF_REGS_H */ diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index 9947a6b5c260..8aa61a18fd71 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -95,6 +95,11 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx, return 0; return perf_regs->xmm_regs[idx * PERF_X86_XMM_QWORDS + qwords_idx]; + } else if (qwords_idx < PERF_X86_YMM_QWORDS) { + if (!perf_regs->ymmh_regs) + return 0; + return 
perf_regs->ymmh_regs[idx * PERF_X86_YMMH_QWORDS + + qwords_idx - PERF_X86_XMM_QWORDS]; } =20 return 0; @@ -111,7 +116,8 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask, if (vec_mask) return -EINVAL; } else { - if (vec_qwords !=3D PERF_X86_XMM_QWORDS) + if (vec_qwords !=3D PERF_X86_XMM_QWORDS && + vec_qwords !=3D PERF_X86_YMM_QWORDS) return -EINVAL; if (vec_mask & ~PERF_X86_SIMD_VEC_MASK) return -EINVAL; --=20 2.34.1 From nobody Sat Feb 7 14:28:38 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0DC5929B766; Wed, 3 Dec 2025 06:58:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.8 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764745133; cv=none; b=sGRCPO1Gq6gOFdh6PqyazHFhrO8ConF5UN8hqjNd+Kip04SdTqdxV0aK9S3mxZgNpwD2sMP6T0kFpn2zZzExiOpaJAntspyQoLFOhlP9dUP3AxL/J1g6O9pMabUg3JanxXzeqj9PC3ytB1Rirbs9HSekPVDD7H8ogmgQ9WiIFDQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764745133; c=relaxed/simple; bh=5baxVJIuW2bmEUDAuOVJ1r4h89h+uSYhiPFa7r9dWvw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=HoEjRrt4uCIY2sPrAmE+iC8waTLsjyMvbXLdBvS0s42nk2BbpPKTKgUZ97suZviLVgE/dgn/rXdDEcPWEran0u3y39ElAuANL15p+H06ABbMcrh9EisqOzn+elQyQ6xe1VN7A7sNRUzv2DEtEujnzChs/twaNwTY/dnqTlqY94M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=n3b1nlxD; arc=none smtp.client-ip=192.198.163.8 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com 
From: Dapeng Mi
To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Thomas Gleixner , Dave Hansen , Ian Rogers , Adrian Hunter , Jiri Olsa , Alexander Shishkin , Andi Kleen , Eranian Stephane
Cc: Mark Rutland , broonie@kernel.org, Ravi Bangoria , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Kan Liang , Dapeng Mi
Subject: [Patch v5 10/19] perf/x86: Enable ZMM sampling using sample_simd_vec_reg_* fields
Date: Wed, 3 Dec 2025 14:54:51 +0800
Message-Id:
<20251203065500.2597594-11-dapeng1.mi@linux.intel.com>
In-Reply-To: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com>
References: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com>

From: Kan Liang

This patch adds support for sampling ZMM registers via the
sample_simd_vec_reg_* fields. Each ZMM register consists of 8 u64 qwords,
and current x86 hardware supports up to 32 ZMM registers.

ZMM0-ZMM15 are assembled from three parts: XMM (the lower 2 qwords),
YMMH (the middle 2 qwords), and ZMMH (the upper 4 qwords). The
perf_simd_reg_value() function assembles these three parts into a
complete ZMM register before outputting it to userspace. ZMM16-ZMM31 can
each be read as a whole and output directly to userspace.

Additionally, sample_simd_vec_reg_qwords must be set to 8 to indicate
ZMM sampling.
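The qword-selection scheme described above can be sketched in plain C. This is an illustrative model only, not the kernel code: the flat arrays and function name are assumptions of this sketch, while the strides follow the constants the patch introduces (2 qwords per XMM/YMMH entry, 4 per ZMMH entry, and whole 8-qword entries for ZMM16-ZMM31):

```c
#include <assert.h>
#include <stdint.h>

/* Strides mirroring PERF_X86_*_QWORDS in the patch. */
#define XMM_QWORDS   2
#define YMM_QWORDS   4
#define ZMM_QWORDS   8
#define H16ZMM_BASE 16  /* first register stored whole (ZMM16) */

/*
 * Return qword q (0..7) of ZMM register idx (0..31), given the four
 * XSAVE component arrays: xmm/ymmh/zmmh cover ZMM0-ZMM15 in pieces,
 * h16zmm holds ZMM16-ZMM31 as whole 8-qword entries.
 */
static uint64_t zmm_qword(const uint64_t *xmm, const uint64_t *ymmh,
			  const uint64_t *zmmh, const uint64_t *h16zmm,
			  int idx, int q)
{
	if (idx >= H16ZMM_BASE)		/* ZMM16-ZMM31: stored whole */
		return h16zmm[(idx - H16ZMM_BASE) * ZMM_QWORDS + q];
	if (q < XMM_QWORDS)		/* bits 0..127: XMM part */
		return xmm[idx * XMM_QWORDS + q];
	if (q < YMM_QWORDS)		/* bits 128..255: YMMH part */
		return ymmh[idx * XMM_QWORDS + q - XMM_QWORDS];
	/* bits 256..511: ZMMH part */
	return zmmh[idx * YMM_QWORDS + q - YMM_QWORDS];
}
```

Note the per-register strides differ because each component only stores its slice: the YMMH array advances 2 qwords per register, the ZMMH array 4.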
Signed-off-by: Kan Liang
Co-developed-by: Dapeng Mi
Signed-off-by: Dapeng Mi
---
 arch/x86/events/core.c                | 16 ++++++++++++++++
 arch/x86/events/perf_event.h          | 19 +++++++++++++++++++
 arch/x86/include/asm/perf_event.h     |  8 ++++++++
 arch/x86/include/uapi/asm/perf_regs.h | 11 +++++++++--
 arch/x86/kernel/perf_regs.c           | 15 ++++++++++++++-
 5 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index b1e62c061d9e..d9c2cab5dcb9 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -426,6 +426,10 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
 
 	if (valid_mask & XFEATURE_MASK_YMM)
 		perf_regs->ymmh = get_xsave_addr(xsave, XFEATURE_YMM);
+	if (valid_mask & XFEATURE_MASK_ZMM_Hi256)
+		perf_regs->zmmh = get_xsave_addr(xsave, XFEATURE_ZMM_Hi256);
+	if (valid_mask & XFEATURE_MASK_Hi16_ZMM)
+		perf_regs->h16zmm = get_xsave_addr(xsave, XFEATURE_Hi16_ZMM);
 }
 
 static void release_ext_regs_buffers(void)
@@ -741,6 +745,12 @@ int x86_pmu_hw_config(struct perf_event *event)
 			if (event_needs_ymm(event) &&
 			    !(x86_pmu.ext_regs_mask & XFEATURE_MASK_YMM))
 				return -EINVAL;
+			if (event_needs_low16_zmm(event) &&
+			    !(x86_pmu.ext_regs_mask & XFEATURE_MASK_ZMM_Hi256))
+				return -EINVAL;
+			if (event_needs_high16_zmm(event) &&
+			    !(x86_pmu.ext_regs_mask & XFEATURE_MASK_Hi16_ZMM))
+				return -EINVAL;
 		}
 	}
 
@@ -1821,6 +1831,8 @@ inline void x86_pmu_clear_perf_regs(struct pt_regs *regs)
 
 	perf_regs->xmm_regs = NULL;
 	perf_regs->ymmh_regs = NULL;
+	perf_regs->zmmh_regs = NULL;
+	perf_regs->h16zmm_regs = NULL;
 }
 
 static void x86_pmu_setup_basic_regs_data(struct perf_event *event,
@@ -1892,6 +1904,10 @@ static void x86_pmu_sample_ext_regs(struct perf_event *event,
 		mask |= XFEATURE_MASK_SSE;
 	if (event_needs_ymm(event))
 		mask |= XFEATURE_MASK_YMM;
+	if (event_needs_low16_zmm(event))
+		mask |= XFEATURE_MASK_ZMM_Hi256;
+	if (event_needs_high16_zmm(event))
+		mask |= XFEATURE_MASK_Hi16_ZMM;
 
 	mask &= ~ignore_mask;
 	if (mask)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 3d4577a1bb7d..9a871809a4aa 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -154,6 +154,25 @@ static inline bool event_needs_ymm(struct perf_event *event)
 	return false;
 }
 
+static inline bool event_needs_low16_zmm(struct perf_event *event)
+{
+	if (event->attr.sample_simd_regs_enabled &&
+	    event->attr.sample_simd_vec_reg_qwords >= PERF_X86_ZMM_QWORDS)
+		return true;
+
+	return false;
+}
+
+static inline bool event_needs_high16_zmm(struct perf_event *event)
+{
+	if (event->attr.sample_simd_regs_enabled &&
+	    (fls64(event->attr.sample_simd_vec_reg_intr) > PERF_X86_H16ZMM_BASE ||
+	     fls64(event->attr.sample_simd_vec_reg_user) > PERF_X86_H16ZMM_BASE))
+		return true;
+
+	return false;
+}
+
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
 	int refcnt; /* reference count */
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 25f5ae60f72f..e4d9a8ba3e95 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -713,6 +713,14 @@ struct x86_perf_regs {
 		u64 *ymmh_regs;
 		struct ymmh_struct *ymmh;
 	};
+	union {
+		u64 *zmmh_regs;
+		struct avx_512_zmm_uppers_state *zmmh;
+	};
+	union {
+		u64 *h16zmm_regs;
+		struct avx_512_hi16_state *h16zmm;
+	};
 };
 
 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs);
diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/asm/perf_regs.h
index 4fd598785f6d..96db454c7923 100644
--- a/arch/x86/include/uapi/asm/perf_regs.h
+++ b/arch/x86/include/uapi/asm/perf_regs.h
@@ -58,22 +58,29 @@ enum {
 	PERF_REG_X86_XMM,
 	PERF_REG_X86_YMM,
+	PERF_REG_X86_ZMM,
 	PERF_REG_X86_MAX_SIMD_REGS,
 };
 
 enum {
 	PERF_X86_SIMD_XMM_REGS		= 16,
 	PERF_X86_SIMD_YMM_REGS		= 16,
-	PERF_X86_SIMD_VEC_REGS_MAX	= PERF_X86_SIMD_YMM_REGS,
+	PERF_X86_SIMD_ZMMH_REGS		= 16,
+
	PERF_X86_SIMD_ZMM_REGS		= 32,
+	PERF_X86_SIMD_VEC_REGS_MAX	= PERF_X86_SIMD_ZMM_REGS,
 };
 
 #define PERF_X86_SIMD_VEC_MASK	GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1, 0)
 
+#define PERF_X86_H16ZMM_BASE	PERF_X86_SIMD_ZMMH_REGS
+
 enum {
 	PERF_X86_XMM_QWORDS		= 2,
 	PERF_X86_YMMH_QWORDS		= 2,
 	PERF_X86_YMM_QWORDS		= 4,
-	PERF_X86_SIMD_QWORDS_MAX	= PERF_X86_YMM_QWORDS,
+	PERF_X86_ZMMH_QWORDS		= 4,
+	PERF_X86_ZMM_QWORDS		= 8,
+	PERF_X86_SIMD_QWORDS_MAX	= PERF_X86_ZMM_QWORDS,
 };
 
 #endif /* _ASM_X86_PERF_REGS_H */
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index 8aa61a18fd71..0a3ffaaea3aa 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -90,6 +90,13 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx,
 			 qwords_idx >= PERF_X86_SIMD_QWORDS_MAX))
 		return 0;
 
+	if (idx >= PERF_X86_H16ZMM_BASE) {
+		if (!perf_regs->h16zmm_regs)
+			return 0;
+		return perf_regs->h16zmm_regs[(idx - PERF_X86_H16ZMM_BASE) *
+					      PERF_X86_ZMM_QWORDS + qwords_idx];
+	}
+
 	if (qwords_idx < PERF_X86_XMM_QWORDS) {
 		if (!perf_regs->xmm_regs)
 			return 0;
@@ -100,6 +107,11 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx,
 			return 0;
 		return perf_regs->ymmh_regs[idx * PERF_X86_YMMH_QWORDS +
 					    qwords_idx - PERF_X86_XMM_QWORDS];
+	} else if (qwords_idx < PERF_X86_ZMM_QWORDS) {
+		if (!perf_regs->zmmh_regs)
+			return 0;
+		return perf_regs->zmmh_regs[idx * PERF_X86_ZMMH_QWORDS +
+					    qwords_idx - PERF_X86_YMM_QWORDS];
 	}
 
 	return 0;
@@ -117,7 +129,8 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask,
 			return -EINVAL;
 	} else {
 		if (vec_qwords != PERF_X86_XMM_QWORDS &&
-		    vec_qwords != PERF_X86_YMM_QWORDS)
+		    vec_qwords != PERF_X86_YMM_QWORDS &&
+		    vec_qwords != PERF_X86_ZMM_QWORDS)
 			return -EINVAL;
 		if (vec_mask & ~PERF_X86_SIMD_VEC_MASK)
 			return -EINVAL;
-- 
2.34.1

From nobody Sat Feb 7 14:28:38 2026
From: Dapeng Mi
To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Thomas Gleixner , Dave Hansen , Ian Rogers , Adrian Hunter , Jiri Olsa , Alexander Shishkin , Andi Kleen , Eranian Stephane
Cc: Mark Rutland , broonie@kernel.org, Ravi Bangoria , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Kan Liang , Dapeng Mi
Subject: [Patch v5 11/19] perf/x86: Enable OPMASK sampling using sample_simd_pred_reg_* fields
Date: Wed, 3 Dec 2025 14:54:52 +0800
Message-Id: <20251203065500.2597594-12-dapeng1.mi@linux.intel.com>
In-Reply-To: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com>
References: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com>

From: Kan Liang

This patch adds support for sampling OPMASK registers via the
sample_simd_pred_reg_* fields. Each OPMASK register consists of 1 u64 word.
Current x86 hardware supports 8 OPMASK registers. The
perf_simd_reg_value() function is responsible for outputting the OPMASK
values to userspace.

Additionally, sample_simd_pred_reg_qwords should be set to 1 to indicate
OPMASK sampling.

Signed-off-by: Kan Liang
Co-developed-by: Dapeng Mi
Signed-off-by: Dapeng Mi
---
 arch/x86/events/core.c                |  8 ++++++++
 arch/x86/events/perf_event.h          | 10 ++++++++++
 arch/x86/include/asm/perf_event.h     |  4 ++++
 arch/x86/include/uapi/asm/perf_regs.h |  8 ++++++++
 arch/x86/kernel/perf_regs.c           | 15 ++++++++++++---
 5 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d9c2cab5dcb9..3a4144ee0b7b 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -430,6 +430,8 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
 		perf_regs->zmmh = get_xsave_addr(xsave, XFEATURE_ZMM_Hi256);
 	if (valid_mask & XFEATURE_MASK_Hi16_ZMM)
 		perf_regs->h16zmm = get_xsave_addr(xsave, XFEATURE_Hi16_ZMM);
+	if (valid_mask & XFEATURE_MASK_OPMASK)
+		perf_regs->opmask = get_xsave_addr(xsave, XFEATURE_OPMASK);
 }
 
 static void release_ext_regs_buffers(void)
@@ -751,6 +753,9 @@ int x86_pmu_hw_config(struct perf_event *event)
 			if (event_needs_high16_zmm(event) &&
 			    !(x86_pmu.ext_regs_mask & XFEATURE_MASK_Hi16_ZMM))
 				return -EINVAL;
+			if (event_needs_opmask(event) &&
+			    !(x86_pmu.ext_regs_mask & XFEATURE_MASK_OPMASK))
+				return -EINVAL;
 		}
 	}
 
@@ -1833,6 +1838,7 @@ inline void x86_pmu_clear_perf_regs(struct pt_regs *regs)
 	perf_regs->ymmh_regs = NULL;
 	perf_regs->zmmh_regs = NULL;
 	perf_regs->h16zmm_regs = NULL;
+	perf_regs->opmask_regs = NULL;
 }
 
 static void x86_pmu_setup_basic_regs_data(struct perf_event *event,
@@ -1908,6 +1914,8 @@ static void x86_pmu_sample_ext_regs(struct perf_event *event,
 		mask |= XFEATURE_MASK_ZMM_Hi256;
 	if (event_needs_high16_zmm(event))
 		mask |= XFEATURE_MASK_Hi16_ZMM;
+	if (event_needs_opmask(event))
+		mask |= XFEATURE_MASK_OPMASK;
 	mask &= ~ignore_mask;
 	if (mask)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 9a871809a4aa..7e081a392ff8 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -173,6 +173,16 @@ static inline bool event_needs_high16_zmm(struct perf_event *event)
 	return false;
 }
 
+static inline bool event_needs_opmask(struct perf_event *event)
+{
+	if (event->attr.sample_simd_regs_enabled &&
+	    (event->attr.sample_simd_pred_reg_intr ||
+	     event->attr.sample_simd_pred_reg_user))
+		return true;
+
+	return false;
+}
+
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
 	int refcnt; /* reference count */
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index e4d9a8ba3e95..caa6df8ac1cd 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -721,6 +721,10 @@ struct x86_perf_regs {
 		u64 *h16zmm_regs;
 		struct avx_512_hi16_state *h16zmm;
 	};
+	union {
+		u64 *opmask_regs;
+		struct avx_512_opmask_state *opmask;
+	};
 };
 
 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs);
diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/asm/perf_regs.h
index 96db454c7923..6f29fd9495a2 100644
--- a/arch/x86/include/uapi/asm/perf_regs.h
+++ b/arch/x86/include/uapi/asm/perf_regs.h
@@ -60,6 +60,9 @@ enum {
 	PERF_REG_X86_YMM,
 	PERF_REG_X86_ZMM,
 	PERF_REG_X86_MAX_SIMD_REGS,
+
+	PERF_REG_X86_OPMASK		= 0,
+	PERF_REG_X86_MAX_PRED_REGS	= 1,
 };
 
 enum {
@@ -68,13 +71,18 @@ enum {
 	PERF_X86_SIMD_ZMMH_REGS		= 16,
 	PERF_X86_SIMD_ZMM_REGS		= 32,
 	PERF_X86_SIMD_VEC_REGS_MAX	= PERF_X86_SIMD_ZMM_REGS,
+
+	PERF_X86_SIMD_OPMASK_REGS	= 8,
+	PERF_X86_SIMD_PRED_REGS_MAX	= PERF_X86_SIMD_OPMASK_REGS,
 };
 
+#define PERF_X86_SIMD_PRED_MASK	GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, 0)
 #define PERF_X86_SIMD_VEC_MASK	GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1, 0)
 
 #define PERF_X86_H16ZMM_BASE	PERF_X86_SIMD_ZMMH_REGS
 
 enum {
+
	PERF_X86_OPMASK_QWORDS		= 1,
 	PERF_X86_XMM_QWORDS		= 2,
 	PERF_X86_YMMH_QWORDS		= 2,
 	PERF_X86_YMM_QWORDS		= 4,
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index 0a3ffaaea3aa..1ca24e2a6aa0 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -83,8 +83,14 @@ u64 perf_simd_reg_value(struct pt_regs *regs, int idx,
 	struct x86_perf_regs *perf_regs = container_of(regs, struct x86_perf_regs, regs);
 
-	if (pred)
-		return 0;
+	if (pred) {
+		if (WARN_ON_ONCE(idx >= PERF_X86_SIMD_PRED_REGS_MAX ||
+				 qwords_idx >= PERF_X86_OPMASK_QWORDS))
+			return 0;
+		if (!perf_regs->opmask_regs)
+			return 0;
+		return perf_regs->opmask_regs[idx];
+	}
 
 	if (WARN_ON_ONCE(idx >= PERF_X86_SIMD_VEC_REGS_MAX ||
 			 qwords_idx >= PERF_X86_SIMD_QWORDS_MAX))
@@ -135,7 +141,10 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_mask,
 		if (vec_mask & ~PERF_X86_SIMD_VEC_MASK)
 			return -EINVAL;
 	}
-	if (pred_mask)
+
+	if (pred_qwords != PERF_X86_OPMASK_QWORDS)
+		return -EINVAL;
+	if (pred_mask & ~PERF_X86_SIMD_PRED_MASK)
 		return -EINVAL;
 
 	return 0;
-- 
2.34.1

From nobody Sat Feb 7 14:28:38 2026
From: Dapeng Mi
To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Thomas Gleixner , Dave Hansen , Ian Rogers , Adrian Hunter , Jiri Olsa , Alexander Shishkin , Andi Kleen , Eranian Stephane
Cc: Mark Rutland , broonie@kernel.org, Ravi Bangoria , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Kan Liang , Dapeng Mi
Subject: [Patch v5 12/19] perf/x86: Enable eGPRs sampling using sample_regs_* fields
Date: Wed, 3 Dec 2025 14:54:53 +0800
Message-Id: <20251203065500.2597594-13-dapeng1.mi@linux.intel.com>
In-Reply-To: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com>
References: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com>

From: Kan Liang

This patch enables sampling of APX eGPRs (R16 ~ R31) via the
sample_regs_* fields.

To sample eGPRs, the sample_simd_regs_enabled field must be set. This
allows the spare space (reclaimed from the original XMM space) in the
sample_regs_* fields to be used to represent eGPRs.

The perf_reg_value() function first checks whether the
PERF_SAMPLE_REGS_ABI_SIMD flag is set, and then determines whether to
output eGPRs or legacy XMM registers to userspace.

The perf_reg_validate() function is enhanced to validate the eGPRs
bitmap by adding a new argument, "simd_enabled".

Currently, eGPRs sampling is only supported on the x86_64 architecture,
as APX is only available on x86_64 platforms.
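The index overlap described above can be sketched in plain C. The enum values (R15 = 15, R16..R31 = 16..31, XMM0 starting at 32) mirror the uapi header, but the classifier function itself is a hypothetical illustration, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Index ranges from the uapi header: legacy GPRs occupy 0..15,
 * R16-R31 reuse 16..31, and the legacy XMM encoding starts at 32. */
enum { REG_R15 = 15, REG_R31 = 31, REG_XMM0 = 32 };

typedef enum { SRC_GPR, SRC_EGPR, SRC_XMM, SRC_NONE } reg_src;

/*
 * Decide which register file a sample_regs_* bit index refers to.
 * Under PERF_SAMPLE_REGS_ABI_SIMD the 16..31 slots carry eGPRs and
 * XMM values move to the dedicated sample_simd_vec_reg fields; under
 * the legacy ABI those slots are reserved and XMM starts at bit 32.
 */
static reg_src classify_reg(int idx, bool simd_abi)
{
	if (idx <= REG_R15)
		return SRC_GPR;		/* legacy GPRs, both ABIs */
	if (simd_abi)
		return idx <= REG_R31 ? SRC_EGPR : SRC_NONE;
	return idx >= REG_XMM0 ? SRC_XMM : SRC_NONE;
}
```

This makes the mutual exclusion concrete: a given bit index above 15 is meaningful for exactly one ABI type at a time.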
Suggested-by: Peter Zijlstra (Intel)
Signed-off-by: Kan Liang
Co-developed-by: Dapeng Mi
Signed-off-by: Dapeng Mi
---
 arch/arm/kernel/perf_regs.c           |  2 +-
 arch/arm64/kernel/perf_regs.c         |  2 +-
 arch/csky/kernel/perf_regs.c          |  2 +-
 arch/loongarch/kernel/perf_regs.c     |  2 +-
 arch/mips/kernel/perf_regs.c          |  2 +-
 arch/parisc/kernel/perf_regs.c        |  2 +-
 arch/powerpc/perf/perf_regs.c         |  2 +-
 arch/riscv/kernel/perf_regs.c         |  2 +-
 arch/s390/kernel/perf_regs.c          |  2 +-
 arch/x86/events/core.c                | 41 +++++++++++++++-------
 arch/x86/events/perf_event.h          | 10 ++++++
 arch/x86/include/asm/perf_event.h     |  4 +++
 arch/x86/include/uapi/asm/perf_regs.h | 25 ++++++++++++++
 arch/x86/kernel/perf_regs.c           | 49 +++++++++++++++------------
 include/linux/perf_regs.h             |  2 +-
 kernel/events/core.c                  |  8 +++--
 16 files changed, 110 insertions(+), 47 deletions(-)

diff --git a/arch/arm/kernel/perf_regs.c b/arch/arm/kernel/perf_regs.c
index d575a4c3ca56..838d701adf4d 100644
--- a/arch/arm/kernel/perf_regs.c
+++ b/arch/arm/kernel/perf_regs.c
@@ -18,7 +18,7 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 
 #define REG_RESERVED (~((1ULL << PERF_REG_ARM_MAX) - 1))
 
-int perf_reg_validate(u64 mask)
+int perf_reg_validate(u64 mask, bool simd_enabled)
 {
 	if (!mask || mask & REG_RESERVED)
 		return -EINVAL;
diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c
index 70e2f13f587f..71a3e0238de4 100644
--- a/arch/arm64/kernel/perf_regs.c
+++ b/arch/arm64/kernel/perf_regs.c
@@ -77,7 +77,7 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 
 #define REG_RESERVED (~((1ULL << PERF_REG_ARM64_MAX) - 1))
 
-int perf_reg_validate(u64 mask)
+int perf_reg_validate(u64 mask, bool simd_enabled)
 {
 	u64 reserved_mask = REG_RESERVED;
 
diff --git a/arch/csky/kernel/perf_regs.c b/arch/csky/kernel/perf_regs.c
index 94601f37b596..c932a96afc56 100644
--- a/arch/csky/kernel/perf_regs.c
+++ b/arch/csky/kernel/perf_regs.c
@@ -18,7 +18,7 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 
 #define REG_RESERVED
(~((1ULL << PERF_REG_CSKY_MAX) - 1))
 
-int perf_reg_validate(u64 mask)
+int perf_reg_validate(u64 mask, bool simd_enabled)
 {
 	if (!mask || mask & REG_RESERVED)
 		return -EINVAL;
diff --git a/arch/loongarch/kernel/perf_regs.c b/arch/loongarch/kernel/perf_regs.c
index 8dd604f01745..164514f40ae0 100644
--- a/arch/loongarch/kernel/perf_regs.c
+++ b/arch/loongarch/kernel/perf_regs.c
@@ -25,7 +25,7 @@ u64 perf_reg_abi(struct task_struct *tsk)
 }
 #endif /* CONFIG_32BIT */
 
-int perf_reg_validate(u64 mask)
+int perf_reg_validate(u64 mask, bool simd_enabled)
 {
 	if (!mask)
 		return -EINVAL;
diff --git a/arch/mips/kernel/perf_regs.c b/arch/mips/kernel/perf_regs.c
index 7736d3c5ebd2..00a5201dbd5d 100644
--- a/arch/mips/kernel/perf_regs.c
+++ b/arch/mips/kernel/perf_regs.c
@@ -28,7 +28,7 @@ u64 perf_reg_abi(struct task_struct *tsk)
 }
 #endif /* CONFIG_32BIT */
 
-int perf_reg_validate(u64 mask)
+int perf_reg_validate(u64 mask, bool simd_enabled)
 {
 	if (!mask)
 		return -EINVAL;
diff --git a/arch/parisc/kernel/perf_regs.c b/arch/parisc/kernel/perf_regs.c
index 87e6990569a7..169c25c054b2 100644
--- a/arch/parisc/kernel/perf_regs.c
+++ b/arch/parisc/kernel/perf_regs.c
@@ -34,7 +34,7 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 
 #define REG_RESERVED (~((1ULL << PERF_REG_PARISC_MAX) - 1))
 
-int perf_reg_validate(u64 mask)
+int perf_reg_validate(u64 mask, bool simd_enabled)
 {
 	if (!mask || mask & REG_RESERVED)
 		return -EINVAL;
diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c
index 350dccb0143c..a01d8a903640 100644
--- a/arch/powerpc/perf/perf_regs.c
+++ b/arch/powerpc/perf/perf_regs.c
@@ -125,7 +125,7 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 	return regs_get_register(regs, pt_regs_offset[idx]);
 }
 
-int perf_reg_validate(u64 mask)
+int perf_reg_validate(u64 mask, bool simd_enabled)
 {
 	if (!mask || mask & REG_RESERVED)
 		return -EINVAL;
diff --git a/arch/riscv/kernel/perf_regs.c b/arch/riscv/kernel/perf_regs.c
index
3bba8deababb..1ecc8760b88b 100644
--- a/arch/riscv/kernel/perf_regs.c
+++ b/arch/riscv/kernel/perf_regs.c
@@ -18,7 +18,7 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 
 #define REG_RESERVED (~((1ULL << PERF_REG_RISCV_MAX) - 1))
 
-int perf_reg_validate(u64 mask)
+int perf_reg_validate(u64 mask, bool simd_enabled)
 {
 	if (!mask || mask & REG_RESERVED)
 		return -EINVAL;
diff --git a/arch/s390/kernel/perf_regs.c b/arch/s390/kernel/perf_regs.c
index a6b058ee4a36..c5ad9e2f489b 100644
--- a/arch/s390/kernel/perf_regs.c
+++ b/arch/s390/kernel/perf_regs.c
@@ -34,7 +34,7 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 
 #define REG_RESERVED (~((1UL << PERF_REG_S390_MAX) - 1))
 
-int perf_reg_validate(u64 mask)
+int perf_reg_validate(u64 mask, bool simd_enabled)
 {
 	if (!mask || mask & REG_RESERVED)
 		return -EINVAL;
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 3a4144ee0b7b..ec0838469cae 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -432,6 +432,8 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
 		perf_regs->h16zmm = get_xsave_addr(xsave, XFEATURE_Hi16_ZMM);
 	if (valid_mask & XFEATURE_MASK_OPMASK)
 		perf_regs->opmask = get_xsave_addr(xsave, XFEATURE_OPMASK);
+	if (valid_mask & XFEATURE_MASK_APX)
+		perf_regs->egpr = get_xsave_addr(xsave, XFEATURE_APX);
 }
 
 static void release_ext_regs_buffers(void)
@@ -719,22 +721,21 @@ int x86_pmu_hw_config(struct perf_event *event)
 	}
 
 	if (event->attr.sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)) {
-		/*
-		 * Besides the general purpose registers, XMM registers may
-		 * be collected as well.
-		 */
-		if (event_has_extended_regs(event)) {
-			if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS))
-				return -EINVAL;
-			if (!event->attr.precise_ip)
-				return -EINVAL;
-			if (event->attr.sample_simd_regs_enabled)
-				return -EINVAL;
-		}
-
 		if (event_has_simd_regs(event)) {
+			u64 reserved = ~GENMASK_ULL(PERF_REG_MISC_MAX - 1, 0);
+
 			if (!(event->pmu->capabilities & PERF_PMU_CAP_SIMD_REGS))
 				return -EINVAL;
+			/*
+			 * The XMM space in the perf_event_x86_regs is reclaimed
+			 * for eGPRs and other general registers.
+			 */
+			if (event->attr.sample_regs_user & reserved ||
+			    event->attr.sample_regs_intr & reserved)
+				return -EINVAL;
+			if (event_needs_egprs(event) &&
+			    !(x86_pmu.ext_regs_mask & XFEATURE_MASK_APX))
+				return -EINVAL;
 			/* Not require any vector registers but set width */
 			if (event->attr.sample_simd_vec_reg_qwords &&
 			    !event->attr.sample_simd_vec_reg_intr &&
@@ -756,6 +757,17 @@ int x86_pmu_hw_config(struct perf_event *event)
 			if (event_needs_opmask(event) &&
 			    !(x86_pmu.ext_regs_mask & XFEATURE_MASK_OPMASK))
 				return -EINVAL;
+		} else {
+			/*
+			 * Besides the general purpose registers, XMM registers may
+			 * be collected as well.
+			 */
+			if (event_has_extended_regs(event)) {
+				if (!(event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS))
+					return -EINVAL;
+				if (!event->attr.precise_ip)
+					return -EINVAL;
+			}
 		}
 	}
 
@@ -1839,6 +1851,7 @@ inline void x86_pmu_clear_perf_regs(struct pt_regs *regs)
 	perf_regs->zmmh_regs = NULL;
 	perf_regs->h16zmm_regs = NULL;
 	perf_regs->opmask_regs = NULL;
+	perf_regs->egpr_regs = NULL;
 }
 
 static void x86_pmu_setup_basic_regs_data(struct perf_event *event,
@@ -1916,6 +1929,8 @@ static void x86_pmu_sample_ext_regs(struct perf_event *event,
 		mask |= XFEATURE_MASK_Hi16_ZMM;
 	if (event_needs_opmask(event))
 		mask |= XFEATURE_MASK_OPMASK;
+	if (event_needs_egprs(event))
+		mask |= XFEATURE_MASK_APX;
 
 	mask &= ~ignore_mask;
 	if (mask)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 7e081a392ff8..9fb1cbbc1b76 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -183,6 +183,16 @@ static inline bool event_needs_opmask(struct perf_event *event)
 	return false;
 }
 
+static inline bool event_needs_egprs(struct perf_event *event)
+{
+	if (event->attr.sample_simd_regs_enabled &&
+	    (event->attr.sample_regs_user & PERF_X86_EGPRS_MASK ||
+	     event->attr.sample_regs_intr & PERF_X86_EGPRS_MASK))
+		return true;
+
+	return false;
+}
+
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
 	int refcnt; /* reference count */
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index caa6df8ac1cd..ca242db3720f 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -725,6 +725,10 @@ struct x86_perf_regs {
 		u64 *opmask_regs;
 		struct avx_512_opmask_state *opmask;
 	};
+	union {
+		u64 *egpr_regs;
+		struct apx_state *egpr;
+	};
 };
 
 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs);
diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/asm/perf_regs.h
index 6f29fd9495a2..f145e3b78426 100644
---
a/arch/x86/include/uapi/asm/perf_regs.h
+++ b/arch/x86/include/uapi/asm/perf_regs.h
@@ -27,9 +27,33 @@ enum perf_event_x86_regs {
 	PERF_REG_X86_R13,
 	PERF_REG_X86_R14,
 	PERF_REG_X86_R15,
+	/*
+	 * The EGPRs and XMM have overlaps. Only one can be used
+	 * at a time. For the ABI type PERF_SAMPLE_REGS_ABI_SIMD,
+	 * utilize EGPRs. For the other ABI type, XMM is used.
+	 *
+	 * Extended GPRs (EGPRs)
+	 */
+	PERF_REG_X86_R16,
+	PERF_REG_X86_R17,
+	PERF_REG_X86_R18,
+	PERF_REG_X86_R19,
+	PERF_REG_X86_R20,
+	PERF_REG_X86_R21,
+	PERF_REG_X86_R22,
+	PERF_REG_X86_R23,
+	PERF_REG_X86_R24,
+	PERF_REG_X86_R25,
+	PERF_REG_X86_R26,
+	PERF_REG_X86_R27,
+	PERF_REG_X86_R28,
+	PERF_REG_X86_R29,
+	PERF_REG_X86_R30,
+	PERF_REG_X86_R31,
 	/* These are the limits for the GPRs. */
 	PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
 	PERF_REG_X86_64_MAX = PERF_REG_X86_R15 + 1,
+	PERF_REG_MISC_MAX = PERF_REG_X86_R31 + 1,
 
 	/* These all need two bits set because they are 128bit */
 	PERF_REG_X86_XMM0  = 32,
@@ -54,6 +78,7 @@ enum perf_event_x86_regs {
 };
 
 #define PERF_REG_EXTENDED_MASK	(~((1ULL << PERF_REG_X86_XMM0) - 1))
+#define PERF_X86_EGPRS_MASK	GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16)
 
 enum {
 	PERF_REG_X86_XMM,
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index 1ca24e2a6aa0..e76de39e1385 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -61,14 +61,22 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 {
 	struct x86_perf_regs *perf_regs;
 
-	if (idx >= PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) {
+	if (idx > PERF_REG_X86_R15) {
 		perf_regs = container_of(regs, struct x86_perf_regs, regs);
-		/* SIMD registers are moved to dedicated sample_simd_vec_reg */
-		if (perf_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
-			return 0;
-		if (!perf_regs->xmm_regs)
-			return 0;
-		return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0];
+
+		if (perf_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
+			if (idx <= PERF_REG_X86_R31) {
+				if (!perf_regs->egpr_regs)
+
return 0; + return perf_regs->egpr_regs[idx - PERF_REG_X86_R16]; + } + } else { + if (idx >=3D PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) { + if (!perf_regs->xmm_regs) + return 0; + return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0]; + } + } } =20 if (WARN_ON_ONCE(idx >=3D ARRAY_SIZE(pt_regs_offset))) @@ -150,20 +158,14 @@ int perf_simd_reg_validate(u16 vec_qwords, u64 vec_ma= sk, return 0; } =20 -#define PERF_REG_X86_RESERVED (((1ULL << PERF_REG_X86_XMM0) - 1) & \ - ~((1ULL << PERF_REG_X86_MAX) - 1)) +#define PERF_REG_X86_RESERVED (GENMASK_ULL(PERF_REG_X86_XMM0 - 1, PERF_REG= _X86_AX) & \ + ~GENMASK_ULL(PERF_REG_X86_R15, PERF_REG_X86_AX)) +#define PERF_REG_X86_EXT_RESERVED (~GENMASK_ULL(PERF_REG_MISC_MAX - 1, PER= F_REG_X86_AX)) =20 #ifdef CONFIG_X86_32 -#define REG_NOSUPPORT ((1ULL << PERF_REG_X86_R8) | \ - (1ULL << PERF_REG_X86_R9) | \ - (1ULL << PERF_REG_X86_R10) | \ - (1ULL << PERF_REG_X86_R11) | \ - (1ULL << PERF_REG_X86_R12) | \ - (1ULL << PERF_REG_X86_R13) | \ - (1ULL << PERF_REG_X86_R14) | \ - (1ULL << PERF_REG_X86_R15)) - -int perf_reg_validate(u64 mask) +#define REG_NOSUPPORT GENMASK_ULL(PERF_REG_X86_R15, PERF_REG_X86_R8) + +int perf_reg_validate(u64 mask, bool simd_enabled) { if (!mask || (mask & (REG_NOSUPPORT | PERF_REG_X86_RESERVED))) return -EINVAL; @@ -182,10 +184,15 @@ u64 perf_reg_abi(struct task_struct *task) (1ULL << PERF_REG_X86_FS) | \ (1ULL << PERF_REG_X86_GS)) =20 -int perf_reg_validate(u64 mask) +int perf_reg_validate(u64 mask, bool simd_enabled) { /* The mask could be 0 if only the SIMD registers are interested */ - if (mask & (REG_NOSUPPORT | PERF_REG_X86_RESERVED)) + if (!simd_enabled && + (mask & (REG_NOSUPPORT | PERF_REG_X86_RESERVED))) + return -EINVAL; + + if (simd_enabled && + (mask & (REG_NOSUPPORT | PERF_REG_X86_EXT_RESERVED))) return -EINVAL; =20 return 0; diff --git a/include/linux/perf_regs.h b/include/linux/perf_regs.h index 518f28c6a7d4..09dbc2fc3859 100644 --- a/include/linux/perf_regs.h +++ 
b/include/linux/perf_regs.h @@ -10,7 +10,7 @@ struct perf_regs { }; =20 u64 perf_reg_value(struct pt_regs *regs, int idx); -int perf_reg_validate(u64 mask); +int perf_reg_validate(u64 mask, bool simd_enabled); u64 perf_reg_abi(struct task_struct *task); void perf_get_regs_user(struct perf_regs *regs_user, struct pt_regs *regs); diff --git a/kernel/events/core.c b/kernel/events/core.c index b19de038979e..428ff39e03c5 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7436,7 +7436,7 @@ u64 __weak perf_reg_value(struct pt_regs *regs, int i= dx) return 0; } =20 -int __weak perf_reg_validate(u64 mask) +int __weak perf_reg_validate(u64 mask, bool simd_enabled) { return mask ? -ENOSYS : 0; } @@ -13310,7 +13310,8 @@ static int perf_copy_attr(struct perf_event_attr __= user *uattr, } =20 if (attr->sample_type & PERF_SAMPLE_REGS_USER) { - ret =3D perf_reg_validate(attr->sample_regs_user); + ret =3D perf_reg_validate(attr->sample_regs_user, + attr->sample_simd_regs_enabled); if (ret) return ret; ret =3D perf_simd_reg_validate(attr->sample_simd_vec_reg_qwords, @@ -13340,7 +13341,8 @@ static int perf_copy_attr(struct perf_event_attr __= user *uattr, attr->sample_max_stack =3D sysctl_perf_event_max_stack; =20 if (attr->sample_type & PERF_SAMPLE_REGS_INTR) { - ret =3D perf_reg_validate(attr->sample_regs_intr); + ret =3D perf_reg_validate(attr->sample_regs_intr, + attr->sample_simd_regs_enabled); if (ret) return ret; ret =3D perf_simd_reg_validate(attr->sample_simd_vec_reg_qwords, --=20 2.34.1 From nobody Sat Feb 7 14:28:38 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E88582BDC17; Wed, 3 Dec 2025 06:59:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.8 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764745154; 
From: Dapeng Mi
Subject: [Patch v5 13/19] perf/x86: Enable SSP sampling using sample_regs_* fields
Date: Wed, 3 Dec 2025 14:54:54 +0800
Message-Id: <20251203065500.2597594-14-dapeng1.mi@linux.intel.com>

From: Kan Liang

Enable sampling of the CET shadow stack pointer (SSP) register via the
sample_regs_* fields. To sample SSP, the sample_simd_regs_enabled field
must be set. This allows the spare space in the sample_regs_* fields
(reclaimed from the original XMM space) to be used to represent SSP.

As with eGPR sampling, perf_reg_value() must first check whether the
PERF_SAMPLE_REGS_ABI_SIMD flag is set, and then decide whether to output
SSP or the legacy XMM registers to userspace.
Additionally, arch-PEBS supports sampling SSP, which is placed into the
GPRs group. This patch also enables arch-PEBS based SSP sampling.

Currently, SSP sampling is supported only on the x86_64 architecture,
as CET is only available on x86_64 platforms.

Signed-off-by: Kan Liang
Co-developed-by: Dapeng Mi
Signed-off-by: Dapeng Mi
---
 arch/x86/events/core.c                |  9 +++++++++
 arch/x86/events/intel/ds.c            |  3 +++
 arch/x86/events/perf_event.h          | 10 ++++++++++
 arch/x86/include/asm/perf_event.h     |  4 ++++
 arch/x86/include/uapi/asm/perf_regs.h |  7 ++++---
 arch/x86/kernel/perf_regs.c           |  5 +++++
 6 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index ec0838469cae..b6030dae561d 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -434,6 +434,8 @@ static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
 		perf_regs->opmask = get_xsave_addr(xsave, XFEATURE_OPMASK);
 	if (valid_mask & XFEATURE_MASK_APX)
 		perf_regs->egpr = get_xsave_addr(xsave, XFEATURE_APX);
+	if (valid_mask & XFEATURE_MASK_CET_USER)
+		perf_regs->cet = get_xsave_addr(xsave, XFEATURE_CET_USER);
 }

 static void release_ext_regs_buffers(void)
@@ -736,6 +738,10 @@ int x86_pmu_hw_config(struct perf_event *event)
 	if (event_needs_egprs(event) &&
 	    !(x86_pmu.ext_regs_mask & XFEATURE_MASK_APX))
 		return -EINVAL;
+	if (event_needs_ssp(event) &&
+	    !(x86_pmu.ext_regs_mask & XFEATURE_MASK_CET_USER))
+		return -EINVAL;
+
 	/* Not require any vector registers but set width */
 	if (event->attr.sample_simd_vec_reg_qwords &&
 	    !event->attr.sample_simd_vec_reg_intr &&
@@ -1852,6 +1858,7 @@ inline void x86_pmu_clear_perf_regs(struct pt_regs *regs)
 	perf_regs->h16zmm_regs = NULL;
 	perf_regs->opmask_regs = NULL;
 	perf_regs->egpr_regs = NULL;
+	perf_regs->cet_regs = NULL;
 }

 static void x86_pmu_setup_basic_regs_data(struct perf_event *event,
@@ -1931,6 +1938,8 @@ static void x86_pmu_sample_ext_regs(struct perf_event *event,
 		mask |= XFEATURE_MASK_OPMASK;
 	if (event_needs_egprs(event))
 		mask |= XFEATURE_MASK_APX;
+	if (event_needs_ssp(event))
+		mask |= XFEATURE_MASK_CET_USER;

 	mask &= ~ignore_mask;
 	if (mask)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 79cba323eeb1..3212259d1a16 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2409,12 +2409,15 @@ static void setup_arch_pebs_sample_data(struct perf_event *event,
 	}

 	if (header->gpr) {
+		ignore_mask = XFEATURE_MASK_CET_USER;
+
 		gprs = next_record;
 		next_record = gprs + 1;

 		__setup_pebs_gpr_group(event, data, regs,
 				       (struct pebs_gprs *)gprs, sample_type);
+		perf_regs->cet_regs = &gprs->r15;
 	}

 	if (header->aux) {
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 9fb1cbbc1b76..35a1837d0b77 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -193,6 +193,16 @@ static inline bool event_needs_egprs(struct perf_event *event)
 	return false;
 }

+static inline bool event_needs_ssp(struct perf_event *event)
+{
+	if (event->attr.sample_simd_regs_enabled &&
+	    (event->attr.sample_regs_user & BIT_ULL(PERF_REG_X86_SSP) ||
+	     event->attr.sample_regs_intr & BIT_ULL(PERF_REG_X86_SSP)))
+		return true;
+
+	return false;
+}
+
 struct amd_nb {
 	int nb_id;	/* NorthBridge id */
 	int refcnt;	/* reference count */
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index ca242db3720f..c925af4160ad 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -729,6 +729,10 @@ struct x86_perf_regs {
 		u64 *egpr_regs;
 		struct apx_state *egpr;
 	};
+	union {
+		u64 *cet_regs;
+		struct cet_user_state *cet;
+	};
 };

 extern unsigned long perf_arch_instruction_pointer(struct pt_regs *regs);
diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/asm/perf_regs.h
index f145e3b78426..f3561ed10041 100644
--- a/arch/x86/include/uapi/asm/perf_regs.h
+++ b/arch/x86/include/uapi/asm/perf_regs.h
@@ -28,9 +28,9 @@ enum perf_event_x86_regs {
 	PERF_REG_X86_R14,
 	PERF_REG_X86_R15,
 	/*
-	 * The EGPRs and XMM have overlaps. Only one can be used
+	 * The EGPRs/SSP and XMM have overlaps. Only one can be used
 	 * at a time. For the ABI type PERF_SAMPLE_REGS_ABI_SIMD,
-	 * utilize EGPRs. For the other ABI type, XMM is used.
+	 * utilize EGPRs/SSP. For the other ABI type, XMM is used.
 	 *
 	 * Extended GPRs (EGPRs)
 	 */
@@ -50,10 +50,11 @@ enum perf_event_x86_regs {
 	PERF_REG_X86_R29,
 	PERF_REG_X86_R30,
 	PERF_REG_X86_R31,
+	PERF_REG_X86_SSP,
 	/* These are the limits for the GPRs. */
 	PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
 	PERF_REG_X86_64_MAX = PERF_REG_X86_R15 + 1,
-	PERF_REG_MISC_MAX = PERF_REG_X86_R31 + 1,
+	PERF_REG_MISC_MAX = PERF_REG_X86_SSP + 1,

 	/* These all need two bits set because they are 128bit */
 	PERF_REG_X86_XMM0 = 32,
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index e76de39e1385..518bbe577ee8 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -70,6 +70,11 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 				return 0;
 			return perf_regs->egpr_regs[idx - PERF_REG_X86_R16];
 		}
+		if (idx == PERF_REG_X86_SSP) {
+			if (!perf_regs->cet)
+				return 0;
+			return perf_regs->cet->user_ssp;
+		}
 	} else {
 		if (idx >= PERF_REG_X86_XMM0 && idx < PERF_REG_X86_XMM_MAX) {
 			if (!perf_regs->xmm_regs)
--
2.34.1

From nobody Sat Feb 7 14:28:38 2026
From: Dapeng Mi
Subject: [Patch v5 14/19] perf/x86/intel: Enable PERF_PMU_CAP_SIMD_REGS capability
Date: Wed, 3 Dec 2025 14:54:55 +0800
Message-Id: <20251203065500.2597594-15-dapeng1.mi@linux.intel.com>

From: Kan Liang

Enable the PERF_PMU_CAP_SIMD_REGS capability when XSAVES support is
available for the YMM, ZMM, OPMASK, eGPR, or SSP state.

Temporarily disable large PEBS sampling for these registers, as the
current arch-PEBS sampling code does not support them yet. Large PEBS
sampling for these registers will be enabled in subsequent patches.
Signed-off-by: Kan Liang
Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/core.c | 50 +++++++++++++++++++++++++++++++++---
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index b5c89e8eabb2..d8cc7abfcdc6 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4160,10 +4160,32 @@ static unsigned long intel_pmu_large_pebs_flags(struct perf_event *event)
 		flags &= ~PERF_SAMPLE_TIME;
 	if (!event->attr.exclude_kernel)
 		flags &= ~PERF_SAMPLE_REGS_USER;
-	if (event->attr.sample_regs_user & ~PEBS_GP_REGS)
-		flags &= ~PERF_SAMPLE_REGS_USER;
-	if (event->attr.sample_regs_intr & ~PEBS_GP_REGS)
-		flags &= ~PERF_SAMPLE_REGS_INTR;
+	if (event->attr.sample_simd_regs_enabled) {
+		u64 nolarge = PERF_X86_EGPRS_MASK | BIT_ULL(PERF_REG_X86_SSP);
+
+		/*
+		 * PEBS HW can only collect the XMM0-XMM15 for now.
+		 * Disable large PEBS for other vector registers, predicate
+		 * registers, eGPRs, and SSP.
+		 */
+		if (event->attr.sample_regs_user & nolarge ||
+		    fls64(event->attr.sample_simd_vec_reg_user) > PERF_X86_H16ZMM_BASE ||
+		    event->attr.sample_simd_pred_reg_user)
+			flags &= ~PERF_SAMPLE_REGS_USER;
+
+		if (event->attr.sample_regs_intr & nolarge ||
+		    fls64(event->attr.sample_simd_vec_reg_intr) > PERF_X86_H16ZMM_BASE ||
+		    event->attr.sample_simd_pred_reg_intr)
+			flags &= ~PERF_SAMPLE_REGS_INTR;
+
+		if (event->attr.sample_simd_vec_reg_qwords > PERF_X86_XMM_QWORDS)
+			flags &= ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR);
+	} else {
+		if (event->attr.sample_regs_user & ~PEBS_GP_REGS)
+			flags &= ~PERF_SAMPLE_REGS_USER;
+		if (event->attr.sample_regs_intr & ~PEBS_GP_REGS)
+			flags &= ~PERF_SAMPLE_REGS_INTR;
+	}
 	return flags;
 }

@@ -5643,6 +5665,26 @@ static void intel_extended_regs_init(struct pmu *pmu)

 	x86_pmu.ext_regs_mask |= XFEATURE_MASK_SSE;
 	x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
+
+	if (boot_cpu_has(X86_FEATURE_AVX) &&
+	    cpu_has_xfeatures(XFEATURE_MASK_YMM, NULL))
+		x86_pmu.ext_regs_mask |= XFEATURE_MASK_YMM;
+	if (boot_cpu_has(X86_FEATURE_APX) &&
+	    cpu_has_xfeatures(XFEATURE_MASK_APX, NULL))
+		x86_pmu.ext_regs_mask |= XFEATURE_MASK_APX;
+	if (boot_cpu_has(X86_FEATURE_AVX512F)) {
+		if (cpu_has_xfeatures(XFEATURE_MASK_OPMASK, NULL))
+			x86_pmu.ext_regs_mask |= XFEATURE_MASK_OPMASK;
+		if (cpu_has_xfeatures(XFEATURE_MASK_ZMM_Hi256, NULL))
+			x86_pmu.ext_regs_mask |= XFEATURE_MASK_ZMM_Hi256;
+		if (cpu_has_xfeatures(XFEATURE_MASK_Hi16_ZMM, NULL))
+			x86_pmu.ext_regs_mask |= XFEATURE_MASK_Hi16_ZMM;
+	}
+	if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
+		x86_pmu.ext_regs_mask |= XFEATURE_MASK_CET_USER;
+
+	if (x86_pmu.ext_regs_mask != XFEATURE_MASK_SSE)
+		x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_SIMD_REGS;
 }

 #define counter_mask(_gp, _fixed) ((_gp) | ((u64)(_fixed) << INTEL_PMC_IDX_FIXED))
--
2.34.1

From nobody Sat Feb 7 14:28:38 2026
From: Dapeng Mi
Subject: [Patch v5 15/19] perf/x86/intel: Enable arch-PEBS based SIMD/eGPRs/SSP sampling
Date: Wed, 3 Dec 2025 14:54:56 +0800
Message-Id: <20251203065500.2597594-16-dapeng1.mi@linux.intel.com>

Enable arch-PEBS based SIMD/eGPR/SSP register sampling. Arch-PEBS
supports sampling these registers; all except SSP are placed into the
XSAVE-Enabled Registers (XER) group, with the layout described below.
 Field Name	Registers Used			Size
 ----------------------------------------------------------------------
 XSTATE_BV	XINUSE for groups		8 B
 Reserved	Reserved			8 B
 SSER		XMM0-XMM15			16 regs * 16 B = 256 B
 YMMHIR		Upper 128 bits of YMM0-YMM15	16 regs * 16 B = 256 B
 EGPR		R16-R31				16 regs * 8 B = 128 B
 OPMASKR	K0-K7				8 regs * 8 B = 64 B
 ZMMHIR		Upper 256 bits of ZMM0-ZMM15	16 regs * 32 B = 512 B
 Hi16ZMMR	ZMM16-ZMM31			16 regs * 64 B = 1024 B

Memory space in the output buffer is allocated for these sub-groups as
long as the corresponding Format.XER[55:49] bits in the PEBS record
header are set. However, the arch-PEBS hardware engine does not write a
sub-group that is not used (i.e. in INIT state); in that case the
corresponding bit in the XSTATE_BV bitmap is set to 0. Therefore, the
XSTATE_BV field is checked for each PEBS record to determine whether the
register data was actually written; if not, the register data is not
output to userspace.

The SSP register is sampled by arch-PEBS and placed into the GPRs group.

Additionally, bits [55:49] of the IA32_PMC_{GPn|FXm}_CFG_C MSRs select
which of these register groups are sampled.
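The XSTATE_BV check described above amounts to: trust a sub-group's data only if space was allocated for it AND the hardware marked it as written. A consumer-side sketch, using a hypothetical mirror of the XER header (the real layout is `struct arch_pebs_xer_header` in the kernel):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the XER sub-group header: the first 8 bytes of
 * the XER group carry an XSTATE_BV bitmap, one bit per XSAVE component,
 * followed by 8 reserved bytes (per the layout table above).
 */
struct xer_header {
	uint64_t xstate_bv;
	uint64_t reserved;
};

/*
 * A sub-group is valid only when its Format.XER bit was set (space was
 * allocated in the record) AND its XSTATE_BV bit is set (the hardware
 * actually wrote it). A component left in INIT state must be skipped.
 */
bool group_written(const struct xer_header *hdr, bool allocated,
		   uint64_t feature_mask)
{
	return allocated && (hdr->xstate_bv & feature_mask);
}
```

For example, a record where space was allocated for both SSE and YMM but only the XMM registers were live (the YMM halves stayed in INIT state) would have only the SSE bit set in XSTATE_BV, so the YMM group must not be copied out.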
Signed-off-by: Dapeng Mi
---
 arch/x86/events/intel/core.c      | 71 +++++++++++++++++++++--------
 arch/x86/events/intel/ds.c        | 76 ++++++++++++++++++++++++++++---
 arch/x86/include/asm/msr-index.h  |  7 +++
 arch/x86/include/asm/perf_event.h |  8 +++-
 4 files changed, 137 insertions(+), 25 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index d8cc7abfcdc6..da48bcde8fce 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3008,6 +3008,21 @@ static void intel_pmu_enable_event_ext(struct perf_event *event)
 	if (pebs_data_cfg & PEBS_DATACFG_XMMS)
 		ext |= ARCH_PEBS_VECR_XMM & cap.caps;

+	if (pebs_data_cfg & PEBS_DATACFG_YMMHS)
+		ext |= ARCH_PEBS_VECR_YMMH & cap.caps;
+
+	if (pebs_data_cfg & PEBS_DATACFG_EGPRS)
+		ext |= ARCH_PEBS_VECR_EGPRS & cap.caps;
+
+	if (pebs_data_cfg & PEBS_DATACFG_OPMASKS)
+		ext |= ARCH_PEBS_VECR_OPMASK & cap.caps;
+
+	if (pebs_data_cfg & PEBS_DATACFG_ZMMHS)
+		ext |= ARCH_PEBS_VECR_ZMMH & cap.caps;
+
+	if (pebs_data_cfg & PEBS_DATACFG_H16ZMMS)
+		ext |= ARCH_PEBS_VECR_H16ZMM & cap.caps;
+
 	if (pebs_data_cfg & PEBS_DATACFG_LBRS)
 		ext |= ARCH_PEBS_LBR & cap.caps;

@@ -4152,6 +4167,30 @@ static void intel_pebs_aliases_skl(struct perf_event *event)
 	return intel_pebs_aliases_precdist(event);
 }

+static inline bool intel_pebs_support_regs(struct perf_event *event, u64 regs)
+{
+	struct arch_pebs_cap cap = hybrid(event->pmu, arch_pebs_cap);
+	bool supported = true;
+
+	/* SSP */
+	if (regs & PEBS_DATACFG_GP)
+		supported &= x86_pmu.arch_pebs && (ARCH_PEBS_GPR & cap.caps);
+	if (regs & PEBS_DATACFG_XMMS)
+		supported &= x86_pmu.intel_cap.pebs_format > 3;
+	if (regs & PEBS_DATACFG_YMMHS)
+		supported &= x86_pmu.arch_pebs && (ARCH_PEBS_VECR_YMMH & cap.caps);
+	if (regs & PEBS_DATACFG_EGPRS)
+		supported &= x86_pmu.arch_pebs && (ARCH_PEBS_VECR_EGPRS & cap.caps);
+	if (regs & PEBS_DATACFG_OPMASKS)
+		supported &= x86_pmu.arch_pebs && (ARCH_PEBS_VECR_OPMASK & cap.caps);
+	if (regs & PEBS_DATACFG_ZMMHS)
+		supported &= x86_pmu.arch_pebs && (ARCH_PEBS_VECR_ZMMH & cap.caps);
+	if (regs & PEBS_DATACFG_H16ZMMS)
+		supported &= x86_pmu.arch_pebs && (ARCH_PEBS_VECR_H16ZMM & cap.caps);
+
+	return supported;
+}
+
 static unsigned long intel_pmu_large_pebs_flags(struct perf_event *event)
 {
 	unsigned long flags = x86_pmu.large_pebs_flags;
@@ -4161,24 +4200,20 @@ static unsigned long intel_pmu_large_pebs_flags(struct perf_event *event)
 	if (!event->attr.exclude_kernel)
 		flags &= ~PERF_SAMPLE_REGS_USER;
 	if (event->attr.sample_simd_regs_enabled) {
-		u64 nolarge = PERF_X86_EGPRS_MASK | BIT_ULL(PERF_REG_X86_SSP);
-
-		/*
-		 * PEBS HW can only collect the XMM0-XMM15 for now.
-		 * Disable large PEBS for other vector registers, predicate
-		 * registers, eGPRs, and SSP.
-		 */
-		if (event->attr.sample_regs_user & nolarge ||
-		    fls64(event->attr.sample_simd_vec_reg_user) > PERF_X86_H16ZMM_BASE ||
-		    event->attr.sample_simd_pred_reg_user)
-			flags &= ~PERF_SAMPLE_REGS_USER;
-
-		if (event->attr.sample_regs_intr & nolarge ||
-		    fls64(event->attr.sample_simd_vec_reg_intr) > PERF_X86_H16ZMM_BASE ||
-		    event->attr.sample_simd_pred_reg_intr)
-			flags &= ~PERF_SAMPLE_REGS_INTR;
-
-		if (event->attr.sample_simd_vec_reg_qwords > PERF_X86_XMM_QWORDS)
+		if ((event_needs_ssp(event) &&
+		     !intel_pebs_support_regs(event, PEBS_DATACFG_GP)) ||
+		    (event_needs_xmm(event) &&
+		     !intel_pebs_support_regs(event, PEBS_DATACFG_XMMS)) ||
+		    (event_needs_ymm(event) &&
+		     !intel_pebs_support_regs(event, PEBS_DATACFG_YMMHS)) ||
+		    (event_needs_egprs(event) &&
+		     !intel_pebs_support_regs(event, PEBS_DATACFG_EGPRS)) ||
+		    (event_needs_opmask(event) &&
+		     !intel_pebs_support_regs(event, PEBS_DATACFG_OPMASKS)) ||
+		    (event_needs_low16_zmm(event) &&
+		     !intel_pebs_support_regs(event, PEBS_DATACFG_ZMMHS)) ||
+		    (event_needs_high16_zmm(event) &&
+		     !intel_pebs_support_regs(event, PEBS_DATACFG_H16ZMMS)))
 			flags &= ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR);
 	} else {
 		if (event->attr.sample_regs_user & ~PEBS_GP_REGS)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 3212259d1a16..a01c72c03bd6 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1470,11 +1470,21 @@ static u64 pebs_update_adaptive_cfg(struct perf_event *event)
 		((attr->config & INTEL_ARCH_EVENT_MASK) ==
 		 x86_pmu.rtm_abort_event);

-	if (gprs || (attr->precise_ip < 2) || tsx_weight)
+	if (gprs || (attr->precise_ip < 2) || tsx_weight || event_needs_ssp(event))
 		pebs_data_cfg |= PEBS_DATACFG_GP;

 	if (event_needs_xmm(event))
 		pebs_data_cfg |= PEBS_DATACFG_XMMS;
+	if (event_needs_ymm(event))
+		pebs_data_cfg |= PEBS_DATACFG_YMMHS;
+	if (event_needs_low16_zmm(event))
+		pebs_data_cfg |= PEBS_DATACFG_ZMMHS;
+	if (event_needs_high16_zmm(event))
+		pebs_data_cfg |= PEBS_DATACFG_H16ZMMS;
+	if (event_needs_opmask(event))
+		pebs_data_cfg |= PEBS_DATACFG_OPMASKS;
+	if (event_needs_egprs(event))
+		pebs_data_cfg |= PEBS_DATACFG_EGPRS;

 	if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
 		/*
@@ -2430,15 +2440,69 @@ static void setup_arch_pebs_sample_data(struct perf_event *event,
 			meminfo->tsx_tuning, ax);
 	}

-	if (header->xmm) {
+	if (header->xmm || header->ymmh || header->egpr ||
+	    header->opmask || header->zmmh || header->h16zmm) {
+		struct arch_pebs_xer_header *xer_header = next_record;
 		struct pebs_xmm *xmm;
+		struct ymmh_struct *ymmh;
+		struct avx_512_zmm_uppers_state *zmmh;
+		struct avx_512_hi16_state *h16zmm;
+		struct avx_512_opmask_state *opmask;
+		struct apx_state *egpr;

 		next_record += sizeof(struct arch_pebs_xer_header);

-		ignore_mask |= XFEATURE_MASK_SSE;
-		xmm = next_record;
-		perf_regs->xmm_regs = xmm->xmm;
-		next_record = xmm + 1;
+		if (header->xmm) {
+			ignore_mask |= XFEATURE_MASK_SSE;
+			xmm = next_record;
+			/*
+			 * Only output XMM regs to user space when arch-PEBS
+			 * really writes data into xstate area.
+			 */
+			if (xer_header->xstate & XFEATURE_MASK_SSE)
+				perf_regs->xmm_regs = xmm->xmm;
+			next_record = xmm + 1;
+		}
+
+		if (header->ymmh) {
+			ignore_mask |= XFEATURE_MASK_YMM;
+			ymmh = next_record;
+			if (xer_header->xstate & XFEATURE_MASK_YMM)
+				perf_regs->ymmh = ymmh;
+			next_record = ymmh + 1;
+		}
+
+		if (header->egpr) {
+			ignore_mask |= XFEATURE_MASK_APX;
+			egpr = next_record;
+			if (xer_header->xstate & XFEATURE_MASK_APX)
+				perf_regs->egpr = egpr;
+			next_record = egpr + 1;
+		}
+
+		if (header->opmask) {
+			ignore_mask |= XFEATURE_MASK_OPMASK;
+			opmask = next_record;
+			if (xer_header->xstate & XFEATURE_MASK_OPMASK)
+				perf_regs->opmask = opmask;
+			next_record = opmask + 1;
+		}
+
+		if (header->zmmh) {
+			ignore_mask |= XFEATURE_MASK_ZMM_Hi256;
+			zmmh = next_record;
+			if (xer_header->xstate & XFEATURE_MASK_ZMM_Hi256)
+				perf_regs->zmmh = zmmh;
+			next_record = zmmh + 1;
+		}
+
+		if (header->h16zmm) {
+			ignore_mask |= XFEATURE_MASK_Hi16_ZMM;
+			h16zmm = next_record;
+			if (xer_header->xstate & XFEATURE_MASK_Hi16_ZMM)
+				perf_regs->h16zmm = h16zmm;
+			next_record = h16zmm + 1;
+		}
 	}

 	if (header->lbr) {
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 65cc528fbad8..3f1cc294b1e9 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -341,6 +341,13 @@
 #define ARCH_PEBS_LBR_SHIFT	40
 #define ARCH_PEBS_LBR		(0x3ull << ARCH_PEBS_LBR_SHIFT)
 #define ARCH_PEBS_VECR_XMM	BIT_ULL(49)
+#define ARCH_PEBS_VECR_YMMH	BIT_ULL(50)
+#define ARCH_PEBS_VECR_EGPRS	BIT_ULL(51)
+#define ARCH_PEBS_VECR_OPMASK	BIT_ULL(53)
+#define ARCH_PEBS_VECR_ZMMH	BIT_ULL(54)
+#define ARCH_PEBS_VECR_H16ZMM	BIT_ULL(55)
+#define ARCH_PEBS_VECR_EXT_SHIFT	50
+#define ARCH_PEBS_VECR_EXT	(0x3full << ARCH_PEBS_VECR_EXT_SHIFT)
 #define ARCH_PEBS_GPR		BIT_ULL(61)
 #define ARCH_PEBS_AUX		BIT_ULL(62)
 #define ARCH_PEBS_EN		BIT_ULL(63)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index c925af4160ad..41668a4633df 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -146,6 +146,11 @@
 #define PEBS_DATACFG_LBRS	BIT_ULL(3)
 #define PEBS_DATACFG_CNTR	BIT_ULL(4)
 #define PEBS_DATACFG_METRICS	BIT_ULL(5)
+#define PEBS_DATACFG_YMMHS	BIT_ULL(6)
+#define PEBS_DATACFG_OPMASKS	BIT_ULL(7)
+#define PEBS_DATACFG_ZMMHS	BIT_ULL(8)
+#define PEBS_DATACFG_H16ZMMS	BIT_ULL(9)
+#define PEBS_DATACFG_EGPRS	BIT_ULL(10)
 #define PEBS_DATACFG_LBR_SHIFT	24
 #define PEBS_DATACFG_CNTR_SHIFT	32
 #define PEBS_DATACFG_CNTR_MASK	GENMASK_ULL(15, 0)
@@ -540,7 +545,8 @@ struct arch_pebs_header {
 			rsvd3:7,
 			xmm:1,
 			ymmh:1,
-			rsvd4:2,
+			egpr:1,
+			rsvd4:1,
 			opmask:1,
 			zmmh:1,
 			h16zmm:1,
--
2.34.1
From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Thomas Gleixner , Dave Hansen , Ian Rogers , Adrian Hunter , Jiri Olsa , Alexander Shishkin , Andi Kleen , Eranian Stephane Cc: Mark Rutland , broonie@kernel.org, Ravi Bangoria ,
linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Dapeng Mi Subject: [Patch v5 16/19] perf/x86: Activate back-to-back NMI detection for arch-PEBS induced NMIs Date: Wed, 3 Dec 2025 14:54:57 +0800 Message-Id: <20251203065500.2597594-17-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com> References: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When two or more identical PEBS events with the same sampling period are programmed on a mix of PDIST and non-PDIST counters, multiple back-to-back NMIs can be triggered. The Linux PMI handler processes the first NMI and clears the GLOBAL_STATUS MSR. If a second NMI is triggered immediately after the first, it is recognized as a "suspicious NMI" because no bits are set in the GLOBAL_STATUS MSR (cleared by the first NMI). This issue does not lead to PEBS data corruption or data loss, but it does result in an annoying warning message. The current NMI handler supports back-to-back NMI detection, but it requires the PMI handler to return the count of actually processed events, which the PEBS handler does not currently do. This patch modifies the PEBS handler to return the count of actually processed events, thereby activating back-to-back NMI detection and avoiding the "suspicious NMI" warning. 
Suggested-by: Andi Kleen Signed-off-by: Dapeng Mi --- arch/x86/events/intel/core.c | 3 +-- arch/x86/events/intel/ds.c | 36 +++++++++++++++++++++++------------- arch/x86/events/perf_event.h | 2 +- 3 files changed, 25 insertions(+), 16 deletions(-) diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index da48bcde8fce..a130d3f14844 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -3351,8 +3351,7 @@ static int handle_pmi_common(struct pt_regs *regs, u6= 4 status) */ if (__test_and_clear_bit(GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT, (unsigned long *)&status)) { - handled++; - static_call(x86_pmu_drain_pebs)(regs, &data); + handled +=3D static_call(x86_pmu_drain_pebs)(regs, &data); =20 if (cpuc->events[INTEL_PMC_IDX_FIXED_SLOTS] && is_pebs_counter_event_group(cpuc->events[INTEL_PMC_IDX_FIXED_SLOTS])) diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index a01c72c03bd6..c7cdcd585574 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -2759,7 +2759,7 @@ __intel_pmu_pebs_events(struct perf_event *event, __intel_pmu_pebs_last_event(event, iregs, regs, data, at, count, setup_sa= mple); } =20 -static void intel_pmu_drain_pebs_core(struct pt_regs *iregs, struct perf_s= ample_data *data) +static int intel_pmu_drain_pebs_core(struct pt_regs *iregs, struct perf_sa= mple_data *data) { struct cpu_hw_events *cpuc =3D this_cpu_ptr(&cpu_hw_events); struct debug_store *ds =3D cpuc->ds; @@ -2768,7 +2768,7 @@ static void intel_pmu_drain_pebs_core(struct pt_regs = *iregs, struct perf_sample_ int n; =20 if (!x86_pmu.pebs_active) - return; + return 0; =20 at =3D (struct pebs_record_core *)(unsigned long)ds->pebs_buffer_base; top =3D (struct pebs_record_core *)(unsigned long)ds->pebs_index; @@ -2779,22 +2779,24 @@ static void intel_pmu_drain_pebs_core(struct pt_reg= s *iregs, struct perf_sample_ ds->pebs_index =3D ds->pebs_buffer_base; =20 if (!test_bit(0, cpuc->active_mask)) - return; + return 0; =20 
WARN_ON_ONCE(!event); =20 if (!event->attr.precise_ip) - return; + return 0; =20 n =3D top - at; if (n <=3D 0) { if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD) intel_pmu_save_and_restart_reload(event, 0); - return; + return 0; } =20 __intel_pmu_pebs_events(event, iregs, data, at, top, 0, n, setup_pebs_fixed_sample_data); + + return 0; } =20 static void intel_pmu_pebs_event_update_no_drain(struct cpu_hw_events *cpu= c, u64 mask) @@ -2817,7 +2819,7 @@ static void intel_pmu_pebs_event_update_no_drain(stru= ct cpu_hw_events *cpuc, u64 } } =20 -static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sa= mple_data *data) +static int intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sam= ple_data *data) { struct cpu_hw_events *cpuc =3D this_cpu_ptr(&cpu_hw_events); struct debug_store *ds =3D cpuc->ds; @@ -2830,7 +2832,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *= iregs, struct perf_sample_d u64 mask; =20 if (!x86_pmu.pebs_active) - return; + return 0; =20 base =3D (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base; top =3D (struct pebs_record_nhm *)(unsigned long)ds->pebs_index; @@ -2846,7 +2848,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *= iregs, struct perf_sample_d =20 if (unlikely(base >=3D top)) { intel_pmu_pebs_event_update_no_drain(cpuc, mask); - return; + return 0; } =20 for (at =3D base; at < top; at +=3D x86_pmu.pebs_record_size) { @@ -2931,6 +2933,8 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *= iregs, struct perf_sample_d setup_pebs_fixed_sample_data); } } + + return 0; } =20 static __always_inline void @@ -2984,7 +2988,7 @@ __intel_pmu_handle_last_pebs_record(struct pt_regs *i= regs, =20 } =20 -static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sa= mple_data *data) +static int intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sam= ple_data *data) { short counts[INTEL_PMC_IDX_FIXED + MAX_FIXED_PEBS_EVENTS] =3D {}; void *last[INTEL_PMC_IDX_FIXED + 
MAX_FIXED_PEBS_EVENTS]; @@ -2997,7 +3001,7 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *= iregs, struct perf_sample_d u64 mask; =20 if (!x86_pmu.pebs_active) - return; + return 0; =20 base =3D (struct pebs_basic *)(unsigned long)ds->pebs_buffer_base; top =3D (struct pebs_basic *)(unsigned long)ds->pebs_index; @@ -3010,7 +3014,7 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *= iregs, struct perf_sample_d =20 if (unlikely(base >=3D top)) { intel_pmu_pebs_event_update_no_drain(cpuc, mask); - return; + return 0; } =20 if (!iregs) @@ -3032,9 +3036,11 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs = *iregs, struct perf_sample_d =20 __intel_pmu_handle_last_pebs_record(iregs, regs, data, mask, counts, last, setup_pebs_adaptive_sample_data); + + return 0; } =20 -static void intel_pmu_drain_arch_pebs(struct pt_regs *iregs, +static int intel_pmu_drain_arch_pebs(struct pt_regs *iregs, struct perf_sample_data *data) { short counts[INTEL_PMC_IDX_FIXED + MAX_FIXED_PEBS_EVENTS] =3D {}; @@ -3044,13 +3050,14 @@ static void intel_pmu_drain_arch_pebs(struct pt_reg= s *iregs, struct x86_perf_regs perf_regs; struct pt_regs *regs =3D &perf_regs.regs; void *base, *at, *top; + u64 events_bitmap =3D 0; u64 mask; =20 rdmsrq(MSR_IA32_PEBS_INDEX, index.whole); =20 if (unlikely(!index.wr)) { intel_pmu_pebs_event_update_no_drain(cpuc, X86_PMC_IDX_MAX); - return; + return 0; } =20 base =3D cpuc->pebs_vaddr; @@ -3089,6 +3096,7 @@ static void intel_pmu_drain_arch_pebs(struct pt_regs = *iregs, =20 basic =3D at + sizeof(struct arch_pebs_header); pebs_status =3D mask & basic->applicable_counters; + events_bitmap |=3D pebs_status; __intel_pmu_handle_pebs_record(iregs, regs, data, at, pebs_status, counts, last, setup_arch_pebs_sample_data); @@ -3108,6 +3116,8 @@ static void intel_pmu_drain_arch_pebs(struct pt_regs = *iregs, __intel_pmu_handle_last_pebs_record(iregs, regs, data, mask, counts, last, setup_arch_pebs_sample_data); + + return hweight64(events_bitmap); } =20 static 
void __init intel_arch_pebs_init(void) diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 35a1837d0b77..98958f6d29b6 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -1003,7 +1003,7 @@ struct x86_pmu { int pebs_record_size; int pebs_buffer_size; u64 pebs_events_mask; - void (*drain_pebs)(struct pt_regs *regs, struct perf_sample_data *data); + int (*drain_pebs)(struct pt_regs *regs, struct perf_sample_data *data); struct event_constraint *pebs_constraints; void (*pebs_aliases)(struct perf_event *event); u64 (*pebs_latency_data)(struct perf_event *event, u64 status); --=20 2.34.1 From nobody Sat Feb 7 14:28:38 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E334E2BF015; Wed, 3 Dec 2025 06:59:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.8 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764745169; cv=none; b=pjOLQatm9TEkuythFfR1EPyv9FmFmBux0gBJ9VJmCQwtEOTVcR4+lQtC4oxWaZpZ6mpq176RDMr3x5gUNax+fOFFqMpXBBw12I5a1azMsDRsgZOPc0HTDvk2n2jkzOTaTxwaMOjaRW1LlGLbRByYXemyIBT85EbkFCl3Vnwmy5o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764745169; c=relaxed/simple; bh=Dzwz1Gy/LjTYE8v6XNg2ueMBfmicb7t8BSCmopWa1sI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=mR0bemxdYg4lbsJ/9zbtYdAUiL4pItAT+g9gG9fhO11psfQ3bc08yaM1jgzRQqgAoLXKv7aSVXZWn4wdibEBT0AZGrtinOG8NEIuvYhBatfxUTp2HCuH7DkOb3Gkd4NbfuyDH4QFUnoObnT/6cmJt6Vjptc8ZBxKkc6/Sri0eQE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=gCYvVaDc; arc=none 
From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Thomas Gleixner , Dave Hansen , Ian Rogers , Adrian Hunter , Jiri Olsa , Alexander Shishkin , Andi Kleen , Eranian Stephane Cc: Mark Rutland , broonie@kernel.org, Ravi Bangoria , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Kan Liang , Dapeng Mi Subject: [Patch v5 17/19] perf headers: Sync with the kernel headers Date: Wed, 3 Dec 2025 14:54:58 +0800 Message-Id: <20251203065500.2597594-18-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com> References: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang Update include/uapi/linux/perf_event.h and arch/x86/include/uapi/asm/perf_regs.h to support extended regs. Signed-off-by: Kan Liang Co-developed-by: Dapeng Mi Signed-off-by: Dapeng Mi --- tools/arch/x86/include/uapi/asm/perf_regs.h | 62 +++++++++++++++++++++ tools/include/uapi/linux/perf_event.h | 45 +++++++++++++-- 2 files changed, 103 insertions(+), 4 deletions(-) diff --git a/tools/arch/x86/include/uapi/asm/perf_regs.h b/tools/arch/x86/i= nclude/uapi/asm/perf_regs.h index 7c9d2bb3833b..f3561ed10041 100644 --- a/tools/arch/x86/include/uapi/asm/perf_regs.h +++ b/tools/arch/x86/include/uapi/asm/perf_regs.h @@ -27,9 +27,34 @@ enum perf_event_x86_regs { PERF_REG_X86_R13, PERF_REG_X86_R14, PERF_REG_X86_R15, + /* + * The EGPRs/SSP and XMM have overlaps. Only one can be used + * at a time. For the ABI type PERF_SAMPLE_REGS_ABI_SIMD, + * utilize EGPRs/SSP. For the other ABI type, XMM is used. + * + * Extended GPRs (EGPRs) + */ + PERF_REG_X86_R16, + PERF_REG_X86_R17, + PERF_REG_X86_R18, + PERF_REG_X86_R19, + PERF_REG_X86_R20, + PERF_REG_X86_R21, + PERF_REG_X86_R22, + PERF_REG_X86_R23, + PERF_REG_X86_R24, + PERF_REG_X86_R25, + PERF_REG_X86_R26, + PERF_REG_X86_R27, + PERF_REG_X86_R28, + PERF_REG_X86_R29, + PERF_REG_X86_R30, + PERF_REG_X86_R31, + PERF_REG_X86_SSP, /* These are the limits for the GPRs. 
*/ PERF_REG_X86_32_MAX =3D PERF_REG_X86_GS + 1, PERF_REG_X86_64_MAX =3D PERF_REG_X86_R15 + 1, + PERF_REG_MISC_MAX =3D PERF_REG_X86_SSP + 1, =20 /* These all need two bits set because they are 128bit */ PERF_REG_X86_XMM0 =3D 32, @@ -54,5 +79,42 @@ enum perf_event_x86_regs { }; =20 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) +#define PERF_X86_EGPRS_MASK GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16) + +enum { + PERF_REG_X86_XMM, + PERF_REG_X86_YMM, + PERF_REG_X86_ZMM, + PERF_REG_X86_MAX_SIMD_REGS, + + PERF_REG_X86_OPMASK =3D 0, + PERF_REG_X86_MAX_PRED_REGS =3D 1, +}; + +enum { + PERF_X86_SIMD_XMM_REGS =3D 16, + PERF_X86_SIMD_YMM_REGS =3D 16, + PERF_X86_SIMD_ZMMH_REGS =3D 16, + PERF_X86_SIMD_ZMM_REGS =3D 32, + PERF_X86_SIMD_VEC_REGS_MAX =3D PERF_X86_SIMD_ZMM_REGS, + + PERF_X86_SIMD_OPMASK_REGS =3D 8, + PERF_X86_SIMD_PRED_REGS_MAX =3D PERF_X86_SIMD_OPMASK_REGS, +}; + +#define PERF_X86_SIMD_PRED_MASK GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, = 0) +#define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1= , 0) + +#define PERF_X86_H16ZMM_BASE PERF_X86_SIMD_ZMMH_REGS + +enum { + PERF_X86_OPMASK_QWORDS =3D 1, + PERF_X86_XMM_QWORDS =3D 2, + PERF_X86_YMMH_QWORDS =3D 2, + PERF_X86_YMM_QWORDS =3D 4, + PERF_X86_ZMMH_QWORDS =3D 4, + PERF_X86_ZMM_QWORDS =3D 8, + PERF_X86_SIMD_QWORDS_MAX =3D PERF_X86_ZMM_QWORDS, +}; =20 #endif /* _ASM_X86_PERF_REGS_H */ diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/lin= ux/perf_event.h index d292f96bc06f..f1474da32622 100644 --- a/tools/include/uapi/linux/perf_event.h +++ b/tools/include/uapi/linux/perf_event.h @@ -314,8 +314,9 @@ enum { */ enum perf_sample_regs_abi { PERF_SAMPLE_REGS_ABI_NONE =3D 0, - PERF_SAMPLE_REGS_ABI_32 =3D 1, - PERF_SAMPLE_REGS_ABI_64 =3D 2, + PERF_SAMPLE_REGS_ABI_32 =3D (1 << 0), + PERF_SAMPLE_REGS_ABI_64 =3D (1 << 1), + PERF_SAMPLE_REGS_ABI_SIMD =3D (1 << 2), }; =20 /* @@ -382,6 +383,7 @@ enum perf_event_read_format { #define PERF_ATTR_SIZE_VER6 120 /* Add: 
aux_sample_size */ #define PERF_ATTR_SIZE_VER7 128 /* Add: sig_data */ #define PERF_ATTR_SIZE_VER8 136 /* Add: config3 */ +#define PERF_ATTR_SIZE_VER9 168 /* Add: sample_simd_{pred,vec}_reg_* */ =20 /* * 'struct perf_event_attr' contains various attributes that define @@ -545,6 +547,25 @@ struct perf_event_attr { __u64 sig_data; =20 __u64 config3; /* extension of config2 */ + + + /* + * Defines set of SIMD registers to dump on samples. + * The sample_simd_regs_enabled !=3D0 implies the + * set of SIMD registers is used to config all SIMD registers. + * If !sample_simd_regs_enabled, sample_regs_XXX may be used to + * config some SIMD registers on X86. + */ + union { + __u16 sample_simd_regs_enabled; + __u16 sample_simd_pred_reg_qwords; + }; + __u32 sample_simd_pred_reg_intr; + __u32 sample_simd_pred_reg_user; + __u16 sample_simd_vec_reg_qwords; + __u64 sample_simd_vec_reg_intr; + __u64 sample_simd_vec_reg_user; + __u32 __reserved_4; }; =20 /* @@ -1018,7 +1039,15 @@ enum perf_event_type { * } && PERF_SAMPLE_BRANCH_STACK * * { u64 abi; # enum perf_sample_regs_abi - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER + * u64 regs[weight(mask)]; + * struct { + * u16 nr_vectors; + * u16 vector_qwords; + * u16 nr_pred; + * u16 pred_qwords; + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords]; + * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD) + * } && PERF_SAMPLE_REGS_USER * * { u64 size; * char data[size]; @@ -1045,7 +1074,15 @@ enum perf_event_type { * { u64 data_src; } && PERF_SAMPLE_DATA_SRC * { u64 transaction; } && PERF_SAMPLE_TRANSACTION * { u64 abi; # enum perf_sample_regs_abi - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR + * u64 regs[weight(mask)]; + * struct { + * u16 nr_vectors; + * u16 vector_qwords; + * u16 nr_pred; + * u16 pred_qwords; + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords]; + * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD) + * } && PERF_SAMPLE_REGS_INTR * { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR * { u64 cgroup;} && 
PERF_SAMPLE_CGROUP * { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE --=20 2.34.1 From nobody Sat Feb 7 14:28:38 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7B5372C11C7; Wed, 3 Dec 2025 06:59:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.8 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764745176; cv=none; b=tZuhb5DXybbTQERF1Vu09I850nPoWMQZWaWzkesXTEht2lncoZ8nKW/qqJsnH9SGXzmQjWDCEnXNzUTMbJuKdEuyRhLswaP5XqHDejsJF7xZ3Q1z4Q9D5AYPeVypctSQskjQm6BQx7CDtv4/mNkhnTns7TzGiHuCUg5tqRXaBjM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764745176; c=relaxed/simple; bh=fQZ/JfVn24TzsBsXWgzeI9HHJdz00iCp+BRk3AMsPNU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=tE37phxN4LG5Ri/KbjoN9aVGwlDgR4BE+hSS8Arx5wLSKIo6JloQcQjkMgz+77264OkPL/fNGXi/rXYnOhzUekj5W0pfvl/RNosWwyhs56thNDqbAqm6fseJiuRgnlKiRNa9ww32asNS2gyUP3uMmM72MfOEQjciFjWaLjveYis= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=V7qEV7bV; arc=none smtp.client-ip=192.198.163.8 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="V7qEV7bV" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1764745174; x=1796281174; h=from:to:cc:subject:date:message-id:in-reply-to: 
From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Thomas Gleixner , Dave Hansen , Ian Rogers , Adrian Hunter , Jiri Olsa , Alexander Shishkin , Andi Kleen , Eranian Stephane Cc: Mark Rutland , broonie@kernel.org, Ravi Bangoria , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Kan Liang , Dapeng Mi Subject: [Patch v5 18/19] perf parse-regs: Support new SIMD sampling format Date: Wed, 3 Dec 2025 14:54:59 +0800 Message-Id: <20251203065500.2597594-19-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com> References: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version:
1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang This patch adds support for the newly introduced SIMD register sampling format by adding the following functions: uint64_t arch__intr_simd_reg_mask(void); uint64_t arch__user_simd_reg_mask(void); uint64_t arch__intr_pred_reg_mask(void); uint64_t arch__user_pred_reg_mask(void); uint64_t arch__intr_simd_reg_bitmap_qwords(int reg, u16 *qwords); uint64_t arch__user_simd_reg_bitmap_qwords(int reg, u16 *qwords); uint64_t arch__intr_pred_reg_bitmap_qwords(int reg, u16 *qwords); uint64_t arch__user_pred_reg_bitmap_qwords(int reg, u16 *qwords); The arch__{intr|user}_simd_reg_mask() functions retrieve the bitmap of supported SIMD registers, such as XMM/YMM/ZMM on x86 platforms. The arch__{intr|user}_pred_reg_mask() functions retrieve the bitmap of supported PRED registers, such as OPMASK on x86 platforms. The arch__{intr|user}_simd_reg_bitmap_qwords() functions provide the exact bitmap and number of qwords for a specific type of SIMD register. For example, for XMM registers on x86 platforms, the returned bitmap is 0xffff (XMM0 ~ XMM15) and the qwords number is 2 (128 bits for each XMM). The arch__{intr|user}_pred_reg_bitmap_qwords() functions provide the exact bitmap and number of qwords for a specific type of PRED register. For example, for OPMASK registers on x86 platforms, the returned bitmap is 0xff (OPMASK0 ~ OPMASK7) and the qwords number is 1 (64 bits for each OPMASK). Additionally, the function __parse_regs() is enhanced to support parsing these newly introduced SIMD registers. Currently, each type of register can only be sampled collectively; sampling a specific SIMD register is not supported. For example, all XMM registers are sampled together rather than sampling only XMM0. When multiple overlapping register types, such as XMM and YMM, are sampled simultaneously, only the superset (YMM registers) is sampled. 
With this patch, all supported sampling registers on x86 platforms are displayed as follows. $perf record -I? available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28 R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7 $perf record --user-regs=3D? available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28 R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7 Signed-off-by: Kan Liang Co-developed-by: Dapeng Mi Signed-off-by: Dapeng Mi --- tools/perf/arch/x86/util/perf_regs.c | 470 +++++++++++++++++++++- tools/perf/util/evsel.c | 27 ++ tools/perf/util/parse-regs-options.c | 151 ++++++- tools/perf/util/perf_event_attr_fprintf.c | 6 + tools/perf/util/perf_regs.c | 59 +++ tools/perf/util/perf_regs.h | 11 + tools/perf/util/record.h | 6 + 7 files changed, 714 insertions(+), 16 deletions(-) diff --git a/tools/perf/arch/x86/util/perf_regs.c b/tools/perf/arch/x86/uti= l/perf_regs.c index 12fd93f04802..db41430f3b07 100644 --- a/tools/perf/arch/x86/util/perf_regs.c +++ b/tools/perf/arch/x86/util/perf_regs.c @@ -13,6 +13,49 @@ #include "../../../util/pmu.h" #include "../../../util/pmus.h" =20 +static const struct sample_reg sample_reg_masks_ext[] =3D { + SMPL_REG(AX, PERF_REG_X86_AX), + SMPL_REG(BX, PERF_REG_X86_BX), + SMPL_REG(CX, PERF_REG_X86_CX), + SMPL_REG(DX, PERF_REG_X86_DX), + SMPL_REG(SI, PERF_REG_X86_SI), + SMPL_REG(DI, PERF_REG_X86_DI), + SMPL_REG(BP, PERF_REG_X86_BP), + SMPL_REG(SP, PERF_REG_X86_SP), + SMPL_REG(IP, PERF_REG_X86_IP), + SMPL_REG(FLAGS, PERF_REG_X86_FLAGS), + SMPL_REG(CS, PERF_REG_X86_CS), + SMPL_REG(SS, PERF_REG_X86_SS), +#ifdef HAVE_ARCH_X86_64_SUPPORT + SMPL_REG(R8, PERF_REG_X86_R8), + SMPL_REG(R9, PERF_REG_X86_R9), + SMPL_REG(R10, PERF_REG_X86_R10), + SMPL_REG(R11, PERF_REG_X86_R11), + SMPL_REG(R12, PERF_REG_X86_R12), + SMPL_REG(R13, PERF_REG_X86_R13), + SMPL_REG(R14, PERF_REG_X86_R14), + 
SMPL_REG(R15, PERF_REG_X86_R15), + SMPL_REG(R16, PERF_REG_X86_R16), + SMPL_REG(R17, PERF_REG_X86_R17), + SMPL_REG(R18, PERF_REG_X86_R18), + SMPL_REG(R19, PERF_REG_X86_R19), + SMPL_REG(R20, PERF_REG_X86_R20), + SMPL_REG(R21, PERF_REG_X86_R21), + SMPL_REG(R22, PERF_REG_X86_R22), + SMPL_REG(R23, PERF_REG_X86_R23), + SMPL_REG(R24, PERF_REG_X86_R24), + SMPL_REG(R25, PERF_REG_X86_R25), + SMPL_REG(R26, PERF_REG_X86_R26), + SMPL_REG(R27, PERF_REG_X86_R27), + SMPL_REG(R28, PERF_REG_X86_R28), + SMPL_REG(R29, PERF_REG_X86_R29), + SMPL_REG(R30, PERF_REG_X86_R30), + SMPL_REG(R31, PERF_REG_X86_R31), + SMPL_REG(SSP, PERF_REG_X86_SSP), +#endif + SMPL_REG_END +}; + static const struct sample_reg sample_reg_masks[] =3D { SMPL_REG(AX, PERF_REG_X86_AX), SMPL_REG(BX, PERF_REG_X86_BX), @@ -276,27 +319,404 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_o= p) return SDT_ARG_VALID; } =20 +static bool support_simd_reg(u64 sample_type, u16 qwords, u64 mask, bool p= red) +{ + struct perf_event_attr attr =3D { + .type =3D PERF_TYPE_HARDWARE, + .config =3D PERF_COUNT_HW_CPU_CYCLES, + .sample_type =3D sample_type, + .disabled =3D 1, + .exclude_kernel =3D 1, + .sample_simd_regs_enabled =3D 1, + }; + int fd; + + attr.sample_period =3D 1; + + if (!pred) { + attr.sample_simd_vec_reg_qwords =3D qwords; + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) + attr.sample_simd_vec_reg_intr =3D mask; + else + attr.sample_simd_vec_reg_user =3D mask; + } else { + attr.sample_simd_pred_reg_qwords =3D PERF_X86_OPMASK_QWORDS; + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) + attr.sample_simd_pred_reg_intr =3D PERF_X86_SIMD_PRED_MASK; + else + attr.sample_simd_pred_reg_user =3D PERF_X86_SIMD_PRED_MASK; + } + + if (perf_pmus__num_core_pmus() > 1) { + struct perf_pmu *pmu =3D NULL; + __u64 type =3D PERF_TYPE_RAW; + + /* + * The same register set is supported among different hybrid PMUs. + * Only check the first available one. 
+ */ + while ((pmu =3D perf_pmus__scan_core(pmu)) !=3D NULL) { + type =3D pmu->type; + break; + } + attr.config |=3D type << PERF_PMU_TYPE_SHIFT; + } + + event_attr_init(&attr); + + fd =3D sys_perf_event_open(&attr, 0, -1, -1, 0); + if (fd !=3D -1) { + close(fd); + return true; + } + + return false; +} + +static bool __arch_simd_reg_mask(u64 sample_type, int reg, uint64_t *mask,= u16 *qwords) +{ + bool supported =3D false; + u64 bits; + + *mask =3D 0; + *qwords =3D 0; + + switch (reg) { + case PERF_REG_X86_XMM: + bits =3D BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1; + supported =3D support_simd_reg(sample_type, PERF_X86_XMM_QWORDS, bits, f= alse); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_XMM_QWORDS; + } + break; + case PERF_REG_X86_YMM: + bits =3D BIT_ULL(PERF_X86_SIMD_YMM_REGS) - 1; + supported =3D support_simd_reg(sample_type, PERF_X86_YMM_QWORDS, bits, f= alse); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_YMM_QWORDS; + } + break; + case PERF_REG_X86_ZMM: + bits =3D BIT_ULL(PERF_X86_SIMD_ZMM_REGS) - 1; + supported =3D support_simd_reg(sample_type, PERF_X86_ZMM_QWORDS, bits, f= alse); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_ZMM_QWORDS; + break; + } + + bits =3D BIT_ULL(PERF_X86_SIMD_ZMMH_REGS) - 1; + supported =3D support_simd_reg(sample_type, PERF_X86_ZMM_QWORDS, bits, f= alse); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_ZMMH_QWORDS; + } + break; + default: + break; + } + + return supported; +} + +static bool __arch_pred_reg_mask(u64 sample_type, int reg, uint64_t *mask,= u16 *qwords) +{ + bool supported =3D false; + u64 bits; + + *mask =3D 0; + *qwords =3D 0; + + switch (reg) { + case PERF_REG_X86_OPMASK: + bits =3D BIT_ULL(PERF_X86_SIMD_OPMASK_REGS) - 1; + supported =3D support_simd_reg(sample_type, PERF_X86_OPMASK_QWORDS, bits= , true); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_OPMASK_QWORDS; + } + break; + default: + break; + } + + return supported; +} + +static bool 
has_cap_simd_regs(void) +{ + uint64_t mask =3D BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1; + u16 qwords =3D PERF_X86_XMM_QWORDS; + static bool has_cap_simd_regs; + static bool cached; + + if (cached) + return has_cap_simd_regs; + + has_cap_simd_regs =3D __arch_simd_reg_mask(PERF_SAMPLE_REGS_INTR, + PERF_REG_X86_XMM, &mask, &qwords); + has_cap_simd_regs |=3D __arch_simd_reg_mask(PERF_SAMPLE_REGS_USER, + PERF_REG_X86_XMM, &mask, &qwords); + cached =3D true; + + return has_cap_simd_regs; +} + +bool arch_has_simd_regs(u64 mask) +{ + return has_cap_simd_regs() && + mask & GENMASK_ULL(PERF_REG_X86_SSP, PERF_REG_X86_R16); +} + +static const struct sample_reg sample_simd_reg_masks[] =3D { + SMPL_REG(XMM, PERF_REG_X86_XMM), + SMPL_REG(YMM, PERF_REG_X86_YMM), + SMPL_REG(ZMM, PERF_REG_X86_ZMM), + SMPL_REG_END +}; + +static const struct sample_reg sample_pred_reg_masks[] =3D { + SMPL_REG(OPMASK, PERF_REG_X86_OPMASK), + SMPL_REG_END +}; + +const struct sample_reg *arch__sample_simd_reg_masks(void) +{ + return sample_simd_reg_masks; +} + +const struct sample_reg *arch__sample_pred_reg_masks(void) +{ + return sample_pred_reg_masks; +} + +static bool x86_intr_simd_updated; +static u64 x86_intr_simd_reg_mask; +static u64 x86_intr_simd_mask[PERF_REG_X86_MAX_SIMD_REGS]; +static u16 x86_intr_simd_qwords[PERF_REG_X86_MAX_SIMD_REGS]; +static bool x86_user_simd_updated; +static u64 x86_user_simd_reg_mask; +static u64 x86_user_simd_mask[PERF_REG_X86_MAX_SIMD_REGS]; +static u16 x86_user_simd_qwords[PERF_REG_X86_MAX_SIMD_REGS]; + +static bool x86_intr_pred_updated; +static u64 x86_intr_pred_reg_mask; +static u64 x86_intr_pred_mask[PERF_REG_X86_MAX_PRED_REGS]; +static u16 x86_intr_pred_qwords[PERF_REG_X86_MAX_PRED_REGS]; +static bool x86_user_pred_updated; +static u64 x86_user_pred_reg_mask; +static u64 x86_user_pred_mask[PERF_REG_X86_MAX_PRED_REGS]; +static u16 x86_user_pred_qwords[PERF_REG_X86_MAX_PRED_REGS]; + +static uint64_t __arch__simd_reg_mask(u64 sample_type) +{ + const struct sample_reg 
*r =3D NULL; + bool supported; + u64 mask =3D 0; + int reg; + + if (!has_cap_simd_regs()) + return 0; + + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR && x86_intr_simd_updated) + return x86_intr_simd_reg_mask; + + if (sample_type =3D=3D PERF_SAMPLE_REGS_USER && x86_user_simd_updated) + return x86_user_simd_reg_mask; + + for (r =3D arch__sample_simd_reg_masks(); r->name; r++) { + supported =3D false; + + if (!r->mask) + continue; + reg =3D fls64(r->mask) - 1; + + if (reg >=3D PERF_REG_X86_MAX_SIMD_REGS) + break; + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) + supported =3D __arch_simd_reg_mask(sample_type, reg, + &x86_intr_simd_mask[reg], + &x86_intr_simd_qwords[reg]); + else if (sample_type =3D=3D PERF_SAMPLE_REGS_USER) + supported =3D __arch_simd_reg_mask(sample_type, reg, + &x86_user_simd_mask[reg], + &x86_user_simd_qwords[reg]); + if (supported) + mask |=3D BIT_ULL(reg); + } + + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) { + x86_intr_simd_reg_mask =3D mask; + x86_intr_simd_updated =3D true; + } else { + x86_user_simd_reg_mask =3D mask; + x86_user_simd_updated =3D true; + } + + return mask; +} + +static uint64_t __arch__pred_reg_mask(u64 sample_type) +{ + const struct sample_reg *r =3D NULL; + bool supported; + u64 mask =3D 0; + int reg; + + if (!has_cap_simd_regs()) + return 0; + + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR && x86_intr_pred_updated) + return x86_intr_pred_reg_mask; + + if (sample_type =3D=3D PERF_SAMPLE_REGS_USER && x86_user_pred_updated) + return x86_user_pred_reg_mask; + + for (r =3D arch__sample_pred_reg_masks(); r->name; r++) { + supported =3D false; + + if (!r->mask) + continue; + reg =3D fls64(r->mask) - 1; + + if (reg >=3D PERF_REG_X86_MAX_PRED_REGS) + break; + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) + supported =3D __arch_pred_reg_mask(sample_type, reg, + &x86_intr_pred_mask[reg], + &x86_intr_pred_qwords[reg]); + else if (sample_type =3D=3D PERF_SAMPLE_REGS_USER) + supported =3D __arch_pred_reg_mask(sample_type, reg, + 
&x86_user_pred_mask[reg], + &x86_user_pred_qwords[reg]); + if (supported) + mask |=3D BIT_ULL(reg); + } + + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) { + x86_intr_pred_reg_mask =3D mask; + x86_intr_pred_updated =3D true; + } else { + x86_user_pred_reg_mask =3D mask; + x86_user_pred_updated =3D true; + } + + return mask; +} + +uint64_t arch__intr_simd_reg_mask(void) +{ + return __arch__simd_reg_mask(PERF_SAMPLE_REGS_INTR); +} + +uint64_t arch__user_simd_reg_mask(void) +{ + return __arch__simd_reg_mask(PERF_SAMPLE_REGS_USER); +} + +uint64_t arch__intr_pred_reg_mask(void) +{ + return __arch__pred_reg_mask(PERF_SAMPLE_REGS_INTR); +} + +uint64_t arch__user_pred_reg_mask(void) +{ + return __arch__pred_reg_mask(PERF_SAMPLE_REGS_USER); +} + +static uint64_t arch__simd_reg_bitmap_qwords(int reg, u16 *qwords, bool in= tr) +{ + uint64_t mask =3D 0; + + *qwords =3D 0; + if (reg < PERF_REG_X86_MAX_SIMD_REGS) { + if (intr) { + *qwords =3D x86_intr_simd_qwords[reg]; + mask =3D x86_intr_simd_mask[reg]; + } else { + *qwords =3D x86_user_simd_qwords[reg]; + mask =3D x86_user_simd_mask[reg]; + } + } + + return mask; +} + +static uint64_t arch__pred_reg_bitmap_qwords(int reg, u16 *qwords, bool in= tr) +{ + uint64_t mask =3D 0; + + *qwords =3D 0; + if (reg < PERF_REG_X86_MAX_PRED_REGS) { + if (intr) { + *qwords =3D x86_intr_pred_qwords[reg]; + mask =3D x86_intr_pred_mask[reg]; + } else { + *qwords =3D x86_user_pred_qwords[reg]; + mask =3D x86_user_pred_mask[reg]; + } + } + + return mask; +} + +uint64_t arch__intr_simd_reg_bitmap_qwords(int reg, u16 *qwords) +{ + if (!x86_intr_simd_updated) + arch__intr_simd_reg_mask(); + return arch__simd_reg_bitmap_qwords(reg, qwords, true); +} + +uint64_t arch__user_simd_reg_bitmap_qwords(int reg, u16 *qwords) +{ + if (!x86_user_simd_updated) + arch__user_simd_reg_mask(); + return arch__simd_reg_bitmap_qwords(reg, qwords, false); +} + +uint64_t arch__intr_pred_reg_bitmap_qwords(int reg, u16 *qwords) +{ + if (!x86_intr_pred_updated) + 
arch__intr_pred_reg_mask(); + return arch__pred_reg_bitmap_qwords(reg, qwords, true); +} + +uint64_t arch__user_pred_reg_bitmap_qwords(int reg, u16 *qwords) +{ + if (!x86_user_pred_updated) + arch__user_pred_reg_mask(); + return arch__pred_reg_bitmap_qwords(reg, qwords, false); +} + const struct sample_reg *arch__sample_reg_masks(void) { + if (has_cap_simd_regs()) + return sample_reg_masks_ext; return sample_reg_masks; } =20 -uint64_t arch__intr_reg_mask(void) +static uint64_t __arch__reg_mask(u64 sample_type, u64 mask, bool has_simd_= regs) { struct perf_event_attr attr =3D { - .type =3D PERF_TYPE_HARDWARE, - .config =3D PERF_COUNT_HW_CPU_CYCLES, - .sample_type =3D PERF_SAMPLE_REGS_INTR, - .sample_regs_intr =3D PERF_REG_EXTENDED_MASK, - .precise_ip =3D 1, - .disabled =3D 1, - .exclude_kernel =3D 1, + .type =3D PERF_TYPE_HARDWARE, + .config =3D PERF_COUNT_HW_CPU_CYCLES, + .sample_type =3D sample_type, + .precise_ip =3D 1, + .disabled =3D 1, + .exclude_kernel =3D 1, + .sample_simd_regs_enabled =3D has_simd_regs, }; int fd; /* * In an unnamed union, init it here to build on older gcc versions */ attr.sample_period =3D 1; + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) + attr.sample_regs_intr =3D mask; + else + attr.sample_regs_user =3D mask; =20 if (perf_pmus__num_core_pmus() > 1) { struct perf_pmu *pmu =3D NULL; @@ -318,13 +738,41 @@ uint64_t arch__intr_reg_mask(void) fd =3D sys_perf_event_open(&attr, 0, -1, -1, 0); if (fd !=3D -1) { close(fd); - return (PERF_REG_EXTENDED_MASK | PERF_REGS_MASK); + return mask; } =20 - return PERF_REGS_MASK; + return 0; +} + +uint64_t arch__intr_reg_mask(void) +{ + uint64_t mask =3D PERF_REGS_MASK; + + if (has_cap_simd_regs()) { + mask |=3D __arch__reg_mask(PERF_SAMPLE_REGS_INTR, + GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16), + true); + mask |=3D __arch__reg_mask(PERF_SAMPLE_REGS_INTR, + BIT_ULL(PERF_REG_X86_SSP), + true); + } else + mask |=3D __arch__reg_mask(PERF_SAMPLE_REGS_INTR, PERF_REG_EXTENDED_MASK= , false); + + return 
mask;
 }
 
 uint64_t arch__user_reg_mask(void)
 {
-	return PERF_REGS_MASK;
+	uint64_t mask = PERF_REGS_MASK;
+
+	if (has_cap_simd_regs()) {
+		mask |= __arch__reg_mask(PERF_SAMPLE_REGS_USER,
+					 GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16),
+					 true);
+		mask |= __arch__reg_mask(PERF_SAMPLE_REGS_USER,
+					 BIT_ULL(PERF_REG_X86_SSP),
+					 true);
+	}
+
+	return mask;
 }
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 56ebefd075f2..5d1d90cf9488 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1461,12 +1461,39 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
 	if (opts->sample_intr_regs && !evsel->no_aux_samples &&
 	    !evsel__is_dummy_event(evsel)) {
 		attr->sample_regs_intr = opts->sample_intr_regs;
+		attr->sample_simd_regs_enabled = arch_has_simd_regs(attr->sample_regs_intr);
+		evsel__set_sample_bit(evsel, REGS_INTR);
+	}
+
+	if ((opts->sample_intr_vec_regs || opts->sample_intr_pred_regs) &&
+	    !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) {
+		/* A non-zero pred qwords value implies the SIMD register set is in use */
+		if (opts->sample_pred_regs_qwords)
+			attr->sample_simd_pred_reg_qwords = opts->sample_pred_regs_qwords;
+		else
+			attr->sample_simd_pred_reg_qwords = 1;
+		attr->sample_simd_vec_reg_intr = opts->sample_intr_vec_regs;
+		attr->sample_simd_vec_reg_qwords = opts->sample_vec_regs_qwords;
+		attr->sample_simd_pred_reg_intr = opts->sample_intr_pred_regs;
 		evsel__set_sample_bit(evsel, REGS_INTR);
 	}
 
 	if (opts->sample_user_regs && !evsel->no_aux_samples &&
 	    !evsel__is_dummy_event(evsel)) {
 		attr->sample_regs_user |= opts->sample_user_regs;
+		attr->sample_simd_regs_enabled = arch_has_simd_regs(attr->sample_regs_user);
+		evsel__set_sample_bit(evsel, REGS_USER);
+	}
+
+	if ((opts->sample_user_vec_regs || opts->sample_user_pred_regs) &&
+	    !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) {
+		if (opts->sample_pred_regs_qwords)
+			attr->sample_simd_pred_reg_qwords =
opts->sample_pred_regs_qwords; + else + attr->sample_simd_pred_reg_qwords =3D 1; + attr->sample_simd_vec_reg_user =3D opts->sample_user_vec_regs; + attr->sample_simd_vec_reg_qwords =3D opts->sample_vec_regs_qwords; + attr->sample_simd_pred_reg_user =3D opts->sample_user_pred_regs; evsel__set_sample_bit(evsel, REGS_USER); } =20 diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-r= egs-options.c index cda1c620968e..0bd100392889 100644 --- a/tools/perf/util/parse-regs-options.c +++ b/tools/perf/util/parse-regs-options.c @@ -4,19 +4,139 @@ #include #include #include +#include #include "util/debug.h" #include #include "util/perf_regs.h" #include "util/parse-regs-options.h" +#include "record.h" + +static void __print_simd_regs(bool intr, uint64_t simd_mask) +{ + const struct sample_reg *r =3D NULL; + uint64_t bitmap =3D 0; + u16 qwords =3D 0; + int reg_idx; + + if (!simd_mask) + return; + + for (r =3D arch__sample_simd_reg_masks(); r->name; r++) { + if (!(r->mask & simd_mask)) + continue; + reg_idx =3D fls64(r->mask) - 1; + if (intr) + bitmap =3D arch__intr_simd_reg_bitmap_qwords(reg_idx, &qwords); + else + bitmap =3D arch__user_simd_reg_bitmap_qwords(reg_idx, &qwords); + if (bitmap) + fprintf(stderr, "%s0-%d ", r->name, fls64(bitmap) - 1); + } +} + +static void __print_pred_regs(bool intr, uint64_t pred_mask) +{ + const struct sample_reg *r =3D NULL; + uint64_t bitmap =3D 0; + u16 qwords =3D 0; + int reg_idx; + + if (!pred_mask) + return; + + for (r =3D arch__sample_pred_reg_masks(); r->name; r++) { + if (!(r->mask & pred_mask)) + continue; + reg_idx =3D fls64(r->mask) - 1; + if (intr) + bitmap =3D arch__intr_pred_reg_bitmap_qwords(reg_idx, &qwords); + else + bitmap =3D arch__user_pred_reg_bitmap_qwords(reg_idx, &qwords); + if (bitmap) + fprintf(stderr, "%s0-%d ", r->name, fls64(bitmap) - 1); + } +} + +static bool __parse_simd_regs(struct record_opts *opts, char *s, bool intr) +{ + const struct sample_reg *r =3D NULL; + bool matched =3D false; + 
uint64_t bitmap =3D 0; + u16 qwords =3D 0; + int reg_idx; + + for (r =3D arch__sample_simd_reg_masks(); r->name; r++) { + if (strcasecmp(s, r->name)) + continue; + if (!fls64(r->mask)) + continue; + reg_idx =3D fls64(r->mask) - 1; + if (intr) + bitmap =3D arch__intr_simd_reg_bitmap_qwords(reg_idx, &qwords); + else + bitmap =3D arch__user_simd_reg_bitmap_qwords(reg_idx, &qwords); + matched =3D true; + break; + } + + /* Just need the highest qwords */ + if (qwords > opts->sample_vec_regs_qwords) { + opts->sample_vec_regs_qwords =3D qwords; + if (intr) + opts->sample_intr_vec_regs =3D bitmap; + else + opts->sample_user_vec_regs =3D bitmap; + } + + return matched; +} + +static bool __parse_pred_regs(struct record_opts *opts, char *s, bool intr) +{ + const struct sample_reg *r =3D NULL; + bool matched =3D false; + uint64_t bitmap =3D 0; + u16 qwords =3D 0; + int reg_idx; + + for (r =3D arch__sample_pred_reg_masks(); r->name; r++) { + if (strcasecmp(s, r->name)) + continue; + if (!fls64(r->mask)) + continue; + reg_idx =3D fls64(r->mask) - 1; + if (intr) + bitmap =3D arch__intr_pred_reg_bitmap_qwords(reg_idx, &qwords); + else + bitmap =3D arch__user_pred_reg_bitmap_qwords(reg_idx, &qwords); + matched =3D true; + break; + } + + /* Just need the highest qwords */ + if (qwords > opts->sample_pred_regs_qwords) { + opts->sample_pred_regs_qwords =3D qwords; + if (intr) + opts->sample_intr_pred_regs =3D bitmap; + else + opts->sample_user_pred_regs =3D bitmap; + } + + return matched; +} =20 static int __parse_regs(const struct option *opt, const char *str, int unset, bool in= tr) { uint64_t *mode =3D (uint64_t *)opt->value; const struct sample_reg *r =3D NULL; + struct record_opts *opts; char *s, *os =3D NULL, *p; - int ret =3D -1; + bool has_simd_regs =3D false; uint64_t mask; + uint64_t simd_mask; + uint64_t pred_mask; + int ret =3D -1; =20 if (unset) return 0; @@ -27,10 +147,17 @@ __parse_regs(const struct option *opt, const char *str= , int unset, bool intr) if (*mode) return 
-1; =20 - if (intr) + if (intr) { + opts =3D container_of(opt->value, struct record_opts, sample_intr_regs); mask =3D arch__intr_reg_mask(); - else + simd_mask =3D arch__intr_simd_reg_mask(); + pred_mask =3D arch__intr_pred_reg_mask(); + } else { + opts =3D container_of(opt->value, struct record_opts, sample_user_regs); mask =3D arch__user_reg_mask(); + simd_mask =3D arch__user_simd_reg_mask(); + pred_mask =3D arch__user_pred_reg_mask(); + } =20 /* str may be NULL in case no arg is passed to -I */ if (str) { @@ -50,10 +177,24 @@ __parse_regs(const struct option *opt, const char *str= , int unset, bool intr) if (r->mask & mask) fprintf(stderr, "%s ", r->name); } + __print_simd_regs(intr, simd_mask); + __print_pred_regs(intr, pred_mask); fputc('\n', stderr); /* just printing available regs */ goto error; } + + if (simd_mask) { + has_simd_regs =3D __parse_simd_regs(opts, s, intr); + if (has_simd_regs) + goto next; + } + if (pred_mask) { + has_simd_regs =3D __parse_pred_regs(opts, s, intr); + if (has_simd_regs) + goto next; + } + for (r =3D arch__sample_reg_masks(); r->name; r++) { if ((r->mask & mask) && !strcasecmp(s, r->name)) break; @@ -65,7 +206,7 @@ __parse_regs(const struct option *opt, const char *str, = int unset, bool intr) } =20 *mode |=3D r->mask; - +next: if (!p) break; =20 @@ -75,7 +216,7 @@ __parse_regs(const struct option *opt, const char *str, = int unset, bool intr) ret =3D 0; =20 /* default to all possible regs */ - if (*mode =3D=3D 0) + if (*mode =3D=3D 0 && !has_simd_regs) *mode =3D mask; error: free(os); diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/pe= rf_event_attr_fprintf.c index 66b666d9ce64..fb0366d050cf 100644 --- a/tools/perf/util/perf_event_attr_fprintf.c +++ b/tools/perf/util/perf_event_attr_fprintf.c @@ -360,6 +360,12 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_eve= nt_attr *attr, PRINT_ATTRf(aux_start_paused, p_unsigned); PRINT_ATTRf(aux_pause, p_unsigned); PRINT_ATTRf(aux_resume, p_unsigned); + 
PRINT_ATTRf(sample_simd_pred_reg_qwords, p_unsigned); + PRINT_ATTRf(sample_simd_pred_reg_intr, p_hex); + PRINT_ATTRf(sample_simd_pred_reg_user, p_hex); + PRINT_ATTRf(sample_simd_vec_reg_qwords, p_unsigned); + PRINT_ATTRf(sample_simd_vec_reg_intr, p_hex); + PRINT_ATTRf(sample_simd_vec_reg_user, p_hex); =20 return ret; } diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c index 44b90bbf2d07..e8a9fabc92e6 100644 --- a/tools/perf/util/perf_regs.c +++ b/tools/perf/util/perf_regs.c @@ -11,6 +11,11 @@ int __weak arch_sdt_arg_parse_op(char *old_op __maybe_un= used, return SDT_ARG_SKIP; } =20 +bool __weak arch_has_simd_regs(u64 mask __maybe_unused) +{ + return false; +} + uint64_t __weak arch__intr_reg_mask(void) { return 0; @@ -21,6 +26,50 @@ uint64_t __weak arch__user_reg_mask(void) return 0; } =20 +uint64_t __weak arch__intr_simd_reg_mask(void) +{ + return 0; +} + +uint64_t __weak arch__user_simd_reg_mask(void) +{ + return 0; +} + +uint64_t __weak arch__intr_pred_reg_mask(void) +{ + return 0; +} + +uint64_t __weak arch__user_pred_reg_mask(void) +{ + return 0; +} + +uint64_t __weak arch__intr_simd_reg_bitmap_qwords(int reg __maybe_unused,= u16 *qwords) +{ + *qwords =3D 0; + return 0; +} + +uint64_t __weak arch__user_simd_reg_bitmap_qwords(int reg __maybe_unused, = u16 *qwords) +{ + *qwords =3D 0; + return 0; +} + +uint64_t __weak arch__intr_pred_reg_bitmap_qwords(int reg __maybe_unused,= u16 *qwords) +{ + *qwords =3D 0; + return 0; +} + +uint64_t __weak arch__user_pred_reg_bitmap_qwords(int reg __maybe_unused, = u16 *qwords) +{ + *qwords =3D 0; + return 0; +} + static const struct sample_reg sample_reg_masks[] =3D { SMPL_REG_END }; @@ -30,6 +79,16 @@ const struct sample_reg * __weak arch__sample_reg_masks(= void) return sample_reg_masks; } =20 +const struct sample_reg * __weak arch__sample_simd_reg_masks(void) +{ + return sample_reg_masks; +} + +const struct sample_reg * __weak arch__sample_pred_reg_masks(void) +{ + return sample_reg_masks; +} + const 
char *perf_reg_name(int id, const char *arch)
 {
 	const char *reg_name = NULL;
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
index f2d0736d65cc..bce9c4cfd1bf 100644
--- a/tools/perf/util/perf_regs.h
+++ b/tools/perf/util/perf_regs.h
@@ -24,9 +24,20 @@ enum {
 };
 
 int arch_sdt_arg_parse_op(char *old_op, char **new_op);
+bool arch_has_simd_regs(u64 mask);
 uint64_t arch__intr_reg_mask(void);
 uint64_t arch__user_reg_mask(void);
 const struct sample_reg *arch__sample_reg_masks(void);
+const struct sample_reg *arch__sample_simd_reg_masks(void);
+const struct sample_reg *arch__sample_pred_reg_masks(void);
+uint64_t arch__intr_simd_reg_mask(void);
+uint64_t arch__user_simd_reg_mask(void);
+uint64_t arch__intr_pred_reg_mask(void);
+uint64_t arch__user_pred_reg_mask(void);
+uint64_t arch__intr_simd_reg_bitmap_qwords(int reg, u16 *qwords);
+uint64_t arch__user_simd_reg_bitmap_qwords(int reg, u16 *qwords);
+uint64_t arch__intr_pred_reg_bitmap_qwords(int reg, u16 *qwords);
+uint64_t arch__user_pred_reg_bitmap_qwords(int reg, u16 *qwords);
 
 const char *perf_reg_name(int id, const char *arch);
 int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index ea3a6c4657ee..825ffb4cc53f 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -59,7 +59,13 @@ struct record_opts {
 	unsigned int user_freq;
 	u64 branch_stack;
 	u64 sample_intr_regs;
+	u64 sample_intr_vec_regs;
 	u64 sample_user_regs;
+	u64 sample_user_vec_regs;
+	u16 sample_pred_regs_qwords;
+	u16 sample_vec_regs_qwords;
+	u16 sample_intr_pred_regs;
+	u16 sample_user_pred_regs;
 	u64 default_interval;
 	u64 user_interval;
 	size_t auxtrace_snapshot_size;
-- 
2.34.1

From nobody Sat Feb 7 14:28:38 2026
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
 Thomas Gleixner, Dave Hansen, Ian Rogers, Adrian Hunter, Jiri Olsa,
 Alexander Shishkin, Andi Kleen, Eranian Stephane
Cc: Mark Rutland, broonie@kernel.org, Ravi Bangoria,
 linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen,
 Falcon Thomas, Dapeng Mi, Xudong Hao, Kan Liang
Subject: [Patch v5 19/19] perf regs: Enable dumping of SIMD registers
Date: Wed, 3 Dec 2025 14:55:00 +0800
Message-Id: <20251203065500.2597594-20-dapeng1.mi@linux.intel.com>
In-Reply-To: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com>
References: <20251203065500.2597594-1-dapeng1.mi@linux.intel.com>

From: Kan Liang

Add support for dumping SIMD registers using the new
PERF_SAMPLE_REGS_ABI_SIMD ABI. Currently, the XMM, YMM, ZMM, OPMASK,
eGPRs, and SSP registers on x86 platforms are supported with the
PERF_SAMPLE_REGS_ABI_SIMD ABI.
An example of the output is displayed below. Example: $perf record -e cycles:p -IXMM,YMM,OPMASK,SSP ./test $perf report -D ... ... 237538985992962 0x454d0 [0x480]: PERF_RECORD_SAMPLE(IP, 0x1): 179370/179370: 0xffffffff969627fc period: 124999 addr: 0 ... intr regs: mask 0x20000000000 ABI 64-bit .... SSP 0x0000000000000000 ... SIMD ABI nr_vectors 32 vector_qwords 4 nr_pred 8 pred_qwords 1 .... YMM [0] 0x0000000000004000 .... YMM [0] 0x000055e828695270 .... YMM [0] 0x0000000000000000 .... YMM [0] 0x0000000000000000 .... YMM [1] 0x000055e8286990e0 .... YMM [1] 0x000055e828698dd0 .... YMM [1] 0x0000000000000000 .... YMM [1] 0x0000000000000000 ... ... .... YMM [31] 0x0000000000000000 .... YMM [31] 0x0000000000000000 .... YMM [31] 0x0000000000000000 .... YMM [31] 0x0000000000000000 .... OPMASK[0] 0x0000000000100221 .... OPMASK[1] 0x0000000000000020 .... OPMASK[2] 0x000000007fffffff .... OPMASK[3] 0x0000000000000000 .... OPMASK[4] 0x0000000000000000 .... OPMASK[5] 0x0000000000000000 .... OPMASK[6] 0x0000000000000000 .... OPMASK[7] 0x0000000000000000 ... ... 
Signed-off-by: Kan Liang Co-developed-by: Dapeng Mi Signed-off-by: Dapeng Mi --- tools/perf/util/evsel.c | 20 +++++ .../perf/util/perf-regs-arch/perf_regs_x86.c | 43 ++++++++++ tools/perf/util/sample.h | 10 +++ tools/perf/util/session.c | 78 +++++++++++++++++-- 4 files changed, 143 insertions(+), 8 deletions(-) diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 5d1d90cf9488..8f3fafe3a43f 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -3347,6 +3347,16 @@ int evsel__parse_sample(struct evsel *evsel, union p= erf_event *event, regs->mask =3D mask; regs->regs =3D (u64 *)array; array =3D (void *)array + sz; + + if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) { + regs->config =3D *(u64 *)array; + array =3D (void *)array + sizeof(u64); + regs->data =3D (u64 *)array; + sz =3D (regs->nr_vectors * regs->vector_qwords + + regs->nr_pred * regs->pred_qwords) * sizeof(u64); + OVERFLOW_CHECK(array, sz, max_size); + array =3D (void *)array + sz; + } } } =20 @@ -3404,6 +3414,16 @@ int evsel__parse_sample(struct evsel *evsel, union p= erf_event *event, regs->mask =3D mask; regs->regs =3D (u64 *)array; array =3D (void *)array + sz; + + if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) { + regs->config =3D *(u64 *)array; + array =3D (void *)array + sizeof(u64); + regs->data =3D (u64 *)array; + sz =3D (regs->nr_vectors * regs->vector_qwords + + regs->nr_pred * regs->pred_qwords) * sizeof(u64); + OVERFLOW_CHECK(array, sz, max_size); + array =3D (void *)array + sz; + } } } =20 diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/ut= il/perf-regs-arch/perf_regs_x86.c index 708954a9d35d..32dac438b12d 100644 --- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c +++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c @@ -5,6 +5,49 @@ =20 const char *__perf_reg_name_x86(int id) { + if (id > PERF_REG_X86_R15 && arch__intr_simd_reg_mask()) { + switch (id) { + case PERF_REG_X86_R16: + return "R16"; + case PERF_REG_X86_R17: + return "R17"; + case 
PERF_REG_X86_R18: + return "R18"; + case PERF_REG_X86_R19: + return "R19"; + case PERF_REG_X86_R20: + return "R20"; + case PERF_REG_X86_R21: + return "R21"; + case PERF_REG_X86_R22: + return "R22"; + case PERF_REG_X86_R23: + return "R23"; + case PERF_REG_X86_R24: + return "R24"; + case PERF_REG_X86_R25: + return "R25"; + case PERF_REG_X86_R26: + return "R26"; + case PERF_REG_X86_R27: + return "R27"; + case PERF_REG_X86_R28: + return "R28"; + case PERF_REG_X86_R29: + return "R29"; + case PERF_REG_X86_R30: + return "R30"; + case PERF_REG_X86_R31: + return "R31"; + case PERF_REG_X86_SSP: + return "SSP"; + default: + return NULL; + } + + return NULL; + } + switch (id) { case PERF_REG_X86_AX: return "AX"; diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h index fae834144ef4..3b247e0e8242 100644 --- a/tools/perf/util/sample.h +++ b/tools/perf/util/sample.h @@ -12,6 +12,16 @@ struct regs_dump { u64 abi; u64 mask; u64 *regs; + union { + u64 config; + struct { + u16 nr_vectors; + u16 vector_qwords; + u16 nr_pred; + u16 pred_qwords; + }; + }; + u64 *data; =20 /* Cached values/mask filled by first register access. */ u64 cache_regs[PERF_SAMPLE_REGS_CACHE_SIZE]; diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 09af486c83e4..c692be265c21 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -927,18 +927,78 @@ static void regs_dump__printf(u64 mask, u64 *regs, co= nst char *arch) } } =20 -static const char *regs_abi[] =3D { - [PERF_SAMPLE_REGS_ABI_NONE] =3D "none", - [PERF_SAMPLE_REGS_ABI_32] =3D "32-bit", - [PERF_SAMPLE_REGS_ABI_64] =3D "64-bit", -}; +static void simd_regs_dump__printf(struct regs_dump *regs, bool intr) +{ + const char *name =3D "unknown"; + const struct sample_reg *r; + int i, idx =3D 0; + u16 qwords; + int reg_idx; + + if (!(regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)) + return; + + printf("... 
SIMD ABI nr_vectors %d vector_qwords %d nr_pred %d pred_qword= s %d\n", + regs->nr_vectors, regs->vector_qwords, + regs->nr_pred, regs->pred_qwords); + + for (r =3D arch__sample_simd_reg_masks(); r->name; r++) { + if (!fls64(r->mask)) + continue; + reg_idx =3D fls64(r->mask) - 1; + if (intr) + arch__intr_simd_reg_bitmap_qwords(reg_idx, &qwords); + else + arch__user_simd_reg_bitmap_qwords(reg_idx, &qwords); + if (regs->vector_qwords =3D=3D qwords) { + name =3D r->name; + break; + } + } + + for (i =3D 0; i < regs->nr_vectors; i++) { + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + if (regs->vector_qwords > 2) { + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + } + if (regs->vector_qwords > 4) { + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + } + } + + name =3D "unknown"; + for (r =3D arch__sample_pred_reg_masks(); r->name; r++) { + if (!fls64(r->mask)) + continue; + reg_idx =3D fls64(r->mask) - 1; + if (intr) + arch__intr_pred_reg_bitmap_qwords(reg_idx, &qwords); + else + arch__user_pred_reg_bitmap_qwords(reg_idx, &qwords); + if (regs->pred_qwords =3D=3D qwords) { + name =3D r->name; + break; + } + } + for (i =3D 0; i < regs->nr_pred; i++) + printf(".... 
%-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); +} =20 static inline const char *regs_dump_abi(struct regs_dump *d) { - if (d->abi > PERF_SAMPLE_REGS_ABI_64) - return "unknown"; + if (!d->abi) + return "none"; + if (d->abi & PERF_SAMPLE_REGS_ABI_32) + return "32-bit"; + else if (d->abi & PERF_SAMPLE_REGS_ABI_64) + return "64-bit"; =20 - return regs_abi[d->abi]; + return "unknown"; } =20 static void regs__printf(const char *type, struct regs_dump *regs, const c= har *arch) @@ -964,6 +1024,7 @@ static void regs_user__printf(struct perf_sample *samp= le, const char *arch) =20 if (user_regs->regs) regs__printf("user", user_regs, arch); + simd_regs_dump__printf(user_regs, false); } =20 static void regs_intr__printf(struct perf_sample *sample, const char *arch) @@ -977,6 +1038,7 @@ static void regs_intr__printf(struct perf_sample *samp= le, const char *arch) =20 if (intr_regs->regs) regs__printf("intr", intr_regs, arch); + simd_regs_dump__printf(intr_regs, true); } =20 static void stack_user__printf(struct stack_dump *dump) --=20 2.34.1