From nobody Thu Apr 2 09:33:38 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 85019366070; Tue, 24 Mar 2026 01:01:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774314086; cv=none; b=IhnVdBSldQf1SoJ2Jl383c00RjEriP/v6qUjWWcKghaJnUqyehFX5VcwjRfKiWd7HOy8G9ZGmys4TU77aUXVym90BSaV5D8H3dh3puTkEiuIro7mMR53FqJiRsPtXXdQ+r2MaoDS+RGE7dYMUgzcOKHdkc7pMDikiLFZdaD0mFs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774314086; c=relaxed/simple; bh=uCUtytyAEsBdn4rnZEq1jRmdpkz6yMp51OvWmEZbdpY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=NLER/jM6JZBEO3xCX2Q44SsgqzNdM5vL3U4i2dAUt/YwMULWsDCwjCE10A1r7RjtW89JeBA09HYmgQ8t9nVi6M2Zl3VMN9bbLaZL5j4amTmR3daan9mvu2Tg4WPOIJdrqDLLkgda/QttZ+mTHCh+6imY/Oyd3CNJBQ1hSOh6ZeM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Sf3QGYJb; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Sf3QGYJb" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1774314086; x=1805850086; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=uCUtytyAEsBdn4rnZEq1jRmdpkz6yMp51OvWmEZbdpY=; 
b=Sf3QGYJbGqx4Taljg9YfVmaVLG38P/gXn4KcGYu34CK7L5qW9BlSG/2g OLxM7uRLV1DHbYxOqpClThoOdQAsuaBdi8Zv+SiK2kN2W66stsgqX56K2 SLFBd5B0psaji5znxKtIY8apHxV0wInEXfcpddgEMuG5lP9d0LMbSN3Mf BKdzw7IN9JejuBtexNJIJKfE/+Hr9nNlUTcSsDsWfxH4MSxW+hVukin5p MIw1VSX7zQM1o5OzvxVqgwtmC+dR+Xbv0OGedBApj2UbU+C9Bi6uQR8Yy coqBhD+i71GiUuESXGcISkRWeRsQfL0l/vJwh/TgD8V34+Nv6Wu0lRP/h g==; X-CSE-ConnectionGUID: Ls4DiYESTG2kKSxGn0Lzzg== X-CSE-MsgGUID: eABasgZWTbKhgfngY0xIXQ== X-IronPort-AV: E=McAfee;i="6800,10657,11738"; a="75441705" X-IronPort-AV: E=Sophos;i="6.23,138,1770624000"; d="scan'208";a="75441705" Received: from orviesa003.jf.intel.com ([10.64.159.143]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2026 18:01:25 -0700 X-CSE-ConnectionGUID: Gg+aC6O3Rxm4ShtpqeP6nw== X-CSE-MsgGUID: BVrqqIQqSd+AzUWOiKhx3w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,138,1770624000"; d="scan'208";a="228263237" Received: from spr.sh.intel.com ([10.112.229.196]) by orviesa003.jf.intel.com with ESMTP; 23 Mar 2026 18:01:19 -0700 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Thomas Gleixner , Dave Hansen , Ian Rogers , Adrian Hunter , Jiri Olsa , Alexander Shishkin , Andi Kleen , Eranian Stephane Cc: Mark Rutland , broonie@kernel.org, Ravi Bangoria , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Kan Liang , Dapeng Mi Subject: [Patch v7 1/4] perf headers: Sync with the kernel headers Date: Tue, 24 Mar 2026 08:57:03 +0800 Message-Id: <20260324005706.3778057-2-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260324005706.3778057-1-dapeng1.mi@linux.intel.com> References: <20260324005706.3778057-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang 
Update include/uapi/linux/perf_event.h and arch/x86/include/uapi/asm/perf_regs.h to support extended regs. Signed-off-by: Kan Liang Co-developed-by: Dapeng Mi Signed-off-by: Dapeng Mi --- V7: Add more comments for newly added register indexes. tools/arch/x86/include/uapi/asm/perf_regs.h | 51 +++++++++++++++++++++ tools/include/uapi/linux/perf_event.h | 50 ++++++++++++++++++-- 2 files changed, 97 insertions(+), 4 deletions(-) diff --git a/tools/arch/x86/include/uapi/asm/perf_regs.h b/tools/arch/x86/i= nclude/uapi/asm/perf_regs.h index 7c9d2bb3833b..98a5b6c8e24c 100644 --- a/tools/arch/x86/include/uapi/asm/perf_regs.h +++ b/tools/arch/x86/include/uapi/asm/perf_regs.h @@ -27,9 +27,35 @@ enum perf_event_x86_regs { PERF_REG_X86_R13, PERF_REG_X86_R14, PERF_REG_X86_R15, + /* + * The eGPRs/SSP and XMM have overlaps. Only one can be used + * at a time. The ABI PERF_SAMPLE_REGS_ABI_SIMD is used to + * distinguish which one is used. If PERF_SAMPLE_REGS_ABI_SIMD + * is set, then eGPRs/SSP is used, otherwise, XMM is used. + * + * Extended GPRs (eGPRs) + */ + PERF_REG_X86_R16, + PERF_REG_X86_R17, + PERF_REG_X86_R18, + PERF_REG_X86_R19, + PERF_REG_X86_R20, + PERF_REG_X86_R21, + PERF_REG_X86_R22, + PERF_REG_X86_R23, + PERF_REG_X86_R24, + PERF_REG_X86_R25, + PERF_REG_X86_R26, + PERF_REG_X86_R27, + PERF_REG_X86_R28, + PERF_REG_X86_R29, + PERF_REG_X86_R30, + PERF_REG_X86_R31, + PERF_REG_X86_SSP, /* These are the limits for the GPRs. 
*/ PERF_REG_X86_32_MAX =3D PERF_REG_X86_GS + 1, PERF_REG_X86_64_MAX =3D PERF_REG_X86_R15 + 1, + PERF_REG_MISC_MAX =3D PERF_REG_X86_SSP + 1, =20 /* These all need two bits set because they are 128bit */ PERF_REG_X86_XMM0 =3D 32, @@ -54,5 +80,30 @@ enum perf_event_x86_regs { }; =20 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1)) +#define PERF_X86_EGPRS_MASK GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16) + +enum { + PERF_X86_SIMD_XMM_REGS =3D 16, + PERF_X86_SIMD_YMM_REGS =3D 16, + PERF_X86_SIMD_ZMM_REGS =3D 32, + PERF_X86_SIMD_VEC_REGS_MAX =3D PERF_X86_SIMD_ZMM_REGS, + + PERF_X86_SIMD_OPMASK_REGS =3D 8, + PERF_X86_SIMD_PRED_REGS_MAX =3D PERF_X86_SIMD_OPMASK_REGS, +}; + +#define PERF_X86_SIMD_PRED_MASK GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, 0) +#define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1,= 0) + +#define PERF_X86_H16ZMM_BASE 16 + +enum { + /* 1 qword =3D 8 bytes */ + PERF_X86_OPMASK_QWORDS =3D 1, + PERF_X86_XMM_QWORDS =3D 2, + PERF_X86_YMM_QWORDS =3D 4, + PERF_X86_ZMM_QWORDS =3D 8, + PERF_X86_SIMD_QWORDS_MAX =3D PERF_X86_ZMM_QWORDS, +}; =20 #endif /* _ASM_X86_PERF_REGS_H */ diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/lin= ux/perf_event.h index 76e9d0664d0c..00bc0a262735 100644 --- a/tools/include/uapi/linux/perf_event.h +++ b/tools/include/uapi/linux/perf_event.h @@ -314,8 +314,9 @@ enum { */ enum perf_sample_regs_abi { PERF_SAMPLE_REGS_ABI_NONE =3D 0, - PERF_SAMPLE_REGS_ABI_32 =3D 1, - PERF_SAMPLE_REGS_ABI_64 =3D 2, + PERF_SAMPLE_REGS_ABI_32 =3D (1 << 0), + PERF_SAMPLE_REGS_ABI_64 =3D (1 << 1), + PERF_SAMPLE_REGS_ABI_SIMD =3D (1 << 2), }; =20 /* @@ -383,6 +384,7 @@ enum perf_event_read_format { #define PERF_ATTR_SIZE_VER7 128 /* Add: sig_data */ #define PERF_ATTR_SIZE_VER8 136 /* Add: config3 */ #define PERF_ATTR_SIZE_VER9 144 /* add: config4 */ +#define PERF_ATTR_SIZE_VER10 176 /* Add: sample_simd_{pred,vec}_reg_* */ =20 /* * 'struct perf_event_attr' contains various attributes that 
define @@ -547,6 +549,30 @@ struct perf_event_attr { =20 __u64 config3; /* extension of config2 */ __u64 config4; /* extension of config3 */ + + /* + * Defines the sampling SIMD/PRED registers bitmap and qwords + * (8 bytes) length. + * + * sample_simd_regs_enabled !=3D 0 indicates there are SIMD/PRED registers + * to be sampled, the SIMD/PRED registers bitmap and qwords length are + * represented in sample_{simd|pred}_pred_reg_{intr|user} and + * sample_simd_{vec|pred}_reg_qwords fields. + * + * sample_simd_regs_enabled =3D=3D 0 indicates no SIMD/PRED registers are + * sampled. + */ + union { + __u16 sample_simd_regs_enabled; + __u16 sample_simd_pred_reg_qwords; + }; + __u16 sample_simd_vec_reg_qwords; + __u32 __reserved_4; + + __u32 sample_simd_pred_reg_intr; + __u32 sample_simd_pred_reg_user; + __u64 sample_simd_vec_reg_intr; + __u64 sample_simd_vec_reg_user; }; =20 /* @@ -1020,7 +1046,15 @@ enum perf_event_type { * } && PERF_SAMPLE_BRANCH_STACK * * { u64 abi; # enum perf_sample_regs_abi - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER + * u64 regs[weight(mask)]; + * struct { + * u16 nr_vectors; # 0 ... weight(sample_simd_vec_reg_user) + * u16 vector_qwords; # 0 ... sample_simd_vec_reg_qwords + * u16 nr_pred; # 0 ... weight(sample_simd_pred_reg_user) + * u16 pred_qwords; # 0 ... sample_simd_pred_reg_qwords + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords]; + * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD) + * } && PERF_SAMPLE_REGS_USER * * { u64 size; * char data[size]; @@ -1047,7 +1081,15 @@ enum perf_event_type { * { u64 data_src; } && PERF_SAMPLE_DATA_SRC * { u64 transaction; } && PERF_SAMPLE_TRANSACTION * { u64 abi; # enum perf_sample_regs_abi - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR + * u64 regs[weight(mask)]; + * struct { + * u16 nr_vectors; # 0 ... weight(sample_simd_vec_reg_intr) + * u16 vector_qwords; # 0 ... sample_simd_vec_reg_qwords + * u16 nr_pred; # 0 ... weight(sample_simd_pred_reg_intr) + * u16 pred_qwords; # 0 ... 
sample_simd_pred_reg_qwords + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords]; + * } && (abi & PERF_SAMPLE_REGS_ABI_SIMD) + * } && PERF_SAMPLE_REGS_INTR * { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR * { u64 cgroup;} && PERF_SAMPLE_CGROUP * { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE --=20 2.34.1 From nobody Thu Apr 2 09:33:38 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A9E76372ECD; Tue, 24 Mar 2026 01:01:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774314092; cv=none; b=eOHdXBDirDW6o8FzTShWqDejUGPDk8PD41SJhPFfIH/h+zsh2j8ZCqGAeIKslfNiclVXunb5w2o/4FK+zUvl5YYchcRU1rV+WnIMd/hIcVPB28y/bT/JejspK16GwXzzqstZJICeCETkaMWQpwedbi2mjt6enrTqVvsYJoz/IGE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774314092; c=relaxed/simple; bh=G8khddG3CHnvakJ5FMPCmUyJSjB8cRGMzNe6c/XuBB0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=rXrwXfrP7gVu3+l81UvhkKL7DNlDHPdyTFBPNvVBb3GcVeVo8wcC609OJ3MTYMxd7seMDCeUQ0JqVqw+H24ZA13cTKRgDpEem0EeCiQmi+wTurMJ6+97pYn1w8m1e9oaXuz9LVTjnk6Y7JsinpegnaMOohcXUJqhDveFjlmLmC8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=RozvbNxq; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com 
header.b="RozvbNxq"
From: Dapeng Mi
To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Thomas Gleixner , Dave Hansen , Ian Rogers , Adrian Hunter , Jiri Olsa , Alexander Shishkin , Andi Kleen , Eranian Stephane
Cc: Mark Rutland , broonie@kernel.org, Ravi Bangoria , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Dapeng Mi
Subject: [Patch v7 2/4] perf regs: Support x86 eGPRs/SSP sampling
Date: Tue, 24 Mar 2026 08:57:04 +0800
Message-Id: <20260324005706.3778057-3-dapeng1.mi@linux.intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260324005706.3778057-1-dapeng1.mi@linux.intel.com>
References: <20260324005706.3778057-1-dapeng1.mi@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

This patch adds support for sampling the x86 extended general-purpose registers (eGPRs, R16-R31) and the shadow stack pointer (SSP) register.

When SIMD register sampling is supported through the new SIMD sampling fields in the perf_event_attr structure, the bit range in sample_regs_user/sample_regs_intr that originally represented the XMM registers is reclaimed to represent the eGPRs and SSP. The tool therefore needs a way to tell which register layout the sample_regs_user/sample_regs_intr bitmap uses.

To address this, a new "abi" argument is added to the helpers perf_intr_reg_mask(), perf_user_reg_mask(), and perf_reg_name(). The two mask helpers take a pointer and report the detected layout through it; passing -1 requests only the basic GPRs. perf_reg_name() takes the ABI by value. When "abi & PERF_SAMPLE_REGS_ABI_SIMD" is non-zero, the bitmap uses the eGPRs and SSP layout; otherwise, it uses the legacy XMM layout.

Signed-off-by: Dapeng Mi
---
V7: Limit dwarf minimal regs to legacy GPRs (excluding APX eGPRs).
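As a minimal sketch of the layout selection described above: the snippet assumes the register indices from patch 1/4 as read here (R16 at index 24, SSP at index 40, XMM0 at index 32, two indices per XMM register). The sketch_* names are hypothetical and do not exist in perf, and the legacy GPR range is abbreviated to a generic "GPRn" name for brevity.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for values from patch 1/4; not the real headers. */
enum {
	SKETCH_ABI_SIMD = 1 << 2, /* mirrors PERF_SAMPLE_REGS_ABI_SIMD */
	SKETCH_REG_R16  = 24,     /* assumed: first index after R15 */
	SKETCH_REG_XMM0 = 32,     /* mirrors PERF_REG_X86_XMM0 */
	SKETCH_REG_SSP  = 40,     /* assumed: R16..R31 occupy 24..39, then SSP */
};

/*
 * Format the name of register index 'id' into buf, picking the layout
 * from the sample ABI. Returns buf, or NULL if the index is not valid
 * in the selected layout.
 */
static const char *sketch_reg_name(int id, int abi, char *buf, size_t len)
{
	if (id < 0)
		return NULL;

	if (id < SKETCH_REG_R16) {
		/* Legacy GPR range: identical in both layouts (abbreviated). */
		snprintf(buf, len, "GPR%d", id);
	} else if (abi & SKETCH_ABI_SIMD) {
		/* SIMD ABI: the old XMM bit range now holds eGPRs and SSP. */
		if (id == SKETCH_REG_SSP)
			snprintf(buf, len, "SSP");
		else if (id < SKETCH_REG_SSP)
			snprintf(buf, len, "R%d", id - SKETCH_REG_R16 + 16);
		else
			return NULL;
	} else if (id >= SKETCH_REG_XMM0 && id < 64) {
		/* Legacy ABI: each 128-bit XMM register takes two indices. */
		snprintf(buf, len, "XMM%d", (id - SKETCH_REG_XMM0) / 2);
	} else {
		return NULL;
	}

	return buf;
}
```

Index 32 is the interesting case: it resolves to R24 when the SIMD ABI bit is set and to XMM0 otherwise, which is why the callers in this patch pass regs->abi through to perf_reg_name().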
tools/perf/builtin-script.c | 2 +- tools/perf/util/evsel.c | 7 +- tools/perf/util/parse-regs-options.c | 17 ++- .../perf/util/perf-regs-arch/perf_regs_x86.c | 124 +++++++++++++++--- tools/perf/util/perf_regs.c | 12 +- tools/perf/util/perf_regs.h | 10 +- .../scripting-engines/trace-event-python.c | 2 +- tools/perf/util/session.c | 9 +- 8 files changed, 142 insertions(+), 41 deletions(-) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index b80c406d1fc1..714528732e02 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -730,7 +730,7 @@ static int perf_sample__fprintf_regs(struct regs_dump *= regs, uint64_t mask, for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) { u64 val =3D regs->regs[i++]; printed +=3D fprintf(fp, "%5s:0x%"PRIx64" ", - perf_reg_name(r, e_machine, e_flags), + perf_reg_name(r, e_machine, e_flags, regs->abi), val); } =20 diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 5a294595a677..f565ef2eb476 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1054,19 +1054,22 @@ static void __evsel__config_callchain(struct evsel = *evsel, const struct record_o } =20 if (param->record_mode =3D=3D CALLCHAIN_DWARF) { + int abi =3D -1; /* -1 indicates only basic GPRs are needed. 
*/ + if (!function) { uint16_t e_machine =3D evsel__e_machine(evsel, /*e_flags=3D*/NULL); =20 evsel__set_sample_bit(evsel, REGS_USER); evsel__set_sample_bit(evsel, STACK_USER); if (opts->sample_user_regs && - DWARF_MINIMAL_REGS(e_machine) !=3D perf_user_reg_mask(EM_HOST)) { + DWARF_MINIMAL_REGS(e_machine) !=3D perf_user_reg_mask(EM_HOST, &abi= )) { attr->sample_regs_user |=3D DWARF_MINIMAL_REGS(e_machine); pr_warning("WARNING: The use of --call-graph=3Ddwarf may require all t= he user registers, " "specifying a subset with --user-regs may render DWARF unwinding u= nreliable, " "so the minimal registers set (IP, SP) is explicitly forced.\n"); } else { - attr->sample_regs_user |=3D perf_user_reg_mask(EM_HOST); + abi =3D -1; + attr->sample_regs_user |=3D perf_user_reg_mask(EM_HOST, &abi); } attr->sample_stack_user =3D param->dump_size; attr->exclude_callchain_user =3D 1; diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-r= egs-options.c index c93c2f0c8105..6cf865bfc2f7 100644 --- a/tools/perf/util/parse-regs-options.c +++ b/tools/perf/util/parse-regs-options.c @@ -10,7 +10,8 @@ #include "util/perf_regs.h" #include "util/parse-regs-options.h" =20 -static void list_perf_regs(FILE *fp, uint64_t mask) +static void +list_perf_regs(FILE *fp, uint64_t mask, int abi) { const char *last_name =3D NULL; =20 @@ -21,7 +22,7 @@ static void list_perf_regs(FILE *fp, uint64_t mask) if (((1ULL << reg) & mask) =3D=3D 0) continue; =20 - name =3D perf_reg_name(reg, EM_HOST, EF_HOST); + name =3D perf_reg_name(reg, EM_HOST, EF_HOST, abi); if (name && (!last_name || strcmp(last_name, name))) fprintf(fp, "%s%s", reg > 0 ? 
" " : "", name); last_name =3D name; @@ -29,7 +30,8 @@ static void list_perf_regs(FILE *fp, uint64_t mask) fputc('\n', fp); } =20 -static uint64_t name_to_perf_reg_mask(const char *to_match, uint64_t mask) +static uint64_t +name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi) { uint64_t reg_mask =3D 0; =20 @@ -39,7 +41,7 @@ static uint64_t name_to_perf_reg_mask(const char *to_matc= h, uint64_t mask) if (((1ULL << reg) & mask) =3D=3D 0) continue; =20 - name =3D perf_reg_name(reg, EM_HOST, EF_HOST); + name =3D perf_reg_name(reg, EM_HOST, EF_HOST, abi); if (!name) continue; =20 @@ -56,6 +58,7 @@ __parse_regs(const struct option *opt, const char *str, i= nt unset, bool intr) char *s, *os =3D NULL, *p; int ret =3D -1; uint64_t mask; + int abi =3D 0; =20 if (unset) return 0; @@ -66,7 +69,7 @@ __parse_regs(const struct option *opt, const char *str, i= nt unset, bool intr) if (*mode) return -1; =20 - mask =3D intr ? perf_intr_reg_mask(EM_HOST) : perf_user_reg_mask(EM_HOST); + mask =3D intr ? perf_intr_reg_mask(EM_HOST, &abi) : perf_user_reg_mask(EM= _HOST, &abi); =20 /* str may be NULL in case no arg is passed to -I */ if (!str) { @@ -87,11 +90,11 @@ __parse_regs(const struct option *opt, const char *str,= int unset, bool intr) *p =3D '\0'; =20 if (!strcmp(s, "?")) { - list_perf_regs(stderr, mask); + list_perf_regs(stderr, mask, abi); goto error; } =20 - reg_mask =3D name_to_perf_reg_mask(s, mask); + reg_mask =3D name_to_perf_reg_mask(s, mask, abi); if (reg_mask =3D=3D 0) { ui__warning("Unknown register \"%s\", check man page or run \"perf reco= rd %s?\"\n", s, intr ? 
"-I" : "--user-regs=3D"); diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/ut= il/perf-regs-arch/perf_regs_x86.c index b6d20522b4e8..ae26d991cdc9 100644 --- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c +++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c @@ -235,26 +235,26 @@ int __perf_sdt_arg_parse_op_x86(char *old_op, char **= new_op) return SDT_ARG_VALID; } =20 -uint64_t __perf_reg_mask_x86(bool intr) +static uint64_t __arch__reg_mask(u64 sample_type, u64 mask, bool has_simd_= regs) { struct perf_event_attr attr =3D { - .type =3D PERF_TYPE_HARDWARE, - .config =3D PERF_COUNT_HW_CPU_CYCLES, - .sample_type =3D PERF_SAMPLE_REGS_INTR, - .sample_regs_intr =3D PERF_REG_EXTENDED_MASK, - .precise_ip =3D 1, - .disabled =3D 1, - .exclude_kernel =3D 1, + .type =3D PERF_TYPE_HARDWARE, + .config =3D PERF_COUNT_HW_CPU_CYCLES, + .sample_type =3D sample_type, + .precise_ip =3D 1, + .disabled =3D 1, + .exclude_kernel =3D 1, + .sample_simd_regs_enabled =3D has_simd_regs, }; int fd; - - if (!intr) - return PERF_REGS_MASK; - /* * In an unnamed union, init it here to build on older gcc versions */ attr.sample_period =3D 1; + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) + attr.sample_regs_intr =3D mask; + else + attr.sample_regs_user =3D mask; =20 if (perf_pmus__num_core_pmus() > 1) { struct perf_pmu *pmu =3D NULL; @@ -276,13 +276,38 @@ uint64_t __perf_reg_mask_x86(bool intr) /*group_fd=3D*/-1, /*flags=3D*/0); if (fd !=3D -1) { close(fd); - return (PERF_REG_EXTENDED_MASK | PERF_REGS_MASK); + return mask; + } + + return 0; +} + +uint64_t __perf_reg_mask_x86(bool intr, int *abi) +{ + u64 sample_type =3D intr ? PERF_SAMPLE_REGS_INTR : PERF_SAMPLE_REGS_USER; + uint64_t mask =3D PERF_REGS_MASK; + + /* -1 indicates only basic GPRs are needed. 
*/ + if (*abi < 0) + return PERF_REGS_MASK; + + *abi =3D 0; + mask |=3D __arch__reg_mask(sample_type, + GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16), + true); + mask |=3D __arch__reg_mask(sample_type, BIT_ULL(PERF_REG_X86_SSP), true); + + if (mask !=3D PERF_REGS_MASK) { + *abi |=3D PERF_SAMPLE_REGS_ABI_SIMD; + } else { + mask |=3D __arch__reg_mask(sample_type, PERF_REG_EXTENDED_MASK, + false); } =20 - return PERF_REGS_MASK; + return mask; } =20 -const char *__perf_reg_name_x86(int id) +static const char *__arch_reg_gpr_name(int id) { switch (id) { case PERF_REG_X86_AX: @@ -333,7 +358,60 @@ const char *__perf_reg_name_x86(int id) return "R14"; case PERF_REG_X86_R15: return "R15"; + default: + return NULL; + } + + return NULL; +} =20 +static const char *__arch_reg_egpr_name(int id) +{ + switch (id) { + case PERF_REG_X86_R16: + return "R16"; + case PERF_REG_X86_R17: + return "R17"; + case PERF_REG_X86_R18: + return "R18"; + case PERF_REG_X86_R19: + return "R19"; + case PERF_REG_X86_R20: + return "R20"; + case PERF_REG_X86_R21: + return "R21"; + case PERF_REG_X86_R22: + return "R22"; + case PERF_REG_X86_R23: + return "R23"; + case PERF_REG_X86_R24: + return "R24"; + case PERF_REG_X86_R25: + return "R25"; + case PERF_REG_X86_R26: + return "R26"; + case PERF_REG_X86_R27: + return "R27"; + case PERF_REG_X86_R28: + return "R28"; + case PERF_REG_X86_R29: + return "R29"; + case PERF_REG_X86_R30: + return "R30"; + case PERF_REG_X86_R31: + return "R31"; + case PERF_REG_X86_SSP: + return "SSP"; + default: + return NULL; + } + + return NULL; +} + +static const char *__arch_reg_xmm_name(int id) +{ + switch (id) { #define XMM(x) \ case PERF_REG_X86_XMM ## x: \ case PERF_REG_X86_XMM ## x + 1: \ @@ -362,6 +440,22 @@ const char *__perf_reg_name_x86(int id) return NULL; } =20 +const char *__perf_reg_name_x86(int id, int abi) +{ + const char *name; + + name =3D __arch_reg_gpr_name(id); + if (name) + return name; + + if (abi & PERF_SAMPLE_REGS_ABI_SIMD) + name =3D 
__arch_reg_egpr_name(id); + else + name =3D __arch_reg_xmm_name(id); + + return name; +} + uint64_t __perf_reg_ip_x86(void) { return PERF_REG_X86_IP; diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c index 5b8f34beb24e..afc567718bee 100644 --- a/tools/perf/util/perf_regs.c +++ b/tools/perf/util/perf_regs.c @@ -32,7 +32,7 @@ int perf_sdt_arg_parse_op(uint16_t e_machine, char *old_o= p, char **new_op) return ret; } =20 -uint64_t perf_intr_reg_mask(uint16_t e_machine) +uint64_t perf_intr_reg_mask(uint16_t e_machine, int *abi) { uint64_t mask =3D 0; =20 @@ -64,7 +64,7 @@ uint64_t perf_intr_reg_mask(uint16_t e_machine) break; case EM_386: case EM_X86_64: - mask =3D __perf_reg_mask_x86(/*intr=3D*/true); + mask =3D __perf_reg_mask_x86(/*intr=3D*/true, abi); break; default: pr_debug("Unknown ELF machine %d, interrupt sampling register mask will = be empty.\n", @@ -75,7 +75,7 @@ uint64_t perf_intr_reg_mask(uint16_t e_machine) return mask; } =20 -uint64_t perf_user_reg_mask(uint16_t e_machine) +uint64_t perf_user_reg_mask(uint16_t e_machine, int *abi) { uint64_t mask =3D 0; =20 @@ -107,7 +107,7 @@ uint64_t perf_user_reg_mask(uint16_t e_machine) break; case EM_386: case EM_X86_64: - mask =3D __perf_reg_mask_x86(/*intr=3D*/false); + mask =3D __perf_reg_mask_x86(/*intr=3D*/false, abi); break; default: pr_debug("Unknown ELF machine %d, user sampling register mask will be em= pty.\n", @@ -118,7 +118,7 @@ uint64_t perf_user_reg_mask(uint16_t e_machine) return mask; } =20 -const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags) +const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, in= t abi) { const char *reg_name =3D NULL; =20 @@ -150,7 +150,7 @@ const char *perf_reg_name(int id, uint16_t e_machine, u= int32_t e_flags) break; case EM_386: case EM_X86_64: - reg_name =3D __perf_reg_name_x86(id); + reg_name =3D __perf_reg_name_x86(id, abi); break; default: break; diff --git a/tools/perf/util/perf_regs.h 
b/tools/perf/util/perf_regs.h index 7c04700bf837..c9501ca8045d 100644 --- a/tools/perf/util/perf_regs.h +++ b/tools/perf/util/perf_regs.h @@ -13,10 +13,10 @@ enum { }; =20 int perf_sdt_arg_parse_op(uint16_t e_machine, char *old_op, char **new_op); -uint64_t perf_intr_reg_mask(uint16_t e_machine); -uint64_t perf_user_reg_mask(uint16_t e_machine); +uint64_t perf_intr_reg_mask(uint16_t e_machine, int *abi); +uint64_t perf_user_reg_mask(uint16_t e_machine, int *abi); =20 -const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags); +const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, in= t abi); int perf_reg_value(u64 *valp, struct regs_dump *regs, int id); uint64_t perf_arch_reg_ip(uint16_t e_machine); uint64_t perf_arch_reg_sp(uint16_t e_machine); @@ -64,8 +64,8 @@ uint64_t __perf_reg_ip_s390(void); uint64_t __perf_reg_sp_s390(void); =20 int __perf_sdt_arg_parse_op_x86(char *old_op, char **new_op); -uint64_t __perf_reg_mask_x86(bool intr); -const char *__perf_reg_name_x86(int id); +uint64_t __perf_reg_mask_x86(bool intr, int *abi); +const char *__perf_reg_name_x86(int id, int abi); uint64_t __perf_reg_ip_x86(void); uint64_t __perf_reg_sp_x86(void); =20 diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools= /perf/util/scripting-engines/trace-event-python.c index 2b0df7bd9a46..4cc5b96898e6 100644 --- a/tools/perf/util/scripting-engines/trace-event-python.c +++ b/tools/perf/util/scripting-engines/trace-event-python.c @@ -733,7 +733,7 @@ static void regs_map(struct regs_dump *regs, uint64_t m= ask, uint16_t e_machine, =20 printed +=3D scnprintf(bf + printed, size - printed, "%5s:0x%" PRIx64 " ", - perf_reg_name(r, e_machine, e_flags), val); + perf_reg_name(r, e_machine, e_flags, regs->abi), val); } } =20 diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 4b465abfa36c..7cf7bf86205d 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -959,15 +959,16 @@ static void 
branch_stack__printf(struct perf_sample *= sample, } } =20 -static void regs_dump__printf(u64 mask, u64 *regs, uint16_t e_machine, uin= t32_t e_flags) +static void regs_dump__printf(u64 mask, struct regs_dump *regs, + uint16_t e_machine, uint32_t e_flags) { unsigned rid, i =3D 0; =20 for_each_set_bit(rid, (unsigned long *) &mask, sizeof(mask) * 8) { - u64 val =3D regs[i++]; + u64 val =3D regs->regs[i++]; =20 printf(".... %-5s 0x%016" PRIx64 "\n", - perf_reg_name(rid, e_machine, e_flags), val); + perf_reg_name(rid, e_machine, e_flags, regs->abi), val); } } =20 @@ -995,7 +996,7 @@ static void regs__printf(const char *type, struct regs_= dump *regs, mask, regs_dump_abi(regs)); =20 - regs_dump__printf(mask, regs->regs, e_machine, e_flags); + regs_dump__printf(mask, regs, e_machine, e_flags); } =20 static void regs_user__printf(struct perf_sample *sample, uint16_t e_machi= ne, uint32_t e_flags) --=20 2.34.1 From nobody Thu Apr 2 09:33:38 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 26B7F372EFA; Tue, 24 Mar 2026 01:01:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774314098; cv=none; b=sWF66Vzd9kerEeXzdG8puDsaMhvhYdyJ9Dghcqi1EzP+e0V/Eaq8Nb9CGc87Q6g3ROUiGb/OrEW8iyp7vGKV3VZ7gEB8epYw93FdkwPNY55b7Tuovht8mMQmAVV5w2SSSRkvt+KD8504DNugm3pgphzq8L15SaOSCEV6ofVuR3s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774314098; c=relaxed/simple; bh=2QGfqR6are3XvF2IzsxAaUnAr9oaOdhH2uZIQr1Knzw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=laUJu3LpyLcR15Fvs+vEGQy7Z/UoTmSQ7eSrsf9kW7XIL81Ji9iNNHMbB3mP2FO+YmYw8PDlwNGGb1qGSy6IArZwBLRkEzcMXWsahy16QfbcbM2xaUQg2SyM6FsIr4FefiVhw3pmvvsyri3Um/kbTc9++HWfFDj2gBu8x0rVl8g= 
From: Dapeng Mi
To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung
Kim , Thomas Gleixner , Dave Hansen , Ian Rogers , Adrian Hunter , Jiri Olsa , Alexander Shishkin , Andi Kleen , Eranian Stephane
Cc: Mark Rutland , broonie@kernel.org, Ravi Bangoria , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Dapeng Mi
Subject: [Patch v7 3/4] perf regs: Support x86 SIMD registers sampling
Date: Tue, 24 Mar 2026 08:57:05 +0800
Message-Id: <20260324005706.3778057-4-dapeng1.mi@linux.intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260324005706.3778057-1-dapeng1.mi@linux.intel.com>
References: <20260324005706.3778057-1-dapeng1.mi@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

This patch adds support for the newly introduced SIMD register sampling format by adding the following 5 functions:

uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred);
uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred);
uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c, uint16_t *qwords, bool pred);
uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c, uint16_t *qwords, bool pred);
const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred);

The perf_{intr|user}_simd_reg_class_mask() functions retrieve the bitmap of kernel-supported SIMD/PRED register classes on the current platform for intr-regs and user-regs sampling, such as OPMASK/XMM/YMM/ZMM on x86 platforms.

The perf_{intr|user}_simd_reg_class_bitmap_qwords() functions retrieve the register bitmap and the per-register length in qwords for a given SIMD/PRED register class on the current platform, again for intr-regs and user-regs sampling. For example, for the XMM registers on x86 platforms, the returned bitmap is 0xffff (XMM0 ~ XMM15) and the qwords length is 2 (128 bits for each XMM register).
The perf_simd_reg_class_name() function gets the register class name for a certain register class index. Additionally, the function __parse_regs() is enhanced to support parsing these newly introduced SIMD/PRED registers. Currently, each class of register can only be sampled collectively; sampling a specific SIMD register is not supported. For example, all XMM registers are sampled together rather than sampling only XMM0. When multiple overlapping register types, such as XMM and YMM, are sampled simultaneously, only the superset (YMM registers) is sampled. With this patch, all supported sampling registers on x86 platforms are displayed as follows. $perf record --intr-regs=3D? available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28 R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7 $perf record --user-regs=3D? available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28 R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7 Signed-off-by: Dapeng Mi Reviewed-by: Ian Rogers --- tools/perf/util/evsel.c | 27 ++ tools/perf/util/parse-regs-options.c | 164 +++++++++- .../perf/util/perf-regs-arch/perf_regs_x86.c | 292 ++++++++++++++++++ tools/perf/util/perf_event_attr_fprintf.c | 6 + tools/perf/util/perf_regs.c | 72 +++++ tools/perf/util/perf_regs.h | 11 + tools/perf/util/record.h | 6 + 7 files changed, 567 insertions(+), 11 deletions(-) diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index f565ef2eb476..5f00489e714a 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1589,12 +1589,39 @@ void evsel__config(struct evsel *evsel, const struc= t record_opts *opts, if (opts->sample_intr_regs && !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) { attr->sample_regs_intr =3D opts->sample_intr_regs; + attr->sample_simd_regs_enabled =3D !!opts->sample_pred_reg_qwords; + 
evsel__set_sample_bit(evsel, REGS_INTR); + } + + if ((opts->sample_intr_vec_regs || opts->sample_intr_pred_regs) && + !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) { + /* A non-zero pred qwords implies that the set of SIMD registers is used */ + if (opts->sample_pred_reg_qwords) + attr->sample_simd_pred_reg_qwords =3D opts->sample_pred_reg_qwords; + else + attr->sample_simd_pred_reg_qwords =3D 1; + attr->sample_simd_vec_reg_intr =3D opts->sample_intr_vec_regs; + attr->sample_simd_vec_reg_qwords =3D opts->sample_vec_reg_qwords; + attr->sample_simd_pred_reg_intr =3D opts->sample_intr_pred_regs; evsel__set_sample_bit(evsel, REGS_INTR); } =20 if (opts->sample_user_regs && !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) { attr->sample_regs_user |=3D opts->sample_user_regs; + attr->sample_simd_regs_enabled =3D !!opts->sample_pred_reg_qwords; + evsel__set_sample_bit(evsel, REGS_USER); + } + + if ((opts->sample_user_vec_regs || opts->sample_user_pred_regs) && + !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) { + if (opts->sample_pred_reg_qwords) + attr->sample_simd_pred_reg_qwords =3D opts->sample_pred_reg_qwords; + else + attr->sample_simd_pred_reg_qwords =3D 1; + attr->sample_simd_vec_reg_user =3D opts->sample_user_vec_regs; + attr->sample_simd_vec_reg_qwords =3D opts->sample_vec_reg_qwords; + attr->sample_simd_pred_reg_user =3D opts->sample_user_pred_regs; evsel__set_sample_bit(evsel, REGS_USER); } =20 diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-r= egs-options.c index 6cf865bfc2f7..3dfa7ec276c2 100644 --- a/tools/perf/util/parse-regs-options.c +++ b/tools/perf/util/parse-regs-options.c @@ -9,13 +9,13 @@ #include #include "util/perf_regs.h" #include "util/parse-regs-options.h" +#include "record.h" =20 static void -list_perf_regs(FILE *fp, uint64_t mask, int abi) +__list_gp_regs(FILE *fp, uint64_t mask, int abi) { const char *last_name =3D NULL; =20 - fprintf(fp, "available registers: "); for (int reg =3D 0; reg < 64; 
reg++) { const char *name; =20 @@ -27,14 +27,68 @@ list_perf_regs(FILE *fp, uint64_t mask, int abi) fprintf(fp, "%s%s", reg > 0 ? " " : "", name); last_name =3D name; } +} + +static void +__list_simd_regs(FILE *fp, uint64_t mask, bool intr, bool pred) +{ + uint64_t bitmap =3D 0; + uint16_t qwords =3D 0; + const char *name; + int i =3D 0; + + for (int reg_c =3D 0; reg_c < 64; reg_c++) { + if (((1ULL << reg_c) & mask) =3D=3D 0) + continue; + + name =3D perf_simd_reg_class_name(EM_HOST, reg_c, pred); + bitmap =3D intr ? + perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, &qwords, pred) : + perf_user_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, &qwords, pred); + if (name && bitmap) + fprintf(fp, "%s%s0-%d", i++ > 0 ? " " : "", + name, fls64(bitmap) - 1); + } +} + +static void +list_perf_regs(FILE *fp, uint64_t mask, uint64_t simd_mask, + uint64_t pred_mask, int abi, bool intr) +{ + bool printed =3D false; + + fprintf(fp, "available registers: "); + + if (mask) { + __list_gp_regs(fp, mask, abi); + printed =3D true; + } + + if (simd_mask) { + if (printed) + fprintf(fp, " "); + __list_simd_regs(fp, simd_mask, intr, /*pred=3D*/false); + printed =3D true; + } + + if (pred_mask) { + if (printed) + fprintf(fp, " "); + __list_simd_regs(fp, pred_mask, intr, /*pred=3D*/true); + printed =3D true; + } + fputc('\n', fp); } =20 static uint64_t -name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi) +name_to_gp_reg_mask(const char *to_match, uint64_t mask, int abi) { uint64_t reg_mask =3D 0; =20 + if (!mask) + return reg_mask; + for (int reg =3D 0; reg < 64; reg++) { const char *name; =20 @@ -51,13 +105,79 @@ name_to_perf_reg_mask(const char *to_match, uint64_t m= ask, int abi) return reg_mask; } =20 +static bool +name_to_simd_reg_mask(struct record_opts *opts, const char *to_match, + uint64_t mask, bool intr, bool pred) +{ + bool matched =3D false; + uint64_t bitmap; + uint16_t qwords; + int reg_c; + + if (!mask) + return false; + + for (reg_c =3D 0; reg_c < 64; 
reg_c++) { + const char *name; + + if (((1ULL << reg_c) & mask) =3D=3D 0) + continue; + + name =3D perf_simd_reg_class_name(EM_HOST, reg_c, pred); + if (!name) + continue; + + if (!strcasecmp(to_match, name)) { + matched =3D true; + break; + } + } + + if (!matched) + return false; + + if (intr) { + bitmap =3D perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, + reg_c, &qwords, pred); + } else { + bitmap =3D perf_user_simd_reg_class_bitmap_qwords(EM_HOST, + reg_c, &qwords, pred); + } + + /* Just need the highest qwords */ + if (pred) { + if (qwords >=3D opts->sample_pred_reg_qwords) { + opts->sample_pred_reg_qwords =3D qwords; + if (intr) + opts->sample_intr_pred_regs =3D bitmap; + else + opts->sample_user_pred_regs =3D bitmap; + } + } else { + if (qwords >=3D opts->sample_vec_reg_qwords) { + opts->sample_vec_reg_qwords =3D qwords; + if (intr) + opts->sample_intr_vec_regs =3D bitmap; + else + opts->sample_user_vec_regs =3D bitmap; + } + } + + return true; +} + static int __parse_regs(const struct option *opt, const char *str, int unset, bool in= tr) { uint64_t *mode =3D (uint64_t *)opt->value; + struct record_opts *opts; char *s, *os =3D NULL, *p; - int ret =3D -1; + uint64_t simd_mask; + uint64_t pred_mask; uint64_t mask; + const char *warn; + bool matched; + int ret =3D -1; int abi =3D 0; =20 if (unset) @@ -69,11 +189,16 @@ __parse_regs(const struct option *opt, const char *str= , int unset, bool intr) if (*mode) return -1; =20 - mask =3D intr ? perf_intr_reg_mask(EM_HOST, &abi) : perf_user_reg_mask(EM= _HOST, &abi); + mask =3D intr ? perf_intr_reg_mask(EM_HOST, &abi) : + perf_user_reg_mask(EM_HOST, &abi); + opts =3D intr ? 
container_of(opt->value, struct record_opts, sample_intr_= regs) : + container_of(opt->value, struct record_opts, sample_user_regs); =20 /* str may be NULL in case no arg is passed to -I */ if (!str) { *mode =3D mask; + if (abi & PERF_SAMPLE_REGS_ABI_SIMD) + opts->sample_pred_reg_qwords =3D 1; return 0; } =20 @@ -82,6 +207,15 @@ __parse_regs(const struct option *opt, const char *str,= int unset, bool intr) if (!s) return -1; =20 + if (intr) { + simd_mask =3D perf_intr_simd_reg_class_mask(EM_HOST, /*pred=3D*/false); + pred_mask =3D perf_intr_simd_reg_class_mask(EM_HOST, /*pred=3D*/true); + } else { + simd_mask =3D perf_user_simd_reg_class_mask(EM_HOST, /*pred=3D*/false); + pred_mask =3D perf_user_simd_reg_class_mask(EM_HOST, /*pred=3D*/true); + } + + warn =3D "Unknown register \"%s\", check man page or run \"perf record %s= ?\"\n"; for (;;) { uint64_t reg_mask; =20 @@ -90,15 +224,23 @@ __parse_regs(const struct option *opt, const char *str= , int unset, bool intr) *p =3D '\0'; =20 if (!strcmp(s, "?")) { - list_perf_regs(stderr, mask, abi); + list_perf_regs(stderr, mask, simd_mask, pred_mask, abi, intr); goto error; } =20 - reg_mask =3D name_to_perf_reg_mask(s, mask, abi); - if (reg_mask =3D=3D 0) { - ui__warning("Unknown register \"%s\", check man page or run \"perf reco= rd %s?\"\n", - s, intr ? "-I" : "--user-regs=3D"); - goto error; + reg_mask =3D name_to_gp_reg_mask(s, mask, abi); + if (reg_mask) { + if (abi & PERF_SAMPLE_REGS_ABI_SIMD) + opts->sample_pred_reg_qwords =3D 1; + } else { + matched =3D name_to_simd_reg_mask(opts, s, simd_mask, + intr, /*pred=3D*/false) || + name_to_simd_reg_mask(opts, s, pred_mask, + intr, /*pred=3D*/true); + if (!matched) { + ui__warning(warn, s, intr ? 
"-I" : "--user-regs=3D"); + goto error; + } } *mode |=3D reg_mask; =20 diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/ut= il/perf-regs-arch/perf_regs_x86.c index ae26d991cdc9..2bc93b600662 100644 --- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c +++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c @@ -465,3 +465,295 @@ uint64_t __perf_reg_sp_x86(void) { return PERF_REG_X86_SP; } + +enum { + PERF_REG_CLASS_X86_OPMASK =3D 0, + PERF_REG_CLASS_X86_XMM, + PERF_REG_CLASS_X86_YMM, + PERF_REG_CLASS_X86_ZMM, + PERF_REG_X86_MAX_SIMD_CLASSES, +}; + +#define PERF_REG_CLASS_X86_PRED_MASK (BIT(PERF_REG_CLASS_X86_OPMASK)) +#define PERF_REG_CLASS_X86_SIMD_MASK (BIT(PERF_REG_CLASS_X86_XMM) | \ + BIT(PERF_REG_CLASS_X86_YMM) | \ + BIT(PERF_REG_CLASS_X86_ZMM)) + +/* + * Check whether the kernel perf subsystem supports sampling a given kind + * of SIMD register (OPMASK/XMM/YMM/ZMM). + * + * @sample_type: PERF_SAMPLE_REGS_INTR or PERF_SAMPLE_REGS_USER + * @qwords: the length of SIMD register, like 1/2/4/8 qwords for + * OPMASK/XMM/YMM/ZMM registers. + * @mask: the bitmask of SIMD register, like 0xffff for XMM0 ~ XMM15 + * @pred: whether it's a predicate register, like OPMASK register. + * + * Return value: true indicates support, otherwise no support. 
+ */ +static bool +__support_simd_reg_class(uint64_t sample_type, uint16_t qwords, + uint64_t mask, bool pred) +{ + struct perf_event_attr attr =3D { + .type =3D PERF_TYPE_HARDWARE, + .config =3D PERF_COUNT_HW_CPU_CYCLES, + .sample_type =3D sample_type, + .disabled =3D 1, + .exclude_kernel =3D 1, + .sample_simd_regs_enabled =3D 1, + }; + int fd; + + attr.sample_period =3D 1; + + if (!pred) { + attr.sample_simd_vec_reg_qwords =3D qwords; + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) + attr.sample_simd_vec_reg_intr =3D mask; + else + attr.sample_simd_vec_reg_user =3D mask; + } else { + attr.sample_simd_pred_reg_qwords =3D PERF_X86_OPMASK_QWORDS; + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) + attr.sample_simd_pred_reg_intr =3D PERF_X86_SIMD_PRED_MASK; + else + attr.sample_simd_pred_reg_user =3D PERF_X86_SIMD_PRED_MASK; + } + + if (perf_pmus__num_core_pmus() > 1) { + __u64 type =3D perf_pmus__find_core_pmu()->type; + + attr.config |=3D type << PERF_PMU_TYPE_SHIFT; + } + + event_attr_init(&attr); + + fd =3D sys_perf_event_open(&attr, 0, -1, -1, 0); + if (fd !=3D -1) { + close(fd); + return true; + } + + return false; +} + +#define PERF_X86_SIMD_ZMMH_REGS (PERF_X86_SIMD_ZMM_REGS / 2) + +static bool __arch_has_simd_reg_class(uint64_t sample_type, int reg_class, + uint64_t *mask, uint16_t *qwords) +{ + bool supported =3D false; + uint64_t bits; + + *mask =3D 0; + *qwords =3D 0; + + switch (reg_class) { + case PERF_REG_CLASS_X86_OPMASK: + bits =3D BIT_ULL(PERF_X86_SIMD_OPMASK_REGS) - 1; + supported =3D __support_simd_reg_class(sample_type, + PERF_X86_OPMASK_QWORDS, + bits, true); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_OPMASK_QWORDS; + } + break; + case PERF_REG_CLASS_X86_XMM: + bits =3D BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1; + supported =3D __support_simd_reg_class(sample_type, + PERF_X86_XMM_QWORDS, + bits, false); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_XMM_QWORDS; + } + break; + case PERF_REG_CLASS_X86_YMM: + bits =3D 
BIT_ULL(PERF_X86_SIMD_YMM_REGS) - 1; + supported =3D __support_simd_reg_class(sample_type, + PERF_X86_YMM_QWORDS, + bits, false); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_YMM_QWORDS; + } + break; + case PERF_REG_CLASS_X86_ZMM: + bits =3D BIT_ULL(PERF_X86_SIMD_ZMM_REGS) - 1; + supported =3D __support_simd_reg_class(sample_type, + PERF_X86_ZMM_QWORDS, + bits, false); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_ZMM_QWORDS; + break; + } + + bits =3D BIT_ULL(PERF_X86_SIMD_ZMMH_REGS) - 1; + supported =3D __support_simd_reg_class(sample_type, + PERF_X86_ZMM_QWORDS, + bits, false); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_ZMM_QWORDS; + } + break; + default: + break; + } + + return supported; +} + +static bool __support_simd_sampling(void) +{ + uint64_t mask =3D BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1; + uint16_t qwords =3D PERF_X86_XMM_QWORDS; + static bool simd_sampling_supported; + static bool cached; + + if (cached) + return simd_sampling_supported; + + simd_sampling_supported =3D + __arch_has_simd_reg_class(PERF_SAMPLE_REGS_INTR, + PERF_REG_CLASS_X86_XMM, + &mask, &qwords); + simd_sampling_supported |=3D + __arch_has_simd_reg_class(PERF_SAMPLE_REGS_USER, + PERF_REG_CLASS_X86_XMM, + &mask, &qwords); + cached =3D true; + + return simd_sampling_supported; +} + +/* + * @x86_intr_simd_cached: indicates the data of below 3 + * x86_intr_simd_* items has been retrieved from kernel and cached. + * @x86_intr_simd_reg_class_mask: indicates which kinds of PRED/SIMD + * registers are supported for intr-regs option. Assume kernel perf + * subsystem supports XMM/YMM sampling, then the mask is + * PERF_REG_CLASS_X86_XMM|PERF_REG_CLASS_X86_YMM. + * @x86_intr_simd_mask: indicates register bitmask for each kind of + * supported PRED/SIMD register, like + * x86_intr_simd_mask[PERF_REG_CLASS_X86_XMM] =3D 0xffff. 
+ * @x86_intr_simd_qwords: indicates the register length (in qwords) + * for each kind of supported PRED/SIMD register, like + * x86_intr_simd_qwords[PERF_REG_CLASS_X86_XMM] =3D 2. + */ +static bool x86_intr_simd_cached; +static uint64_t x86_intr_simd_reg_class_mask; +static uint64_t x86_intr_simd_mask[PERF_REG_X86_MAX_SIMD_CLASSES]; +static uint16_t x86_intr_simd_qwords[PERF_REG_X86_MAX_SIMD_CLASSES]; + +/* + * Similar to the x86_intr_simd_* items above; the difference is that + * these items are used for the user-regs option. + */ +static bool x86_user_simd_cached; +static uint64_t x86_user_simd_reg_class_mask; +static uint64_t x86_user_simd_mask[PERF_REG_X86_MAX_SIMD_CLASSES]; +static uint16_t x86_user_simd_qwords[PERF_REG_X86_MAX_SIMD_CLASSES]; + +static uint64_t __arch__simd_reg_class_mask(bool intr) +{ + uint64_t mask =3D 0; + bool supported; + int reg_c; + + if (!__support_simd_sampling()) + return 0; + + if (intr && x86_intr_simd_cached) + return x86_intr_simd_reg_class_mask; + + if (!intr && x86_user_simd_cached) + return x86_user_simd_reg_class_mask; + + for (reg_c =3D 0; reg_c < PERF_REG_X86_MAX_SIMD_CLASSES; reg_c++) { + supported =3D false; + + if (intr) { + supported =3D __arch_has_simd_reg_class( + PERF_SAMPLE_REGS_INTR, + reg_c, + &x86_intr_simd_mask[reg_c], + &x86_intr_simd_qwords[reg_c]); + } else { + supported =3D __arch_has_simd_reg_class( + PERF_SAMPLE_REGS_USER, + reg_c, + &x86_user_simd_mask[reg_c], + &x86_user_simd_qwords[reg_c]); + } + if (supported) + mask |=3D BIT_ULL(reg_c); + } + + if (intr) { + x86_intr_simd_reg_class_mask =3D mask; + x86_intr_simd_cached =3D true; + } else { + x86_user_simd_reg_class_mask =3D mask; + x86_user_simd_cached =3D true; + } + + return mask; +} + +static uint64_t +__arch__simd_reg_class_bitmap_qwords(bool intr, int reg_c, uint16_t *qword= s) +{ + uint64_t mask =3D 0; + + *qwords =3D 0; + if (reg_c >=3D PERF_REG_X86_MAX_SIMD_CLASSES) + return mask; + + if (intr) { + mask =3D x86_intr_simd_mask[reg_c]; + *qwords =3D 
x86_intr_simd_qwords[reg_c]; + } else { + mask =3D x86_user_simd_mask[reg_c]; + *qwords =3D x86_user_simd_qwords[reg_c]; + } + + return mask; +} + +uint64_t __perf_simd_reg_class_mask_x86(bool intr, bool pred) +{ + uint64_t mask =3D __arch__simd_reg_class_mask(intr); + + return pred ? mask & PERF_REG_CLASS_X86_PRED_MASK : + mask & PERF_REG_CLASS_X86_SIMD_MASK; +} + +uint64_t __perf_simd_reg_class_bitmap_qwords_x86(int reg_c, uint16_t *qwor= ds, + bool intr, bool pred) +{ + /* Populate the cache that matches the requested mode (intr or user) */ + if (intr ? !x86_intr_simd_cached : !x86_user_simd_cached) + __perf_simd_reg_class_mask_x86(intr, pred); + return __arch__simd_reg_class_bitmap_qwords(intr, reg_c, qwords); +} + +const char *__perf_simd_reg_class_name_x86(int id, bool pred __maybe_unuse= d) +{ + switch (id) { + case PERF_REG_CLASS_X86_OPMASK: + return "OPMASK"; + case PERF_REG_CLASS_X86_XMM: + return "XMM"; + case PERF_REG_CLASS_X86_YMM: + return "YMM"; + case PERF_REG_CLASS_X86_ZMM: + return "ZMM"; + default: + return NULL; + } + + return NULL; +} diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/pe= rf_event_attr_fprintf.c index 741c3d657a8b..c6b8e53e06fd 100644 --- a/tools/perf/util/perf_event_attr_fprintf.c +++ b/tools/perf/util/perf_event_attr_fprintf.c @@ -362,6 +362,12 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_eve= nt_attr *attr, PRINT_ATTRf(aux_start_paused, p_unsigned); PRINT_ATTRf(aux_pause, p_unsigned); PRINT_ATTRf(aux_resume, p_unsigned); + PRINT_ATTRf(sample_simd_pred_reg_qwords, p_unsigned); + PRINT_ATTRf(sample_simd_pred_reg_intr, p_hex); + PRINT_ATTRf(sample_simd_pred_reg_user, p_hex); + PRINT_ATTRf(sample_simd_vec_reg_qwords, p_unsigned); + PRINT_ATTRf(sample_simd_vec_reg_intr, p_hex); + PRINT_ATTRf(sample_simd_vec_reg_user, p_hex); =20 return ret; } diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c index afc567718bee..dc99e797e715 100644 --- a/tools/perf/util/perf_regs.c +++ b/tools/perf/util/perf_regs.c @@ -246,3 +246,75 @@ uint64_t perf_arch_reg_sp(uint16_t e_machine) return 0; } } + 
+uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred) +{ + switch (e_machine) { + case EM_386: + case EM_X86_64: + return __perf_simd_reg_class_mask_x86(/*intr=3D*/true, pred); + default: + return 0; + } +} + +uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred) +{ + switch (e_machine) { + case EM_386: + case EM_X86_64: + return __perf_simd_reg_class_mask_x86(/*intr=3D*/false, pred); + default: + return 0; + } +} + +uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int re= g_c, + uint16_t *qwords, bool pred) +{ + switch (e_machine) { + case EM_386: + case EM_X86_64: + return __perf_simd_reg_class_bitmap_qwords_x86(reg_c, qwords, + /*intr=3D*/true, + pred); + default: + *qwords =3D 0; + return 0; + } +} + +uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int re= g_c, + uint16_t *qwords, bool pred) +{ + switch (e_machine) { + case EM_386: + case EM_X86_64: + return __perf_simd_reg_class_bitmap_qwords_x86(reg_c, qwords, + /*intr=3D*/false, + pred); + default: + *qwords =3D 0; + return 0; + } +} + +const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred) +{ + const char *name =3D NULL; + + switch (e_machine) { + case EM_386: + case EM_X86_64: + name =3D __perf_simd_reg_class_name_x86(id, pred); + break; + default: + break; + } + if (name) + return name; + + pr_debug("Failed to find %s register %d for ELF machine type %u\n", + pred ? 
"PRED" : "SIMD", id, e_machine); + return "unknown"; +} diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h index c9501ca8045d..80d1d7316188 100644 --- a/tools/perf/util/perf_regs.h +++ b/tools/perf/util/perf_regs.h @@ -20,6 +20,13 @@ const char *perf_reg_name(int id, uint16_t e_machine, ui= nt32_t e_flags, int abi) int perf_reg_value(u64 *valp, struct regs_dump *regs, int id); uint64_t perf_arch_reg_ip(uint16_t e_machine); uint64_t perf_arch_reg_sp(uint16_t e_machine); +uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred); +uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred); +uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int re= g_c, + uint16_t *qwords, bool pred); +uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int re= g_c, + uint16_t *qwords, bool pred); +const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred= ); =20 int __perf_sdt_arg_parse_op_arm64(char *old_op, char **new_op); uint64_t __perf_reg_mask_arm64(bool intr); @@ -68,6 +75,10 @@ uint64_t __perf_reg_mask_x86(bool intr, int *abi); const char *__perf_reg_name_x86(int id, int abi); uint64_t __perf_reg_ip_x86(void); uint64_t __perf_reg_sp_x86(void); +uint64_t __perf_simd_reg_class_mask_x86(bool intr, bool pred); +uint64_t __perf_simd_reg_class_bitmap_qwords_x86(int reg_c, uint16_t *qwor= ds, + bool intr, bool pred); +const char *__perf_simd_reg_class_name_x86(int id, bool pred); =20 static inline uint64_t DWARF_MINIMAL_REGS(uint16_t e_machine) { diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h index 93627c9a7338..37ed44b5f15b 100644 --- a/tools/perf/util/record.h +++ b/tools/perf/util/record.h @@ -62,6 +62,12 @@ struct record_opts { u64 branch_stack; u64 sample_intr_regs; u64 sample_user_regs; + u64 sample_intr_vec_regs; + u64 sample_user_vec_regs; + u32 sample_intr_pred_regs; + u32 sample_user_pred_regs; + u16 sample_vec_reg_qwords; + u16 sample_pred_reg_qwords; u64 
default_interval; u64 user_interval; size_t auxtrace_snapshot_size; --=20 2.34.1 From nobody Thu Apr 2 09:33:38 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D2522374745; Tue, 24 Mar 2026 01:01:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774314103; cv=none; b=TXxfEyOkbPW4hGnxbznLIKBR2dQJQrcMinAxjr5vHL5TO21XmgrXUAzq/DoPaImgj45wGqB4K5LTQvrQRQuqTkKCSEA+GRw9fv4SxmX5DQWLMZQVQhSvQ2uA8C/3hG4C1RoB57i1gBxzVLtz9v7Wb+ZOsJ+ie2k1Ouj87wE1RsA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774314103; c=relaxed/simple; bh=5NI4ea9cKFwayWZZvfaw66PawTCNz+JzZthcJEGzSfo=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=kFVMR0DQC7/kqn4B2pOZPBxVPqyV7NpSzc7uGh9KYyR5gBapHHzgjlj9xQC624UYdoxYL5VZ3wGpoSZ2XR977wmDcSFPNiSXQpJ/OwlT4vGau46IEydgKXNvNRnLxoq3/iBqnAaAPpJhM7i4zF3dNTs5PCNWP/XApBvRoMbzl6A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Jaz8nIz/; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Jaz8nIz/" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1774314102; x=1805850102; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding; bh=5NI4ea9cKFwayWZZvfaw66PawTCNz+JzZthcJEGzSfo=; b=Jaz8nIz/p+0WYR7CHtCzAuqDUq0y0ICTy/6DRUFqY2Z0nEwjujnQnUf3 tId7VoTHAuiJEYBmWYvO0D3SNkjXbVyrWLvKTEw2qVSDAJQH0ieGoIQGY f+y7PmX/aJom9pethpwydRdm5UdBea50t1z63OAGDOvUctsH/PxmQRadb tiUoO30niGF8rQvC3aola9pZGVRsCbTOtNUyuBGz0Ldw6l98wel9e/0g8 MyrHWwzHDu47WDOmwuQ/Z2Wkf2FIl9yIAhy1NJBu6QpLz+IxW4iUkdfGy uxgVd4NT8/ZS1Bkzx3bk7ydRFSgODFoxGJM8nWtGX/gOmGQNU5qbW2LCy A==; X-CSE-ConnectionGUID: oiTs01DJSjW8QiMrnknTcg== X-CSE-MsgGUID: z0b4X7f8TF2RGIPXkf5mkg== X-IronPort-AV: E=McAfee;i="6800,10657,11738"; a="75441749" X-IronPort-AV: E=Sophos;i="6.23,138,1770624000"; d="scan'208";a="75441749" Received: from orviesa003.jf.intel.com ([10.64.159.143]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2026 18:01:42 -0700 X-CSE-ConnectionGUID: ZJmOB2M8Syy/kh3qizfFmQ== X-CSE-MsgGUID: +36nYkBLSVy/CZPnlFGxvg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,138,1770624000"; d="scan'208";a="228263324" Received: from spr.sh.intel.com ([10.112.229.196]) by orviesa003.jf.intel.com with ESMTP; 23 Mar 2026 18:01:35 -0700 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Thomas Gleixner , Dave Hansen , Ian Rogers , Adrian Hunter , Jiri Olsa , Alexander Shishkin , Andi Kleen , Eranian Stephane Cc: Mark Rutland , broonie@kernel.org, Ravi Bangoria , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Kan Liang , Dapeng Mi Subject: [Patch v7 4/4] perf regs: Enable dumping of SIMD registers Date: Tue, 24 Mar 2026 08:57:06 +0800 Message-Id: <20260324005706.3778057-5-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260324005706.3778057-1-dapeng1.mi@linux.intel.com> References: <20260324005706.3778057-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 
Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang This patch adds support for dumping SIMD registers using the new PERF_SAMPLE_REGS_ABI_SIMD ABI. Currently, the XMM, YMM, ZMM, OPMASK, eGPRs, and SSP registers on x86 platforms are supported with the PERF_SAMPLE_REGS_ABI_SIMD ABI. An example of the output is displayed below. Example: $perf record -e cycles:p -IXMM,YMM,OPMASK,SSP ./test $perf report -D ... ... 237538985992962 0x454d0 [0x480]: PERF_RECORD_SAMPLE(IP, 0x1): 179370/179370: 0xffffffff969627fc period: 124999 addr: 0 ... intr regs: mask 0x20000000000 ABI 64-bit .... SSP 0x0000000000000000 ... SIMD ABI nr_vectors 32 vector_qwords 4 nr_pred 8 pred_qwords 1 .... YMM [0] 0x0000000000004000 .... YMM [0] 0x000055e828695270 .... YMM [0] 0x0000000000000000 .... YMM [0] 0x0000000000000000 .... YMM [1] 0x000055e8286990e0 .... YMM [1] 0x000055e828698dd0 .... YMM [1] 0x0000000000000000 .... YMM [1] 0x0000000000000000 ... ... .... YMM [31] 0x0000000000000000 .... YMM [31] 0x0000000000000000 .... YMM [31] 0x0000000000000000 .... YMM [31] 0x0000000000000000 .... OPMASK[0] 0x0000000000100221 .... OPMASK[1] 0x0000000000000020 .... OPMASK[2] 0x000000007fffffff .... OPMASK[3] 0x0000000000000000 .... OPMASK[4] 0x0000000000000000 .... OPMASK[5] 0x0000000000000000 .... OPMASK[6] 0x0000000000000000 .... OPMASK[7] 0x0000000000000000 ... ... Signed-off-by: Kan Liang Co-developed-by: Dapeng Mi Signed-off-by: Dapeng Mi --- V7: 1) add assert() check for SIMD fields in sample data. 2) optimize regs_abi[] definition. 
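Note: the "SIMD ABI" line in the example output carries the four counts that determine how much SIMD data follows the regular registers. A minimal C sketch of that size calculation is given here for reference. The struct simd_hdr type and both helpers are hypothetical names rather than perf code, and the unpacking order (nr_vectors in the lowest 16 bits) is an assumption based on a little-endian view of the union added to struct regs_dump.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical mirror of the four 16-bit counts in the SIMD config word. */
struct simd_hdr {
	uint16_t nr_vectors;
	uint16_t vector_qwords;
	uint16_t nr_pred;
	uint16_t pred_qwords;
};

/* Assumption: fields are packed lowest-first, as on little-endian hosts. */
static struct simd_hdr simd_hdr_unpack(uint64_t config)
{
	struct simd_hdr h = {
		.nr_vectors    = (uint16_t)(config & 0xffff),
		.vector_qwords = (uint16_t)((config >> 16) & 0xffff),
		.nr_pred       = (uint16_t)((config >> 32) & 0xffff),
		.pred_qwords   = (uint16_t)((config >> 48) & 0xffff),
	};

	return h;
}

/* Bytes of SIMD data: all vector qwords plus all predicate qwords. */
static size_t simd_payload_bytes(const struct simd_hdr *h)
{
	return ((size_t)h->nr_vectors * h->vector_qwords +
		(size_t)h->nr_pred * h->pred_qwords) * sizeof(uint64_t);
}
```

For the counts shown in the example (32 vectors of 4 qwords, 8 predicate registers of 1 qword) this gives (32 * 4 + 8 * 1) * 8 = 1088 bytes of SIMD data.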
tools/perf/util/evsel.c | 36 +++++++++++++++++++++ tools/perf/util/sample.h | 10 ++++++ tools/perf/util/session.c | 66 ++++++++++++++++++++++++++++++++++++++- 3 files changed, 111 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 5f00489e714a..24cc7ba71ae1 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -3520,6 +3520,24 @@ int evsel__parse_sample(struct evsel *evsel, union p= erf_event *event, regs->mask =3D mask; regs->regs =3D (u64 *)array; array =3D (void *)array + sz; + + if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) { + assert(regs->nr_vectors <=3D + hweight64(evsel->core.attr.sample_simd_vec_reg_user)); + assert(regs->vector_qwords <=3D + evsel->core.attr.sample_simd_vec_reg_qwords); + assert(regs->nr_pred <=3D + hweight64(evsel->core.attr.sample_simd_pred_reg_user)); + assert(regs->pred_qwords <=3D + evsel->core.attr.sample_simd_pred_reg_qwords); + regs->config =3D *(u64 *)array; + array =3D (void *)array + sizeof(u64); + regs->simd_data =3D (u64 *)array; + sz =3D (regs->nr_vectors * regs->vector_qwords + + regs->nr_pred * regs->pred_qwords) * sizeof(u64); + OVERFLOW_CHECK(array, sz, max_size); + array =3D (void *)array + sz; + } } } =20 @@ -3577,6 +3595,24 @@ int evsel__parse_sample(struct evsel *evsel, union p= erf_event *event, regs->mask =3D mask; regs->regs =3D (u64 *)array; array =3D (void *)array + sz; + + if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) { + assert(regs->nr_vectors <=3D + hweight64(evsel->core.attr.sample_simd_vec_reg_intr)); + assert(regs->vector_qwords <=3D + evsel->core.attr.sample_simd_vec_reg_qwords); + assert(regs->nr_pred <=3D + hweight64(evsel->core.attr.sample_simd_pred_reg_intr)); + assert(regs->pred_qwords <=3D + evsel->core.attr.sample_simd_pred_reg_qwords); + regs->config =3D *(u64 *)array; + array =3D (void *)array + sizeof(u64); + regs->simd_data =3D (u64 *)array; + sz =3D (regs->nr_vectors * regs->vector_qwords + + regs->nr_pred * regs->pred_qwords) * 
sizeof(u64); + OVERFLOW_CHECK(array, sz, max_size); + array =3D (void *)array + sz; + } } } =20 diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h index 3cce8dd202aa..21f3416d3755 100644 --- a/tools/perf/util/sample.h +++ b/tools/perf/util/sample.h @@ -15,6 +15,16 @@ struct regs_dump { u64 abi; u64 mask; u64 *regs; + union { + u64 config; + struct { + u16 nr_vectors; + u16 vector_qwords; + u16 nr_pred; + u16 pred_qwords; + }; + }; + u64 *simd_data; =20 /* Cached values/mask filled by first register access. */ u64 cache_regs[PERF_SAMPLE_REGS_CACHE_SIZE]; diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 7cf7bf86205d..453d44d32162 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -972,15 +972,77 @@ static void regs_dump__printf(u64 mask, struct regs_d= ump *regs, } } =20 +static void simd_regs_dump__printf(uint16_t e_machine, struct regs_dump *r= egs, bool intr) +{ + const char *name =3D "unknown"; + int i, idx =3D 0; + uint16_t qwords; + int reg_c; + + if (!(regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)) + return; + + printf("... SIMD ABI nr_vectors %d vector_qwords %d nr_pred %d pred_qword= s %d\n", + regs->nr_vectors, regs->vector_qwords, + regs->nr_pred, regs->pred_qwords); + + for (reg_c =3D 0; reg_c < 64; reg_c++) { + if (intr) { + perf_intr_simd_reg_class_bitmap_qwords(e_machine, reg_c, + &qwords, /*pred=3D*/false); + } else { + perf_user_simd_reg_class_bitmap_qwords(e_machine, reg_c, + &qwords, /*pred=3D*/false); + } + if (regs->vector_qwords =3D=3D qwords) { + name =3D perf_simd_reg_class_name(e_machine, reg_c, /*pred=3D*/false); + break; + } + } + + for (i =3D 0; i < regs->nr_vectors; i++) { + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx+= +]); + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx+= +]); + if (regs->vector_qwords > 2) { + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx= ++]); + printf(".... 
%-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx= ++]); + } + if (regs->vector_qwords > 4) { + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx= ++]); + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx= ++]); + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx= ++]); + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx= ++]); + } + } + + name =3D "unknown"; + for (reg_c =3D 0; reg_c < 64; reg_c++) { + if (intr) { + perf_intr_simd_reg_class_bitmap_qwords(e_machine, reg_c, + &qwords, /*pred=3D*/true); + } else { + perf_user_simd_reg_class_bitmap_qwords(e_machine, reg_c, + &qwords, /*pred=3D*/true); + } + if (regs->pred_qwords =3D=3D qwords) { + name =3D perf_simd_reg_class_name(e_machine, reg_c, /*pred=3D*/true); + break; + } + } + for (i =3D 0; i < regs->nr_pred; i++) + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx+= +]); +} + static const char *regs_abi[] =3D { [PERF_SAMPLE_REGS_ABI_NONE] =3D "none", [PERF_SAMPLE_REGS_ABI_32] =3D "32-bit", [PERF_SAMPLE_REGS_ABI_64] =3D "64-bit", + [PERF_SAMPLE_REGS_ABI_SIMD | PERF_SAMPLE_REGS_ABI_64] =3D "64-bit SIMD", }; =20 static inline const char *regs_dump_abi(struct regs_dump *d) { - if (d->abi > PERF_SAMPLE_REGS_ABI_64) + if (d->abi >=3D ARRAY_SIZE(regs_abi) || !regs_abi[d->abi]) return "unknown"; =20 return regs_abi[d->abi]; @@ -1010,6 +1072,7 @@ static void regs_user__printf(struct perf_sample *sam= ple, uint16_t e_machine, ui =20 if (user_regs->regs) regs__printf("user", user_regs, e_machine, e_flags); + simd_regs_dump__printf(e_machine, user_regs, /*intr=3D*/false); } =20 static void regs_intr__printf(struct perf_sample *sample, uint16_t e_machi= ne, uint32_t e_flags) @@ -1023,6 +1086,7 @@ static void regs_intr__printf(struct perf_sample *sam= ple, uint16_t e_machine, ui =20 if (intr_regs->regs) regs__printf("intr", intr_regs, e_machine, e_flags); + simd_regs_dump__printf(e_machine, intr_regs, 
/*intr=3D*/true); } =20 static void stack_user__printf(struct stack_dump *dump) --=20 2.34.1