From nobody Mon Jun 8 12:11:53 2026
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
 Thomas Gleixner, Dave Hansen, Ian Rogers, Adrian Hunter, Jiri Olsa,
 Alexander Shishkin, Andi Kleen, Eranian Stephane
Cc: Mark Rutland, broonie@kernel.org, Ravi Bangoria,
 linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen,
 Falcon Thomas, Dapeng Mi, Xudong Hao, Dapeng Mi, Kan Liang
Subject: [Patch v8 1/5] perf headers: Sync perf_event.h/perf_regs.h with the kernel headers
Date: Fri, 29 May 2026 16:24:47 +0800
Message-Id: <20260529082451.591783-2-dapeng1.mi@linux.intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260529082451.591783-1-dapeng1.mi@linux.intel.com>
References: <20260529082451.591783-1-dapeng1.mi@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Sync the UAPI header changes that support SIMD/eGPRs/SSP sampling into the
corresponding tools UAPI headers. Additionally, add a sanity check for the
newly introduced __reserved_4 field in perf_attr_check().

Co-developed-by: Kan Liang
Signed-off-by: Kan Liang
Signed-off-by: Dapeng Mi
---
 tools/arch/x86/include/uapi/asm/perf_regs.h | 51 +++++++++++++++++++++
 tools/include/uapi/linux/perf_event.h       | 49 ++++++++++++++++++--
 tools/perf/util/header.c                    |  3 +-
 3 files changed, 98 insertions(+), 5 deletions(-)

diff --git a/tools/arch/x86/include/uapi/asm/perf_regs.h b/tools/arch/x86/include/uapi/asm/perf_regs.h
index 7c9d2bb3833b..31a025cb9dba 100644
--- a/tools/arch/x86/include/uapi/asm/perf_regs.h
+++ b/tools/arch/x86/include/uapi/asm/perf_regs.h
@@ -27,9 +27,35 @@ enum perf_event_x86_regs {
 	PERF_REG_X86_R13,
 	PERF_REG_X86_R14,
 	PERF_REG_X86_R15,
+	/*
+	 * The eGPRs/SSP and XMM have overlaps. Only one can be used
+	 * at a time. The ABI PERF_SAMPLE_REGS_ABI_SIMD is used to
+	 * distinguish which one is used. If PERF_SAMPLE_REGS_ABI_SIMD
+	 * is set, then eGPRs/SSP is used, otherwise, XMM is used.
+	 *
+	 * Extended GPRs (eGPRs)
+	 */
+	PERF_REG_X86_R16,
+	PERF_REG_X86_R17,
+	PERF_REG_X86_R18,
+	PERF_REG_X86_R19,
+	PERF_REG_X86_R20,
+	PERF_REG_X86_R21,
+	PERF_REG_X86_R22,
+	PERF_REG_X86_R23,
+	PERF_REG_X86_R24,
+	PERF_REG_X86_R25,
+	PERF_REG_X86_R26,
+	PERF_REG_X86_R27,
+	PERF_REG_X86_R28,
+	PERF_REG_X86_R29,
+	PERF_REG_X86_R30,
+	PERF_REG_X86_R31,
+	PERF_REG_X86_SSP,
 	/* These are the limits for the GPRs. */
 	PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
 	PERF_REG_X86_64_MAX = PERF_REG_X86_R15 + 1,
+	PERF_REG_MISC_MAX = PERF_REG_X86_SSP + 1,
 
 	/* These all need two bits set because they are 128bit */
 	PERF_REG_X86_XMM0 = 32,
@@ -54,5 +80,30 @@ enum perf_event_x86_regs {
 };
 
 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1))
+#define PERF_X86_EGPRS_MASK __GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16)
+
+enum {
+	PERF_X86_SIMD_XMM_REGS = 16,
+	PERF_X86_SIMD_YMM_REGS = 16,
+	PERF_X86_SIMD_ZMM_REGS = 32,
+	PERF_X86_SIMD_VEC_REGS_MAX = PERF_X86_SIMD_ZMM_REGS,
+
+	PERF_X86_SIMD_OPMASK_REGS = 8,
+	PERF_X86_SIMD_PRED_REGS_MAX = PERF_X86_SIMD_OPMASK_REGS,
+};
+
+#define PERF_X86_SIMD_PRED_MASK __GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, 0)
+#define PERF_X86_SIMD_VEC_MASK __GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1, 0)
+
+#define PERF_X86_H16ZMM_BASE 16
+
+enum {
+	/* 1 qword = 8 bytes */
+	PERF_X86_OPMASK_QWORDS = 1,
+	PERF_X86_XMM_QWORDS = 2,
+	PERF_X86_YMM_QWORDS = 4,
+	PERF_X86_ZMM_QWORDS = 8,
+	PERF_X86_SIMD_QWORDS_MAX = PERF_X86_ZMM_QWORDS,
+};
 
 #endif /* _ASM_X86_PERF_REGS_H */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index fd10aa8d697f..c49fc76292f7 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -314,8 +314,9 @@ enum {
  */
 enum perf_sample_regs_abi {
 	PERF_SAMPLE_REGS_ABI_NONE = 0,
-	PERF_SAMPLE_REGS_ABI_32 = 1,
-	PERF_SAMPLE_REGS_ABI_64 = 2,
+	PERF_SAMPLE_REGS_ABI_32 = (1 << 0),
+	PERF_SAMPLE_REGS_ABI_64 = (1 << 1),
+	PERF_SAMPLE_REGS_ABI_SIMD = (1 << 2),
 };
 
 /*
@@ -383,6 +384,7 @@ enum perf_event_read_format {
 #define PERF_ATTR_SIZE_VER7 128 /* Add: sig_data */
 #define PERF_ATTR_SIZE_VER8 136 /* Add: config3 */
 #define PERF_ATTR_SIZE_VER9 144 /* add: config4 */
+#define PERF_ATTR_SIZE_VER10 176 /* Add: sample_simd_{vec|pred}_reg_* */
 
 /*
  * 'struct perf_event_attr' contains various attributes that define
@@ -547,6 +549,29 @@ struct perf_event_attr {
 
 	__u64 config3; /* extension of config2 */
 	__u64 config4; /* extension of config3 */
+
+	/*
+	 * Defines the sampling SIMD/PRED(predicate) registers bitmap and
+	 * qwords (8 bytes) length.
+	 *
+	 * sample_simd_regs_enabled != 0 indicates there are SIMD/PRED
+	 * registers to be sampled, the SIMD/PRED registers bitmap and
+	 * qwords length are represented in
+	 * sample_simd_{vec|pred}_reg_{intr|user} and
+	 * sample_simd_{vec|pred}_reg_qwords fields separately.
+	 *
+	 * sample_simd_regs_enabled == 0 indicates no SIMD/PRED registers
+	 * are sampled.
+	 */
+	__u16 sample_simd_regs_enabled;
+	__u16 sample_simd_pred_reg_qwords;
+	__u16 sample_simd_vec_reg_qwords;
+	__u16 __reserved_4;
+
+	__u32 sample_simd_pred_reg_intr;
+	__u32 sample_simd_pred_reg_user;
+	__u64 sample_simd_vec_reg_intr;
+	__u64 sample_simd_vec_reg_user;
 };
 
 /*
@@ -1020,7 +1045,15 @@ enum perf_event_type {
 	 *	} && PERF_SAMPLE_BRANCH_STACK
 	 *
 	 *	{ u64 abi; # enum perf_sample_regs_abi
-	 *	  u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
+	 *	  u64 regs[weight(mask)];
+	 *	  struct {
+	 *		u64 nr_vectors; # 0 ... weight(sample_simd_vec_reg_user)
+	 *		u64 vector_qwords; # 0 ... sample_simd_vec_reg_qwords
+	 *		u64 nr_pred; # 0 ... weight(sample_simd_pred_reg_user)
+	 *		u64 pred_qwords; # 0 ... sample_simd_pred_reg_qwords
+	 *		u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
+	 *	  } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+	 *	} && PERF_SAMPLE_REGS_USER
 	 *
 	 *	{ u64 size;
 	 *	  char data[size];
@@ -1047,7 +1080,15 @@ enum perf_event_type {
 	 *	{ u64 data_src; } && PERF_SAMPLE_DATA_SRC
 	 *	{ u64 transaction; } && PERF_SAMPLE_TRANSACTION
 	 *	{ u64 abi; # enum perf_sample_regs_abi
-	 *	  u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
+	 *	  u64 regs[weight(mask)];
+	 *	  struct {
+	 *		u64 nr_vectors; # 0 ... weight(sample_simd_vec_reg_intr)
+	 *		u64 vector_qwords; # 0 ... sample_simd_vec_reg_qwords
+	 *		u64 nr_pred; # 0 ... weight(sample_simd_pred_reg_intr)
+	 *		u64 pred_qwords; # 0 ... sample_simd_pred_reg_qwords
+	 *		u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
+	 *	  } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+	 *	} && PERF_SAMPLE_REGS_INTR
 	 *	{ u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR
 	 *	{ u64 cgroup;} && PERF_SAMPLE_CGROUP
 	 *	{ u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index f30e48eb3fc3..e8e4e00d6b4d 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2091,7 +2091,8 @@ static void free_event_desc(struct evsel *events)
 
 static bool perf_attr_check(struct perf_event_attr *attr)
 {
-	if (attr->__reserved_1 || attr->__reserved_2 || attr->__reserved_3) {
+	if (attr->__reserved_1 || attr->__reserved_2 ||
+	    attr->__reserved_3 || attr->__reserved_4) {
 		pr_warning("Reserved bits are set unexpectedly. "
 			   "Please update perf tool.\n");
 		return false;
-- 
2.34.1

From nobody Mon Jun 8 12:11:53 2026
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
 Thomas Gleixner, Dave Hansen, Ian Rogers, Adrian Hunter, Jiri Olsa,
 Alexander Shishkin, Andi Kleen, Eranian Stephane
Cc: Mark Rutland, broonie@kernel.org, Ravi Bangoria,
 linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen,
 Falcon Thomas, Dapeng Mi, Xudong Hao, Dapeng Mi
Subject: [Patch v8 2/5] perf regs: Support x86 eGPRs/SSP sampling
Date: Fri, 29 May 2026 16:24:48 +0800
Message-Id: <20260529082451.591783-3-dapeng1.mi@linux.intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260529082451.591783-1-dapeng1.mi@linux.intel.com>
References: <20260529082451.591783-1-dapeng1.mi@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch adds support for sampling x86 extended GP registers (R16-R31)
and the shadow stack pointer (SSP) register.

The original XMM register space in sample_regs_user/sample_regs_intr is
reclaimed to represent the eGPRs and SSP when SIMD register sampling is
supported through the new SIMD sampling fields in the perf_event_attr
structure. This necessitates a way to distinguish which register layout is
used for the sample_regs_user/sample_regs_intr bitmap.

To address this, a new "abi" argument is added to the helpers
perf_intr_reg_mask(), perf_user_reg_mask(), and perf_reg_name(). When
"abi & PERF_SAMPLE_REGS_ABI_SIMD" is true, the bitmap uses the eGPRs and
SSP layout; otherwise, it uses the legacy XMM register layout.

Please note that PERF_SAMPLE_REGS_ABI_SIMD is set by default on platforms
that support SIMD register sampling, even when no eGPR or SSP register is
requested (for example, -Iax).
As a result, sample_regs_intr and sample_regs_user always use the new GPR
layout on platforms with SIMD register sampling support.

This patch only supports eGPR and SSP sampling; complete SIMD register
sampling will be supported in the next patch.

Signed-off-by: Dapeng Mi
---
 tools/perf/builtin-inject.c                  |   2 +
 tools/perf/builtin-script.c                  |   2 +-
 tools/perf/util/evsel.c                      |  23 +++-
 tools/perf/util/intel-pt.c                   |   1 +
 tools/perf/util/parse-regs-options.c         |  35 +++--
 .../perf/util/perf-regs-arch/perf_regs_x86.c | 124 +++++++++++++++---
 tools/perf/util/perf_regs.c                  |  12 +-
 tools/perf/util/perf_regs.h                  |  10 +-
 tools/perf/util/record.h                     |   7 +
 .../scripting-engines/trace-event-python.c   |   2 +-
 tools/perf/util/session.c                    |  13 +-
 tools/perf/util/synthetic-events.c           |   8 ++
 12 files changed, 194 insertions(+), 45 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index f174bc69cec4..f6611d7e85eb 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -457,6 +457,8 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
 	/* adjust sample size for stack and regs */
 	sample_size -= sample->user_stack.size;
 	sample_size -= (hweight64(evsel->core.attr.sample_regs_user) + 1) * sizeof(u64);
+	if (sample->user_regs && sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
+		sample_size -= 4 * sizeof(u64); /* Reduce SIMD regs header size */
 	sample_size += (sample->callchain->nr + 1) * sizeof(u64);
 	event_copy->header.size = sample_size;
 
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index c8ac9f01a36b..8ec791e22778 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -730,7 +730,7 @@ static int perf_sample__fprintf_regs(struct regs_dump *regs, uint64_t mask,
 	for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) {
 		u64 val = regs->regs[i++];
 		printed += fprintf(fp, "%5s:0x%"PRIx64" ",
-				   perf_reg_name(r, e_machine, e_flags),
+				   perf_reg_name(r, e_machine, e_flags, regs->abi),
 				   val);
 	}
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 2ee87fd84d3e..1c856a2ecc6e 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1055,19 +1055,22 @@ static void __evsel__config_callchain(struct evsel *evsel, const struct record_o
 	}
 
 	if (param->record_mode == CALLCHAIN_DWARF) {
+		int abi = -1; /* -1 indicates only basic GPRs are needed. */
+
 		if (!function) {
 			uint16_t e_machine = evsel__e_machine(evsel, /*e_flags=*/NULL);
 
 			evsel__set_sample_bit(evsel, REGS_USER);
 			evsel__set_sample_bit(evsel, STACK_USER);
 			if (opts->sample_user_regs &&
-			    DWARF_MINIMAL_REGS(e_machine) != perf_user_reg_mask(EM_HOST)) {
+			    DWARF_MINIMAL_REGS(e_machine) != perf_user_reg_mask(EM_HOST, &abi)) {
 				attr->sample_regs_user |= DWARF_MINIMAL_REGS(e_machine);
 				pr_warning("WARNING: The use of --call-graph=dwarf may require all the user registers, "
 					   "specifying a subset with --user-regs may render DWARF unwinding unreliable, "
 					   "so the minimal registers set (IP, SP) is explicitly forced.\n");
 			} else {
-				attr->sample_regs_user |= perf_user_reg_mask(EM_HOST);
+				abi = -1;
+				attr->sample_regs_user |= perf_user_reg_mask(EM_HOST, &abi);
 			}
 			attr->sample_stack_user = param->dump_size;
 			attr->exclude_callchain_user = 1;
@@ -1587,12 +1590,14 @@ void evsel__config(struct evsel *evsel, const struct record_opts *opts,
 	if (opts->sample_intr_regs && !evsel->no_aux_samples &&
 	    !evsel__is_dummy_event(evsel)) {
 		attr->sample_regs_intr = opts->sample_intr_regs;
+		attr->sample_simd_regs_enabled = !!opts->sample_simd_regs_enabled;
 		evsel__set_sample_bit(evsel, REGS_INTR);
 	}
 
 	if (opts->sample_user_regs && !evsel->no_aux_samples &&
 	    !evsel__is_dummy_event(evsel)) {
 		attr->sample_regs_user |= opts->sample_user_regs;
+		attr->sample_simd_regs_enabled = !!opts->sample_simd_regs_enabled;
 		evsel__set_sample_bit(evsel, REGS_USER);
 	}
 
@@ -3495,6 +3500,13 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 			regs->mask = mask;
 			regs->regs = (u64 *)array;
 			array = (void *)array + sz;
+
+			if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
+				/* Skip SIMD-regs header. */
+				sz = 4 * sizeof(u64);
+				OVERFLOW_CHECK(array, sz, max_size);
+				array = (void *)array + sz;
+			}
 		}
 	}
 
@@ -3552,6 +3564,13 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 			regs->mask = mask;
 			regs->regs = (u64 *)array;
 			array = (void *)array + sz;
+
+			if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
+				/* Skip SIMD-regs header. */
+				sz = 4 * sizeof(u64);
+				OVERFLOW_CHECK(array, sz, max_size);
+				array = (void *)array + sz;
+			}
 		}
 	}
 
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index fc9eec8b54b8..2729ad8c6d26 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -2470,6 +2470,7 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse
 	}
 
 	if (sample_type & PERF_SAMPLE_REGS_INTR &&
+	    !evsel->core.attr.sample_simd_regs_enabled &&
 	    (items->mask[INTEL_PT_GP_REGS_POS] ||
 	     items->mask[INTEL_PT_XMM_POS])) {
 		u64 regs_mask = evsel->core.attr.sample_regs_intr;
diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c
index c93c2f0c8105..70a1cc90b2c1 100644
--- a/tools/perf/util/parse-regs-options.c
+++ b/tools/perf/util/parse-regs-options.c
@@ -6,11 +6,14 @@
 #include
 #include "util/debug.h"
 #include
+#include
 #include
 #include "util/perf_regs.h"
 #include "util/parse-regs-options.h"
+#include "record.h"
 
-static void list_perf_regs(FILE *fp, uint64_t mask)
+static void
+list_perf_regs(FILE *fp, uint64_t mask, int abi)
 {
 	const char *last_name = NULL;
 
@@ -21,7 +24,7 @@ static void list_perf_regs(FILE *fp, uint64_t mask)
 		if (((1ULL << reg) & mask) == 0)
 			continue;
 
-		name = perf_reg_name(reg, EM_HOST, EF_HOST);
+		name = perf_reg_name(reg, EM_HOST, EF_HOST, abi);
 		if (name && (!last_name || strcmp(last_name, name)))
 			fprintf(fp, "%s%s", reg > 0 ? " " : "", name);
 		last_name = name;
@@ -29,7 +32,8 @@ static void list_perf_regs(FILE *fp, uint64_t mask)
 	fputc('\n', fp);
 }
 
-static uint64_t name_to_perf_reg_mask(const char *to_match, uint64_t mask)
+static uint64_t
+name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi)
 {
 	uint64_t reg_mask = 0;
 
@@ -39,7 +43,7 @@ static uint64_t name_to_perf_reg_mask(const char *to_match, uint64_t mask)
 		if (((1ULL << reg) & mask) == 0)
 			continue;
 
-		name = perf_reg_name(reg, EM_HOST, EF_HOST);
+		name = perf_reg_name(reg, EM_HOST, EF_HOST, abi);
 		if (!name)
 			continue;
 
@@ -53,9 +57,12 @@ static int
 __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 {
 	uint64_t *mode = (uint64_t *)opt->value;
+	struct record_opts *opts;
 	char *s, *os = NULL, *p;
+	const char *warn;
 	int ret = -1;
 	uint64_t mask;
+	int abi = 0;
 
 	if (unset)
 		return 0;
@@ -66,11 +73,16 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	if (*mode)
 		return -1;
 
-	mask = intr ? perf_intr_reg_mask(EM_HOST) : perf_user_reg_mask(EM_HOST);
+	mask = intr ? perf_intr_reg_mask(EM_HOST, &abi) :
+		      perf_user_reg_mask(EM_HOST, &abi);
+	opts = intr ? container_of(opt->value, struct record_opts, sample_intr_regs) :
+		      container_of(opt->value, struct record_opts, sample_user_regs);
 
 	/* str may be NULL in case no arg is passed to -I */
 	if (!str) {
 		*mode = mask;
+		if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+			opts->sample_simd_regs_enabled = 1;
 		return 0;
 	}
 
@@ -79,6 +91,7 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	if (!s)
 		return -1;
 
+	warn = "Unknown register \"%s\", check man page or run \"perf record %s?\"\n";
 	for (;;) {
 		uint64_t reg_mask;
 
@@ -87,14 +100,16 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 			*p = '\0';
 
 		if (!strcmp(s, "?")) {
-			list_perf_regs(stderr, mask);
+			list_perf_regs(stderr, mask, abi);
 			goto error;
 		}
 
-		reg_mask = name_to_perf_reg_mask(s, mask);
-		if (reg_mask == 0) {
-			ui__warning("Unknown register \"%s\", check man page or run \"perf record %s?\"\n",
-				    s, intr ? "-I" : "--user-regs=");
+		reg_mask = name_to_perf_reg_mask(s, mask, abi);
+		if (reg_mask) {
+			if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+				opts->sample_simd_regs_enabled = 1;
+		} else {
+			ui__warning(warn, s, intr ? "-I" : "--user-regs=");
 			goto error;
 		}
 		*mode |= reg_mask;
diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
index b6d20522b4e8..ae26d991cdc9 100644
--- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c
+++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
@@ -235,26 +235,26 @@ int __perf_sdt_arg_parse_op_x86(char *old_op, char **new_op)
 	return SDT_ARG_VALID;
 }
 
-uint64_t __perf_reg_mask_x86(bool intr)
+static uint64_t __arch__reg_mask(u64 sample_type, u64 mask, bool has_simd_regs)
 {
 	struct perf_event_attr attr = {
-		.type = PERF_TYPE_HARDWARE,
-		.config = PERF_COUNT_HW_CPU_CYCLES,
-		.sample_type = PERF_SAMPLE_REGS_INTR,
-		.sample_regs_intr = PERF_REG_EXTENDED_MASK,
-		.precise_ip = 1,
-		.disabled = 1,
-		.exclude_kernel = 1,
+		.type = PERF_TYPE_HARDWARE,
+		.config = PERF_COUNT_HW_CPU_CYCLES,
+		.sample_type = sample_type,
+		.precise_ip = 1,
+		.disabled = 1,
+		.exclude_kernel = 1,
+		.sample_simd_regs_enabled = has_simd_regs,
 	};
 	int fd;
-
-	if (!intr)
-		return PERF_REGS_MASK;
-
 	/*
 	 * In an unnamed union, init it here to build on older gcc versions
 	 */
 	attr.sample_period = 1;
+	if (sample_type == PERF_SAMPLE_REGS_INTR)
+		attr.sample_regs_intr = mask;
+	else
+		attr.sample_regs_user = mask;
 
 	if (perf_pmus__num_core_pmus() > 1) {
 		struct perf_pmu *pmu = NULL;
@@ -276,13 +276,38 @@ uint64_t __perf_reg_mask_x86(bool intr)
 			/*group_fd=*/-1, /*flags=*/0);
 	if (fd != -1) {
 		close(fd);
-		return (PERF_REG_EXTENDED_MASK | PERF_REGS_MASK);
+		return mask;
+	}
+
+	return 0;
+}
+
+uint64_t __perf_reg_mask_x86(bool intr, int *abi)
+{
+	u64 sample_type = intr ? PERF_SAMPLE_REGS_INTR : PERF_SAMPLE_REGS_USER;
+	uint64_t mask = PERF_REGS_MASK;
+
+	/* -1 indicates only basic GPRs are needed. */
+	if (*abi < 0)
+		return PERF_REGS_MASK;
+
+	*abi = 0;
+	mask |= __arch__reg_mask(sample_type,
+				 GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16),
+				 true);
+	mask |= __arch__reg_mask(sample_type, BIT_ULL(PERF_REG_X86_SSP), true);
+
+	if (mask != PERF_REGS_MASK) {
+		*abi |= PERF_SAMPLE_REGS_ABI_SIMD;
+	} else {
+		mask |= __arch__reg_mask(sample_type, PERF_REG_EXTENDED_MASK,
+					 false);
 	}
 
-	return PERF_REGS_MASK;
+	return mask;
 }
 
-const char *__perf_reg_name_x86(int id)
+static const char *__arch_reg_gpr_name(int id)
 {
 	switch (id) {
 	case PERF_REG_X86_AX:
@@ -333,7 +358,60 @@ const char *__perf_reg_name_x86(int id)
 		return "R14";
 	case PERF_REG_X86_R15:
 		return "R15";
+	default:
+		return NULL;
+	}
+
+	return NULL;
+}
 
+static const char *__arch_reg_egpr_name(int id)
+{
+	switch (id) {
+	case PERF_REG_X86_R16:
+		return "R16";
+	case PERF_REG_X86_R17:
+		return "R17";
+	case PERF_REG_X86_R18:
+		return "R18";
+	case PERF_REG_X86_R19:
+		return "R19";
+	case PERF_REG_X86_R20:
+		return "R20";
+	case PERF_REG_X86_R21:
+		return "R21";
+	case PERF_REG_X86_R22:
+		return "R22";
+	case PERF_REG_X86_R23:
+		return "R23";
+	case PERF_REG_X86_R24:
+		return "R24";
+	case PERF_REG_X86_R25:
+		return "R25";
+	case PERF_REG_X86_R26:
+		return "R26";
+	case PERF_REG_X86_R27:
+		return "R27";
+	case PERF_REG_X86_R28:
+		return "R28";
+	case PERF_REG_X86_R29:
+		return "R29";
+	case PERF_REG_X86_R30:
+		return "R30";
+	case PERF_REG_X86_R31:
+		return "R31";
+	case PERF_REG_X86_SSP:
+		return "SSP";
+	default:
+		return NULL;
+	}
+
+	return NULL;
+}
+
+static const char *__arch_reg_xmm_name(int id)
+{
+	switch (id) {
 #define XMM(x) \
 	case PERF_REG_X86_XMM ## x: \
 	case PERF_REG_X86_XMM ## x + 1: \
@@ -362,6 +440,22 @@ const char *__perf_reg_name_x86(int id)
 	return NULL;
 }
 
+const char *__perf_reg_name_x86(int id, int abi)
+{
+	const char *name;
+
+	name = __arch_reg_gpr_name(id);
+	if (name)
+		return name;
+
+	if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+		name = __arch_reg_egpr_name(id);
+	else
+		name = __arch_reg_xmm_name(id);
+
+	return name;
+}
+
 uint64_t __perf_reg_ip_x86(void)
 {
 	return PERF_REG_X86_IP;
diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
index f52b0e1f7fc7..18eed85cf220 100644
--- a/tools/perf/util/perf_regs.c
+++ b/tools/perf/util/perf_regs.c
@@ -35,7 +35,7 @@ int perf_sdt_arg_parse_op(uint16_t e_machine, char *old_op, char **new_op)
 	return ret;
 }
 
-uint64_t perf_intr_reg_mask(uint16_t e_machine)
+uint64_t perf_intr_reg_mask(uint16_t e_machine, int *abi /*inout*/)
 {
 	uint64_t mask = 0;
 
@@ -67,7 +67,7 @@ uint64_t perf_intr_reg_mask(uint16_t e_machine)
 		break;
 	case EM_386:
 	case EM_X86_64:
-		mask = __perf_reg_mask_x86(/*intr=*/true);
+		mask = __perf_reg_mask_x86(/*intr=*/true, abi);
 		break;
 	default:
 		pr_debug("Unknown ELF machine %d, interrupt sampling register mask will be empty.\n",
@@ -78,7 +78,7 @@ uint64_t perf_intr_reg_mask(uint16_t e_machine)
 	return mask;
 }
 
-uint64_t perf_user_reg_mask(uint16_t e_machine)
+uint64_t perf_user_reg_mask(uint16_t e_machine, int *abi /*inout*/)
 {
 	uint64_t mask = 0;
 
@@ -110,7 +110,7 @@ uint64_t perf_user_reg_mask(uint16_t e_machine)
 		break;
 	case EM_386:
 	case EM_X86_64:
-		mask = __perf_reg_mask_x86(/*intr=*/false);
+		mask = __perf_reg_mask_x86(/*intr=*/false, abi);
 		break;
 	default:
 		pr_debug("Unknown ELF machine %d, user sampling register mask will be empty.\n",
@@ -121,7 +121,7 @@ uint64_t perf_user_reg_mask(uint16_t e_machine)
 	return mask;
 }
 
-const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags)
+const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, int abi)
 {
 	const char *reg_name = NULL;
 
@@ -153,7 +153,7 @@ const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags)
 		break;
 	case EM_386:
 	case EM_X86_64:
-		reg_name = __perf_reg_name_x86(id);
+		reg_name = __perf_reg_name_x86(id, abi);
 		break;
 	default:
 		break;
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
index 573f0d1dfe04..3086d2f2a974 100644
--- a/tools/perf/util/perf_regs.h
+++ b/tools/perf/util/perf_regs.h
@@ -13,10 +13,10 @@ enum {
 };
 
 int perf_sdt_arg_parse_op(uint16_t e_machine, char *old_op, char **new_op);
-uint64_t perf_intr_reg_mask(uint16_t e_machine);
-uint64_t perf_user_reg_mask(uint16_t e_machine);
+uint64_t perf_intr_reg_mask(uint16_t e_machine, int *abi /*inout*/);
+uint64_t perf_user_reg_mask(uint16_t e_machine, int *abi /*inout*/);
 
-const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags);
+const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, int abi);
 int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
 uint64_t perf_arch_reg_ip(uint16_t e_machine);
 uint64_t perf_arch_reg_sp(uint16_t e_machine);
@@ -65,8 +65,8 @@ uint64_t __perf_reg_sp_s390(void);
 int __perf_sdt_arg_parse_op_s390(char *old_op, char **new_op);
 
 int __perf_sdt_arg_parse_op_x86(char *old_op, char **new_op);
-uint64_t __perf_reg_mask_x86(bool intr);
-const char *__perf_reg_name_x86(int id);
+uint64_t __perf_reg_mask_x86(bool intr, int *abi);
+const char *__perf_reg_name_x86(int id, int abi);
 uint64_t __perf_reg_ip_x86(void);
 uint64_t __perf_reg_sp_x86(void);
 
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 93627c9a7338..411bb7276ad7 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -62,6 +62,13 @@ struct record_opts {
 	u64 branch_stack;
 	u64 sample_intr_regs;
 	u64 sample_user_regs;
+	u16 sample_simd_regs_enabled;
+	u16 sample_vec_reg_qwords;
+	u16 sample_pred_reg_qwords;
+	u32 sample_intr_pred_regs;
+	u32 sample_user_pred_regs;
+	u64 sample_intr_vec_regs;
+	u64 sample_user_vec_regs;
 	u64 default_interval;
 	u64 user_interval;
 	size_t auxtrace_snapshot_size;
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 5a30caaec73e..a9ad7d712196 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -733,7 +733,7 @@ static void regs_map(struct regs_dump *regs, uint64_t mask, uint16_t e_machine,
 
 		printed += scnprintf(bf + printed, size - printed,
 				     "%5s:0x%" PRIx64 " ",
-				     perf_reg_name(r, e_machine, e_flags), val);
+				     perf_reg_name(r, e_machine, e_flags, regs->abi), val);
 	}
 }
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index fe0de2a0277f..9e36c834a8f4 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -966,15 +966,16 @@ static void branch_stack__printf(struct perf_sample *sample,
 	}
 }
 
-static void regs_dump__printf(u64 mask, u64 *regs, uint16_t e_machine, uint32_t e_flags)
+static void regs_dump__printf(u64 mask, struct regs_dump *regs,
+			      uint16_t e_machine, uint32_t e_flags)
 {
 	unsigned rid, i = 0;
 
 	for_each_set_bit(rid, (unsigned long *) &mask, sizeof(mask) * 8) {
-		u64 val = regs[i++];
+		u64 val = regs->regs[i++];
 
 		printf(".... %-5s 0x%016" PRIx64 "\n",
-		       perf_reg_name(rid, e_machine, e_flags), val);
+		       perf_reg_name(rid, e_machine, e_flags, regs->abi), val);
 	}
 }
 
@@ -982,11 +983,13 @@ static const char *regs_abi[] = {
 	[PERF_SAMPLE_REGS_ABI_NONE] = "none",
 	[PERF_SAMPLE_REGS_ABI_32] = "32-bit",
 	[PERF_SAMPLE_REGS_ABI_64] = "64-bit",
+	[PERF_SAMPLE_REGS_ABI_SIMD | PERF_SAMPLE_REGS_ABI_32] = "32-bit SIMD",
+	[PERF_SAMPLE_REGS_ABI_SIMD | PERF_SAMPLE_REGS_ABI_64] = "64-bit SIMD",
 };
 
 static inline const char *regs_dump_abi(struct regs_dump *d)
 {
-	if (d->abi > PERF_SAMPLE_REGS_ABI_64)
+	if (d->abi >= ARRAY_SIZE(regs_abi) || !regs_abi[d->abi])
 		return "unknown";
 
 	return regs_abi[d->abi];
@@ -1002,7 +1005,7 @@ static void regs__printf(const char *type, struct regs_dump *regs,
 	       mask,
 	       regs_dump_abi(regs));
 
-	regs_dump__printf(mask, regs->regs, e_machine, e_flags);
+	regs_dump__printf(mask, regs, e_machine, e_flags);
 }
 
 static void regs_user__printf(struct perf_sample *sample, uint16_t e_machine, uint32_t e_flags)
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 85bee747f4cd..ce61734cd5d2 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1524,6 +1524,8 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 		if (sample->user_regs && sample->user_regs->abi) {
 			result += sizeof(u64);
 			sz = hweight64(sample->user_regs->mask) * sizeof(u64);
+			if (sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
+				sz += 4 * sizeof(u64);
 			result += sz;
 		} else {
 			result += sizeof(u64);
@@ -1552,6 +1554,8 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 		if (sample->intr_regs && sample->intr_regs->abi) {
 			result += sizeof(u64);
 			sz = hweight64(sample->intr_regs->mask) * sizeof(u64);
+			if (sample->intr_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)
+				sz += 4 * sizeof(u64);
 			result += sz;
 		} else {
 			result += sizeof(u64);
@@ -1729,6 +1733,8
@@ int perf_event__synthesize_sample(union perf_event *e= vent, u64 type, u64 read_fo if (sample->user_regs && sample->user_regs->abi) { *array++ =3D sample->user_regs->abi; sz =3D hweight64(sample->user_regs->mask) * sizeof(u64); + if (sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) + sz +=3D 4 * sizeof(u64); memcpy(array, sample->user_regs->regs, sz); array =3D (void *)array + sz; } else { @@ -1765,6 +1771,8 @@ int perf_event__synthesize_sample(union perf_event *e= vent, u64 type, u64 read_fo if (sample->intr_regs && sample->intr_regs->abi) { *array++ =3D sample->intr_regs->abi; sz =3D hweight64(sample->intr_regs->mask) * sizeof(u64); + if (sample->intr_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) + sz +=3D 4 * sizeof(u64); memcpy(array, sample->intr_regs->regs, sz); array =3D (void *)array + sz; } else { --=20 2.34.1 From nobody Mon Jun 8 12:11:53 2026
From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Thomas Gleixner , Dave Hansen , Ian Rogers , Adrian Hunter , Jiri Olsa , Alexander
Shishkin , Andi Kleen , Eranian Stephane Cc: Mark Rutland , broonie@kernel.org, Ravi Bangoria , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Dapeng Mi Subject: [Patch v8 3/5] perf regs: Support x86 SIMD registers sampling Date: Fri, 29 May 2026 16:24:49 +0800 Message-Id: <20260529082451.591783-4-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260529082451.591783-1-dapeng1.mi@linux.intel.com> References: <20260529082451.591783-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This patch adds support for the newly introduced SIMD register sampling format by adding the following 5 functions: uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred); uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred); uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg= _c, uint16_t *qwords, bool pred); uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg= _c, uint16_t *qwords, bool pred); const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred); The perf_{intr|user}_simd_reg_class_mask() functions retrieve the bitmap of kernel supported SIMD/PRED register classes on current platform for intr-regs and user-regs sampling, such as OPMASK/XMM/YMM/ZMM on x86 platforms. The perf_{intr|user}_simd_reg_class_bitmap_qwords() functions retrieve the bitmap and qwords length of a certain class of SIMD/PRED register on current platform for intr-regs and user-regs sampling. For example, for the XMM registers on x86 platforms, the returned bitmap is 0xffff (XMM0 ~ XMM15) and the qwords length is 2 (128 bits for each XMM register). 
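As a rough illustration of the XMM figures above (not code from this patch), the bitmap and the register width can be derived from a register count and a qwords length. The helper names below are hypothetical. The last helper recovers the highest register index from a bitmap, which is what a label such as XMM0-15 needs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bitmap with the low nr_regs bits set. */
static inline uint64_t class_bitmap(unsigned int nr_regs)
{
	assert(nr_regs <= 64);
	/* Shifting a 64-bit value by 64 is undefined, so handle it separately. */
	return nr_regs == 64 ? UINT64_MAX : (UINT64_C(1) << nr_regs) - 1;
}

/* Hypothetical helper: width in bits of one register, given its qwords length. */
static inline unsigned int class_width_bits(uint16_t qwords)
{
	return (unsigned int)qwords * 64;
}

/* Hypothetical helper: index of the highest set bit, or -1 for an empty bitmap. */
static inline int highest_reg_index(uint64_t bitmap)
{
	int idx = -1;

	while (bitmap) {
		idx++;
		bitmap >>= 1;
	}
	return idx;
}
```

For 16 registers of 2 qwords each, this yields the bitmap 0xffff, a width of 128 bits, and a highest index of 15.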
The perf_simd_reg_class_name() function gets the register class name for a certain register class index. Additionally, the function __parse_regs() is enhanced to support parsing these newly introduced SIMD/PRED registers. Currently, each class of register can only be sampled collectively; sampling a specific SIMD register is not supported. For example, all XMM registers are sampled together rather than sampling only XMM0. When multiple overlapping register types, such as XMM and YMM, are sampled simultaneously, only the superset (YMM registers) is sampled. With this patch, all supported sampling registers on x86 platforms are displayed as follows. $perf record --intr-regs=3D? available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28 R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7 $perf record --user-regs=3D? available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28 R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7 Signed-off-by: Dapeng Mi Reviewed-by: Ian Rogers --- tools/perf/Documentation/perf-record.txt | 10 +- tools/perf/util/evsel.c | 21 ++ tools/perf/util/parse-regs-options.c | 159 +++++++++- .../perf/util/perf-regs-arch/perf_regs_x86.c | 292 ++++++++++++++++++ tools/perf/util/perf_event_attr_fprintf.c | 6 + tools/perf/util/perf_regs.c | 72 +++++ tools/perf/util/perf_regs.h | 11 + 7 files changed, 559 insertions(+), 12 deletions(-) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Document= ation/perf-record.txt index 178f483140ed..b8ff7ecd941d 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -513,12 +513,16 @@ Capture machine state (registers) at interrupt, i.e.,= on counter overflows for each sample. List of captured registers depends on the architecture. This = option is off by default. 
It is possible to select the registers to sample using = their symbolic names, e.g. on x86, ax, si. To list the available registers use ---intr-regs=3D\?. To name registers, pass a comma separated list such as ---intr-regs=3Dax,bx. The list of register is architecture dependent. +--intr-regs=3D\?. On supported architectures, SIMD registers are displayed= as +groups (e.g., on x86: XMM0-15,YMM0-15,ZMM0-31). To name registers, pass a = comma +separated list such as --intr-regs=3Dax,bx,zmm. Please notice SIMD registe= rs must +be assigned as a complete set, sampling individual SIMD registers (e.g., z= mm0) +is not supported. The list of register is architecture dependent. =20 --user-regs:: Similar to -I, but capture user registers at sample time. To list the avai= lable -user registers use --user-regs=3D\?. +user registers use --user-regs=3D\?. For SIMD registers, only complete reg= ister +sets are allowed like -I. =20 --running-time:: Record running and enabled time for read events (:S) diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 1c856a2ecc6e..cd62af14a4f5 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include #include @@ -1594,6 +1595,16 @@ void evsel__config(struct evsel *evsel, const struct= record_opts *opts, evsel__set_sample_bit(evsel, REGS_INTR); } =20 + if ((opts->sample_intr_vec_regs || opts->sample_intr_pred_regs) && + !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) { + attr->sample_simd_regs_enabled =3D !!opts->sample_simd_regs_enabled; + attr->sample_simd_vec_reg_intr =3D opts->sample_intr_vec_regs; + attr->sample_simd_vec_reg_qwords =3D opts->sample_vec_reg_qwords; + attr->sample_simd_pred_reg_intr =3D opts->sample_intr_pred_regs; + attr->sample_simd_pred_reg_qwords =3D opts->sample_pred_reg_qwords; + evsel__set_sample_bit(evsel, REGS_INTR); + } + if (opts->sample_user_regs && !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) { 
attr->sample_regs_user |=3D opts->sample_user_regs; @@ -1601,6 +1612,16 @@ void evsel__config(struct evsel *evsel, const struct= record_opts *opts, evsel__set_sample_bit(evsel, REGS_USER); } =20 + if ((opts->sample_user_vec_regs || opts->sample_user_pred_regs) && + !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) { + attr->sample_simd_regs_enabled =3D !!opts->sample_simd_regs_enabled; + attr->sample_simd_vec_reg_user =3D opts->sample_user_vec_regs; + attr->sample_simd_vec_reg_qwords =3D opts->sample_vec_reg_qwords; + attr->sample_simd_pred_reg_user =3D opts->sample_user_pred_regs; + attr->sample_simd_pred_reg_qwords =3D opts->sample_pred_reg_qwords; + evsel__set_sample_bit(evsel, REGS_USER); + } + if (target__has_cpu(&opts->target) || opts->sample_cpu) evsel__set_sample_bit(evsel, CPU); =20 diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-r= egs-options.c index 70a1cc90b2c1..26d560a486c8 100644 --- a/tools/perf/util/parse-regs-options.c +++ b/tools/perf/util/parse-regs-options.c @@ -13,11 +13,10 @@ #include "record.h" =20 static void -list_perf_regs(FILE *fp, uint64_t mask, int abi) +__list_gp_regs(FILE *fp, uint64_t mask, int abi) { const char *last_name =3D NULL; =20 - fprintf(fp, "available registers: "); for (int reg =3D 0; reg < 64; reg++) { const char *name; =20 @@ -29,14 +28,68 @@ list_perf_regs(FILE *fp, uint64_t mask, int abi) fprintf(fp, "%s%s", reg > 0 ? " " : "", name); last_name =3D name; } +} + +static void +__list_simd_regs(FILE *fp, uint64_t mask, bool intr, bool pred) +{ + uint64_t bitmap =3D 0; + uint16_t qwords =3D 0; + const char *name; + int i =3D 0; + + for (int reg_c =3D 0; reg_c < 64; reg_c++) { + if (((1ULL << reg_c) & mask) =3D=3D 0) + continue; + + name =3D perf_simd_reg_class_name(EM_HOST, reg_c, pred); + bitmap =3D intr ? 
+ perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, &qwords, pred) : + perf_user_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, &qwords, pred); + if (name && bitmap) + fprintf(fp, "%s%s0-%d", i++ > 0 ? " " : "", + name, fls64(bitmap) - 1); + } +} + +static void +list_perf_regs(FILE *fp, uint64_t mask, uint64_t simd_mask, + uint64_t pred_mask, int abi, bool intr) +{ + bool printed =3D false; + + fprintf(fp, "available registers: "); + + if (mask) { + __list_gp_regs(fp, mask, abi); + printed =3D true; + } + + if (simd_mask) { + if (printed) + fprintf(fp, " "); + __list_simd_regs(fp, simd_mask, intr, /*pred=3D*/false); + printed =3D true; + } + + if (pred_mask) { + if (printed) + fprintf(fp, " "); + __list_simd_regs(fp, pred_mask, intr, /*pred=3D*/true); + printed =3D true; + } + fputc('\n', fp); } =20 static uint64_t -name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi) +name_to_gp_reg_mask(const char *to_match, uint64_t mask, int abi) { uint64_t reg_mask =3D 0; =20 + if (!mask) + return reg_mask; + for (int reg =3D 0; reg < 64; reg++) { const char *name; =20 @@ -53,22 +106,96 @@ name_to_perf_reg_mask(const char *to_match, uint64_t m= ask, int abi) return reg_mask; } =20 +static bool +name_to_simd_reg_mask(struct record_opts *opts, const char *to_match, + uint64_t mask, bool intr, bool pred) +{ + bool matched =3D false; + uint64_t bitmap; + uint16_t qwords; + int reg_c; + + if (!mask) + return false; + + for (reg_c =3D 0; reg_c < 64; reg_c++) { + const char *name; + + if (((1ULL << reg_c) & mask) =3D=3D 0) + continue; + + name =3D perf_simd_reg_class_name(EM_HOST, reg_c, pred); + if (!name) + continue; + + if (!strcasecmp(to_match, name)) { + matched =3D true; + break; + } + } + + if (!matched) + return false; + + if (intr) { + bitmap =3D perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, + reg_c, &qwords, pred); + } else { + bitmap =3D perf_user_simd_reg_class_bitmap_qwords(EM_HOST, + reg_c, &qwords, pred); + } + + /* + * Assume higher width SIMD 
registers are always the superset of lower + * width SIMD registers. So only pick the largest qwords and bitmap. + */ + if (pred) { + opts->sample_pred_reg_qwords =3D + MAX(qwords, opts->sample_pred_reg_qwords); + if (intr && + hweight64(bitmap) > hweight32(opts->sample_intr_pred_regs)) + opts->sample_intr_pred_regs =3D bitmap; + if (!intr && + hweight64(bitmap) > hweight32(opts->sample_user_pred_regs)) + opts->sample_user_pred_regs =3D bitmap; + } else { + opts->sample_vec_reg_qwords =3D + MAX(qwords, opts->sample_vec_reg_qwords); + if (intr && + hweight64(bitmap) > hweight64(opts->sample_intr_vec_regs)) + opts->sample_intr_vec_regs =3D bitmap; + if (!intr && + hweight64(bitmap) > hweight64(opts->sample_user_vec_regs)) + opts->sample_user_vec_regs =3D bitmap; + } + + if (opts->sample_pred_reg_qwords || opts->sample_vec_reg_qwords) + opts->sample_simd_regs_enabled =3D 1; + + return true; +} + static int __parse_regs(const struct option *opt, const char *str, int unset, bool in= tr) { uint64_t *mode =3D (uint64_t *)opt->value; struct record_opts *opts; char *s, *os =3D NULL, *p; + uint64_t simd_mask; + uint64_t pred_mask; + uint64_t mask; const char *warn; + bool matched; int ret =3D -1; - uint64_t mask; int abi =3D 0; =20 if (unset) return 0; =20 /* - * cannot set it twice + * Non-SIMD registers cannot be set twice. + * SIMD registers can be set multiple times, but only the register + * class with largest length (qwords) is sampled. 
*/ if (*mode) return -1; @@ -91,6 +218,14 @@ __parse_regs(const struct option *opt, const char *str,= int unset, bool intr) if (!s) return -1; =20 + if (intr) { + simd_mask =3D perf_intr_simd_reg_class_mask(EM_HOST, /*pred=3D*/false); + pred_mask =3D perf_intr_simd_reg_class_mask(EM_HOST, /*pred=3D*/true); + } else { + simd_mask =3D perf_user_simd_reg_class_mask(EM_HOST, /*pred=3D*/false); + pred_mask =3D perf_user_simd_reg_class_mask(EM_HOST, /*pred=3D*/true); + } + warn =3D "Unknown register \"%s\", check man page or run \"perf record %s= ?\"\n"; for (;;) { uint64_t reg_mask; @@ -100,17 +235,23 @@ __parse_regs(const struct option *opt, const char *st= r, int unset, bool intr) *p =3D '\0'; =20 if (!strcmp(s, "?")) { - list_perf_regs(stderr, mask, abi); + list_perf_regs(stderr, mask, simd_mask, pred_mask, abi, intr); goto error; } =20 - reg_mask =3D name_to_perf_reg_mask(s, mask, abi); + reg_mask =3D name_to_gp_reg_mask(s, mask, abi); if (reg_mask) { if (abi & PERF_SAMPLE_REGS_ABI_SIMD) opts->sample_simd_regs_enabled =3D 1; } else { - ui__warning(warn, s, intr ? "-I" : "--user-regs=3D"); - goto error; + matched =3D name_to_simd_reg_mask(opts, s, simd_mask, + intr, /*pred=3D*/false) || + name_to_simd_reg_mask(opts, s, pred_mask, + intr, /*pred=3D*/true); + if (!matched) { + ui__warning(warn, s, intr ? 
"-I" : "--user-regs=3D"); + goto error; + } } *mode |=3D reg_mask; =20 diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/ut= il/perf-regs-arch/perf_regs_x86.c index ae26d991cdc9..96f156d9971c 100644 --- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c +++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c @@ -465,3 +465,295 @@ uint64_t __perf_reg_sp_x86(void) { return PERF_REG_X86_SP; } + +enum { + PERF_REG_CLASS_X86_OPMASK =3D 0, + PERF_REG_CLASS_X86_XMM, + PERF_REG_CLASS_X86_YMM, + PERF_REG_CLASS_X86_ZMM, + PERF_REG_X86_MAX_SIMD_CLASSES, +}; + +#define PERF_REG_CLASS_X86_PRED_MASK (BIT(PERF_REG_CLASS_X86_OPMASK)) +#define PERF_REG_CLASS_X86_SIMD_MASK (BIT(PERF_REG_CLASS_X86_XMM) | \ + BIT(PERF_REG_CLASS_X86_YMM) | \ + BIT(PERF_REG_CLASS_X86_ZMM)) + +/* + * This function is used to determine whether kernel perf subsystem + * supports which kinds of SIMD registers (OPMASK/XMM/YMM/ZMM) sampling. + * + * @sample_type: PERF_SAMPLE_REGS_INTR or PERF_SAMPLE_REGS_USER + * @qwords: the length of SIMD register, like 1/2/4/8 qwords for + * OPMASK/XMM/YMM/ZMM registers. + * @mask: the bitmask of SIMD register, like 0xffff for XMM0 ~ XMM15 + * @pred: whether It's a predicate SIMD register, like OPMASK register. + * + * Return value: true indicates support, otherwise no support. 
+ */ +static bool +__support_simd_reg_class(uint64_t sample_type, uint16_t qwords, + uint64_t mask, bool pred) +{ + struct perf_event_attr attr =3D { + .type =3D PERF_TYPE_HARDWARE, + .config =3D PERF_COUNT_HW_CPU_CYCLES, + .sample_type =3D sample_type, + .disabled =3D 1, + .exclude_kernel =3D 1, + .sample_simd_regs_enabled =3D 1, + }; + int fd; + + attr.sample_period =3D 1; + + if (!pred) { + attr.sample_simd_vec_reg_qwords =3D qwords; + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) + attr.sample_simd_vec_reg_intr =3D mask; + else + attr.sample_simd_vec_reg_user =3D mask; + } else { + attr.sample_simd_pred_reg_qwords =3D PERF_X86_OPMASK_QWORDS; + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) + attr.sample_simd_pred_reg_intr =3D PERF_X86_SIMD_PRED_MASK; + else + attr.sample_simd_pred_reg_user =3D PERF_X86_SIMD_PRED_MASK; + } + + if (perf_pmus__num_core_pmus() > 1) { + __u64 type =3D perf_pmus__find_core_pmu()->type; + + attr.config |=3D type << PERF_PMU_TYPE_SHIFT; + } + + event_attr_init(&attr); + + fd =3D sys_perf_event_open(&attr, 0, -1, -1, 0); + if (fd !=3D -1) { + close(fd); + return true; + } + + return false; +} + +#define PERF_X86_SIMD_ZMM_LOW_REGS (PERF_X86_SIMD_ZMM_REGS / 2) + +static bool __arch_has_simd_reg_class(uint64_t sample_type, int reg_class, + uint64_t *mask, uint16_t *qwords) +{ + bool supported =3D false; + uint64_t bits; + + *mask =3D 0; + *qwords =3D 0; + + switch (reg_class) { + case PERF_REG_CLASS_X86_OPMASK: + bits =3D BIT_ULL(PERF_X86_SIMD_OPMASK_REGS) - 1; + supported =3D __support_simd_reg_class(sample_type, + PERF_X86_OPMASK_QWORDS, + bits, true); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_OPMASK_QWORDS; + } + break; + case PERF_REG_CLASS_X86_XMM: + bits =3D BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1; + supported =3D __support_simd_reg_class(sample_type, + PERF_X86_XMM_QWORDS, + bits, false); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_XMM_QWORDS; + } + break; + case PERF_REG_CLASS_X86_YMM: + bits =3D 
BIT_ULL(PERF_X86_SIMD_YMM_REGS) - 1; + supported =3D __support_simd_reg_class(sample_type, + PERF_X86_YMM_QWORDS, + bits, false); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_YMM_QWORDS; + } + break; + case PERF_REG_CLASS_X86_ZMM: + bits =3D BIT_ULL(PERF_X86_SIMD_ZMM_REGS) - 1; + supported =3D __support_simd_reg_class(sample_type, + PERF_X86_ZMM_QWORDS, + bits, false); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_ZMM_QWORDS; + break; + } + + bits =3D BIT_ULL(PERF_X86_SIMD_ZMM_LOW_REGS) - 1; + supported =3D __support_simd_reg_class(sample_type, + PERF_X86_ZMM_QWORDS, + bits, false); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_ZMM_QWORDS; + } + break; + default: + break; + } + + return supported; +} + +static bool __support_simd_sampling(void) +{ + uint64_t mask =3D BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1; + uint16_t qwords =3D PERF_X86_XMM_QWORDS; + static bool simd_sampling_supported; + static bool cached; + + if (cached) + return simd_sampling_supported; + + simd_sampling_supported =3D + __arch_has_simd_reg_class(PERF_SAMPLE_REGS_INTR, + PERF_REG_CLASS_X86_XMM, + &mask, &qwords); + cached =3D true; + + return simd_sampling_supported; +} + +/* + * @x86_intr_simd_cached: indicates the data of below 3 + * x86_intr_simd_* items has been retrieved from kernel and cached. + * @x86_intr_simd_reg_class_mask: indicates which kinds of PRED/SIMD + * registers are supported for intr-regs option. Assume kernel perf + * subsystem supports XMM/YMM sampling, then the mask is + * PERF_REG_CLASS_X86_XMM|PERF_REG_CLASS_X86_YMM. + * @x86_intr_simd_mask: indicates register bitmask for each kind of + * supported PRED/SIMD register, like + * x86_intr_simd_mask[PERF_REG_CLASS_X86_XMM] =3D 0xffff. + * @x86_intr_simd_qwords: indicates the register length (qwords unit) + * for each kind of supported PRED/SIMD register, like + * x86_intr_simd_qwords[PERF_REG_CLASS_X86_XMM] =3D 2. 
+ */ +static bool x86_intr_simd_cached; +static uint64_t x86_intr_simd_reg_class_mask; +static uint64_t x86_intr_simd_mask[PERF_REG_X86_MAX_SIMD_CLASSES]; +static uint16_t x86_intr_simd_qwords[PERF_REG_X86_MAX_SIMD_CLASSES]; + +/* + * Similar with above x86_intr_simd_* items, the difference is these + * items are used for user-regs option. + */ +static bool x86_user_simd_cached; +static uint64_t x86_user_simd_reg_class_mask; +static uint64_t x86_user_simd_mask[PERF_REG_X86_MAX_SIMD_CLASSES]; +static uint16_t x86_user_simd_qwords[PERF_REG_X86_MAX_SIMD_CLASSES]; + +static uint64_t __arch__simd_reg_class_mask(bool intr) +{ + uint64_t mask =3D 0; + bool supported; + int reg_c; + + if (!__support_simd_sampling()) + goto done; + + if (intr && x86_intr_simd_cached) + return x86_intr_simd_reg_class_mask; + + if (!intr && x86_user_simd_cached) + return x86_user_simd_reg_class_mask; + + for (reg_c =3D 0; reg_c < PERF_REG_X86_MAX_SIMD_CLASSES; reg_c++) { + supported =3D false; + + if (intr) { + supported =3D __arch_has_simd_reg_class( + PERF_SAMPLE_REGS_INTR, + reg_c, + &x86_intr_simd_mask[reg_c], + &x86_intr_simd_qwords[reg_c]); + } else { + supported =3D __arch_has_simd_reg_class( + PERF_SAMPLE_REGS_USER, + reg_c, + &x86_user_simd_mask[reg_c], + &x86_user_simd_qwords[reg_c]); + } + if (supported) + mask |=3D BIT_ULL(reg_c); + } + +done: + if (intr) { + x86_intr_simd_reg_class_mask =3D mask; + x86_intr_simd_cached =3D true; + } else { + x86_user_simd_reg_class_mask =3D mask; + x86_user_simd_cached =3D true; + } + + return mask; +} + +static uint64_t +__arch__simd_reg_class_bitmap_qwords(bool intr, int reg_c, uint16_t *qword= s) +{ + uint64_t mask =3D 0; + uint64_t class_mask; + + *qwords =3D 0; + class_mask =3D intr ? 
x86_intr_simd_reg_class_mask : + x86_user_simd_reg_class_mask; + if (!(class_mask & BIT_ULL(reg_c))) + return 0; + + if (intr) { + mask =3D x86_intr_simd_mask[reg_c]; + *qwords =3D x86_intr_simd_qwords[reg_c]; + } else { + mask =3D x86_user_simd_mask[reg_c]; + *qwords =3D x86_user_simd_qwords[reg_c]; + } + + return mask; +} + +uint64_t __perf_simd_reg_class_mask_x86(bool intr, bool pred) +{ + uint64_t mask =3D __arch__simd_reg_class_mask(intr); + + return pred ? mask & PERF_REG_CLASS_X86_PRED_MASK : + mask & PERF_REG_CLASS_X86_SIMD_MASK; +} + +uint64_t __perf_simd_reg_class_bitmap_qwords_x86(int reg_c, uint16_t *qwor= ds, + bool intr, bool pred) +{ + if (intr ? !x86_intr_simd_cached : !x86_user_simd_cached) + __perf_simd_reg_class_mask_x86(intr, pred); + return __arch__simd_reg_class_bitmap_qwords(intr, reg_c, qwords); +} + +const char *__perf_simd_reg_class_name_x86(int id, bool pred __maybe_unuse= d) +{ + switch (id) { + case PERF_REG_CLASS_X86_OPMASK: + return "OPMASK"; + case PERF_REG_CLASS_X86_XMM: + return "XMM"; + case PERF_REG_CLASS_X86_YMM: + return "YMM"; + case PERF_REG_CLASS_X86_ZMM: + return "ZMM"; + default: + return NULL; + } + + return NULL; +} diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/pe= rf_event_attr_fprintf.c index 741c3d657a8b..c6b8e53e06fd 100644 --- a/tools/perf/util/perf_event_attr_fprintf.c +++ b/tools/perf/util/perf_event_attr_fprintf.c @@ -362,6 +362,12 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_eve= nt_attr *attr, PRINT_ATTRf(aux_start_paused, p_unsigned); PRINT_ATTRf(aux_pause, p_unsigned); PRINT_ATTRf(aux_resume, p_unsigned); + PRINT_ATTRf(sample_simd_pred_reg_qwords, p_unsigned); + PRINT_ATTRf(sample_simd_pred_reg_intr, p_hex); + PRINT_ATTRf(sample_simd_pred_reg_user, p_hex); + PRINT_ATTRf(sample_simd_vec_reg_qwords, p_unsigned); + PRINT_ATTRf(sample_simd_vec_reg_intr, p_hex); + PRINT_ATTRf(sample_simd_vec_reg_user, p_hex); =20 return ret; } diff --git a/tools/perf/util/perf_regs.c 
b/tools/perf/util/perf_regs.c index 18eed85cf220..31920eb2fa04 100644 --- a/tools/perf/util/perf_regs.c +++ b/tools/perf/util/perf_regs.c @@ -249,3 +249,75 @@ uint64_t perf_arch_reg_sp(uint16_t e_machine) return 0; } } + +uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred) +{ + switch (e_machine) { + case EM_386: + case EM_X86_64: + return __perf_simd_reg_class_mask_x86(/*intr=3D*/true, pred); + default: + return 0; + } +} + +uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred) +{ + switch (e_machine) { + case EM_386: + case EM_X86_64: + return __perf_simd_reg_class_mask_x86(/*intr=3D*/false, pred); + default: + return 0; + } +} + +uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int re= g_c, + uint16_t *qwords, bool pred) +{ + switch (e_machine) { + case EM_386: + case EM_X86_64: + return __perf_simd_reg_class_bitmap_qwords_x86(reg_c, qwords, + /*intr=3D*/true, + pred); + default: + *qwords =3D 0; + return 0; + } +} + +uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int re= g_c, + uint16_t *qwords, bool pred) +{ + switch (e_machine) { + case EM_386: + case EM_X86_64: + return __perf_simd_reg_class_bitmap_qwords_x86(reg_c, qwords, + /*intr=3D*/false, + pred); + default: + *qwords =3D 0; + return 0; + } +} + +const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred) +{ + const char *name =3D NULL; + + switch (e_machine) { + case EM_386: + case EM_X86_64: + name =3D __perf_simd_reg_class_name_x86(id, pred); + break; + default: + break; + } + if (name) + return name; + + pr_debug("Failed to find %s register %d for ELF machine type %u\n", + pred ? 
"PRED" : "SIMD", id, e_machine); + return "unknown"; +} diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h index 3086d2f2a974..8a3a40d6b1bb 100644 --- a/tools/perf/util/perf_regs.h +++ b/tools/perf/util/perf_regs.h @@ -20,6 +20,13 @@ const char *perf_reg_name(int id, uint16_t e_machine, ui= nt32_t e_flags, int abi) int perf_reg_value(u64 *valp, struct regs_dump *regs, int id); uint64_t perf_arch_reg_ip(uint16_t e_machine); uint64_t perf_arch_reg_sp(uint16_t e_machine); +uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred); +uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred); +uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int re= g_c, + uint16_t *qwords, bool pred); +uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int re= g_c, + uint16_t *qwords, bool pred); +const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred= ); =20 int __perf_sdt_arg_parse_op_arm64(char *old_op, char **new_op); uint64_t __perf_reg_mask_arm64(bool intr); @@ -69,6 +76,10 @@ uint64_t __perf_reg_mask_x86(bool intr, int *abi); const char *__perf_reg_name_x86(int id, int abi); uint64_t __perf_reg_ip_x86(void); uint64_t __perf_reg_sp_x86(void); +uint64_t __perf_simd_reg_class_mask_x86(bool intr, bool pred); +uint64_t __perf_simd_reg_class_bitmap_qwords_x86(int reg_c, uint16_t *qwor= ds, + bool intr, bool pred); +const char *__perf_simd_reg_class_name_x86(int id, bool pred); =20 static inline uint64_t DWARF_MINIMAL_REGS(uint16_t e_machine) { --=20 2.34.1 From nobody Mon Jun 8 12:11:53 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 812DF3B6BFF; Fri, 29 May 2026 08:30:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.16 ARC-Seal: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1780043437; cv=none; b=ECnOOJ9DCe+y6zew1iKQDTCUJLz46ciGWbpAwqjFljnSVCuSqCDO6ekPoPokh7ystKqSRL7VnCHC0Sk4P2RVAC5oXYRz3tXkvudHoljfcZAT5hPlNDflCdIYoM6JVJbESfSgHjgE5Vyvka+erAdbB2jEeQfGr6mkHlIl1QPWRog= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780043437; c=relaxed/simple; bh=gRbVIFrVUjGfneIqDnqPxclESScD+RomiDo//Dr6Yas=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=G5UZ+ir14Tt1MZ7EIelTXyBA2B7ES/vnMQ/FvETjgy8FQvniWbbKJkeVdsz8O8z+De6DzUZIVc1o2zw3C0afdllK9Sn2C2tnKGoIE3J+xmlQ0KRPq+OGSYrkzEGXEU8GWAWnUXzOEmSPPjOLhhVWiz5qtdc/jV2nL868K3rrTxg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=atpBWbV9; arc=none smtp.client-ip=198.175.65.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="atpBWbV9" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1780043435; x=1811579435; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=gRbVIFrVUjGfneIqDnqPxclESScD+RomiDo//Dr6Yas=; b=atpBWbV9n/DcIEafNa/3RvHI6477rerOxR57FzHVHAGNTj/Lie3clqiO NRSolGmN1mLSEiAj4WXeedyuo34v0xjArO1G9hQqYrja9LhafsHHz/RPt +iJUGTF+1V0Q+OJF1K+AdknEHzH9iiTXk9fvoel/nF/wRw1VYkOJ9j68j udOIqCw79KzbQuqvYm8ZwsSEv5/UElpPgFX/PT+OLGKWOqSS/Phzl/Z8M lgu870OEPvURC3ndZOvHZzE4dGm+HyhHNkNQzSTAMBMgrSeWTwUdwpR5q YnSgG0H/NXVUT00cP06pcv1ESpEiKjYuLq6Y/oLcaxn69inU6Cv2MUPrO Q==; X-CSE-ConnectionGUID: lyGKS94BTmS06sMM35zWjA== X-CSE-MsgGUID: oUUw/zQrQfKfZCLbNzFiYQ== X-IronPort-AV: 
E=McAfee;i="6800,10657,11800"; a="81076379" X-IronPort-AV: E=Sophos;i="6.24,175,1774335600"; d="scan'208";a="81076379" Received: from orviesa005.jf.intel.com ([10.64.159.145]) by orvoesa108.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2026 01:30:35 -0700 X-CSE-ConnectionGUID: 8sftXwIxSA+Gzf02Ij9JOQ== X-CSE-MsgGUID: fzEoe74MS4qMbvNPkcCoZA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,175,1774335600"; d="scan'208";a="247734866" Received: from spr.sh.intel.com ([10.112.230.239]) by orviesa005.jf.intel.com with ESMTP; 29 May 2026 01:30:30 -0700 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Thomas Gleixner , Dave Hansen , Ian Rogers , Adrian Hunter , Jiri Olsa , Alexander Shishkin , Andi Kleen , Eranian Stephane Cc: Mark Rutland , broonie@kernel.org, Ravi Bangoria , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Dapeng Mi , Kan Liang Subject: [Patch v8 4/5] perf regs: Enable dumping of SIMD registers Date: Fri, 29 May 2026 16:24:50 +0800 Message-Id: <20260529082451.591783-5-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260529082451.591783-1-dapeng1.mi@linux.intel.com> References: <20260529082451.591783-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Add support for dumping SIMD registers that are sampled with the new
PERF_SAMPLE_REGS_ABI_SIMD ABI.

Currently the XMM, YMM, ZMM, OPMASK, eGPR and SSP registers are supported
with this ABI on x86 platforms.

Example:

$ perf record -e cycles:p -Iax,bx,r8,r16,r31,ssp,xmm,ymm,zmm,opmask ./test
$ perf report -D

... ...
3342715685845 0x3afe8 [0xbc8]: PERF_RECORD_SAMPLE(IP, 0x1): 27776/27776: 0xffffffff91d7c18f period: 10000 addr: 0
... intr regs: mask 0x18001010003 ABI 64-bit SIMD
.... AX 0xffffed102de1a606
.... BX 0xffffed102de1a606
.... R8 0x0000000000000001
.... R16 0x0000000000000000
.... R31 0x0000000000000000
.... SSP 0x0000000000000000
... SIMD ABI nr_vectors 32 vector_qwords 8 nr_pred 8 pred_qwords 1
.... ZMM[0][0] 0x616c2f656d6f682f
.... ZMM[0][1] 0x696c2f7265737562
.... ZMM[0][2] 0x0000000000000000
.... ZMM[0][3] 0x0000000000000000
.... ZMM[0][4] 0x0000000000000000
.... ZMM[0][5] 0x0000000000000000
.... ZMM[0][6] 0x0000000000000000
.... ZMM[0][7] 0x0000000000000000
.... ZMM[1][0] 0x702f636578656269
.... ZMM[1][1] 0x65726f632d667265
.... ZMM[1][2] 0x0000000000000000
.... ZMM[1][3] 0x0000000000000000
.... ZMM[1][4] 0x0000000000000000
.... ZMM[1][5] 0x0000000000000000
.... ZMM[1][6] 0x0000000000000000
.... ZMM[1][7] 0x0000000000000000
... ...
.... ZMM[31][0] 0x0000000000000000
.... ZMM[31][1] 0x0000000000000000
.... ZMM[31][2] 0x0000000000000000
.... ZMM[31][3] 0x0000000000000000
.... ZMM[31][4] 0x0000000000000000
.... ZMM[31][5] 0x0000000000000000
.... ZMM[31][6] 0x0000000000000000
.... ZMM[31][7] 0x0000000000000000
.... OPMASK[0] 0x0000000000100221
.... OPMASK[1] 0x0000000000000020
.... OPMASK[2] 0x000000007fffffff
.... OPMASK[3] 0x0000000000000000
.... OPMASK[4] 0x0000000000000000
.... OPMASK[5] 0x0000000000000000
.... OPMASK[6] 0x0000000000000000
.... OPMASK[7] 0x0000000000000000
... ...
Co-developed-by: Kan Liang Signed-off-by: Kan Liang Signed-off-by: Dapeng Mi --- tools/perf/builtin-inject.c | 9 +++- tools/perf/util/evsel.c | 68 ++++++++++++++++++++++++-- tools/perf/util/sample.h | 5 ++ tools/perf/util/session.c | 78 ++++++++++++++++++++++++++++++ tools/perf/util/synthetic-events.c | 28 +++++++++-- 5 files changed, 178 insertions(+), 10 deletions(-) diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c index f6611d7e85eb..de19d5bd2d57 100644 --- a/tools/perf/builtin-inject.c +++ b/tools/perf/builtin-inject.c @@ -457,8 +457,13 @@ static int perf_event__convert_sample_callchain(const = struct perf_tool *tool, /* adjust sample size for stack and regs */ sample_size -=3D sample->user_stack.size; sample_size -=3D (hweight64(evsel->core.attr.sample_regs_user) + 1) * siz= eof(u64); - if (sample->user_regs && sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SI= MD) - sample_size -=3D 4 * sizeof(u64); /* Reduce SIMD regs header size */ + if (sample->user_regs && sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SI= MD) { + sample_size -=3D 4 * sizeof(u64); + sample_size -=3D (sample->user_regs->nr_vectors * + sample->user_regs->vector_qwords + + sample->user_regs->nr_pred * + sample->user_regs->pred_qwords) * sizeof(u64); + } sample_size +=3D (sample->callchain->nr + 1) * sizeof(u64); event_copy->header.size =3D sample_size; =20 diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index cd62af14a4f5..a47747c8be08 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -3523,9 +3523,39 @@ int evsel__parse_sample(struct evsel *evsel, union p= erf_event *event, array =3D (void *)array + sz; =20 if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) { - /* Skip SIMD-regs header. 
*/ - sz =3D 4 * sizeof(u64); + u64 attr_nr_vectors =3D + hweight64(evsel->core.attr.sample_simd_vec_reg_user); + u64 attr_vec_qwords =3D + evsel->core.attr.sample_simd_vec_reg_qwords; + u64 attr_nr_pred =3D + hweight32(evsel->core.attr.sample_simd_pred_reg_user); + u64 attr_pred_qwords =3D + evsel->core.attr.sample_simd_pred_reg_qwords; + + OVERFLOW_CHECK_u64(array); + regs->nr_vectors =3D *(u64 *)array; + array =3D (void *)array + sizeof(u64); + OVERFLOW_CHECK_u64(array); + regs->vector_qwords =3D *(u64 *)array; + array =3D (void *)array + sizeof(u64); + OVERFLOW_CHECK_u64(array); + regs->nr_pred =3D *(u64 *)array; + array =3D (void *)array + sizeof(u64); + OVERFLOW_CHECK_u64(array); + regs->pred_qwords =3D *(u64 *)array; + array =3D (void *)array + sizeof(u64); + + if (regs->nr_vectors > attr_nr_vectors || + regs->vector_qwords > attr_vec_qwords || + regs->nr_pred > attr_nr_pred || + regs->pred_qwords > attr_pred_qwords) + goto out_efault; + + sz =3D (regs->nr_vectors * regs->vector_qwords + + regs->nr_pred * regs->pred_qwords) * sizeof(u64); OVERFLOW_CHECK(array, sz, max_size); + + regs->simd_data =3D (u64 *)array; array =3D (void *)array + sz; } } @@ -3587,9 +3617,39 @@ int evsel__parse_sample(struct evsel *evsel, union p= erf_event *event, array =3D (void *)array + sz; =20 if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) { - /* Skip SIMD-regs header. 
*/ - sz =3D 4 * sizeof(u64); + u64 attr_nr_vectors =3D + hweight64(evsel->core.attr.sample_simd_vec_reg_intr); + u64 attr_vec_qwords =3D + evsel->core.attr.sample_simd_vec_reg_qwords; + u64 attr_nr_pred =3D + hweight32(evsel->core.attr.sample_simd_pred_reg_intr); + u64 attr_pred_qwords =3D + evsel->core.attr.sample_simd_pred_reg_qwords; + + OVERFLOW_CHECK_u64(array); + regs->nr_vectors =3D *(u64 *)array; + array =3D (void *)array + sizeof(u64); + OVERFLOW_CHECK_u64(array); + regs->vector_qwords =3D *(u64 *)array; + array =3D (void *)array + sizeof(u64); + OVERFLOW_CHECK_u64(array); + regs->nr_pred =3D *(u64 *)array; + array =3D (void *)array + sizeof(u64); + OVERFLOW_CHECK_u64(array); + regs->pred_qwords =3D *(u64 *)array; + array =3D (void *)array + sizeof(u64); + + if (regs->nr_vectors > attr_nr_vectors || + regs->vector_qwords > attr_vec_qwords || + regs->nr_pred > attr_nr_pred || + regs->pred_qwords > attr_pred_qwords) + goto out_efault; + + sz =3D (regs->nr_vectors * regs->vector_qwords + + regs->nr_pred * regs->pred_qwords) * sizeof(u64); OVERFLOW_CHECK(array, sz, max_size); + + regs->simd_data =3D (u64 *)array; array =3D (void *)array + sz; } } diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h index e556c9b656ea..95f921d482ad 100644 --- a/tools/perf/util/sample.h +++ b/tools/perf/util/sample.h @@ -16,6 +16,11 @@ struct regs_dump { u64 abi; u64 mask; u64 *regs; + u64 nr_vectors; + u64 vector_qwords; + u64 nr_pred; + u64 pred_qwords; + u64 *simd_data; =20 /* Cached values/mask filled by first register access. 
*/ u64 cache_regs[PERF_SAMPLE_REGS_CACHE_SIZE]; diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 9e36c834a8f4..cd8e9aaa10a1 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -979,6 +979,82 @@ static void regs_dump__printf(u64 mask, struct regs_du= mp *regs, } } =20 +static void simd_regs_dump__printf(uint16_t e_machine, struct regs_dump *r= egs, bool intr) +{ + const char *name =3D "unknown"; + const char *simd_header; + u32 i, j, idx, pred_base; + uint16_t qwords; + int reg_c; + + if (!(regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)) + return; + + if (!regs->nr_vectors && !regs->nr_pred) + return; + + simd_header =3D "... SIMD ABI nr_vectors %" PRIu64 " vector_qwords %" PRI= u64 \ + " nr_pred %" PRIu64 " pred_qwords %" PRIu64 "\n"; + printf(simd_header, regs->nr_vectors, regs->vector_qwords, + regs->nr_pred, regs->pred_qwords); + + for (reg_c =3D 0; reg_c < 64; reg_c++) { + if (intr) { + perf_intr_simd_reg_class_bitmap_qwords(e_machine, reg_c, + &qwords, /*pred=3D*/false); + } else { + perf_user_simd_reg_class_bitmap_qwords(e_machine, reg_c, + &qwords, /*pred=3D*/false); + } + if (regs->vector_qwords =3D=3D qwords) { + name =3D perf_simd_reg_class_name(e_machine, reg_c, /*pred=3D*/false); + break; + } + } + + for (i =3D 0; i < regs->nr_vectors; i++) { + for (j =3D 0; j < regs->vector_qwords; j++) { + idx =3D i * regs->vector_qwords + j; + if (regs->vector_qwords > 1) { + printf(".... %3s[%d][%d] 0x%016" PRIx64 "\n", + name, i, j, regs->simd_data[idx++]); + } else { + printf(".... 
%3s[%d] 0x%016" PRIx64 "\n", + name, i, regs->simd_data[idx++]); + } + } + } + + name =3D "unknown"; + for (reg_c =3D 0; reg_c < 64; reg_c++) { + if (intr) { + perf_intr_simd_reg_class_bitmap_qwords(e_machine, reg_c, + &qwords, /*pred=3D*/true); + } else { + perf_user_simd_reg_class_bitmap_qwords(e_machine, reg_c, + &qwords, /*pred=3D*/true); + } + if (regs->pred_qwords =3D=3D qwords) { + name =3D perf_simd_reg_class_name(e_machine, reg_c, /*pred=3D*/true); + break; + } + } + + pred_base =3D regs->nr_vectors * regs->vector_qwords; + for (i =3D 0; i < regs->nr_pred; i++) { + for (j =3D 0; j < regs->pred_qwords; j++) { + idx =3D pred_base + i * regs->pred_qwords + j; + if (regs->pred_qwords > 1) { + printf(".... %3s[%d][%d] 0x%016" PRIx64 "\n", + name, i, j, regs->simd_data[idx++]); + } else { + printf(".... %3s[%d] 0x%016" PRIx64 "\n", + name, i, regs->simd_data[idx++]); + } + } + } +} + static const char *regs_abi[] =3D { [PERF_SAMPLE_REGS_ABI_NONE] =3D "none", [PERF_SAMPLE_REGS_ABI_32] =3D "32-bit", @@ -1019,6 +1095,7 @@ static void regs_user__printf(struct perf_sample *sam= ple, uint16_t e_machine, ui =20 if (user_regs->regs) regs__printf("user", user_regs, e_machine, e_flags); + simd_regs_dump__printf(e_machine, user_regs, /*intr=3D*/false); } =20 static void regs_intr__printf(struct perf_sample *sample, uint16_t e_machi= ne, uint32_t e_flags) @@ -1032,6 +1109,7 @@ static void regs_intr__printf(struct perf_sample *sam= ple, uint16_t e_machine, ui =20 if (intr_regs->regs) regs__printf("intr", intr_regs, e_machine, e_flags); + simd_regs_dump__printf(e_machine, intr_regs, /*intr=3D*/true); } =20 static void stack_user__printf(struct stack_dump *dump) diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic= -events.c index ce61734cd5d2..461a4633fd4e 100644 --- a/tools/perf/util/synthetic-events.c +++ b/tools/perf/util/synthetic-events.c @@ -1524,8 +1524,13 @@ size_t perf_event__sample_event_size(const struct pe= rf_sample *sample, u64 type, if 
(sample->user_regs && sample->user_regs->abi) { result +=3D sizeof(u64); sz =3D hweight64(sample->user_regs->mask) * sizeof(u64); - if (sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) + if (sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) { sz +=3D 4 * sizeof(u64); + sz +=3D (sample->user_regs->nr_vectors * + sample->user_regs->vector_qwords + + sample->user_regs->nr_pred * + sample->user_regs->pred_qwords) * sizeof(u64); + } result +=3D sz; } else { result +=3D sizeof(u64); @@ -1554,8 +1559,13 @@ size_t perf_event__sample_event_size(const struct pe= rf_sample *sample, u64 type, if (sample->intr_regs && sample->intr_regs->abi) { result +=3D sizeof(u64); sz =3D hweight64(sample->intr_regs->mask) * sizeof(u64); - if (sample->intr_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) + if (sample->intr_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) { sz +=3D 4 * sizeof(u64); + sz +=3D (sample->intr_regs->nr_vectors * + sample->intr_regs->vector_qwords + + sample->intr_regs->nr_pred * + sample->intr_regs->pred_qwords) * sizeof(u64); + } result +=3D sz; } else { result +=3D sizeof(u64); @@ -1733,8 +1743,13 @@ int perf_event__synthesize_sample(union perf_event *= event, u64 type, u64 read_fo if (sample->user_regs && sample->user_regs->abi) { *array++ =3D sample->user_regs->abi; sz =3D hweight64(sample->user_regs->mask) * sizeof(u64); - if (sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) + if (sample->user_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) { sz +=3D 4 * sizeof(u64); + sz +=3D (sample->user_regs->nr_vectors * + sample->user_regs->vector_qwords + + sample->user_regs->nr_pred * + sample->user_regs->pred_qwords) * sizeof(u64); + } memcpy(array, sample->user_regs->regs, sz); array =3D (void *)array + sz; } else { @@ -1771,8 +1786,13 @@ int perf_event__synthesize_sample(union perf_event *= event, u64 type, u64 read_fo if (sample->intr_regs && sample->intr_regs->abi) { *array++ =3D sample->intr_regs->abi; sz =3D hweight64(sample->intr_regs->mask) * sizeof(u64); - if 
(sample->intr_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) + if (sample->intr_regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) { sz +=3D 4 * sizeof(u64); + sz +=3D (sample->intr_regs->nr_vectors * + sample->intr_regs->vector_qwords + + sample->intr_regs->nr_pred * + sample->intr_regs->pred_qwords) * sizeof(u64); + } memcpy(array, sample->intr_regs->regs, sz); array =3D (void *)array + sz; } else { --=20 2.34.1 From nobody Mon Jun 8 12:11:53 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 85E893B8BA1; Fri, 29 May 2026 08:30:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780043442; cv=none; b=p5CbemNXcaptGbQub0CNjBA5QcdApuJl8C+WP2C01f6zYRq7Xt7XUt93LkmAoeSb0J+4m6ItrqJbBrb8m7nHAtNHQz/R231YQInwezpmBimSJRPaeSchyCN/fqu+YkCvIxkRA5K+p8kqe93K8U7NIQV0wtJb+TraHS4U1DlUruY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780043442; c=relaxed/simple; bh=gEXQrVCxxIpR+aM1mvFnRIBLphfNM3g3S/0oL0bZXuY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version:Content-Type; b=WqN8M9Dh9uP5yd9QQGjEVRbkzQgKco2tyVGnoxhtwVsCKxWlbHBvgPqwrLSpRmshOza36FrkxSx3rqdQjWc9RuP0UpLqvZ0olT8Ny5VzLzgeAg3oj6vxfVTLqkTuJPCE0mS+nfmOVCkB41zvc8zcKHUDRWmVoJEJElKvZ7slRPo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=EoJHUhAr; arc=none smtp.client-ip=198.175.65.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="EoJHUhAr" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1780043440; x=1811579440; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=gEXQrVCxxIpR+aM1mvFnRIBLphfNM3g3S/0oL0bZXuY=; b=EoJHUhArmxZKkzLi/U24UKuRDMg39A8l/QyRhPLtRJPvzpyH2zv/vX2r UclZZ7xac7NEL6bwkG8rdICi2E86PqJEPXPpOllEFXT0qG0Fe9uvZWmuK SFo1j5duRN6DEy1XykuR4GuEnO/Ka7hQ1iBRkpUZ9z6ALFf+Zlo8/PG/+ sZ/TTsJ9I0Pkl9VLwmDubXTuKR/dDviK0B7byYSfTHBJzZ4KCI+cHCYY0 6edHXYHg9n2ccwswLPxvoy0GaXwz/MjmpA4gRAmUtugaDFgNVAjaAAIPX hlh6zwxVPR6mPPifQ1fA32YzsSqOe9FhjVxLis+Ktk9zsf1kg4XFByxPM g==; X-CSE-ConnectionGUID: qix4zhEzSdqPF+aW4ILm3A== X-CSE-MsgGUID: Ginax7CVRkaWny9Se36//A== X-IronPort-AV: E=McAfee;i="6800,10657,11800"; a="81076393" X-IronPort-AV: E=Sophos;i="6.24,175,1774335600"; d="scan'208";a="81076393" Received: from orviesa005.jf.intel.com ([10.64.159.145]) by orvoesa108.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2026 01:30:40 -0700 X-CSE-ConnectionGUID: tDwFVnEoReeXw0LxGUhD3g== X-CSE-MsgGUID: B3mKuJ00Sv+aaAxajVgTjQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,175,1774335600"; d="scan'208";a="247734884" Received: from spr.sh.intel.com ([10.112.230.239]) by orviesa005.jf.intel.com with ESMTP; 29 May 2026 01:30:35 -0700 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Thomas Gleixner , Dave Hansen , Ian Rogers , Adrian Hunter , Jiri Olsa , Alexander Shishkin , Andi Kleen , Eranian Stephane Cc: Mark Rutland , broonie@kernel.org, Ravi Bangoria , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Dapeng Mi Subject: [Patch v8 5/5] perf dwarf-regs: Add SIMD/eGPRs support for x86 DWARF registers Date: Fri, 29 May 2026 16:24:51 +0800 Message-Id: 
<20260529082451.591783-6-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260529082451.591783-1-dapeng1.mi@linux.intel.com> References: <20260529082451.591783-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable

Enhance the x86-specific DWARF register handling by adding support for
SIMD registers and eGPRs. The mapping follows the "DWARF Register Number
Mapping" table in the "System V Application Binary Interface AMD64
Architecture Processor Supplement" (version 1.0).

Modifications include:
- Update the x86_64_regidx_table[] array to include the SIMD registers
  and eGPRs.
- Extend __get_dwarf_regnum_for_perf_regnum_x86_64() to return the DWARF
  register number for eGPRs.
- Raise the number of x86_64 registers reported by
  get_libdw_frame_nregs() from 17 to 146 so that it covers the eGPRs and
  SIMD registers.
Signed-off-by: Dapeng Mi --- .../util/dwarf-regs-arch/dwarf-regs-x86.c | 138 +++++++++++++++--- tools/perf/util/dwarf-regs.c | 7 +- tools/perf/util/include/dwarf-regs.h | 7 +- tools/perf/util/unwind-libdw.c | 6 +- 4 files changed, 129 insertions(+), 29 deletions(-) diff --git a/tools/perf/util/dwarf-regs-arch/dwarf-regs-x86.c b/tools/perf/= util/dwarf-regs-arch/dwarf-regs-x86.c index cadef120aeb4..b014a36d21b5 100644 --- a/tools/perf/util/dwarf-regs-arch/dwarf-regs-x86.c +++ b/tools/perf/util/dwarf-regs-arch/dwarf-regs-x86.c @@ -90,22 +90,22 @@ static const struct dwarf_regs_idx x86_64_regidx_table[= ] =3D { { "r14", 14 }, { "r14d", 14 }, { "r14w", 14 }, { "r14b", 14 }, { "r15", 15 }, { "r15d", 15 }, { "r15w", 15 }, { "r15b", 15 }, // 16 - Return Address RA - { "xmm0", 17}, - { "xmm1", 18}, - { "xmm2", 19}, - { "xmm3", 20}, - { "xmm4", 21}, - { "xmm5", 22}, - { "xmm6", 23}, - { "xmm7", 24}, - { "xmm8", 25}, - { "xmm9", 26}, - { "xmm10", 27}, - { "xmm11", 28}, - { "xmm12", 29}, - { "xmm13", 30}, - { "xmm14", 31}, - { "xmm15", 32}, + { "zmm0", 17 }, { "ymm0", 17 }, { "xmm0", 17 }, + { "zmm1", 18 }, { "ymm1", 18 }, { "xmm1", 18 }, + { "zmm2", 19 }, { "ymm2", 19 }, { "xmm2", 19 }, + { "zmm3", 20 }, { "ymm3", 20 }, { "xmm3", 20 }, + { "zmm4", 21 }, { "ymm4", 21 }, { "xmm4", 21 }, + { "zmm5", 22 }, { "ymm5", 22 }, { "xmm5", 22 }, + { "zmm6", 23 }, { "ymm6", 23 }, { "xmm6", 23 }, + { "zmm7", 24 }, { "ymm7", 24 }, { "xmm7", 24 }, + { "zmm8", 25 }, { "ymm8", 25 }, { "xmm8", 25 }, + { "zmm9", 26 }, { "ymm9", 26 }, { "xmm9", 26 }, + { "zmm10", 27 }, { "ymm10", 27 }, { "xmm10", 27 }, + { "zmm11", 28 }, { "ymm11", 28 }, { "xmm11", 28 }, + { "zmm12", 29 }, { "ymm12", 29 }, { "xmm12", 29 }, + { "zmm13", 30 }, { "ymm13", 30 }, { "xmm13", 30 }, + { "zmm14", 31 }, { "ymm14", 31 }, { "xmm14", 31 }, + { "zmm15", 32 }, { "ymm15", 32 }, { "xmm15", 32 }, { "st0", 33}, { "st1", 34}, { "st2", 35}, @@ -129,7 +129,7 @@ static const struct dwarf_regs_idx x86_64_regidx_table[= ] =3D { { "ds", 
53}, { "fs", 54}, { "gs", 55}, - // 56-47 - reserved + // 56-57 - reserved { "fs.base", 58}, { "gs.base", 59}, // 60-61 - reserved @@ -138,6 +138,49 @@ static const struct dwarf_regs_idx x86_64_regidx_table= [] =3D { { "mxcsr", 64}, // 128-bit Media Control and Status { "fcw", 65}, // x87 Control Word { "fsw", 66}, // x87 Status Word + // 67-82 - Upper Vector Registers 16=E2=80=9331 + { "zmm16", 67 }, { "ymm16", 67 }, { "xmm16", 67 }, + { "zmm17", 68 }, { "ymm17", 68 }, { "xmm17", 68 }, + { "zmm18", 69 }, { "ymm18", 69 }, { "xmm18", 69 }, + { "zmm19", 70 }, { "ymm19", 70 }, { "xmm19", 70 }, + { "zmm20", 71 }, { "ymm20", 71 }, { "xmm20", 71 }, + { "zmm21", 72 }, { "ymm21", 72 }, { "xmm21", 72 }, + { "zmm22", 73 }, { "ymm22", 73 }, { "xmm22", 73 }, + { "zmm23", 74 }, { "ymm23", 74 }, { "xmm23", 74 }, + { "zmm24", 75 }, { "ymm24", 75 }, { "xmm24", 75 }, + { "zmm25", 76 }, { "ymm25", 76 }, { "xmm25", 76 }, + { "zmm26", 77 }, { "ymm26", 77 }, { "xmm26", 77 }, + { "zmm27", 78 }, { "ymm27", 78 }, { "xmm27", 78 }, + { "zmm28", 79 }, { "ymm28", 79 }, { "xmm28", 79 }, + { "zmm29", 80 }, { "ymm29", 80 }, { "xmm29", 80 }, + { "zmm30", 81 }, { "ymm30", 81 }, { "xmm30", 81 }, + { "zmm31", 82 }, { "ymm31", 82 }, { "xmm31", 82 }, + // 118-125 - Vector Mask Registers 0=E2=80=937 + { "k0", 118 }, + { "k1", 119 }, + { "k2", 120 }, + { "k3", 121 }, + { "k4", 122 }, + { "k5", 123 }, + { "k6", 124 }, + { "k7", 125 }, + // 130-145 - APX Integer Registers 16-31 + { "r16", 130 }, { "r16d", 130 }, { "r16w", 130 }, { "r16b", 130 }, + { "r17", 131 }, { "r17d", 131 }, { "r17w", 131 }, { "r17b", 131 }, + { "r18", 132 }, { "r18d", 132 }, { "r18w", 132 }, { "r18b", 132 }, + { "r19", 133 }, { "r19d", 133 }, { "r19w", 133 }, { "r19b", 133 }, + { "r20", 134 }, { "r20d", 134 }, { "r20w", 134 }, { "r20b", 134 }, + { "r21", 135 }, { "r21d", 135 }, { "r21w", 135 }, { "r21b", 135 }, + { "r22", 136 }, { "r22d", 136 }, { "r22w", 136 }, { "r22b", 136 }, + { "r23", 137 }, { "r23d", 137 }, { "r23w", 137 }, { 
"r23b", 137 }, + { "r24", 138 }, { "r24d", 138 }, { "r24w", 138 }, { "r24b", 138 }, + { "r25", 139 }, { "r25d", 139 }, { "r25w", 139 }, { "r25b", 139 }, + { "r26", 140 }, { "r26d", 140 }, { "r26w", 140 }, { "r26b", 140 }, + { "r27", 141 }, { "r27d", 141 }, { "r27w", 141 }, { "r27b", 141 }, + { "r28", 142 }, { "r28d", 142 }, { "r28w", 142 }, { "r28b", 142 }, + { "r29", 143 }, { "r29d", 143 }, { "r29w", 143 }, { "r29b", 143 }, + { "r30", 144 }, { "r30d", 144 }, { "r30w", 144 }, { "r30b", 144 }, + { "r31", 145 }, { "r31d", 145 }, { "r31w", 145 }, { "r31b", 145 }, // End of regular dwarf registers. { "rip", DWARF_REG_PC }, { "eip", DWARF_REG_PC }, { "ip", DWARF_REG_PC }, }; @@ -204,7 +247,7 @@ int __get_dwarf_regnum_for_perf_regnum_i386(int perf_re= gnum) return dwarf_i386_regnums[perf_regnum]; } =20 -int __get_dwarf_regnum_for_perf_regnum_x86_64(int perf_regnum) +int __get_dwarf_regnum_for_perf_regnum_x86_64(int perf_regnum, int abi) { static const int dwarf_x86_64_regnums[] =3D { [PERF_REG_X86_AX] =3D 0, @@ -248,13 +291,66 @@ int __get_dwarf_regnum_for_perf_regnum_x86_64(int per= f_regnum) [PERF_REG_X86_XMM14] =3D 31, [PERF_REG_X86_XMM15] =3D 32, }; + static const int dwarf_x86_64_regnums_apx[] =3D { + [PERF_REG_X86_AX] =3D 0, + [PERF_REG_X86_BX] =3D 3, + [PERF_REG_X86_CX] =3D 2, + [PERF_REG_X86_DX] =3D 1, + [PERF_REG_X86_SI] =3D 4, + [PERF_REG_X86_DI] =3D 5, + [PERF_REG_X86_BP] =3D 6, + [PERF_REG_X86_SP] =3D 7, + [PERF_REG_X86_IP] =3D 16, + [PERF_REG_X86_FLAGS] =3D 49, + [PERF_REG_X86_CS] =3D 51, + [PERF_REG_X86_SS] =3D 52, + [PERF_REG_X86_DS] =3D 53, + [PERF_REG_X86_ES] =3D 50, + [PERF_REG_X86_FS] =3D 54, + [PERF_REG_X86_GS] =3D 55, + [PERF_REG_X86_R8] =3D 8, + [PERF_REG_X86_R9] =3D 9, + [PERF_REG_X86_R10] =3D 10, + [PERF_REG_X86_R11] =3D 11, + [PERF_REG_X86_R12] =3D 12, + [PERF_REG_X86_R13] =3D 13, + [PERF_REG_X86_R14] =3D 14, + [PERF_REG_X86_R15] =3D 15, + [PERF_REG_X86_R16] =3D 130, + [PERF_REG_X86_R17] =3D 131, + [PERF_REG_X86_R18] =3D 132, + [PERF_REG_X86_R19] 
=3D 133, + [PERF_REG_X86_R20] =3D 134, + [PERF_REG_X86_R21] =3D 135, + [PERF_REG_X86_R22] =3D 136, + [PERF_REG_X86_R23] =3D 137, + [PERF_REG_X86_R24] =3D 138, + [PERF_REG_X86_R25] =3D 139, + [PERF_REG_X86_R26] =3D 140, + [PERF_REG_X86_R27] =3D 141, + [PERF_REG_X86_R28] =3D 142, + [PERF_REG_X86_R29] =3D 143, + [PERF_REG_X86_R30] =3D 144, + [PERF_REG_X86_R31] =3D 145, + }; =20 if (perf_regnum =3D=3D 0) return 0; =20 - if (perf_regnum < 0 || perf_regnum > (int)ARRAY_SIZE(dwarf_x86_64_regnum= s) || - dwarf_x86_64_regnums[perf_regnum] =3D=3D 0) + if (perf_regnum < 0) + return -ENOENT; + + if (!(abi & PERF_SAMPLE_REGS_ABI_SIMD) && + (perf_regnum >=3D (int)ARRAY_SIZE(dwarf_x86_64_regnums) || + dwarf_x86_64_regnums[perf_regnum] =3D=3D 0)) + return -ENOENT; + + if ((abi & PERF_SAMPLE_REGS_ABI_SIMD) && + (perf_regnum >=3D (int)ARRAY_SIZE(dwarf_x86_64_regnums_apx) || + dwarf_x86_64_regnums_apx[perf_regnum] =3D=3D 0)) return -ENOENT; =20 - return dwarf_x86_64_regnums[perf_regnum]; + return abi & PERF_SAMPLE_REGS_ABI_SIMD ? + dwarf_x86_64_regnums_apx[perf_regnum] : + dwarf_x86_64_regnums[perf_regnum]; } diff --git a/tools/perf/util/dwarf-regs.c b/tools/perf/util/dwarf-regs.c index 797f455eba0d..9e2a0c93ecc9 100644 --- a/tools/perf/util/dwarf-regs.c +++ b/tools/perf/util/dwarf-regs.c @@ -158,7 +158,7 @@ static int get_libdw_frame_nregs(unsigned int machine, = unsigned int flags __mayb { switch (machine) { case EM_X86_64: - return 17; + return 146; /* Support APX eGPRs. 
*/ case EM_386: return 9; case EM_ARM: @@ -187,13 +187,14 @@ static int get_libdw_frame_nregs(unsigned int machine= , unsigned int flags __mayb } =20 int get_dwarf_regnum_for_perf_regnum(int perf_regnum, unsigned int machine, - unsigned int flags, bool only_libdw_supported) + unsigned int flags, + bool only_libdw_supported, int abi) { int reg; =20 switch (machine) { case EM_X86_64: - reg =3D __get_dwarf_regnum_for_perf_regnum_x86_64(perf_regnum); + reg =3D __get_dwarf_regnum_for_perf_regnum_x86_64(perf_regnum, abi); break; case EM_386: reg =3D __get_dwarf_regnum_for_perf_regnum_i386(perf_regnum); diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include= /dwarf-regs.h index 46a764cf322f..92cf0af93e9e 100644 --- a/tools/perf/util/include/dwarf-regs.h +++ b/tools/perf/util/include/dwarf-regs.h @@ -103,7 +103,7 @@ int __get_csky_regnum(const char *name, unsigned int fl= ags); int __get_dwarf_regnum_i386(const char *name); int __get_dwarf_regnum_x86_64(const char *name); int __get_dwarf_regnum_for_perf_regnum_i386(int perf_regnum); -int __get_dwarf_regnum_for_perf_regnum_x86_64(int perf_regnum); +int __get_dwarf_regnum_for_perf_regnum_x86_64(int perf_regnum, int abi); =20 int __get_dwarf_regnum_for_perf_regnum_arm(int perf_regnum); int __get_dwarf_regnum_for_perf_regnum_arm64(int perf_regnum); @@ -125,8 +125,9 @@ int get_dwarf_regnum(const char *name, unsigned int mac= hine, unsigned int flags) /* * get_dwarf_regnum - Returns DWARF regnum from perf register number. 
*/ -int get_dwarf_regnum_for_perf_regnum(int perf_regnum, unsigned int machine= , unsigned int flags, - bool only_libdw_supported); +int get_dwarf_regnum_for_perf_regnum(int perf_regnum, unsigned int machine, + unsigned int flags, + bool only_libdw_supported, int abi); =20 void get_powerpc_regs(u32 raw_insn, int is_source, struct annotated_op_loc= *op_loc); =20 diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c index 05e8e68bd49c..678db5a65ada 100644 --- a/tools/perf/util/unwind-libdw.c +++ b/tools/perf/util/unwind-libdw.c @@ -273,7 +273,8 @@ static bool libdw_set_initial_registers(Dwfl_Thread *th= read, void *arg) int dwarf_reg =3D get_dwarf_regnum_for_perf_regnum(perf_reg, e_machine, e_flags, - /*only_libdw_supported=3D*/true); + /*only_libdw_supported=3D*/true, + user_regs->abi); if (dwarf_reg > max_dwarf_reg) max_dwarf_reg =3D dwarf_reg; } @@ -288,7 +289,8 @@ static bool libdw_set_initial_registers(Dwfl_Thread *th= read, void *arg) int dwarf_reg =3D get_dwarf_regnum_for_perf_regnum(perf_reg, e_machine, e_flags, - /*only_libdw_supported=3D*/true); + /*only_libdw_supported=3D*/true, + user_regs->abi); if (dwarf_reg >=3D 0) { val =3D 0; if (perf_reg_value(&val, user_regs, perf_reg) =3D=3D 0) --=20 2.34.1