From nobody Tue Feb 10 09:22:10 2026
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
    Ian Rogers, Adrian Hunter, Alexander Shishkin
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
    Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao, Kan Liang, Dapeng Mi
Subject: [Patch v6 1/4] perf headers: Sync with the kernel headers
Date: Mon, 9 Feb 2026 16:35:11 +0800
Message-Id: <20260209083514.2225115-2-dapeng1.mi@linux.intel.com>
In-Reply-To: <20260209083514.2225115-1-dapeng1.mi@linux.intel.com>
References: <20260209083514.2225115-1-dapeng1.mi@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Kan Liang

Update include/uapi/linux/perf_event.h and
arch/x86/include/uapi/asm/perf_regs.h to support extended regs.
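
As an illustration only (not part of this patch), the SIMD block that the
updated perf_event.h layout comment describes could be sized roughly as
follows. The block is present only when "abi & PERF_SAMPLE_REGS_ABI_SIMD"
is set. The struct and helper names below are hypothetical and chosen for
this sketch; only the four u16 fields and the data[] length formula come
from the layout comment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the four u16 fields in the layout comment. */
struct simd_regs_hdr {
	uint16_t nr_vectors;
	uint16_t vector_qwords;
	uint16_t nr_pred;
	uint16_t pred_qwords;
};

/* Number of u64 words in data[] that follow the header. */
static size_t simd_regs_data_qwords(const struct simd_regs_hdr *h)
{
	return (size_t)h->nr_vectors * h->vector_qwords +
	       (size_t)h->nr_pred * h->pred_qwords;
}

/*
 * Copy the header out of a raw buffer and return the size in bytes of
 * header plus data[], or 0 if the buffer is too short to hold it.
 */
static size_t simd_regs_block_size(const void *buf, size_t len,
				   struct simd_regs_hdr *h)
{
	size_t total;

	if (len < sizeof(*h))
		return 0;
	memcpy(h, buf, sizeof(*h));
	total = sizeof(*h) + simd_regs_data_qwords(h) * sizeof(uint64_t);
	return total <= len ? total : 0;
}
```

With 16 vector registers of 2 qwords each and 8 predicate registers of
1 qword each, data[] holds 16 * 2 + 8 * 1 = 40 qwords, so the whole block
is 8 + 40 * 8 = 328 bytes.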

Signed-off-by: Kan Liang
Co-developed-by: Dapeng Mi
Signed-off-by: Dapeng Mi
---
 tools/arch/x86/include/uapi/asm/perf_regs.h | 49 +++++++++++++++++++++
 tools/include/uapi/linux/perf_event.h       | 45 +++++++++++++++++--
 2 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/tools/arch/x86/include/uapi/asm/perf_regs.h b/tools/arch/x86/include/uapi/asm/perf_regs.h
index 7c9d2bb3833b..6da63e1dbb40 100644
--- a/tools/arch/x86/include/uapi/asm/perf_regs.h
+++ b/tools/arch/x86/include/uapi/asm/perf_regs.h
@@ -27,9 +27,34 @@ enum perf_event_x86_regs {
 	PERF_REG_X86_R13,
 	PERF_REG_X86_R14,
 	PERF_REG_X86_R15,
+	/*
+	 * The EGPRs/SSP and XMM have overlaps. Only one can be used
+	 * at a time. For the ABI type PERF_SAMPLE_REGS_ABI_SIMD,
+	 * utilize EGPRs/SSP. For the other ABI type, XMM is used.
+	 *
+	 * Extended GPRs (EGPRs)
+	 */
+	PERF_REG_X86_R16,
+	PERF_REG_X86_R17,
+	PERF_REG_X86_R18,
+	PERF_REG_X86_R19,
+	PERF_REG_X86_R20,
+	PERF_REG_X86_R21,
+	PERF_REG_X86_R22,
+	PERF_REG_X86_R23,
+	PERF_REG_X86_R24,
+	PERF_REG_X86_R25,
+	PERF_REG_X86_R26,
+	PERF_REG_X86_R27,
+	PERF_REG_X86_R28,
+	PERF_REG_X86_R29,
+	PERF_REG_X86_R30,
+	PERF_REG_X86_R31,
+	PERF_REG_X86_SSP,
 	/* These are the limits for the GPRs. */
 	PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
 	PERF_REG_X86_64_MAX = PERF_REG_X86_R15 + 1,
+	PERF_REG_MISC_MAX = PERF_REG_X86_SSP + 1,
 
 	/* These all need two bits set because they are 128bit */
 	PERF_REG_X86_XMM0 = 32,
@@ -54,5 +79,29 @@ enum perf_event_x86_regs {
 };
 
 #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_X86_XMM0) - 1))
+#define PERF_X86_EGPRS_MASK GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16)
+
+enum {
+	PERF_X86_SIMD_XMM_REGS = 16,
+	PERF_X86_SIMD_YMM_REGS = 16,
+	PERF_X86_SIMD_ZMM_REGS = 32,
+	PERF_X86_SIMD_VEC_REGS_MAX = PERF_X86_SIMD_ZMM_REGS,
+
+	PERF_X86_SIMD_OPMASK_REGS = 8,
+	PERF_X86_SIMD_PRED_REGS_MAX = PERF_X86_SIMD_OPMASK_REGS,
+};
+
+#define PERF_X86_SIMD_PRED_MASK GENMASK(PERF_X86_SIMD_PRED_REGS_MAX - 1, 0)
+#define PERF_X86_SIMD_VEC_MASK GENMASK_ULL(PERF_X86_SIMD_VEC_REGS_MAX - 1, 0)
+
+#define PERF_X86_H16ZMM_BASE 16
+
+enum {
+	PERF_X86_OPMASK_QWORDS = 1,
+	PERF_X86_XMM_QWORDS = 2,
+	PERF_X86_YMM_QWORDS = 4,
+	PERF_X86_ZMM_QWORDS = 8,
+	PERF_X86_SIMD_QWORDS_MAX = PERF_X86_ZMM_QWORDS,
+};
 
 #endif /* _ASM_X86_PERF_REGS_H */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 72f03153dd32..ce3a14d35390 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -314,8 +314,9 @@ enum {
  */
 enum perf_sample_regs_abi {
 	PERF_SAMPLE_REGS_ABI_NONE = 0,
-	PERF_SAMPLE_REGS_ABI_32 = 1,
-	PERF_SAMPLE_REGS_ABI_64 = 2,
+	PERF_SAMPLE_REGS_ABI_32 = (1 << 0),
+	PERF_SAMPLE_REGS_ABI_64 = (1 << 1),
+	PERF_SAMPLE_REGS_ABI_SIMD = (1 << 2),
 };
 
 /*
@@ -383,6 +384,7 @@ enum perf_event_read_format {
 #define PERF_ATTR_SIZE_VER7 128 /* Add: sig_data */
 #define PERF_ATTR_SIZE_VER8 136 /* Add: config3 */
 #define PERF_ATTR_SIZE_VER9 144 /* add: config4 */
+#define PERF_ATTR_SIZE_VER10 176 /* Add: sample_simd_{pred,vec}_reg_* */
 
 /*
  * 'struct perf_event_attr' contains various attributes that define
@@ -547,6 +549,25 @@ struct perf_event_attr {
 
 	__u64 config3; /* extension of config2 */
 	__u64 config4; /* extension of config3 */
+
+	/*
+	 * Defines set of SIMD registers to dump on samples.
+	 * The sample_simd_regs_enabled !=0 implies the
+	 * set of SIMD registers is used to config all SIMD registers.
+	 * If !sample_simd_regs_enabled, sample_regs_XXX may be used to
+	 * config some SIMD registers on X86.
+	 */
+	union {
+		__u16 sample_simd_regs_enabled;
+		__u16 sample_simd_pred_reg_qwords;
+	};
+	__u16 sample_simd_vec_reg_qwords;
+	__u32 __reserved_4;
+
+	__u32 sample_simd_pred_reg_intr;
+	__u32 sample_simd_pred_reg_user;
+	__u64 sample_simd_vec_reg_intr;
+	__u64 sample_simd_vec_reg_user;
 };
 
 /*
@@ -1020,7 +1041,15 @@ enum perf_event_type {
 	 *	} && PERF_SAMPLE_BRANCH_STACK
 	 *
 	 *	{ u64 abi; # enum perf_sample_regs_abi
-	 *	  u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
+	 *	  u64 regs[weight(mask)];
+	 *	  struct {
+	 *		u16 nr_vectors; # 0 ... weight(sample_simd_vec_reg_user)
+	 *		u16 vector_qwords; # 0 ... sample_simd_vec_reg_qwords
+	 *		u16 nr_pred; # 0 ... weight(sample_simd_pred_reg_user)
+	 *		u16 pred_qwords; # 0 ... sample_simd_pred_reg_qwords
+	 *		u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
+	 *	  } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+	 *	} && PERF_SAMPLE_REGS_USER
 	 *
 	 *	{ u64 size;
 	 *	  char data[size];
@@ -1047,7 +1076,15 @@ enum perf_event_type {
 	 *	{ u64 data_src; } && PERF_SAMPLE_DATA_SRC
 	 *	{ u64 transaction; } && PERF_SAMPLE_TRANSACTION
 	 *	{ u64 abi; # enum perf_sample_regs_abi
-	 *	  u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
+	 *	  u64 regs[weight(mask)];
+	 *	  struct {
+	 *		u16 nr_vectors; # 0 ... weight(sample_simd_vec_reg_intr)
+	 *		u16 vector_qwords; # 0 ... sample_simd_vec_reg_qwords
+	 *		u16 nr_pred; # 0 ... weight(sample_simd_pred_reg_intr)
+	 *		u16 pred_qwords; # 0 ... sample_simd_pred_reg_qwords
+	 *		u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
+	 *	  } && (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+	 *	} && PERF_SAMPLE_REGS_INTR
 	 *	{ u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR
 	 *	{ u64 cgroup;} && PERF_SAMPLE_CGROUP
 	 *	{ u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
-- 
2.34.1

From nobody Tue Feb 10 09:22:10 2026
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
    Ian Rogers, Adrian Hunter, Alexander Shishkin
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
    Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao, Dapeng Mi
Subject: [Patch v6 2/4] perf regs: Support x86 eGPRs/SSP sampling
Date: Mon, 9 Feb 2026 16:35:12 +0800
Message-Id: <20260209083514.2225115-3-dapeng1.mi@linux.intel.com>
In-Reply-To: <20260209083514.2225115-1-dapeng1.mi@linux.intel.com>
References: <20260209083514.2225115-1-dapeng1.mi@linux.intel.com>

This patch adds support for sampling x86 extended GP registers (R16-R31)
and the shadow stack pointer (SSP) register.

The original XMM registers space in sample_regs_user/sample_regs_intr is
reclaimed to represent the eGPRs and SSP when SIMD registers sampling is
supported with the new SIMD sampling fields in the perf_event_attr
structure. This necessitates a way to distinguish which register layout
is used for the sample_regs_user/sample_regs_intr bitmap.

To address this, a new "abi" argument is added to the helpers
perf_intr_reg_mask(), perf_user_reg_mask(), and perf_reg_name(). When
"abi & PERF_SAMPLE_REGS_ABI_SIMD" is true, it indicates the eGPRs and SSP
layout is represented; otherwise, the legacy XMM registers are
represented.

Signed-off-by: Dapeng Mi
---
 tools/perf/builtin-script.c                   |   2 +-
 tools/perf/util/evsel.c                       |   6 +-
 tools/perf/util/parse-regs-options.c          |  17 ++-
 .../perf/util/perf-regs-arch/perf_regs_x86.c  | 120 +++++++++++++++---
 tools/perf/util/perf_regs.c                   |  14 +-
 tools/perf/util/perf_regs.h                   |  10 +-
 .../scripting-engines/trace-event-python.c    |   2 +-
 tools/perf/util/session.c                     |   9 +-
 8 files changed, 139 insertions(+), 41 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 14c6f6c3c4f2..ffe51f895666 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -730,7 +730,7 @@ static int perf_sample__fprintf_regs(struct regs_dump *regs, uint64_t mask,
 	for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) {
 		u64 val = regs->regs[i++];
 		printed += fprintf(fp, "%5s:0x%"PRIx64" ",
-				   perf_reg_name(r, e_machine, e_flags),
+				   perf_reg_name(r, e_machine, e_flags, regs->abi),
 				   val);
 	}
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f59228c1a39e..b7fb3f936ae3 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1049,19 +1049,21 @@ static void __evsel__config_callchain(struct evsel *evsel, struct record_opts *o
 	}
 
 	if (param->record_mode == CALLCHAIN_DWARF) {
+		int abi;
+
 		if (!function) {
 			uint16_t e_machine = evsel__e_machine(evsel, /*e_flags=*/NULL);
 
 			evsel__set_sample_bit(evsel, REGS_USER);
 			evsel__set_sample_bit(evsel, STACK_USER);
 			if (opts->sample_user_regs &&
-			    DWARF_MINIMAL_REGS(e_machine) != perf_user_reg_mask(EM_HOST)) {
+			    DWARF_MINIMAL_REGS(e_machine) != perf_user_reg_mask(EM_HOST, &abi)) {
 				attr->sample_regs_user |= DWARF_MINIMAL_REGS(e_machine);
 				pr_warning("WARNING: The use of --call-graph=dwarf may require all the user registers, "
 					   "specifying a subset with --user-regs may render DWARF unwinding unreliable, "
 					   "so the minimal registers set (IP, SP) is explicitly forced.\n");
 			} else {
-				attr->sample_regs_user |= perf_user_reg_mask(EM_HOST);
+				attr->sample_regs_user |= perf_user_reg_mask(EM_HOST, &abi);
 			}
 			attr->sample_stack_user = param->dump_size;
 			attr->exclude_callchain_user = 1;
diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c
index c93c2f0c8105..518327883b18 100644
--- a/tools/perf/util/parse-regs-options.c
+++ b/tools/perf/util/parse-regs-options.c
@@ -10,7 +10,8 @@
 #include "util/perf_regs.h"
 #include "util/parse-regs-options.h"
 
-static void list_perf_regs(FILE *fp, uint64_t mask)
+static void
+list_perf_regs(FILE *fp, uint64_t mask, int abi)
 {
 	const char *last_name = NULL;
 
@@ -21,7 +22,7 @@ static void list_perf_regs(FILE *fp, uint64_t mask)
 		if (((1ULL << reg) & mask) == 0)
 			continue;
 
-		name = perf_reg_name(reg, EM_HOST, EF_HOST);
+		name = perf_reg_name(reg, EM_HOST, EF_HOST, abi);
 		if (name && (!last_name || strcmp(last_name, name)))
 			fprintf(fp, "%s%s", reg > 0 ? " " : "", name);
 		last_name = name;
@@ -29,7 +30,8 @@ static void list_perf_regs(FILE *fp, uint64_t mask)
 	fputc('\n', fp);
 }
 
-static uint64_t name_to_perf_reg_mask(const char *to_match, uint64_t mask)
+static uint64_t
+name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi)
 {
 	uint64_t reg_mask = 0;
 
@@ -39,7 +41,7 @@ static uint64_t name_to_perf_reg_mask(const char *to_match, uint64_t mask)
 		if (((1ULL << reg) & mask) == 0)
 			continue;
 
-		name = perf_reg_name(reg, EM_HOST, EF_HOST);
+		name = perf_reg_name(reg, EM_HOST, EF_HOST, abi);
 		if (!name)
 			continue;
 
@@ -56,6 +58,7 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	char *s, *os = NULL, *p;
 	int ret = -1;
 	uint64_t mask;
+	int abi;
 
 	if (unset)
 		return 0;
@@ -66,7 +69,7 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 	if (*mode)
 		return -1;
 
-	mask = intr ? perf_intr_reg_mask(EM_HOST) : perf_user_reg_mask(EM_HOST);
+	mask = intr ? perf_intr_reg_mask(EM_HOST, &abi) : perf_user_reg_mask(EM_HOST, &abi);
 
 	/* str may be NULL in case no arg is passed to -I */
 	if (!str) {
@@ -87,11 +90,11 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
 			*p = '\0';
 
 		if (!strcmp(s, "?")) {
-			list_perf_regs(stderr, mask);
+			list_perf_regs(stderr, mask, abi);
 			goto error;
 		}
 
-		reg_mask = name_to_perf_reg_mask(s, mask);
+		reg_mask = name_to_perf_reg_mask(s, mask, abi);
 		if (reg_mask == 0) {
 			ui__warning("Unknown register \"%s\", check man page or run \"perf record %s?\"\n",
 				    s, intr ? "-I" : "--user-regs=");
diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
index b6d20522b4e8..3e9241a11a95 100644
--- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c
+++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c
@@ -235,26 +235,26 @@ int __perf_sdt_arg_parse_op_x86(char *old_op, char **new_op)
 	return SDT_ARG_VALID;
 }
 
-uint64_t __perf_reg_mask_x86(bool intr)
+static uint64_t __arch__reg_mask(u64 sample_type, u64 mask, bool has_simd_regs)
 {
 	struct perf_event_attr attr = {
-		.type = PERF_TYPE_HARDWARE,
-		.config = PERF_COUNT_HW_CPU_CYCLES,
-		.sample_type = PERF_SAMPLE_REGS_INTR,
-		.sample_regs_intr = PERF_REG_EXTENDED_MASK,
-		.precise_ip = 1,
-		.disabled = 1,
-		.exclude_kernel = 1,
+		.type = PERF_TYPE_HARDWARE,
+		.config = PERF_COUNT_HW_CPU_CYCLES,
+		.sample_type = sample_type,
+		.precise_ip = 1,
+		.disabled = 1,
+		.exclude_kernel = 1,
+		.sample_simd_regs_enabled = has_simd_regs,
 	};
 	int fd;
-
-	if (!intr)
-		return PERF_REGS_MASK;
-
 	/*
 	 * In an unnamed union, init it here to build on older gcc versions
 	 */
 	attr.sample_period = 1;
+	if (sample_type == PERF_SAMPLE_REGS_INTR)
+		attr.sample_regs_intr = mask;
+	else
+		attr.sample_regs_user = mask;
 
 	if (perf_pmus__num_core_pmus() > 1) {
 		struct perf_pmu *pmu = NULL;
@@ -276,13 +276,34 @@ uint64_t __perf_reg_mask_x86(bool intr)
 				 /*group_fd=*/-1, /*flags=*/0);
 	if (fd != -1) {
 		close(fd);
-		return (PERF_REG_EXTENDED_MASK | PERF_REGS_MASK);
+		return mask;
+	}
+
+	return 0;
+}
+
+uint64_t __perf_reg_mask_x86(bool intr, int *abi)
+{
+	u64 sample_type = intr ? PERF_SAMPLE_REGS_INTR : PERF_SAMPLE_REGS_USER;
+	uint64_t mask = PERF_REGS_MASK;
+
+	*abi = 0;
+	mask |= __arch__reg_mask(sample_type,
+				 GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16),
+				 true);
+	mask |= __arch__reg_mask(sample_type, BIT_ULL(PERF_REG_X86_SSP), true);
+
+	if (mask != PERF_REGS_MASK) {
+		*abi |= PERF_SAMPLE_REGS_ABI_SIMD;
+	} else {
+		mask |= __arch__reg_mask(sample_type, PERF_REG_EXTENDED_MASK,
+					 false);
 	}
 
-	return PERF_REGS_MASK;
+	return mask;
 }
 
-const char *__perf_reg_name_x86(int id)
+static const char *__arch_reg_gpr_name(int id)
 {
 	switch (id) {
 	case PERF_REG_X86_AX:
@@ -333,7 +354,60 @@ const char *__perf_reg_name_x86(int id)
 		return "R14";
 	case PERF_REG_X86_R15:
 		return "R15";
+	default:
+		return NULL;
+	}
+
+	return NULL;
+}
 
+static const char *__arch_reg_egpr_name(int id)
+{
+	switch (id) {
+	case PERF_REG_X86_R16:
+		return "R16";
+	case PERF_REG_X86_R17:
+		return "R17";
+	case PERF_REG_X86_R18:
+		return "R18";
+	case PERF_REG_X86_R19:
+		return "R19";
+	case PERF_REG_X86_R20:
+		return "R20";
+	case PERF_REG_X86_R21:
+		return "R21";
+	case PERF_REG_X86_R22:
+		return "R22";
+	case PERF_REG_X86_R23:
+		return "R23";
+	case PERF_REG_X86_R24:
+		return "R24";
+	case PERF_REG_X86_R25:
+		return "R25";
+	case PERF_REG_X86_R26:
+		return "R26";
+	case PERF_REG_X86_R27:
+		return "R27";
+	case PERF_REG_X86_R28:
+		return "R28";
+	case PERF_REG_X86_R29:
+		return "R29";
+	case PERF_REG_X86_R30:
+		return "R30";
+	case PERF_REG_X86_R31:
+		return "R31";
+	case PERF_REG_X86_SSP:
+		return "SSP";
+	default:
+		return NULL;
+	}
+
+	return NULL;
+}
+
+static const char *__arch_reg_xmm_name(int id)
+{
+	switch (id) {
 #define XMM(x) \
 	case PERF_REG_X86_XMM ## x: \
 	case PERF_REG_X86_XMM ## x + 1: \
@@ -362,6 +436,22 @@ const char *__perf_reg_name_x86(int id)
 	return NULL;
 }
 
+const char *__perf_reg_name_x86(int id, int abi)
+{
+	const char *name;
+
+	name = __arch_reg_gpr_name(id);
+	if (name)
+		return name;
+
+	if (abi & PERF_SAMPLE_REGS_ABI_SIMD)
+		name = __arch_reg_egpr_name(id);
+	else
+		name = __arch_reg_xmm_name(id);
+
+	return name;
+}
+
 uint64_t __perf_reg_ip_x86(void)
 {
 	return PERF_REG_X86_IP;
diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
index 5b8f34beb24e..bdd2eef13bc3 100644
--- a/tools/perf/util/perf_regs.c
+++ b/tools/perf/util/perf_regs.c
@@ -32,10 +32,11 @@ int perf_sdt_arg_parse_op(uint16_t e_machine, char *old_op, char **new_op)
 	return ret;
 }
 
-uint64_t perf_intr_reg_mask(uint16_t e_machine)
+uint64_t perf_intr_reg_mask(uint16_t e_machine, int *abi)
 {
 	uint64_t mask = 0;
 
+	*abi = 0;
 	switch (e_machine) {
 	case EM_ARM:
 		mask = __perf_reg_mask_arm(/*intr=*/true);
@@ -64,7 +65,7 @@ uint64_t perf_intr_reg_mask(uint16_t e_machine)
 		break;
 	case EM_386:
 	case EM_X86_64:
-		mask = __perf_reg_mask_x86(/*intr=*/true);
+		mask = __perf_reg_mask_x86(/*intr=*/true, abi);
 		break;
 	default:
 		pr_debug("Unknown ELF machine %d, interrupt sampling register mask will be empty.\n",
@@ -75,10 +76,11 @@ uint64_t perf_intr_reg_mask(uint16_t e_machine)
 	return mask;
 }
 
-uint64_t perf_user_reg_mask(uint16_t e_machine)
+uint64_t perf_user_reg_mask(uint16_t e_machine, int *abi)
 {
 	uint64_t mask = 0;
 
+	*abi = 0;
 	switch (e_machine) {
 	case EM_ARM:
 		mask = __perf_reg_mask_arm(/*intr=*/false);
@@ -107,7 +109,7 @@ uint64_t perf_user_reg_mask(uint16_t e_machine)
 		break;
 	case EM_386:
 	case EM_X86_64:
-		mask = __perf_reg_mask_x86(/*intr=*/false);
+		mask = __perf_reg_mask_x86(/*intr=*/false, abi);
 		break;
 	default:
 		pr_debug("Unknown ELF machine %d, user sampling register mask will be empty.\n",
@@ -118,7 +120,7 @@ uint64_t perf_user_reg_mask(uint16_t e_machine)
 	return mask;
 }
 
-const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags)
+const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, int abi)
 {
 	const char *reg_name = NULL;
 
@@ -150,7 +152,7 @@ const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags)
 		break;
 	case EM_386:
 	case EM_X86_64:
-		reg_name = __perf_reg_name_x86(id);
+		reg_name = __perf_reg_name_x86(id, abi);
 		break;
 	default:
 		break;
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
index 7c04700bf837..c9501ca8045d 100644
--- a/tools/perf/util/perf_regs.h
+++ b/tools/perf/util/perf_regs.h
@@ -13,10 +13,10 @@ enum {
 };
 
 int perf_sdt_arg_parse_op(uint16_t e_machine, char *old_op, char **new_op);
-uint64_t perf_intr_reg_mask(uint16_t e_machine);
-uint64_t perf_user_reg_mask(uint16_t e_machine);
+uint64_t perf_intr_reg_mask(uint16_t e_machine, int *abi);
+uint64_t perf_user_reg_mask(uint16_t e_machine, int *abi);
 
-const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags);
+const char *perf_reg_name(int id, uint16_t e_machine, uint32_t e_flags, int abi);
 int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
 uint64_t perf_arch_reg_ip(uint16_t e_machine);
 uint64_t perf_arch_reg_sp(uint16_t e_machine);
@@ -64,8 +64,8 @@ uint64_t __perf_reg_ip_s390(void);
 uint64_t __perf_reg_sp_s390(void);
 
 int __perf_sdt_arg_parse_op_x86(char *old_op, char **new_op);
-uint64_t __perf_reg_mask_x86(bool intr);
-const char *__perf_reg_name_x86(int id);
+uint64_t __perf_reg_mask_x86(bool intr, int *abi);
+const char *__perf_reg_name_x86(int id, int abi);
 uint64_t __perf_reg_ip_x86(void);
 uint64_t __perf_reg_sp_x86(void);
 
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 2b0df7bd9a46..4cc5b96898e6 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -733,7 +733,7 @@ static void regs_map(struct regs_dump *regs, uint64_t mask, uint16_t e_machine,
 
 		printed += scnprintf(bf + printed, size - printed,
 				     "%5s:0x%" PRIx64 " ",
-				     perf_reg_name(r, e_machine, e_flags), val);
+				     perf_reg_name(r, e_machine, e_flags, regs->abi), val);
 	}
 }
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4b465abfa36c..7cf7bf86205d 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -959,15 +959,16 @@ static void branch_stack__printf(struct perf_sample *sample,
 	}
 }
 
-static void regs_dump__printf(u64 mask, u64 *regs, uint16_t e_machine, uint32_t e_flags)
+static void regs_dump__printf(u64 mask, struct regs_dump *regs,
+			      uint16_t e_machine, uint32_t e_flags)
 {
 	unsigned rid, i = 0;
 
 	for_each_set_bit(rid, (unsigned long *) &mask, sizeof(mask) * 8) {
-		u64 val = regs[i++];
+		u64 val = regs->regs[i++];
 
 		printf(".... %-5s 0x%016" PRIx64 "\n",
-		       perf_reg_name(rid, e_machine, e_flags), val);
+		       perf_reg_name(rid, e_machine, e_flags, regs->abi), val);
 	}
 }
 
@@ -995,7 +996,7 @@ static void regs__printf(const char *type, struct regs_dump *regs,
 	       mask,
 	       regs_dump_abi(regs));
 
-	regs_dump__printf(mask, regs->regs, e_machine, e_flags);
+	regs_dump__printf(mask, regs, e_machine, e_flags);
 }
 
 static void regs_user__printf(struct perf_sample *sample, uint16_t e_machine, uint32_t e_flags)
-- 
2.34.1

From nobody Tue Feb 10 09:22:10 2026
From: Dapeng Mi
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
    Ian Rogers, Adrian Hunter, Alexander Shishkin
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
    Zide Chen, Falcon Thomas, Dapeng Mi, Xudong Hao, Dapeng Mi
Subject: [Patch v6 3/4] perf regs: Support x86 SIMD registers sampling
Date: Mon, 9 Feb 2026 16:35:13 +0800
Message-Id: <20260209083514.2225115-4-dapeng1.mi@linux.intel.com>
In-Reply-To: <20260209083514.2225115-1-dapeng1.mi@linux.intel.com>
References: <20260209083514.2225115-1-dapeng1.mi@linux.intel.com>

This patch adds support for the newly introduced SIMD register sampling
format by adding the following 5 functions:

uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred);
uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred);
uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
						uint16_t *qwords, bool pred);
uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int reg_c,
						uint16_t *qwords, bool pred);
const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred);

The perf_{intr|user}_simd_reg_class_mask() functions retrieve the bitmap
of kernel supported SIMD/PRED register classes on current platform for
intr-regs and user-regs sampling, such as OPMASK/XMM/YMM/ZMM on x86
platforms.

The perf_{intr|user}_simd_reg_class_bitmap_qwords() functions retrieve
the bitmap and qwords length of a certain class of SIMD/PRED register on
current platform for intr-regs and user-regs sampling.
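
To make the bitmap/qwords pair concrete, here is a small standalone
sketch. It does not call the perf helpers listed above; class_bitmap()
and class_dump_bytes() are hypothetical names used only for illustration,
and the register counts and widths are copied from the enums added to
perf_regs.h in patch 1/4.

```c
#include <stdint.h>

/* Register counts and widths, copied from the enums in patch 1/4. */
enum { XMM_REGS = 16, YMM_REGS = 16, ZMM_REGS = 32, OPMASK_REGS = 8 };
enum { OPMASK_QWORDS = 1, XMM_QWORDS = 2, YMM_QWORDS = 4, ZMM_QWORDS = 8 };

/* Hypothetical helper: bitmap with the low nr_regs bits set. */
static uint64_t class_bitmap(unsigned int nr_regs)
{
	return nr_regs >= 64 ? ~0ULL : (1ULL << nr_regs) - 1;
}

/* Hypothetical helper: bytes needed to dump every register in bitmap. */
static uint64_t class_dump_bytes(uint64_t bitmap, uint16_t qwords)
{
	uint64_t nr = 0;

	/* Clear the lowest set bit on each pass to count set bits. */
	for (; bitmap; bitmap &= bitmap - 1)
		nr++;
	return nr * qwords * sizeof(uint64_t);
}
```

Under these assumptions, a class of 32 registers at 8 qwords each (the
ZMM case) yields a 32-bit-wide bitmap and 32 * 8 * 8 = 2048 bytes of
register data per sample.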
For example, for the XMM registers on x86 platforms, the returned bitmap
is 0xffff (XMM0 ~ XMM15) and the qwords length is 2 (128 bits for each
XMM register).

The perf_simd_reg_class_name() function gets the register class name for
a certain register class index.

Additionally, the function __parse_regs() is enhanced to support parsing
these newly introduced SIMD/PRED registers. Currently, each class of
register can only be sampled collectively; sampling a specific SIMD
register is not supported. For example, all XMM registers are sampled
together rather than sampling only XMM0.

When multiple overlapping register types, such as XMM and YMM, are
sampled simultaneously, only the superset (YMM registers) is sampled.

With this patch, all supported sampling registers on x86 platforms are
displayed as follows.

 $perf record --intr-regs=?
 available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28 R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7

 $perf record --user-regs=?
  available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28 R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7

Signed-off-by: Dapeng Mi
Reviewed-by: Ian Rogers
---
 tools/perf/util/evsel.c | 27 ++
 tools/perf/util/parse-regs-options.c | 161 +++++++++-
 .../perf/util/perf-regs-arch/perf_regs_x86.c | 292 ++++++++++++++++++
 tools/perf/util/perf_event_attr_fprintf.c | 6 +
 tools/perf/util/perf_regs.c | 72 +++++
 tools/perf/util/perf_regs.h | 11 +
 tools/perf/util/record.h | 6 +
 7 files changed, 565 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index b7fb3f936ae3..a86d2434a4ad 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1583,12 +1583,39 @@ void evsel__config(struct evsel *evsel, struct reco= rd_opts *opts, if (opts->sample_intr_regs && !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) { attr->sample_regs_intr =3D opts->sample_intr_regs; + attr->sample_simd_regs_enabled =3D !!opts->sample_pred_reg_qwords; + evsel__set_sample_bit(evsel, REGS_INTR); + } + + if ((opts->sample_intr_vec_regs || opts->sample_intr_pred_regs) && + !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) { + /* A non-zero pred qwords implies that the SIMD register set is used */ + if (opts->sample_pred_reg_qwords) + attr->sample_simd_pred_reg_qwords =3D opts->sample_pred_reg_qwords; + else + attr->sample_simd_pred_reg_qwords =3D 1; + attr->sample_simd_vec_reg_intr =3D opts->sample_intr_vec_regs; + attr->sample_simd_vec_reg_qwords =3D opts->sample_vec_reg_qwords; + attr->sample_simd_pred_reg_intr =3D opts->sample_intr_pred_regs; evsel__set_sample_bit(evsel, REGS_INTR); } =20 if (opts->sample_user_regs && !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) { attr->sample_regs_user |=3D opts->sample_user_regs; + attr->sample_simd_regs_enabled =3D !!opts->sample_pred_reg_qwords; + evsel__set_sample_bit(evsel, REGS_USER); + } + + if
((opts->sample_user_vec_regs || opts->sample_user_pred_regs) && + !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) { + if (opts->sample_pred_reg_qwords) + attr->sample_simd_pred_reg_qwords =3D opts->sample_pred_reg_qwords; + else + attr->sample_simd_pred_reg_qwords =3D 1; + attr->sample_simd_vec_reg_user =3D opts->sample_user_vec_regs; + attr->sample_simd_vec_reg_qwords =3D opts->sample_vec_reg_qwords; + attr->sample_simd_pred_reg_user =3D opts->sample_user_pred_regs; evsel__set_sample_bit(evsel, REGS_USER); } =20 diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-r= egs-options.c index 518327883b18..f27960846edc 100644 --- a/tools/perf/util/parse-regs-options.c +++ b/tools/perf/util/parse-regs-options.c @@ -9,13 +9,13 @@ #include #include "util/perf_regs.h" #include "util/parse-regs-options.h" +#include "record.h" =20 static void -list_perf_regs(FILE *fp, uint64_t mask, int abi) +__list_gp_regs(FILE *fp, uint64_t mask, int abi) { const char *last_name =3D NULL; =20 - fprintf(fp, "available registers: "); for (int reg =3D 0; reg < 64; reg++) { const char *name; =20 @@ -27,14 +27,68 @@ list_perf_regs(FILE *fp, uint64_t mask, int abi) fprintf(fp, "%s%s", reg > 0 ? " " : "", name); last_name =3D name; } +} + +static void +__list_simd_regs(FILE *fp, uint64_t mask, bool intr, bool pred) +{ + uint64_t bitmap =3D 0; + uint16_t qwords =3D 0; + const char *name; + int i =3D 0; + + for (int reg_c =3D 0; reg_c < 64; reg_c++) { + if (((1ULL << reg_c) & mask) =3D=3D 0) + continue; + + name =3D perf_simd_reg_class_name(EM_HOST, reg_c, pred); + bitmap =3D intr ? + perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, &qwords, pred) : + perf_user_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, &qwords, pred); + if (name && bitmap) + fprintf(fp, "%s%s0-%d", i++ > 0 ? 
" " : "", + name, fls64(bitmap) - 1); + } +} + +static void +list_perf_regs(FILE *fp, uint64_t mask, uint64_t simd_mask, + uint64_t pred_mask, int abi, bool intr) +{ + bool printed =3D false; + + fprintf(fp, "available registers: "); + + if (mask) { + __list_gp_regs(fp, mask, abi); + printed =3D true; + } + + if (simd_mask) { + if (printed) + fprintf(fp, " "); + __list_simd_regs(fp, simd_mask, intr, /*pred=3D*/false); + printed =3D true; + } + + if (pred_mask) { + if (printed) + fprintf(fp, " "); + __list_simd_regs(fp, pred_mask, intr, /*pred=3D*/true); + printed =3D true; + } + fputc('\n', fp); } =20 static uint64_t -name_to_perf_reg_mask(const char *to_match, uint64_t mask, int abi) +name_to_gp_reg_mask(const char *to_match, uint64_t mask, int abi) { uint64_t reg_mask =3D 0; =20 + if (!mask) + return reg_mask; + for (int reg =3D 0; reg < 64; reg++) { const char *name; =20 @@ -51,13 +105,78 @@ name_to_perf_reg_mask(const char *to_match, uint64_t m= ask, int abi) return reg_mask; } =20 +static bool +name_to_simd_reg_mask(struct record_opts *opts, const char *to_match, + uint64_t mask, bool intr, bool pred) +{ + bool matched =3D false; + uint64_t bitmap; + uint16_t qwords; + int reg_c; + + if (!mask) + return false; + + for (reg_c =3D 0; reg_c < 64; reg_c++) { + const char *name; + + if (((1ULL << reg_c) & mask) =3D=3D 0) + continue; + + name =3D perf_simd_reg_class_name(EM_HOST, reg_c, pred); + if (!name) + continue; + + if (!strcasecmp(to_match, name)) { + matched =3D true; + break; + } + } + + if (!matched) + return false; + + if (intr) { + bitmap =3D perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, + reg_c, &qwords, pred); + } else { + bitmap =3D perf_user_simd_reg_class_bitmap_qwords(EM_HOST, + reg_c, &qwords, pred); + } + + /* Just need the highest qwords */ + if (pred) { + if (qwords >=3D opts->sample_pred_reg_qwords) { + opts->sample_pred_reg_qwords =3D qwords; + if (intr) + opts->sample_intr_pred_regs =3D bitmap; + else + opts->sample_user_pred_regs =3D 
bitmap; + } + } else { + if (qwords >=3D opts->sample_vec_reg_qwords) { + opts->sample_vec_reg_qwords =3D qwords; + if (intr) + opts->sample_intr_vec_regs =3D bitmap; + else + opts->sample_user_vec_regs =3D bitmap; + } + } + + return true; +} + static int __parse_regs(const struct option *opt, const char *str, int unset, bool in= tr) { uint64_t *mode =3D (uint64_t *)opt->value; + struct record_opts *opts; char *s, *os =3D NULL, *p; - int ret =3D -1; + uint64_t simd_mask; + uint64_t pred_mask; uint64_t mask; + bool matched; + int ret =3D -1; int abi; =20 if (unset) @@ -69,11 +188,16 @@ __parse_regs(const struct option *opt, const char *str= , int unset, bool intr) if (*mode) return -1; =20 - mask =3D intr ? perf_intr_reg_mask(EM_HOST, &abi) : perf_user_reg_mask(EM= _HOST, &abi); + mask =3D intr ? perf_intr_reg_mask(EM_HOST, &abi) : + perf_user_reg_mask(EM_HOST, &abi); + opts =3D intr ? container_of(opt->value, struct record_opts, sample_intr_= regs) : + container_of(opt->value, struct record_opts, sample_user_regs); =20 /* str may be NULL in case no arg is passed to -I */ if (!str) { *mode =3D mask; + if (abi & PERF_SAMPLE_REGS_ABI_SIMD) + opts->sample_pred_reg_qwords =3D 1; return 0; } =20 @@ -82,6 +206,14 @@ __parse_regs(const struct option *opt, const char *str,= int unset, bool intr) if (!s) return -1; =20 + if (intr) { + simd_mask =3D perf_intr_simd_reg_class_mask(EM_HOST, /*pred=3D*/false); + pred_mask =3D perf_intr_simd_reg_class_mask(EM_HOST, /*pred=3D*/true); + } else { + simd_mask =3D perf_user_simd_reg_class_mask(EM_HOST, /*pred=3D*/false); + pred_mask =3D perf_user_simd_reg_class_mask(EM_HOST, /*pred=3D*/true); + } + for (;;) { uint64_t reg_mask; =20 @@ -90,15 +222,24 @@ __parse_regs(const struct option *opt, const char *str= , int unset, bool intr) *p =3D '\0'; =20 if (!strcmp(s, "?")) { - list_perf_regs(stderr, mask, abi); + list_perf_regs(stderr, mask, simd_mask, pred_mask, abi, intr); goto error; } =20 - reg_mask =3D name_to_perf_reg_mask(s, mask, 
abi); - if (reg_mask =3D=3D 0) { - ui__warning("Unknown register \"%s\", check man page or run \"perf reco= rd %s?\"\n", + reg_mask =3D name_to_gp_reg_mask(s, mask, abi); + if (reg_mask) { + if (abi & PERF_SAMPLE_REGS_ABI_SIMD) + opts->sample_pred_reg_qwords =3D 1; + } else { + matched =3D name_to_simd_reg_mask(opts, s, simd_mask, + intr, /*pred=3D*/false) || + name_to_simd_reg_mask(opts, s, pred_mask, + intr, /*pred=3D*/true); + if (!matched) { + ui__warning("Unknown register \"%s\", check man page or run \"perf rec= ord %s?\"\n", s, intr ? "-I" : "--user-regs=3D"); - goto error; + goto error; + } } *mode |=3D reg_mask; =20 diff --git a/tools/perf/util/perf-regs-arch/perf_regs_x86.c b/tools/perf/ut= il/perf-regs-arch/perf_regs_x86.c index 3e9241a11a95..867059fc3cb0 100644 --- a/tools/perf/util/perf-regs-arch/perf_regs_x86.c +++ b/tools/perf/util/perf-regs-arch/perf_regs_x86.c @@ -461,3 +461,295 @@ uint64_t __perf_reg_sp_x86(void) { return PERF_REG_X86_SP; } + +enum { + PERF_REG_CLASS_X86_OPMASK =3D 0, + PERF_REG_CLASS_X86_XMM, + PERF_REG_CLASS_X86_YMM, + PERF_REG_CLASS_X86_ZMM, + PERF_REG_X86_MAX_SIMD_CLASSES, +}; + +#define PERF_REG_CLASS_X86_PRED_MASK (BIT(PERF_REG_CLASS_X86_OPMASK)) +#define PERF_REG_CLASS_X86_SIMD_MASK (BIT(PERF_REG_CLASS_X86_XMM) | \ + BIT(PERF_REG_CLASS_X86_YMM) | \ + BIT(PERF_REG_CLASS_X86_ZMM)) + +/* + * Probe whether the kernel perf subsystem supports sampling a given + * kind of SIMD register (OPMASK/XMM/YMM/ZMM). + * + * @sample_type: PERF_SAMPLE_REGS_INTR or PERF_SAMPLE_REGS_USER + * @qwords: the length of the SIMD register, like 1/2/4/8 qwords for + * OPMASK/XMM/YMM/ZMM registers. + * @mask: the bitmask of the SIMD registers, like 0xffff for XMM0 ~ XMM15 + * @pred: whether it is a predicate register, like the OPMASK registers. + * + * Return value: true if supported, false otherwise.
+ */ +static bool +__support_simd_reg_class(uint64_t sample_type, uint16_t qwords, + uint64_t mask, bool pred) +{ + struct perf_event_attr attr =3D { + .type =3D PERF_TYPE_HARDWARE, + .config =3D PERF_COUNT_HW_CPU_CYCLES, + .sample_type =3D sample_type, + .disabled =3D 1, + .exclude_kernel =3D 1, + .sample_simd_regs_enabled =3D 1, + }; + int fd; + + attr.sample_period =3D 1; + + if (!pred) { + attr.sample_simd_vec_reg_qwords =3D qwords; + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) + attr.sample_simd_vec_reg_intr =3D mask; + else + attr.sample_simd_vec_reg_user =3D mask; + } else { + attr.sample_simd_pred_reg_qwords =3D PERF_X86_OPMASK_QWORDS; + if (sample_type =3D=3D PERF_SAMPLE_REGS_INTR) + attr.sample_simd_pred_reg_intr =3D PERF_X86_SIMD_PRED_MASK; + else + attr.sample_simd_pred_reg_user =3D PERF_X86_SIMD_PRED_MASK; + } + + if (perf_pmus__num_core_pmus() > 1) { + __u64 type =3D perf_pmus__find_core_pmu()->type; + + attr.config |=3D type << PERF_PMU_TYPE_SHIFT; + } + + event_attr_init(&attr); + + fd =3D sys_perf_event_open(&attr, 0, -1, -1, 0); + if (fd !=3D -1) { + close(fd); + return true; + } + + return false; +} + +#define PERF_X86_SIMD_ZMMH_REGS (PERF_X86_SIMD_ZMM_REGS / 2) + +static bool __arch_has_simd_reg_class(uint64_t sample_type, int reg_class, + uint64_t *mask, uint16_t *qwords) +{ + bool supported =3D false; + uint64_t bits; + + *mask =3D 0; + *qwords =3D 0; + + switch (reg_class) { + case PERF_REG_CLASS_X86_OPMASK: + bits =3D BIT_ULL(PERF_X86_SIMD_OPMASK_REGS) - 1; + supported =3D __support_simd_reg_class(sample_type, + PERF_X86_OPMASK_QWORDS, + bits, true); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_OPMASK_QWORDS; + } + break; + case PERF_REG_CLASS_X86_XMM: + bits =3D BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1; + supported =3D __support_simd_reg_class(sample_type, + PERF_X86_XMM_QWORDS, + bits, false); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_XMM_QWORDS; + } + break; + case PERF_REG_CLASS_X86_YMM: + bits =3D 
BIT_ULL(PERF_X86_SIMD_YMM_REGS) - 1; + supported =3D __support_simd_reg_class(sample_type, + PERF_X86_YMM_QWORDS, + bits, false); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_YMM_QWORDS; + } + break; + case PERF_REG_CLASS_X86_ZMM: + bits =3D BIT_ULL(PERF_X86_SIMD_ZMM_REGS) - 1; + supported =3D __support_simd_reg_class(sample_type, + PERF_X86_ZMM_QWORDS, + bits, false); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_ZMM_QWORDS; + break; + } + + bits =3D BIT_ULL(PERF_X86_SIMD_ZMMH_REGS) - 1; + supported =3D __support_simd_reg_class(sample_type, + PERF_X86_ZMM_QWORDS, + bits, false); + if (supported) { + *mask =3D bits; + *qwords =3D PERF_X86_ZMM_QWORDS; + } + break; + default: + break; + } + + return supported; +} + +static bool __support_simd_sampling(void) +{ + uint64_t mask =3D BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1; + uint16_t qwords =3D PERF_X86_XMM_QWORDS; + static bool simd_sampling_supported; + static bool cached; + + if (cached) + return simd_sampling_supported; + + simd_sampling_supported =3D + __arch_has_simd_reg_class(PERF_SAMPLE_REGS_INTR, + PERF_REG_CLASS_X86_XMM, + &mask, &qwords); + simd_sampling_supported |=3D + __arch_has_simd_reg_class(PERF_SAMPLE_REGS_USER, + PERF_REG_CLASS_X86_XMM, + &mask, &qwords); + cached =3D true; + + return simd_sampling_supported; +} + +/* + * @x86_intr_simd_cached: indicates the data of below 3 + * x86_intr_simd_* items has been retrieved from kernel and cached. + * @x86_intr_simd_reg_class_mask: indicates which kinds of PRED/SIMD + * registers are supported for intr-regs option. Assume kernel perf + * subsystem supports XMM/YMM sampling, then the mask is + * PERF_REG_CLASS_X86_XMM|PERF_REG_CLASS_X86_YMM. + * @x86_intr_simd_mask: indicates register bitmask for each kind of + * supported PRED/SIMD register, like + * x86_intr_simd_mask[PERF_REG_CLASS_X86_XMM] =3D 0xffff. 
+ * @x86_intr_simd_qwords: indicates the register length (in qwords) + * for each kind of supported PRED/SIMD register, like + * x86_intr_simd_qwords[PERF_REG_CLASS_X86_XMM] =3D 2. + */ +static bool x86_intr_simd_cached; +static uint64_t x86_intr_simd_reg_class_mask; +static uint64_t x86_intr_simd_mask[PERF_REG_X86_MAX_SIMD_CLASSES]; +static uint16_t x86_intr_simd_qwords[PERF_REG_X86_MAX_SIMD_CLASSES]; + +/* + * Similar to the x86_intr_simd_* items above, except that these + * items are used for the user-regs option. + */ +static bool x86_user_simd_cached; +static uint64_t x86_user_simd_reg_class_mask; +static uint64_t x86_user_simd_mask[PERF_REG_X86_MAX_SIMD_CLASSES]; +static uint16_t x86_user_simd_qwords[PERF_REG_X86_MAX_SIMD_CLASSES]; + +static uint64_t __arch__simd_reg_class_mask(bool intr) +{ + uint64_t mask =3D 0; + bool supported; + int reg_c; + + if (!__support_simd_sampling()) + return 0; + + if (intr && x86_intr_simd_cached) + return x86_intr_simd_reg_class_mask; + + if (!intr && x86_user_simd_cached) + return x86_user_simd_reg_class_mask; + + for (reg_c =3D 0; reg_c < PERF_REG_X86_MAX_SIMD_CLASSES; reg_c++) { + supported =3D false; + + if (intr) { + supported =3D __arch_has_simd_reg_class( + PERF_SAMPLE_REGS_INTR, + reg_c, + &x86_intr_simd_mask[reg_c], + &x86_intr_simd_qwords[reg_c]); + } else { + supported =3D __arch_has_simd_reg_class( + PERF_SAMPLE_REGS_USER, + reg_c, + &x86_user_simd_mask[reg_c], + &x86_user_simd_qwords[reg_c]); + } + if (supported) + mask |=3D BIT_ULL(reg_c); + } + + if (intr) { + x86_intr_simd_reg_class_mask =3D mask; + x86_intr_simd_cached =3D true; + } else { + x86_user_simd_reg_class_mask =3D mask; + x86_user_simd_cached =3D true; + } + + return mask; +} + +static uint64_t +__arch__simd_reg_class_bitmap_qwords(bool intr, int reg_c, uint16_t *qword= s) +{ + uint64_t mask =3D 0; + + *qwords =3D 0; + if (reg_c >=3D PERF_REG_X86_MAX_SIMD_CLASSES) + return mask; + + if (intr) { + mask =3D x86_intr_simd_mask[reg_c]; + *qwords =3D
x86_intr_simd_qwords[reg_c]; + } else { + mask =3D x86_user_simd_mask[reg_c]; + *qwords =3D x86_user_simd_qwords[reg_c]; + } + + return mask; +} + +uint64_t __perf_simd_reg_class_mask_x86(bool intr, bool pred) +{ + uint64_t mask =3D __arch__simd_reg_class_mask(intr); + + return pred ? mask & PERF_REG_CLASS_X86_PRED_MASK : + mask & PERF_REG_CLASS_X86_SIMD_MASK; +} + +uint64_t __perf_simd_reg_class_bitmap_qwords_x86(int reg_c, uint16_t *qwor= ds, + bool intr, bool pred) +{ + if (intr ? !x86_intr_simd_cached : !x86_user_simd_cached) + __perf_simd_reg_class_mask_x86(intr, pred); + return __arch__simd_reg_class_bitmap_qwords(intr, reg_c, qwords); +} + +const char *__perf_simd_reg_class_name_x86(int id, bool pred __maybe_unuse= d) +{ + switch (id) { + case PERF_REG_CLASS_X86_OPMASK: + return "OPMASK"; + case PERF_REG_CLASS_X86_XMM: + return "XMM"; + case PERF_REG_CLASS_X86_YMM: + return "YMM"; + case PERF_REG_CLASS_X86_ZMM: + return "ZMM"; + default: + return NULL; + } + + return NULL; +} diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/pe= rf_event_attr_fprintf.c index 741c3d657a8b..c6b8e53e06fd 100644 --- a/tools/perf/util/perf_event_attr_fprintf.c +++ b/tools/perf/util/perf_event_attr_fprintf.c @@ -362,6 +362,12 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_eve= nt_attr *attr, PRINT_ATTRf(aux_start_paused, p_unsigned); PRINT_ATTRf(aux_pause, p_unsigned); PRINT_ATTRf(aux_resume, p_unsigned); + PRINT_ATTRf(sample_simd_pred_reg_qwords, p_unsigned); + PRINT_ATTRf(sample_simd_pred_reg_intr, p_hex); + PRINT_ATTRf(sample_simd_pred_reg_user, p_hex); + PRINT_ATTRf(sample_simd_vec_reg_qwords, p_unsigned); + PRINT_ATTRf(sample_simd_vec_reg_intr, p_hex); + PRINT_ATTRf(sample_simd_vec_reg_user, p_hex); =20 return ret; } diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c index bdd2eef13bc3..0ad40421f34e 100644 --- a/tools/perf/util/perf_regs.c +++ b/tools/perf/util/perf_regs.c @@ -248,3 +248,75 @@ uint64_t perf_arch_reg_sp(uint16_t e_machine) return 0; } } +
+uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred) +{ + switch (e_machine) { + case EM_386: + case EM_X86_64: + return __perf_simd_reg_class_mask_x86(/*intr=3D*/true, pred); + default: + return 0; + } +} + +uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred) +{ + switch (e_machine) { + case EM_386: + case EM_X86_64: + return __perf_simd_reg_class_mask_x86(/*intr=3D*/false, pred); + default: + return 0; + } +} + +uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int re= g_c, + uint16_t *qwords, bool pred) +{ + switch (e_machine) { + case EM_386: + case EM_X86_64: + return __perf_simd_reg_class_bitmap_qwords_x86(reg_c, qwords, + /*intr=3D*/true, + pred); + default: + *qwords =3D 0; + return 0; + } +} + +uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int re= g_c, + uint16_t *qwords, bool pred) +{ + switch (e_machine) { + case EM_386: + case EM_X86_64: + return __perf_simd_reg_class_bitmap_qwords_x86(reg_c, qwords, + /*intr=3D*/false, + pred); + default: + *qwords =3D 0; + return 0; + } +} + +const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred) +{ + const char *name =3D NULL; + + switch (e_machine) { + case EM_386: + case EM_X86_64: + name =3D __perf_simd_reg_class_name_x86(id, pred); + break; + default: + break; + } + if (name) + return name; + + pr_debug("Failed to find %s register %d for ELF machine type %u\n", + pred ? 
"PRED" : "SIMD", id, e_machine); + return "unknown"; +} diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h index c9501ca8045d..80d1d7316188 100644 --- a/tools/perf/util/perf_regs.h +++ b/tools/perf/util/perf_regs.h @@ -20,6 +20,13 @@ const char *perf_reg_name(int id, uint16_t e_machine, ui= nt32_t e_flags, int abi) int perf_reg_value(u64 *valp, struct regs_dump *regs, int id); uint64_t perf_arch_reg_ip(uint16_t e_machine); uint64_t perf_arch_reg_sp(uint16_t e_machine); +uint64_t perf_intr_simd_reg_class_mask(uint16_t e_machine, bool pred); +uint64_t perf_user_simd_reg_class_mask(uint16_t e_machine, bool pred); +uint64_t perf_intr_simd_reg_class_bitmap_qwords(uint16_t e_machine, int re= g_c, + uint16_t *qwords, bool pred); +uint64_t perf_user_simd_reg_class_bitmap_qwords(uint16_t e_machine, int re= g_c, + uint16_t *qwords, bool pred); +const char *perf_simd_reg_class_name(uint16_t e_machine, int id, bool pred= ); =20 int __perf_sdt_arg_parse_op_arm64(char *old_op, char **new_op); uint64_t __perf_reg_mask_arm64(bool intr); @@ -68,6 +75,10 @@ uint64_t __perf_reg_mask_x86(bool intr, int *abi); const char *__perf_reg_name_x86(int id, int abi); uint64_t __perf_reg_ip_x86(void); uint64_t __perf_reg_sp_x86(void); +uint64_t __perf_simd_reg_class_mask_x86(bool intr, bool pred); +uint64_t __perf_simd_reg_class_bitmap_qwords_x86(int reg_c, uint16_t *qwor= ds, + bool intr, bool pred); +const char *__perf_simd_reg_class_name_x86(int id, bool pred); =20 static inline uint64_t DWARF_MINIMAL_REGS(uint16_t e_machine) { diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h index 93627c9a7338..37ed44b5f15b 100644 --- a/tools/perf/util/record.h +++ b/tools/perf/util/record.h @@ -62,6 +62,12 @@ struct record_opts { u64 branch_stack; u64 sample_intr_regs; u64 sample_user_regs; + u64 sample_intr_vec_regs; + u64 sample_user_vec_regs; + u32 sample_intr_pred_regs; + u32 sample_user_pred_regs; + u16 sample_vec_reg_qwords; + u16 sample_pred_reg_qwords; u64 
default_interval; u64 user_interval; size_t auxtrace_snapshot_size; --=20 2.34.1 From nobody Tue Feb 10 09:22:10 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.14]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3F009318144; Mon, 9 Feb 2026 08:39:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.14 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770626361; cv=none; b=dHM0sF9WQvszNM5ALsmWHl4htshYVN53nKBOxLoeJvHS9t4o8d0eoyANVInwVqLl5ft7nUbM2Aum4JtAWey3C1IKezP/l3IQkjecM0SKTE589k5ktRSswirAXEknm6Ys6k8nBt4KjwsTHZWTOl2RN2WHBMxfcEWtnUNnGkttSXY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770626361; c=relaxed/simple; bh=E9saGuhJGOU9+I6Yjce+ud/XgxCtUAe4jefhxgreW1c=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=nad/FYhIeQpfch/CLQ+G4rNgcCJsEk6aglk/0ZJ7vBMtQSCy8sr072tH6rI0JeB9+bSC0gwLK5RmlIvqLyh2k3YC8u4/2QrBzWviSGaWwSke5q1mw9z6hgWBnPefe3fJZ9prKlZ00z55WdKp7V/RHjzG+odRO+bt5jLKtfm+1BM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=biTJAkH0; arc=none smtp.client-ip=198.175.65.14 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="biTJAkH0" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1770626362; x=1802162362; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
bh=E9saGuhJGOU9+I6Yjce+ud/XgxCtUAe4jefhxgreW1c=; b=biTJAkH0l37AQiKQ9FJblFJ+UHWGY2cD77SovSFsVxQRK1MEwbNes4Ae XUNEyhzSQ3dOPqhdO5OUZMq46ik44B63FGFHNOfRhUC6xjWPFZlTXbfqs d+khOgGXB83HhJdv+ocxr5VLHxBZgPxSg7Bs+BOqZrpGPHXiwdCZCKlqg PwyaNAQ0QfAuxK9kV1Urwi9ZD4FMDv2S2S+93Y3fTkHJdNYc40uri7cQW JrIpynOA3Xy/Gjyxn+fm/XkAH2Ji0/bu+UElQQIIuoIS4Q8fkbmjp7Syy cmkL2EKhN2+A+HSEoQ7eZSAbSpIRTd8FtZd/yJCjdM3I4De13ePrn8wCK A==; X-CSE-ConnectionGUID: uVIygSjhQFGU0y5v0py1NA== X-CSE-MsgGUID: AJknPYOGSzyyuwIV0a4yfA== X-IronPort-AV: E=McAfee;i="6800,10657,11695"; a="75580733" X-IronPort-AV: E=Sophos;i="6.21,281,1763452800"; d="scan'208";a="75580733" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by orvoesa106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Feb 2026 00:39:21 -0800 X-CSE-ConnectionGUID: S25R2+EyQWaYUkSqdxWUrg== X-CSE-MsgGUID: l2aZxljuQMeIqE6bWMGOew== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.21,281,1763452800"; d="scan'208";a="211582346" Received: from spr.sh.intel.com ([10.112.229.196]) by orviesa007.jf.intel.com with ESMTP; 09 Feb 2026 00:39:18 -0800 From: Dapeng Mi To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Namhyung Kim , Ian Rogers , Adrian Hunter , Alexander Shishkin Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Zide Chen , Falcon Thomas , Dapeng Mi , Xudong Hao , Kan Liang , Dapeng Mi Subject: [Patch v6 4/4] perf regs: Enable dumping of SIMD registers Date: Mon, 9 Feb 2026 16:35:14 +0800 Message-Id: <20260209083514.2225115-5-dapeng1.mi@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260209083514.2225115-1-dapeng1.mi@linux.intel.com> References: <20260209083514.2225115-1-dapeng1.mi@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kan Liang This patch adds support for dumping SIMD registers using the new 
PERF_SAMPLE_REGS_ABI_SIMD ABI. Currently, the XMM, YMM, ZMM, OPMASK, eGPRs, and SSP registers on x86 platforms are supported with the PERF_SAMPLE_REGS_ABI_SIMD ABI.

An example of the output is displayed below.

Example:

  $ perf record -e cycles:p -IXMM,YMM,OPMASK,SSP ./test
  $ perf report -D
  ... ...
  237538985992962 0x454d0 [0x480]: PERF_RECORD_SAMPLE(IP, 0x1): 179370/179370: 0xffffffff969627fc period: 124999 addr: 0
  ... intr regs: mask 0x20000000000 ABI 64-bit
  .... SSP 0x0000000000000000
  ... SIMD ABI nr_vectors 32 vector_qwords 4 nr_pred 8 pred_qwords 1
  .... YMM [0] 0x0000000000004000
  .... YMM [0] 0x000055e828695270
  .... YMM [0] 0x0000000000000000
  .... YMM [0] 0x0000000000000000
  .... YMM [1] 0x000055e8286990e0
  .... YMM [1] 0x000055e828698dd0
  .... YMM [1] 0x0000000000000000
  .... YMM [1] 0x0000000000000000
  ... ...
  .... YMM [31] 0x0000000000000000
  .... YMM [31] 0x0000000000000000
  .... YMM [31] 0x0000000000000000
  .... YMM [31] 0x0000000000000000
  .... OPMASK[0] 0x0000000000100221
  .... OPMASK[1] 0x0000000000000020
  .... OPMASK[2] 0x000000007fffffff
  .... OPMASK[3] 0x0000000000000000
  .... OPMASK[4] 0x0000000000000000
  .... OPMASK[5] 0x0000000000000000
  .... OPMASK[6] 0x0000000000000000
  .... OPMASK[7] 0x0000000000000000
  ... ...
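As a rough illustration of the "SIMD ABI" header line in the dump above, the sketch below decodes a packed 64-bit config word into the four counts and derives the size of the SIMD payload with the same formula evsel__parse_sample() uses in this patch. The names struct simd_layout, simd_layout_decode() and simd_payload_bytes() are hypothetical and not part of perf, and the shift-based decoding assumes the little-endian field order of the anonymous struct this patch adds to struct regs_dump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the four u16 fields that overlay regs_dump.config. */
struct simd_layout {
	uint16_t nr_vectors;    /* number of vector registers in the sample */
	uint16_t vector_qwords; /* qwords per vector register (2/4/8 on x86) */
	uint16_t nr_pred;       /* number of predicate (OPMASK) registers */
	uint16_t pred_qwords;   /* qwords per predicate register */
};

/*
 * Assumption: little-endian host, so the first u16 field occupies the
 * low 16 bits of the config word.
 */
static struct simd_layout simd_layout_decode(uint64_t config)
{
	struct simd_layout l = {
		.nr_vectors    = (uint16_t)(config & 0xffff),
		.vector_qwords = (uint16_t)((config >> 16) & 0xffff),
		.nr_pred       = (uint16_t)((config >> 32) & 0xffff),
		.pred_qwords   = (uint16_t)((config >> 48) & 0xffff),
	};

	return l;
}

/* Bytes of register data that follow the config word in the sample. */
static size_t simd_payload_bytes(struct simd_layout l)
{
	size_t qwords = (size_t)l.nr_vectors * l.vector_qwords +
			(size_t)l.nr_pred * l.pred_qwords;

	return qwords * sizeof(uint64_t);
}
```

With the values from the example (32 vectors of 4 qwords plus 8 predicate registers of 1 qword), this gives (32 * 4 + 8 * 1) * 8 = 1088 bytes of SIMD data after the config word.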
Signed-off-by: Kan Liang Co-developed-by: Dapeng Mi Signed-off-by: Dapeng Mi --- tools/perf/util/evsel.c | 20 ++++++++++ tools/perf/util/sample.h | 10 +++++ tools/perf/util/session.c | 77 +++++++++++++++++++++++++++++++++++---- 3 files changed, 99 insertions(+), 8 deletions(-) diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index a86d2434a4ad..2e1d50a72762 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -3514,6 +3514,16 @@ int evsel__parse_sample(struct evsel *evsel, union p= erf_event *event, regs->mask =3D mask; regs->regs =3D (u64 *)array; array =3D (void *)array + sz; + + if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) { + regs->config =3D *(u64 *)array; + array =3D (void *)array + sizeof(u64); + regs->data =3D (u64 *)array; + sz =3D (regs->nr_vectors * regs->vector_qwords + + regs->nr_pred * regs->pred_qwords) * sizeof(u64); + OVERFLOW_CHECK(array, sz, max_size); + array =3D (void *)array + sz; + } } } =20 @@ -3571,6 +3581,16 @@ int evsel__parse_sample(struct evsel *evsel, union p= erf_event *event, regs->mask =3D mask; regs->regs =3D (u64 *)array; array =3D (void *)array + sz; + + if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) { + regs->config =3D *(u64 *)array; + array =3D (void *)array + sizeof(u64); + regs->data =3D (u64 *)array; + sz =3D (regs->nr_vectors * regs->vector_qwords + + regs->nr_pred * regs->pred_qwords) * sizeof(u64); + OVERFLOW_CHECK(array, sz, max_size); + array =3D (void *)array + sz; + } } } =20 diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h index 3cce8dd202aa..b98bc58d365e 100644 --- a/tools/perf/util/sample.h +++ b/tools/perf/util/sample.h @@ -15,6 +15,16 @@ struct regs_dump { u64 abi; u64 mask; u64 *regs; + union { + u64 config; + struct { + u16 nr_vectors; + u16 vector_qwords; + u16 nr_pred; + u16 pred_qwords; + }; + }; + u64 *data; =20 /* Cached values/mask filled by first register access. 
*/ u64 cache_regs[PERF_SAMPLE_REGS_CACHE_SIZE]; diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 7cf7bf86205d..fba8ef52f0a1 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -972,18 +972,77 @@ static void regs_dump__printf(u64 mask, struct regs_d= ump *regs, } } =20 -static const char *regs_abi[] =3D { - [PERF_SAMPLE_REGS_ABI_NONE] =3D "none", - [PERF_SAMPLE_REGS_ABI_32] =3D "32-bit", - [PERF_SAMPLE_REGS_ABI_64] =3D "64-bit", -}; +static void simd_regs_dump__printf(struct regs_dump *regs, bool intr) +{ + const char *name =3D "unknown"; + int i, idx =3D 0; + uint16_t qwords; + int reg_c; + + if (!(regs->abi & PERF_SAMPLE_REGS_ABI_SIMD)) + return; + + printf("... SIMD ABI nr_vectors %d vector_qwords %d nr_pred %d pred_qword= s %d\n", + regs->nr_vectors, regs->vector_qwords, + regs->nr_pred, regs->pred_qwords); + + for (reg_c =3D 0; reg_c < 64; reg_c++) { + if (intr) { + perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, + &qwords, /*pred=3D*/false); + } else { + perf_user_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, + &qwords, /*pred=3D*/false); + } + if (regs->vector_qwords =3D=3D qwords) { + name =3D perf_simd_reg_class_name(EM_HOST, reg_c, /*pred=3D*/false); + break; + } + } + + for (i =3D 0; i < regs->nr_vectors; i++) { + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + if (regs->vector_qwords > 2) { + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + } + if (regs->vector_qwords > 4) { + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + printf(".... 
%-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); + } + } + + name =3D "unknown"; + for (reg_c =3D 0; reg_c < 64; reg_c++) { + if (intr) { + perf_intr_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, + &qwords, /*pred=3D*/true); + } else { + perf_user_simd_reg_class_bitmap_qwords(EM_HOST, reg_c, + &qwords, /*pred=3D*/true); + } + if (regs->pred_qwords =3D=3D qwords) { + name =3D perf_simd_reg_class_name(EM_HOST, reg_c, /*pred=3D*/true); + break; + } + } + for (i =3D 0; i < regs->nr_pred; i++) + printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->data[idx++]); +} =20 static inline const char *regs_dump_abi(struct regs_dump *d) { - if (d->abi > PERF_SAMPLE_REGS_ABI_64) - return "unknown"; + if (!d->abi) + return "none"; + if (d->abi & PERF_SAMPLE_REGS_ABI_32) + return "32-bit"; + else if (d->abi & PERF_SAMPLE_REGS_ABI_64) + return "64-bit"; =20 - return regs_abi[d->abi]; + return "unknown"; } =20 static void regs__printf(const char *type, struct regs_dump *regs, @@ -1010,6 +1069,7 @@ static void regs_user__printf(struct perf_sample *sam= ple, uint16_t e_machine, ui =20 if (user_regs->regs) regs__printf("user", user_regs, e_machine, e_flags); + simd_regs_dump__printf(user_regs, /*intr=3D*/false); } =20 static void regs_intr__printf(struct perf_sample *sample, uint16_t e_machi= ne, uint32_t e_flags) @@ -1023,6 +1083,7 @@ static void regs_intr__printf(struct perf_sample *sam= ple, uint16_t e_machine, ui =20 if (intr_regs->regs) regs__printf("intr", intr_regs, e_machine, e_flags); + simd_regs_dump__printf(intr_regs, /*intr=3D*/true); } =20 static void stack_user__printf(struct stack_dump *dump) --=20 2.34.1
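simd_regs_dump__printf() above consumes regs->data strictly in order: all vector registers first, vector_qwords entries each, followed by all predicate registers, pred_qwords entries each. A minimal sketch of that indexing, assuming this layout holds, is shown below; struct simd_slot and simd_slot_lookup() are hypothetical helpers for illustration only and do not exist in perf.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of what one qword in regs->data holds. */
struct simd_slot {
	bool pred;      /* true: predicate register, false: vector register */
	uint16_t reg;   /* register index within its class */
	uint16_t qword; /* qword index within that register */
};

/*
 * Map a flat index into the SIMD data array to (class, register, qword).
 * Returns false if idx lies beyond the payload described by the counts.
 */
static bool simd_slot_lookup(uint16_t nr_vectors, uint16_t vector_qwords,
			     uint16_t nr_pred, uint16_t pred_qwords,
			     uint32_t idx, struct simd_slot *slot)
{
	uint32_t vec_total = (uint32_t)nr_vectors * vector_qwords;
	uint32_t pred_total = (uint32_t)nr_pred * pred_qwords;

	/* Vector registers come first; vec_total > 0 implies vector_qwords > 0. */
	if (idx < vec_total) {
		slot->pred = false;
		slot->reg = (uint16_t)(idx / vector_qwords);
		slot->qword = (uint16_t)(idx % vector_qwords);
		return true;
	}

	/* Predicate registers follow directly after the vector block. */
	idx -= vec_total;
	if (idx < pred_total) {
		slot->pred = true;
		slot->reg = (uint16_t)(idx / pred_qwords);
		slot->qword = (uint16_t)(idx % pred_qwords);
		return true;
	}

	return false;
}
```

A loop over such slots would print the same sequence as the unrolled printf() calls in the patch, without depending on vector_qwords being exactly 2, 4 or 8.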