From: Kan Liang <kan.liang@linux.intel.com>
Starting from Intel Ice Lake, the XMM registers can be collected in
a PEBS record. More registers, e.g., YMM, ZMM, OPMASK, SSP and APX, will
be added in the upcoming Architecture PEBS as well. But that requires
hardware support.
The patch set provides a software solution to mitigate the hardware
requirement. It uses the XSAVES instruction to retrieve the requested
registers in the overflow handler. The feature is no longer limited to
PEBS events or specific platforms.
The hardware solution (if available) is still preferred, since it has
low overhead (especially with the large PEBS) and is more accurate.
In theory, the solution should work for all X86 platforms. But I only
have newer Intel platforms to test. The patch set only enables the
feature for Intel Ice Lake and later platforms.
Open:
The new registers include YMM, ZMM, OPMASK, SSP, and APX.
The sample_regs_user/intr bitmaps have run out of bits. A new field in
struct perf_event_attr is required for the registers.
There are several options for the new field, as listed below.
- Follow a similar format to XSAVES. Introduce the below fields to store
the bitmap of the registers.
struct perf_event_attr {
...
__u64 sample_ext_regs_intr[2];
__u64 sample_ext_regs_user[2];
...
}
Includes YMMH (16 bits), APX (16 bits), OPMASK (8 bits),
ZMMH0-15 (16 bits), H16ZMM (16 bits), SSP
For example, if a user wants YMM8, the perf tool needs to set the
corresponding bits of XMM8 and YMMH8, and reconstruct the result.
The method is similar to the existing method for
sample_regs_user/intr, and matches the XSAVES format.
The kernel doesn't need to do extra configuration and reconstruction.
It's implemented in the patch set.
- Similar to the above method. But the fields are the bitmap of the
complete registers, E.g., YMM (16 bits), APX (16 bits),
OPMASK (8 bits), ZMM (32 bits), SSP.
The kernel needs to do extra configuration and reconstruction,
which may bring extra overhead.
- Combine the XMM, YMM, and ZMM. So all the registers can be put into
one u64 field.
...
union {
__u64 sample_ext_regs_intr; //sample_ext_regs_user is similar
struct {
__u32 vector_bitmap;
__u32 vector_type : 3, //0b001 XMM 0b010 YMM 0b100 ZMM
apx_bitmap : 16,
opmask_bitmap : 8,
ssp_bitmap : 1,
reserved : 4,
};
...
For example, if the YMM8-15 is required,
vector_bitmap: 0x0000ff00
vector_type: 0x2
This method can save two __u64 in the struct perf_event_attr.
But it's not straightforward since it mixes the type and bitmap.
The kernel also needs to do extra configuration and reconstruction.
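As an illustration of the third option only, the YMM8-15 example can be sketched with explicit shifts rather than C bitfields, since bitfield layout is compiler-dependent. The names ext_regs_pack(), ext_vector_type and the EXT_* macros are hypothetical and not part of the patch set; the bit positions assume the fields are allocated from the least significant bit in the order listed above.

```c
#include <stdint.h>

/* Hypothetical names; values follow the vector_type comment in option 3. */
enum ext_vector_type {
	EXT_VECTOR_XMM = 0x1,	/* 0b001 */
	EXT_VECTOR_YMM = 0x2,	/* 0b010 */
	EXT_VECTOR_ZMM = 0x4,	/* 0b100 */
};

/* Assumed bit positions: fields packed from bit 0 upwards. */
#define EXT_VECTOR_TYPE_SHIFT	32	/* vector_type   : 3  */
#define EXT_APX_SHIFT		35	/* apx_bitmap    : 16 */
#define EXT_OPMASK_SHIFT	51	/* opmask_bitmap : 8  */
#define EXT_SSP_SHIFT		59	/* ssp_bitmap    : 1  */

/* Pack one request into the single u64 described by option 3. */
static inline uint64_t ext_regs_pack(uint32_t vector_bitmap,
				     enum ext_vector_type type,
				     uint16_t apx_bitmap,
				     uint8_t opmask_bitmap,
				     int ssp)
{
	return (uint64_t)vector_bitmap |
	       ((uint64_t)(type & 0x7) << EXT_VECTOR_TYPE_SHIFT) |
	       ((uint64_t)apx_bitmap << EXT_APX_SHIFT) |
	       ((uint64_t)opmask_bitmap << EXT_OPMASK_SHIFT) |
	       ((uint64_t)(ssp ? 1 : 0) << EXT_SSP_SHIFT);
}

/* Accessors for the two fields used in the YMM8-15 example. */
static inline uint32_t ext_regs_vector_bitmap(uint64_t v)
{
	return (uint32_t)v;
}

static inline unsigned int ext_regs_vector_type(uint64_t v)
{
	return (unsigned int)((v >> EXT_VECTOR_TYPE_SHIFT) & 0x7);
}
```

Calling ext_regs_pack(0x0000ff00, EXT_VECTOR_YMM, 0, 0, 0) yields vector_bitmap 0x0000ff00 and vector_type 0x2, which is the YMM8-15 case. It also shows why the option is less straightforward: a single type applies to every bit set in vector_bitmap.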
Please let me know if there are more ideas.
Thanks,
Kan
Kan Liang (12):
perf/x86: Use x86_perf_regs in the x86 nmi handler
perf/x86: Setup the regs data
x86/fpu/xstate: Add xsaves_nmi
perf: Move has_extended_regs() to header file
perf/x86: Support XMM register for non-PEBS and REGS_USER
perf: Support extension of sample_regs
perf/x86: Add YMMH in extended regs
perf/x86: Add APX in extended regs
perf/x86: Add OPMASK in extended regs
perf/x86: Add ZMM in extended regs
perf/x86: Add SSP in extended regs
perf/x86/intel: Support extended registers
arch/arm/kernel/perf_regs.c | 9 +-
arch/arm64/kernel/perf_regs.c | 9 +-
arch/csky/kernel/perf_regs.c | 9 +-
arch/loongarch/kernel/perf_regs.c | 8 +-
arch/mips/kernel/perf_regs.c | 9 +-
arch/powerpc/perf/perf_regs.c | 9 +-
arch/riscv/kernel/perf_regs.c | 8 +-
arch/s390/kernel/perf_regs.c | 9 +-
arch/x86/events/core.c | 226 ++++++++++++++++++++++++--
arch/x86/events/intel/core.c | 49 ++++++
arch/x86/events/intel/ds.c | 12 +-
arch/x86/events/perf_event.h | 58 +++++++
arch/x86/include/asm/fpu/xstate.h | 1 +
arch/x86/include/asm/perf_event.h | 6 +
arch/x86/include/uapi/asm/perf_regs.h | 101 ++++++++++++
arch/x86/kernel/fpu/xstate.c | 22 +++
arch/x86/kernel/perf_regs.c | 85 +++++++++-
include/linux/perf_event.h | 23 +++
include/linux/perf_regs.h | 29 +++-
include/uapi/linux/perf_event.h | 8 +
kernel/events/core.c | 63 +++++--
21 files changed, 699 insertions(+), 54 deletions(-)
--
2.38.1
On Fri, Jun 13, 2025 at 06:49:31AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Starting from the Intel Ice Lake, the XMM registers can be collected in
> a PEBS record. More registers, e.g., YMM, ZMM, OPMASK, SPP and APX, will
> be added in the upcoming Architecture PEBS as well. But it requires the
> hardware support.
>
> The patch set provides a software solution to mitigate the hardware
> requirement. It utilizes the XSAVES command to retrieve the requested
> registers in the overflow handler. The feature isn't limited to the PEBS
> event or specific platforms anymore.
> The hardware solution (if available) is still preferred, since it has
> low overhead (especially with the large PEBS) and is more accurate.
>
> In theory, the solution should work for all X86 platforms. But I only
> have newer Inter platforms to test. The patch set only enable the
> feature for Intel Ice Lake and later platforms.
>
> Open:
> The new registers include YMM, ZMM, OPMASK, SSP, and APX.
> The sample_regs_user/intr has run out. A new field in the
> struct perf_event_attr is required for the registers.
> There could be several options as below for the new field.
>
> - Follow a similar format to XSAVES. Introduce the below fields to store
> the bitmap of the registers.
> struct perf_event_attr {
> ...
> __u64 sample_ext_regs_intr[2];
> __u64 sample_ext_regs_user[2];
> ...
> }
> Includes YMMH (16 bits), APX (16 bits), OPMASK (8 bits),
> ZMMH0-15 (16 bits), H16ZMM (16 bits), SSP
> For example, if a user wants YMM8, the perf tool needs to set the
> corresponding bits of XMM8 and YMMH8, and reconstruct the result.
> The method is similar to the existing method for
> sample_regs_user/intr, and match the XSAVES format.
> The kernel doesn't need to do extra configuration and reconstruction.
> It's implemented in the patch set.
>
> - Similar to the above method. But the fields are the bitmap of the
> complete registers, E.g., YMM (16 bits), APX (16 bits),
> OPMASK (8 bits), ZMM (32 bits), SSP.
> The kernel needs to do extra configuration and reconstruction,
> which may brings extra overhead.
>
> - Combine the XMM, YMM, and ZMM. So all the registers can be put into
> one u64 field.
> ...
> union {
> __u64 sample_ext_regs_intr; //sample_ext_regs_user is simiar
> struct {
> __u32 vector_bitmap;
> __u32 vector_type : 3, //0b001 XMM 0b010 YMM 0b100 ZMM
> apx_bitmap : 16,
> opmask_bitmap : 8,
> ssp_bitmap : 1,
> reserved : 4,
>
> };
> ...
> For example, if the YMM8-15 is required,
> vector_bitmap: 0x0000ff00
> vector_type: 0x2
> This method can save two __u64 in the struct perf_event_attr.
> But it's not straightforward since it mixes the type and bitmap.
> The kernel also needs to do extra configuration and reconstruction.
>
> Please let me know if there are more ideas.
https://lkml.kernel.org/r/20250416155327.GD17910@noisy.programming.kicks-ass.net
comes to mind. Combine that with a rule that reclaims the XMM register
space from perf_event_x86_regs when sample_simd_reg_words != 0, and then
we can put APX and SSP there.
On 2025-06-17 4:24 a.m., Peter Zijlstra wrote:
> On Fri, Jun 13, 2025 at 06:49:31AM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Starting from the Intel Ice Lake, the XMM registers can be collected in
>> a PEBS record. More registers, e.g., YMM, ZMM, OPMASK, SPP and APX, will
>> be added in the upcoming Architecture PEBS as well. But it requires the
>> hardware support.
>>
>> The patch set provides a software solution to mitigate the hardware
>> requirement. It utilizes the XSAVES command to retrieve the requested
>> registers in the overflow handler. The feature isn't limited to the PEBS
>> event or specific platforms anymore.
>> The hardware solution (if available) is still preferred, since it has
>> low overhead (especially with the large PEBS) and is more accurate.
>>
>> In theory, the solution should work for all X86 platforms. But I only
>> have newer Inter platforms to test. The patch set only enable the
>> feature for Intel Ice Lake and later platforms.
>>
>> Open:
>> The new registers include YMM, ZMM, OPMASK, SSP, and APX.
>> The sample_regs_user/intr has run out. A new field in the
>> struct perf_event_attr is required for the registers.
>> There could be several options as below for the new field.
>>
>> - Follow a similar format to XSAVES. Introduce the below fields to store
>> the bitmap of the registers.
>> struct perf_event_attr {
>> ...
>> __u64 sample_ext_regs_intr[2];
>> __u64 sample_ext_regs_user[2];
>> ...
>> }
>> Includes YMMH (16 bits), APX (16 bits), OPMASK (8 bits),
>> ZMMH0-15 (16 bits), H16ZMM (16 bits), SSP
>> For example, if a user wants YMM8, the perf tool needs to set the
>> corresponding bits of XMM8 and YMMH8, and reconstruct the result.
>> The method is similar to the existing method for
>> sample_regs_user/intr, and match the XSAVES format.
>> The kernel doesn't need to do extra configuration and reconstruction.
>> It's implemented in the patch set.
>>
>> - Similar to the above method. But the fields are the bitmap of the
>> complete registers, E.g., YMM (16 bits), APX (16 bits),
>> OPMASK (8 bits), ZMM (32 bits), SSP.
>> The kernel needs to do extra configuration and reconstruction,
>> which may brings extra overhead.
>>
>> - Combine the XMM, YMM, and ZMM. So all the registers can be put into
>> one u64 field.
>> ...
>> union {
>> __u64 sample_ext_regs_intr; //sample_ext_regs_user is simiar
>> struct {
>> __u32 vector_bitmap;
>> __u32 vector_type : 3, //0b001 XMM 0b010 YMM 0b100 ZMM
>> apx_bitmap : 16,
>> opmask_bitmap : 8,
>> ssp_bitmap : 1,
>> reserved : 4,
>>
>> };
>> ...
>> For example, if the YMM8-15 is required,
>> vector_bitmap: 0x0000ff00
>> vector_type: 0x2
>> This method can save two __u64 in the struct perf_event_attr.
>> But it's not straightforward since it mixes the type and bitmap.
>> The kernel also needs to do extra configuration and reconstruction.
>>
>> Please let me know if there are more ideas.
>
> https://lkml.kernel.org/r/20250416155327.GD17910@noisy.programming.kicks-ass.net
>
It's similar to the third method, but uses the words to replace the
type. Also, there is more space left in case we add more SIMD registers
in the future. I will implement it in V2.
> comes to mind. Combine that with a rule that reclaims the XMM register
> space from perf_event_x86_regs when sample_simd_reg_words != 0, and then
> we can put APX and SPP there.
OK. So sample_simd_reg_words actually has another meaning now. It's
used as a flag to tell whether the old format is in use.
If so, I think it may be better to have a dedicated sample_simd_reg_flags
field.
For example,
#define SAMPLE_SIMD_FLAGS_FORMAT_LEGACY 0x0
#define SAMPLE_SIMD_FLAGS_FORMAT_WORDS 0x1
__u8 sample_simd_reg_flags;
__u8 sample_simd_reg_words;
__u64 sample_simd_reg_intr;
__u64 sample_simd_reg_user;
If sample_simd_reg_flags != 0, the XMM space is reclaimed for APX and SSP.
Does it make sense?
Thanks,
Kan
On Tue, Jun 17, 2025 at 09:52:12AM -0400, Liang, Kan wrote:
> OK. So the sample_simd_reg_words actually has another meaning now.
Well, any simd field being non-zero means userspace knows about it. Sort
of an implicit flag.
> It's used as a flag to tell whether utilizing the old format.
>
> If so, I think it may be better to have a dedicate sample_simd_reg_flag
> field.
>
> For example,
>
> #define SAMPLE_SIMD_FLAGS_FORMAT_LEGACY 0x0
> #define SAMPLE_SIMD_FLAGS_FORMAT_WORDS 0x1
>
> __u8 sample_simd_reg_flags;
> __u8 sample_simd_reg_words;
> __u64 sample_simd_reg_intr;
> __u64 sample_simd_reg_user;
>
> If (sample_simd_reg_flags != 0) reclaims the XMM space for APX and SPP.
>
> Does it make sense?
Not sure, it eats up a whole byte. Dapeng seemed to favour separate
intr/user vector width (although I'm not quite sure what the use would
be).
If you want an explicit bit, we might as well use one from __reserved_1,
we still have some left.
On 2025-06-17 10:29 a.m., Peter Zijlstra wrote:
> On Tue, Jun 17, 2025 at 09:52:12AM -0400, Liang, Kan wrote:
>
>> OK. So the sample_simd_reg_words actually has another meaning now.
>
> Well, any simd field being non-zero means userspace knows about it. Sort
> of an implicit flag.
Yes, but the tool probably wouldn't touch any simd fields if the user
doesn't ask for simd registers.
>
>> It's used as a flag to tell whether utilizing the old format.
>>
>> If so, I think it may be better to have a dedicate sample_simd_reg_flag
>> field.
>>
>> For example,
>>
>> #define SAMPLE_SIMD_FLAGS_FORMAT_LEGACY 0x0
>> #define SAMPLE_SIMD_FLAGS_FORMAT_WORDS 0x1
>>
>> __u8 sample_simd_reg_flags;
>> __u8 sample_simd_reg_words;
>> __u64 sample_simd_reg_intr;
>> __u64 sample_simd_reg_user;
>>
>> If (sample_simd_reg_flags != 0) reclaims the XMM space for APX and SPP.
>>
>> Does it make sense?
>
> Not sure, it eats up a whole byte. Dapeng seemed to favour separate
> intr/user vector width (although I'm not quite sure what the use would
> be).
>
> If you want an explicit bit, we might as well use one from __reserved_1,
> we still have some left.
OK. I may add a sample_simd_reg : 1 to explicitly tell the kernel to use
the sample_simd_reg_XXX.
Thanks,
Kan
On 6/17/2025 11:23 PM, Liang, Kan wrote:
>
> On 2025-06-17 10:29 a.m., Peter Zijlstra wrote:
>> On Tue, Jun 17, 2025 at 09:52:12AM -0400, Liang, Kan wrote:
>>
>>> OK. So the sample_simd_reg_words actually has another meaning now.
>> Well, any simd field being non-zero means userspace knows about it. Sort
>> of an implicit flag.
> Yes, but the tool probably wouldn't to touch any simd fields if user
> doesn't ask for simd registers
>
>>> It's used as a flag to tell whether utilizing the old format.
>>>
>>> If so, I think it may be better to have a dedicate sample_simd_reg_flag
>>> field.
>>>
>>> For example,
>>>
>>> #define SAMPLE_SIMD_FLAGS_FORMAT_LEGACY 0x0
>>> #define SAMPLE_SIMD_FLAGS_FORMAT_WORDS 0x1
>>>
>>> __u8 sample_simd_reg_flags;
>>> __u8 sample_simd_reg_words;
>>> __u64 sample_simd_reg_intr;
>>> __u64 sample_simd_reg_user;
>>>
>>> If (sample_simd_reg_flags != 0) reclaims the XMM space for APX and SPP.
>>>
>>> Does it make sense?
Not sure if I missed some discussion, but are these fields only intended
for SIMD regs? What about the APX extended GPRs? Suppose APX eGPRs can
reuse the legacy XMM bitmaps in sample_regs_user/intr[47:32], but we need
an extra flag to distinguish it's XMM regs or APX eGPRs, maybe add an extra
bit sample_egpr_reg : 1 in sample_simd_reg_words, but the *simd* word in
the name would become ambiguous.
>> Not sure, it eats up a whole byte. Dapeng seemed to favour separate
>> intr/user vector width (although I'm not quite sure what the use would
>> be).
The reason that I prefer to add 2 separate "words" item is that user could
sample interrupt and user space SIMD regs (but with different bit-width)
simultaneously in theory, like "--intr-regs=YMM0, --user-regs=XMM0".
>>
>> If you want an explicit bit, we might as well use one from __reserved_1,
>> we still have some left.
> OK. I may add a sample_simd_reg : 1 to explicitly tell kernel to utilize
> the sample_simd_reg_XXX.
>
> Thanks,
> Kan
On 2025-06-17 8:57 p.m., Mi, Dapeng wrote:
>
> On 6/17/2025 11:23 PM, Liang, Kan wrote:
>>
>> On 2025-06-17 10:29 a.m., Peter Zijlstra wrote:
>>> On Tue, Jun 17, 2025 at 09:52:12AM -0400, Liang, Kan wrote:
>>>
>>>> OK. So the sample_simd_reg_words actually has another meaning now.
>>> Well, any simd field being non-zero means userspace knows about it. Sort
>>> of an implicit flag.
>> Yes, but the tool probably wouldn't to touch any simd fields if user
>> doesn't ask for simd registers
>>
>>>> It's used as a flag to tell whether utilizing the old format.
>>>>
>>>> If so, I think it may be better to have a dedicate sample_simd_reg_flag
>>>> field.
>>>>
>>>> For example,
>>>>
>>>> #define SAMPLE_SIMD_FLAGS_FORMAT_LEGACY 0x0
>>>> #define SAMPLE_SIMD_FLAGS_FORMAT_WORDS 0x1
>>>>
>>>> __u8 sample_simd_reg_flags;
>>>> __u8 sample_simd_reg_words;
>>>> __u64 sample_simd_reg_intr;
>>>> __u64 sample_simd_reg_user;
>>>>
>>>> If (sample_simd_reg_flags != 0) reclaims the XMM space for APX and SPP.
>>>>
>>>> Does it make sense?
>
> Not sure if I missed some discussion, but are these fields only intended
> for SIMD regs? What about the APX extended GPRs? Suppose APX eGPRs can
> reuse the legacy XMM bitmaps in sample_regs_user/intr[47:32], but we need
> an extra flag to distinguish it's XMM regs or APX eGPRs, maybe add an extra
> bit sample_egpr_reg : 1 in sample_simd_reg_words, but the *simd* word in
> the name would become ambiguous.
It can be used to explicitly tell the kernel that a new format is
expected. The new format means
- Put APX and SSP into sample_regs_user/intr[47:32]
- Use the sample_simd_reg_*
Alternatively, as Peter suggested, we can use the sample_simd_reg_words
to imply the new format.
If so, I will make it a union, for example.
union {
__u16 sample_reg_flags;
__u16 sample_simd_reg_words;
};
The first thing the tool does should be to set sample_reg_flags = 1,
regardless of whether simd is requested.
>
>
>>> Not sure, it eats up a whole byte. Dapeng seemed to favour separate
>>> intr/user vector width (although I'm not quite sure what the use would
>>> be).
>
> The reason that I prefer to add 2 separate "words" item is that user could
> sample interrupt and user space SIMD regs (but with different bit-width)
> simultaneously in theory, like "--intr-regs=YMM0, --user-regs=XMM0".
I'm not sure why the user wants a different bit-width. The
"--user-regs=XMM0" doesn't seem to provide more useful information.
Anyway, I believe the tool can handle this case. The tool can always ask
YMM0 for both --intr-regs and --user-regs, but only output the XMM0 for
--user-regs. The only drawback is that the kernel may dump extra
information for the --user-regs. I don't think it's a big problem.
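The tool-side handling described above could look roughly like the sketch below: the kernel dumps each register at the wider width, and the tool keeps only the low qwords it wants to show. The function name simd_narrow() and the flat "one register after another" dump layout are assumptions for illustration; the only architectural fact relied on is that XMM is the low 128 bits of the corresponding YMM register.

```c
#include <stdint.h>
#include <string.h>

#define XMM_QWORDS 2	/* 128 bits */
#define YMM_QWORDS 4	/* 256 bits */

/*
 * Copy the low out_qwords qwords of each register from a dump in which
 * every register occupies in_qwords consecutive qwords (assumed layout).
 * Returns the number of qwords written to out, or 0 if the request
 * cannot be satisfied (out_qwords is zero or wider than the dump).
 */
static size_t simd_narrow(const uint64_t *in, size_t nr_regs, size_t in_qwords,
			  uint64_t *out, size_t out_qwords)
{
	size_t i;

	if (out_qwords == 0 || out_qwords > in_qwords)
		return 0;

	for (i = 0; i < nr_regs; i++)
		memcpy(out + i * out_qwords, in + i * in_qwords,
		       out_qwords * sizeof(*out));

	return nr_regs * out_qwords;
}
```

With in_qwords = YMM_QWORDS and out_qwords = XMM_QWORDS this drops the upper half of every register. The extra data is still written to perf.data, which is the drawback mentioned above.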
Thanks,
Kan
>
>
>>>
>>> If you want an explicit bit, we might as well use one from __reserved_1,
>>> we still have some left.
>> OK. I may add a sample_simd_reg : 1 to explicitly tell kernel to utilize
>> the sample_simd_reg_XXX.
>>
>> Thanks,
>> Kan
On 6/18/2025 6:47 PM, Liang, Kan wrote:
>
> On 2025-06-17 8:57 p.m., Mi, Dapeng wrote:
>> On 6/17/2025 11:23 PM, Liang, Kan wrote:
>>> On 2025-06-17 10:29 a.m., Peter Zijlstra wrote:
>>>> On Tue, Jun 17, 2025 at 09:52:12AM -0400, Liang, Kan wrote:
>>>>
>>>>> OK. So the sample_simd_reg_words actually has another meaning now.
>>>> Well, any simd field being non-zero means userspace knows about it. Sort
>>>> of an implicit flag.
>>> Yes, but the tool probably wouldn't to touch any simd fields if user
>>> doesn't ask for simd registers
>>>
>>>>> It's used as a flag to tell whether utilizing the old format.
>>>>>
>>>>> If so, I think it may be better to have a dedicate sample_simd_reg_flag
>>>>> field.
>>>>>
>>>>> For example,
>>>>>
>>>>> #define SAMPLE_SIMD_FLAGS_FORMAT_LEGACY 0x0
>>>>> #define SAMPLE_SIMD_FLAGS_FORMAT_WORDS 0x1
>>>>>
>>>>> __u8 sample_simd_reg_flags;
>>>>> __u8 sample_simd_reg_words;
>>>>> __u64 sample_simd_reg_intr;
>>>>> __u64 sample_simd_reg_user;
>>>>>
>>>>> If (sample_simd_reg_flags != 0) reclaims the XMM space for APX and SPP.
>>>>>
>>>>> Does it make sense?
>> Not sure if I missed some discussion, but are these fields only intended
>> for SIMD regs? What about the APX extended GPRs? Suppose APX eGPRs can
>> reuse the legacy XMM bitmaps in sample_regs_user/intr[47:32], but we need
>> an extra flag to distinguish it's XMM regs or APX eGPRs, maybe add an extra
>> bit sample_egpr_reg : 1 in sample_simd_reg_words, but the *simd* word in
>> the name would become ambiguous.
> It can be used to explicitly tell the kernel that a new format is
> expected. The new format means
> - Put APX and SPP into sample_regs_user/intr[47:32]
> - Use the sample_simd_reg_*
>
> Alternatively, as Peter suggested, we can use the sample_simd_reg_words
> to imply the new format.
> If so, I will make it an union, for example.
> union {
> __u16 sample_reg_flags;
> __u16 sample_simd_reg_words;
> };
>
> The first thing the tool does should be to set sample_reg_flags = 1,
> regardless of whether simd is requested.
So just to double check: as long as sample_reg_flags
(sample_simd_reg_words) > 0, the new format below would be used.
sample_regs_user/intr[31:0] bits are unchanged and still represent the
original GPRs.
sample_regs_user/intr[47:32] bits represent APX eGPRs R31 - R16.
As for the SIMD regs, including XMM regs, they are represented by the
dedicated SIMD regs structure (or regs bitmap and regs word length).
If sample_reg_flags (sample_simd_reg_words) == 0, then it falls back to
the current format.
sample_regs_user/intr[31:0] bits represent the original GPRs.
sample_regs_user/intr[63:32] bits represent XMM regs.
If so, I think it's fine. The new format looks more reasonable than the
current one.
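To make the two interpretations concrete, here is a small sketch that classifies a bit of sample_regs_user/intr according to the summary above. The enum and function names are hypothetical. Bits [63:48] in the new format are reported as unspecified because the summary does not assign them.

```c
#include <stdint.h>

/* Hypothetical classification of one bit in sample_regs_user/intr. */
enum reg_class {
	REG_CLASS_GPR,		/* original general-purpose registers */
	REG_CLASS_APX_EGPR,	/* APX extended GPRs R16-R31 (new format) */
	REG_CLASS_XMM,		/* XMM registers (current format) */
	REG_CLASS_UNSPECIFIED,	/* not covered by the summary above */
};

static enum reg_class classify_sample_reg_bit(unsigned int bit,
					      uint16_t sample_simd_reg_words)
{
	/* Bits [31:0] mean the same thing in both formats. */
	if (bit < 32)
		return REG_CLASS_GPR;

	/* words == 0: fall back to the current format, XMM in [63:32]. */
	if (sample_simd_reg_words == 0)
		return bit < 64 ? REG_CLASS_XMM : REG_CLASS_UNSPECIFIED;

	/* words > 0: new format, APX eGPRs in [47:32]. */
	return bit < 48 ? REG_CLASS_APX_EGPR : REG_CLASS_UNSPECIFIED;
}
```

For example, bit 40 is an XMM bit when sample_simd_reg_words is 0 and an APX eGPR bit otherwise, which is exactly the reuse of the XMM space being discussed.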
>
>>
>>>> Not sure, it eats up a whole byte. Dapeng seemed to favour separate
>>>> intr/user vector width (although I'm not quite sure what the use would
>>>> be).
>> The reason that I prefer to add 2 separate "words" item is that user could
>> sample interrupt and user space SIMD regs (but with different bit-width)
>> simultaneously in theory, like "--intr-regs=YMM0, --user-regs=XMM0".
> I'm not sure why the user wants a different bit-width. The
> --user-regs=XMM0" doesn't seem to provide more useful information.
>
> Anyway, I believe the tool can handle this case. The tool can always ask
> YMM0 for both --intr-regs and --user-regs, but only output the XMM0 for
> --user-regs. The only drawback is that the kernel may dump extra
> information for the --user-regs. I don't think it's a big problem.
If we intend to handle it in user space tools, I'm not sure whether a user
space tool can easily know which records are from user space and filter out
the SIMD regs from kernel space, or how complicated the change would be. IMO,
adding an extra u16 "words" field would be much easier and won't consume too
much memory.
>
> Thanks,
> Kan
>>
>>>> If you want an explicit bit, we might as well use one from __reserved_1,
>>>> we still have some left.
>>> OK. I may add a sample_simd_reg : 1 to explicitly tell kernel to utilize
>>> the sample_simd_reg_XXX.
>>>
>>> Thanks,
>>> Kan
On 2025-06-18 8:28 a.m., Mi, Dapeng wrote:
>>>>> Not sure, it eats up a whole byte. Dapeng seemed to favour separate
>>>>> intr/user vector width (although I'm not quite sure what the use would
>>>>> be).
>>> The reason that I prefer to add 2 separate "words" item is that user could
>>> sample interrupt and user space SIMD regs (but with different bit-width)
>>> simultaneously in theory, like "--intr-regs=YMM0, --user-regs=XMM0".
>> I'm not sure why the user wants a different bit-width. The
>> --user-regs=XMM0" doesn't seem to provide more useful information.
>>
>> Anyway, I believe the tool can handle this case. The tool can always ask
>> YMM0 for both --intr-regs and --user-regs, but only output the XMM0 for
>> --user-regs. The only drawback is that the kernel may dump extra
>> information for the --user-regs. I don't think it's a big problem.
> If we intent to handle it in user space tools, I'm not sure if user space
> tool can easily know which records are from user space and filter out the
> SIMD regs from kernel space and how complicated would the change be. IMO,
> adding an extra u16 "words" would be much easier and won't consume too much
> memory.
The filtering is always done in the kernel for --user-regs. The only
difference is that the YMM (after filtering) will be dumped to perf.data.
The tool just shows the XMM registers to the end user.
Thanks,
Kan
On 6/18/2025 9:15 PM, Liang, Kan wrote:
>
> On 2025-06-18 8:28 a.m., Mi, Dapeng wrote:
>>>>>> Not sure, it eats up a whole byte. Dapeng seemed to favour separate
>>>>>> intr/user vector width (although I'm not quite sure what the use would
>>>>>> be).
>>>> The reason that I prefer to add 2 separate "words" item is that user could
>>>> sample interrupt and user space SIMD regs (but with different bit-width)
>>>> simultaneously in theory, like "--intr-regs=YMM0, --user-regs=XMM0".
>>> I'm not sure why the user wants a different bit-width. The
>>> --user-regs=XMM0" doesn't seem to provide more useful information.
>>>
>>> Anyway, I believe the tool can handle this case. The tool can always ask
>>> YMM0 for both --intr-regs and --user-regs, but only output the XMM0 for
>>> --user-regs. The only drawback is that the kernel may dump extra
>>> information for the --user-regs. I don't think it's a big problem.
>> If we intent to handle it in user space tools, I'm not sure if user space
>> tool can easily know which records are from user space and filter out the
>> SIMD regs from kernel space and how complicated would the change be. IMO,
>> adding an extra u16 "words" would be much easier and won't consume too much
>> memory.
> The filter is always done in kernel for --user-regs. The only difference
> is that the YMM (after filter) will be dumped to the perf.data. The tool
> just show the XMM registers to the end user.
Ok. But there could be another case: the user may want to sample some APX eGPRs
in user space and sample SIMD regs in interrupt, like "--intr-regs=YMM0,
--user-regs=R16". Then we have to define 2 separate "words" fields.
Anyway, it looks like we would define a SIMD_REGS structure like below, and I
suppose we would create 2 instances, one for interrupt and the other for
user space. That's enough.
PERF_SAMPLE_SIMD_REGS := {
u16 nr_vectors;
u16 vector_length;
u16 nr_pred;
u16 pred_length;
u64 data[];
}
>
> Thanks,
> Kan
>
>
On 2025-06-18 8:41 p.m., Mi, Dapeng wrote:
>
> On 6/18/2025 9:15 PM, Liang, Kan wrote:
>>
>> On 2025-06-18 8:28 a.m., Mi, Dapeng wrote:
>>>>>>> Not sure, it eats up a whole byte. Dapeng seemed to favour separate
>>>>>>> intr/user vector width (although I'm not quite sure what the use would
>>>>>>> be).
>>>>> The reason that I prefer to add 2 separate "words" item is that user could
>>>>> sample interrupt and user space SIMD regs (but with different bit-width)
>>>>> simultaneously in theory, like "--intr-regs=YMM0, --user-regs=XMM0".
>>>> I'm not sure why the user wants a different bit-width. The
>>>> --user-regs=XMM0" doesn't seem to provide more useful information.
>>>>
>>>> Anyway, I believe the tool can handle this case. The tool can always ask
>>>> YMM0 for both --intr-regs and --user-regs, but only output the XMM0 for
>>>> --user-regs. The only drawback is that the kernel may dump extra
>>>> information for the --user-regs. I don't think it's a big problem.
>>> If we intent to handle it in user space tools, I'm not sure if user space
>>> tool can easily know which records are from user space and filter out the
>>> SIMD regs from kernel space and how complicated would the change be. IMO,
>>> adding an extra u16 "words" would be much easier and won't consume too much
>>> memory.
>> The filter is always done in kernel for --user-regs. The only difference
>> is that the YMM (after filter) will be dumped to the perf.data. The tool
>> just show the XMM registers to the end user.
>
> Ok. But there could be another case, user may want to sample some APX eGPRs
> in user space and sample SIMD regs in interrupt, like "--intr-regs=YMM0,
> --user-regs=R16", then we have to define 2 separate "words" fields.
>
Not for eGPRs. They use the regular GP regs space, which implies u64 for
a 64b kernel. The "words" fields are only for vector and predicate registers.
I've started working on V2. The new interface would be as below.
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 78a362b80027..f7b8971fa99d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -382,6 +382,7 @@ enum perf_event_read_format {
#define PERF_ATTR_SIZE_VER6 120 /* Add: aux_sample_size */
#define PERF_ATTR_SIZE_VER7 128 /* Add: sig_data */
#define PERF_ATTR_SIZE_VER8 136 /* Add: config3 */
+#define PERF_ATTR_SIZE_VER9 184 /* Add: sample_simd_regs */
/*
* 'struct perf_event_attr' contains various attributes that define
@@ -543,6 +544,24 @@ struct perf_event_attr {
__u64 sig_data;
__u64 config3; /* extension of config2 */
+
+
+ /*
+ * Defines set of SIMD registers to dump on samples.
+ * The sample_simd_reg_enabled !=0 implies the
+ * set of SIMD registers is used to config all SIMD registers.
+ * If !sample_simd_reg_enabled, sample_regs_XXX may be used to
+ * config some SIMD registers on X86.
+ */
+ union {
+ __u16 sample_simd_reg_enabled;
+ __u16 sample_simd_pred_reg_qwords;
+ };
+ __u16 sample_simd_pred_reg_intr;
+ __u16 sample_simd_pred_reg_user;
+ __u16 sample_simd_reg_qwords;
+ __u64 sample_simd_reg_intr;
+ __u64 sample_simd_reg_user;
};
/*
@@ -1016,7 +1035,15 @@ enum perf_event_type {
* } && PERF_SAMPLE_BRANCH_STACK
*
* { u64 abi; # enum perf_sample_regs_abi
- * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
+ * u64 regs[weight(mask)];
+ * struct {
+ * u16 nr_vectors;
+ * u16 vector_qwords;
+ * u16 nr_pred;
+ * u16 pred_qwords;
+ * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
+ * } && sample_simd_reg_enabled
+ * } && PERF_SAMPLE_REGS_USER
*
* { u64 size;
* char data[size];
@@ -1043,7 +1070,15 @@ enum perf_event_type {
* { u64 data_src; } && PERF_SAMPLE_DATA_SRC
* { u64 transaction; } && PERF_SAMPLE_TRANSACTION
* { u64 abi; # enum perf_sample_regs_abi
- * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
+ * u64 regs[weight(mask)];
+ * struct {
+ * u16 nr_vectors;
+ * u16 vector_qwords;
+ * u16 nr_pred;
+ * u16 pred_qwords;
+ * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
+ * } && sample_simd_reg_enabled
+ * } && PERF_SAMPLE_REGS_INTR
* { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR
* { u64 cgroup;} && PERF_SAMPLE_CGROUP
* { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
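For readers parsing such a sample, the size of the proposed SIMD block follows directly from the four u16 header fields. The sketch below is only an illustration of that arithmetic under the draft layout above. The struct and function names are hypothetical, and the final ABI may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the header in the draft sample layout; the name is hypothetical. */
struct simd_regs_hdr {
	uint16_t nr_vectors;	/* number of vector registers dumped */
	uint16_t vector_qwords;	/* u64 words per vector register */
	uint16_t nr_pred;	/* number of predicate registers dumped */
	uint16_t pred_qwords;	/* u64 words per predicate register */
};

/* Number of u64 entries in data[] that follow the header. */
static size_t simd_regs_data_qwords(const struct simd_regs_hdr *h)
{
	return (size_t)h->nr_vectors * h->vector_qwords +
	       (size_t)h->nr_pred * h->pred_qwords;
}

/* Total size in bytes of the header plus data[]. */
static size_t simd_regs_block_bytes(const struct simd_regs_hdr *h)
{
	return sizeof(*h) + simd_regs_data_qwords(h) * sizeof(uint64_t);
}
```

For instance, 16 YMM registers (4 qwords each) plus 8 OPMASK registers (1 qword each) give 72 qwords of data, so the block is 8 + 72 * 8 = 584 bytes.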
Thanks,
Kan
On Thu, Jun 19, 2025 at 07:11:23AM -0400, Liang, Kan wrote:
> @@ -543,6 +544,24 @@ struct perf_event_attr {
> __u64 sig_data;
>
> __u64 config3; /* extension of config2 */
> +
> +
> + /*
> + * Defines set of SIMD registers to dump on samples.
> + * The sample_simd_req_enabled !=0 implies the
> + * set of SIMD registers is used to config all SIMD registers.
> + * If !sample_simd_req_enabled, sample_regs_XXX may be used to
> + * config some SIMD registers on X86.
> + */
> + union {
> + __u16 sample_simd_reg_enabled;
> + __u16 sample_simd_pred_reg_qwords;
> + };
> + __u16 sample_simd_pred_reg_intr;
> + __u16 sample_simd_pred_reg_user;
This limits things to max 16 predicate registers. ARM will fully fill
that with present hardware.
> + __u16 sample_simd_reg_qwords;
> + __u64 sample_simd_reg_intr;
> + __u64 sample_simd_reg_user;
I would perhaps make this vec_reg.
> };
>
> /*
> @@ -1016,7 +1035,15 @@ enum perf_event_type {
> * } && PERF_SAMPLE_BRANCH_STACK
> *
> * { u64 abi; # enum perf_sample_regs_abi
> - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
> + * u64 regs[weight(mask)];
> + * struct {
> + * u16 nr_vectors;
> + * u16 vector_qwords;
> + * u16 nr_pred;
> + * u16 pred_qwords;
> + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
> + * } && sample_simd_reg_enabled
Instead of using sample_simd_reg_enabled here I would perhaps extend
perf_sample_regs_abi. The current abi word is woefully underused.
Also, realistically, what you want to look at here is:
sample_simd_{pred,vec}_reg_user;
If those are empty, there will be no registers.
> + * } && PERF_SAMPLE_REGS_USER
> *
> * { u64 size;
> * char data[size];
On 2025-06-19 9:38 a.m., Peter Zijlstra wrote:
> On Thu, Jun 19, 2025 at 07:11:23AM -0400, Liang, Kan wrote:
>
>> @@ -543,6 +544,24 @@ struct perf_event_attr {
>> __u64 sig_data;
>>
>> __u64 config3; /* extension of config2 */
>> +
>> +
>> + /*
>> + * Defines set of SIMD registers to dump on samples.
>> + * The sample_simd_reg_enabled != 0 implies the
>> + * set of SIMD registers is used to config all SIMD registers.
>> + * If !sample_simd_reg_enabled, sample_regs_XXX may be used to
>> + * config some SIMD registers on X86.
>> + */
>> + union {
>> + __u16 sample_simd_reg_enabled;
>> + __u16 sample_simd_pred_reg_qwords;
>> + };
>> + __u16 sample_simd_pred_reg_intr;
>> + __u16 sample_simd_pred_reg_user;
>
> This limits things to max 16 predicate registers. ARM will fully fill
> that with present hardware.
I think I can use __u32 for predicate registers.
It means we need one more u64 for the qwords. It should not be a problem.
>
>> + __u16 sample_simd_reg_qwords;
>> + __u64 sample_simd_reg_intr;
>> + __u64 sample_simd_reg_user;
>
> I would perhaps make this vec_reg.
Sure.
>
>> };
>>
>> /*
>> @@ -1016,7 +1035,15 @@ enum perf_event_type {
>> * } && PERF_SAMPLE_BRANCH_STACK
>> *
>> * { u64 abi; # enum perf_sample_regs_abi
>> - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
>> + * u64 regs[weight(mask)];
>> + * struct {
>> + * u16 nr_vectors;
>> + * u16 vector_qwords;
>> + * u16 nr_pred;
>> + * u16 pred_qwords;
>> + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
>> + * } && sample_simd_reg_enabled
>
> Instead of using sample_simd_reg_enabled here I would perhaps extend
> perf_sample_regs_abi. The current abi word is woefully underused.
>
Yes. Now I think the abi is used like a version number. I guess I can
add PERF_SAMPLE_REGS_ABI_SIMD and change it to a bitmap.
There should be no impact on the existing tool, since version and bitmap
are the same for 1 and 2.
enum perf_sample_regs_abi {
- PERF_SAMPLE_REGS_ABI_NONE = 0,
- PERF_SAMPLE_REGS_ABI_32 = 1,
- PERF_SAMPLE_REGS_ABI_64 = 2,
+ PERF_SAMPLE_REGS_ABI_NONE = 0x0,
+ PERF_SAMPLE_REGS_ABI_32 = 0x1,
+ PERF_SAMPLE_REGS_ABI_64 = 0x2,
+ PERF_SAMPLE_REGS_ABI_SIMD = 0x4,
};
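A minimal sketch of how a tool might decode the abi word once it is treated as a bitmap. The macro and helper names are local to this example; the SIMD bit value is taken from the proposal above and is not in any released uapi header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as proposed above; REGS_ABI_SIMD is an assumption until merged. */
#define REGS_ABI_NONE 0x0
#define REGS_ABI_32   0x1
#define REGS_ABI_64   0x2
#define REGS_ABI_SIMD 0x4

/* Word-size part of the abi word (hypothetical helper). */
static inline uint64_t regs_abi_width(uint64_t abi)
{
	return abi & (REGS_ABI_32 | REGS_ABI_64);
}

/* True if the sample carries the new SIMD block (hypothetical helper). */
static inline bool regs_abi_has_simd(uint64_t abi)
{
	return (abi & REGS_ABI_SIMD) != 0;
}
```

With this split, abi values 1 and 2 decode exactly as before, and a value such as 0x6 reads as "64-bit regs plus SIMD block".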
> Also, realistically, what you want to look at here is:
>
> sample_simd_{pred,vec}_reg_user;
>
> If those are empty, there will be no registers.
Sure. But I will still keep the sample_simd_reg_enabled, since it can
explicitly tell if the new format is used.
Thanks,
Kan
>
>> + * } && PERF_SAMPLE_REGS_USER
>> *
>> * { u64 size;
>> * char data[size];
>
On 6/19/2025 7:11 PM, Liang, Kan wrote:
>
> On 2025-06-18 8:41 p.m., Mi, Dapeng wrote:
>> On 6/18/2025 9:15 PM, Liang, Kan wrote:
>>> On 2025-06-18 8:28 a.m., Mi, Dapeng wrote:
>>>>>>>> Not sure, it eats up a whole byte. Dapeng seemed to favour separate
>>>>>>>> intr/user vector width (although I'm not quite sure what the use would
>>>>>>>> be).
>>>>>> The reason that I prefer to add 2 separate "words" item is that user could
>>>>>> sample interrupt and user space SIMD regs (but with different bit-width)
>>>>>> simultaneously in theory, like "--intr-regs=YMM0, --user-regs=XMM0".
>>>>> I'm not sure why the user wants a different bit-width. The
>>>>> "--user-regs=XMM0" doesn't seem to provide more useful information.
>>>>>
>>>>> Anyway, I believe the tool can handle this case. The tool can always ask
>>>>> YMM0 for both --intr-regs and --user-regs, but only output the XMM0 for
>>>>> --user-regs. The only drawback is that the kernel may dump extra
>>>>> information for the --user-regs. I don't think it's a big problem.
>>>> If we intend to handle it in user space tools, I'm not sure if user space
>>>> tool can easily know which records are from user space and filter out the
>>>> SIMD regs from kernel space and how complicated would the change be. IMO,
>>>> adding an extra u16 "words" would be much easier and won't consume too much
>>>> memory.
>>> The filter is always done in kernel for --user-regs. The only difference
>>> is that the YMM (after filter) will be dumped to the perf.data. The tool
>>> just show the XMM registers to the end user.
>> Ok. But there could be another case, user may want to sample some APX eGPRs
>> in user space and sample SIMD regs in interrupt, like "--intr-regs=YMM0,
>> --user-regs=R16", then we have to define 2 separate "words" fields.
>>
> Not for eGPRs. It uses the regular GP regs space, which implies u64 for
> a 64b kernel. The "words" fields are only for vector and predicate registers.
>
> I've started working on the V2. The new interface would be as below.
>
> diff --git a/include/uapi/linux/perf_event.h
> b/include/uapi/linux/perf_event.h
> index 78a362b80027..f7b8971fa99d 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -382,6 +382,7 @@ enum perf_event_read_format {
> #define PERF_ATTR_SIZE_VER6 120 /* Add: aux_sample_size */
> #define PERF_ATTR_SIZE_VER7 128 /* Add: sig_data */
> #define PERF_ATTR_SIZE_VER8 136 /* Add: config3 */
> +#define PERF_ATTR_SIZE_VER9 184 /* Add: sample_simd_regs */
>
> /*
> * 'struct perf_event_attr' contains various attributes that define
> @@ -543,6 +544,24 @@ struct perf_event_attr {
> __u64 sig_data;
>
> __u64 config3; /* extension of config2 */
> +
> +
> + /*
> + * Defines set of SIMD registers to dump on samples.
> + * The sample_simd_reg_enabled != 0 implies the
> + * set of SIMD registers is used to config all SIMD registers.
> + * If !sample_simd_reg_enabled, sample_regs_XXX may be used to
> + * config some SIMD registers on X86.
> + */
> + union {
> + __u16 sample_simd_reg_enabled;
> + __u16 sample_simd_pred_reg_qwords;
> + };
> + __u16 sample_simd_pred_reg_intr;
> + __u16 sample_simd_pred_reg_user;
This is still a bitmap, right? Is it enough for ARM?
> + __u16 sample_simd_reg_qwords;
> + __u64 sample_simd_reg_intr;
> + __u64 sample_simd_reg_user;
> };
>
> /*
> @@ -1016,7 +1035,15 @@ enum perf_event_type {
> * } && PERF_SAMPLE_BRANCH_STACK
> *
> * { u64 abi; # enum perf_sample_regs_abi
> - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
> + * u64 regs[weight(mask)];
> + * struct {
> + * u16 nr_vectors;
> + * u16 vector_qwords;
> + * u16 nr_pred;
> + * u16 pred_qwords;
> + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
> + * } && sample_simd_reg_enabled
> + * } && PERF_SAMPLE_REGS_USER
> *
> * { u64 size;
> * char data[size];
> @@ -1043,7 +1070,15 @@ enum perf_event_type {
> * { u64 data_src; } && PERF_SAMPLE_DATA_SRC
> * { u64 transaction; } && PERF_SAMPLE_TRANSACTION
> * { u64 abi; # enum perf_sample_regs_abi
> - * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
> + * u64 regs[weight(mask)];
> + * struct {
> + * u16 nr_vectors;
> + * u16 vector_qwords;
> + * u16 nr_pred;
> + * u16 pred_qwords;
> + * u64 data[nr_vectors * vector_qwords + nr_pred * pred_qwords];
> + * } && sample_simd_reg_enabled
> + * } && PERF_SAMPLE_REGS_INTR
> * { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR
> * { u64 cgroup;} && PERF_SAMPLE_CGROUP
> * { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
>
>
> Thanks,
> Kan
On Tue, Jun 17, 2025 at 11:23:10AM -0400, Liang, Kan wrote:
>
>
> On 2025-06-17 10:29 a.m., Peter Zijlstra wrote:
> > On Tue, Jun 17, 2025 at 09:52:12AM -0400, Liang, Kan wrote:
> >
> >> OK. So the sample_simd_reg_words actually has another meaning now.
> >
> > Well, any simd field being non-zero means userspace knows about it. Sort
> > of an implicit flag.
>
> Yes, but the tool probably wouldn't touch any simd fields if user
> doesn't ask for simd registers

Trivial enough to have the tool unconditionally write a simd_words size
if the attr thing is big enough. But sure, whatever :-)
On 6/13/2025 9:49 PM, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Starting from the Intel Ice Lake, the XMM registers can be collected in
> a PEBS record. More registers, e.g., YMM, ZMM, OPMASK, SSP and APX, will
> be added in the upcoming Architecture PEBS as well. But it requires the
> hardware support.
>
> The patch set provides a software solution to mitigate the hardware
> requirement. It utilizes the XSAVES command to retrieve the requested
> registers in the overflow handler. The feature isn't limited to the PEBS
> event or specific platforms anymore.
> The hardware solution (if available) is still preferred, since it has
> low overhead (especially with the large PEBS) and is more accurate.
>
> In theory, the solution should work for all X86 platforms. But I only
> have newer Intel platforms to test. The patch set only enables the
> feature for Intel Ice Lake and later platforms.
>
> Open:
> The new registers include YMM, ZMM, OPMASK, SSP, and APX.
> The sample_regs_user/intr has run out. A new field in the
> struct perf_event_attr is required for the registers.
> There could be several options as below for the new field.
>
> - Follow a similar format to XSAVES. Introduce the below fields to store
> the bitmap of the registers.
> struct perf_event_attr {
> ...
> __u64 sample_ext_regs_intr[2];
> __u64 sample_ext_regs_user[2];
> ...
> }
> Includes YMMH (16 bits), APX (16 bits), OPMASK (8 bits),
> ZMMH0-15 (16 bits), H16ZMM (16 bits), SSP
> For example, if a user wants YMM8, the perf tool needs to set the
> corresponding bits of XMM8 and YMMH8, and reconstruct the result.
> The method is similar to the existing method for
> sample_regs_user/intr, and matches the XSAVES format.
> The kernel doesn't need to do extra configuration and reconstruction.
> It's implemented in the patch set.
>
> - Similar to the above method. But the fields are the bitmap of the
> complete registers, E.g., YMM (16 bits), APX (16 bits),
> OPMASK (8 bits), ZMM (32 bits), SSP.
> The kernel needs to do extra configuration and reconstruction,
> which may bring extra overhead.
>
> - Combine the XMM, YMM, and ZMM. So all the registers can be put into
> one u64 field.
> ...
> union {
>     __u64 sample_ext_regs_intr; //sample_ext_regs_user is similar
>     struct {
>         __u32 vector_bitmap;
>         __u32 vector_type : 3, //0b001 XMM 0b010 YMM 0b100 ZMM
>               apx_bitmap : 16,
>               opmask_bitmap : 8,
>               ssp_bitmap : 1,
>               reserved : 4;
>     };
> };
> ...
> For example, if the YMM8-15 is required,
> vector_bitmap: 0x0000ff00
> vector_type: 0x2
> This method can save two __u64 in the struct perf_event_attr.
> But it's not straightforward since it mixes the type and bitmap.
> The kernel also needs to do extra configuration and reconstruction.
>
> Please let me know if there are more ideas.
+1 for method 1 or 2, with method 2 preferred.
Method 1 doesn't need to reconstruct the YMM/ZMM regs in kernel space, but it
offloads the reconstruction to user space: all user space perf-related
tools have to reconstruct them by themselves. Not 100% sure, but I suppose
this needs a big change for perf tools to reconstruct and show the YMM/ZMM
regs.
The con of method 2 is that it could need extra memory space and memory
copies if users intend to sample these overlapping regs simultaneously, like
XMM0/YMM0/ZMM0, but I suppose we can add an extra check in perf tools to tell
users that these regs overlap and force sampling of only the regs with the
largest bit-width.
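To make the user-space cost of method 1 concrete, here is a minimal sketch of the reconstruction a tool would need for one YMM register. It assumes the sample delivers the XMM half and the YMMH half as two u64 each, low qword first; the helper name is hypothetical and not taken from the perf tools.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: join the low 128 bits (XMM) and the high 128 bits
 * (YMMH) of one register into a 256-bit value stored as four u64,
 * lowest qword first.
 */
static inline void ymm_from_halves(const uint64_t xmm[2],
				   const uint64_t ymmh[2],
				   uint64_t ymm[4])
{
	memcpy(&ymm[0], xmm, 2 * sizeof(uint64_t));  /* bits 0..127 */
	memcpy(&ymm[2], ymmh, 2 * sizeof(uint64_t)); /* bits 128..255 */
}
```

A ZMM register would need one more step of the same kind for the upper 256 bits, which is the per-tool work that method 2 moves into the kernel.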
>
> Thanks,
> Kan
>
>
>
> Kan Liang (12):
> perf/x86: Use x86_perf_regs in the x86 nmi handler
> perf/x86: Setup the regs data
> x86/fpu/xstate: Add xsaves_nmi
> perf: Move has_extended_regs() to header file
> perf/x86: Support XMM register for non-PEBS and REGS_USER
> perf: Support extension of sample_regs
> perf/x86: Add YMMH in extended regs
> perf/x86: Add APX in extended regs
> perf/x86: Add OPMASK in extended regs
> perf/x86: Add ZMM in extended regs
> perf/x86: Add SSP in extended regs
> perf/x86/intel: Support extended registers
>
> arch/arm/kernel/perf_regs.c | 9 +-
> arch/arm64/kernel/perf_regs.c | 9 +-
> arch/csky/kernel/perf_regs.c | 9 +-
> arch/loongarch/kernel/perf_regs.c | 8 +-
> arch/mips/kernel/perf_regs.c | 9 +-
> arch/powerpc/perf/perf_regs.c | 9 +-
> arch/riscv/kernel/perf_regs.c | 8 +-
> arch/s390/kernel/perf_regs.c | 9 +-
> arch/x86/events/core.c | 226 ++++++++++++++++++++++++--
> arch/x86/events/intel/core.c | 49 ++++++
> arch/x86/events/intel/ds.c | 12 +-
> arch/x86/events/perf_event.h | 58 +++++++
> arch/x86/include/asm/fpu/xstate.h | 1 +
> arch/x86/include/asm/perf_event.h | 6 +
> arch/x86/include/uapi/asm/perf_regs.h | 101 ++++++++++++
> arch/x86/kernel/fpu/xstate.c | 22 +++
> arch/x86/kernel/perf_regs.c | 85 +++++++++-
> include/linux/perf_event.h | 23 +++
> include/linux/perf_regs.h | 29 +++-
> include/uapi/linux/perf_event.h | 8 +
> kernel/events/core.c | 63 +++++--
> 21 files changed, 699 insertions(+), 54 deletions(-)
>