arch/arm64/include/asm/sysreg.h | 1 + arch/arm64/tools/sysreg | 6 ++++- drivers/perf/arm_spe_pmu.c | 60 ++++++++++++++++++++++++++++++++--------- 3 files changed, 54 insertions(+), 13 deletions(-)
SPE can be used from within a guest as long as the driver adheres to the new VM interface spec [1]. Because the driver should behave correctly whether it's running in a guest or not, the first patches are marked as a fix. Furthermore, in future versions of the architecture the PE will be allowed to behave in the same way. The last patch adds new behavior to make it easier for guests to be able to reserve large buffers. It's not strictly necessary, so it's not marked as a fix. [1]: https://developer.arm.com/documentation/den0154/latest/ Signed-off-by: James Clark <james.clark@linaro.org> --- James Clark (3): perf: arm_spe: Add barrier before enabling profiling buffer perf: arm_spe: Disable buffer before writing to PMBPTR_EL1 or PMBSR_EL1 perf: arm_spe: Add support for SPE VM interface arch/arm64/include/asm/sysreg.h | 1 + arch/arm64/tools/sysreg | 6 ++++- drivers/perf/arm_spe_pmu.c | 60 ++++++++++++++++++++++++++++++++--------- 3 files changed, 54 insertions(+), 13 deletions(-) --- base-commit: d0b3b7b22dfa1f4b515fd3a295b3fd958f9e81af change-id: 20250609-james-spe-vm-interface-2bb41e238072 Best regards, -- James Clark <james.clark@linaro.org>
Hi, On Tue, Jul 01, 2025 at 04:31:56PM +0100, James Clark wrote: > SPE can be used from within a guest as long as the driver adheres to the > new VM interface spec [1]. Because the driver should behave correctly > whether it's running in a guest or not, the first patches are marked as > a fix. Furthermore, in future versions of the architecture the PE will > be allowed to behave in the same way. > > The last patch adds new behavior to make it easier for guests to be > able to reserve large buffers. It's not strictly necessary, so it's not > marked as a fix. I had a look at the patches, and they all look ok to me, so for the series: Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> I also tested the series by hacking SPE virtualization support in KVM: - without these changes, the SPE driver gets into an infinite loop because it clears PMBSR_EL1.S before clearing PMBLIMITR_EL.E, and the hypervisor is allowed to ignore the write to PMBSR_EL1. - with these changes, that doesn't happen. - ran perf for about a day in a loop in a virtual machine and didn't notice anything out of the ordinary. - ran perf for about a day in a loop on baremetal and similary everything looked alright. - checked that the SPE driver correctly decodes the maximum buffer size for sizes 4M, 2M (2M is right at the boundary between the two encoding schemes) and 1M; that's also correctly reflected in /sys/devices/platform/<spe>/arm_spe_0/caps/max_buffer_size. - checked that perf is not allowed to use a buffer larger than the maximum. - checked that the SPE driver correctly detects a buffer size management event. So: Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> While testing I noticed two things: 1. When perf tries to use a buffer larger than the maximum, the error is EINVAL (22): # cat /sys/devices/platform/spe/arm_spe_0/caps/max_buff_size 4194304 # perf record -ae arm_spe// -m,16M -- sleep 10 failed to mmap with 22 (Invalid argument) (used 16M as the buffer size because what the driver ends up programming is half that). I would have expected to get back ENOMEM (12), that seems less ambiguous to me. I had to hack the driver to print an error message to dmesg when the max buffer size is exceed to make sure that's why I was seeing the error message in perf, and it wasn't because of something else. I get that that's because .setup_aux() can only return NULL on error, but feels like there's room for improvement here. 2. A hypervisor is allowed to inject a buffer size event even though the buffer set by the guest is smaller than the maximum advertised. For example, this can happen if there isn't enough memory to pin the buffer, or if the limit on pinned memory is exceeded in the hypervisor (implementation specific behaviour, not mandated in DEN0154, of course). In this situation, when the SPE driver gets a buffer size management event injected by the hypervisor, there is no way for the driver to communicate it to the perf instance, and the profiled process continues executing even though profiling has stopped. That's not different from what happens today with buffer management events, but unlike the other events, which aren't under the control of userspace, the buffer size event is potentially recoverable if userspace restarts perf with a smaller buffer. Do you think there's something that can be done to improve this situation? Thanks, Alex > > [1]: https://developer.arm.com/documentation/den0154/latest/ > > Signed-off-by: James Clark <james.clark@linaro.org> > --- > James Clark (3): > perf: arm_spe: Add barrier before enabling profiling buffer > perf: arm_spe: Disable buffer before writing to PMBPTR_EL1 or PMBSR_EL1 > perf: arm_spe: Add support for SPE VM interface > > arch/arm64/include/asm/sysreg.h | 1 + > arch/arm64/tools/sysreg | 6 ++++- > drivers/perf/arm_spe_pmu.c | 60 ++++++++++++++++++++++++++++++++--------- > 3 files changed, 54 insertions(+), 13 deletions(-) > --- > base-commit: d0b3b7b22dfa1f4b515fd3a295b3fd958f9e81af > change-id: 20250609-james-spe-vm-interface-2bb41e238072 > > Best regards, > -- > James Clark <james.clark@linaro.org> >
On 01/08/2025 2:28 pm, Alexandru Elisei wrote: > Hi, > > On Tue, Jul 01, 2025 at 04:31:56PM +0100, James Clark wrote: >> SPE can be used from within a guest as long as the driver adheres to the >> new VM interface spec [1]. Because the driver should behave correctly >> whether it's running in a guest or not, the first patches are marked as >> a fix. Furthermore, in future versions of the architecture the PE will >> be allowed to behave in the same way. >> >> The last patch adds new behavior to make it easier for guests to be >> able to reserve large buffers. It's not strictly necessary, so it's not >> marked as a fix. > > I had a look at the patches, and they all look ok to me, so for the series: > > Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> > > I also tested the series by hacking SPE virtualization support in KVM: > > - without these changes, the SPE driver gets into an infinite loop because it > clears PMBSR_EL1.S before clearing PMBLIMITR_EL.E, and the hypervisor is > allowed to ignore the write to PMBSR_EL1. > > - with these changes, that doesn't happen. > > - ran perf for about a day in a loop in a virtual machine and didn't notice > anything out of the ordinary. > > - ran perf for about a day in a loop on baremetal and similary everything looked > alright. > > - checked that the SPE driver correctly decodes the maximum buffer size for > sizes 4M, 2M (2M is right at the boundary between the two encoding schemes) > and 1M; that's also correctly reflected in > /sys/devices/platform/<spe>/arm_spe_0/caps/max_buffer_size. > > - checked that perf is not allowed to use a buffer larger than the maximum. > > - checked that the SPE driver correctly detects a buffer size management event. > > So: > > Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> > > While testing I noticed two things: > > 1. When perf tries to use a buffer larger than the maximum, the error is EINVAL > (22): > > # cat /sys/devices/platform/spe/arm_spe_0/caps/max_buff_size > 4194304 > # perf record -ae arm_spe// -m,16M -- sleep 10 > failed to mmap with 22 (Invalid argument) > > (used 16M as the buffer size because what the driver ends up programming is half > that). > > I would have expected to get back ENOMEM (12), that seems less ambiguous to me. > I had to hack the driver to print an error message to dmesg when the max buffer > size is exceed to make sure that's why I was seeing the error message in perf, > and it wasn't because of something else. I get that that's because .setup_aux() > can only return NULL on error, but feels like there's room for improvement here. > We could add an error code, rb_alloc_aux() already returns one and that calls setup_aux(). But the scenarios would be either EINVAL or ENOMEM and wouldn't give the user the exact reason ("need > 2 pages", "need even number of pages", etc). So I'm not sure it would be enough of an improvement over returning NULL to be worth it. However I will add a warning into Perf if the user asks for more than caps/max_buffer_size. That would be a useful message and Perf can do it itself so it doesn't need to be in the driver changes. > 2. A hypervisor is allowed to inject a buffer size event even though the buffer > set by the guest is smaller than the maximum advertised. For example, this can > happen if there isn't enough memory to pin the buffer, or if the limit on pinned > memory is exceeded in the hypervisor (implementation specific behaviour, not > mandated in DEN0154, of course). > > In this situation, when the SPE driver gets a buffer size management event > injected by the hypervisor, there is no way for the driver to communicate it to > the perf instance, and the profiled process continues executing even though > profiling has stopped. > > That's not different from what happens today with buffer management events, but > unlike the other events, which aren't under the control of userspace, the buffer > size event is potentially recoverable if userspace restarts perf with a smaller > buffer. > > Do you think there's something that can be done to improve this situation? > > Thanks, > Alex > It doesn't look like there's currently anything that can stop an event or signal to Perf that the event has gone bad. We could add something like "__u32 error" to struct perf_event_mmap_page. But I'm not sure what you'd do with it. If Perf is the parent of the process you wouldn't want to kill it in case anything bad happens. So you're left with leaving it running anyway. If it's just an error message that you want then there's already one in dmesg for buffer management errors, and that string is a lot better than a single code. Unless these new codes were detailed PMU specific ones? Actually it's a whole page so why not make it a string... It's not a case of the samples ending randomly somewhere though, you'll either get all of them or none of them. So it will be quite obvious to the user that something has gone wrong. Secondly I think the scenario of not being able to pin memory when asking for less than the limit would be very rare. It's probably fine to leave it like this for now and we can always add something later, maybe if people start to run into it for real. James
On 04/08/2025 17:00, James Clark wrote: > > > On 01/08/2025 2:28 pm, Alexandru Elisei wrote: >> Hi, >> >> On Tue, Jul 01, 2025 at 04:31:56PM +0100, James Clark wrote: >>> SPE can be used from within a guest as long as the driver adheres to the >>> new VM interface spec [1]. Because the driver should behave correctly >>> whether it's running in a guest or not, the first patches are marked as >>> a fix. Furthermore, in future versions of the architecture the PE will >>> be allowed to behave in the same way. >>> >>> The last patch adds new behavior to make it easier for guests to be >>> able to reserve large buffers. It's not strictly necessary, so it's not >>> marked as a fix. >> >> I had a look at the patches, and they all look ok to me, so for the >> series: >> >> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> >> >> I also tested the series by hacking SPE virtualization support in KVM: >> >> - without these changes, the SPE driver gets into an infinite loop >> because it >> clears PMBSR_EL1.S before clearing PMBLIMITR_EL.E, and the >> hypervisor is >> allowed to ignore the write to PMBSR_EL1. >> >> - with these changes, that doesn't happen. >> >> - ran perf for about a day in a loop in a virtual machine and didn't >> notice >> anything out of the ordinary. >> >> - ran perf for about a day in a loop on baremetal and similary >> everything looked >> alright. >> >> - checked that the SPE driver correctly decodes the maximum buffer >> size for >> sizes 4M, 2M (2M is right at the boundary between the two encoding >> schemes) >> and 1M; that's also correctly reflected in >> /sys/devices/platform/<spe>/arm_spe_0/caps/max_buffer_size. >> >> - checked that perf is not allowed to use a buffer larger than the >> maximum. >> >> - checked that the SPE driver correctly detects a buffer size >> management event. >> >> So: >> >> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> >> >> While testing I noticed two things: >> >> 1. When perf tries to use a buffer larger than the maximum, the error >> is EINVAL >> (22): >> >> # cat /sys/devices/platform/spe/arm_spe_0/caps/max_buff_size >> 4194304 >> # perf record -ae arm_spe// -m,16M -- sleep 10 >> failed to mmap with 22 (Invalid argument) >> >> (used 16M as the buffer size because what the driver ends up >> programming is half >> that). >> >> I would have expected to get back ENOMEM (12), that seems less >> ambiguous to me. >> I had to hack the driver to print an error message to dmesg when the >> max buffer >> size is exceed to make sure that's why I was seeing the error message >> in perf, >> and it wasn't because of something else. I get that that's >> because .setup_aux() >> can only return NULL on error, but feels like there's room for >> improvement here. >> > > We could add an error code, rb_alloc_aux() already returns one and that > calls setup_aux(). But the scenarios would be either EINVAL or ENOMEM > and wouldn't give the user the exact reason ("need > 2 pages", "need > even number of pages", etc). So I'm not sure it would be enough of an > improvement over returning NULL to be worth it. > > However I will add a warning into Perf if the user asks for more than > caps/max_buffer_size. That would be a useful message and Perf can do it > itself so it doesn't need to be in the driver changes. > >> 2. A hypervisor is allowed to inject a buffer size event even though >> the buffer >> set by the guest is smaller than the maximum advertised. For example, >> this can >> happen if there isn't enough memory to pin the buffer, or if the limit >> on pinned >> memory is exceeded in the hypervisor (implementation specific >> behaviour, not >> mandated in DEN0154, of course). >> >> In this situation, when the SPE driver gets a buffer size management >> event >> injected by the hypervisor, there is no way for the driver to >> communicate it to >> the perf instance, and the profiled process continues executing even >> though >> profiling has stopped. >> >> That's not different from what happens today with buffer management >> events, but >> unlike the other events, which aren't under the control of userspace, >> the buffer >> size event is potentially recoverable if userspace restarts perf with >> a smaller >> buffer. >> >> Do you think there's something that can be done to improve this >> situation? >> >> Thanks, >> Alex >> > > It doesn't look like there's currently anything that can stop an event > or signal to Perf that the event has gone bad. > > We could add something like "__u32 error" to struct > perf_event_mmap_page. But I'm not sure what you'd do with it. If Perf is > the parent of the process you wouldn't want to kill it in case anything > bad happens. So you're left with leaving it running anyway. If it's just > an error message that you want then there's already one in dmesg for > buffer management errors, and that string is a lot better than a single > code. Unless these new codes were detailed PMU specific ones? Actually > it's a whole page so why not make it a string... > > It's not a case of the samples ending randomly somewhere though, you'll > either get all of them or none of them. So it will be quite obvious to > the user that something has gone wrong. Secondly I think the scenario of > not being able to pin memory when asking for less than the limit would > be very rare. It's probably fine to leave it like this for now and we > can always add something later, maybe if people start to run into it for > real. Do we get EMPTY AUX records in this case where there was no profiling data ? If so, why not convey the error via AUX record header flags ? Suzuk > > James > >
© 2016 - 2025 Red Hat, Inc.