[PATCH V2 0/7] arm64/perf: Enable branch stack sampling

Anshuman Khandual posted 7 patches 3 years, 7 months ago
There is a newer version of this series
arch/arm64/include/asm/sysreg.h | 222 ++++++++++++++++
arch/arm64/kernel/perf_event.c  |  48 ++++
drivers/perf/Kconfig            |  11 +
drivers/perf/Makefile           |   1 +
drivers/perf/arm_pmu.c          |  82 +++++-
drivers/perf/arm_pmu_brbe.c     | 448 ++++++++++++++++++++++++++++++++
drivers/perf/arm_pmu_brbe.h     | 259 ++++++++++++++++++
drivers/perf/arm_pmu_platform.c |  34 +++
include/linux/perf/arm_pmu.h    |  67 +++++
9 files changed, 1169 insertions(+), 3 deletions(-)
create mode 100644 drivers/perf/arm_pmu_brbe.c
create mode 100644 drivers/perf/arm_pmu_brbe.h
[PATCH V2 0/7] arm64/perf: Enable branch stack sampling
Posted by Anshuman Khandual 3 years, 7 months ago
This series enables perf branch stack sampling support on arm64 platform
via a new arch feature called Branch Record Buffer Extension (BRBE). All
relevant register definitions could be accessed here.

https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers

This series applies on v6.0-rc4 after the BRBE related perf ABI changes series
(V7) that was posted earlier, and a branch sample filter helper patch.

https://lore.kernel.org/all/20220824044822.70230-1-anshuman.khandual@arm.com/

https://lore.kernel.org/all/20220906084414.396220-1-anshuman.khandual@arm.com/

Following issues have been resolved

- Jame's concerns regarding permission inadequacy related to perfmon_capable()
- Jame's concerns regarding using perf_event_paranoid along with perfmon_capable()

Following issues remain inconclusive

- Rob's concerns regarding the series structure, arm_pmu callbacks based framework

Changes in V2:

- Dropped branch sample filter helpers consolidation patch from this series 
- Added new hw_perf_event.flags element ARMPMU_EVT_PRIV to cache perfmon_capable()
- Use cached perfmon_capable() while configuring BRBE branch record filters

Changes in V1:

https://lore.kernel.org/linux-arm-kernel/20220613100119.684673-1-anshuman.khandual@arm.com/

- Added CONFIG_PERF_EVENTS wrapper for all branch sample filter helpers
- Process new perf branch types via PERF_BR_EXTEND_ABI

Changes in RFC V2:

https://lore.kernel.org/linux-arm-kernel/20220412115455.293119-1-anshuman.khandual@arm.com/

- Added branch_sample_priv() while consolidating other branch sample filter helpers
- Changed all SYS_BRBXXXN_EL1 register definition encodings per Marc
- Changed the BRBE driver as per proposed BRBE related perf ABI changes (V5)
- Added documentation for struct arm_pmu changes, updated commit message
- Updated commit message for BRBE detection infrastructure patch
- PERF_SAMPLE_BRANCH_KERNEL gets checked during arm event init (outside the driver)
- Branch privilege state capture mechanism has now moved inside the driver

Changes in RFC V1:

https://lore.kernel.org/all/1642998653-21377-1-git-send-email-anshuman.khandual@arm.com/

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Anshuman Khandual (7):
  arm64/perf: Add register definitions for BRBE
  arm64/perf: Update struct arm_pmu for BRBE
  arm64/perf: Update struct pmu_hw_events for BRBE
  driver/perf/arm_pmu_platform: Add support for BRBE attributes detection
  arm64/perf: Drive BRBE from perf event states
  arm64/perf: Add BRBE driver
  arm64/perf: Enable branch stack sampling

 arch/arm64/include/asm/sysreg.h | 222 ++++++++++++++++
 arch/arm64/kernel/perf_event.c  |  48 ++++
 drivers/perf/Kconfig            |  11 +
 drivers/perf/Makefile           |   1 +
 drivers/perf/arm_pmu.c          |  82 +++++-
 drivers/perf/arm_pmu_brbe.c     | 448 ++++++++++++++++++++++++++++++++
 drivers/perf/arm_pmu_brbe.h     | 259 ++++++++++++++++++
 drivers/perf/arm_pmu_platform.c |  34 +++
 include/linux/perf/arm_pmu.h    |  67 +++++
 9 files changed, 1169 insertions(+), 3 deletions(-)
 create mode 100644 drivers/perf/arm_pmu_brbe.c
 create mode 100644 drivers/perf/arm_pmu_brbe.h

-- 
2.25.1
Re: [PATCH V2 0/7] arm64/perf: Enable branch stack sampling
Posted by James Clark 3 years, 6 months ago

On 08/09/2022 06:10, Anshuman Khandual wrote:
> This series enables perf branch stack sampling support on arm64 platform
> via a new arch feature called Branch Record Buffer Extension (BRBE). All
> relevant register definitions could be accessed here.
> 
> https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers
> 
> This series applies on v6.0-rc4 after the BRBE related perf ABI changes series
> (V7) that was posted earlier, and a branch sample filter helper patch.
> 
> https://lore.kernel.org/all/20220824044822.70230-1-anshuman.khandual@arm.com/
> 
> https://lore.kernel.org/all/20220906084414.396220-1-anshuman.khandual@arm.com/
> 
> Following issues have been resolved
> 
> - Jame's concerns regarding permission inadequacy related to perfmon_capable()
> - Jame's concerns regarding using perf_event_paranoid along with perfmon_capable()

I don't see the resolution to this one. I'm not 100% sure of the code
path used for LBR, but I think you just need to take perf_allow_kernel()
into account somewhere to make this command have the same result with
BRBE. Is there any contention that the permissions shouldn't behave in
the same way across platforms? This is when perf_event_paranoid < 2:

Intel:

  $ perf record -j any -- ls

  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.014 MB perf.data (16 samples) ]

Arm:

  $ perf record -j any -- ls

  Error:
  No permission to enable cycles event.

> 
> Following issues remain inconclusive
> 
> - Rob's concerns regarding the series structure, arm_pmu callbacks based framework
> 
> Changes in V2:
> 
> - Dropped branch sample filter helpers consolidation patch from this series 
> - Added new hw_perf_event.flags element ARMPMU_EVT_PRIV to cache perfmon_capable()
> - Use cached perfmon_capable() while configuring BRBE branch record filters
> 
> Changes in V1:
> 
> https://lore.kernel.org/linux-arm-kernel/20220613100119.684673-1-anshuman.khandual@arm.com/
> 
> - Added CONFIG_PERF_EVENTS wrapper for all branch sample filter helpers
> - Process new perf branch types via PERF_BR_EXTEND_ABI
> 
> Changes in RFC V2:
> 
> https://lore.kernel.org/linux-arm-kernel/20220412115455.293119-1-anshuman.khandual@arm.com/
> 
> - Added branch_sample_priv() while consolidating other branch sample filter helpers
> - Changed all SYS_BRBXXXN_EL1 register definition encodings per Marc
> - Changed the BRBE driver as per proposed BRBE related perf ABI changes (V5)
> - Added documentation for struct arm_pmu changes, updated commit message
> - Updated commit message for BRBE detection infrastructure patch
> - PERF_SAMPLE_BRANCH_KERNEL gets checked during arm event init (outside the driver)
> - Branch privilege state capture mechanism has now moved inside the driver
> 
> Changes in RFC V1:
> 
> https://lore.kernel.org/all/1642998653-21377-1-git-send-email-anshuman.khandual@arm.com/
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: James Clark <james.clark@arm.com>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-perf-users@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> 
> Anshuman Khandual (7):
>   arm64/perf: Add register definitions for BRBE
>   arm64/perf: Update struct arm_pmu for BRBE
>   arm64/perf: Update struct pmu_hw_events for BRBE
>   driver/perf/arm_pmu_platform: Add support for BRBE attributes detection
>   arm64/perf: Drive BRBE from perf event states
>   arm64/perf: Add BRBE driver
>   arm64/perf: Enable branch stack sampling
> 
>  arch/arm64/include/asm/sysreg.h | 222 ++++++++++++++++
>  arch/arm64/kernel/perf_event.c  |  48 ++++
>  drivers/perf/Kconfig            |  11 +
>  drivers/perf/Makefile           |   1 +
>  drivers/perf/arm_pmu.c          |  82 +++++-
>  drivers/perf/arm_pmu_brbe.c     | 448 ++++++++++++++++++++++++++++++++
>  drivers/perf/arm_pmu_brbe.h     | 259 ++++++++++++++++++
>  drivers/perf/arm_pmu_platform.c |  34 +++
>  include/linux/perf/arm_pmu.h    |  67 +++++
>  9 files changed, 1169 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/perf/arm_pmu_brbe.c
>  create mode 100644 drivers/perf/arm_pmu_brbe.h
>
Re: [PATCH V2 0/7] arm64/perf: Enable branch stack sampling
Posted by Anshuman Khandual 3 years, 6 months ago

On 9/13/22 16:25, James Clark wrote:
> 
> On 08/09/2022 06:10, Anshuman Khandual wrote:
>> This series enables perf branch stack sampling support on arm64 platform
>> via a new arch feature called Branch Record Buffer Extension (BRBE). All
>> relevant register definitions could be accessed here.
>>
>> https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers
>>
>> This series applies on v6.0-rc4 after the BRBE related perf ABI changes series
>> (V7) that was posted earlier, and a branch sample filter helper patch.
>>
>> https://lore.kernel.org/all/20220824044822.70230-1-anshuman.khandual@arm.com/
>>
>> https://lore.kernel.org/all/20220906084414.396220-1-anshuman.khandual@arm.com/
>>
>> Following issues have been resolved
>>
>> - Jame's concerns regarding permission inadequacy related to perfmon_capable()
>> - Jame's concerns regarding using perf_event_paranoid along with perfmon_capable()
> I don't see the resolution to this one. I'm not 100% sure of the code
> path used for LBR, but I think you just need to take perf_allow_kernel()
> into account somewhere to make this command have the same result with
> BRBE. Is there any contention that the permissions shouldn't behave in
> the same way across platforms? This is when perf_event_paranoid < 2:
> 
> Intel:
> 
>   $ perf record -j any -- ls
> 
>   [ perf record: Woken up 1 times to write data ]
>   [ perf record: Captured and wrote 0.014 MB perf.data (16 samples) ]
> 
> Arm:
> 
>   $ perf record -j any -- ls
> 
>   Error:
>   No permission to enable cycles event.
> 
Proposed solution here just follows what we did for the SPE driver recently.
I would not be surprised, if there is difference in semantics in permission
checking across various platform perf drivers. Ideally permission should not
even be checked in platform drivers - either capability or perf_event_paranoid.

Unfortunately changing the permission checking framework across generic perf
is beyond the scope for this BRBE proposal and might be taken up later via a
different series. Although I would be willing to accommodate any alternate
suggestions to improve permission checking here in the BRBE driver.
Re: [PATCH V2 0/7] arm64/perf: Enable branch stack sampling
Posted by James Clark 3 years, 6 months ago

On 13/09/2022 13:12, Anshuman Khandual wrote:
> 
> 
> On 9/13/22 16:25, James Clark wrote:
>>
>> On 08/09/2022 06:10, Anshuman Khandual wrote:
>>> This series enables perf branch stack sampling support on arm64 platform
>>> via a new arch feature called Branch Record Buffer Extension (BRBE). All
>>> relevant register definitions could be accessed here.
>>>
>>> https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers
>>>
>>> This series applies on v6.0-rc4 after the BRBE related perf ABI changes series
>>> (V7) that was posted earlier, and a branch sample filter helper patch.
>>>
>>> https://lore.kernel.org/all/20220824044822.70230-1-anshuman.khandual@arm.com/
>>>
>>> https://lore.kernel.org/all/20220906084414.396220-1-anshuman.khandual@arm.com/
>>>
>>> Following issues have been resolved
>>>
>>> - Jame's concerns regarding permission inadequacy related to perfmon_capable()
>>> - Jame's concerns regarding using perf_event_paranoid along with perfmon_capable()
>> I don't see the resolution to this one. I'm not 100% sure of the code
>> path used for LBR, but I think you just need to take perf_allow_kernel()
>> into account somewhere to make this command have the same result with
>> BRBE. Is there any contention that the permissions shouldn't behave in
>> the same way across platforms? This is when perf_event_paranoid < 2:
>>
>> Intel:
>>
>>   $ perf record -j any -- ls
>>
>>   [ perf record: Woken up 1 times to write data ]
>>   [ perf record: Captured and wrote 0.014 MB perf.data (16 samples) ]
>>
>> Arm:
>>
>>   $ perf record -j any -- ls
>>
>>   Error:
>>   No permission to enable cycles event.
>>
> Proposed solution here just follows what we did for the SPE driver recently.
> I would not be surprised, if there is difference in semantics in permission
> checking across various platform perf drivers. 

SPE isn't too relevant because it's its own thing and there is no SPE
command that can be run on other platforms. There may be something like
perf c2c that uses SPE under the hood but if it works differently across
platforms I would also consider that a bug and not something to be copied.

> Ideally permission should not
> even be checked in platform drivers - either capability or perf_event_paranoid.

But it is currently. Users don't care about the code or how complicated
the implementation is, only that the behaviour is sane. We're not
helping Arm users or adoption of BRBE if the same command that someone
runs somewhere else fails inexplicably, without any justification other
than "the code didn't look right".

> 
> Unfortunately changing the permission checking framework across generic perf
> is beyond the scope for this BRBE proposal and might be taken up later via a

Permissions are definitely not beyond the scope of this proposal because
the code to check the permissions has been added right here:

  +		if (perfmon_capable())
  +			event->hw.flags |= ARMPMU_EVT_PRIV;

And all it needs extra is a check of perf_allow_kernel() or similar.

> different series. Although I would be willing to accommodate any alternate
> suggestions to improve permission checking here in the BRBE driver.

I don't think planning to change it in the future is very user friendly
either, otherwise any help we give to people stuck will have to start
with an explanation about how we changed the permissions model across
versions, and their command or setup also depends on the kernel version.
Re: [PATCH V2 0/7] arm64/perf: Enable branch stack sampling
Posted by Anshuman Khandual 3 years, 6 months ago

On 9/13/22 18:42, James Clark wrote:
> 
> 
> On 13/09/2022 13:12, Anshuman Khandual wrote:
>>
>>
>> On 9/13/22 16:25, James Clark wrote:
>>>
>>> On 08/09/2022 06:10, Anshuman Khandual wrote:
>>>> This series enables perf branch stack sampling support on arm64 platform
>>>> via a new arch feature called Branch Record Buffer Extension (BRBE). All
>>>> relevant register definitions could be accessed here.
>>>>
>>>> https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers
>>>>
>>>> This series applies on v6.0-rc4 after the BRBE related perf ABI changes series
>>>> (V7) that was posted earlier, and a branch sample filter helper patch.
>>>>
>>>> https://lore.kernel.org/all/20220824044822.70230-1-anshuman.khandual@arm.com/
>>>>
>>>> https://lore.kernel.org/all/20220906084414.396220-1-anshuman.khandual@arm.com/
>>>>
>>>> Following issues have been resolved
>>>>
>>>> - Jame's concerns regarding permission inadequacy related to perfmon_capable()
>>>> - Jame's concerns regarding using perf_event_paranoid along with perfmon_capable()
>>> I don't see the resolution to this one. I'm not 100% sure of the code
>>> path used for LBR, but I think you just need to take perf_allow_kernel()
>>> into account somewhere to make this command have the same result with
>>> BRBE. Is there any contention that the permissions shouldn't behave in
>>> the same way across platforms? This is when perf_event_paranoid < 2:
>>>
>>> Intel:
>>>
>>>   $ perf record -j any -- ls
>>>
>>>   [ perf record: Woken up 1 times to write data ]
>>>   [ perf record: Captured and wrote 0.014 MB perf.data (16 samples) ]
>>>
>>> Arm:
>>>
>>>   $ perf record -j any -- ls
>>>
>>>   Error:
>>>   No permission to enable cycles event.
>>>
>> Proposed solution here just follows what we did for the SPE driver recently.
>> I would not be surprised, if there is difference in semantics in permission
>> checking across various platform perf drivers. 
> 
> SPE isn't too relevant because it's its own thing and there is no SPE
> command that can be run on other platforms. There may be something like
> perf c2c that uses SPE under the hood but if it works differently across
> platforms I would also consider that a bug and not something to be copied.
> 
>> Ideally permission should not
>> even be checked in platform drivers - either capability or perf_event_paranoid.
> 
> But it is currently. Users don't care about the code or how complicated
> the implementation is, only that the behaviour is sane. We're not
> helping Arm users or adoption of BRBE if the same command that someone
> runs somewhere else fails inexplicably, without any justification other
> than "the code didn't look right".
> 
>>
>> Unfortunately changing the permission checking framework across generic perf
>> is beyond the scope for this BRBE proposal and might be taken up later via a
> 
> Permissions are definitely not beyond the scope of this proposal because
> the code to check the permissions has been added right here:
> 
>   +		if (perfmon_capable())
>   +			event->hw.flags |= ARMPMU_EVT_PRIV;
> 
> And all it needs extra is a check of perf_allow_kernel() or similar.
> 
>> different series. Although I would be willing to accommodate any alternate
>> suggestions to improve permission checking here in the BRBE driver.
> 
> I don't think planning to change it in the future is very user friendly
> either, otherwise any help we give to people stuck will have to start
> with an explanation about how we changed the permissions model across
> versions, and their command or setup also depends on the kernel version.

I guess this discussion regarding perfmon_capable(), perf_event_paranoid,
and perf_allow_kernel() has been happening in a rather cyclical manner :)
There are multiple approaches to solve this problem both in near and long
term, and there seems to be disagreement over which is the preferred path
to be taken. Hence, will just leave the decision up to the maintainers.