[Qemu-devel] [PATCH v2 00/13] More fully implement ARM PMUv3

Aaron Lindsay posted 13 patches 6 years, 6 months ago
Failed in applying to current master (apply log)
Test checkpatch passed
Test docker passed
Test s390x passed
There is a newer version of this series
target/arm/cpu.c       |  10 +-
target/arm/cpu.h       |  34 +++-
target/arm/cpu64.c     |   2 +
target/arm/helper.c    | 535 +++++++++++++++++++++++++++++++++++++++++--------
target/arm/kvm64.c     |   2 +
target/arm/machine.c   |   2 +
target/arm/op_helper.c |   4 +
7 files changed, 500 insertions(+), 89 deletions(-)
[Qemu-devel] [PATCH v2 00/13] More fully implement ARM PMUv3
Posted by Aaron Lindsay 6 years, 6 months ago
The ARM PMU implementation currently contains a basic cycle counter, but it is
often useful to gather counts of other events and filter them based on
execution mode. These patches flesh out the implementations of various PMU
registers including PM[X]EVCNTR and PM[X]EVTYPER, add a struct definition to
represent arbitrary counter types, implement mode filtering, and add
instruction, cycle, and software increment events.

I am particularly interested in feedback on the following two patches because I
think I'm likely Doing It Wrong:
 [1] target/arm: Filter cycle counter based on PMCCFILTR_EL0
 [2] target/arm: PMU: Add instruction and cycle events

In order to implement mode filtering in an event-driven way, [1] adds a pair of
calls to pmu_sync() surrounding every update to a register/variable which may
affect whether any counter is currently filtered. These pmu_sync() calls
ultimately call cpu_get_icount_raw() for enabled instruction and cycle counters
when using icount. Unfortunately, cpu->can_do_io may otherwise be zero for
these calls so the current implementation in [2] temporarily sets can_do_io to
1. I haven't see any ill side effects from this in my testing, but it doesn't
seem like the right way to handle this.

I would like to eventually add sending interrupts on counter overflow.
Suggestions for the best direction to handle this are most welcome.  

Thanks for any feedback,
Aaron

Aaron Lindsay (13):
  target/arm: A53: Initialize PMCEID[0]
  target/arm: Check PMCNTEN for whether PMCCNTR is enabled
  target/arm: Reorganize PMCCNTR read, write, sync
  target/arm: Mask PMU register writes based on PMCR_EL0.N
  target/arm: Allow AArch32 access for PMCCFILTR
  target/arm: Filter cycle counter based on PMCCFILTR_EL0
  target/arm: Implement PMOVSSET
  target/arm: Split arm_ccnt_enabled into generic pmu_counter_enabled
  target/arm: Add array for supported PMU events, generate PMCEID[01]
  target/arm: Finish implementation of PM[X]EVCNTR and PM[X]EVTYPER
  target/arm: PMU: Add instruction and cycle events
  target/arm: PMU: Set PMCR.N to 4
  target/arm: Implement PMSWINC

 target/arm/cpu.c       |  10 +-
 target/arm/cpu.h       |  34 +++-
 target/arm/cpu64.c     |   2 +
 target/arm/helper.c    | 535 +++++++++++++++++++++++++++++++++++++++++--------
 target/arm/kvm64.c     |   2 +
 target/arm/machine.c   |   2 +
 target/arm/op_helper.c |   4 +
 7 files changed, 500 insertions(+), 89 deletions(-)

-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


Re: [Qemu-devel] [PATCH v2 00/13] More fully implement ARM PMUv3
Posted by Aaron Lindsay 6 years, 5 months ago
Ping!

Unfortunately I'm not sure who to add other than the current recipients,
but I'm eager for feedback and would love to work this into something
that will allow for using the full ARM PMU. 

I've also updated Peter Crosthwaite's email since the xilinx one appears
to be stale.

-Aaron

On Sep 29 22:08, Aaron Lindsay wrote:
> The ARM PMU implementation currently contains a basic cycle counter, but it is
> often useful to gather counts of other events and filter them based on
> execution mode. These patches flesh out the implementations of various PMU
> registers including PM[X]EVCNTR and PM[X]EVTYPER, add a struct definition to
> represent arbitrary counter types, implement mode filtering, and add
> instruction, cycle, and software increment events.
> 
> I am particularly interested in feedback on the following two patches because I
> think I'm likely Doing It Wrong:
>  [1] target/arm: Filter cycle counter based on PMCCFILTR_EL0
>  [2] target/arm: PMU: Add instruction and cycle events
> 
> In order to implement mode filtering in an event-driven way, [1] adds a pair of
> calls to pmu_sync() surrounding every update to a register/variable which may
> affect whether any counter is currently filtered. These pmu_sync() calls
> ultimately call cpu_get_icount_raw() for enabled instruction and cycle counters
> when using icount. Unfortunately, cpu->can_do_io may otherwise be zero for
> these calls so the current implementation in [2] temporarily sets can_do_io to
> 1. I haven't see any ill side effects from this in my testing, but it doesn't
> seem like the right way to handle this.
> 
> I would like to eventually add sending interrupts on counter overflow.
> Suggestions for the best direction to handle this are most welcome.  
> 
> Thanks for any feedback,
> Aaron
> 
> Aaron Lindsay (13):
>   target/arm: A53: Initialize PMCEID[0]
>   target/arm: Check PMCNTEN for whether PMCCNTR is enabled
>   target/arm: Reorganize PMCCNTR read, write, sync
>   target/arm: Mask PMU register writes based on PMCR_EL0.N
>   target/arm: Allow AArch32 access for PMCCFILTR
>   target/arm: Filter cycle counter based on PMCCFILTR_EL0
>   target/arm: Implement PMOVSSET
>   target/arm: Split arm_ccnt_enabled into generic pmu_counter_enabled
>   target/arm: Add array for supported PMU events, generate PMCEID[01]
>   target/arm: Finish implementation of PM[X]EVCNTR and PM[X]EVTYPER
>   target/arm: PMU: Add instruction and cycle events
>   target/arm: PMU: Set PMCR.N to 4
>   target/arm: Implement PMSWINC
> 
>  target/arm/cpu.c       |  10 +-
>  target/arm/cpu.h       |  34 +++-
>  target/arm/cpu64.c     |   2 +
>  target/arm/helper.c    | 535 +++++++++++++++++++++++++++++++++++++++++--------
>  target/arm/kvm64.c     |   2 +
>  target/arm/machine.c   |   2 +
>  target/arm/op_helper.c |   4 +
>  7 files changed, 500 insertions(+), 89 deletions(-)
> 
> -- 
> Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the
> Code Aurora Forum, a Linux Foundation Collaborative Project.
> 

-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

Re: [Qemu-devel] [PATCH v2 00/13] More fully implement ARM PMUv3
Posted by Peter Maydell 6 years, 5 months ago
On 9 October 2017 at 15:46, Aaron Lindsay <alindsay@codeaurora.org> wrote:
> Unfortunately I'm not sure who to add other than the current recipients,
> but I'm eager for feedback and would love to work this into something
> that will allow for using the full ARM PMU.

Hi -- I do have this on my review queue, but unfortunately it's
sitting behind some other fairly chunky hard-to-review patchsets.

As a first quick "is this going in the right direction" review based
pretty much only on the cover letter:

What extra events do you want to try to support in the emulated PMU?
Part of the reason we only support the cycle counter is because
 (1) a lot of the events in a real PMU would be hard to support
 (2) it's not clear to me that exposing events to the guest would be
     very useful to it anyway -- performance profiling of guest code
     running under emulation is fraught with difficulty

Giving more of an idea of what your use case is would help in
evaluating these patches.

Some of what you're doing looks like it's fixing bugs in our current
implementation, which is definitely fine in principle.

I haven't looked at the icount related stuff (and I can never remember
how it works either) but fiddling with can_do_io does sound like it's not
the right approach...

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 00/13] More fully implement ARM PMUv3
Posted by Aaron Lindsay 6 years, 5 months ago
On Oct 09 19:27, Peter Maydell wrote:
> On 9 October 2017 at 15:46, Aaron Lindsay <alindsay@codeaurora.org> wrote:
> > Unfortunately I'm not sure who to add other than the current recipients,
> > but I'm eager for feedback and would love to work this into something
> > that will allow for using the full ARM PMU.
> 
> Hi -- I do have this on my review queue, but unfortunately it's
> sitting behind some other fairly chunky hard-to-review patchsets.

No problem - I'll wait my turn.

> As a first quick "is this going in the right direction" review based
> pretty much only on the cover letter:
> 
> What extra events do you want to try to support in the emulated PMU?
> Part of the reason we only support the cycle counter is because
>  (1) a lot of the events in a real PMU would be hard to support
>  (2) it's not clear to me that exposing events to the guest would be
>      very useful to it anyway -- performance profiling of guest code
>      running under emulation is fraught with difficulty
> 
> Giving more of an idea of what your use case is would help in
> evaluating these patches.

My goal isn't to expose events concerned with microarchitectural
performance, but rather those that can help characterize architectural
behavior (initially instructions and maybe branches, but perhaps
anything in ARM ARM D5.10.4: "Common architectural event numbers"). We
use a number of platforms at different points along the accuracy/speed
trade-off continuum, and it is convenient to use self-hosted tools that
we can expect to work on all of them.

For instance, implementing the instruction counter allows us to stop
processes after executing prescribed numbers of instructions and resume
them on other platforms for further study (using CRIU). It can also be
useful to get a `perf` profile based on instruction count to get a rough
idea of where an application is spending its time.

> Some of what you're doing looks like it's fixing bugs in our current
> implementation, which is definitely fine in principle.

Yes, I believe patches 1-5 are fairly straightforward fixes, with the
later patches getting into the more invasive changes.

> I haven't looked at the icount related stuff (and I can never remember
> how it works either) but fiddling with can_do_io does sound like it's not
> the right approach...

Agreed. Perhaps part of the same offense as the pmu_sync() calls
scattered around - I was unable to find a better way to drive the mode
filtering, and am more than glad to pursue a different approach if
pointed in the right direction.

-Aaron

-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

Re: [Qemu-devel] [PATCH v2 00/13] More fully implement ARM PMUv3
Posted by Peter Maydell 6 years, 5 months ago
On 30 September 2017 at 03:08, Aaron Lindsay <alindsay@codeaurora.org> wrote:
> The ARM PMU implementation currently contains a basic cycle counter, but it is
> often useful to gather counts of other events and filter them based on
> execution mode. These patches flesh out the implementations of various PMU
> registers including PM[X]EVCNTR and PM[X]EVTYPER, add a struct definition to
> represent arbitrary counter types, implement mode filtering, and add
> instruction, cycle, and software increment events.
>
> I am particularly interested in feedback on the following two patches because I
> think I'm likely Doing It Wrong:
>  [1] target/arm: Filter cycle counter based on PMCCFILTR_EL0
>  [2] target/arm: PMU: Add instruction and cycle events
>
> In order to implement mode filtering in an event-driven way, [1] adds a pair of
> calls to pmu_sync() surrounding every update to a register/variable which may
> affect whether any counter is currently filtered. These pmu_sync() calls
> ultimately call cpu_get_icount_raw() for enabled instruction and cycle counters
> when using icount. Unfortunately, cpu->can_do_io may otherwise be zero for
> these calls so the current implementation in [2] temporarily sets can_do_io to
> 1. I haven't see any ill side effects from this in my testing, but it doesn't
> seem like the right way to handle this.

I've now reviewed the early stuff and provided what I hope is
a useful direction for the mode-filtering, so I'm not going to
look at the patches at the tail end on this version of the series.


> I would like to eventually add sending interrupts on counter overflow.
> Suggestions for the best direction to handle this are most welcome.

Check out how the helper.c timer interrupts are wired up
(the CPU object exposes outbound irq lines, which then get
wired up by the board to the GIC.)

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 00/13] More fully implement ARM PMUv3
Posted by Aaron Lindsay 6 years, 5 months ago
On Oct 17 16:09, Peter Maydell wrote:
> On 30 September 2017 at 03:08, Aaron Lindsay <alindsay@codeaurora.org> wrote:
> > The ARM PMU implementation currently contains a basic cycle counter, but it is
> > often useful to gather counts of other events and filter them based on
> > execution mode. These patches flesh out the implementations of various PMU
> > registers including PM[X]EVCNTR and PM[X]EVTYPER, add a struct definition to
> > represent arbitrary counter types, implement mode filtering, and add
> > instruction, cycle, and software increment events.
> >
> > I am particularly interested in feedback on the following two patches because I
> > think I'm likely Doing It Wrong:
> >  [1] target/arm: Filter cycle counter based on PMCCFILTR_EL0
> >  [2] target/arm: PMU: Add instruction and cycle events
> >
> > In order to implement mode filtering in an event-driven way, [1] adds a pair of
> > calls to pmu_sync() surrounding every update to a register/variable which may
> > affect whether any counter is currently filtered. These pmu_sync() calls
> > ultimately call cpu_get_icount_raw() for enabled instruction and cycle counters
> > when using icount. Unfortunately, cpu->can_do_io may otherwise be zero for
> > these calls so the current implementation in [2] temporarily sets can_do_io to
> > 1. I haven't see any ill side effects from this in my testing, but it doesn't
> > seem like the right way to handle this.
> 
> I've now reviewed the early stuff and provided what I hope is
> a useful direction for the mode-filtering, so I'm not going to
> look at the patches at the tail end on this version of the series.

Thank you, your review has been very helpful - the next pass of the
mode-filtering patch will be more maintainable.

> > I would like to eventually add sending interrupts on counter overflow.
> > Suggestions for the best direction to handle this are most welcome.
> 
> Check out how the helper.c timer interrupts are wired up
> (the CPU object exposes outbound irq lines, which then get
> wired up by the board to the GIC.)

Thanks for the pointer - I'll take a look.

-Aaron

-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.