Hello,

This patch set implements initial support for performance counter sampling in Panthor, as a follow-up to Adrián Larumbe's patch set [1]. With this patch series, the RFC tag is dropped, following [2]. The Mesa implementation is in progress, and will be posted within the next week or two.

Existing performance counter workflows, such as those in game engines and user-space power models/governor implementations, require multiple clients to be able to obtain counter data simultaneously. The hardware and firmware interfaces support only a single global configuration, so the kernel must multiplex between clients. It is also in the best position to supplement the counter data with contextual information about elapsed sampling periods, the power state transitions undergone during the sampling period, and the cycles elapsed on specific clocks chosen by the integrator.

Each userspace client creates a session, providing an enable mask of the counters it requires, a BO for a ring buffer, a separate BO for the insert and extract indices, and an eventfd to signal counter capture, all of which are kept fixed for the lifetime of the session. When emitting a sample for a session, counters that were not requested are stripped out, and the non-counter information needed to interpret the counter values is added to either the sample header or the block header, both of which are stored in-line with the counter values in the sample.

The proposed uAPI specifies two major sources of supplemental information:

- coarse-grained block state transitions are provided on newer FW versions that support the metadata block, a FW-provided counter block that indicates the reason a sample was taken: when entering or exiting a non-counting region, or when a shader core has powered down.
- the clock assignments to individual blocks are made by integrators, so in order to normalize counter values that count cycles, userspace must know both the clock cycles elapsed over the sampling period and which clock that particular block is associated with.

All of the sessions are then aggregated by the sampler, which programs the FW interface and subsequently handles the samples coming from the FW.

v2:
- Fixed offset issues into FW ring buffer
- Fixed sparse shader core handling
- Added pre- and post-reset handlers
- Added module param to control size of FW ring buffer
- Clarified naming on sampler functions
- Added error logging for PERF_SETUP

[1]: https://lore.kernel.org/lkml/20240305165820.585245-1-adrian.larumbe@collabora.com/T/#m67d1f89614fe35dc0560e8304d6731eb1a6942b6
[2]: https://lore.kernel.org/lkml/20241211165024.490748-1-lukas.zapolskas@arm.com/

Adrián Larumbe (1):
  drm/panthor: Implement the counter sampler and sample handling

Lukas Zapolskas (6):
  drm/panthor: Add performance counter uAPI
  drm/panthor: Add DEV_QUERY.PERF_INFO handling for Gx10
  drm/panthor: Add panthor perf initialization and termination
  drm/panthor: Introduce sampling sessions to handle userspace clients
  drm/panthor: Add suspend, resume and reset handling
  drm/panthor: Expose the panthor perf ioctls

 drivers/gpu/drm/panthor/Makefile         |    1 +
 drivers/gpu/drm/panthor/panthor_device.c |   14 +-
 drivers/gpu/drm/panthor/panthor_device.h |   11 +-
 drivers/gpu/drm/panthor/panthor_drv.c    |  150 +-
 drivers/gpu/drm/panthor/panthor_fw.c     |    6 +
 drivers/gpu/drm/panthor/panthor_fw.h     |    9 +-
 drivers/gpu/drm/panthor/panthor_perf.c   | 1940 ++++++++++++++++++++++
 drivers/gpu/drm/panthor/panthor_perf.h   |   40 +
 include/uapi/drm/panthor_drm.h           |  546 ++++++
 9 files changed, 2712 insertions(+), 5 deletions(-)
 create mode 100644 drivers/gpu/drm/panthor/panthor_perf.c
 create mode 100644 drivers/gpu/drm/panthor/panthor_perf.h

-- 
2.33.0.dirty
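The multiplexing described in the cover letter (one global FW configuration shared by many sessions, with unrequested counters stripped per session) can be illustrated with a minimal C sketch. It rests on assumptions not taken from the series: each counter block has at most 64 counters selected by a 64-bit enable mask, the single global FW configuration is the union of all session masks, and "stripping" means compacting a session's requested counters in counter-index order. `struct perf_session`, `sampler_global_mask()` and `session_strip_block()` are hypothetical names, not the panthor uAPI or driver functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: each block exposes up to 64 counters, one mask bit each. */
#define PERF_COUNTERS_PER_BLOCK 64

/* Hypothetical per-session state; the real uAPI structures differ. */
struct perf_session {
	uint64_t enable_mask; /* counters this client requested, fixed for its lifetime */
};

/*
 * Assumed sampler policy: since the HW/FW accept a single global
 * configuration, enable every counter that at least one session wants.
 */
static uint64_t sampler_global_mask(const struct perf_session *sessions, size_t n)
{
	uint64_t mask = 0;

	for (size_t i = 0; i < n; i++)
		mask |= sessions[i].enable_mask;
	return mask;
}

/*
 * Copy only the counters set in @session_mask out of a full block dump
 * (indexed by counter number) into @out, preserving counter order.
 * Returns the number of values written; @out_len must be at least
 * popcount(@session_mask).
 */
static size_t session_strip_block(uint64_t session_mask, const uint64_t *block,
				  uint64_t *out, size_t out_len)
{
	size_t n = 0;

	for (unsigned int bit = 0; bit < PERF_COUNTERS_PER_BLOCK; bit++) {
		if (!(session_mask & (UINT64_C(1) << bit)))
			continue; /* not requested by this session: stripped */
		assert(n < out_len);
		out[n++] = block[bit];
	}
	return n;
}
```

In this sketch, because a session's enable mask is fixed for its lifetime, the client can map each compacted value back to its counter index using its own mask, so no per-value index needs to be stored in the sample.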
Hi Lukas,

I have wanted to review this series for quite some time, but lately I have found myself caught up in quite a few other things. I had a look at it last week, but before I delve into it any further, I was wondering whether you could take some time to go over the questions and comments I left in my review of the previous version of the patch series.

That way I would know which changes you introduced in response to issues I raised, and which ones are due to a rethinking of the whole design.

I remember that some of the questions I posed stemmed from a genuine lack of understanding of the way performance counters in CSF GPUs operate, so if you could find some time to answer them, or else point me to the right sections of the TRM, I would find the review of this latest revision a lot easier.

Kind Regards,
Adrian Larumbe

On 01.04.2025 16:48, Lukas Zapolskas wrote:
> Hello,
>
> This patch set implements initial support for performance counter
> sampling in Panthor, as a follow-up for Adrián Larumbe's patch
> set [1]. With this patch series, the RFC tag is dropped,
> following [2]. The Mesa implementation is in progress, and
> will be posted within the next week or two.
>
> Existing performance counter workflows, such as those in game
> engines, and user-space power models/governor implementations
> require the ability to simultaneously obtain counter data. The
> hardware and firmware interfaces support a single global
> configuration, meaning the kernel must allow for the multiplexing.
> It is also in the best position to supplement the counter data
> with contextual information about elapsed sampling periods,
> information on the power state transitions undergone during
> the sampling period, and cycles elapsed on specific clocks chosen
> by the integrator.
Hello Adrián,

Thank you for reaching out about this. Could you please clarify what you would like me to elaborate on? I have responded to most, if not all, of the comments you raised in that review, including providing more information about the uAPI approach (please see [1] for more details).

I am also in the process of publishing a Mesa MR corresponding to v4 of the patch series. It includes additional fixes discussed in RFC v2, and I will provide a more detailed change log.

Kind regards,
Lukas Zapolskas

[1]: https://lore.kernel.org/dri-devel/55fb6aa6-89dc-404c-89fc-5c56d15d8c98@arm.com/

On 07/05/2025 20:54, Adrián Larumbe wrote:
> I wanted to review this series for quite some time but lately have found myself caught up in quite
> a few other things. I've had a look into it last week, but before I delve into it any further, I was
> wondering whether you could take some time to go over the questions and comments I left in the review
> for the previous patch series version.
>
> That way I could know what changes you introduced in response to issues I raised, and which ones are
> due to a rethinking of the whole design.
>
> I remember some of the questions I posed dealt with a genuine lack of understanding of the way
> performance counters in CSF GPUs operate, so if you could find some time to answer them or else
> point me to the right sections of the TRM I'd find the review of this latest revision a lot easier.
>
> Kind Regards,
> Adrian Larumbe