include/linux/perf_event.h | 12 + kernel/events/Makefile | 1 + kernel/events/core.c | 39 ++- kernel/events/dummy_pmu.c | 492 +++++++++++++++++++++++++++++++++++++ 4 files changed, 539 insertions(+), 5 deletions(-) create mode 100644 kernel/events/dummy_pmu.c
v2 of my attempt at fixing how i915 interacts with perf events. v1 - https://lore.kernel.org/all/20240722210648.80892-1-lucas.demarchi@intel.com/ From other people: 1) https://lore.kernel.org/lkml/20240115170120.662220-1-tvrtko.ursulin@linux.intel.com/ 2) https://lore.kernel.org/all/20240213180302.47266-1-umesh.nerlige.ramappa@intel.com/ WARNING: patches 1, 4 and 5 are NOT intended to be applied as is. More on this below. This series basically builds on the idea of the first patch of my previous series, but extends it in a way that a) the other patches are not needed (at least, not as is) and b) driver can rebind just fine - no issues with the new call to perf_pmu_register() Another difference is that rather than mixing i915 cleanups this just adds a dummy pmu with no backing HW. Intention for dummy_pmu is for experimental purpose and to demonstrate changes tha need to be applied to i915 (and probably amdgpu, and also in the pending xe patch). If desired to have an example like that in tree, then we should hide it behind a config option and I'd need to re-check the error handling. With this set I could run the following test script multiple times with no issues observed: #!/bin/bash set -e rand_sleep() { sleep $(bc -l <<< "$(shuf -i 0-3000 -n 1) / 1000") } test_noperf() { echo 0 > /sys/kernel/debug/dummy_pmu/bind echo 0 > /sys/kernel/debug/dummy_pmu/unbind } test_perf_before() { echo 0 > /sys/kernel/debug/dummy_pmu/bind perf stat --interval-count 2 -e dummy_pmu_0/test-event-1/ -I500 echo 0 > /sys/kernel/debug/dummy_pmu/unbind } test_kill_perf_later() { echo 0 > /sys/kernel/debug/dummy_pmu/bind perf stat -e dummy_pmu_0/test-event-1/ -I500 & pid=$! rand_sleep echo 0 > /sys/kernel/debug/dummy_pmu/unbind rand_sleep kill $pid } test_kill_perf_laaaaaaater() { echo 0 > /sys/kernel/debug/dummy_pmu/bind perf stat -e dummy_pmu_0/test-event-1/ -I500 & pid=$! rand_sleep echo 0 > /sys/kernel/debug/dummy_pmu/unbind rand_sleep rand_sleep rand_sleep kill $pid } test_kill_perf_with_leader() { echo 0 > /sys/kernel/debug/dummy_pmu/bind perf stat -e '{dummy_pmu_0/test-event-1/,dummy_pmu_0/test-event-2/}:S' -I500 & pid=$! rand_sleep echo 0 > /sys/kernel/debug/dummy_pmu/unbind rand_sleep rand_sleep kill $pid } N=${1:-1} for ((i=0; i<N; i++)); do printf "%04u/%04u\n" "$((i+1))" "$N" >&2 test_noperf test_perf_before test_kill_perf_later test_kill_perf_laaaaaaater test_kill_perf_with_leader echo >&2 done Last patch is optional for a driver and not needed for the fix. Open topics: - If something like the last patch is desirable, should it be done from inside perf_pmu_unregister()? - Should we have a dummy_pmu (or whatever name) behind a config, or maybe in Documentation/ ? thanks, Lucas De Marchi Lucas De Marchi (5): perf: Add dummy pmu module perf: Move free outside of the mutex perf: Add pmu get/put perf/dummy_pmu: Tie pmu to device lifecycle perf/dummy_pmu: Track and disable active events include/linux/perf_event.h | 12 + kernel/events/Makefile | 1 + kernel/events/core.c | 39 ++- kernel/events/dummy_pmu.c | 492 +++++++++++++++++++++++++++++++++++++ 4 files changed, 539 insertions(+), 5 deletions(-) create mode 100644 kernel/events/dummy_pmu.c -- 2.46.2
On Tue, Oct 08, 2024 at 01:34:56PM -0500, Lucas De Marchi wrote: >v2 of my attempt at fixing how i915 interacts with perf events. > >v1 - https://lore.kernel.org/all/20240722210648.80892-1-lucas.demarchi@intel.com/ > >From other people: >1) https://lore.kernel.org/lkml/20240115170120.662220-1-tvrtko.ursulin@linux.intel.com/ >2) https://lore.kernel.org/all/20240213180302.47266-1-umesh.nerlige.ramappa@intel.com/ > >WARNING: patches 1, 4 and 5 are NOT intended to be applied as is. More >on this below. I also took the patches 2 and 3, that are the ones needed, and applied the i915 changes on top. I sent only to i915 mailing list since I didn't want to pollute the mailing list with resubmissions of the same patches over and over. These fixes also worked for i915. See https://lore.kernel.org/intel-gfx/20241011225430.1219345-1-lucas.demarchi@intel.com/ if interested. Thanks Lucas De Marchi
On Tue, Oct 08, 2024 at 01:34:56PM -0500, Lucas De Marchi wrote: >v2 of my attempt at fixing how i915 interacts with perf events. > >v1 - https://lore.kernel.org/all/20240722210648.80892-1-lucas.demarchi@intel.com/ > >From other people: >1) https://lore.kernel.org/lkml/20240115170120.662220-1-tvrtko.ursulin@linux.intel.com/ >2) https://lore.kernel.org/all/20240213180302.47266-1-umesh.nerlige.ramappa@intel.com/ > >WARNING: patches 1, 4 and 5 are NOT intended to be applied as is. More >on this below. > >This series basically builds on the idea of the first patch of my >previous series, but extends it in a way that > > a) the other patches are not needed (at least, not as is) and > b) driver can rebind just fine - no issues with the new call to > perf_pmu_register() I have 2 broad questions: (1) I am curious how (b) works. You seem to have a notion of instances of devices and then are you using the instance number to create the name used for the sysfs entry? Did I get that right? If so, should the application discover what the name is each time it is run? In the failure case that I am seeing, I am running an application that does not work when I rename the sysfs entry to something else. (2) Similar to Patch 1 of your v1 series where you modified _free_event: static void _free_event(struct perf_event *event) { struct module *module; ... module = event->pmu->module; ... if (event->destroy) event->destroy(event); ... module_put(module); ... } With the above code, the kref to i915->pmu can be taken from the below points in i915 code (just to point out the sequence): i915_pmu_register() { perf_pmu_register() drm_dev_get() kref_init() } i915_pmu_unregister() { kref_put(&ref, pmu_cleanup) } i915_pmu_event_init() { kref_get() } i915_pmu_event_destroy() { kref_put(&ref, pmu_cleanup) } void pmu_cleanup(struct kref *ref) { i915_pmu_unregister_cpuhp_state(pmu); perf_pmu_unregister(&pmu->base); pmu->base.event_init = NULL; kfree(pmu->base.attr_groups); if (!is_igp(i915)) kfree(pmu->name); free_event_attributes(pmu); drm_dev_put(&i915->drm); } Would this work? Do you see any gaps that may need the ref counting code you added in perf? Thanks, Umesh > >Another difference is that rather than mixing i915 cleanups this just >adds a dummy pmu with no backing HW. Intention for dummy_pmu is for >experimental purpose and to demonstrate changes tha need to be applied >to i915 (and probably amdgpu, and also in the pending xe patch). >If desired to have an example like that in tree, then we should hide it >behind a config option and I'd need to re-check the error handling. > >With this set I could run the following test script multiple times with >no issues observed: > > #!/bin/bash > > set -e > > rand_sleep() { > sleep $(bc -l <<< "$(shuf -i 0-3000 -n 1) / 1000") > } > > test_noperf() { > echo 0 > /sys/kernel/debug/dummy_pmu/bind > > echo 0 > /sys/kernel/debug/dummy_pmu/unbind > } > > test_perf_before() { > echo 0 > /sys/kernel/debug/dummy_pmu/bind > > perf stat --interval-count 2 -e dummy_pmu_0/test-event-1/ -I500 > echo 0 > /sys/kernel/debug/dummy_pmu/unbind > } > > test_kill_perf_later() { > echo 0 > /sys/kernel/debug/dummy_pmu/bind > > perf stat -e dummy_pmu_0/test-event-1/ -I500 & > pid=$! > rand_sleep > echo 0 > /sys/kernel/debug/dummy_pmu/unbind > rand_sleep > kill $pid > } > > test_kill_perf_laaaaaaater() { > echo 0 > /sys/kernel/debug/dummy_pmu/bind > > perf stat -e dummy_pmu_0/test-event-1/ -I500 & > pid=$! > rand_sleep > echo 0 > /sys/kernel/debug/dummy_pmu/unbind > rand_sleep > rand_sleep > rand_sleep > kill $pid > } > > test_kill_perf_with_leader() { > echo 0 > /sys/kernel/debug/dummy_pmu/bind > > perf stat -e '{dummy_pmu_0/test-event-1/,dummy_pmu_0/test-event-2/}:S' -I500 & > pid=$! > rand_sleep > echo 0 > /sys/kernel/debug/dummy_pmu/unbind > rand_sleep > rand_sleep > kill $pid > } > > N=${1:-1} > > for ((i=0; i<N; i++)); do > printf "%04u/%04u\n" "$((i+1))" "$N" >&2 > test_noperf > test_perf_before > test_kill_perf_later > test_kill_perf_laaaaaaater > test_kill_perf_with_leader > echo >&2 > done > >Last patch is optional for a driver and not needed for the fix. > >Open topics: > > - If something like the last patch is desirable, should it be > done from inside perf_pmu_unregister()? > > - Should we have a dummy_pmu (or whatever name) behind a config, > or maybe in Documentation/ ? > >thanks, >Lucas De Marchi > >Lucas De Marchi (5): > perf: Add dummy pmu module > perf: Move free outside of the mutex > perf: Add pmu get/put > perf/dummy_pmu: Tie pmu to device lifecycle > perf/dummy_pmu: Track and disable active events > > include/linux/perf_event.h | 12 + > kernel/events/Makefile | 1 + > kernel/events/core.c | 39 ++- > kernel/events/dummy_pmu.c | 492 +++++++++++++++++++++++++++++++++++++ > 4 files changed, 539 insertions(+), 5 deletions(-) > create mode 100644 kernel/events/dummy_pmu.c > >-- >2.46.2 >
On Fri, Oct 11, 2024 at 03:21:18PM -0700, Umesh Nerlige Ramappa wrote: >On Tue, Oct 08, 2024 at 01:34:56PM -0500, Lucas De Marchi wrote: >>v2 of my attempt at fixing how i915 interacts with perf events. >> >>v1 - https://lore.kernel.org/all/20240722210648.80892-1-lucas.demarchi@intel.com/ >> >>From other people: >>1) https://lore.kernel.org/lkml/20240115170120.662220-1-tvrtko.ursulin@linux.intel.com/ >>2) https://lore.kernel.org/all/20240213180302.47266-1-umesh.nerlige.ramappa@intel.com/ >> >>WARNING: patches 1, 4 and 5 are NOT intended to be applied as is. More >>on this below. >> >>This series basically builds on the idea of the first patch of my >>previous series, but extends it in a way that >> >> a) the other patches are not needed (at least, not as is) and >> b) driver can rebind just fine - no issues with the new call to >> perf_pmu_register() > >I have 2 broad questions: > >(1) I am curious how (b) works. You seem to have a notion of instances >of devices and then are you using the instance number to create the >name used for the sysfs entry? Did I get that right? humn... no. We just unregister the driver from pmu, so the name becomes free for when the driver rebinds with the same event name. > >If so, should the application discover what the name is each time it >is run? In the failure case that I am seeing, I am running an >application that does not work when I rename the sysfs entry to >something else. > >(2) Similar to Patch 1 of your v1 series where you modified _free_event: > >static void _free_event(struct perf_event *event) >{ > struct module *module; >... > module = event->pmu->module; >... > if (event->destroy) > event->destroy(event); >... > module_put(module); >... >} > >With the above code, the kref to i915->pmu can be taken from the below >points in i915 code (just to point out the sequence): > >i915_pmu_register() >{ > perf_pmu_register() > drm_dev_get() > kref_init() >} > >i915_pmu_unregister() >{ > kref_put(&ref, pmu_cleanup) >} > >i915_pmu_event_init() >{ > kref_get() >} > >i915_pmu_event_destroy() >{ > kref_put(&ref, pmu_cleanup) >} > >void pmu_cleanup(struct kref *ref) >{ > i915_pmu_unregister_cpuhp_state(pmu); > perf_pmu_unregister(&pmu->base); > pmu->base.event_init = NULL; > kfree(pmu->base.attr_groups); > if (!is_igp(i915)) > kfree(pmu->name); > free_event_attributes(pmu); > drm_dev_put(&i915->drm); >} > >Would this work? Do you see any gaps that may need the ref counting >code you added in perf? well... I just posted the fixes for i915 on top of these patches :) You may want to look at https://lore.kernel.org/intel-gfx/20241011225430.1219345-1-lucas.demarchi@intel.com/ It works for me on my machine with a DG2. Lucas De Marchi
© 2016 - 2024 Red Hat, Inc.