The existing cpu_frequency trace event can be verbose, emitting an event
for every CPU in the policy even when their frequencies are identical.

This patch adds a new policy_frequency trace event, which provides a
more efficient alternative to the cpu_frequency trace event. It gives
users who only need policy-level frequency information more concise
logs and simpler analysis.

Signed-off-by: Samuel Wu <wusamuel@google.com>
---
drivers/cpufreq/cpufreq.c | 2 ++
include/trace/events/power.h | 21 +++++++++++++++++++++
2 files changed, 23 insertions(+)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4472bb1ec83c..b65534a4fd9a 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -345,6 +345,7 @@ static void cpufreq_notify_transition(struct cpufreq_policy *policy,
 		pr_debug("FREQ: %u - CPUs: %*pbl\n", freqs->new,
 			 cpumask_pr_args(policy->cpus));
 
+		trace_policy_frequency(freqs->new, policy->cpu);
 		for_each_cpu(cpu, policy->cpus)
 			trace_cpu_frequency(freqs->new, cpu);
@@ -2214,6 +2215,7 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
 			    arch_scale_freq_ref(policy->cpu));
 	cpufreq_stats_record_transition(policy, freq);
 
+	trace_policy_frequency(freq, policy->cpu);
 	if (trace_cpu_frequency_enabled()) {
 		for_each_cpu(cpu, policy->cpus)
 			trace_cpu_frequency(freq, cpu);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 82904291c2b8..87dbd53c8ea1 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -188,6 +188,27 @@ DEFINE_EVENT(cpu, cpu_frequency,
 	TP_ARGS(frequency, cpu_id)
 );
 
+TRACE_EVENT(policy_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int policy_cpu),
+
+	TP_ARGS(frequency, policy_cpu),
+
+	TP_STRUCT__entry(
+		__field(u32, frequency)
+		__field(u32, policy_cpu)
+	),
+
+	TP_fast_assign(
+		__entry->frequency = frequency;
+		__entry->policy_cpu = policy_cpu;
+	),
+
+	TP_printk("frequency=%lu policy_cpu=%lu",
+		(unsigned long)__entry->frequency,
+		(unsigned long)__entry->policy_cpu)
+);
+
 TRACE_EVENT(cpu_frequency_limits,
 
 	TP_PROTO(struct cpufreq_policy *policy),
--
2.51.2.1041.gc1ab5b90ca-goog
On 12-11-25, 15:51, Samuel Wu wrote:
> @@ -345,6 +345,7 @@ static void cpufreq_notify_transition(struct cpufreq_policy *policy,
> pr_debug("FREQ: %u - CPUs: %*pbl\n", freqs->new,
> cpumask_pr_args(policy->cpus));
>
> + trace_policy_frequency(freqs->new, policy->cpu);
> for_each_cpu(cpu, policy->cpus)
> trace_cpu_frequency(freqs->new, cpu);

I don't see much value in near-duplicate trace events. If we feel that a
per-policy event is a better fit (which makes sense), then we can just drop the
trace_cpu_frequency() events and print the policy->cpus (or related_cpus)
information along with the per-policy event.
--
viresh
On Wed, Nov 12, 2025 at 10:45 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> I don't see much value in almost duplicate trace events. If we feel that a
> per-policy event is a better fit (which makes sense), then we can just drop the
> trace_cpu_frequency() events and print policy->cpus (or related_cpus)
> information along with the per-policy events.

Thank you for the feedback, Viresh. Fair enough; I've done some testing,
and a single trace event should work and would be cleaner. Please let
me know what you think of this proposal for v2.

We can append a bitmask field holding policy->cpus to
trace_cpu_frequency(). This way we maintain backward compatibility:
trace_cpu_frequency() is not removed, and its pre-existing fields are
not disturbed.

Call-flow-wise, we can delete all the for_each_cpu() loops and still
retain the benefit of emitting the trace event once per policy
instead of once per CPU.
On 13-11-25, 19:41, Samuel Wu wrote:
> Thank you for the feedback Viresh. Fair enough, I've done some testing
> and a single trace event should work and would be cleaner. Please let
> me know what you think of this proposal for v2.
>
> We can append a bitmask of policy->cpus field to
> trace_cpu_frequency(). This way we maintain backwards compatibility:
> trace_cpu_frequency() is not removed, and its pre-existing fields are
> not disturbed.
>
> Call flow wise, we can delete all the for_each_cpu() loops, and we
> still retain the benefits of the trace emitting once per policy
> instead of once per cpu.

Fine by me. I have looped in the scheduler maintainers to see if they have a
different view.
--
viresh
On 11/14/25 05:11, Viresh Kumar wrote:
> On 13-11-25, 19:41, Samuel Wu wrote:
>> We can append a bitmask of policy->cpus field to
>> trace_cpu_frequency(). This way we maintain backwards compatibility:
>> trace_cpu_frequency() is not removed, and its pre-existing fields are
>> not disturbed.
>>
>> Call flow wise, we can delete all the for_each_cpu() loops, and we
>> still retain the benefits of the trace emitting once per policy
>> instead of once per cpu.
>
> Fine by me. I have added Scheduler maintainers in the loop to see if they have a
> different view.
>
And IIUC, your proposal is to fold policy_frequency into cpu_frequency, but
then emit only one cpu_frequency event per policy?

I think from a tooling perspective it would be easier to remove cpu_frequency
entirely; tools can then probe for the presence of policy_frequency or
cpu_frequency.
On Mon, Nov 17, 2025 at 1:18 AM Christian Loehle
<christian.loehle@arm.com> wrote:
> And IIUC your proposal is to fold policy_frequency into cpu_frequency but then
> only have one cpu_frequency event per policy emitted?

That's right: emit the trace event once per policy instead of once per
CPU, which I think is the most valuable element of this patch. And
yes, the latest idea was to append a bitmask of policy->cpus to the
cpu_frequency event so that the relevant policy information is
encapsulated in the trace event.

> I think from a tooling perspective it would be easier to remove cpu_frequency
> entirely, then tools can probe on the presence of policy_frequency / cpu_frequency.

The tools I know of that consume this trace event can handle that
perfectly well. The points you and Viresh raised are valid, and since
this approach does not conflict with them, "policy_frequency replacing
cpu_frequency" can be the frontrunner for now.
On 11/14/25 05:11, Viresh Kumar wrote:
> On 13-11-25, 19:41, Samuel Wu wrote:
>> We can append a bitmask of policy->cpus field to
>> trace_cpu_frequency(). This way we maintain backwards compatibility:
>> trace_cpu_frequency() is not removed, and its pre-existing fields are
>> not disturbed.
>>
>> Call flow wise, we can delete all the for_each_cpu() loops, and we
>> still retain the benefits of the trace emitting once per policy
>> instead of once per cpu.
>
> Fine by me. I have added Scheduler maintainers in the loop to see if they have a
> different view.
>

That's going to break a lot of tooling, but AFAICS that in and of itself
isn't a good enough reason not to do it.

(Added some CCs)