[PATCH v1] cpufreq: Add policy_frequency trace event

Samuel Wu posted 1 patch 3 weeks, 4 days ago
Posted by Samuel Wu 3 weeks, 4 days ago
The existing cpu_frequency trace_event can be verbose, emitting an event
for every CPU in the policy even when their frequencies are identical.

This patch adds a new policy_frequency trace event, which provides a
more efficient alternative to cpu_frequency trace event. This option
allows users who only need frequency at a policy level more concise logs
with simpler analysis.

Signed-off-by: Samuel Wu <wusamuel@google.com>
---
 drivers/cpufreq/cpufreq.c    |  2 ++
 include/trace/events/power.h | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4472bb1ec83c..b65534a4fd9a 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -345,6 +345,7 @@ static void cpufreq_notify_transition(struct cpufreq_policy *policy,
 		pr_debug("FREQ: %u - CPUs: %*pbl\n", freqs->new,
 			 cpumask_pr_args(policy->cpus));
 
+		trace_policy_frequency(freqs->new, policy->cpu);
 		for_each_cpu(cpu, policy->cpus)
 			trace_cpu_frequency(freqs->new, cpu);
 
@@ -2214,6 +2215,7 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
 			    arch_scale_freq_ref(policy->cpu));
 	cpufreq_stats_record_transition(policy, freq);
 
+	trace_policy_frequency(freq, policy->cpu);
 	if (trace_cpu_frequency_enabled()) {
 		for_each_cpu(cpu, policy->cpus)
 			trace_cpu_frequency(freq, cpu);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 82904291c2b8..87dbd53c8ea1 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -188,6 +188,27 @@ DEFINE_EVENT(cpu, cpu_frequency,
 	TP_ARGS(frequency, cpu_id)
 );
 
+TRACE_EVENT(policy_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int policy_cpu),
+
+	TP_ARGS(frequency, policy_cpu),
+
+	TP_STRUCT__entry(
+		__field(u32, frequency)
+		__field(u32, policy_cpu)
+	),
+
+	TP_fast_assign(
+		__entry->frequency = frequency;
+		__entry->policy_cpu = policy_cpu;
+	),
+
+	TP_printk("frequency=%lu policy_cpu=%lu",
+		  (unsigned long)__entry->frequency,
+		  (unsigned long)__entry->policy_cpu)
+);
+
 TRACE_EVENT(cpu_frequency_limits,
 
 	TP_PROTO(struct cpufreq_policy *policy),
-- 
2.51.2.1041.gc1ab5b90ca-goog
Re: [PATCH v1] cpufreq: Add policy_frequency trace event
Posted by Viresh Kumar 3 weeks, 3 days ago
On 12-11-25, 15:51, Samuel Wu wrote:
> The existing cpu_frequency trace_event can be verbose, emitting an event
> for every CPU in the policy even when their frequencies are identical.
> 
> This patch adds a new policy_frequency trace event, which provides a
> more efficient alternative to cpu_frequency trace event. This option
> allows users who only need frequency at a policy level more concise logs
> with simpler analysis.
> 
> Signed-off-by: Samuel Wu <wusamuel@google.com>
> ---
>  drivers/cpufreq/cpufreq.c    |  2 ++
>  include/trace/events/power.h | 21 +++++++++++++++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 4472bb1ec83c..b65534a4fd9a 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -345,6 +345,7 @@ static void cpufreq_notify_transition(struct cpufreq_policy *policy,
>  		pr_debug("FREQ: %u - CPUs: %*pbl\n", freqs->new,
>  			 cpumask_pr_args(policy->cpus));
>  
> +		trace_policy_frequency(freqs->new, policy->cpu);
>  		for_each_cpu(cpu, policy->cpus)
>  			trace_cpu_frequency(freqs->new, cpu);

I don't see much value in almost duplicate trace events. If we feel that a
per-policy event is a better fit (which makes sense), then we can just drop the
trace_cpu_frequency() events and print policy->cpus (or related_cpus)
information along with the per-policy events.

-- 
viresh
Re: [PATCH v1] cpufreq: Add policy_frequency trace event
Posted by Samuel Wu 3 weeks, 2 days ago
On Wed, Nov 12, 2025 at 10:45 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 12-11-25, 15:51, Samuel Wu wrote:
> > The existing cpu_frequency trace_event can be verbose, emitting an event
> > for every CPU in the policy even when their frequencies are identical.
> >
> > This patch adds a new policy_frequency trace event, which provides a
> > more efficient alternative to cpu_frequency trace event. This option
> > allows users who only need frequency at a policy level more concise logs
> > with simpler analysis.
> >
> > Signed-off-by: Samuel Wu <wusamuel@google.com>
> > ---
> >  drivers/cpufreq/cpufreq.c    |  2 ++
> >  include/trace/events/power.h | 21 +++++++++++++++++++++
> >  2 files changed, 23 insertions(+)
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 4472bb1ec83c..b65534a4fd9a 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -345,6 +345,7 @@ static void cpufreq_notify_transition(struct cpufreq_policy *policy,
> >               pr_debug("FREQ: %u - CPUs: %*pbl\n", freqs->new,
> >                        cpumask_pr_args(policy->cpus));
> >
> > +             trace_policy_frequency(freqs->new, policy->cpu);
> >               for_each_cpu(cpu, policy->cpus)
> >                       trace_cpu_frequency(freqs->new, cpu);
>
> I don't see much value in almost duplicate trace events. If we feel that a
> per-policy event is a better fit (which makes sense), then we can just drop the
> trace_cpu_frequency() events and print policy->cpus (or related_cpus)
> information along with the per-policy events.

Thank you for the feedback Viresh. Fair enough, I've done some testing
and a single trace event should work and would be cleaner. Please let
me know what you think of this proposal for v2.

We can append a bitmask of policy->cpus field to
trace_cpu_frequency(). This way we maintain backwards compatibility:
trace_cpu_frequency() is not removed, and its pre-existing fields are
not disturbed.

Call flow wise, we can delete all the for_each_cpu() loops, and we
still retain the benefits of the trace emitting once per policy
instead of once per cpu.
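
The difference in event volume described above can be sketched in userspace C. This is only an illustrative model, not kernel code: struct policy_model, both emit_* helpers and the "cpus=" field name are assumptions made up for the example, and the real v2 field layout is not defined anywhere in this thread. Only the "state=... cpu_id=..." prefix follows the existing cpu_frequency output.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical userspace model (not kernel code): a policy is a target
 * frequency plus a bitmask of the CPUs it covers (bit n set == CPU n).
 */
struct policy_model {
	uint32_t freq_khz;
	uint64_t cpus;
};

/*
 * Current behaviour: one cpu_frequency-style record per CPU in the policy.
 * Returns the number of records written.
 */
int emit_per_cpu(const struct policy_model *p, FILE *out)
{
	int records = 0;

	for (unsigned int cpu = 0; cpu < 64; cpu++) {
		if (p->cpus & (UINT64_C(1) << cpu)) {
			fprintf(out, "cpu_frequency: state=%u cpu_id=%u\n",
				(unsigned int)p->freq_khz, cpu);
			records++;
		}
	}
	return records;
}

/*
 * Proposed behaviour: a single record per policy. The existing fields keep
 * their position and the CPU mask is appended; "cpus=" is an assumed name.
 */
int emit_per_policy(const struct policy_model *p, unsigned int policy_cpu,
		    FILE *out)
{
	fprintf(out, "cpu_frequency: state=%u cpu_id=%u cpus=%llx\n",
		(unsigned int)p->freq_khz, policy_cpu,
		(unsigned long long)p->cpus);
	return 1;
}
```

For a policy covering CPUs 4-7 (mask 0xf0), emit_per_cpu() writes four records while emit_per_policy() writes one, and a consumer that only reads the first two fields still parses the single record.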
Re: [PATCH v1] cpufreq: Add policy_frequency trace event
Posted by Viresh Kumar 3 weeks, 2 days ago
On 13-11-25, 19:41, Samuel Wu wrote:
> On Wed, Nov 12, 2025 at 10:45 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 12-11-25, 15:51, Samuel Wu wrote:
> > > The existing cpu_frequency trace_event can be verbose, emitting an event
> > > for every CPU in the policy even when their frequencies are identical.
> > >
> > > This patch adds a new policy_frequency trace event, which provides a
> > > more efficient alternative to cpu_frequency trace event. This option
> > > allows users who only need frequency at a policy level more concise logs
> > > with simpler analysis.
> > >
> > > Signed-off-by: Samuel Wu <wusamuel@google.com>
> > > ---
> > >  drivers/cpufreq/cpufreq.c    |  2 ++
> > >  include/trace/events/power.h | 21 +++++++++++++++++++++
> > >  2 files changed, 23 insertions(+)
> > >
> > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > index 4472bb1ec83c..b65534a4fd9a 100644
> > > --- a/drivers/cpufreq/cpufreq.c
> > > +++ b/drivers/cpufreq/cpufreq.c
> > > @@ -345,6 +345,7 @@ static void cpufreq_notify_transition(struct cpufreq_policy *policy,
> > >               pr_debug("FREQ: %u - CPUs: %*pbl\n", freqs->new,
> > >                        cpumask_pr_args(policy->cpus));
> > >
> > > +             trace_policy_frequency(freqs->new, policy->cpu);
> > >               for_each_cpu(cpu, policy->cpus)
> > >                       trace_cpu_frequency(freqs->new, cpu);
> >
> > I don't see much value in almost duplicate trace events. If we feel that a
> > per-policy event is a better fit (which makes sense), then we can just drop the
> > trace_cpu_frequency() events and print policy->cpus (or related_cpus)
> > information along with the per-policy events.
> 
> Thank you for the feedback Viresh. Fair enough, I've done some testing
> and a single trace event should work and would be cleaner. Please let
> me know what you think of this proposal for v2.
> 
> We can append a bitmask of policy->cpus field to
> trace_cpu_frequency(). This way we maintain backwards compatibility:
> trace_cpu_frequency() is not removed, and its pre-existing fields are
> not disturbed.
> 
> Call flow wise, we can delete all the for_each_cpu() loops, and we
> still retain the benefits of the trace emitting once per policy
> instead of once per cpu.

Fine by me. I have added Scheduler maintainers in the loop to see if they have a
different view.

-- 
viresh
Re: [PATCH v1] cpufreq: Add policy_frequency trace event
Posted by Christian Loehle 2 weeks, 6 days ago
On 11/14/25 05:11, Viresh Kumar wrote:
> On 13-11-25, 19:41, Samuel Wu wrote:
>> On Wed, Nov 12, 2025 at 10:45 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>>
>>> On 12-11-25, 15:51, Samuel Wu wrote:
>>>> The existing cpu_frequency trace_event can be verbose, emitting an event
>>>> for every CPU in the policy even when their frequencies are identical.
>>>>
>>>> This patch adds a new policy_frequency trace event, which provides a
>>>> more efficient alternative to cpu_frequency trace event. This option
>>>> allows users who only need frequency at a policy level more concise logs
>>>> with simpler analysis.
>>>>
>>>> Signed-off-by: Samuel Wu <wusamuel@google.com>
>>>> ---
>>>>  drivers/cpufreq/cpufreq.c    |  2 ++
>>>>  include/trace/events/power.h | 21 +++++++++++++++++++++
>>>>  2 files changed, 23 insertions(+)
>>>>
>>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>>> index 4472bb1ec83c..b65534a4fd9a 100644
>>>> --- a/drivers/cpufreq/cpufreq.c
>>>> +++ b/drivers/cpufreq/cpufreq.c
>>>> @@ -345,6 +345,7 @@ static void cpufreq_notify_transition(struct cpufreq_policy *policy,
>>>>               pr_debug("FREQ: %u - CPUs: %*pbl\n", freqs->new,
>>>>                        cpumask_pr_args(policy->cpus));
>>>>
>>>> +             trace_policy_frequency(freqs->new, policy->cpu);
>>>>               for_each_cpu(cpu, policy->cpus)
>>>>                       trace_cpu_frequency(freqs->new, cpu);
>>>
>>> I don't see much value in almost duplicate trace events. If we feel that a
>>> per-policy event is a better fit (which makes sense), then we can just drop the
>>> trace_cpu_frequency() events and print policy->cpus (or related_cpus)
>>> information along with the per-policy events.
>>
>> Thank you for the feedback Viresh. Fair enough, I've done some testing
>> and a single trace event should work and would be cleaner. Please let
>> me know what you think of this proposal for v2.
>>
>> We can append a bitmask of policy->cpus field to
>> trace_cpu_frequency(). This way we maintain backwards compatibility:
>> trace_cpu_frequency() is not removed, and its pre-existing fields are
>> not disturbed.
>>
>> Call flow wise, we can delete all the for_each_cpu() loops, and we
>> still retain the benefits of the trace emitting once per policy
>> instead of once per cpu.
> 
> Fine by me. I have added Scheduler maintainers in the loop to see if they have a
> different view.
>

And IIUC your proposal is to fold policy_frequency into cpu_frequency but then
only have one cpu_frequency event per policy emitted?
I think from a tooling perspective it would be easier to remove cpu_frequency
entirely, then tools can probe on the presence of policy_frequency / cpu_frequency.
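
A tool-side probe of the kind suggested here might look like the sketch below. It assumes events are exposed as directories under <tracefs>/events/power/ (tracefs is commonly mounted at /sys/kernel/tracing); event_exists() and pick_frequency_event() are hypothetical helper names, not part of any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return 1 if <tracefs_root>/events/power/<event> exists, 0 otherwise.
 * Assumes the usual tracefs layout; the caller supplies the mount point.
 */
int event_exists(const char *tracefs_root, const char *event)
{
	char path[512];
	int n = snprintf(path, sizeof(path), "%s/events/power/%s",
			 tracefs_root, event);

	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;	/* path did not fit; treat as absent */
	return access(path, F_OK) == 0;
}

/*
 * Prefer the per-policy event when the kernel provides it, fall back to
 * the per-CPU event on older kernels, and return NULL if neither exists.
 */
const char *pick_frequency_event(const char *tracefs_root)
{
	if (event_exists(tracefs_root, "policy_frequency"))
		return "policy_frequency";
	if (event_exists(tracefs_root, "cpu_frequency"))
		return "cpu_frequency";
	return NULL;
}
```

A caller would pass its tracefs mount point, e.g. pick_frequency_event("/sys/kernel/tracing"), and enable whichever event name comes back.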
Re: [PATCH v1] cpufreq: Add policy_frequency trace event
Posted by Samuel Wu 2 weeks, 6 days ago
On Mon, Nov 17, 2025 at 1:18 AM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 11/14/25 05:11, Viresh Kumar wrote:
> > On 13-11-25, 19:41, Samuel Wu wrote:
> >> On Wed, Nov 12, 2025 at 10:45 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >>>
> >>> On 12-11-25, 15:51, Samuel Wu wrote:
> >>>> The existing cpu_frequency trace_event can be verbose, emitting an event
> >>>> for every CPU in the policy even when their frequencies are identical.
> >>>>
> >>>> This patch adds a new policy_frequency trace event, which provides a
> >>>> more efficient alternative to cpu_frequency trace event. This option
> >>>> allows users who only need frequency at a policy level more concise logs
> >>>> with simpler analysis.
> >>>>
> >>>> Signed-off-by: Samuel Wu <wusamuel@google.com>
> >>>> ---
> >>>>  drivers/cpufreq/cpufreq.c    |  2 ++
> >>>>  include/trace/events/power.h | 21 +++++++++++++++++++++
> >>>>  2 files changed, 23 insertions(+)
> >>>>
> >>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> >>>> index 4472bb1ec83c..b65534a4fd9a 100644
> >>>> --- a/drivers/cpufreq/cpufreq.c
> >>>> +++ b/drivers/cpufreq/cpufreq.c
> >>>> @@ -345,6 +345,7 @@ static void cpufreq_notify_transition(struct cpufreq_policy *policy,
> >>>>               pr_debug("FREQ: %u - CPUs: %*pbl\n", freqs->new,
> >>>>                        cpumask_pr_args(policy->cpus));
> >>>>
> >>>> +             trace_policy_frequency(freqs->new, policy->cpu);
> >>>>               for_each_cpu(cpu, policy->cpus)
> >>>>                       trace_cpu_frequency(freqs->new, cpu);
> >>>
> >>> I don't see much value in almost duplicate trace events. If we feel that a
> >>> per-policy event is a better fit (which makes sense), then we can just drop the
> >>> trace_cpu_frequency() events and print policy->cpus (or related_cpus)
> >>> information along with the per-policy events.
> >>
> >> Thank you for the feedback Viresh. Fair enough, I've done some testing
> >> and a single trace event should work and would be cleaner. Please let
> >> me know what you think of this proposal for v2.
> >>
> >> We can append a bitmask of policy->cpus field to
> >> trace_cpu_frequency(). This way we maintain backwards compatibility:
> >> trace_cpu_frequency() is not removed, and its pre-existing fields are
> >> not disturbed.
> >>
> >> Call flow wise, we can delete all the for_each_cpu() loops, and we
> >> still retain the benefits of the trace emitting once per policy
> >> instead of once per cpu.
> >
> > Fine by me. I have added Scheduler maintainers in the loop to see if they have a
> > different view.
> >
>
> And IIUC your proposal is to fold policy_frequency into cpu_frequency but then
> only have one cpu_frequency event per policy emitted?

That's right: emit the trace event once per policy instead of once per
cpu, which I think is the most valuable element of this patch. And
yes, the latest idea was to append a bitmask of policy->cpus to the
cpu_frequency event so that the relevant policy info is encapsulated in
the trace event.

> I think from a tooling perspective it would be easier to remove cpu_frequency
> entirely, then tools can probe on the presence of policy_frequency / cpu_frequency.

This can be handled perfectly fine by the tools I know of that consume
this trace event. The points you and Viresh have brought up are valid,
and as this solution is not in conflict with those points,
"policy_frequency replacing cpu_frequency" can be the frontrunner for
now.
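
As a rough illustration of how a consumer could accept either event, the sketch below parses one line of text trace output. The policy_frequency format string is the TP_printk() from the v1 patch and the cpu_frequency one is the existing "state=... cpu_id=..." output; struct freq_sample and parse_freq_event() are hypothetical names, and real tools may parse the binary ring buffer instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type for one parsed frequency event. */
struct freq_sample {
	unsigned long freq_khz;	/* cpufreq reports frequencies in kHz */
	unsigned long cpu;	/* policy_cpu or cpu_id, depending on event */
	int per_policy;		/* 1 for policy_frequency, 0 for cpu_frequency */
};

/*
 * Parse a text trace line containing either event.
 * Returns 0 and fills *out on success, -1 if the line holds neither event.
 */
int parse_freq_event(const char *line, struct freq_sample *out)
{
	unsigned long freq, cpu;
	const char *p;

	p = strstr(line, "policy_frequency: ");
	if (p && sscanf(p, "policy_frequency: frequency=%lu policy_cpu=%lu",
			&freq, &cpu) == 2) {
		out->freq_khz = freq;
		out->cpu = cpu;
		out->per_policy = 1;
		return 0;
	}

	p = strstr(line, "cpu_frequency: ");
	if (p && sscanf(p, "cpu_frequency: state=%lu cpu_id=%lu",
			&freq, &cpu) == 2) {
		out->freq_khz = freq;
		out->cpu = cpu;
		out->per_policy = 0;
		return 0;
	}

	return -1;
}
```

Lines from unrelated events such as cpu_frequency_limits do not match either prefix and are rejected with -1.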
Re: [PATCH v1] cpufreq: Add policy_frequency trace event
Posted by Christian Loehle 2 weeks, 6 days ago
On 11/14/25 05:11, Viresh Kumar wrote:
> On 13-11-25, 19:41, Samuel Wu wrote:
>> On Wed, Nov 12, 2025 at 10:45 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>>
>>> On 12-11-25, 15:51, Samuel Wu wrote:
>>>> The existing cpu_frequency trace_event can be verbose, emitting an event
>>>> for every CPU in the policy even when their frequencies are identical.
>>>>
>>>> This patch adds a new policy_frequency trace event, which provides a
>>>> more efficient alternative to cpu_frequency trace event. This option
>>>> allows users who only need frequency at a policy level more concise logs
>>>> with simpler analysis.
>>>>
>>>> Signed-off-by: Samuel Wu <wusamuel@google.com>
>>>> ---
>>>>  drivers/cpufreq/cpufreq.c    |  2 ++
>>>>  include/trace/events/power.h | 21 +++++++++++++++++++++
>>>>  2 files changed, 23 insertions(+)
>>>>
>>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>>> index 4472bb1ec83c..b65534a4fd9a 100644
>>>> --- a/drivers/cpufreq/cpufreq.c
>>>> +++ b/drivers/cpufreq/cpufreq.c
>>>> @@ -345,6 +345,7 @@ static void cpufreq_notify_transition(struct cpufreq_policy *policy,
>>>>               pr_debug("FREQ: %u - CPUs: %*pbl\n", freqs->new,
>>>>                        cpumask_pr_args(policy->cpus));
>>>>
>>>> +             trace_policy_frequency(freqs->new, policy->cpu);
>>>>               for_each_cpu(cpu, policy->cpus)
>>>>                       trace_cpu_frequency(freqs->new, cpu);
>>>
>>> I don't see much value in almost duplicate trace events. If we feel that a
>>> per-policy event is a better fit (which makes sense), then we can just drop the
>>> trace_cpu_frequency() events and print policy->cpus (or related_cpus)
>>> information along with the per-policy events.
>>
>> Thank you for the feedback Viresh. Fair enough, I've done some testing
>> and a single trace event should work and would be cleaner. Please let
>> me know what you think of this proposal for v2.
>>
>> We can append a bitmask of policy->cpus field to
>> trace_cpu_frequency(). This way we maintain backwards compatibility:
>> trace_cpu_frequency() is not removed, and its pre-existing fields are
>> not disturbed.
>>
>> Call flow wise, we can delete all the for_each_cpu() loops, and we
>> still retain the benefits of the trace emitting once per policy
>> instead of once per cpu.
> 
> Fine by me. I have added Scheduler maintainers in the loop to see if they have a
> different view.
> 

That's gonna break a lot of tooling but AFAICS that in and of itself isn't
a good enough reason not to do it.
(Added some CCs)