[PATCH v2 02/18] KVM: x86: Use __free(put_cpufreq_policy) for policy reference

Zihuan Zhang posted 18 patches 1 month, 1 week ago
There is a newer version of this series
[PATCH v2 02/18] KVM: x86: Use __free(put_cpufreq_policy) for policy reference
Posted by Zihuan Zhang 1 month, 1 week ago
Replace the manual cpufreq_cpu_put() with __free(put_cpufreq_policy)
annotation for policy references. This reduces the risk of reference
counting mistakes and aligns the code with the latest kernel style.

No functional change intended.

Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
---
 arch/x86/kvm/x86.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a1c49bc681c4..2a825f4ec701 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9492,16 +9492,14 @@ static void kvm_timer_init(void)
 		max_tsc_khz = tsc_khz;
 
 		if (IS_ENABLED(CONFIG_CPU_FREQ)) {
-			struct cpufreq_policy *policy;
+			struct cpufreq_policy *policy __free(put_cpufreq_policy);
 			int cpu;
 
 			cpu = get_cpu();
 			policy = cpufreq_cpu_get(cpu);
-			if (policy) {
-				if (policy->cpuinfo.max_freq)
-					max_tsc_khz = policy->cpuinfo.max_freq;
-				cpufreq_cpu_put(policy);
-			}
+			if (policy && policy->cpuinfo.max_freq)
+				max_tsc_khz = policy->cpuinfo.max_freq;
+
 			put_cpu();
 		}
 		cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block,
-- 
2.25.1
Re: [PATCH v2 02/18] KVM: x86: Use __free(put_cpufreq_policy) for policy reference
Posted by Sean Christopherson 1 month, 1 week ago
On Wed, Aug 27, 2025, Zihuan Zhang wrote:
> Replace the manual cpufreq_cpu_put() with __free(put_cpufreq_policy)
> annotation for policy references. This reduces the risk of reference
> counting mistakes and aligns the code with the latest kernel style.
> 
> No functional change intended.
> 
> Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
> ---
>  arch/x86/kvm/x86.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a1c49bc681c4..2a825f4ec701 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9492,16 +9492,14 @@ static void kvm_timer_init(void)
>  		max_tsc_khz = tsc_khz;
>  
>  		if (IS_ENABLED(CONFIG_CPU_FREQ)) {
> -			struct cpufreq_policy *policy;
> +			struct cpufreq_policy *policy __free(put_cpufreq_policy);
>  			int cpu;
>  
>  			cpu = get_cpu();
>  			policy = cpufreq_cpu_get(cpu);
> -			if (policy) {
> -				if (policy->cpuinfo.max_freq)
> -					max_tsc_khz = policy->cpuinfo.max_freq;
> -				cpufreq_cpu_put(policy);
> -			}
> +			if (policy && policy->cpuinfo.max_freq)
> +				max_tsc_khz = policy->cpuinfo.max_freq;
> +
>  			put_cpu();

Hmm, this is technically buggy.  __free() won't invoke put_cpufreq_policy() until
policy goes out of scope, and so using __free() means the code is effectively:

		if (IS_ENABLED(CONFIG_CPU_FREQ)) {
			struct cpufreq_policy *policy;
			int cpu;

			cpu = get_cpu();
			policy = cpufreq_cpu_get(cpu);
			if (policy && policy->cpuinfo.max_freq)
				max_tsc_khz = policy->cpuinfo.max_freq;
			put_cpu();

			if (policy)
				cpufreq_cpu_put(policy);
		}

That's "fine" because the policy isn't truly referenced after preemption is
disabled, the lifecycle of the policy doesn't rely on preemption being disabled,
and KVM doesn't actually care which CPU is used to get the max frequency, i.e.
this would technically be "fine" too:

		if (IS_ENABLED(CONFIG_CPU_FREQ)) {
			struct cpufreq_policy *policy;
			int cpu;

			cpu = get_cpu();
			policy = cpufreq_cpu_get(cpu);
			put_cpu();

			if (policy && policy->cpuinfo.max_freq)
				max_tsc_khz = policy->cpuinfo.max_freq;

			if (policy)
				cpufreq_cpu_put(policy);
		}

But given that the code we have today is perfectly readable, I don't see any
reason to switch to __free() given that's it's technically flawed.  So I'm very
strongly inclined to skip this patch and keep things as-is.
Re: [PATCH v2 02/18] KVM: x86: Use __free(put_cpufreq_policy) for policy reference
Posted by Zihuan Zhang 1 month ago
Hi,

在 2025/8/27 22:13, Sean Christopherson 写道:
> On Wed, Aug 27, 2025, Zihuan Zhang wrote:
>> Replace the manual cpufreq_cpu_put() with __free(put_cpufreq_policy)
>> annotation for policy references. This reduces the risk of reference
>> counting mistakes and aligns the code with the latest kernel style.
>>
>> No functional change intended.
>>
>> Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
>> ---
>>   arch/x86/kvm/x86.c | 10 ++++------
>>   1 file changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index a1c49bc681c4..2a825f4ec701 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -9492,16 +9492,14 @@ static void kvm_timer_init(void)
>>   		max_tsc_khz = tsc_khz;
>>   
>>   		if (IS_ENABLED(CONFIG_CPU_FREQ)) {
>> -			struct cpufreq_policy *policy;
>> +			struct cpufreq_policy *policy __free(put_cpufreq_policy);
>>   			int cpu;
>>   
>>   			cpu = get_cpu();
>>   			policy = cpufreq_cpu_get(cpu);
>> -			if (policy) {
>> -				if (policy->cpuinfo.max_freq)
>> -					max_tsc_khz = policy->cpuinfo.max_freq;
>> -				cpufreq_cpu_put(policy);
>> -			}
>> +			if (policy && policy->cpuinfo.max_freq)
>> +				max_tsc_khz = policy->cpuinfo.max_freq;
>> +
>>   			put_cpu();
> Hmm, this is technically buggy.  __free() won't invoke put_cpufreq_policy() until
> policy goes out of scope, and so using __free() means the code is effectively:
>
> 		if (IS_ENABLED(CONFIG_CPU_FREQ)) {
> 			struct cpufreq_policy *policy;
> 			int cpu;
>
> 			cpu = get_cpu();
> 			policy = cpufreq_cpu_get(cpu);
> 			if (policy && policy->cpuinfo.max_freq)
> 				max_tsc_khz = policy->cpuinfo.max_freq;
> 			put_cpu();
>
> 			if (policy)
> 				cpufreq_cpu_put(policy);
> 		}
>
> That's "fine" because the policy isn't truly referenced after preemption is
> disabled, the lifecycle of the policy doesn't rely on preemption being disabled,
> and KVM doesn't actually care which CPU is used to get the max frequency, i.e.
> this would technically be "fine" too:
>
> 		if (IS_ENABLED(CONFIG_CPU_FREQ)) {
> 			struct cpufreq_policy *policy;
> 			int cpu;
>
> 			cpu = get_cpu();
> 			policy = cpufreq_cpu_get(cpu);
> 			put_cpu();
>
> 			if (policy && policy->cpuinfo.max_freq)
> 				max_tsc_khz = policy->cpuinfo.max_freq;
>
> 			if (policy)
> 				cpufreq_cpu_put(policy);
> 		}
>
> But given that the code we have today is perfectly readable, I don't see any
> reason to switch to __free() given that's it's technically flawed.  So I'm very
> strongly inclined to skip this patch and keep things as-is.


Yes, this will indeed change the execution order.
Can you accept that? Personally, I don’t think it’s ideal either.

		if (IS_ENABLED(CONFIG_CPU_FREQ)) {
  			int cpu;
			cpu = get_cpu();
			{
				struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpu);
				if (policy && policy->cpuinfo.max_freq)
					max_tsc_khz = policy->cpuinfo.max_freq;
			}
			put_cpu();

                                             }

Other places may also have the same issue,

maybe we should consider introducing a macro to handle this properly,
so that initialization and cleanup are well defined without changing
the existing order unexpected.

like this:

#define WITH_CPUFREQ_POLICY(cpu) {\

for(struct cpufreq_policy *policy __free(put_cpufreq_policy) =  \
			cpufreq_cpu_get(cpu);			\
			policy;)

Then Use it:

		if (IS_ENABLED(CONFIG_CPU_FREQ)) {
  			int cpu;
			cpu = get_cpu();
			WITH_CPUFREQ_POLICY(cpu){
				if (policy->cpuinfo.max_freq)
					max_tsc_khz = policy->cpuinfo.max_freq;
			}
			put_cpu();

                                             }
Re: [PATCH v2 02/18] KVM: x86: Use __free(put_cpufreq_policy) for policy reference
Posted by Sean Christopherson 1 month ago
On Thu, Aug 28, 2025, Zihuan Zhang wrote:
> > Hmm, this is technically buggy.  __free() won't invoke put_cpufreq_policy() until
> > policy goes out of scope, and so using __free() means the code is effectively:
> > 
> > 		if (IS_ENABLED(CONFIG_CPU_FREQ)) {
> > 			struct cpufreq_policy *policy;
> > 			int cpu;
> > 
> > 			cpu = get_cpu();
> > 			policy = cpufreq_cpu_get(cpu);
> > 			if (policy && policy->cpuinfo.max_freq)
> > 				max_tsc_khz = policy->cpuinfo.max_freq;
> > 			put_cpu();
> > 
> > 			if (policy)
> > 				cpufreq_cpu_put(policy);
> > 		}

...

> Yes, this will indeed change the execution order.
> Can you accept that? 

No, because it's buggy.

> Personally, I don’t think it’s ideal either.
> 
> 		if (IS_ENABLED(CONFIG_CPU_FREQ)) {
>  			int cpu;
> 			cpu = get_cpu();
> 			{
> 				struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpu);
> 				if (policy && policy->cpuinfo.max_freq)
> 					max_tsc_khz = policy->cpuinfo.max_freq;
> 			}
> 			put_cpu();
> 
>                                             }
> 
> Other places may also have the same issue,
> 
> maybe we should consider introducing a macro to handle this properly,
> so that initialization and cleanup are well defined without changing
> the existing order unexpected.
> 
> like this:
> 
> #define WITH_CPUFREQ_POLICY(cpu) {\
> 
> for(struct cpufreq_policy *policy __free(put_cpufreq_policy) =  \
> 			cpufreq_cpu_get(cpu);			\
> 			policy;)
> 
> Then Use it:
> 
> 		if (IS_ENABLED(CONFIG_CPU_FREQ)) {
>  			int cpu;
> 			cpu = get_cpu();
> 			WITH_CPUFREQ_POLICY(cpu){
> 				if (policy->cpuinfo.max_freq)
> 					max_tsc_khz = policy->cpuinfo.max_freq;
> 			}
> 			put_cpu();

This all feels very forced, in the sense that we have a shiny new tool and are
trying to use it everywhere without thinking critically about whether or not
doing so is actually an improvement.

At a glance, this is literally the only instance in the entire kernel where the
CPU to use is grabbed immediately before the policy.
 
  $ git grep -B 20 cpufreq_cpu_get | grep -e get_cpu -e smp_processor_id
  arch/x86/kvm/x86.c-			cpu = get_cpu();
  drivers/cpufreq/cppc_cpufreq.c-static int cppc_get_cpu_power(struct device *cpu_dev,
  drivers/cpufreq/cppc_cpufreq.c-static int cppc_get_cpu_cost(struct device *cpu_dev, unsigned long KHz,
  drivers/cpufreq/mediatek-cpufreq-hw.c-mtk_cpufreq_get_cpu_power(struct device *cpu_dev, unsigned long *uW,

Probably because KVM's usage is rather bizarre and honestly kind of dumb.  But
KVM has had this behavior for 15+ years, so as weird as it is, I'm not inclined
to change it without a really, really strong reason to do so, e.g. to iterate
over all CPUs or something.

So given that this is the only intance of the problem patter, I think it makes
sense to leave KVM as-is, and not spend a bunch of time trying to figure out how
to make KVM's usage play nice with __free().