[PATCH 3/4] mwait-idle: add 'preferred_cstates' module argument

Jan Beulich posted 4 patches 2 years, 7 months ago
[PATCH 3/4] mwait-idle: add 'preferred_cstates' module argument
Posted by Jan Beulich 2 years, 7 months ago
From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
exclusive - only one of them can be enabled. By default, 'intel_idle' driver
enables C1 and disables C1E. However, some users prefer to use C1E instead of
C1, because it saves more energy.

This patch adds a new module parameter ('preferred_cstates') for enabling C1E
and disabling C1. Here is the idea behind it.

1. This option has effect only for "mutually exclusive" C-states like C1 and
   C1E on SPR.
2. It does not have any effect on independent C-states, which do not require
   other C-states to be disabled (most states on most platforms as of today).
3. For mutually exclusive C-states, the 'intel_idle' driver always has a
   reasonable default, such as enabling C1 on SPR by default. On other
   platforms, the default may be different.
4. Users can override the default using the 'preferred_cstates' parameter.
5. The parameter accepts the preferred C-states bit-mask, similarly to the
   existing 'states_off' parameter.
6. This parameter is not limited to C1/C1E, and leaves room for supporting
   other mutually exclusive C-states, if they come in the future.

Today 'intel_idle' can only be compiled-in, which means that on SPR, in order
to disable C1 and enable C1E, users should boot with the following kernel
argument: intel_idle.preferred_cstates=4

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git da0e58c038e6

Enable C1E (if requested) not only on the BSP's socket / package.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- unstable.orig/docs/misc/xen-command-line.pandoc	2022-04-25 17:59:42.123387258 +0200
+++ unstable/docs/misc/xen-command-line.pandoc	2022-04-25 17:36:00.000000000 +0200
@@ -1884,6 +1884,12 @@ paging controls access to usermode addre
 ### ple_window (Intel)
 > `= <integer>`
 
+### preferred-cstates (x86)
+> `= <integer>`
+
+This is a mask of C-states which are to be use preferably.  This option is
+applicable only oh hardware were certain C-states are exlusive of one another.
+
 ### psr (Intel)
 > `= List of ( cmt:<boolean> | rmid_max:<integer> | cat:<boolean> | cos_max:<integer> | cdp:<boolean> )`
 
--- unstable.orig/xen/arch/x86/cpu/mwait-idle.c	2022-04-25 17:17:05.000000000 +0200
+++ unstable/xen/arch/x86/cpu/mwait-idle.c	2022-04-25 17:33:47.000000000 +0200
@@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
 
 static unsigned int mwait_substates;
 
+/*
+ * Some platforms come with mutually exclusive C-states, so that if one is
+ * enabled, the other C-states must not be used. Example: C1 and C1E on
+ * Sapphire Rapids platform. This parameter allows for selecting the
+ * preferred C-states among the groups of mutually exclusive C-states - the
+ * selected C-states will be registered, the other C-states from the mutually
+ * exclusive group won't be registered. If the platform has no mutually
+ * exclusive C-states, this parameter has no effect.
+ */
+static unsigned int __ro_after_init preferred_states_mask;
+integer_param("preferred-cstates", preferred_states_mask);
+
 #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
 /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
 static unsigned int lapic_timer_reliable_states = (1 << 1);
@@ -96,6 +108,7 @@ struct idle_cpu {
 	unsigned long auto_demotion_disable_flags;
 	bool byt_auto_demotion_disable_flag;
 	bool disable_promotion_to_c1e;
+	bool enable_promotion_to_c1e;
 };
 
 static const struct idle_cpu *icpu;
@@ -924,6 +937,15 @@ static void cf_check byt_auto_demotion_d
 	wrmsrl(MSR_MC6_DEMOTION_POLICY_CONFIG, 0);
 }
 
+static void cf_check c1e_promotion_enable(void *dummy)
+{
+	uint64_t msr_bits;
+
+	rdmsrl(MSR_IA32_POWER_CTL, msr_bits);
+	msr_bits |= 0x2;
+	wrmsrl(MSR_IA32_POWER_CTL, msr_bits);
+}
+
 static void cf_check c1e_promotion_disable(void *dummy)
 {
 	u64 msr_bits;
@@ -1241,6 +1263,26 @@ static void __init skx_idle_state_table_
 }
 
 /*
+ * spr_idle_state_table_update - Adjust Sapphire Rapids idle states table.
+ */
+static void __init spr_idle_state_table_update(void)
+{
+	/* Check if user prefers C1E over C1. */
+	if (preferred_states_mask & BIT(2, U)) {
+		if (preferred_states_mask & BIT(1, U))
+			/* Both can't be enabled, stick to the defaults. */
+			return;
+
+		spr_cstates[0].flags |= CPUIDLE_FLAG_DISABLED;
+		spr_cstates[1].flags &= ~CPUIDLE_FLAG_DISABLED;
+
+		/* Request enabling C1E using the "C1E promotion" bit. */
+		idle_cpu_spr.disable_promotion_to_c1e = false;
+		idle_cpu_spr.enable_promotion_to_c1e = true;
+	}
+}
+
+/*
  * mwait_idle_state_table_update()
  *
  * Update the default state_table for this CPU-id
@@ -1261,6 +1303,9 @@ static void __init mwait_idle_state_tabl
 	case INTEL_FAM6_SKYLAKE_X:
 		skx_idle_state_table_update();
 		break;
+	case INTEL_FAM6_SAPPHIRERAPIDS_X:
+		spr_idle_state_table_update();
+		break;
 	}
 }
 
@@ -1402,6 +1447,8 @@ static int cf_check mwait_idle_cpu_init(
 
 	if (icpu->disable_promotion_to_c1e)
 		on_selected_cpus(cpumask_of(cpu), c1e_promotion_disable, NULL, 1);
+	else if (icpu->enable_promotion_to_c1e)
+		on_selected_cpus(cpumask_of(cpu), c1e_promotion_enable, NULL, 1);
 
 	return NOTIFY_DONE;
 }
Re: [PATCH 3/4] mwait-idle: add 'preferred_cstates' module argument
Posted by Roger Pau Monné 2 years, 7 months ago
On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
> From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> 
> On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
> exclusive - only one of them can be enabled. By default, 'intel_idle' driver
> enables C1 and disables C1E. However, some users prefer to use C1E instead of
> C1, because it saves more energy.
> 
> This patch adds a new module parameter ('preferred_cstates') for enabling C1E
> and disabling C1. Here is the idea behind it.
> 
> 1. This option has effect only for "mutually exclusive" C-states like C1 and
>    C1E on SPR.
> 2. It does not have any effect on independent C-states, which do not require
>    other C-states to be disabled (most states on most platforms as of today).
> 3. For mutually exclusive C-states, the 'intel_idle' driver always has a
>    reasonable default, such as enabling C1 on SPR by default. On other
>    platforms, the default may be different.
> 4. Users can override the default using the 'preferred_cstates' parameter.
> 5. The parameter accepts the preferred C-states bit-mask, similarly to the
>    existing 'states_off' parameter.
> 6. This parameter is not limited to C1/C1E, and leaves room for supporting
>    other mutually exclusive C-states, if they come in the future.
> 
> Today 'intel_idle' can only be compiled-in, which means that on SPR, in order
> to disable C1 and enable C1E, users should boot with the following kernel
> argument: intel_idle.preferred_cstates=4
> 
> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git da0e58c038e6
> 
> Enable C1E (if requested) not only on the BSP's socket / package.

Maybe we should also add a note here that the command line option for
Xen is preferred-cstates instead of intel_idle.preferred_cstates?

I think this is a bad interface however, we should have a more generic
option (ie: cstate-mode = 'performance | powersave') so that users
don't have to fiddle with model specific C state masks.

> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- unstable.orig/docs/misc/xen-command-line.pandoc	2022-04-25 17:59:42.123387258 +0200
> +++ unstable/docs/misc/xen-command-line.pandoc	2022-04-25 17:36:00.000000000 +0200
> @@ -1884,6 +1884,12 @@ paging controls access to usermode addre
>  ### ple_window (Intel)
>  > `= <integer>`
>  
> +### preferred-cstates (x86)
> +> `= <integer>`
> +
> +This is a mask of C-states which are to be use preferably.  This option is
> +applicable only oh hardware were certain C-states are exlusive of one another.
> +
>  ### psr (Intel)
>  > `= List of ( cmt:<boolean> | rmid_max:<integer> | cat:<boolean> | cos_max:<integer> | cdp:<boolean> )`
>  
> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c	2022-04-25 17:17:05.000000000 +0200
> +++ unstable/xen/arch/x86/cpu/mwait-idle.c	2022-04-25 17:33:47.000000000 +0200
> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
>  
>  static unsigned int mwait_substates;
>  
> +/*
> + * Some platforms come with mutually exclusive C-states, so that if one is
> + * enabled, the other C-states must not be used. Example: C1 and C1E on
> + * Sapphire Rapids platform. This parameter allows for selecting the
> + * preferred C-states among the groups of mutually exclusive C-states - the
> + * selected C-states will be registered, the other C-states from the mutually
> + * exclusive group won't be registered. If the platform has no mutually
> + * exclusive C-states, this parameter has no effect.
> + */
> +static unsigned int __ro_after_init preferred_states_mask;
> +integer_param("preferred-cstates", preferred_states_mask);
> +
>  #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
>  /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
>  static unsigned int lapic_timer_reliable_states = (1 << 1);
> @@ -96,6 +108,7 @@ struct idle_cpu {
>  	unsigned long auto_demotion_disable_flags;
>  	bool byt_auto_demotion_disable_flag;
>  	bool disable_promotion_to_c1e;
> +	bool enable_promotion_to_c1e;

I'm confused by those fields, shouldn't we just have:
promotion_to_c1e = true | false?

As one field is the negation of the other:
enable_promotion_to_c1e = !disable_promotion_to_c1e

I know this is code from Linux, but would like to understand why two
fields are needed.

Thanks, Roger.

Re: [PATCH 3/4] mwait-idle: add 'preferred_cstates' module argument
Posted by Jan Beulich 2 years, 7 months ago
On 27.04.2022 14:45, Roger Pau Monné wrote:
> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
>> From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
>>
>> On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
>> exclusive - only one of them can be enabled. By default, 'intel_idle' driver
>> enables C1 and disables C1E. However, some users prefer to use C1E instead of
>> C1, because it saves more energy.
>>
>> This patch adds a new module parameter ('preferred_cstates') for enabling C1E
>> and disabling C1. Here is the idea behind it.
>>
>> 1. This option has effect only for "mutually exclusive" C-states like C1 and
>>    C1E on SPR.
>> 2. It does not have any effect on independent C-states, which do not require
>>    other C-states to be disabled (most states on most platforms as of today).
>> 3. For mutually exclusive C-states, the 'intel_idle' driver always has a
>>    reasonable default, such as enabling C1 on SPR by default. On other
>>    platforms, the default may be different.
>> 4. Users can override the default using the 'preferred_cstates' parameter.
>> 5. The parameter accepts the preferred C-states bit-mask, similarly to the
>>    existing 'states_off' parameter.
>> 6. This parameter is not limited to C1/C1E, and leaves room for supporting
>>    other mutually exclusive C-states, if they come in the future.
>>
>> Today 'intel_idle' can only be compiled-in, which means that on SPR, in order
>> to disable C1 and enable C1E, users should boot with the following kernel
>> argument: intel_idle.preferred_cstates=4
>>
>> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git da0e58c038e6
>>
>> Enable C1E (if requested) not only on the BSP's socket / package.
> 
> Maybe we should also add a note here that the command line option for
> Xen is preferred-cstates instead of intel_idle.preferred_cstates?
> 
> I think this is a bad interface however, we should have a more generic
> option (ie: cstate-mode = 'performance | powersave') so that users
> don't have to fiddle with model specific C state masks.

Performance vs powersave doesn't cover it imo, especially if down
the road more states would appear which can be controlled this way.
I don't think there's a way around providing _some_ way to control
things one a per-state level. When porting this over, I too didn't
like this interface very much, but I had no good replacement idea.

>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
>>  
>>  static unsigned int mwait_substates;
>>  
>> +/*
>> + * Some platforms come with mutually exclusive C-states, so that if one is
>> + * enabled, the other C-states must not be used. Example: C1 and C1E on
>> + * Sapphire Rapids platform. This parameter allows for selecting the
>> + * preferred C-states among the groups of mutually exclusive C-states - the
>> + * selected C-states will be registered, the other C-states from the mutually
>> + * exclusive group won't be registered. If the platform has no mutually
>> + * exclusive C-states, this parameter has no effect.
>> + */
>> +static unsigned int __ro_after_init preferred_states_mask;
>> +integer_param("preferred-cstates", preferred_states_mask);
>> +
>>  #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
>>  /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
>>  static unsigned int lapic_timer_reliable_states = (1 << 1);
>> @@ -96,6 +108,7 @@ struct idle_cpu {
>>  	unsigned long auto_demotion_disable_flags;
>>  	bool byt_auto_demotion_disable_flag;
>>  	bool disable_promotion_to_c1e;
>> +	bool enable_promotion_to_c1e;
> 
> I'm confused by those fields, shouldn't we just have:
> promotion_to_c1e = true | false?
> 
> As one field is the negation of the other:
> enable_promotion_to_c1e = !disable_promotion_to_c1e
> 
> I know this is code from Linux, but would like to understand why two
> fields are needed.

This really is a tristate; Linux is now changing their global variable
to an enum, but we don't have an equivalent of that global variable.

Jan
Re: [PATCH 3/4] mwait-idle: add 'preferred_cstates' module argument
Posted by Roger Pau Monné 2 years, 7 months ago
On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote:
> On 27.04.2022 14:45, Roger Pau Monné wrote:
> > On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
> >> From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> >>
> >> On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
> >> exclusive - only one of them can be enabled. By default, 'intel_idle' driver
> >> enables C1 and disables C1E. However, some users prefer to use C1E instead of
> >> C1, because it saves more energy.
> >>
> >> This patch adds a new module parameter ('preferred_cstates') for enabling C1E
> >> and disabling C1. Here is the idea behind it.
> >>
> >> 1. This option has effect only for "mutually exclusive" C-states like C1 and
> >>    C1E on SPR.
> >> 2. It does not have any effect on independent C-states, which do not require
> >>    other C-states to be disabled (most states on most platforms as of today).
> >> 3. For mutually exclusive C-states, the 'intel_idle' driver always has a
> >>    reasonable default, such as enabling C1 on SPR by default. On other
> >>    platforms, the default may be different.
> >> 4. Users can override the default using the 'preferred_cstates' parameter.
> >> 5. The parameter accepts the preferred C-states bit-mask, similarly to the
> >>    existing 'states_off' parameter.
> >> 6. This parameter is not limited to C1/C1E, and leaves room for supporting
> >>    other mutually exclusive C-states, if they come in the future.
> >>
> >> Today 'intel_idle' can only be compiled-in, which means that on SPR, in order
> >> to disable C1 and enable C1E, users should boot with the following kernel
> >> argument: intel_idle.preferred_cstates=4
> >>
> >> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git da0e58c038e6
> >>
> >> Enable C1E (if requested) not only on the BSP's socket / package.
> > 
> > Maybe we should also add a note here that the command line option for
> > Xen is preferred-cstates instead of intel_idle.preferred_cstates?
> > 
> > I think this is a bad interface however, we should have a more generic
> > option (ie: cstate-mode = 'performance | powersave') so that users
> > don't have to fiddle with model specific C state masks.
> 
> Performance vs powersave doesn't cover it imo, especially if down
> the road more states would appear which can be controlled this way.
> I don't think there's a way around providing _some_ way to control
> things one a per-state level. When porting this over, I too didn't
> like this interface very much, but I had no good replacement idea.

I think it's fine to have this more fine grained control of C states,
but it doesn't seem practical from a user (or distro) PoV.  But then I
also wonder how much of a difference this will make regarding power
consumption.

> >> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
> >> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
> >> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
> >>  
> >>  static unsigned int mwait_substates;
> >>  
> >> +/*
> >> + * Some platforms come with mutually exclusive C-states, so that if one is
> >> + * enabled, the other C-states must not be used. Example: C1 and C1E on
> >> + * Sapphire Rapids platform. This parameter allows for selecting the
> >> + * preferred C-states among the groups of mutually exclusive C-states - the
> >> + * selected C-states will be registered, the other C-states from the mutually
> >> + * exclusive group won't be registered. If the platform has no mutually
> >> + * exclusive C-states, this parameter has no effect.
> >> + */
> >> +static unsigned int __ro_after_init preferred_states_mask;
> >> +integer_param("preferred-cstates", preferred_states_mask);
> >> +
> >>  #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
> >>  /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
> >>  static unsigned int lapic_timer_reliable_states = (1 << 1);
> >> @@ -96,6 +108,7 @@ struct idle_cpu {
> >>  	unsigned long auto_demotion_disable_flags;
> >>  	bool byt_auto_demotion_disable_flag;
> >>  	bool disable_promotion_to_c1e;
> >> +	bool enable_promotion_to_c1e;
> > 
> > I'm confused by those fields, shouldn't we just have:
> > promotion_to_c1e = true | false?
> > 
> > As one field is the negation of the other:
> > enable_promotion_to_c1e = !disable_promotion_to_c1e
> > 
> > I know this is code from Linux, but would like to understand why two
> > fields are needed.
> 
> This really is a tristate; Linux is now changing their global variable
> to an enum, but we don't have an equivalent of that global variable.

So it would be: leave default, disable C1E promotion, enable C1E
promotion.

And Linux is leaving the {disable,enable}_promotion_to_c1e in
idle_cpu?

I guess there's not much we can do unless we want to diverge from
upstream.

Thanks, Roger.

Re: [PATCH 3/4] mwait-idle: add 'preferred_cstates' module argument
Posted by Jan Beulich 2 years, 7 months ago
On 27.04.2022 17:06, Roger Pau Monné wrote:
> On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote:
>> On 27.04.2022 14:45, Roger Pau Monné wrote:
>>> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
>>>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
>>>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
>>>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
>>>>  
>>>>  static unsigned int mwait_substates;
>>>>  
>>>> +/*
>>>> + * Some platforms come with mutually exclusive C-states, so that if one is
>>>> + * enabled, the other C-states must not be used. Example: C1 and C1E on
>>>> + * Sapphire Rapids platform. This parameter allows for selecting the
>>>> + * preferred C-states among the groups of mutually exclusive C-states - the
>>>> + * selected C-states will be registered, the other C-states from the mutually
>>>> + * exclusive group won't be registered. If the platform has no mutually
>>>> + * exclusive C-states, this parameter has no effect.
>>>> + */
>>>> +static unsigned int __ro_after_init preferred_states_mask;
>>>> +integer_param("preferred-cstates", preferred_states_mask);
>>>> +
>>>>  #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
>>>>  /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
>>>>  static unsigned int lapic_timer_reliable_states = (1 << 1);
>>>> @@ -96,6 +108,7 @@ struct idle_cpu {
>>>>  	unsigned long auto_demotion_disable_flags;
>>>>  	bool byt_auto_demotion_disable_flag;
>>>>  	bool disable_promotion_to_c1e;
>>>> +	bool enable_promotion_to_c1e;
>>>
>>> I'm confused by those fields, shouldn't we just have:
>>> promotion_to_c1e = true | false?
>>>
>>> As one field is the negation of the other:
>>> enable_promotion_to_c1e = !disable_promotion_to_c1e
>>>
>>> I know this is code from Linux, but would like to understand why two
>>> fields are needed.
>>
>> This really is a tristate; Linux is now changing their global variable
>> to an enum, but we don't have an equivalent of that global variable.
> 
> So it would be: leave default, disable C1E promotion, enable C1E
> promotion.
> 
> And Linux is leaving the {disable,enable}_promotion_to_c1e in
> idle_cpu?

Iirc they only have disable_promotion_to_c1e there (as a struct field)
and keep it, but they convert the similarly named file-scope variable
to a tristate.

> I guess there's not much we can do unless we want to diverge from
> upstream.

We've diverged some from Linux here already - as said, for example we
don't have their file-scope variable. I could convert our struct field
to an enum, but that would be larger code churn for (I think) little
gain.

Jan
Re: [PATCH 3/4] mwait-idle: add 'preferred_cstates' module argument
Posted by Roger Pau Monné 2 years, 7 months ago
On Wed, Apr 27, 2022 at 05:25:35PM +0200, Jan Beulich wrote:
> On 27.04.2022 17:06, Roger Pau Monné wrote:
> > On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote:
> >> On 27.04.2022 14:45, Roger Pau Monné wrote:
> >>> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
> >>>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
> >>>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
> >>>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
> >>>>  
> >>>>  static unsigned int mwait_substates;
> >>>>  
> >>>> +/*
> >>>> + * Some platforms come with mutually exclusive C-states, so that if one is
> >>>> + * enabled, the other C-states must not be used. Example: C1 and C1E on
> >>>> + * Sapphire Rapids platform. This parameter allows for selecting the
> >>>> + * preferred C-states among the groups of mutually exclusive C-states - the
> >>>> + * selected C-states will be registered, the other C-states from the mutually
> >>>> + * exclusive group won't be registered. If the platform has no mutually
> >>>> + * exclusive C-states, this parameter has no effect.
> >>>> + */
> >>>> +static unsigned int __ro_after_init preferred_states_mask;
> >>>> +integer_param("preferred-cstates", preferred_states_mask);
> >>>> +
> >>>>  #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
> >>>>  /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
> >>>>  static unsigned int lapic_timer_reliable_states = (1 << 1);
> >>>> @@ -96,6 +108,7 @@ struct idle_cpu {
> >>>>  	unsigned long auto_demotion_disable_flags;
> >>>>  	bool byt_auto_demotion_disable_flag;
> >>>>  	bool disable_promotion_to_c1e;
> >>>> +	bool enable_promotion_to_c1e;
> >>>
> >>> I'm confused by those fields, shouldn't we just have:
> >>> promotion_to_c1e = true | false?
> >>>
> >>> As one field is the negation of the other:
> >>> enable_promotion_to_c1e = !disable_promotion_to_c1e
> >>>
> >>> I know this is code from Linux, but would like to understand why two
> >>> fields are needed.
> >>
> >> This really is a tristate; Linux is now changing their global variable
> >> to an enum, but we don't have an equivalent of that global variable.
> > 
> > So it would be: leave default, disable C1E promotion, enable C1E
> > promotion.
> > 
> > And Linux is leaving the {disable,enable}_promotion_to_c1e in
> > idle_cpu?
> 
> Iirc they only have disable_promotion_to_c1e there (as a struct field)
> and keep it, but they convert the similarly named file-scope variable
> to a tristate.
> 
> > I guess there's not much we can do unless we want to diverge from
> > upstream.
> 
> We've diverged some from Linux here already - as said, for example we
> don't have their file-scope variable. I could convert our struct field
> to an enum, but that would be larger code churn for (I think) little
> gain.

Hm, OK, could gaining the file scope variable would make sense in order
to reduce divergences?  Or are the other roadblocks there?

I think this is ugly, but would make sense as long as it allows us to
keep closer to upstream.

Thanks, Roger.

Re: [PATCH 3/4] mwait-idle: add 'preferred_cstates' module argument
Posted by Jan Beulich 2 years, 7 months ago
On 27.04.2022 18:12, Roger Pau Monné wrote:
> On Wed, Apr 27, 2022 at 05:25:35PM +0200, Jan Beulich wrote:
>> On 27.04.2022 17:06, Roger Pau Monné wrote:
>>> On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote:
>>>> On 27.04.2022 14:45, Roger Pau Monné wrote:
>>>>> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
>>>>>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
>>>>>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
>>>>>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
>>>>>>  
>>>>>>  static unsigned int mwait_substates;
>>>>>>  
>>>>>> +/*
>>>>>> + * Some platforms come with mutually exclusive C-states, so that if one is
>>>>>> + * enabled, the other C-states must not be used. Example: C1 and C1E on
>>>>>> + * Sapphire Rapids platform. This parameter allows for selecting the
>>>>>> + * preferred C-states among the groups of mutually exclusive C-states - the
>>>>>> + * selected C-states will be registered, the other C-states from the mutually
>>>>>> + * exclusive group won't be registered. If the platform has no mutually
>>>>>> + * exclusive C-states, this parameter has no effect.
>>>>>> + */
>>>>>> +static unsigned int __ro_after_init preferred_states_mask;
>>>>>> +integer_param("preferred-cstates", preferred_states_mask);
>>>>>> +
>>>>>>  #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
>>>>>>  /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
>>>>>>  static unsigned int lapic_timer_reliable_states = (1 << 1);
>>>>>> @@ -96,6 +108,7 @@ struct idle_cpu {
>>>>>>  	unsigned long auto_demotion_disable_flags;
>>>>>>  	bool byt_auto_demotion_disable_flag;
>>>>>>  	bool disable_promotion_to_c1e;
>>>>>> +	bool enable_promotion_to_c1e;
>>>>>
>>>>> I'm confused by those fields, shouldn't we just have:
>>>>> promotion_to_c1e = true | false?
>>>>>
>>>>> As one field is the negation of the other:
>>>>> enable_promotion_to_c1e = !disable_promotion_to_c1e
>>>>>
>>>>> I know this is code from Linux, but would like to understand why two
>>>>> fields are needed.
>>>>
>>>> This really is a tristate; Linux is now changing their global variable
>>>> to an enum, but we don't have an equivalent of that global variable.
>>>
>>> So it would be: leave default, disable C1E promotion, enable C1E
>>> promotion.
>>>
>>> And Linux is leaving the {disable,enable}_promotion_to_c1e in
>>> idle_cpu?
>>
>> Iirc they only have disable_promotion_to_c1e there (as a struct field)
>> and keep it, but they convert the similarly named file-scope variable
>> to a tristate.
>>
>>> I guess there's not much we can do unless we want to diverge from
>>> upstream.
>>
>> We've diverged some from Linux here already - as said, for example we
>> don't have their file-scope variable. I could convert our struct field
>> to an enum, but that would be larger code churn for (I think) little
>> gain.
> 
> Hm, OK, could gaining the file scope variable would make sense in order
> to reduce divergences?  Or are the other roadblocks there?

I don't recall. It might have originated from a change I decided to not
port over, or I might have dropped it while porting. To be honest I'm
not keen on putting time into researching this, the more that I would
generally try to avoid static variables.

What I would be willing to put time in is making a more user friendly
command line option, but as said - I can't think of any good alternative
(except perhaps "preferred-cstates=c1e" or "cstates=preferred:c1e", with
internal translation of the strings into a bit mask, as long as (a) you
would think that's an improvement and (b) the further divergence from
Linux is not deemed a problem).

Jan

> I think this is ugly, but would make sense as long as it allows us to
> keep closer to upstream.
> 
> Thanks, Roger.
> 
Re: [PATCH 3/4] mwait-idle: add 'preferred_cstates' module argument
Posted by Roger Pau Monné 2 years, 7 months ago
On Thu, Apr 28, 2022 at 08:37:58AM +0200, Jan Beulich wrote:
> On 27.04.2022 18:12, Roger Pau Monné wrote:
> > On Wed, Apr 27, 2022 at 05:25:35PM +0200, Jan Beulich wrote:
> >> On 27.04.2022 17:06, Roger Pau Monné wrote:
> >>> On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote:
> >>>> On 27.04.2022 14:45, Roger Pau Monné wrote:
> >>>>> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
> >>>>>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
> >>>>>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
> >>>>>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
> >>>>>>  
> >>>>>>  static unsigned int mwait_substates;
> >>>>>>  
> >>>>>> +/*
> >>>>>> + * Some platforms come with mutually exclusive C-states, so that if one is
> >>>>>> + * enabled, the other C-states must not be used. Example: C1 and C1E on
> >>>>>> + * Sapphire Rapids platform. This parameter allows for selecting the
> >>>>>> + * preferred C-states among the groups of mutually exclusive C-states - the
> >>>>>> + * selected C-states will be registered, the other C-states from the mutually
> >>>>>> + * exclusive group won't be registered. If the platform has no mutually
> >>>>>> + * exclusive C-states, this parameter has no effect.
> >>>>>> + */
> >>>>>> +static unsigned int __ro_after_init preferred_states_mask;
> >>>>>> +integer_param("preferred-cstates", preferred_states_mask);
> >>>>>> +
> >>>>>>  #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
> >>>>>>  /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
> >>>>>>  static unsigned int lapic_timer_reliable_states = (1 << 1);
> >>>>>> @@ -96,6 +108,7 @@ struct idle_cpu {
> >>>>>>  	unsigned long auto_demotion_disable_flags;
> >>>>>>  	bool byt_auto_demotion_disable_flag;
> >>>>>>  	bool disable_promotion_to_c1e;
> >>>>>> +	bool enable_promotion_to_c1e;
> >>>>>
> >>>>> I'm confused by those fields, shouldn't we just have:
> >>>>> promotion_to_c1e = true | false?
> >>>>>
> >>>>> As one field is the negation of the other:
> >>>>> enable_promotion_to_c1e = !disable_promotion_to_c1e
> >>>>>
> >>>>> I know this is code from Linux, but would like to understand why two
> >>>>> fields are needed.
> >>>>
> >>>> This really is a tristate; Linux is now changing their global variable
> >>>> to an enum, but we don't have an equivalent of that global variable.
> >>>
> >>> So it would be: leave default, disable C1E promotion, enable C1E
> >>> promotion.
> >>>
> >>> And Linux is leaving the {disable,enable}_promotion_to_c1e in
> >>> idle_cpu?
> >>
> >> Iirc they only have disable_promotion_to_c1e there (as a struct field)
> >> and keep it, but they convert the similarly named file-scope variable
> >> to a tristate.
> >>
> >>> I guess there's not much we can do unless we want to diverge from
> >>> upstream.
> >>
> >> We've diverged some from Linux here already - as said, for example we
> >> don't have their file-scope variable. I could convert our struct field
> >> to an enum, but that would be larger code churn for (I think) little
> >> gain.
> > 
> > Hm, OK, could gaining the file scope variable would make sense in order
> > to reduce divergences?  Or are the other roadblocks there?
> 
> I don't recall. It might have originated from a change I decided to not
> port over, or I might have dropped it while porting. To be honest I'm
> not keen on putting time into researching this, the more that I would
> generally try to avoid static variables.
> 
> What I would be willing to put time in is making a more user friendly
> command line option, but as said - I can't think of any good alternative
> (except perhaps "preferred-cstates=c1e" or "cstates=preferred:c1e", with
> internal translation of the strings into a bit mask, as long as (a) you
> would think that's an improvement and (b) the further divergence from
> Linux is not deemed a problem).

I think (b) won't be a problem as long as internally the user option
is translated into a bitmask.

Regarding (a) I do think it would be helpful to express this in a more
user friendly way, I'm not sure whether it would make sense to keep
Linux format also for compatibility reasons if users already have a
bitmask and want to use the same parameter for Xen and Linux, ie:

preferred-cstates = <string of c1e,c1,...> | <integer bitmask>

What I think we should fix is the naming of the two booleans:

bool disable_promotion_to_c1e;
bool enable_promotion_to_c1e;

I would rather translated this into an enum, as right now it's
confusing IMO.

Thanks, Roger.

Re: [PATCH 3/4] mwait-idle: add 'preferred_cstates' module argument
Posted by Jan Beulich 2 years, 7 months ago
On 28.04.2022 10:06, Roger Pau Monné wrote:
> On Thu, Apr 28, 2022 at 08:37:58AM +0200, Jan Beulich wrote:
>> On 27.04.2022 18:12, Roger Pau Monné wrote:
>>> On Wed, Apr 27, 2022 at 05:25:35PM +0200, Jan Beulich wrote:
>>>> On 27.04.2022 17:06, Roger Pau Monné wrote:
>>>>> On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote:
>>>>>> On 27.04.2022 14:45, Roger Pau Monné wrote:
>>>>>>> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
>>>>>>>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
>>>>>>>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
>>>>>>>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
>>>>>>>>  
>>>>>>>>  static unsigned int mwait_substates;
>>>>>>>>  
>>>>>>>> +/*
>>>>>>>> + * Some platforms come with mutually exclusive C-states, so that if one is
>>>>>>>> + * enabled, the other C-states must not be used. Example: C1 and C1E on
>>>>>>>> + * Sapphire Rapids platform. This parameter allows for selecting the
>>>>>>>> + * preferred C-states among the groups of mutually exclusive C-states - the
>>>>>>>> + * selected C-states will be registered, the other C-states from the mutually
>>>>>>>> + * exclusive group won't be registered. If the platform has no mutually
>>>>>>>> + * exclusive C-states, this parameter has no effect.
>>>>>>>> + */
>>>>>>>> +static unsigned int __ro_after_init preferred_states_mask;
>>>>>>>> +integer_param("preferred-cstates", preferred_states_mask);
>>>>>>>> +
>>>>>>>>  #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
>>>>>>>>  /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
>>>>>>>>  static unsigned int lapic_timer_reliable_states = (1 << 1);
>>>>>>>> @@ -96,6 +108,7 @@ struct idle_cpu {
>>>>>>>>  	unsigned long auto_demotion_disable_flags;
>>>>>>>>  	bool byt_auto_demotion_disable_flag;
>>>>>>>>  	bool disable_promotion_to_c1e;
>>>>>>>> +	bool enable_promotion_to_c1e;
>>>>>>>
>>>>>>> I'm confused by those fields, shouldn't we just have:
>>>>>>> promotion_to_c1e = true | false?
>>>>>>>
>>>>>>> As one field is the negation of the other:
>>>>>>> enable_promotion_to_c1e = !disable_promotion_to_c1e
>>>>>>>
>>>>>>> I know this is code from Linux, but would like to understand why two
>>>>>>> fields are needed.
>>>>>>
>>>>>> This really is a tristate; Linux is now changing their global variable
>>>>>> to an enum, but we don't have an equivalent of that global variable.
>>>>>
>>>>> So it would be: leave default, disable C1E promotion, enable C1E
>>>>> promotion.
>>>>>
>>>>> And Linux is leaving the {disable,enable}_promotion_to_c1e in
>>>>> idle_cpu?
>>>>
>>>> Iirc they only have disable_promotion_to_c1e there (as a struct field)
>>>> and keep it, but they convert the similarly named file-scope variable
>>>> to a tristate.
>>>>
>>>>> I guess there's not much we can do unless we want to diverge from
>>>>> upstream.
>>>>
>>>> We've diverged some from Linux here already - as said, for example we
>>>> don't have their file-scope variable. I could convert our struct field
>>>> to an enum, but that would be larger code churn for (I think) little
>>>> gain.
>>>
>>> Hm, OK, could gaining the file scope variable would make sense in order
>>> to reduce divergences?  Or are the other roadblocks there?
>>
>> I don't recall. It might have originated from a change I decided to not
>> port over, or I might have dropped it while porting. To be honest I'm
>> not keen on putting time into researching this, the more that I would
>> generally try to avoid static variables.
>>
>> What I would be willing to put time in is making a more user friendly
>> command line option, but as said - I can't think of any good alternative
>> (except perhaps "preferred-cstates=c1e" or "cstates=preferred:c1e", with
>> internal translation of the strings into a bit mask, as long as (a) you
>> would think that's an improvement and (b) the further divergence from
>> Linux is not deemed a problem).
> 
> I think (b) won't be a problem as long as internally the user option
> is translated into a bitmask.
> 
> Regarding (a) I do think it would be helpful to express this in a more
> user friendly way, I'm not sure whether it would make sense to keep
> Linux format also for compatibility reasons if users already have a
> bitmask and want to use the same parameter for Xen and Linux, ie:
> 
> preferred-cstates = <string of c1e,c1,...> | <integer bitmask>

Yes, I would have been planning to accept both (but probably reject
mixing of both).

> What I think we should fix is the naming of the two booleans:
> 
> bool disable_promotion_to_c1e;
> bool enable_promotion_to_c1e;
> 
> I would rather translated this into an enum, as right now it's
> confusing IMO.

Okay, I can certainly do that. The more that I did consider doing
so already, breaking ties simply based on the "less code churn"
argument.

Jan