From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
exclusive - only one of them can be enabled. By default, 'intel_idle' driver
enables C1 and disables C1E. However, some users prefer to use C1E instead of
C1, because it saves more energy.
This patch adds a new module parameter ('preferred_cstates') for enabling C1E
and disabling C1. Here is the idea behind it.
1. This option has effect only for "mutually exclusive" C-states like C1 and
C1E on SPR.
2. It does not have any effect on independent C-states, which do not require
other C-states to be disabled (most states on most platforms as of today).
3. For mutually exclusive C-states, the 'intel_idle' driver always has a
reasonable default, such as enabling C1 on SPR by default. On other
platforms, the default may be different.
4. Users can override the default using the 'preferred_cstates' parameter.
5. The parameter accepts the preferred C-states bit-mask, similarly to the
existing 'states_off' parameter.
6. This parameter is not limited to C1/C1E, and leaves room for supporting
other mutually exclusive C-states, if they come in the future.
Today 'intel_idle' can only be compiled-in, which means that on SPR, in order
to disable C1 and enable C1E, users should boot with the following kernel
argument: intel_idle.preferred_cstates=4
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git da0e58c038e6
Enable C1E (if requested) not only on the BSP's socket / package.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- unstable.orig/docs/misc/xen-command-line.pandoc 2022-04-25 17:59:42.123387258 +0200
+++ unstable/docs/misc/xen-command-line.pandoc 2022-04-25 17:36:00.000000000 +0200
@@ -1884,6 +1884,12 @@ paging controls access to usermode addre
### ple_window (Intel)
> `= <integer>`
+### preferred-cstates (x86)
+> `= <integer>`
+
+This is a mask of C-states which are to be use preferably. This option is
+applicable only oh hardware were certain C-states are exlusive of one another.
+
### psr (Intel)
> `= List of ( cmt:<boolean> | rmid_max:<integer> | cat:<boolean> | cos_max:<integer> | cdp:<boolean> )`
--- unstable.orig/xen/arch/x86/cpu/mwait-idle.c 2022-04-25 17:17:05.000000000 +0200
+++ unstable/xen/arch/x86/cpu/mwait-idle.c 2022-04-25 17:33:47.000000000 +0200
@@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
static unsigned int mwait_substates;
+/*
+ * Some platforms come with mutually exclusive C-states, so that if one is
+ * enabled, the other C-states must not be used. Example: C1 and C1E on
+ * Sapphire Rapids platform. This parameter allows for selecting the
+ * preferred C-states among the groups of mutually exclusive C-states - the
+ * selected C-states will be registered, the other C-states from the mutually
+ * exclusive group won't be registered. If the platform has no mutually
+ * exclusive C-states, this parameter has no effect.
+ */
+static unsigned int __ro_after_init preferred_states_mask;
+integer_param("preferred-cstates", preferred_states_mask);
+
#define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
/* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
static unsigned int lapic_timer_reliable_states = (1 << 1);
@@ -96,6 +108,7 @@ struct idle_cpu {
unsigned long auto_demotion_disable_flags;
bool byt_auto_demotion_disable_flag;
bool disable_promotion_to_c1e;
+ bool enable_promotion_to_c1e;
};
static const struct idle_cpu *icpu;
@@ -924,6 +937,15 @@ static void cf_check byt_auto_demotion_d
wrmsrl(MSR_MC6_DEMOTION_POLICY_CONFIG, 0);
}
+static void cf_check c1e_promotion_enable(void *dummy)
+{
+ uint64_t msr_bits;
+
+ rdmsrl(MSR_IA32_POWER_CTL, msr_bits);
+ msr_bits |= 0x2;
+ wrmsrl(MSR_IA32_POWER_CTL, msr_bits);
+}
+
static void cf_check c1e_promotion_disable(void *dummy)
{
u64 msr_bits;
@@ -1241,6 +1263,26 @@ static void __init skx_idle_state_table_
}
/*
+ * spr_idle_state_table_update - Adjust Sapphire Rapids idle states table.
+ */
+static void __init spr_idle_state_table_update(void)
+{
+ /* Check if user prefers C1E over C1. */
+ if (preferred_states_mask & BIT(2, U)) {
+ if (preferred_states_mask & BIT(1, U))
+ /* Both can't be enabled, stick to the defaults. */
+ return;
+
+ spr_cstates[0].flags |= CPUIDLE_FLAG_DISABLED;
+ spr_cstates[1].flags &= ~CPUIDLE_FLAG_DISABLED;
+
+ /* Request enabling C1E using the "C1E promotion" bit. */
+ idle_cpu_spr.disable_promotion_to_c1e = false;
+ idle_cpu_spr.enable_promotion_to_c1e = true;
+ }
+}
+
+/*
* mwait_idle_state_table_update()
*
* Update the default state_table for this CPU-id
@@ -1261,6 +1303,9 @@ static void __init mwait_idle_state_tabl
case INTEL_FAM6_SKYLAKE_X:
skx_idle_state_table_update();
break;
+ case INTEL_FAM6_SAPPHIRERAPIDS_X:
+ spr_idle_state_table_update();
+ break;
}
}
@@ -1402,6 +1447,8 @@ static int cf_check mwait_idle_cpu_init(
if (icpu->disable_promotion_to_c1e)
on_selected_cpus(cpumask_of(cpu), c1e_promotion_disable, NULL, 1);
+ else if (icpu->enable_promotion_to_c1e)
+ on_selected_cpus(cpumask_of(cpu), c1e_promotion_enable, NULL, 1);
return NOTIFY_DONE;
}
On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
> From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
>
> On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
> exclusive - only one of them can be enabled. By default, 'intel_idle' driver
> enables C1 and disables C1E. However, some users prefer to use C1E instead of
> C1, because it saves more energy.
>
> This patch adds a new module parameter ('preferred_cstates') for enabling C1E
> and disabling C1. Here is the idea behind it.
>
> 1. This option has effect only for "mutually exclusive" C-states like C1 and
> C1E on SPR.
> 2. It does not have any effect on independent C-states, which do not require
> other C-states to be disabled (most states on most platforms as of today).
> 3. For mutually exclusive C-states, the 'intel_idle' driver always has a
> reasonable default, such as enabling C1 on SPR by default. On other
> platforms, the default may be different.
> 4. Users can override the default using the 'preferred_cstates' parameter.
> 5. The parameter accepts the preferred C-states bit-mask, similarly to the
> existing 'states_off' parameter.
> 6. This parameter is not limited to C1/C1E, and leaves room for supporting
> other mutually exclusive C-states, if they come in the future.
>
> Today 'intel_idle' can only be compiled-in, which means that on SPR, in order
> to disable C1 and enable C1E, users should boot with the following kernel
> argument: intel_idle.preferred_cstates=4
>
> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git da0e58c038e6
>
> Enable C1E (if requested) not only on the BSP's socket / package.
Maybe we should also add a note here that the command line option for
Xen is preferred-cstates instead of intel_idle.preferred_cstates?
I think this is a bad interface however, we should have a more generic
option (ie: cstate-mode = 'performance | powersave') so that users
don't have to fiddle with model specific C state masks.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- unstable.orig/docs/misc/xen-command-line.pandoc 2022-04-25 17:59:42.123387258 +0200
> +++ unstable/docs/misc/xen-command-line.pandoc 2022-04-25 17:36:00.000000000 +0200
> @@ -1884,6 +1884,12 @@ paging controls access to usermode addre
> ### ple_window (Intel)
> > `= <integer>`
>
> +### preferred-cstates (x86)
> +> `= <integer>`
> +
> +This is a mask of C-states which are to be use preferably. This option is
> +applicable only oh hardware were certain C-states are exlusive of one another.
> +
> ### psr (Intel)
> > `= List of ( cmt:<boolean> | rmid_max:<integer> | cat:<boolean> | cos_max:<integer> | cdp:<boolean> )`
>
> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c 2022-04-25 17:17:05.000000000 +0200
> +++ unstable/xen/arch/x86/cpu/mwait-idle.c 2022-04-25 17:33:47.000000000 +0200
> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
>
> static unsigned int mwait_substates;
>
> +/*
> + * Some platforms come with mutually exclusive C-states, so that if one is
> + * enabled, the other C-states must not be used. Example: C1 and C1E on
> + * Sapphire Rapids platform. This parameter allows for selecting the
> + * preferred C-states among the groups of mutually exclusive C-states - the
> + * selected C-states will be registered, the other C-states from the mutually
> + * exclusive group won't be registered. If the platform has no mutually
> + * exclusive C-states, this parameter has no effect.
> + */
> +static unsigned int __ro_after_init preferred_states_mask;
> +integer_param("preferred-cstates", preferred_states_mask);
> +
> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
> static unsigned int lapic_timer_reliable_states = (1 << 1);
> @@ -96,6 +108,7 @@ struct idle_cpu {
> unsigned long auto_demotion_disable_flags;
> bool byt_auto_demotion_disable_flag;
> bool disable_promotion_to_c1e;
> + bool enable_promotion_to_c1e;
I'm confused by those fields, shouldn't we just have:
promotion_to_c1e = true | false?
As one field is the negation of the other:
enable_promotion_to_c1e = !disable_promotion_to_c1e
I know this is code from Linux, but would like to understand why two
fields are needed.
Thanks, Roger.
On 27.04.2022 14:45, Roger Pau Monné wrote:
> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
>> From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
>>
>> On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
>> exclusive - only one of them can be enabled. By default, 'intel_idle' driver
>> enables C1 and disables C1E. However, some users prefer to use C1E instead of
>> C1, because it saves more energy.
>>
>> This patch adds a new module parameter ('preferred_cstates') for enabling C1E
>> and disabling C1. Here is the idea behind it.
>>
>> 1. This option has effect only for "mutually exclusive" C-states like C1 and
>> C1E on SPR.
>> 2. It does not have any effect on independent C-states, which do not require
>> other C-states to be disabled (most states on most platforms as of today).
>> 3. For mutually exclusive C-states, the 'intel_idle' driver always has a
>> reasonable default, such as enabling C1 on SPR by default. On other
>> platforms, the default may be different.
>> 4. Users can override the default using the 'preferred_cstates' parameter.
>> 5. The parameter accepts the preferred C-states bit-mask, similarly to the
>> existing 'states_off' parameter.
>> 6. This parameter is not limited to C1/C1E, and leaves room for supporting
>> other mutually exclusive C-states, if they come in the future.
>>
>> Today 'intel_idle' can only be compiled-in, which means that on SPR, in order
>> to disable C1 and enable C1E, users should boot with the following kernel
>> argument: intel_idle.preferred_cstates=4
>>
>> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git da0e58c038e6
>>
>> Enable C1E (if requested) not only on the BSP's socket / package.
>
> Maybe we should also add a note here that the command line option for
> Xen is preferred-cstates instead of intel_idle.preferred_cstates?
>
> I think this is a bad interface however, we should have a more generic
> option (ie: cstate-mode = 'performance | powersave') so that users
> don't have to fiddle with model specific C state masks.
Performance vs powersave doesn't cover it imo, especially if down
the road more states would appear which can be controlled this way.
I don't think there's a way around providing _some_ way to control
things one a per-state level. When porting this over, I too didn't
like this interface very much, but I had no good replacement idea.
>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
>>
>> static unsigned int mwait_substates;
>>
>> +/*
>> + * Some platforms come with mutually exclusive C-states, so that if one is
>> + * enabled, the other C-states must not be used. Example: C1 and C1E on
>> + * Sapphire Rapids platform. This parameter allows for selecting the
>> + * preferred C-states among the groups of mutually exclusive C-states - the
>> + * selected C-states will be registered, the other C-states from the mutually
>> + * exclusive group won't be registered. If the platform has no mutually
>> + * exclusive C-states, this parameter has no effect.
>> + */
>> +static unsigned int __ro_after_init preferred_states_mask;
>> +integer_param("preferred-cstates", preferred_states_mask);
>> +
>> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
>> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
>> static unsigned int lapic_timer_reliable_states = (1 << 1);
>> @@ -96,6 +108,7 @@ struct idle_cpu {
>> unsigned long auto_demotion_disable_flags;
>> bool byt_auto_demotion_disable_flag;
>> bool disable_promotion_to_c1e;
>> + bool enable_promotion_to_c1e;
>
> I'm confused by those fields, shouldn't we just have:
> promotion_to_c1e = true | false?
>
> As one field is the negation of the other:
> enable_promotion_to_c1e = !disable_promotion_to_c1e
>
> I know this is code from Linux, but would like to understand why two
> fields are needed.
This really is a tristate; Linux is now changing their global variable
to an enum, but we don't have an equivalent of that global variable.
Jan
On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote:
> On 27.04.2022 14:45, Roger Pau Monné wrote:
> > On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
> >> From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> >>
> >> On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
> >> exclusive - only one of them can be enabled. By default, 'intel_idle' driver
> >> enables C1 and disables C1E. However, some users prefer to use C1E instead of
> >> C1, because it saves more energy.
> >>
> >> This patch adds a new module parameter ('preferred_cstates') for enabling C1E
> >> and disabling C1. Here is the idea behind it.
> >>
> >> 1. This option has effect only for "mutually exclusive" C-states like C1 and
> >> C1E on SPR.
> >> 2. It does not have any effect on independent C-states, which do not require
> >> other C-states to be disabled (most states on most platforms as of today).
> >> 3. For mutually exclusive C-states, the 'intel_idle' driver always has a
> >> reasonable default, such as enabling C1 on SPR by default. On other
> >> platforms, the default may be different.
> >> 4. Users can override the default using the 'preferred_cstates' parameter.
> >> 5. The parameter accepts the preferred C-states bit-mask, similarly to the
> >> existing 'states_off' parameter.
> >> 6. This parameter is not limited to C1/C1E, and leaves room for supporting
> >> other mutually exclusive C-states, if they come in the future.
> >>
> >> Today 'intel_idle' can only be compiled-in, which means that on SPR, in order
> >> to disable C1 and enable C1E, users should boot with the following kernel
> >> argument: intel_idle.preferred_cstates=4
> >>
> >> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git da0e58c038e6
> >>
> >> Enable C1E (if requested) not only on the BSP's socket / package.
> >
> > Maybe we should also add a note here that the command line option for
> > Xen is preferred-cstates instead of intel_idle.preferred_cstates?
> >
> > I think this is a bad interface however, we should have a more generic
> > option (ie: cstate-mode = 'performance | powersave') so that users
> > don't have to fiddle with model specific C state masks.
>
> Performance vs powersave doesn't cover it imo, especially if down
> the road more states would appear which can be controlled this way.
> I don't think there's a way around providing _some_ way to control
> things one a per-state level. When porting this over, I too didn't
> like this interface very much, but I had no good replacement idea.
I think it's fine to have this more fine grained control of C states,
but it doesn't seem practical from a user (or distro) PoV. But then I
also wonder how much of a difference this will make regarding power
consumption.
> >> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
> >> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
> >> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
> >>
> >> static unsigned int mwait_substates;
> >>
> >> +/*
> >> + * Some platforms come with mutually exclusive C-states, so that if one is
> >> + * enabled, the other C-states must not be used. Example: C1 and C1E on
> >> + * Sapphire Rapids platform. This parameter allows for selecting the
> >> + * preferred C-states among the groups of mutually exclusive C-states - the
> >> + * selected C-states will be registered, the other C-states from the mutually
> >> + * exclusive group won't be registered. If the platform has no mutually
> >> + * exclusive C-states, this parameter has no effect.
> >> + */
> >> +static unsigned int __ro_after_init preferred_states_mask;
> >> +integer_param("preferred-cstates", preferred_states_mask);
> >> +
> >> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
> >> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
> >> static unsigned int lapic_timer_reliable_states = (1 << 1);
> >> @@ -96,6 +108,7 @@ struct idle_cpu {
> >> unsigned long auto_demotion_disable_flags;
> >> bool byt_auto_demotion_disable_flag;
> >> bool disable_promotion_to_c1e;
> >> + bool enable_promotion_to_c1e;
> >
> > I'm confused by those fields, shouldn't we just have:
> > promotion_to_c1e = true | false?
> >
> > As one field is the negation of the other:
> > enable_promotion_to_c1e = !disable_promotion_to_c1e
> >
> > I know this is code from Linux, but would like to understand why two
> > fields are needed.
>
> This really is a tristate; Linux is now changing their global variable
> to an enum, but we don't have an equivalent of that global variable.
So it would be: leave default, disable C1E promotion, enable C1E
promotion.
And Linux is leaving the {disable,enable}_promotion_to_c1e in
idle_cpu?
I guess there's not much we can do unless we want to diverge from
upstream.
Thanks, Roger.
On 27.04.2022 17:06, Roger Pau Monné wrote:
> On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote:
>> On 27.04.2022 14:45, Roger Pau Monné wrote:
>>> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
>>>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
>>>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
>>>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
>>>>
>>>> static unsigned int mwait_substates;
>>>>
>>>> +/*
>>>> + * Some platforms come with mutually exclusive C-states, so that if one is
>>>> + * enabled, the other C-states must not be used. Example: C1 and C1E on
>>>> + * Sapphire Rapids platform. This parameter allows for selecting the
>>>> + * preferred C-states among the groups of mutually exclusive C-states - the
>>>> + * selected C-states will be registered, the other C-states from the mutually
>>>> + * exclusive group won't be registered. If the platform has no mutually
>>>> + * exclusive C-states, this parameter has no effect.
>>>> + */
>>>> +static unsigned int __ro_after_init preferred_states_mask;
>>>> +integer_param("preferred-cstates", preferred_states_mask);
>>>> +
>>>> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
>>>> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
>>>> static unsigned int lapic_timer_reliable_states = (1 << 1);
>>>> @@ -96,6 +108,7 @@ struct idle_cpu {
>>>> unsigned long auto_demotion_disable_flags;
>>>> bool byt_auto_demotion_disable_flag;
>>>> bool disable_promotion_to_c1e;
>>>> + bool enable_promotion_to_c1e;
>>>
>>> I'm confused by those fields, shouldn't we just have:
>>> promotion_to_c1e = true | false?
>>>
>>> As one field is the negation of the other:
>>> enable_promotion_to_c1e = !disable_promotion_to_c1e
>>>
>>> I know this is code from Linux, but would like to understand why two
>>> fields are needed.
>>
>> This really is a tristate; Linux is now changing their global variable
>> to an enum, but we don't have an equivalent of that global variable.
>
> So it would be: leave default, disable C1E promotion, enable C1E
> promotion.
>
> And Linux is leaving the {disable,enable}_promotion_to_c1e in
> idle_cpu?
Iirc they only have disable_promotion_to_c1e there (as a struct field)
and keep it, but they convert the similarly named file-scope variable
to a tristate.
> I guess there's not much we can do unless we want to diverge from
> upstream.
We've diverged some from Linux here already - as said, for example we
don't have their file-scope variable. I could convert our struct field
to an enum, but that would be larger code churn for (I think) little
gain.
Jan
On Wed, Apr 27, 2022 at 05:25:35PM +0200, Jan Beulich wrote:
> On 27.04.2022 17:06, Roger Pau Monné wrote:
> > On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote:
> >> On 27.04.2022 14:45, Roger Pau Monné wrote:
> >>> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
> >>>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
> >>>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
> >>>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
> >>>>
> >>>> static unsigned int mwait_substates;
> >>>>
> >>>> +/*
> >>>> + * Some platforms come with mutually exclusive C-states, so that if one is
> >>>> + * enabled, the other C-states must not be used. Example: C1 and C1E on
> >>>> + * Sapphire Rapids platform. This parameter allows for selecting the
> >>>> + * preferred C-states among the groups of mutually exclusive C-states - the
> >>>> + * selected C-states will be registered, the other C-states from the mutually
> >>>> + * exclusive group won't be registered. If the platform has no mutually
> >>>> + * exclusive C-states, this parameter has no effect.
> >>>> + */
> >>>> +static unsigned int __ro_after_init preferred_states_mask;
> >>>> +integer_param("preferred-cstates", preferred_states_mask);
> >>>> +
> >>>> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
> >>>> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
> >>>> static unsigned int lapic_timer_reliable_states = (1 << 1);
> >>>> @@ -96,6 +108,7 @@ struct idle_cpu {
> >>>> unsigned long auto_demotion_disable_flags;
> >>>> bool byt_auto_demotion_disable_flag;
> >>>> bool disable_promotion_to_c1e;
> >>>> + bool enable_promotion_to_c1e;
> >>>
> >>> I'm confused by those fields, shouldn't we just have:
> >>> promotion_to_c1e = true | false?
> >>>
> >>> As one field is the negation of the other:
> >>> enable_promotion_to_c1e = !disable_promotion_to_c1e
> >>>
> >>> I know this is code from Linux, but would like to understand why two
> >>> fields are needed.
> >>
> >> This really is a tristate; Linux is now changing their global variable
> >> to an enum, but we don't have an equivalent of that global variable.
> >
> > So it would be: leave default, disable C1E promotion, enable C1E
> > promotion.
> >
> > And Linux is leaving the {disable,enable}_promotion_to_c1e in
> > idle_cpu?
>
> Iirc they only have disable_promotion_to_c1e there (as a struct field)
> and keep it, but they convert the similarly named file-scope variable
> to a tristate.
>
> > I guess there's not much we can do unless we want to diverge from
> > upstream.
>
> We've diverged some from Linux here already - as said, for example we
> don't have their file-scope variable. I could convert our struct field
> to an enum, but that would be larger code churn for (I think) little
> gain.
Hm, OK, could gaining the file scope variable would make sense in order
to reduce divergences? Or are the other roadblocks there?
I think this is ugly, but would make sense as long as it allows us to
keep closer to upstream.
Thanks, Roger.
On 27.04.2022 18:12, Roger Pau Monné wrote:
> On Wed, Apr 27, 2022 at 05:25:35PM +0200, Jan Beulich wrote:
>> On 27.04.2022 17:06, Roger Pau Monné wrote:
>>> On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote:
>>>> On 27.04.2022 14:45, Roger Pau Monné wrote:
>>>>> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
>>>>>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
>>>>>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
>>>>>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
>>>>>>
>>>>>> static unsigned int mwait_substates;
>>>>>>
>>>>>> +/*
>>>>>> + * Some platforms come with mutually exclusive C-states, so that if one is
>>>>>> + * enabled, the other C-states must not be used. Example: C1 and C1E on
>>>>>> + * Sapphire Rapids platform. This parameter allows for selecting the
>>>>>> + * preferred C-states among the groups of mutually exclusive C-states - the
>>>>>> + * selected C-states will be registered, the other C-states from the mutually
>>>>>> + * exclusive group won't be registered. If the platform has no mutually
>>>>>> + * exclusive C-states, this parameter has no effect.
>>>>>> + */
>>>>>> +static unsigned int __ro_after_init preferred_states_mask;
>>>>>> +integer_param("preferred-cstates", preferred_states_mask);
>>>>>> +
>>>>>> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
>>>>>> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
>>>>>> static unsigned int lapic_timer_reliable_states = (1 << 1);
>>>>>> @@ -96,6 +108,7 @@ struct idle_cpu {
>>>>>> unsigned long auto_demotion_disable_flags;
>>>>>> bool byt_auto_demotion_disable_flag;
>>>>>> bool disable_promotion_to_c1e;
>>>>>> + bool enable_promotion_to_c1e;
>>>>>
>>>>> I'm confused by those fields, shouldn't we just have:
>>>>> promotion_to_c1e = true | false?
>>>>>
>>>>> As one field is the negation of the other:
>>>>> enable_promotion_to_c1e = !disable_promotion_to_c1e
>>>>>
>>>>> I know this is code from Linux, but would like to understand why two
>>>>> fields are needed.
>>>>
>>>> This really is a tristate; Linux is now changing their global variable
>>>> to an enum, but we don't have an equivalent of that global variable.
>>>
>>> So it would be: leave default, disable C1E promotion, enable C1E
>>> promotion.
>>>
>>> And Linux is leaving the {disable,enable}_promotion_to_c1e in
>>> idle_cpu?
>>
>> Iirc they only have disable_promotion_to_c1e there (as a struct field)
>> and keep it, but they convert the similarly named file-scope variable
>> to a tristate.
>>
>>> I guess there's not much we can do unless we want to diverge from
>>> upstream.
>>
>> We've diverged some from Linux here already - as said, for example we
>> don't have their file-scope variable. I could convert our struct field
>> to an enum, but that would be larger code churn for (I think) little
>> gain.
>
> Hm, OK, could gaining the file scope variable would make sense in order
> to reduce divergences? Or are the other roadblocks there?
I don't recall. It might have originated from a change I decided to not
port over, or I might have dropped it while porting. To be honest I'm
not keen on putting time into researching this, the more that I would
generally try to avoid static variables.
What I would be willing to put time in is making a more user friendly
command line option, but as said - I can't think of any good alternative
(except perhaps "preferred-cstates=c1e" or "cstates=preferred:c1e", with
internal translation of the strings into a bit mask, as long as (a) you
would think that's an improvement and (b) the further divergence from
Linux is not deemed a problem).
Jan
> I think this is ugly, but would make sense as long as it allows us to
> keep closer to upstream.
>
> Thanks, Roger.
>
On Thu, Apr 28, 2022 at 08:37:58AM +0200, Jan Beulich wrote:
> On 27.04.2022 18:12, Roger Pau Monné wrote:
> > On Wed, Apr 27, 2022 at 05:25:35PM +0200, Jan Beulich wrote:
> >> On 27.04.2022 17:06, Roger Pau Monné wrote:
> >>> On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote:
> >>>> On 27.04.2022 14:45, Roger Pau Monné wrote:
> >>>>> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
> >>>>>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
> >>>>>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
> >>>>>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
> >>>>>>
> >>>>>> static unsigned int mwait_substates;
> >>>>>>
> >>>>>> +/*
> >>>>>> + * Some platforms come with mutually exclusive C-states, so that if one is
> >>>>>> + * enabled, the other C-states must not be used. Example: C1 and C1E on
> >>>>>> + * Sapphire Rapids platform. This parameter allows for selecting the
> >>>>>> + * preferred C-states among the groups of mutually exclusive C-states - the
> >>>>>> + * selected C-states will be registered, the other C-states from the mutually
> >>>>>> + * exclusive group won't be registered. If the platform has no mutually
> >>>>>> + * exclusive C-states, this parameter has no effect.
> >>>>>> + */
> >>>>>> +static unsigned int __ro_after_init preferred_states_mask;
> >>>>>> +integer_param("preferred-cstates", preferred_states_mask);
> >>>>>> +
> >>>>>> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
> >>>>>> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
> >>>>>> static unsigned int lapic_timer_reliable_states = (1 << 1);
> >>>>>> @@ -96,6 +108,7 @@ struct idle_cpu {
> >>>>>> unsigned long auto_demotion_disable_flags;
> >>>>>> bool byt_auto_demotion_disable_flag;
> >>>>>> bool disable_promotion_to_c1e;
> >>>>>> + bool enable_promotion_to_c1e;
> >>>>>
> >>>>> I'm confused by those fields, shouldn't we just have:
> >>>>> promotion_to_c1e = true | false?
> >>>>>
> >>>>> As one field is the negation of the other:
> >>>>> enable_promotion_to_c1e = !disable_promotion_to_c1e
> >>>>>
> >>>>> I know this is code from Linux, but would like to understand why two
> >>>>> fields are needed.
> >>>>
> >>>> This really is a tristate; Linux is now changing their global variable
> >>>> to an enum, but we don't have an equivalent of that global variable.
> >>>
> >>> So it would be: leave default, disable C1E promotion, enable C1E
> >>> promotion.
> >>>
> >>> And Linux is leaving the {disable,enable}_promotion_to_c1e in
> >>> idle_cpu?
> >>
> >> Iirc they only have disable_promotion_to_c1e there (as a struct field)
> >> and keep it, but they convert the similarly named file-scope variable
> >> to a tristate.
> >>
> >>> I guess there's not much we can do unless we want to diverge from
> >>> upstream.
> >>
> >> We've diverged some from Linux here already - as said, for example we
> >> don't have their file-scope variable. I could convert our struct field
> >> to an enum, but that would be larger code churn for (I think) little
> >> gain.
> >
> > Hm, OK, could gaining the file scope variable would make sense in order
> > to reduce divergences? Or are the other roadblocks there?
>
> I don't recall. It might have originated from a change I decided to not
> port over, or I might have dropped it while porting. To be honest I'm
> not keen on putting time into researching this, the more that I would
> generally try to avoid static variables.
>
> What I would be willing to put time in is making a more user friendly
> command line option, but as said - I can't think of any good alternative
> (except perhaps "preferred-cstates=c1e" or "cstates=preferred:c1e", with
> internal translation of the strings into a bit mask, as long as (a) you
> would think that's an improvement and (b) the further divergence from
> Linux is not deemed a problem).
I think (b) won't be a problem as long as internally the user option
is translated into a bitmask.
Regarding (a) I do think it would be helpful to express this in a more
user friendly way, I'm not sure whether it would make sense to keep
Linux format also for compatibility reasons if users already have a
bitmask and want to use the same parameter for Xen and Linux, ie:
preferred-cstates = <string of c1e,c1,...> | <integer bitmask>
What I think we should fix is the naming of the two booleans:
bool disable_promotion_to_c1e;
bool enable_promotion_to_c1e;
I would rather translated this into an enum, as right now it's
confusing IMO.
Thanks, Roger.
On 28.04.2022 10:06, Roger Pau Monné wrote:
> On Thu, Apr 28, 2022 at 08:37:58AM +0200, Jan Beulich wrote:
>> On 27.04.2022 18:12, Roger Pau Monné wrote:
>>> On Wed, Apr 27, 2022 at 05:25:35PM +0200, Jan Beulich wrote:
>>>> On 27.04.2022 17:06, Roger Pau Monné wrote:
>>>>> On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote:
>>>>>> On 27.04.2022 14:45, Roger Pau Monné wrote:
>>>>>>> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote:
>>>>>>>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c
>>>>>>>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c
>>>>>>>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
>>>>>>>>
>>>>>>>> static unsigned int mwait_substates;
>>>>>>>>
>>>>>>>> +/*
>>>>>>>> + * Some platforms come with mutually exclusive C-states, so that if one is
>>>>>>>> + * enabled, the other C-states must not be used. Example: C1 and C1E on
>>>>>>>> + * Sapphire Rapids platform. This parameter allows for selecting the
>>>>>>>> + * preferred C-states among the groups of mutually exclusive C-states - the
>>>>>>>> + * selected C-states will be registered, the other C-states from the mutually
>>>>>>>> + * exclusive group won't be registered. If the platform has no mutually
>>>>>>>> + * exclusive C-states, this parameter has no effect.
>>>>>>>> + */
>>>>>>>> +static unsigned int __ro_after_init preferred_states_mask;
>>>>>>>> +integer_param("preferred-cstates", preferred_states_mask);
>>>>>>>> +
>>>>>>>> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
>>>>>>>> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
>>>>>>>> static unsigned int lapic_timer_reliable_states = (1 << 1);
>>>>>>>> @@ -96,6 +108,7 @@ struct idle_cpu {
>>>>>>>> unsigned long auto_demotion_disable_flags;
>>>>>>>> bool byt_auto_demotion_disable_flag;
>>>>>>>> bool disable_promotion_to_c1e;
>>>>>>>> + bool enable_promotion_to_c1e;
>>>>>>>
>>>>>>> I'm confused by those fields, shouldn't we just have:
>>>>>>> promotion_to_c1e = true | false?
>>>>>>>
>>>>>>> As one field is the negation of the other:
>>>>>>> enable_promotion_to_c1e = !disable_promotion_to_c1e
>>>>>>>
>>>>>>> I know this is code from Linux, but would like to understand why two
>>>>>>> fields are needed.
>>>>>>
>>>>>> This really is a tristate; Linux is now changing their global variable
>>>>>> to an enum, but we don't have an equivalent of that global variable.
>>>>>
>>>>> So it would be: leave default, disable C1E promotion, enable C1E
>>>>> promotion.
>>>>>
>>>>> And Linux is leaving the {disable,enable}_promotion_to_c1e in
>>>>> idle_cpu?
>>>>
>>>> Iirc they only have disable_promotion_to_c1e there (as a struct field)
>>>> and keep it, but they convert the similarly named file-scope variable
>>>> to a tristate.
>>>>
>>>>> I guess there's not much we can do unless we want to diverge from
>>>>> upstream.
>>>>
>>>> We've diverged some from Linux here already - as said, for example we
>>>> don't have their file-scope variable. I could convert our struct field
>>>> to an enum, but that would be larger code churn for (I think) little
>>>> gain.
>>>
>>> Hm, OK, could gaining the file scope variable would make sense in order
>>> to reduce divergences? Or are the other roadblocks there?
>>
>> I don't recall. It might have originated from a change I decided to not
>> port over, or I might have dropped it while porting. To be honest I'm
>> not keen on putting time into researching this, the more that I would
>> generally try to avoid static variables.
>>
>> What I would be willing to put time in is making a more user friendly
>> command line option, but as said - I can't think of any good alternative
>> (except perhaps "preferred-cstates=c1e" or "cstates=preferred:c1e", with
>> internal translation of the strings into a bit mask, as long as (a) you
>> would think that's an improvement and (b) the further divergence from
>> Linux is not deemed a problem).
>
> I think (b) won't be a problem as long as internally the user option
> is translated into a bitmask.
>
> Regarding (a) I do think it would be helpful to express this in a more
> user friendly way, I'm not sure whether it would make sense to keep
> Linux format also for compatibility reasons if users already have a
> bitmask and want to use the same parameter for Xen and Linux, ie:
>
> preferred-cstates = <string of c1e,c1,...> | <integer bitmask>
Yes, I would have been planning to accept both (but probably reject
mixing of both).
> What I think we should fix is the naming of the two booleans:
>
> bool disable_promotion_to_c1e;
> bool enable_promotion_to_c1e;
>
> I would rather translated this into an enum, as right now it's
> confusing IMO.
Okay, I can certainly do that. The more that I did consider doing
so already, breaking ties simply based on the "less code churn"
argument.
Jan
© 2016 - 2026 Red Hat, Inc.