From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
exclusive - only one of them can be enabled. By default, 'intel_idle' driver
enables C1 and disables C1E. However, some users prefer to use C1E instead of
C1, because it saves more energy.
This patch adds a new module parameter ('preferred_cstates') for enabling C1E
and disabling C1. Here is the idea behind it.
1. This option has effect only for "mutually exclusive" C-states like C1 and
C1E on SPR.
2. It does not have any effect on independent C-states, which do not require
other C-states to be disabled (most states on most platforms as of today).
3. For mutually exclusive C-states, the 'intel_idle' driver always has a
reasonable default, such as enabling C1 on SPR by default. On other
platforms, the default may be different.
4. Users can override the default using the 'preferred_cstates' parameter.
5. The parameter accepts the preferred C-states bit-mask, similarly to the
existing 'states_off' parameter.
6. This parameter is not limited to C1/C1E, and leaves room for supporting
other mutually exclusive C-states, if they come in the future.
Today 'intel_idle' can only be compiled-in, which means that on SPR, in order
to disable C1 and enable C1E, users should boot with the following kernel
argument: intel_idle.preferred_cstates=4
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git da0e58c038e6
Enable C1E (if requested) not only on the BSP's socket / package.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- unstable.orig/docs/misc/xen-command-line.pandoc 2022-04-25 17:59:42.123387258 +0200
+++ unstable/docs/misc/xen-command-line.pandoc 2022-04-25 17:36:00.000000000 +0200
@@ -1884,6 +1884,12 @@ paging controls access to usermode addre
### ple_window (Intel)
> `= <integer>`
+### preferred-cstates (x86)
+> `= <integer>`
+
+This is a mask of C-states which are to be use preferably. This option is
+applicable only oh hardware were certain C-states are exlusive of one another.
+
### psr (Intel)
> `= List of ( cmt:<boolean> | rmid_max:<integer> | cat:<boolean> | cos_max:<integer> | cdp:<boolean> )`
--- unstable.orig/xen/arch/x86/cpu/mwait-idle.c 2022-04-25 17:17:05.000000000 +0200
+++ unstable/xen/arch/x86/cpu/mwait-idle.c 2022-04-25 17:33:47.000000000 +0200
@@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id
static unsigned int mwait_substates;
+/*
+ * Some platforms come with mutually exclusive C-states, so that if one is
+ * enabled, the other C-states must not be used. Example: C1 and C1E on
+ * Sapphire Rapids platform. This parameter allows for selecting the
+ * preferred C-states among the groups of mutually exclusive C-states - the
+ * selected C-states will be registered, the other C-states from the mutually
+ * exclusive group won't be registered. If the platform has no mutually
+ * exclusive C-states, this parameter has no effect.
+ */
+static unsigned int __ro_after_init preferred_states_mask;
+integer_param("preferred-cstates", preferred_states_mask);
+
#define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
/* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
static unsigned int lapic_timer_reliable_states = (1 << 1);
@@ -96,6 +108,7 @@ struct idle_cpu {
unsigned long auto_demotion_disable_flags;
bool byt_auto_demotion_disable_flag;
bool disable_promotion_to_c1e;
+ bool enable_promotion_to_c1e;
};
static const struct idle_cpu *icpu;
@@ -924,6 +937,15 @@ static void cf_check byt_auto_demotion_d
wrmsrl(MSR_MC6_DEMOTION_POLICY_CONFIG, 0);
}
+static void cf_check c1e_promotion_enable(void *dummy)
+{
+ uint64_t msr_bits;
+
+ rdmsrl(MSR_IA32_POWER_CTL, msr_bits);
+ msr_bits |= 0x2;
+ wrmsrl(MSR_IA32_POWER_CTL, msr_bits);
+}
+
static void cf_check c1e_promotion_disable(void *dummy)
{
u64 msr_bits;
@@ -1241,6 +1263,26 @@ static void __init skx_idle_state_table_
}
/*
+ * spr_idle_state_table_update - Adjust Sapphire Rapids idle states table.
+ */
+static void __init spr_idle_state_table_update(void)
+{
+ /* Check if user prefers C1E over C1. */
+ if (preferred_states_mask & BIT(2, U)) {
+ if (preferred_states_mask & BIT(1, U))
+ /* Both can't be enabled, stick to the defaults. */
+ return;
+
+ spr_cstates[0].flags |= CPUIDLE_FLAG_DISABLED;
+ spr_cstates[1].flags &= ~CPUIDLE_FLAG_DISABLED;
+
+ /* Request enabling C1E using the "C1E promotion" bit. */
+ idle_cpu_spr.disable_promotion_to_c1e = false;
+ idle_cpu_spr.enable_promotion_to_c1e = true;
+ }
+}
+
+/*
* mwait_idle_state_table_update()
*
* Update the default state_table for this CPU-id
@@ -1261,6 +1303,9 @@ static void __init mwait_idle_state_tabl
case INTEL_FAM6_SKYLAKE_X:
skx_idle_state_table_update();
break;
+ case INTEL_FAM6_SAPPHIRERAPIDS_X:
+ spr_idle_state_table_update();
+ break;
}
}
@@ -1402,6 +1447,8 @@ static int cf_check mwait_idle_cpu_init(
if (icpu->disable_promotion_to_c1e)
on_selected_cpus(cpumask_of(cpu), c1e_promotion_disable, NULL, 1);
+ else if (icpu->enable_promotion_to_c1e)
+ on_selected_cpus(cpumask_of(cpu), c1e_promotion_enable, NULL, 1);
return NOTIFY_DONE;
}
On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote: > From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> > > On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually > exclusive - only one of them can be enabled. By default, 'intel_idle' driver > enables C1 and disables C1E. However, some users prefer to use C1E instead of > C1, because it saves more energy. > > This patch adds a new module parameter ('preferred_cstates') for enabling C1E > and disabling C1. Here is the idea behind it. > > 1. This option has effect only for "mutually exclusive" C-states like C1 and > C1E on SPR. > 2. It does not have any effect on independent C-states, which do not require > other C-states to be disabled (most states on most platforms as of today). > 3. For mutually exclusive C-states, the 'intel_idle' driver always has a > reasonable default, such as enabling C1 on SPR by default. On other > platforms, the default may be different. > 4. Users can override the default using the 'preferred_cstates' parameter. > 5. The parameter accepts the preferred C-states bit-mask, similarly to the > existing 'states_off' parameter. > 6. This parameter is not limited to C1/C1E, and leaves room for supporting > other mutually exclusive C-states, if they come in the future. > > Today 'intel_idle' can only be compiled-in, which means that on SPR, in order > to disable C1 and enable C1E, users should boot with the following kernel > argument: intel_idle.preferred_cstates=4 > > Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git da0e58c038e6 > > Enable C1E (if requested) not only on the BSP's socket / package. Maybe we should also add a note here that the command line option for Xen is preferred-cstates instead of intel_idle.preferred_cstates? I think this is a bad interface however, we should have a more generic option (ie: cstate-mode = 'performance | powersave') so that users don't have to fiddle with model specific C state masks. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > > --- unstable.orig/docs/misc/xen-command-line.pandoc 2022-04-25 17:59:42.123387258 +0200 > +++ unstable/docs/misc/xen-command-line.pandoc 2022-04-25 17:36:00.000000000 +0200 > @@ -1884,6 +1884,12 @@ paging controls access to usermode addre > ### ple_window (Intel) > > `= <integer>` > > +### preferred-cstates (x86) > +> `= <integer>` > + > +This is a mask of C-states which are to be use preferably. This option is > +applicable only oh hardware were certain C-states are exlusive of one another. > + > ### psr (Intel) > > `= List of ( cmt:<boolean> | rmid_max:<integer> | cat:<boolean> | cos_max:<integer> | cdp:<boolean> )` > > --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c 2022-04-25 17:17:05.000000000 +0200 > +++ unstable/xen/arch/x86/cpu/mwait-idle.c 2022-04-25 17:33:47.000000000 +0200 > @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id > > static unsigned int mwait_substates; > > +/* > + * Some platforms come with mutually exclusive C-states, so that if one is > + * enabled, the other C-states must not be used. Example: C1 and C1E on > + * Sapphire Rapids platform. This parameter allows for selecting the > + * preferred C-states among the groups of mutually exclusive C-states - the > + * selected C-states will be registered, the other C-states from the mutually > + * exclusive group won't be registered. If the platform has no mutually > + * exclusive C-states, this parameter has no effect. > + */ > +static unsigned int __ro_after_init preferred_states_mask; > +integer_param("preferred-cstates", preferred_states_mask); > + > #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF > /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */ > static unsigned int lapic_timer_reliable_states = (1 << 1); > @@ -96,6 +108,7 @@ struct idle_cpu { > unsigned long auto_demotion_disable_flags; > bool byt_auto_demotion_disable_flag; > bool disable_promotion_to_c1e; > + bool enable_promotion_to_c1e; I'm confused by those fields, shouldn't we just have: promotion_to_c1e = true | false? As one field is the negation of the other: enable_promotion_to_c1e = !disable_promotion_to_c1e I know this is code from Linux, but would like to understand why two fields are needed. Thanks, Roger.
On 27.04.2022 14:45, Roger Pau Monné wrote: > On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote: >> From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> >> >> On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually >> exclusive - only one of them can be enabled. By default, 'intel_idle' driver >> enables C1 and disables C1E. However, some users prefer to use C1E instead of >> C1, because it saves more energy. >> >> This patch adds a new module parameter ('preferred_cstates') for enabling C1E >> and disabling C1. Here is the idea behind it. >> >> 1. This option has effect only for "mutually exclusive" C-states like C1 and >> C1E on SPR. >> 2. It does not have any effect on independent C-states, which do not require >> other C-states to be disabled (most states on most platforms as of today). >> 3. For mutually exclusive C-states, the 'intel_idle' driver always has a >> reasonable default, such as enabling C1 on SPR by default. On other >> platforms, the default may be different. >> 4. Users can override the default using the 'preferred_cstates' parameter. >> 5. The parameter accepts the preferred C-states bit-mask, similarly to the >> existing 'states_off' parameter. >> 6. This parameter is not limited to C1/C1E, and leaves room for supporting >> other mutually exclusive C-states, if they come in the future. >> >> Today 'intel_idle' can only be compiled-in, which means that on SPR, in order >> to disable C1 and enable C1E, users should boot with the following kernel >> argument: intel_idle.preferred_cstates=4 >> >> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git da0e58c038e6 >> >> Enable C1E (if requested) not only on the BSP's socket / package. > > Maybe we should also add a note here that the command line option for > Xen is preferred-cstates instead of intel_idle.preferred_cstates? > > I think this is a bad interface however, we should have a more generic > option (ie: cstate-mode = 'performance | powersave') so that users > don't have to fiddle with model specific C state masks. Performance vs powersave doesn't cover it imo, especially if down the road more states would appear which can be controlled this way. I don't think there's a way around providing _some_ way to control things one a per-state level. When porting this over, I too didn't like this interface very much, but I had no good replacement idea. >> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c >> +++ unstable/xen/arch/x86/cpu/mwait-idle.c >> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id >> >> static unsigned int mwait_substates; >> >> +/* >> + * Some platforms come with mutually exclusive C-states, so that if one is >> + * enabled, the other C-states must not be used. Example: C1 and C1E on >> + * Sapphire Rapids platform. This parameter allows for selecting the >> + * preferred C-states among the groups of mutually exclusive C-states - the >> + * selected C-states will be registered, the other C-states from the mutually >> + * exclusive group won't be registered. If the platform has no mutually >> + * exclusive C-states, this parameter has no effect. >> + */ >> +static unsigned int __ro_after_init preferred_states_mask; >> +integer_param("preferred-cstates", preferred_states_mask); >> + >> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF >> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */ >> static unsigned int lapic_timer_reliable_states = (1 << 1); >> @@ -96,6 +108,7 @@ struct idle_cpu { >> unsigned long auto_demotion_disable_flags; >> bool byt_auto_demotion_disable_flag; >> bool disable_promotion_to_c1e; >> + bool enable_promotion_to_c1e; > > I'm confused by those fields, shouldn't we just have: > promotion_to_c1e = true | false? > > As one field is the negation of the other: > enable_promotion_to_c1e = !disable_promotion_to_c1e > > I know this is code from Linux, but would like to understand why two > fields are needed. This really is a tristate; Linux is now changing their global variable to an enum, but we don't have an equivalent of that global variable. Jan
On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote: > On 27.04.2022 14:45, Roger Pau Monné wrote: > > On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote: > >> From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> > >> > >> On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually > >> exclusive - only one of them can be enabled. By default, 'intel_idle' driver > >> enables C1 and disables C1E. However, some users prefer to use C1E instead of > >> C1, because it saves more energy. > >> > >> This patch adds a new module parameter ('preferred_cstates') for enabling C1E > >> and disabling C1. Here is the idea behind it. > >> > >> 1. This option has effect only for "mutually exclusive" C-states like C1 and > >> C1E on SPR. > >> 2. It does not have any effect on independent C-states, which do not require > >> other C-states to be disabled (most states on most platforms as of today). > >> 3. For mutually exclusive C-states, the 'intel_idle' driver always has a > >> reasonable default, such as enabling C1 on SPR by default. On other > >> platforms, the default may be different. > >> 4. Users can override the default using the 'preferred_cstates' parameter. > >> 5. The parameter accepts the preferred C-states bit-mask, similarly to the > >> existing 'states_off' parameter. > >> 6. This parameter is not limited to C1/C1E, and leaves room for supporting > >> other mutually exclusive C-states, if they come in the future. > >> > >> Today 'intel_idle' can only be compiled-in, which means that on SPR, in order > >> to disable C1 and enable C1E, users should boot with the following kernel > >> argument: intel_idle.preferred_cstates=4 > >> > >> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> > >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git da0e58c038e6 > >> > >> Enable C1E (if requested) not only on the BSP's socket / package. > > > > Maybe we should also add a note here that the command line option for > > Xen is preferred-cstates instead of intel_idle.preferred_cstates? > > > > I think this is a bad interface however, we should have a more generic > > option (ie: cstate-mode = 'performance | powersave') so that users > > don't have to fiddle with model specific C state masks. > > Performance vs powersave doesn't cover it imo, especially if down > the road more states would appear which can be controlled this way. > I don't think there's a way around providing _some_ way to control > things one a per-state level. When porting this over, I too didn't > like this interface very much, but I had no good replacement idea. I think it's fine to have this more fine grained control of C states, but it doesn't seem practical from a user (or distro) PoV. But then I also wonder how much of a difference this will make regarding power consumption. > >> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c > >> +++ unstable/xen/arch/x86/cpu/mwait-idle.c > >> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id > >> > >> static unsigned int mwait_substates; > >> > >> +/* > >> + * Some platforms come with mutually exclusive C-states, so that if one is > >> + * enabled, the other C-states must not be used. Example: C1 and C1E on > >> + * Sapphire Rapids platform. This parameter allows for selecting the > >> + * preferred C-states among the groups of mutually exclusive C-states - the > >> + * selected C-states will be registered, the other C-states from the mutually > >> + * exclusive group won't be registered. If the platform has no mutually > >> + * exclusive C-states, this parameter has no effect. > >> + */ > >> +static unsigned int __ro_after_init preferred_states_mask; > >> +integer_param("preferred-cstates", preferred_states_mask); > >> + > >> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF > >> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */ > >> static unsigned int lapic_timer_reliable_states = (1 << 1); > >> @@ -96,6 +108,7 @@ struct idle_cpu { > >> unsigned long auto_demotion_disable_flags; > >> bool byt_auto_demotion_disable_flag; > >> bool disable_promotion_to_c1e; > >> + bool enable_promotion_to_c1e; > > > > I'm confused by those fields, shouldn't we just have: > > promotion_to_c1e = true | false? > > > > As one field is the negation of the other: > > enable_promotion_to_c1e = !disable_promotion_to_c1e > > > > I know this is code from Linux, but would like to understand why two > > fields are needed. > > This really is a tristate; Linux is now changing their global variable > to an enum, but we don't have an equivalent of that global variable. So it would be: leave default, disable C1E promotion, enable C1E promotion. And Linux is leaving the {disable,enable}_promotion_to_c1e in idle_cpu? I guess there's not much we can do unless we want to diverge from upstream. Thanks, Roger.
On 27.04.2022 17:06, Roger Pau Monné wrote: > On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote: >> On 27.04.2022 14:45, Roger Pau Monné wrote: >>> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote: >>>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c >>>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c >>>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id >>>> >>>> static unsigned int mwait_substates; >>>> >>>> +/* >>>> + * Some platforms come with mutually exclusive C-states, so that if one is >>>> + * enabled, the other C-states must not be used. Example: C1 and C1E on >>>> + * Sapphire Rapids platform. This parameter allows for selecting the >>>> + * preferred C-states among the groups of mutually exclusive C-states - the >>>> + * selected C-states will be registered, the other C-states from the mutually >>>> + * exclusive group won't be registered. If the platform has no mutually >>>> + * exclusive C-states, this parameter has no effect. >>>> + */ >>>> +static unsigned int __ro_after_init preferred_states_mask; >>>> +integer_param("preferred-cstates", preferred_states_mask); >>>> + >>>> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF >>>> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */ >>>> static unsigned int lapic_timer_reliable_states = (1 << 1); >>>> @@ -96,6 +108,7 @@ struct idle_cpu { >>>> unsigned long auto_demotion_disable_flags; >>>> bool byt_auto_demotion_disable_flag; >>>> bool disable_promotion_to_c1e; >>>> + bool enable_promotion_to_c1e; >>> >>> I'm confused by those fields, shouldn't we just have: >>> promotion_to_c1e = true | false? >>> >>> As one field is the negation of the other: >>> enable_promotion_to_c1e = !disable_promotion_to_c1e >>> >>> I know this is code from Linux, but would like to understand why two >>> fields are needed. >> >> This really is a tristate; Linux is now changing their global variable >> to an enum, but we don't have an equivalent of that global variable. > > So it would be: leave default, disable C1E promotion, enable C1E > promotion. > > And Linux is leaving the {disable,enable}_promotion_to_c1e in > idle_cpu? Iirc they only have disable_promotion_to_c1e there (as a struct field) and keep it, but they convert the similarly named file-scope variable to a tristate. > I guess there's not much we can do unless we want to diverge from > upstream. We've diverged some from Linux here already - as said, for example we don't have their file-scope variable. I could convert our struct field to an enum, but that would be larger code churn for (I think) little gain. Jan
On Wed, Apr 27, 2022 at 05:25:35PM +0200, Jan Beulich wrote: > On 27.04.2022 17:06, Roger Pau Monné wrote: > > On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote: > >> On 27.04.2022 14:45, Roger Pau Monné wrote: > >>> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote: > >>>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c > >>>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c > >>>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id > >>>> > >>>> static unsigned int mwait_substates; > >>>> > >>>> +/* > >>>> + * Some platforms come with mutually exclusive C-states, so that if one is > >>>> + * enabled, the other C-states must not be used. Example: C1 and C1E on > >>>> + * Sapphire Rapids platform. This parameter allows for selecting the > >>>> + * preferred C-states among the groups of mutually exclusive C-states - the > >>>> + * selected C-states will be registered, the other C-states from the mutually > >>>> + * exclusive group won't be registered. If the platform has no mutually > >>>> + * exclusive C-states, this parameter has no effect. > >>>> + */ > >>>> +static unsigned int __ro_after_init preferred_states_mask; > >>>> +integer_param("preferred-cstates", preferred_states_mask); > >>>> + > >>>> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF > >>>> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */ > >>>> static unsigned int lapic_timer_reliable_states = (1 << 1); > >>>> @@ -96,6 +108,7 @@ struct idle_cpu { > >>>> unsigned long auto_demotion_disable_flags; > >>>> bool byt_auto_demotion_disable_flag; > >>>> bool disable_promotion_to_c1e; > >>>> + bool enable_promotion_to_c1e; > >>> > >>> I'm confused by those fields, shouldn't we just have: > >>> promotion_to_c1e = true | false? > >>> > >>> As one field is the negation of the other: > >>> enable_promotion_to_c1e = !disable_promotion_to_c1e > >>> > >>> I know this is code from Linux, but would like to understand why two > >>> fields are needed. > >> > >> This really is a tristate; Linux is now changing their global variable > >> to an enum, but we don't have an equivalent of that global variable. > > > > So it would be: leave default, disable C1E promotion, enable C1E > > promotion. > > > > And Linux is leaving the {disable,enable}_promotion_to_c1e in > > idle_cpu? > > Iirc they only have disable_promotion_to_c1e there (as a struct field) > and keep it, but they convert the similarly named file-scope variable > to a tristate. > > > I guess there's not much we can do unless we want to diverge from > > upstream. > > We've diverged some from Linux here already - as said, for example we > don't have their file-scope variable. I could convert our struct field > to an enum, but that would be larger code churn for (I think) little > gain. Hm, OK, could gaining the file scope variable would make sense in order to reduce divergences? Or are the other roadblocks there? I think this is ugly, but would make sense as long as it allows us to keep closer to upstream. Thanks, Roger.
On 27.04.2022 18:12, Roger Pau Monné wrote: > On Wed, Apr 27, 2022 at 05:25:35PM +0200, Jan Beulich wrote: >> On 27.04.2022 17:06, Roger Pau Monné wrote: >>> On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote: >>>> On 27.04.2022 14:45, Roger Pau Monné wrote: >>>>> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote: >>>>>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c >>>>>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c >>>>>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id >>>>>> >>>>>> static unsigned int mwait_substates; >>>>>> >>>>>> +/* >>>>>> + * Some platforms come with mutually exclusive C-states, so that if one is >>>>>> + * enabled, the other C-states must not be used. Example: C1 and C1E on >>>>>> + * Sapphire Rapids platform. This parameter allows for selecting the >>>>>> + * preferred C-states among the groups of mutually exclusive C-states - the >>>>>> + * selected C-states will be registered, the other C-states from the mutually >>>>>> + * exclusive group won't be registered. If the platform has no mutually >>>>>> + * exclusive C-states, this parameter has no effect. >>>>>> + */ >>>>>> +static unsigned int __ro_after_init preferred_states_mask; >>>>>> +integer_param("preferred-cstates", preferred_states_mask); >>>>>> + >>>>>> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF >>>>>> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */ >>>>>> static unsigned int lapic_timer_reliable_states = (1 << 1); >>>>>> @@ -96,6 +108,7 @@ struct idle_cpu { >>>>>> unsigned long auto_demotion_disable_flags; >>>>>> bool byt_auto_demotion_disable_flag; >>>>>> bool disable_promotion_to_c1e; >>>>>> + bool enable_promotion_to_c1e; >>>>> >>>>> I'm confused by those fields, shouldn't we just have: >>>>> promotion_to_c1e = true | false? >>>>> >>>>> As one field is the negation of the other: >>>>> enable_promotion_to_c1e = !disable_promotion_to_c1e >>>>> >>>>> I know this is code from Linux, but would like to understand why two >>>>> fields are needed. >>>> >>>> This really is a tristate; Linux is now changing their global variable >>>> to an enum, but we don't have an equivalent of that global variable. >>> >>> So it would be: leave default, disable C1E promotion, enable C1E >>> promotion. >>> >>> And Linux is leaving the {disable,enable}_promotion_to_c1e in >>> idle_cpu? >> >> Iirc they only have disable_promotion_to_c1e there (as a struct field) >> and keep it, but they convert the similarly named file-scope variable >> to a tristate. >> >>> I guess there's not much we can do unless we want to diverge from >>> upstream. >> >> We've diverged some from Linux here already - as said, for example we >> don't have their file-scope variable. I could convert our struct field >> to an enum, but that would be larger code churn for (I think) little >> gain. > > Hm, OK, could gaining the file scope variable would make sense in order > to reduce divergences? Or are the other roadblocks there? I don't recall. It might have originated from a change I decided to not port over, or I might have dropped it while porting. To be honest I'm not keen on putting time into researching this, the more that I would generally try to avoid static variables. What I would be willing to put time in is making a more user friendly command line option, but as said - I can't think of any good alternative (except perhaps "preferred-cstates=c1e" or "cstates=preferred:c1e", with internal translation of the strings into a bit mask, as long as (a) you would think that's an improvement and (b) the further divergence from Linux is not deemed a problem). Jan > I think this is ugly, but would make sense as long as it allows us to > keep closer to upstream. > > Thanks, Roger. >
On Thu, Apr 28, 2022 at 08:37:58AM +0200, Jan Beulich wrote: > On 27.04.2022 18:12, Roger Pau Monné wrote: > > On Wed, Apr 27, 2022 at 05:25:35PM +0200, Jan Beulich wrote: > >> On 27.04.2022 17:06, Roger Pau Monné wrote: > >>> On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote: > >>>> On 27.04.2022 14:45, Roger Pau Monné wrote: > >>>>> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote: > >>>>>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c > >>>>>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c > >>>>>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id > >>>>>> > >>>>>> static unsigned int mwait_substates; > >>>>>> > >>>>>> +/* > >>>>>> + * Some platforms come with mutually exclusive C-states, so that if one is > >>>>>> + * enabled, the other C-states must not be used. Example: C1 and C1E on > >>>>>> + * Sapphire Rapids platform. This parameter allows for selecting the > >>>>>> + * preferred C-states among the groups of mutually exclusive C-states - the > >>>>>> + * selected C-states will be registered, the other C-states from the mutually > >>>>>> + * exclusive group won't be registered. If the platform has no mutually > >>>>>> + * exclusive C-states, this parameter has no effect. > >>>>>> + */ > >>>>>> +static unsigned int __ro_after_init preferred_states_mask; > >>>>>> +integer_param("preferred-cstates", preferred_states_mask); > >>>>>> + > >>>>>> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF > >>>>>> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */ > >>>>>> static unsigned int lapic_timer_reliable_states = (1 << 1); > >>>>>> @@ -96,6 +108,7 @@ struct idle_cpu { > >>>>>> unsigned long auto_demotion_disable_flags; > >>>>>> bool byt_auto_demotion_disable_flag; > >>>>>> bool disable_promotion_to_c1e; > >>>>>> + bool enable_promotion_to_c1e; > >>>>> > >>>>> I'm confused by those fields, shouldn't we just have: > >>>>> promotion_to_c1e = true | false? > >>>>> > >>>>> As one field is the negation of the other: > >>>>> enable_promotion_to_c1e = !disable_promotion_to_c1e > >>>>> > >>>>> I know this is code from Linux, but would like to understand why two > >>>>> fields are needed. > >>>> > >>>> This really is a tristate; Linux is now changing their global variable > >>>> to an enum, but we don't have an equivalent of that global variable. > >>> > >>> So it would be: leave default, disable C1E promotion, enable C1E > >>> promotion. > >>> > >>> And Linux is leaving the {disable,enable}_promotion_to_c1e in > >>> idle_cpu? > >> > >> Iirc they only have disable_promotion_to_c1e there (as a struct field) > >> and keep it, but they convert the similarly named file-scope variable > >> to a tristate. > >> > >>> I guess there's not much we can do unless we want to diverge from > >>> upstream. > >> > >> We've diverged some from Linux here already - as said, for example we > >> don't have their file-scope variable. I could convert our struct field > >> to an enum, but that would be larger code churn for (I think) little > >> gain. > > > > Hm, OK, could gaining the file scope variable would make sense in order > > to reduce divergences? Or are the other roadblocks there? > > I don't recall. It might have originated from a change I decided to not > port over, or I might have dropped it while porting. To be honest I'm > not keen on putting time into researching this, the more that I would > generally try to avoid static variables. > > What I would be willing to put time in is making a more user friendly > command line option, but as said - I can't think of any good alternative > (except perhaps "preferred-cstates=c1e" or "cstates=preferred:c1e", with > internal translation of the strings into a bit mask, as long as (a) you > would think that's an improvement and (b) the further divergence from > Linux is not deemed a problem). I think (b) won't be a problem as long as internally the user option is translated into a bitmask. Regarding (a) I do think it would be helpful to express this in a more user friendly way, I'm not sure whether it would make sense to keep Linux format also for compatibility reasons if users already have a bitmask and want to use the same parameter for Xen and Linux, ie: preferred-cstates = <string of c1e,c1,...> | <integer bitmask> What I think we should fix is the naming of the two booleans: bool disable_promotion_to_c1e; bool enable_promotion_to_c1e; I would rather translated this into an enum, as right now it's confusing IMO. Thanks, Roger.
On 28.04.2022 10:06, Roger Pau Monné wrote: > On Thu, Apr 28, 2022 at 08:37:58AM +0200, Jan Beulich wrote: >> On 27.04.2022 18:12, Roger Pau Monné wrote: >>> On Wed, Apr 27, 2022 at 05:25:35PM +0200, Jan Beulich wrote: >>>> On 27.04.2022 17:06, Roger Pau Monné wrote: >>>>> On Wed, Apr 27, 2022 at 03:41:24PM +0200, Jan Beulich wrote: >>>>>> On 27.04.2022 14:45, Roger Pau Monné wrote: >>>>>>> On Tue, Apr 26, 2022 at 12:05:28PM +0200, Jan Beulich wrote: >>>>>>>> --- unstable.orig/xen/arch/x86/cpu/mwait-idle.c >>>>>>>> +++ unstable/xen/arch/x86/cpu/mwait-idle.c >>>>>>>> @@ -82,6 +82,18 @@ boolean_param("mwait-idle", opt_mwait_id >>>>>>>> >>>>>>>> static unsigned int mwait_substates; >>>>>>>> >>>>>>>> +/* >>>>>>>> + * Some platforms come with mutually exclusive C-states, so that if one is >>>>>>>> + * enabled, the other C-states must not be used. Example: C1 and C1E on >>>>>>>> + * Sapphire Rapids platform. This parameter allows for selecting the >>>>>>>> + * preferred C-states among the groups of mutually exclusive C-states - the >>>>>>>> + * selected C-states will be registered, the other C-states from the mutually >>>>>>>> + * exclusive group won't be registered. If the platform has no mutually >>>>>>>> + * exclusive C-states, this parameter has no effect. >>>>>>>> + */ >>>>>>>> +static unsigned int __ro_after_init preferred_states_mask; >>>>>>>> +integer_param("preferred-cstates", preferred_states_mask); >>>>>>>> + >>>>>>>> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF >>>>>>>> /* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */ >>>>>>>> static unsigned int lapic_timer_reliable_states = (1 << 1); >>>>>>>> @@ -96,6 +108,7 @@ struct idle_cpu { >>>>>>>> unsigned long auto_demotion_disable_flags; >>>>>>>> bool byt_auto_demotion_disable_flag; >>>>>>>> bool disable_promotion_to_c1e; >>>>>>>> + bool enable_promotion_to_c1e; >>>>>>> >>>>>>> I'm confused by those fields, shouldn't we just have: >>>>>>> promotion_to_c1e = true | false? >>>>>>> >>>>>>> As one field is the negation of the other: >>>>>>> enable_promotion_to_c1e = !disable_promotion_to_c1e >>>>>>> >>>>>>> I know this is code from Linux, but would like to understand why two >>>>>>> fields are needed. >>>>>> >>>>>> This really is a tristate; Linux is now changing their global variable >>>>>> to an enum, but we don't have an equivalent of that global variable. >>>>> >>>>> So it would be: leave default, disable C1E promotion, enable C1E >>>>> promotion. >>>>> >>>>> And Linux is leaving the {disable,enable}_promotion_to_c1e in >>>>> idle_cpu? >>>> >>>> Iirc they only have disable_promotion_to_c1e there (as a struct field) >>>> and keep it, but they convert the similarly named file-scope variable >>>> to a tristate. >>>> >>>>> I guess there's not much we can do unless we want to diverge from >>>>> upstream. >>>> >>>> We've diverged some from Linux here already - as said, for example we >>>> don't have their file-scope variable. I could convert our struct field >>>> to an enum, but that would be larger code churn for (I think) little >>>> gain. >>> >>> Hm, OK, could gaining the file scope variable would make sense in order >>> to reduce divergences? Or are the other roadblocks there? >> >> I don't recall. It might have originated from a change I decided to not >> port over, or I might have dropped it while porting. To be honest I'm >> not keen on putting time into researching this, the more that I would >> generally try to avoid static variables. >> >> What I would be willing to put time in is making a more user friendly >> command line option, but as said - I can't think of any good alternative >> (except perhaps "preferred-cstates=c1e" or "cstates=preferred:c1e", with >> internal translation of the strings into a bit mask, as long as (a) you >> would think that's an improvement and (b) the further divergence from >> Linux is not deemed a problem). > > I think (b) won't be a problem as long as internally the user option > is translated into a bitmask. > > Regarding (a) I do think it would be helpful to express this in a more > user friendly way, I'm not sure whether it would make sense to keep > Linux format also for compatibility reasons if users already have a > bitmask and want to use the same parameter for Xen and Linux, ie: > > preferred-cstates = <string of c1e,c1,...> | <integer bitmask> Yes, I would have been planning to accept both (but probably reject mixing of both). > What I think we should fix is the naming of the two booleans: > > bool disable_promotion_to_c1e; > bool enable_promotion_to_c1e; > > I would rather translated this into an enum, as right now it's > confusing IMO. Okay, I can certainly do that. The more that I did consider doing so already, breaking ties simply based on the "less code churn" argument. Jan
© 2016 - 2024 Red Hat, Inc.