Currently the user can set up isolcpus and nohz_full in such a way that
leaves no housekeeping CPU (i.e. no CPU that is neither domain isolated
nor nohz full). This can be a problem for other subsystems (e.g. the
timer wheel imgration).
Prevent this configuration by invalidating the last setting in case the
union of isolcpus and nohz_full covers all CPUs.
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
kernel/sched/isolation.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 93b038d48900..0019d941de68 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -165,6 +165,18 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
}
}
+ /* Check in combination with the previously set cpumask */
+ type = find_first_bit(&housekeeping.flags, HK_TYPE_MAX);
+ first_cpu = cpumask_first_and_and(cpu_present_mask,
+ housekeeping_staging,
+ housekeeping.cpumasks[type]);
+ if (first_cpu >= nr_cpu_ids || first_cpu >= setup_max_cpus) {
+ pr_warn("Housekeeping: must include one present CPU neither "
+ "in nohz_full= nor in isolcpus=, ignoring setting %s\n",
+ str);
+ goto free_housekeeping_staging;
+ }
+
iter_flags = flags & ~housekeeping.flags;
for_each_set_bit(type, &iter_flags, HK_TYPE_MAX)
--
2.50.1
On 7/30/25 9:11 AM, Gabriele Monaco wrote: > Currently the user can set up isolcpus and nohz_full in such a way that > leaves no housekeeping CPU (i.e. no CPU that is neither domain isolated > nor nohz full). This can be a problem for other subsystems (e.g. the > timer wheel imgration). > > Prevent this configuration by invalidating the last setting in case the > union of isolcpus and nohz_full covers all CPUs. > > Reviewed-by: Frederic Weisbecker <frederic@kernel.org> > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> > --- > kernel/sched/isolation.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c > index 93b038d48900..0019d941de68 100644 > --- a/kernel/sched/isolation.c > +++ b/kernel/sched/isolation.c > @@ -165,6 +165,18 @@ static int __init housekeeping_setup(char *str, unsigned long flags) > } > } > > + /* Check in combination with the previously set cpumask */ > + type = find_first_bit(&housekeeping.flags, HK_TYPE_MAX); > + first_cpu = cpumask_first_and_and(cpu_present_mask, > + housekeeping_staging, > + housekeeping.cpumasks[type]); > + if (first_cpu >= nr_cpu_ids || first_cpu >= setup_max_cpus) { > + pr_warn("Housekeeping: must include one present CPU neither " > + "in nohz_full= nor in isolcpus=, ignoring setting %s\n", > + str); > + goto free_housekeeping_staging; > + } > + > iter_flags = flags & ~housekeeping.flags; > > for_each_set_bit(type, &iter_flags, HK_TYPE_MAX) I do have a question about this check. Currently isolcpus=domain is bit 0, managed_irq is bit 1 and nohz_full is bit 2. If manage_irq come first followed by nohz_full and then isolcpus=domain. By the time, isolcpus=domain is being set, you are comparing its cpumask with that of manage_irq, not nohz_full. Perhaps you can reuse the non_housekeeping_mask for doing the check, e.g. cpumask_and(non_housekeeping_mask, cpu_present_mask, housekeeping_staging); iter_flags = housekeeping.flags & ~flags; for_each_set_bit(type, &iter_flags, HK_TYPE_MAX) cpumask_and(non_housekeeping_mask, non_housekeeping_mask, housekeeping.cpumasks[type]); if (cpumask_empty(non_housekeeping_mask)) { pr_warn(... Regards, Longman
On Thu, 2025-07-31 at 11:09 -0400, Waiman Long wrote: > > On 7/30/25 9:11 AM, Gabriele Monaco wrote: > > Currently the user can set up isolcpus and nohz_full in such a way > > that > > leaves no housekeeping CPU (i.e. no CPU that is neither domain > > isolated > > nor nohz full). This can be a problem for other subsystems (e.g. > > the > > timer wheel imgration). > > > > Prevent this configuration by invalidating the last setting in case > > the > > union of isolcpus and nohz_full covers all CPUs. > > > > Reviewed-by: Frederic Weisbecker <frederic@kernel.org> > > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> > > --- > > kernel/sched/isolation.c | 12 ++++++++++++ > > 1 file changed, 12 insertions(+) > > > > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c > > index 93b038d48900..0019d941de68 100644 > > --- a/kernel/sched/isolation.c > > +++ b/kernel/sched/isolation.c > > @@ -165,6 +165,18 @@ static int __init housekeeping_setup(char > > *str, unsigned long flags) > > } > > } > > > > + /* Check in combination with the previously set > > cpumask */ > > + type = find_first_bit(&housekeeping.flags, > > HK_TYPE_MAX); > > + first_cpu = > > cpumask_first_and_and(cpu_present_mask, > > + > > housekeeping_staging, > > + > > housekeeping.cpumasks[type]); > > + if (first_cpu >= nr_cpu_ids || first_cpu >= > > setup_max_cpus) { > > + pr_warn("Housekeeping: must include one > > present CPU neither " > > + "in nohz_full= nor in isolcpus=, > > ignoring setting %s\n", > > + str); > > + goto free_housekeeping_staging; > > + } > > + > > iter_flags = flags & ~housekeeping.flags; > > > > for_each_set_bit(type, &iter_flags, HK_TYPE_MAX) > > I do have a question about this check. Currently isolcpus=domain is > bit 0, managed_irq is bit 1 and nohz_full is bit 2. If manage_irq > come first followed by nohz_full and then isolcpus=domain. By the > time, isolcpus=domain is being set, you are comparing its cpumask > with that of manage_irq, not nohz_full. > > Perhaps you can reuse the non_housekeeping_mask for doing the check, > e.g. > > cpumask_and(non_housekeeping_mask, cpu_present_mask, > housekeeping_staging); > iter_flags = housekeeping.flags & ~flags; > for_each_set_bit(type, &iter_flags, HK_TYPE_MAX) > cpumask_and(non_housekeeping_mask, > non_housekeeping_mask, housekeeping.cpumasks[type]); > if (cpumask_empty(non_housekeeping_mask)) { > pr_warn(... Mmh right didn't think passing different masks in isocpus was possible. You mean something like this right? isolcpus=managed_irq,0-4 nohz_full=8-15 isolcpus=domain,0-7 Which doesn't block the nohz_full because the first mask (managed_irq) leaves spaces. Right now we block assignments like isolcpus=managed_irq,0-7 nohz_full=8-15 and isolcpus=managed_irq,0-7 -a isolcpus=domain,8-15 although this series doesn't really have problems with it. Shouldn't we just ignore these cases and just count domain + nohz_full? The solution you propose is to check all housekeeping, so it would also prevent the (safe?) assignments above, right? We could just check against the previously set domain/nohz_full and leave other flags alone, couldn't we? Thanks, Gabriele
On 8/1/25 10:46 AM, Gabriele Monaco wrote: > > On Thu, 2025-07-31 at 11:09 -0400, Waiman Long wrote: >> On 7/30/25 9:11 AM, Gabriele Monaco wrote: >>> Currently the user can set up isolcpus and nohz_full in such a way >>> that >>> leaves no housekeeping CPU (i.e. no CPU that is neither domain >>> isolated >>> nor nohz full). This can be a problem for other subsystems (e.g. >>> the >>> timer wheel imgration). >>> >>> Prevent this configuration by invalidating the last setting in case >>> the >>> union of isolcpus and nohz_full covers all CPUs. >>> >>> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> >>> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> >>> --- >>> kernel/sched/isolation.c | 12 ++++++++++++ >>> 1 file changed, 12 insertions(+) >>> >>> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c >>> index 93b038d48900..0019d941de68 100644 >>> --- a/kernel/sched/isolation.c >>> +++ b/kernel/sched/isolation.c >>> @@ -165,6 +165,18 @@ static int __init housekeeping_setup(char >>> *str, unsigned long flags) >>> } >>> } >>> >>> + /* Check in combination with the previously set >>> cpumask */ >>> + type = find_first_bit(&housekeeping.flags, >>> HK_TYPE_MAX); >>> + first_cpu = >>> cpumask_first_and_and(cpu_present_mask, >>> + >>> housekeeping_staging, >>> + >>> housekeeping.cpumasks[type]); >>> + if (first_cpu >= nr_cpu_ids || first_cpu >= >>> setup_max_cpus) { >>> + pr_warn("Housekeeping: must include one >>> present CPU neither " >>> + "in nohz_full= nor in isolcpus=, >>> ignoring setting %s\n", >>> + str); >>> + goto free_housekeeping_staging; >>> + } >>> + >>> iter_flags = flags & ~housekeeping.flags; >>> >>> for_each_set_bit(type, &iter_flags, HK_TYPE_MAX) >> I do have a question about this check. Currently isolcpus=domain is >> bit 0, managed_irq is bit 1 and nohz_full is bit 2. If manage_irq >> come first followed by nohz_full and then isolcpus=domain. By the >> time, isolcpus=domain is being set, you are comparing its cpumask >> with that of manage_irq, not nohz_full. >> >> Perhaps you can reuse the non_housekeeping_mask for doing the check, >> e.g. >> >> cpumask_and(non_housekeeping_mask, cpu_present_mask, >> housekeeping_staging); >> iter_flags = housekeeping.flags & ~flags; >> for_each_set_bit(type, &iter_flags, HK_TYPE_MAX) >> cpumask_and(non_housekeeping_mask, >> non_housekeeping_mask, housekeeping.cpumasks[type]); >> if (cpumask_empty(non_housekeeping_mask)) { >> pr_warn(... > Mmh right didn't think passing different masks in isocpus was possible. > > You mean something like this right? > > isolcpus=managed_irq,0-4 nohz_full=8-15 isolcpus=domain,0-7 > > Which doesn't block the nohz_full because the first mask (managed_irq) > leaves spaces. Yes, that is what I am talking about. > > Right now we block assignments like > > isolcpus=managed_irq,0-7 nohz_full=8-15 > > and > > isolcpus=managed_irq,0-7 -a isolcpus=domain,8-15 > > although this series doesn't really have problems with it. > Shouldn't we just ignore these cases and just count domain + nohz_full? You could, but you have to explicitly exclude managed_irq in your logic. > > The solution you propose is to check all housekeeping, so it would also > prevent the (safe?) assignments above, right? > > We could just check against the previously set domain/nohz_full and > leave other flags alone, couldn't we? You will have to modify your logic and be explicit that managed_irq is currently ignored. Cheers, Longman
© 2016 - 2025 Red Hat, Inc.