[PATCH v9 6/8] sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any

Gabriele Monaco posted 8 patches 2 months, 1 week ago
There is a newer version of this series
[PATCH v9 6/8] sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any
Posted by Gabriele Monaco 2 months, 1 week ago
Currently the user can set up isolcpus and nohz_full in such a way that
leaves no housekeeping CPU (i.e. no CPU that is neither domain isolated
nor nohz full). This can be a problem for other subsystems (e.g. the
timer wheel imgration).

Prevent this configuration by invalidating the last setting in case the
union of isolcpus and nohz_full covers all CPUs.

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 kernel/sched/isolation.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 93b038d48900..0019d941de68 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -165,6 +165,18 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
 			}
 		}
 
+		/* Check in combination with the previously set cpumask */
+		type = find_first_bit(&housekeeping.flags, HK_TYPE_MAX);
+		first_cpu = cpumask_first_and_and(cpu_present_mask,
+						  housekeeping_staging,
+						  housekeeping.cpumasks[type]);
+		if (first_cpu >= nr_cpu_ids || first_cpu >= setup_max_cpus) {
+			pr_warn("Housekeeping: must include one present CPU neither "
+				"in nohz_full= nor in isolcpus=, ignoring setting %s\n",
+				str);
+			goto free_housekeeping_staging;
+		}
+
 		iter_flags = flags & ~housekeeping.flags;
 
 		for_each_set_bit(type, &iter_flags, HK_TYPE_MAX)
-- 
2.50.1
Re: [PATCH v9 6/8] sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any
Posted by Waiman Long 2 months ago
On 7/30/25 9:11 AM, Gabriele Monaco wrote:
> Currently the user can set up isolcpus and nohz_full in such a way that
> leaves no housekeeping CPU (i.e. no CPU that is neither domain isolated
> nor nohz full). This can be a problem for other subsystems (e.g. the
> timer wheel imgration).
>
> Prevent this configuration by invalidating the last setting in case the
> union of isolcpus and nohz_full covers all CPUs.
>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
>   kernel/sched/isolation.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 93b038d48900..0019d941de68 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -165,6 +165,18 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
>   			}
>   		}
>   
> +		/* Check in combination with the previously set cpumask */
> +		type = find_first_bit(&housekeeping.flags, HK_TYPE_MAX);
> +		first_cpu = cpumask_first_and_and(cpu_present_mask,
> +						  housekeeping_staging,
> +						  housekeeping.cpumasks[type]);
> +		if (first_cpu >= nr_cpu_ids || first_cpu >= setup_max_cpus) {
> +			pr_warn("Housekeeping: must include one present CPU neither "
> +				"in nohz_full= nor in isolcpus=, ignoring setting %s\n",
> +				str);
> +			goto free_housekeeping_staging;
> +		}
> +
>   		iter_flags = flags & ~housekeeping.flags;
>   
>   		for_each_set_bit(type, &iter_flags, HK_TYPE_MAX)

I do have a question about this check. Currently isolcpus=domain is bit 
0, managed_irq is bit 1 and nohz_full is bit 2. If manage_irq come first 
followed by nohz_full and then isolcpus=domain. By the time, 
isolcpus=domain is being set, you are comparing its cpumask with that of 
manage_irq, not nohz_full.

Perhaps you can reuse the non_housekeeping_mask for doing the check, e.g.

         cpumask_and(non_housekeeping_mask, cpu_present_mask, 
housekeeping_staging);
         iter_flags = housekeeping.flags & ~flags;
         for_each_set_bit(type, &iter_flags, HK_TYPE_MAX)
                 cpumask_and(non_housekeeping_mask, 
non_housekeeping_mask, housekeeping.cpumasks[type]);
         if (cpumask_empty(non_housekeeping_mask)) {
                 pr_warn(...

Regards,
Longman

Re: [PATCH v9 6/8] sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any
Posted by Gabriele Monaco 2 months ago

On Thu, 2025-07-31 at 11:09 -0400, Waiman Long wrote:
> 
> On 7/30/25 9:11 AM, Gabriele Monaco wrote:
> > Currently the user can set up isolcpus and nohz_full in such a way
> > that
> > leaves no housekeeping CPU (i.e. no CPU that is neither domain
> > isolated
> > nor nohz full). This can be a problem for other subsystems (e.g.
> > the
> > timer wheel imgration).
> > 
> > Prevent this configuration by invalidating the last setting in case
> > the
> > union of isolcpus and nohz_full covers all CPUs.
> > 
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> > ---
> >   kernel/sched/isolation.c | 12 ++++++++++++
> >   1 file changed, 12 insertions(+)
> > 
> > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> > index 93b038d48900..0019d941de68 100644
> > --- a/kernel/sched/isolation.c
> > +++ b/kernel/sched/isolation.c
> > @@ -165,6 +165,18 @@ static int __init housekeeping_setup(char
> > *str, unsigned long flags)
> >   			}
> >   		}
> >   
> > +		/* Check in combination with the previously set
> > cpumask */
> > +		type = find_first_bit(&housekeeping.flags,
> > HK_TYPE_MAX);
> > +		first_cpu =
> > cpumask_first_and_and(cpu_present_mask,
> > +						 
> > housekeeping_staging,
> > +						 
> > housekeeping.cpumasks[type]);
> > +		if (first_cpu >= nr_cpu_ids || first_cpu >=
> > setup_max_cpus) {
> > +			pr_warn("Housekeeping: must include one
> > present CPU neither "
> > +				"in nohz_full= nor in isolcpus=,
> > ignoring setting %s\n",
> > +				str);
> > +			goto free_housekeeping_staging;
> > +		}
> > +
> >   		iter_flags = flags & ~housekeeping.flags;
> >   
> >   		for_each_set_bit(type, &iter_flags, HK_TYPE_MAX)
> 
> I do have a question about this check. Currently isolcpus=domain is
> bit 0, managed_irq is bit 1 and nohz_full is bit 2. If manage_irq
> come first followed by nohz_full and then isolcpus=domain. By the
> time, isolcpus=domain is being set, you are comparing its cpumask
> with that of manage_irq, not nohz_full.
> 
> Perhaps you can reuse the non_housekeeping_mask for doing the check,
> e.g.
> 
>          cpumask_and(non_housekeeping_mask, cpu_present_mask, 
> housekeeping_staging);
>          iter_flags = housekeeping.flags & ~flags;
>          for_each_set_bit(type, &iter_flags, HK_TYPE_MAX)
>                  cpumask_and(non_housekeeping_mask, 
> non_housekeeping_mask, housekeeping.cpumasks[type]);
>          if (cpumask_empty(non_housekeeping_mask)) {
>                  pr_warn(...

Mmh right didn't think passing different masks in isocpus was possible.

You mean something like this right?

 isolcpus=managed_irq,0-4 nohz_full=8-15 isolcpus=domain,0-7

Which doesn't block the nohz_full because the first mask (managed_irq)
leaves spaces.

Right now we block assignments like

 isolcpus=managed_irq,0-7 nohz_full=8-15

and 

 isolcpus=managed_irq,0-7 -a isolcpus=domain,8-15

although this series doesn't really have problems with it.
Shouldn't we just ignore these cases and just count domain + nohz_full?

The solution you propose is to check all housekeeping, so it would also
prevent the (safe?) assignments above, right?

We could just check against the previously set domain/nohz_full and
leave other flags alone, couldn't we?

Thanks,
Gabriele
Re: [PATCH v9 6/8] sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any
Posted by Waiman Long 2 months ago
On 8/1/25 10:46 AM, Gabriele Monaco wrote:
>
> On Thu, 2025-07-31 at 11:09 -0400, Waiman Long wrote:
>> On 7/30/25 9:11 AM, Gabriele Monaco wrote:
>>> Currently the user can set up isolcpus and nohz_full in such a way
>>> that
>>> leaves no housekeeping CPU (i.e. no CPU that is neither domain
>>> isolated
>>> nor nohz full). This can be a problem for other subsystems (e.g.
>>> the
>>> timer wheel imgration).
>>>
>>> Prevent this configuration by invalidating the last setting in case
>>> the
>>> union of isolcpus and nohz_full covers all CPUs.
>>>
>>> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>>> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
>>> ---
>>>    kernel/sched/isolation.c | 12 ++++++++++++
>>>    1 file changed, 12 insertions(+)
>>>
>>> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
>>> index 93b038d48900..0019d941de68 100644
>>> --- a/kernel/sched/isolation.c
>>> +++ b/kernel/sched/isolation.c
>>> @@ -165,6 +165,18 @@ static int __init housekeeping_setup(char
>>> *str, unsigned long flags)
>>>    			}
>>>    		}
>>>    
>>> +		/* Check in combination with the previously set
>>> cpumask */
>>> +		type = find_first_bit(&housekeeping.flags,
>>> HK_TYPE_MAX);
>>> +		first_cpu =
>>> cpumask_first_and_and(cpu_present_mask,
>>> +						
>>> housekeeping_staging,
>>> +						
>>> housekeeping.cpumasks[type]);
>>> +		if (first_cpu >= nr_cpu_ids || first_cpu >=
>>> setup_max_cpus) {
>>> +			pr_warn("Housekeeping: must include one
>>> present CPU neither "
>>> +				"in nohz_full= nor in isolcpus=,
>>> ignoring setting %s\n",
>>> +				str);
>>> +			goto free_housekeeping_staging;
>>> +		}
>>> +
>>>    		iter_flags = flags & ~housekeeping.flags;
>>>    
>>>    		for_each_set_bit(type, &iter_flags, HK_TYPE_MAX)
>> I do have a question about this check. Currently isolcpus=domain is
>> bit 0, managed_irq is bit 1 and nohz_full is bit 2. If manage_irq
>> come first followed by nohz_full and then isolcpus=domain. By the
>> time, isolcpus=domain is being set, you are comparing its cpumask
>> with that of manage_irq, not nohz_full.
>>
>> Perhaps you can reuse the non_housekeeping_mask for doing the check,
>> e.g.
>>
>>           cpumask_and(non_housekeeping_mask, cpu_present_mask,
>> housekeeping_staging);
>>           iter_flags = housekeeping.flags & ~flags;
>>           for_each_set_bit(type, &iter_flags, HK_TYPE_MAX)
>>                   cpumask_and(non_housekeeping_mask,
>> non_housekeeping_mask, housekeeping.cpumasks[type]);
>>           if (cpumask_empty(non_housekeeping_mask)) {
>>                   pr_warn(...
> Mmh right didn't think passing different masks in isocpus was possible.
>
> You mean something like this right?
>
>   isolcpus=managed_irq,0-4 nohz_full=8-15 isolcpus=domain,0-7
>
> Which doesn't block the nohz_full because the first mask (managed_irq)
> leaves spaces.
Yes, that is what I am talking about.
>
> Right now we block assignments like
>
>   isolcpus=managed_irq,0-7 nohz_full=8-15
>
> and
>
>   isolcpus=managed_irq,0-7 -a isolcpus=domain,8-15
>
> although this series doesn't really have problems with it.
> Shouldn't we just ignore these cases and just count domain + nohz_full?

You could, but you have to explicitly exclude managed_irq in your logic.


>
> The solution you propose is to check all housekeeping, so it would also
> prevent the (safe?) assignments above, right?
>
> We could just check against the previously set domain/nohz_full and
> leave other flags alone, couldn't we?

You will have to modify your logic and be explicit that managed_irq is 
currently ignored.

Cheers,
Longman