Documentation/timers/no_hz.rst | 7 ++----- kernel/sched/isolation.c | 11 ++++++++++- 2 files changed, 12 insertions(+), 6 deletions(-)
Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not
include the boot CPU, this is no longer true after the commit 08ae95f4fd3b
("nohz_full: Allow the boot CPU to be nohz_full").
However after another commit aae17ebb53cd ("workqueue: Avoid using isolated
cpus' timers on queue_delayed_work") the kernel will crash at boot time in
this case; housekeeping_any_cpu() returns an invalid cpu nr until smp_init()
paths bring the 1st housekeeping CPU up.
Change housekeeping_any_cpu() to check the result of cpumask_any_and() and
return smp_processor_id() in this case. Yes, this is just the simple and
backportable workaround which fixes the symptom, but smp_processor_id() at
boot time should be safe at least for type == HK_TYPE_TIMER, this more or
less matches the tick_do_timer_boot_cpu logic.
We should not worry about cpu_down(); tick_nohz_cpu_down() will not allow
to offline tick_do_timer_cpu (the 1st online housekeeping CPU).
Fixes: aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
Closes: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
Documentation/timers/no_hz.rst | 7 ++-----
kernel/sched/isolation.c | 11 ++++++++++-
2 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/Documentation/timers/no_hz.rst b/Documentation/timers/no_hz.rst
index f8786be15183..7fe8ef9718d8 100644
--- a/Documentation/timers/no_hz.rst
+++ b/Documentation/timers/no_hz.rst
@@ -129,11 +129,8 @@ adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain
online to handle timekeeping tasks in order to ensure that system
calls like gettimeofday() returns accurate values on adaptive-tick CPUs.
(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
-user processes to observe slight drifts in clock rate.) Therefore, the
-boot CPU is prohibited from entering adaptive-ticks mode. Specifying a
-"nohz_full=" mask that includes the boot CPU will result in a boot-time
-error message, and the boot CPU will be removed from the mask. Note that
-this means that your system must have at least two CPUs in order for
+user processes to observe slight drifts in clock rate.) Note that this
+means that your system must have at least two CPUs in order for
CONFIG_NO_HZ_FULL=y to do anything for you.
Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 373d42c707bc..2a262d3ecb3d 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -46,7 +46,16 @@ int housekeeping_any_cpu(enum hk_type type)
if (cpu < nr_cpu_ids)
return cpu;
- return cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
+ cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
+ if (likely(cpu < nr_cpu_ids))
+ return cpu;
+ /*
+ * Unless we have another problem this can only happen
+ * at boot time before start_secondary() brings the 1st
+ * housekeeping CPU up.
+ */
+ WARN_ON_ONCE(system_state == SYSTEM_RUNNING ||
+ type != HK_TYPE_TIMER);
}
}
return smp_processor_id();
--
2.25.1.362.g51ebf55
On Thu, Apr 11, 2024 at 04:39:05PM +0200 Oleg Nesterov wrote:
> Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not
> include the boot CPU, this is no longer true after the commit 08ae95f4fd3b
> ("nohz_full: Allow the boot CPU to be nohz_full").
>
> However after another commit aae17ebb53cd ("workqueue: Avoid using isolated
> cpus' timers on queue_delayed_work") the kernel will crash at boot time in
> this case; housekeeping_any_cpu() returns an invalid cpu nr until smp_init()
> paths bring the 1st housekeeping CPU up.
>
> Change housekeeping_any_cpu() to check the result of cpumask_any_and() and
> return smp_processor_id() in this case. Yes, this is just the simple and
> backportable workaround which fixes the symptom, but smp_processor_id() at
> boot time should be safe at least for type == HK_TYPE_TIMER, this more or
> less matches the tick_do_timer_boot_cpu logic.
>
> We should not worry about cpu_down(); tick_nohz_cpu_down() will not allow
> to offline tick_do_timer_cpu (the 1st online housekeeping CPU).
>
> Fixes: aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
> Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
> Closes: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Checking the returned value instead of assuming seems safer in any case.
Reviewed-by: Phil Auld <pauld@redhat.com>
> ---
> Documentation/timers/no_hz.rst | 7 ++-----
> kernel/sched/isolation.c | 11 ++++++++++-
> 2 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/timers/no_hz.rst b/Documentation/timers/no_hz.rst
> index f8786be15183..7fe8ef9718d8 100644
> --- a/Documentation/timers/no_hz.rst
> +++ b/Documentation/timers/no_hz.rst
> @@ -129,11 +129,8 @@ adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain
> online to handle timekeeping tasks in order to ensure that system
> calls like gettimeofday() returns accurate values on adaptive-tick CPUs.
> (This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
> -user processes to observe slight drifts in clock rate.) Therefore, the
> -boot CPU is prohibited from entering adaptive-ticks mode. Specifying a
> -"nohz_full=" mask that includes the boot CPU will result in a boot-time
> -error message, and the boot CPU will be removed from the mask. Note that
> -this means that your system must have at least two CPUs in order for
> +user processes to observe slight drifts in clock rate.) Note that this
> +means that your system must have at least two CPUs in order for
> CONFIG_NO_HZ_FULL=y to do anything for you.
>
> Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 373d42c707bc..2a262d3ecb3d 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -46,7 +46,16 @@ int housekeeping_any_cpu(enum hk_type type)
> if (cpu < nr_cpu_ids)
> return cpu;
>
> - return cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
> + cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
> + if (likely(cpu < nr_cpu_ids))
> + return cpu;
> + /*
> + * Unless we have another problem this can only happen
> + * at boot time before start_secondary() brings the 1st
> + * housekeeping CPU up.
> + */
> + WARN_ON_ONCE(system_state == SYSTEM_RUNNING ||
> + type != HK_TYPE_TIMER);
> }
> }
> return smp_processor_id();
> --
> 2.25.1.362.g51ebf55
>
>
>
--
Phil, Frederic,
Thanks for your review! Who do you think can take these patches?
At least the 1st one.
To remind, there are more problems with boot CPU in nohz mask, but
can we at least fix the kernel crash?
Oleg.
On 04/18, Phil Auld wrote:
>
> On Thu, Apr 11, 2024 at 04:39:05PM +0200 Oleg Nesterov wrote:
> > Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not
> > include the boot CPU, this is no longer true after the commit 08ae95f4fd3b
> > ("nohz_full: Allow the boot CPU to be nohz_full").
> >
> > However after another commit aae17ebb53cd ("workqueue: Avoid using isolated
> > cpus' timers on queue_delayed_work") the kernel will crash at boot time in
> > this case; housekeeping_any_cpu() returns an invalid cpu nr until smp_init()
> > paths bring the 1st housekeeping CPU up.
> >
> > Change housekeeping_any_cpu() to check the result of cpumask_any_and() and
> > return smp_processor_id() in this case. Yes, this is just the simple and
> > backportable workaround which fixes the symptom, but smp_processor_id() at
> > boot time should be safe at least for type == HK_TYPE_TIMER, this more or
> > less matches the tick_do_timer_boot_cpu logic.
> >
> > We should not worry about cpu_down(); tick_nohz_cpu_down() will not allow
> > to offline tick_do_timer_cpu (the 1st online housekeeping CPU).
> >
> > Fixes: aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
> > Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
> > Closes: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>
> Checking the returned value instead of assuming seems safer in any case.
>
> Reviewed-by: Phil Auld <pauld@redhat.com>
>
> > ---
> > Documentation/timers/no_hz.rst | 7 ++-----
> > kernel/sched/isolation.c | 11 ++++++++++-
> > 2 files changed, 12 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/timers/no_hz.rst b/Documentation/timers/no_hz.rst
> > index f8786be15183..7fe8ef9718d8 100644
> > --- a/Documentation/timers/no_hz.rst
> > +++ b/Documentation/timers/no_hz.rst
> > @@ -129,11 +129,8 @@ adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain
> > online to handle timekeeping tasks in order to ensure that system
> > calls like gettimeofday() returns accurate values on adaptive-tick CPUs.
> > (This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
> > -user processes to observe slight drifts in clock rate.) Therefore, the
> > -boot CPU is prohibited from entering adaptive-ticks mode. Specifying a
> > -"nohz_full=" mask that includes the boot CPU will result in a boot-time
> > -error message, and the boot CPU will be removed from the mask. Note that
> > -this means that your system must have at least two CPUs in order for
> > +user processes to observe slight drifts in clock rate.) Note that this
> > +means that your system must have at least two CPUs in order for
> > CONFIG_NO_HZ_FULL=y to do anything for you.
> >
> > Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
> > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> > index 373d42c707bc..2a262d3ecb3d 100644
> > --- a/kernel/sched/isolation.c
> > +++ b/kernel/sched/isolation.c
> > @@ -46,7 +46,16 @@ int housekeeping_any_cpu(enum hk_type type)
> > if (cpu < nr_cpu_ids)
> > return cpu;
> >
> > - return cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
> > + cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
> > + if (likely(cpu < nr_cpu_ids))
> > + return cpu;
> > + /*
> > + * Unless we have another problem this can only happen
> > + * at boot time before start_secondary() brings the 1st
> > + * housekeeping CPU up.
> > + */
> > + WARN_ON_ONCE(system_state == SYSTEM_RUNNING ||
> > + type != HK_TYPE_TIMER);
> > }
> > }
> > return smp_processor_id();
> > --
> > 2.25.1.362.g51ebf55
> >
> >
> >
>
> --
>
HI Oleg, Ingo,
On Mon, Apr 22, 2024 at 08:50:21PM +0200 Oleg Nesterov wrote:
> Phil, Frederic,
>
> Thanks for your review! Who do you think can take these patches?
> At least the 1st one.
>
> To remind, there are more problems with boot CPU in nohz mask, but
> can we at least fix the kernel crash?
>
I think that would be good. I don't know if Peter is at full strength.
Ingo could you take look at this, please?
Cheers,
Phil
> Oleg.
>
> On 04/18, Phil Auld wrote:
> >
> > On Thu, Apr 11, 2024 at 04:39:05PM +0200 Oleg Nesterov wrote:
> > > Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not
> > > include the boot CPU, this is no longer true after the commit 08ae95f4fd3b
> > > ("nohz_full: Allow the boot CPU to be nohz_full").
> > >
> > > However after another commit aae17ebb53cd ("workqueue: Avoid using isolated
> > > cpus' timers on queue_delayed_work") the kernel will crash at boot time in
> > > this case; housekeeping_any_cpu() returns an invalid cpu nr until smp_init()
> > > paths bring the 1st housekeeping CPU up.
> > >
> > > Change housekeeping_any_cpu() to check the result of cpumask_any_and() and
> > > return smp_processor_id() in this case. Yes, this is just the simple and
> > > backportable workaround which fixes the symptom, but smp_processor_id() at
> > > boot time should be safe at least for type == HK_TYPE_TIMER, this more or
> > > less matches the tick_do_timer_boot_cpu logic.
> > >
> > > We should not worry about cpu_down(); tick_nohz_cpu_down() will not allow
> > > to offline tick_do_timer_cpu (the 1st online housekeeping CPU).
> > >
> > > Fixes: aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
> > > Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
> > > Closes: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/
> > > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> >
> > Checking the returned value instead of assuming seems safer in any case.
> >
> > Reviewed-by: Phil Auld <pauld@redhat.com>
> >
> > > ---
> > > Documentation/timers/no_hz.rst | 7 ++-----
> > > kernel/sched/isolation.c | 11 ++++++++++-
> > > 2 files changed, 12 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/Documentation/timers/no_hz.rst b/Documentation/timers/no_hz.rst
> > > index f8786be15183..7fe8ef9718d8 100644
> > > --- a/Documentation/timers/no_hz.rst
> > > +++ b/Documentation/timers/no_hz.rst
> > > @@ -129,11 +129,8 @@ adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain
> > > online to handle timekeeping tasks in order to ensure that system
> > > calls like gettimeofday() returns accurate values on adaptive-tick CPUs.
> > > (This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
> > > -user processes to observe slight drifts in clock rate.) Therefore, the
> > > -boot CPU is prohibited from entering adaptive-ticks mode. Specifying a
> > > -"nohz_full=" mask that includes the boot CPU will result in a boot-time
> > > -error message, and the boot CPU will be removed from the mask. Note that
> > > -this means that your system must have at least two CPUs in order for
> > > +user processes to observe slight drifts in clock rate.) Note that this
> > > +means that your system must have at least two CPUs in order for
> > > CONFIG_NO_HZ_FULL=y to do anything for you.
> > >
> > > Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
> > > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> > > index 373d42c707bc..2a262d3ecb3d 100644
> > > --- a/kernel/sched/isolation.c
> > > +++ b/kernel/sched/isolation.c
> > > @@ -46,7 +46,16 @@ int housekeeping_any_cpu(enum hk_type type)
> > > if (cpu < nr_cpu_ids)
> > > return cpu;
> > >
> > > - return cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
> > > + cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
> > > + if (likely(cpu < nr_cpu_ids))
> > > + return cpu;
> > > + /*
> > > + * Unless we have another problem this can only happen
> > > + * at boot time before start_secondary() brings the 1st
> > > + * housekeeping CPU up.
> > > + */
> > > + WARN_ON_ONCE(system_state == SYSTEM_RUNNING ||
> > > + type != HK_TYPE_TIMER);
> > > }
> > > }
> > > return smp_processor_id();
> > > --
> > > 2.25.1.362.g51ebf55
> > >
> > >
> > >
> >
> > --
> >
>
--
Le Thu, Apr 11, 2024 at 04:39:05PM +0200, Oleg Nesterov a écrit :
> Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not
> include the boot CPU, this is no longer true after the commit 08ae95f4fd3b
> ("nohz_full: Allow the boot CPU to be nohz_full").
>
> However after another commit aae17ebb53cd ("workqueue: Avoid using isolated
> cpus' timers on queue_delayed_work") the kernel will crash at boot time in
> this case; housekeeping_any_cpu() returns an invalid cpu nr until smp_init()
> paths bring the 1st housekeeping CPU up.
>
> Change housekeeping_any_cpu() to check the result of cpumask_any_and() and
> return smp_processor_id() in this case. Yes, this is just the simple and
> backportable workaround which fixes the symptom, but smp_processor_id() at
> boot time should be safe at least for type == HK_TYPE_TIMER, this more or
> less matches the tick_do_timer_boot_cpu logic.
>
> We should not worry about cpu_down(); tick_nohz_cpu_down() will not allow
> to offline tick_do_timer_cpu (the 1st online housekeeping CPU).
>
> Fixes: aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
> Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
> Closes: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Just in case...
This reveals another issue, tick_setup_device() can't use
smp_call_function() due to irqs_disabled(), but lets discuss this
later/separately too.
On 04/11, Oleg Nesterov wrote:
>
> Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not
> include the boot CPU, this is no longer true after the commit 08ae95f4fd3b
> ("nohz_full: Allow the boot CPU to be nohz_full").
>
> However after another commit aae17ebb53cd ("workqueue: Avoid using isolated
> cpus' timers on queue_delayed_work") the kernel will crash at boot time in
> this case; housekeeping_any_cpu() returns an invalid cpu nr until smp_init()
> paths bring the 1st housekeeping CPU up.
>
> Change housekeeping_any_cpu() to check the result of cpumask_any_and() and
> return smp_processor_id() in this case. Yes, this is just the simple and
> backportable workaround which fixes the symptom, but smp_processor_id() at
> boot time should be safe at least for type == HK_TYPE_TIMER, this more or
> less matches the tick_do_timer_boot_cpu logic.
>
> We should not worry about cpu_down(); tick_nohz_cpu_down() will not allow
> to offline tick_do_timer_cpu (the 1st online housekeeping CPU).
>
> Fixes: aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
> Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
> Closes: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
> Documentation/timers/no_hz.rst | 7 ++-----
> kernel/sched/isolation.c | 11 ++++++++++-
> 2 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/timers/no_hz.rst b/Documentation/timers/no_hz.rst
> index f8786be15183..7fe8ef9718d8 100644
> --- a/Documentation/timers/no_hz.rst
> +++ b/Documentation/timers/no_hz.rst
> @@ -129,11 +129,8 @@ adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain
> online to handle timekeeping tasks in order to ensure that system
> calls like gettimeofday() returns accurate values on adaptive-tick CPUs.
> (This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
> -user processes to observe slight drifts in clock rate.) Therefore, the
> -boot CPU is prohibited from entering adaptive-ticks mode. Specifying a
> -"nohz_full=" mask that includes the boot CPU will result in a boot-time
> -error message, and the boot CPU will be removed from the mask. Note that
> -this means that your system must have at least two CPUs in order for
> +user processes to observe slight drifts in clock rate.) Note that this
> +means that your system must have at least two CPUs in order for
> CONFIG_NO_HZ_FULL=y to do anything for you.
>
> Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 373d42c707bc..2a262d3ecb3d 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -46,7 +46,16 @@ int housekeeping_any_cpu(enum hk_type type)
> if (cpu < nr_cpu_ids)
> return cpu;
>
> - return cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
> + cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
> + if (likely(cpu < nr_cpu_ids))
> + return cpu;
> + /*
> + * Unless we have another problem this can only happen
> + * at boot time before start_secondary() brings the 1st
> + * housekeeping CPU up.
> + */
> + WARN_ON_ONCE(system_state == SYSTEM_RUNNING ||
> + type != HK_TYPE_TIMER);
> }
> }
> return smp_processor_id();
> --
> 2.25.1.362.g51ebf55
>
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 8e3101b38dfc20848a23525b1e6e80bd1641d44c
Gitweb: https://git.kernel.org/tip/8e3101b38dfc20848a23525b1e6e80bd1641d44c
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Thu, 11 Apr 2024 16:39:05 +02:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 24 Apr 2024 21:53:34 +02:00
sched/isolation: {revent boot crash when the boot CPU is nohz_full
Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not
include the boot CPU, which is no longer true after commit 08ae95f4fd3b
("nohz_full: Allow the boot CPU to be nohz_full").
However after commit aae17ebb53cd ("workqueue: Avoid using isolated cpus'
timers on queue_delayed_work") the kernel will crash at boot time in this
case; housekeeping_any_cpu() returns an invalid CPU number until smp_init()
brings the first housekeeping CPU up.
Change housekeeping_any_cpu() to check the result of cpumask_any_and() and
return smp_processor_id() in this case.
This is just the simple and backportable workaround which fixes the
symptom, but smp_processor_id() at boot time should be safe at least for
type == HK_TYPE_TIMER, this more or less matches the tick_do_timer_boot_cpu
logic.
There is no worry about cpu_down(); tick_nohz_cpu_down() will not allow to
offline tick_do_timer_cpu (the 1st online housekeeping CPU).
Fixes: aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Phil Auld <pauld@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240411143905.GA19288@redhat.com
Closes: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/
---
Documentation/timers/no_hz.rst | 7 ++-----
kernel/sched/isolation.c | 11 ++++++++++-
2 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/Documentation/timers/no_hz.rst b/Documentation/timers/no_hz.rst
index f8786be..7fe8ef9 100644
--- a/Documentation/timers/no_hz.rst
+++ b/Documentation/timers/no_hz.rst
@@ -129,11 +129,8 @@ adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain
online to handle timekeeping tasks in order to ensure that system
calls like gettimeofday() returns accurate values on adaptive-tick CPUs.
(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
-user processes to observe slight drifts in clock rate.) Therefore, the
-boot CPU is prohibited from entering adaptive-ticks mode. Specifying a
-"nohz_full=" mask that includes the boot CPU will result in a boot-time
-error message, and the boot CPU will be removed from the mask. Note that
-this means that your system must have at least two CPUs in order for
+user processes to observe slight drifts in clock rate.) Note that this
+means that your system must have at least two CPUs in order for
CONFIG_NO_HZ_FULL=y to do anything for you.
Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 373d42c..2a262d3 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -46,7 +46,16 @@ int housekeeping_any_cpu(enum hk_type type)
if (cpu < nr_cpu_ids)
return cpu;
- return cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
+ cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
+ if (likely(cpu < nr_cpu_ids))
+ return cpu;
+ /*
+ * Unless we have another problem this can only happen
+ * at boot time before start_secondary() brings the 1st
+ * housekeeping CPU up.
+ */
+ WARN_ON_ONCE(system_state == SYSTEM_RUNNING ||
+ type != HK_TYPE_TIMER);
}
}
return smp_processor_id();
On Wed, Apr 24, 2024 at 08:05:02PM -0000 tip-bot2 for Oleg Nesterov wrote:
> The following commit has been merged into the sched/urgent branch of tip:
>
> Commit-ID: 8e3101b38dfc20848a23525b1e6e80bd1641d44c
> Gitweb: https://git.kernel.org/tip/8e3101b38dfc20848a23525b1e6e80bd1641d44c
> Author: Oleg Nesterov <oleg@redhat.com>
> AuthorDate: Thu, 11 Apr 2024 16:39:05 +02:00
> Committer: Thomas Gleixner <tglx@linutronix.de>
> CommitterDate: Wed, 24 Apr 2024 21:53:34 +02:00
>
> sched/isolation: {revent boot crash when the boot CPU is nohz_full
>
Thanks Thomas, Typo in the reworded description :)
> Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not
> include the boot CPU, which is no longer true after commit 08ae95f4fd3b
> ("nohz_full: Allow the boot CPU to be nohz_full").
>
> However after commit aae17ebb53cd ("workqueue: Avoid using isolated cpus'
> timers on queue_delayed_work") the kernel will crash at boot time in this
> case; housekeeping_any_cpu() returns an invalid CPU number until smp_init()
> brings the first housekeeping CPU up.
>
> Change housekeeping_any_cpu() to check the result of cpumask_any_and() and
> return smp_processor_id() in this case.
>
> This is just the simple and backportable workaround which fixes the
> symptom, but smp_processor_id() at boot time should be safe at least for
> type == HK_TYPE_TIMER, this more or less matches the tick_do_timer_boot_cpu
> logic.
>
> There is no worry about cpu_down(); tick_nohz_cpu_down() will not allow to
> offline tick_do_timer_cpu (the 1st online housekeeping CPU).
>
> Fixes: aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
> Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Phil Auld <pauld@redhat.com>
> Acked-by: Frederic Weisbecker <frederic@kernel.org>
> Link: https://lore.kernel.org/r/20240411143905.GA19288@redhat.com
> Closes: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/
> ---
> Documentation/timers/no_hz.rst | 7 ++-----
> kernel/sched/isolation.c | 11 ++++++++++-
> 2 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/timers/no_hz.rst b/Documentation/timers/no_hz.rst
> index f8786be..7fe8ef9 100644
> --- a/Documentation/timers/no_hz.rst
> +++ b/Documentation/timers/no_hz.rst
> @@ -129,11 +129,8 @@ adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain
> online to handle timekeeping tasks in order to ensure that system
> calls like gettimeofday() returns accurate values on adaptive-tick CPUs.
> (This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
> -user processes to observe slight drifts in clock rate.) Therefore, the
> -boot CPU is prohibited from entering adaptive-ticks mode. Specifying a
> -"nohz_full=" mask that includes the boot CPU will result in a boot-time
> -error message, and the boot CPU will be removed from the mask. Note that
> -this means that your system must have at least two CPUs in order for
> +user processes to observe slight drifts in clock rate.) Note that this
> +means that your system must have at least two CPUs in order for
> CONFIG_NO_HZ_FULL=y to do anything for you.
>
> Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 373d42c..2a262d3 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -46,7 +46,16 @@ int housekeeping_any_cpu(enum hk_type type)
> if (cpu < nr_cpu_ids)
> return cpu;
>
> - return cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
> + cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
> + if (likely(cpu < nr_cpu_ids))
> + return cpu;
> + /*
> + * Unless we have another problem this can only happen
> + * at boot time before start_secondary() brings the 1st
> + * housekeeping CPU up.
> + */
> + WARN_ON_ONCE(system_state == SYSTEM_RUNNING ||
> + type != HK_TYPE_TIMER);
> }
> }
> return smp_processor_id();
>
--
* Phil Auld <pauld@redhat.com> wrote:
> On Wed, Apr 24, 2024 at 08:05:02PM -0000 tip-bot2 for Oleg Nesterov wrote:
> > The following commit has been merged into the sched/urgent branch of tip:
> >
> > Commit-ID: 8e3101b38dfc20848a23525b1e6e80bd1641d44c
> > Gitweb: https://git.kernel.org/tip/8e3101b38dfc20848a23525b1e6e80bd1641d44c
> > Author: Oleg Nesterov <oleg@redhat.com>
> > AuthorDate: Thu, 11 Apr 2024 16:39:05 +02:00
> > Committer: Thomas Gleixner <tglx@linutronix.de>
> > CommitterDate: Wed, 24 Apr 2024 21:53:34 +02:00
> >
> > sched/isolation: {revent boot crash when the boot CPU is nohz_full
> >
>
> Thanks Thomas, Typo in the reworded description :)
Ok, so normally we wouldn't rebase just for a typo in a changelog, but
that's an annoying typo that will show up in shortlogs - so I fixed it all
up in tip:sched/urgent.
Thanks,
Ingo
On Sun, Apr 28, 2024 at 10:14:30AM +0200 Ingo Molnar wrote:
>
> * Phil Auld <pauld@redhat.com> wrote:
>
> > On Wed, Apr 24, 2024 at 08:05:02PM -0000 tip-bot2 for Oleg Nesterov wrote:
> > > The following commit has been merged into the sched/urgent branch of tip:
> > >
> > > Commit-ID: 8e3101b38dfc20848a23525b1e6e80bd1641d44c
> > > Gitweb: https://git.kernel.org/tip/8e3101b38dfc20848a23525b1e6e80bd1641d44c
> > > Author: Oleg Nesterov <oleg@redhat.com>
> > > AuthorDate: Thu, 11 Apr 2024 16:39:05 +02:00
> > > Committer: Thomas Gleixner <tglx@linutronix.de>
> > > CommitterDate: Wed, 24 Apr 2024 21:53:34 +02:00
> > >
> > > sched/isolation: {revent boot crash when the boot CPU is nohz_full
> > >
> >
> > Thanks Thomas, Typo in the reworded description :)
>
> Ok, so normally we wouldn't rebase just for a typo in a changelog, but
> that's an annoying typo that will show up in shortlogs - so I fixed it all
> up in tip:sched/urgent.
>
Yeah, I kept seeing "revert"...
Thanks,
Phil
> Thanks,
>
> Ingo
>
--
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 5097cbcb38e6e0d2627c9dde1985e91d2c9f880e
Gitweb: https://git.kernel.org/tip/5097cbcb38e6e0d2627c9dde1985e91d2c9f880e
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Thu, 11 Apr 2024 16:39:05 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sun, 28 Apr 2024 10:07:12 +02:00
sched/isolation: Prevent boot crash when the boot CPU is nohz_full
Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not
include the boot CPU, which is no longer true after:
08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full").
However after:
aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
the kernel will crash at boot time in this case; housekeeping_any_cpu()
returns an invalid CPU number until smp_init() brings the first
housekeeping CPU up.
Change housekeeping_any_cpu() to check the result of cpumask_any_and() and
return smp_processor_id() in this case.
This is just the simple and backportable workaround which fixes the
symptom, but smp_processor_id() at boot time should be safe at least for
type == HK_TYPE_TIMER, this more or less matches the tick_do_timer_boot_cpu
logic.
There is no worry about cpu_down(); tick_nohz_cpu_down() will not allow to
offline tick_do_timer_cpu (the 1st online housekeeping CPU).
Fixes: aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240411143905.GA19288@redhat.com
Closes: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/
---
Documentation/timers/no_hz.rst | 7 ++-----
kernel/sched/isolation.c | 11 ++++++++++-
2 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/Documentation/timers/no_hz.rst b/Documentation/timers/no_hz.rst
index f8786be..7fe8ef9 100644
--- a/Documentation/timers/no_hz.rst
+++ b/Documentation/timers/no_hz.rst
@@ -129,11 +129,8 @@ adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain
online to handle timekeeping tasks in order to ensure that system
calls like gettimeofday() returns accurate values on adaptive-tick CPUs.
(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
-user processes to observe slight drifts in clock rate.) Therefore, the
-boot CPU is prohibited from entering adaptive-ticks mode. Specifying a
-"nohz_full=" mask that includes the boot CPU will result in a boot-time
-error message, and the boot CPU will be removed from the mask. Note that
-this means that your system must have at least two CPUs in order for
+user processes to observe slight drifts in clock rate.) Note that this
+means that your system must have at least two CPUs in order for
CONFIG_NO_HZ_FULL=y to do anything for you.
Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 373d42c..2a262d3 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -46,7 +46,16 @@ int housekeeping_any_cpu(enum hk_type type)
if (cpu < nr_cpu_ids)
return cpu;
- return cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
+ cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
+ if (likely(cpu < nr_cpu_ids))
+ return cpu;
+ /*
+ * Unless we have another problem this can only happen
+ * at boot time before start_secondary() brings the 1st
+ * housekeeping CPU up.
+ */
+ WARN_ON_ONCE(system_state == SYSTEM_RUNNING ||
+ type != HK_TYPE_TIMER);
}
}
return smp_processor_id();
housekeeping_setup() checks cpumask_intersects(present, online) to ensure
that the kernel will have at least one housekeeping CPU after smp_init(),
but this doesn't work if the maxcpus= kernel parameter limits the number
of processors available after bootup.
For example, the kernel with "maxcpus=2 nohz_full=0-2" parameters crashes
at boot time on my virtual machine with 4 CPUs.
Change housekeeping_setup() to use cpumask_first_and() and check that the
returned cpu number is valid and less than setup_max_cpus.
Another corner case is "nohz_full=0" on a machine with a single CPU or
with the maxcpus=1 kernel argument. In this case non_housekeeping_mask
is empty and IIUC tick_nohz_full_setup() makes no sense. And indeed, the
kernel hits the WARN_ON(tick_nohz_full_running) in tick_sched_do_timer().
And how should the kernel interpret the "nohz_full=" parameter? I think
it should be silently ignored, but currently cpulist_parse() happily
returns the empty cpumask and this leads to the same problem.
Change housekeeping_setup() to check cpumask_empty(non_housekeeping_mask)
and do nothing in this case.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/sched/isolation.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 2a262d3ecb3d..5891e715f00d 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -118,6 +118,7 @@ static void __init housekeeping_setup_type(enum hk_type type,
static int __init housekeeping_setup(char *str, unsigned long flags)
{
cpumask_var_t non_housekeeping_mask, housekeeping_staging;
+ unsigned int first_cpu;
int err = 0;
if ((flags & HK_FLAG_TICK) && !(housekeeping.flags & HK_FLAG_TICK)) {
@@ -138,7 +139,8 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
cpumask_andnot(housekeeping_staging,
cpu_possible_mask, non_housekeeping_mask);
- if (!cpumask_intersects(cpu_present_mask, housekeeping_staging)) {
+ first_cpu = cpumask_first_and(cpu_present_mask, housekeeping_staging);
+ if (first_cpu >= nr_cpu_ids || first_cpu >= setup_max_cpus) {
__cpumask_set_cpu(smp_processor_id(), housekeeping_staging);
__cpumask_clear_cpu(smp_processor_id(), non_housekeeping_mask);
if (!housekeeping.flags) {
@@ -147,6 +149,9 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
}
}
+ if (cpumask_empty(non_housekeeping_mask))
+ goto free_housekeeping_staging;
+
if (!housekeeping.flags) {
/* First setup call ("nohz_full=" or "isolcpus=") */
enum hk_type type;
--
2.25.1.362.g51ebf55
Le Sat, Apr 13, 2024 at 04:17:46PM +0200, Oleg Nesterov a écrit : > housekeeping_setup() checks cpumask_intersects(present, online) to ensure > that the kernel will have at least one housekeeping CPU after smp_init(), > but this doesn't work if the maxcpus= kernel parameter limits the number > of processors available after bootup. > > For example, the kernel with "maxcpus=2 nohz_full=0-2" parameters crashes > at boot time on my virtual machine with 4 CPUs. > > Change housekeeping_setup() to use cpumask_first_and() and check that the > returned cpu number is valid and less than setup_max_cpus. > > Another corner case is "nohz_full=0" on a machine with a single CPU or > with the maxcpus=1 kernel argument. In this case non_housekeeping_mask > is empty and IIUC tick_nohz_full_setup() makes no sense. And indeed, the > kernel hits the WARN_ON(tick_nohz_full_running) in tick_sched_do_timer(). > > And how should the kernel interpret the "nohz_full=" parameter? I think > it should be silently ignored, but currently cpulist_parse() happily > returns the empty cpumask and this leads to the same problem. > > Change housekeeping_setup() to check cpumask_empty(non_housekeeping_mask) > and do nothing in this case. > > Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Frederic Weisbecker <frederic@kernel.org>
On Sat, Apr 13, 2024 at 04:17:46PM +0200 Oleg Nesterov wrote:
> housekeeping_setup() checks cpumask_intersects(present, online) to ensure
> that the kernel will have at least one housekeeping CPU after smp_init(),
> but this doesn't work if the maxcpus= kernel parameter limits the number
> of processors available after bootup.
>
> For example, the kernel with "maxcpus=2 nohz_full=0-2" parameters crashes
> at boot time on my virtual machine with 4 CPUs.
>
> Change housekeeping_setup() to use cpumask_first_and() and check that the
> returned cpu number is valid and less than setup_max_cpus.
>
> Another corner case is "nohz_full=0" on a machine with a single CPU or
> with the maxcpus=1 kernel argument. In this case non_housekeeping_mask
> is empty and IIUC tick_nohz_full_setup() makes no sense. And indeed, the
> kernel hits the WARN_ON(tick_nohz_full_running) in tick_sched_do_timer().
>
> And how should the kernel interpret the "nohz_full=" parameter? I think
> it should be silently ignored, but currently cpulist_parse() happily
> returns the empty cpumask and this leads to the same problem.
>
> Change housekeeping_setup() to check cpumask_empty(non_housekeeping_mask)
> and do nothing in this case.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Looks good to me. One less footgun.
Reviewed-by: Phil Auld <pauld@redhat.com>
> ---
> kernel/sched/isolation.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 2a262d3ecb3d..5891e715f00d 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -118,6 +118,7 @@ static void __init housekeeping_setup_type(enum hk_type type,
> static int __init housekeeping_setup(char *str, unsigned long flags)
> {
> cpumask_var_t non_housekeeping_mask, housekeeping_staging;
> + unsigned int first_cpu;
> int err = 0;
>
> if ((flags & HK_FLAG_TICK) && !(housekeeping.flags & HK_FLAG_TICK)) {
> @@ -138,7 +139,8 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
> cpumask_andnot(housekeeping_staging,
> cpu_possible_mask, non_housekeeping_mask);
>
> - if (!cpumask_intersects(cpu_present_mask, housekeeping_staging)) {
> + first_cpu = cpumask_first_and(cpu_present_mask, housekeeping_staging);
> + if (first_cpu >= nr_cpu_ids || first_cpu >= setup_max_cpus) {
> __cpumask_set_cpu(smp_processor_id(), housekeeping_staging);
> __cpumask_clear_cpu(smp_processor_id(), non_housekeeping_mask);
> if (!housekeeping.flags) {
> @@ -147,6 +149,9 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
> }
> }
>
> + if (cpumask_empty(non_housekeeping_mask))
> + goto free_housekeeping_staging;
> +
> if (!housekeeping.flags) {
> /* First setup call ("nohz_full=" or "isolcpus=") */
> enum hk_type type;
> --
> 2.25.1.362.g51ebf55
>
>
>
--
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: b6ad00418eaf376b4f2a68a1696d6368c1381310
Gitweb: https://git.kernel.org/tip/b6ad00418eaf376b4f2a68a1696d6368c1381310
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Sat, 13 Apr 2024 16:17:46 +02:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 24 Apr 2024 21:53:34 +02:00
sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU
housekeeping_setup() checks cpumask_intersects(present, online) to ensure
that the kernel will have at least one housekeeping CPU after smp_init(),
but this doesn't work if the maxcpus= kernel parameter limits the number of
processors available after bootup.
For example, a kernel with "maxcpus=2 nohz_full=0-2" parameters crashes at
boot time on a virtual machine with 4 CPUs.
Change housekeeping_setup() to use cpumask_first_and() and check that the
returned CPU number is valid and less than setup_max_cpus.
Another corner case is "nohz_full=0" on a machine with a single CPU or with
the maxcpus=1 kernel argument. In this case non_housekeeping_mask is empty
and tick_nohz_full_setup() makes no sense. And indeed, the kernel hits the
WARN_ON(tick_nohz_full_running) in tick_sched_do_timer().
And how should the kernel interpret the "nohz_full=" parameter? It should
be silently ignored, but currently cpulist_parse() happily returns the
empty cpumask and this leads to the same problem.
Change housekeeping_setup() to check cpumask_empty(non_housekeeping_mask)
and do nothing in this case.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Phil Auld <pauld@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240413141746.GA10008@redhat.com
---
kernel/sched/isolation.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 2a262d3..5891e71 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -118,6 +118,7 @@ static void __init housekeeping_setup_type(enum hk_type type,
static int __init housekeeping_setup(char *str, unsigned long flags)
{
cpumask_var_t non_housekeeping_mask, housekeeping_staging;
+ unsigned int first_cpu;
int err = 0;
if ((flags & HK_FLAG_TICK) && !(housekeeping.flags & HK_FLAG_TICK)) {
@@ -138,7 +139,8 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
cpumask_andnot(housekeeping_staging,
cpu_possible_mask, non_housekeeping_mask);
- if (!cpumask_intersects(cpu_present_mask, housekeeping_staging)) {
+ first_cpu = cpumask_first_and(cpu_present_mask, housekeeping_staging);
+ if (first_cpu >= nr_cpu_ids || first_cpu >= setup_max_cpus) {
__cpumask_set_cpu(smp_processor_id(), housekeeping_staging);
__cpumask_clear_cpu(smp_processor_id(), non_housekeeping_mask);
if (!housekeeping.flags) {
@@ -147,6 +149,9 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
}
}
+ if (cpumask_empty(non_housekeeping_mask))
+ goto free_housekeeping_staging;
+
if (!housekeeping.flags) {
/* First setup call ("nohz_full=" or "isolcpus=") */
enum hk_type type;
* tip-bot2 for Oleg Nesterov <tip-bot2@linutronix.de> wrote: > Another corner case is "nohz_full=0" on a machine with a single CPU or with > the maxcpus=1 kernel argument. In this case non_housekeeping_mask is empty > and tick_nohz_full_setup() makes no sense. And indeed, the kernel hits the > WARN_ON(tick_nohz_full_running) in tick_sched_do_timer(). > > And how should the kernel interpret the "nohz_full=" parameter? It should > be silently ignored, but currently cpulist_parse() happily returns the > empty cpumask and this leads to the same problem. > > Change housekeeping_setup() to check cpumask_empty(non_housekeeping_mask) > and do nothing in this case. So arguably the user meant NOHZ_FULL to be turned off - but it is de-facto already turned off by the fact that there's only a single CPU available, right? Thanks, Ingo
On 04/28, Ingo Molnar wrote: > > * tip-bot2 for Oleg Nesterov <tip-bot2@linutronix.de> wrote: > > > Another corner case is "nohz_full=0" on a machine with a single CPU or with > > the maxcpus=1 kernel argument. In this case non_housekeeping_mask is empty > > and tick_nohz_full_setup() makes no sense. And indeed, the kernel hits the > > WARN_ON(tick_nohz_full_running) in tick_sched_do_timer(). > > > > And how should the kernel interpret the "nohz_full=" parameter? It should > > be silently ignored, but currently cpulist_parse() happily returns the > > empty cpumask and this leads to the same problem. > > > > Change housekeeping_setup() to check cpumask_empty(non_housekeeping_mask) > > and do nothing in this case. > > So arguably the user meant NOHZ_FULL to be turned off - but it is de-facto > already turned off by the fact that there's only a single CPU available, > right? Or the user passes the empty "nohz_full=" mask on a multi-CPU machine. In both cases (before this patch) housekeeping_setup() calls tick_nohz_full_setup(non_housekeeping_mask) which sets tick_nohz_full_running = true even if tick_nohz_full_mask is empty. This doesn't look right to me and triggers the "this should not happen" warning in tick_sched_do_timer(). But let me repeat, I know nothing about nohz/etc. Oleg.
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 257bf89d84121280904800acd25cc2c444c717ae
Gitweb: https://git.kernel.org/tip/257bf89d84121280904800acd25cc2c444c717ae
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Sat, 13 Apr 2024 16:17:46 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sun, 28 Apr 2024 10:08:21 +02:00
sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU
housekeeping_setup() checks cpumask_intersects(present, online) to ensure
that the kernel will have at least one housekeeping CPU after smp_init(),
but this doesn't work if the maxcpus= kernel parameter limits the number of
processors available after bootup.
For example, a kernel with "maxcpus=2 nohz_full=0-2" parameters crashes at
boot time on a virtual machine with 4 CPUs.
Change housekeeping_setup() to use cpumask_first_and() and check that the
returned CPU number is valid and less than setup_max_cpus.
Another corner case is "nohz_full=0" on a machine with a single CPU or with
the maxcpus=1 kernel argument. In this case non_housekeeping_mask is empty
and tick_nohz_full_setup() makes no sense. And indeed, the kernel hits the
WARN_ON(tick_nohz_full_running) in tick_sched_do_timer().
And how should the kernel interpret the "nohz_full=" parameter? It should
be silently ignored, but currently cpulist_parse() happily returns the
empty cpumask and this leads to the same problem.
Change housekeeping_setup() to check cpumask_empty(non_housekeeping_mask)
and do nothing in this case.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240413141746.GA10008@redhat.com
---
kernel/sched/isolation.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 2a262d3..5891e71 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -118,6 +118,7 @@ static void __init housekeeping_setup_type(enum hk_type type,
static int __init housekeeping_setup(char *str, unsigned long flags)
{
cpumask_var_t non_housekeeping_mask, housekeeping_staging;
+ unsigned int first_cpu;
int err = 0;
if ((flags & HK_FLAG_TICK) && !(housekeeping.flags & HK_FLAG_TICK)) {
@@ -138,7 +139,8 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
cpumask_andnot(housekeeping_staging,
cpu_possible_mask, non_housekeeping_mask);
- if (!cpumask_intersects(cpu_present_mask, housekeeping_staging)) {
+ first_cpu = cpumask_first_and(cpu_present_mask, housekeeping_staging);
+ if (first_cpu >= nr_cpu_ids || first_cpu >= setup_max_cpus) {
__cpumask_set_cpu(smp_processor_id(), housekeeping_staging);
__cpumask_clear_cpu(smp_processor_id(), non_housekeeping_mask);
if (!housekeeping.flags) {
@@ -147,6 +149,9 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
}
}
+ if (cpumask_empty(non_housekeeping_mask))
+ goto free_housekeeping_staging;
+
if (!housekeeping.flags) {
/* First setup call ("nohz_full=" or "isolcpus=") */
enum hk_type type;
© 2016 - 2026 Red Hat, Inc.