kernel/time/tick-sched.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
When nohz_full CPU stops tick in tick_nohz_irq_exit(),
It wouldn't be chosen to perform idle load balancing because it doesn't
call nohz_balance_enter_idle() in tick_nohz_idle_stop_tick() when it
becomes idle.
Formerly, __tick_nohz_idle_enter() is called in both
tick_nohz_irq_exit() and in do_idle().
That's why commit a0db971e4eb6 ("nohz: Move idle balancer registration
to the idle path") prevents nohz_full cpu which isn't yet
idle state but tick is stopped from entering idle balance.
However, this prevents nohz_full cpu which already stops tick from
entering idle balacne when this cpu really becomes idle state.
Currently, tick_nohz_idle_stop_tick() is only called in idle state and
it calls nohz_balance_enter_idle(). this function tracks the CPU
which is part of nohz.idle_cpus_mask with rq->nohz_tick_stopped properly.
Therefore, Change tick_nohz_idle_stop_tick() to call nohz_balance_enter_idle()
without checking !was_stopped so that nohz_full cpu can be chosen to
perform idle load balancing when it enters idle state.
Fixes: a0db971e4eb6 ("nohz: Move idle balancer registration to the idle path")
Signed-off-by: Levi Yun <ppbuk5246@gmail.com>
---
v4:
- Add fixes tags.
v3:
- Rewording commit message.
v2:
- Fix typos in commit message.
kernel/time/tick-sched.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 71a792cd8936..31a4cd89782f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1228,8 +1228,10 @@ void tick_nohz_idle_stop_tick(void)
ts->idle_sleeps++;
ts->idle_expires = expires;
- if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
- ts->idle_jiffies = ts->last_jiffies;
+ if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
+ if (!was_stopped)
+ ts->idle_jiffies = ts->last_jiffies;
+
nohz_balance_enter_idle(cpu);
}
} else {
--
2.41.0
Le Thu, May 09, 2024 at 10:29:32AM +0100, Levi Yun a écrit :
> When nohz_full CPU stops tick in tick_nohz_irq_exit(),
> It wouldn't be chosen to perform idle load balancing because it doesn't
> call nohz_balance_enter_idle() in tick_nohz_idle_stop_tick() when it
> becomes idle.
>
> Formerly, __tick_nohz_idle_enter() is called in both
> tick_nohz_irq_exit() and in do_idle().
> That's why commit a0db971e4eb6 ("nohz: Move idle balancer registration
> to the idle path") prevents nohz_full cpu which isn't yet
> idle state but tick is stopped from entering idle balance.
>
> However, this prevents nohz_full cpu which already stops tick from
> entering idle balacne when this cpu really becomes idle state.
>
> Currently, tick_nohz_idle_stop_tick() is only called in idle state and
> it calls nohz_balance_enter_idle(). this function tracks the CPU
> which is part of nohz.idle_cpus_mask with rq->nohz_tick_stopped properly.
>
> Therefore, Change tick_nohz_idle_stop_tick() to call nohz_balance_enter_idle()
> without checking !was_stopped so that nohz_full cpu can be chosen to
> perform idle load balancing when it enters idle state.
>
> Fixes: a0db971e4eb6 ("nohz: Move idle balancer registration to the idle path")
> Signed-off-by: Levi Yun <ppbuk5246@gmail.com>
> ---
> v4:
> - Add fixes tags.
>
> v3:
> - Rewording commit message.
>
> v2:
> - Fix typos in commit message.
>
> kernel/time/tick-sched.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 71a792cd8936..31a4cd89782f 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -1228,8 +1228,10 @@ void tick_nohz_idle_stop_tick(void)
> ts->idle_sleeps++;
> ts->idle_expires = expires;
>
> - if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> - ts->idle_jiffies = ts->last_jiffies;
> + if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> + if (!was_stopped)
> + ts->idle_jiffies = ts->last_jiffies;
> +
I've taken some time to respond because your patch has raised more questions
while discussing this with Anna-Maria:
1) Is Idle load balancing actually relevant for nohz_full? HK_TYPE_MISC already
prevent those CPUs from becoming idle load balancer. They can still be
targets for load balancing but nohz_full CPUs are supposed to run only one
task.
2) This is related to previous point: HK_TYPE_SCHED is never activated. It would
prevent the CPU from even beeing part of idle load balancing. Should we
remove it or plug it?
3) nohz_balance_enter_idle() is called when the tick is stopped for the first
time and nohz_balance_exit_idle() is called from the tick. But that also
applies to idle ticks. So if the load balancing triggers while the tick is
stopped, nohz_balance_enter_idle() won't be re-called in the idle loop even
though the tick is stopped (that would be fixed with your patch).
4) Why is nohz_balance_exit_idle() called from the tick and not from the idle
exit path? Is it to avoid overhead?
I'm adding some scheduler people in Cc who might help answer some of those
questions.
Thanks.
> nohz_balance_enter_idle(cpu);
> }
> } else {
> --
> 2.41.0
On Thu, May 16, 2024 at 12:52:06AM +0200, Frederic Weisbecker wrote:
> Le Thu, May 09, 2024 at 10:29:32AM +0100, Levi Yun a écrit :
> > When nohz_full CPU stops tick in tick_nohz_irq_exit(),
> > It wouldn't be chosen to perform idle load balancing because it doesn't
> > call nohz_balance_enter_idle() in tick_nohz_idle_stop_tick() when it
> > becomes idle.
> >
> > Formerly, __tick_nohz_idle_enter() is called in both
> > tick_nohz_irq_exit() and in do_idle().
> > That's why commit a0db971e4eb6 ("nohz: Move idle balancer registration
> > to the idle path") prevents nohz_full cpu which isn't yet
> > idle state but tick is stopped from entering idle balance.
> >
> > However, this prevents nohz_full cpu which already stops tick from
> > entering idle balacne when this cpu really becomes idle state.
> >
> > Currently, tick_nohz_idle_stop_tick() is only called in idle state and
> > it calls nohz_balance_enter_idle(). this function tracks the CPU
> > which is part of nohz.idle_cpus_mask with rq->nohz_tick_stopped properly.
> >
> > Therefore, Change tick_nohz_idle_stop_tick() to call nohz_balance_enter_idle()
> > without checking !was_stopped so that nohz_full cpu can be chosen to
> > perform idle load balancing when it enters idle state.
> >
> > Fixes: a0db971e4eb6 ("nohz: Move idle balancer registration to the idle path")
> > Signed-off-by: Levi Yun <ppbuk5246@gmail.com>
> > ---
> > v4:
> > - Add fixes tags.
> >
> > v3:
> > - Rewording commit message.
> >
> > v2:
> > - Fix typos in commit message.
> >
> > kernel/time/tick-sched.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 71a792cd8936..31a4cd89782f 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -1228,8 +1228,10 @@ void tick_nohz_idle_stop_tick(void)
> > ts->idle_sleeps++;
> > ts->idle_expires = expires;
> >
> > - if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> > - ts->idle_jiffies = ts->last_jiffies;
> > + if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> > + if (!was_stopped)
> > + ts->idle_jiffies = ts->last_jiffies;
> > +
>
> I've taken some time to respond because your patch has raised more questions
> while discussing this with Anna-Maria:
>
> 1) Is Idle load balancing actually relevant for nohz_full? HK_TYPE_MISC already
> prevent those CPUs from becoming idle load balancer. They can still be
> targets for load balancing but nohz_full CPUs are supposed to run only one
> task.
>
> 2) This is related to previous point: HK_TYPE_SCHED is never activated. It would
> prevent the CPU from even beeing part of idle load balancing. Should we
> remove it or plug it?
>
>
> 3) nohz_balance_enter_idle() is called when the tick is stopped for the first
> time and nohz_balance_exit_idle() is called from the tick. But that also
> applies to idle ticks. So if the load balancing triggers while the tick is
> stopped, nohz_balance_enter_idle() won't be re-called in the idle loop even
> though the tick is stopped (that would be fixed with your patch).
>
> 4) Why is nohz_balance_exit_idle() called from the tick and not from the idle
> exit path? Is it to avoid overhead?
>
> I'm adding some scheduler people in Cc who might help answer some of those
> questions.
None of that HK nonsense is relevant. The NOHZ_FULL nonsense implies
single CPU partitions, and *that* should be avoiding any and all
load-balancing.
If there still is, that's a bug, but that's not related to HK goo.
As such, I don't think the HK_TYPE_SCHED check in
nohz_balance_enter_idle() actually makes sense, the on_null_omain()
check a little below that should already take care of things, no?
> None of that HK nonsense is relevant. The NOHZ_FULL nonsense implies
> single CPU partitions, and *that* should be avoiding any and all
> load-balancing.
Do you mean.. tick_nohz_full cpu (non-HK-ticked cpu) shouldn't belong
to any sched_domain?
>
> If there still is, that's a bug, but that's not related to HK goo.
>
> As such, I don't think the HK_TYPE_SCHED check in
> nohz_balance_enter_idle() actually makes sense, the on_null_omain()
> check a little below that should already take care of things, no?
IIUC,
currently, whether cpu belongs on domain or null is determined by
HK_DOMAIN_FLAGS
However, when "nohz_full=" is used, it still on HK_DOMAIN, so it
belongs to sched_domain
so, it couldn't be filtered out by on_null_domain().
unless "isolcpus=domain" or "isolcpus={cpu_list}", it's on null domain.
with "isolcpus=tick", it participates sched_domain.
On Thu, May 16, 2024 at 09:20:08AM +0100, Yun Levi wrote:
> > None of that HK nonsense is relevant. The NOHZ_FULL nonsense implies
> > single CPU partitions, and *that* should be avoiding any and all
> > load-balancing.
>
> Do you mean.. tick_nohz_full cpu (non-HK-ticked cpu) shouldn't belong
> to any sched_domain?
AFAIK NOHZ_FULL still hard relies on the isolcpus garbage, so yeah, it
should be all single cpu partitions, which don't have a domain.
(this really should migrate to use cpusets partitions)
> > If there still is, that's a bug, but that's not related to HK goo.
> >
> > As such, I don't think the HK_TYPE_SCHED check in
> > nohz_balance_enter_idle() actually makes sense, the on_null_omain()
> > check a little below that should already take care of things, no?
>
> IIUC,
> currently, whether cpu belongs on domain or null is determined by
> HK_DOMAIN_FLAGS
No! you can create NULL domains without any of the HK nonsense. Both
isolcpus and cpusets can create single CPU partitions.
> However, when "nohz_full=" is used, it still on HK_DOMAIN, so it
> belongs to sched_domain
> so, it couldn't be filtered out by on_null_domain().
>
> unless "isolcpus=domain" or "isolcpus={cpu_list}", it's on null domain.
> with "isolcpus=tick", it participates sched_domain.
Frederic ?!? You can use nohz_full without isolcpus? That makes no
sense. If you do that you get to keep the pieces.
On Thu, May 16, 2024 at 10:49:11AM +0200, Peter Zijlstra wrote:
> On Thu, May 16, 2024 at 09:20:08AM +0100, Yun Levi wrote:
> > > None of that HK nonsense is relevant. The NOHZ_FULL nonsense implies
> > > single CPU partitions, and *that* should be avoiding any and all
> > > load-balancing.
> >
> > Do you mean.. tick_nohz_full cpu (non-HK-ticked cpu) shouldn't belong
> > to any sched_domain?
>
> AFAIK NOHZ_FULL still hard relies on the isolcpus garbage, so yeah, it
> should be all single cpu partitions, which don't have a domain.
>
> (this really should migrate to use cpusets partitions)
>
> > > If there still is, that's a bug, but that's not related to HK goo.
> > >
> > > As such, I don't think the HK_TYPE_SCHED check in
> > > nohz_balance_enter_idle() actually makes sense, the on_null_omain()
> > > check a little below that should already take care of things, no?
> >
> > IIUC,
> > currently, whether cpu belongs on domain or null is determined by
> > HK_DOMAIN_FLAGS
>
> No! you can create NULL domains without any of the HK nonsense. Both
> isolcpus and cpusets can create single CPU partitions.
>
> > However, when "nohz_full=" is used, it still on HK_DOMAIN, so it
> > belongs to sched_domain
> > so, it couldn't be filtered out by on_null_domain().
> >
> > unless "isolcpus=domain" or "isolcpus={cpu_list}", it's on null domain.
> > with "isolcpus=tick", it participates sched_domain.
>
> Frederic ?!? You can use nohz_full without isolcpus? That makes no
> sense. If you do that you get to keep the pieces.
I fear you can yes, even though most users combine it with isolcpus. I
know, that interface is terrible but it dates from times when we weren't
sure about all the potential usecases of nohz_full. There was a possibility
that HPC could just want to reduce ticks without all the hard and costly
isolation around. But all the usecases I have witnessed so far in ten years
involved wanting 0 noise after all...
> > > > As such, I don't think the HK_TYPE_SCHED check in
> > > > nohz_balance_enter_idle() actually makes sense, the on_null_omain()
> > > > check a little below that should already take care of things, no?
> > >
> > > IIUC,
> > > currently, whether cpu belongs on domain or null is determined by
> > > HK_DOMAIN_FLAGS
> >
> > No! you can create NULL domains without any of the HK nonsense. Both
> > isolcpus and cpusets can create single CPU partitions.
Yes. However what I said, nohz_full cpu isn't on null_domain
unless it was configured by cpusets.
even with option "nohz_full="
> > > However, when "nohz_full=" is used, it still on HK_DOMAIN, so it
> > > belongs to sched_domain
> > > so, it couldn't be filtered out by on_null_domain().
> > >
> > > unless "isolcpus=domain" or "isolcpus={cpu_list}", it's on null domain.
> > > with "isolcpus=tick", it participates sched_domain.
> >
> > Frederic ?!? You can use nohz_full without isolcpus? That makes no
> > sense. If you do that you get to keep the pieces.
>
> I fear you can yes, even though most users combine it with isolcpus. I
> know, that interface is terrible but it dates from times when we weren't
> sure about all the potential usecases of nohz_full. There was a possibility
> that HPC could just want to reduce ticks without all the hard and costly
> isolation around. But all the usecases I have witnessed so far in ten years
> involved wanting 0 noise after all...
If I make you annoyed I'm sorry in advance but let me clarify please.
1. In case of none-HK-TICK-housekeeping cpu (a.k.a nohz_full cpu),
It should be on the null_domain. right?
2. If (1) is true, when none-HK-TICK is set, should it set none-HK-DOMAIN
to prevent on any sched_domain (cpusets filter out none-HK-DOMAIN cpu)?
3. If (1) is true, Is HK_SCHED still necessary? There seems to be no use case
and the check for this can be replaced by on_null_domain().
Many thanks!
On Thu, May 16, 2024 at 01:43:49PM +0100, Yun Levi wrote:
> > > > > As such, I don't think the HK_TYPE_SCHED check in
> > > > > nohz_balance_enter_idle() actually makes sense, the on_null_omain()
> > > > > check a little below that should already take care of things, no?
> > > >
> > > > IIUC,
> > > > currently, whether cpu belongs on domain or null is determined by
> > > > HK_DOMAIN_FLAGS
> > >
> > > No! you can create NULL domains without any of the HK nonsense. Both
> > > isolcpus and cpusets can create single CPU partitions.
>
> Yes. However what I said, nohz_full cpu isn't on null_domain
> unless it was configured by cpusets.
> even with option "nohz_full="
So if a CPU isn't isolated, either by isolcpus or cpusets, you get to
participate in load-balancing -- end of story.
> > > > However, when "nohz_full=" is used, it still on HK_DOMAIN, so it
> > > > belongs to sched_domain
> > > > so, it couldn't be filtered out by on_null_domain().
> > > >
> > > > unless "isolcpus=domain" or "isolcpus={cpu_list}", it's on null domain.
> > > > with "isolcpus=tick", it participates sched_domain.
> > >
> > > Frederic ?!? You can use nohz_full without isolcpus? That makes no
> > > sense. If you do that you get to keep the pieces.
> >
> > I fear you can yes, even though most users combine it with isolcpus. I
> > know, that interface is terrible but it dates from times when we weren't
> > sure about all the potential usecases of nohz_full. There was a possibility
> > that HPC could just want to reduce ticks without all the hard and costly
> > isolation around. But all the usecases I have witnessed so far in ten years
> > involved wanting 0 noise after all...
>
>
> If I make you annoyed I'm sorry in advance but let me clarify please.
>
> 1. In case of none-HK-TICK-housekeeping cpu (a.k.a nohz_full cpu),
> It should be on the null_domain. right?
>
> 2. If (1) is true, when none-HK-TICK is set, should it set none-HK-DOMAIN
> to prevent on any sched_domain (cpusets filter out none-HK-DOMAIN cpu)?
>
> 3. If (1) is true, Is HK_SCHED still necessary? There seems to be no use case
> and the check for this can be replaced by on_null_domain().
I've no idea about all those HK knobs, it's all insane if you ask me.
Frederic, afaict all the HK_ goo in kernel/sched/fair.c is total
nonsense, can you please explain?
If the CPU participates in load-balancing, it gets to fully participate.
If you want to get out of load-balancing, you get single CPU partitions.
On Thu, May 16, 2024 at 04:00:03PM +0200, Peter Zijlstra wrote: > > If I make you annoyed I'm sorry in advance but let me clarify please. > > > > 1. In case of none-HK-TICK-housekeeping cpu (a.k.a nohz_full cpu), > > It should be on the null_domain. right? > > > > 2. If (1) is true, when none-HK-TICK is set, should it set none-HK-DOMAIN > > to prevent on any sched_domain (cpusets filter out none-HK-DOMAIN cpu)? > > > > 3. If (1) is true, Is HK_SCHED still necessary? There seems to be no use case > > and the check for this can be replaced by on_null_domain(). > > I've no idea about all those HK knobs, it's all insane if you ask me. > > Frederic, afaict all the HK_ goo in kernel/sched/fair.c is total > nonsense, can you please explain? Yes. Lemme unearth this patch: https://lore.kernel.org/all/20230203232409.163847-2-frederic@kernel.org/ Because all we need now is: _ HK_TYPE_KERNEL_NOISE: nohz_full= or isolcpus=nohz _ HK_TYPE_DOMAIN: isolcpus=domain (or classic isolcpus= alone) _ HK_TYPE_MANAGED_IRQ: isolcpus=managed_irq And that's it. Then let's remove HK_TYPE_SCHED that is unused. And then lemme comment the HK_TYPE_* uses within sched/* within the same patchset. Just a question, correct me if I'm wrong, we don't want nohz_full= to ever take the idle load balancer duty (this is what HK_TYPE_MISC prevents in find_new_ilb) because the nohz_full CPU going back to userspace concurrently doesn't want to be disturbed by a loose IPI telling it to do idle balancing. But we still want nohz_full CPUs to be part of nohz.idle_cpus_mask so that the idle balancing can be performed on them by a non isolated CPU. Right? Thanks. > > If the CPU participates in load-balancing, it gets to fully participate. > If you want to get out of load-balancing, you get single CPU partitions.
On Thu, May 16, 2024 at 04:23:31PM +0200, Frederic Weisbecker wrote: > On Thu, May 16, 2024 at 04:00:03PM +0200, Peter Zijlstra wrote: > > > If I make you annoyed I'm sorry in advance but let me clarify please. > > > > > > 1. In case of none-HK-TICK-housekeeping cpu (a.k.a nohz_full cpu), > > > It should be on the null_domain. right? > > > > > > 2. If (1) is true, when none-HK-TICK is set, should it set none-HK-DOMAIN > > > to prevent on any sched_domain (cpusets filter out none-HK-DOMAIN cpu)? > > > > > > 3. If (1) is true, Is HK_SCHED still necessary? There seems to be no use case > > > and the check for this can be replaced by on_null_domain(). > > > > I've no idea about all those HK knobs, it's all insane if you ask me. > > > > Frederic, afaict all the HK_ goo in kernel/sched/fair.c is total > > nonsense, can you please explain? > > Yes. Lemme unearth this patch: > https://lore.kernel.org/all/20230203232409.163847-2-frederic@kernel.org/ AFAICT we need more cleanups. > Because all we need now is: > > _ HK_TYPE_KERNEL_NOISE: nohz_full= or isolcpus=nohz > _ HK_TYPE_DOMAIN: isolcpus=domain (or classic isolcpus= alone) What does this do? > _ HK_TYPE_MANAGED_IRQ: isolcpus=managed_irq > > And that's it. Then let's remove HK_TYPE_SCHED that is unused. And then > lemme comment the HK_TYPE_* uses within sched/* within the same > patchset. Please, I find this MISC and DOMAIN stuff confusing, wth does it do? It can't possibly be right. > Just a question, correct me if I'm wrong, we don't want nohz_full= to ever > take the idle load balancer duty (this is what HK_TYPE_MISC prevents in > find_new_ilb) because the nohz_full CPU going back to userspace concurrently > doesn't want to be disturbed by a loose IPI telling it to do idle balancing. But > we still want nohz_full CPUs to be part of nohz.idle_cpus_mask so that the > idle balancing can be performed on them by a non isolated CPU. Right? I'm confused, none of that makes sense. If you're part of a load-balancer, you're part of a load-balancer, no ifs buts or other nonsense. idle load balancer is no different from regular load balancing. Fundamentally, you can't disable the tick if you're part of a load-balance group, the load-balancer needs the tick. The only possible way to use nohz_full is to not be part of a load-balancer, and the only way that is so is by having (lots of) single CPU partitions.
On Thu, May 16, 2024 at 04:45:04PM +0200, Peter Zijlstra wrote: > On Thu, May 16, 2024 at 04:23:31PM +0200, Frederic Weisbecker wrote: > > On Thu, May 16, 2024 at 04:00:03PM +0200, Peter Zijlstra wrote: > > > > If I make you annoyed I'm sorry in advance but let me clarify please. > > > > > > > > 1. In case of none-HK-TICK-housekeeping cpu (a.k.a nohz_full cpu), > > > > It should be on the null_domain. right? > > > > > > > > 2. If (1) is true, when none-HK-TICK is set, should it set none-HK-DOMAIN > > > > to prevent on any sched_domain (cpusets filter out none-HK-DOMAIN cpu)? > > > > > > > > 3. If (1) is true, Is HK_SCHED still necessary? There seems to be no use case > > > > and the check for this can be replaced by on_null_domain(). > > > > > > I've no idea about all those HK knobs, it's all insane if you ask me. > > > > > > Frederic, afaict all the HK_ goo in kernel/sched/fair.c is total > > > nonsense, can you please explain? > > > > Yes. Lemme unearth this patch: > > https://lore.kernel.org/all/20230203232409.163847-2-frederic@kernel.org/ > > AFAICT we need more cleanups. Well, we need to start somewhere :-) > > > Because all we need now is: > > > > _ HK_TYPE_KERNEL_NOISE: nohz_full= or isolcpus=nohz > > _ HK_TYPE_DOMAIN: isolcpus=domain (or classic isolcpus= alone) > > What does this do? So housekeeping_cpumask(HK_TYPE_KERNEL_NOISE) will return all the CPUs not in nohz_full= That is, all the CPUs that do all the housekeeping work on behalf of nohz_full CPUs (unbound workqueues and timers, etc...). Then in a similar way housekeeping_cpumask(HK_TYPE_DOMAIN) are all the CPUs not in isolcpus= (on_null_domain()). Perhaps that one is confusing and we should just have a simple isolcpus_cpumask instead? > > > _ HK_TYPE_MANAGED_IRQ: isolcpus=managed_irq > > > > And that's it. Then let's remove HK_TYPE_SCHED that is unused. And then > > lemme comment the HK_TYPE_* uses within sched/* within the same > > patchset. > > Please, I find this MISC and DOMAIN stuff confusing, wth does it do? It > can't possibly be right. MISC is actually part of what is going to become HK_TYPE_KERNEL_NOISE. It's an artificial microfeature but it's actually the same as _WORKQUEUE, _TICK, _RCU, _TIMER, etc... All of which intended to be merged together. > > > Just a question, correct me if I'm wrong, we don't want nohz_full= to ever > > take the idle load balancer duty (this is what HK_TYPE_MISC prevents in > > find_new_ilb) because the nohz_full CPU going back to userspace concurrently > > doesn't want to be disturbed by a loose IPI telling it to do idle balancing. But > > we still want nohz_full CPUs to be part of nohz.idle_cpus_mask so that the > > idle balancing can be performed on them by a non isolated CPU. Right? > > I'm confused, none of that makes sense. If you're part of a > load-balancer, you're part of a load-balancer, no ifs buts or other > nonsense. > > idle load balancer is no different from regular load balancing. > > Fundamentally, you can't disable the tick if you're part of a > load-balance group, the load-balancer needs the tick. > > The only possible way to use nohz_full is to not be part of a > load-balancer, and the only way that is so is by having (lots of) single > CPU partitions. So you're suggesting that nohz_full should just be part of the whole ilb machinery by default (that is, not fiddle with ilb internals) and then it's up to CPU partitioning (through cpuset or isolcpus) to disable ilb naturally. Right? Thanks.
On Thu, May 16, 2024 at 05:02:51PM +0200, Frederic Weisbecker wrote: > > I'm confused, none of that makes sense. If you're part of a > > load-balancer, you're part of a load-balancer, no ifs buts or other > > nonsense. > > > > idle load balancer is no different from regular load balancing. > > > > Fundamentally, you can't disable the tick if you're part of a > > load-balance group, the load-balancer needs the tick. > > > > The only possible way to use nohz_full is to not be part of a > > load-balancer, and the only way that is so is by having (lots of) single > > CPU partitions. > > So you're suggesting that nohz_full should just be part of the whole > ilb machinery by default (that is, not fiddle with ilb internals) and > then it's up to CPU partitioning (through cpuset or isolcpus) to disable > ilb naturally. Right? Yes, but stronger, as long as the CPU is part of a load-balance domain, it must not disable the tick while running anything. that is, NOHZ_FULL must not become active unless it's running on a single CPU partition.
On Thu, May 16, 2024 at 05:19:53PM +0200, Peter Zijlstra wrote: > On Thu, May 16, 2024 at 05:02:51PM +0200, Frederic Weisbecker wrote: > > > > I'm confused, none of that makes sense. If you're part of a > > > load-balancer, you're part of a load-balancer, no ifs buts or other > > > nonsense. > > > > > > idle load balancer is no different from regular load balancing. > > > > > > Fundamentally, you can't disable the tick if you're part of a > > > load-balance group, the load-balancer needs the tick. > > > > > > The only possible way to use nohz_full is to not be part of a > > > load-balancer, and the only way that is so is by having (lots of) single > > > CPU partitions. > > > > So you're suggesting that nohz_full should just be part of the whole > > ilb machinery by default (that is, not fiddle with ilb internals) and > > then it's up to CPU partitioning (through cpuset or isolcpus) to disable > > ilb naturally. Right? > > Yes, but stronger, as long as the CPU is part of a load-balance domain, > it must not disable the tick while running anything. > > that is, NOHZ_FULL must not become active unless it's running on a > single CPU partition. I like the idea but I'm afraid to introduce regressions while doing so, with people currently using nohz_full without proper partionning...
On Thu, May 16, 2024 at 05:32:56PM +0200, Frederic Weisbecker wrote: > On Thu, May 16, 2024 at 05:19:53PM +0200, Peter Zijlstra wrote: > > Yes, but stronger, as long as the CPU is part of a load-balance domain, > > it must not disable the tick while running anything. > > > > that is, NOHZ_FULL must not become active unless it's running on a > > single CPU partition. > > I like the idea but I'm afraid to introduce regressions while doing so, > with people currently using nohz_full without proper partionning... There is no regression, if this is possible today it is utterly broken. This should never have been possible.
Le Thu, May 16, 2024 at 07:53:21PM +0200, Peter Zijlstra a écrit : > On Thu, May 16, 2024 at 05:32:56PM +0200, Frederic Weisbecker wrote: > > On Thu, May 16, 2024 at 05:19:53PM +0200, Peter Zijlstra wrote: > > > > Yes, but stronger, as long as the CPU is part of a load-balance domain, > > > it must not disable the tick while running anything. > > > > > > that is, NOHZ_FULL must not become active unless it's running on a > > > single CPU partition. > > > > I like the idea but I'm afraid to introduce regressions while doing so, > > with people currently using nohz_full without proper partionning... > > There is no regression, if this is possible today it is utterly broken. > > This should never have been possible. Ok, I'll try something. Thanks.
> > Yes, but stronger, as long as the CPU is part of a load-balance domain, > > it must not disable the tick while running anything. > > > > that is, NOHZ_FULL must not become active unless it's running on a > > single CPU partition. > > I like the idea but I'm afraid to introduce regressions while doing so, > with people currently using nohz_full without proper partionning... But I wonder when nohz_full cpu enters *idle". If the cpuidle governor selects an idle state which wants "stop_tick", If nohz_full cpu is still in sched domain, should it disable tick or not?
HI. Frederic! Thanks for your reply. > 1) Is Idle load balancing actually relevant for nohz_full? HK_TYPE_MISC already > prevent those CPUs from becoming idle load balancer. They can still be > targets for load balancing but nohz_full CPUs are supposed to run only one > task. I'm not sure nohz_full cpu is really relevant with Idle load balancing. However, I couldn't find any reason it shouldn't participate Idle load balancing. And about HK_TYPE_MISC, this is half true when it sets with "nohz_full=" options, if nohz_full cpu is set via "isolcpus=nohz," it's still in HK_TYPE_MISC mask so find_ilb() doesn't prevent entering idle load balance. > 2) This is related to previous point: HK_TYPE_SCHED is never activated. It would > prevent the CPU from even beeing part of idle load balancing. Should we > remove it or plug it? IMHO, If we want to prevent nohz_full cpu entering, I should set with HK_TYPE_TICK | HK_TYPE_SCHED together. Because, find_new_ilb()'s comment say " HK_TYPE_MISC CPUs are used for this task, because HK_TYPE_SCHED is not set anywhere yet." or integrate with HK_TYPE_TICK....? removing HK_TYPE_SCHED in the hk list... It depends on ilb relevant for nohz_full... If it is, Idle load balance could be controlled with HK_TYPE_SCHED. If it doesn't ,IMHO, It seems better to remove HK_TYPE_SCHED and check with HK_TYPE_SCHED. > 4) Why is nohz_balance_exit_idle() called from the tick and not from the idle > exit path? Is it to avoid overhead? > I'm adding some scheduler people in Cc who might help answer some of those > questions. ... Thanks :)
Gentle ping..?
Am I missing something?
Thanks.
On Thu, May 9, 2024 at 10:29 AM Levi Yun <ppbuk5246@gmail.com> wrote:
>
> When nohz_full CPU stops tick in tick_nohz_irq_exit(),
> It wouldn't be chosen to perform idle load balancing because it doesn't
> call nohz_balance_enter_idle() in tick_nohz_idle_stop_tick() when it
> becomes idle.
>
> Formerly, __tick_nohz_idle_enter() is called in both
> tick_nohz_irq_exit() and in do_idle().
> That's why commit a0db971e4eb6 ("nohz: Move idle balancer registration
> to the idle path") prevents nohz_full cpu which isn't yet
> idle state but tick is stopped from entering idle balance.
>
> However, this prevents nohz_full cpu which already stops tick from
> entering idle balacne when this cpu really becomes idle state.
>
> Currently, tick_nohz_idle_stop_tick() is only called in idle state and
> it calls nohz_balance_enter_idle(). this function tracks the CPU
> which is part of nohz.idle_cpus_mask with rq->nohz_tick_stopped properly.
>
> Therefore, Change tick_nohz_idle_stop_tick() to call nohz_balance_enter_idle()
> without checking !was_stopped so that nohz_full cpu can be chosen to
> perform idle load balancing when it enters idle state.
>
> Fixes: a0db971e4eb6 ("nohz: Move idle balancer registration to the idle path")
> Signed-off-by: Levi Yun <ppbuk5246@gmail.com>
> ---
> v4:
> - Add fixes tags.
>
> v3:
> - Rewording commit message.
>
> v2:
> - Fix typos in commit message.
>
> kernel/time/tick-sched.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 71a792cd8936..31a4cd89782f 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -1228,8 +1228,10 @@ void tick_nohz_idle_stop_tick(void)
> ts->idle_sleeps++;
> ts->idle_expires = expires;
>
> - if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> - ts->idle_jiffies = ts->last_jiffies;
> + if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> + if (!was_stopped)
> + ts->idle_jiffies = ts->last_jiffies;
> +
> nohz_balance_enter_idle(cpu);
> }
> } else {
> --
> 2.41.0
… > Currently, tick_nohz_idle_stop_tick() is only called in idle state and > it calls nohz_balance_enter_idle(). this function tracks the CPU > which is part of nohz.idle_cpus_mask with rq->nohz_tick_stopped properly. … * How did you notice improvement possibilities for such data processing? * I hope that a few remaining wording “weaknesses” can be adjusted accordingly. Regards, Markus
© 2016 - 2025 Red Hat, Inc.