RCU changes for PREEMPT_LAZY

[PATCH v2 0/6] RCU changes for PREEMPT_LAZY

Posted by Ankur Arora 2 weeks, 3 days ago

This series adds RCU and some leftover scheduler bits for lazy
preemption.

The main problem addressed in the RCU related patches is that before
PREEMPT_LAZY, PREEMPTION=y implied PREEMPT_RCU=y. With PREEMPT_LAZY,
that's no longer true. 

That's because PREEMPT_RCU makes some trade-offs to optimize for
latency as opposed to throughput, and configurations with limited
preemption might prefer the stronger forward-progress guarantees of
PREEMPT_RCU=n.

Accordingly, with standalone PREEMPT_LAZY (much like PREEMPT_NONE,
PREEMPT_VOLUNTARY) we want to use PREEMPT_RCU=n. And, when used in
conjunction with PREEMPT_DYNAMIC, we continue to use PREEMPT_RCU=y.

Patches 1 and 2 are cleanup patches:
  "rcu: fix header guard for rcu_all_qs()"
  "rcu: rename PREEMPT_AUTO to PREEMPT_LAZY"

Patch 3, "rcu: limit PREEMPT_RCU configurations", explicitly limits
PREEMPT_RCU=y to the PREEMPT_DYNAMIC or the latency oriented models.

Patches 4 and 5,
  "rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y"
  "osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y"

handle quiescent states for the (PREEMPT_LAZY=y, PREEMPT_RCU=n)
configuration.

And finally, patch 6,
  "sched: warn for high latency with TIF_NEED_RESCHED_LAZY"
adds a high-latency warning for TIF_NEED_RESCHED_LAZY.

Goes on top of PeterZ's tree:

 git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core

Changelog:
  - fixed up incorrect usage of tif_need_resched_lazy() (comment from
    Sebastian Andrzej Siewior)
  - massaged the commit messages a bit
  - dropped the powerpc support for PREEMPT_LAZY as that was orthogonal
    to this series (Shrikanth will send that out separately)

Please review.

Ankur Arora (6):
  rcu: fix header guard for rcu_all_qs()
  rcu: rename PREEMPT_AUTO to PREEMPT_LAZY
  rcu: limit PREEMPT_RCU configurations
  rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
  osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y
  sched: warn for high latency with TIF_NEED_RESCHED_LAZY

 include/linux/rcutree.h      |  2 +-
 include/linux/srcutiny.h     |  2 +-
 kernel/rcu/Kconfig           |  4 ++--
 kernel/rcu/srcutiny.c        | 14 +++++++-------
 kernel/rcu/tree_plugin.h     | 11 +++++++----
 kernel/sched/core.c          |  3 ++-
 kernel/sched/debug.c         |  7 +++++--
 kernel/trace/trace_osnoise.c | 22 ++++++++++++----------
 8 files changed, 37 insertions(+), 28 deletions(-)


Re: [PATCH v2 0/6] RCU changes for PREEMPT_LAZY

Posted by Sebastian Andrzej Siewior 1 week, 2 days ago

On 2024-11-06 12:17:52 [-0800], Ankur Arora wrote:
> This series adds RCU and some leftover scheduler bits for lazy
> preemption.

This is not critical for the current implementation. The way I
understand it, you make a change in 3/6 and then all other patches
in this series are required to deal with this.

For bisect reasons it would make sense to have 3/6 last in the series
and do the "fixes" first before the code is enabled. I mean if you apply
3/6 first then you get build failures without 1/6. But with 3/6 before
5/6 you should get runtime errors, right?

> The main problem addressed in the RCU related patches is that before
> PREEMPT_LAZY, PREEMPTION=y implied PREEMPT_RCU=y. With PREEMPT_LAZY,
> that's no longer true. 

No, you want to make PREEMPTION=y + PREEMPT_RCU=n + PREEMPT_LAZY=y
possible. This is different. Your wording makes it sound like there _is_
an actual problem.

> That's because PREEMPT_RCU makes some trade-offs to optimize for
> latency as opposed to throughput, and configurations with limited
> preemption might prefer the stronger forward-progress guarantees of
> PREEMPT_RCU=n.
> 
> Accordingly, with standalone PREEMPT_LAZY (much like PREEMPT_NONE,
> PREEMPT_VOLUNTARY) we want to use PREEMPT_RCU=n. And, when used in
> conjunction with PREEMPT_DYNAMIC, we continue to use PREEMPT_RCU=y.
> 
> Patches 1 and 2 are cleanup patches:
>   "rcu: fix header guard for rcu_all_qs()"
>   "rcu: rename PREEMPT_AUTO to PREEMPT_LAZY"
> 
> Patch 3, "rcu: limit PREEMPT_RCU configurations", explicitly limits
> PREEMPT_RCU=y to the PREEMPT_DYNAMIC or the latency oriented models.
> 
> Patches 4 and 5,
>   "rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y"
>   "osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y"
> 
> handle quiescent states for the (PREEMPT_LAZY=y, PREEMPT_RCU=n)
> configuration.

I was briefly thinking about 

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5646,8 +5646,11 @@ void sched_tick(void)
 	hw_pressure = arch_scale_hw_pressure(cpu_of(rq));
 	update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure);
 
-	if (dynamic_preempt_lazy() && tif_test_bit(TIF_NEED_RESCHED_LAZY))
+	if (dynamic_preempt_lazy() && tif_test_bit(TIF_NEED_RESCHED_LAZY)) {
 		resched_curr(rq);
+		if (!IS_ENABLED(CONFIG_PREEMPT_RCU))
+			rcu_all_qs();
+	}
 
 	donor->sched_class->task_tick(rq, donor, 0);
 	if (sched_feat(LATENCY_WARN))

which should make #4 and #5 obsolete. But I think it is nicer to have the
change in #4 since it extends the check to cover all cases. And then
we would do it twice just for osnoise.

Sebastian

Re: [PATCH v2 0/6] RCU changes for PREEMPT_LAZY

Posted by Ankur Arora 1 week, 1 day ago

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2024-11-06 12:17:52 [-0800], Ankur Arora wrote:
>> This series adds RCU and some leftover scheduler bits for lazy
>> preemption.
>
> This is not critical for the current implementation. The way I
> understand it, you make a change in 3/6 and then all other patches
> in this series are required to deal with this.
>
> For bisect reasons it would make sense to have 3/6 last in the series
> and do the "fixes" first before the code is enabled. I mean if you apply
> 3/6 first then you get build failures without 1/6. But with 3/6 before
> 5/6 you should get runtime errors, right?

That's a good point. Will reorder.

>> The main problem addressed in the RCU related patches is that before
>> PREEMPT_LAZY, PREEMPTION=y implied PREEMPT_RCU=y. With PREEMPT_LAZY,
>> that's no longer true.
>
> No, you want to make PREEMPTION=y + PREEMPT_RCU=n + PREEMPT_LAZY=y
> possible. This is different. Your wording makes it sound like there _is_
> an actual problem.

That's too literal a reading. It's just the problem ("matter or
situation that is unwelcome" to quote from a dictionary) addressed in
the patches.

>> That's because PREEMPT_RCU makes some trade-offs to optimize for
>> latency as opposed to throughput, and configurations with limited
>> preemption might prefer the stronger forward-progress guarantees of
>> PREEMPT_RCU=n.
>>
>> Accordingly, with standalone PREEMPT_LAZY (much like PREEMPT_NONE,
>> PREEMPT_VOLUNTARY) we want to use PREEMPT_RCU=n. And, when used in
>> conjunction with PREEMPT_DYNAMIC, we continue to use PREEMPT_RCU=y.
>>
>> Patches 1 and 2 are cleanup patches:
>>   "rcu: fix header guard for rcu_all_qs()"
>>   "rcu: rename PREEMPT_AUTO to PREEMPT_LAZY"
>>
>> Patch 3, "rcu: limit PREEMPT_RCU configurations", explicitly limits
>> PREEMPT_RCU=y to the PREEMPT_DYNAMIC or the latency oriented models.
>>
>> Patches 4 and 5,
>>   "rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y"
>>   "osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y"
>>
>> handle quiescent states for the (PREEMPT_LAZY=y, PREEMPT_RCU=n)
>> configuration.
>
> I was briefly thinking about
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5646,8 +5646,11 @@ void sched_tick(void)
>  	hw_pressure = arch_scale_hw_pressure(cpu_of(rq));
>  	update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure);
>
> -	if (dynamic_preempt_lazy() && tif_test_bit(TIF_NEED_RESCHED_LAZY))
> +	if (dynamic_preempt_lazy() && tif_test_bit(TIF_NEED_RESCHED_LAZY)) {
>  		resched_curr(rq);
> +		if (!IS_ENABLED(CONFIG_PREEMPT_RCU))
> +			rcu_all_qs();
> +	}
>
>  	donor->sched_class->task_tick(rq, donor, 0);
>  	if (sched_feat(LATENCY_WARN))
>
> which should make #4 and #5 obsolete. But I think it is nicer to have the
> change in #4 since it extends the check to cover all cases. And then
> we would do it twice just for osnoise.

Yeah, exactly. The check here only deals with this specific case
while the one in rcu_flavor_sched_clock_irq() can handle that more
generally.


Thanks.

--
ankur