[v1] PREEMPT_AUTO: support lazy rescheduling

[PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Ankur Arora 1 year, 12 months ago

With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
states for read-side critical sections via rcu_all_qs().
One reason why this was necessary: lacking preempt-count, the tick
handler has no way of knowing whether it is executing in a read-side
critical section or not.

With PREEMPT_AUTO=y, there can be configurations with (PREEMPT_COUNT=y,
PREEMPT_RCU=n). This means that cond_resched() is a stub which does
not provide for quiescent states via rcu_all_qs().

So, use the availability of preempt_count() to report quiescent states
in rcu_flavor_sched_clock_irq().

Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 kernel/rcu/tree_plugin.h | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 26c79246873a..9b72e9d2b6fe 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -963,13 +963,16 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
  */
 static void rcu_flavor_sched_clock_irq(int user)
 {
-	if (user || rcu_is_cpu_rrupt_from_idle()) {
+	if (user || rcu_is_cpu_rrupt_from_idle() ||
+	    (IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
+	     !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)))) {
 
 		/*
 		 * Get here if this CPU took its interrupt from user
-		 * mode or from the idle loop, and if this is not a
-		 * nested interrupt.  In this case, the CPU is in
-		 * a quiescent state, so note it.
+		 * mode, from the idle loop without this being a nested
+		 * interrupt, or while not holding a preempt count.
+		 * In this case, the CPU is in a quiescent state, so note
+		 * it.
 		 *
 		 * No memory barrier is required here because rcu_qs()
 		 * references only CPU-local variables that other CPUs
-- 
2.31.1

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Joel Fernandes 1 year, 11 months ago

Hello Ankur and Paul,

On Mon, Feb 12, 2024 at 09:55:39PM -0800, Ankur Arora wrote:
> With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
> states for read-side critical sections via rcu_all_qs().
> One reason why this was necessary: lacking preempt-count, the tick
> handler has no way of knowing whether it is executing in a read-side
> critical section or not.
> 
> With PREEMPT_AUTO=y, there can be configurations with (PREEMPT_COUNT=y,
> PREEMPT_RCU=n). This means that cond_resched() is a stub which does
> not provide for quiescent states via rcu_all_qs().
> 
> So, use the availability of preempt_count() to report quiescent states
> in rcu_flavor_sched_clock_irq().
> 
> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
>  kernel/rcu/tree_plugin.h | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 26c79246873a..9b72e9d2b6fe 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -963,13 +963,16 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
>   */
>  static void rcu_flavor_sched_clock_irq(int user)
>  {
> -	if (user || rcu_is_cpu_rrupt_from_idle()) {
> +	if (user || rcu_is_cpu_rrupt_from_idle() ||
> +	    (IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
> +	     !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)))) {

I was wondering if it makes sense to even support !PREEMPT_RCU under
CONFIG_PREEMPT_AUTO.

AFAIU, this CONFIG_PREEMPT_AUTO series preempts the kernel on
the next tick boundary in the worst case, with all preempt modes including
the preempt=none mode.

Considering this, does it makes sense for RCU to be non-preemptible in
CONFIG_PREEMPT_AUTO=y? Because if that were the case, and a read-side critical
section extended beyond the tick, then it prevents the PREEMPT_AUTO preemption
from happening, because rcu_read_lock() would preempt_disable().

To that end, I wonder if CONFIG_PREEMPT_AUTO should select CONFIG_PREEMPTION
(or CONFIG_PREEMPT_BUILD, not sure which) as well because it does cause
kernel preemption. That then forces selection of CONFIG_PREEMPT_RCU as well.

thanks,

 - Joel







>  
>  		/*
>  		 * Get here if this CPU took its interrupt from user
> -		 * mode or from the idle loop, and if this is not a
> -		 * nested interrupt.  In this case, the CPU is in
> -		 * a quiescent state, so note it.
> +		 * mode, from the idle loop without this being a nested
> +		 * interrupt, or while not holding a preempt count.
> +		 * In this case, the CPU is in a quiescent state, so note
> +		 * it.
>  		 *
>  		 * No memory barrier is required here because rcu_qs()
>  		 * references only CPU-local variables that other CPUs
> -- 
> 2.31.1
>

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Paul E. McKenney 1 year, 11 months ago

On Sun, Mar 10, 2024 at 06:03:30AM -0400, Joel Fernandes wrote:
> Hello Ankur and Paul,
> 
> On Mon, Feb 12, 2024 at 09:55:39PM -0800, Ankur Arora wrote:
> > With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
> > states for read-side critical sections via rcu_all_qs().
> > One reason why this was necessary: lacking preempt-count, the tick
> > handler has no way of knowing whether it is executing in a read-side
> > critical section or not.
> > 
> > With PREEMPT_AUTO=y, there can be configurations with (PREEMPT_COUNT=y,
> > PREEMPT_RCU=n). This means that cond_resched() is a stub which does
> > not provide for quiescent states via rcu_all_qs().
> > 
> > So, use the availability of preempt_count() to report quiescent states
> > in rcu_flavor_sched_clock_irq().
> > 
> > Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> > Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> > ---
> >  kernel/rcu/tree_plugin.h | 11 +++++++----
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 26c79246873a..9b72e9d2b6fe 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -963,13 +963,16 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
> >   */
> >  static void rcu_flavor_sched_clock_irq(int user)
> >  {
> > -	if (user || rcu_is_cpu_rrupt_from_idle()) {
> > +	if (user || rcu_is_cpu_rrupt_from_idle() ||
> > +	    (IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
> > +	     !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)))) {
> 
> I was wondering if it makes sense to even support !PREEMPT_RCU under
> CONFIG_PREEMPT_AUTO.
> 
> AFAIU, this CONFIG_PREEMPT_AUTO series preempts the kernel on
> the next tick boundary in the worst case, with all preempt modes including
> the preempt=none mode.
> 
> Considering this, does it makes sense for RCU to be non-preemptible in
> CONFIG_PREEMPT_AUTO=y? Because if that were the case, and a read-side critical
> section extended beyond the tick, then it prevents the PREEMPT_AUTO preemption
> from happening, because rcu_read_lock() would preempt_disable().

Yes, it does make sense for RCU to be non-preemptible in kernels
built with CONFIG_PREEMPT_AUTO=y and either CONFIG_PREEMPT_NONE=y or
CONFIG_PREEMPT_VOLUNTARY=y.  As noted in earlier discussions, there are
systems that are adequately but not abundantly endowed with memory.
Such systems need non-preemptible RCU to avoid preempted-reader OOMs.
Note well that non-preemptible RCU explicitly disables preemption across
all RCU readers.

							Thanx, Paul


> To that end, I wonder if CONFIG_PREEMPT_AUTO should select CONFIG_PREEMPTION
> (or CONFIG_PREEMPT_BUILD, not sure which) as well because it does cause
> kernel preemption. That then forces selection of CONFIG_PREEMPT_RCU as well.
> 
> thanks,
> 
>  - Joel
> 
> 
> 
> 
> 
> 
> 
> >  
> >  		/*
> >  		 * Get here if this CPU took its interrupt from user
> > -		 * mode or from the idle loop, and if this is not a
> > -		 * nested interrupt.  In this case, the CPU is in
> > -		 * a quiescent state, so note it.
> > +		 * mode, from the idle loop without this being a nested
> > +		 * interrupt, or while not holding a preempt count.
> > +		 * In this case, the CPU is in a quiescent state, so note
> > +		 * it.
> >  		 *
> >  		 * No memory barrier is required here because rcu_qs()
> >  		 * references only CPU-local variables that other CPUs
> > -- 
> > 2.31.1
> >

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Joel Fernandes 1 year, 11 months ago

On 3/10/2024 2:56 PM, Paul E. McKenney wrote:
> On Sun, Mar 10, 2024 at 06:03:30AM -0400, Joel Fernandes wrote:
>> Hello Ankur and Paul,
>>
>> On Mon, Feb 12, 2024 at 09:55:39PM -0800, Ankur Arora wrote:
>>> With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
>>> states for read-side critical sections via rcu_all_qs().
>>> One reason why this was necessary: lacking preempt-count, the tick
>>> handler has no way of knowing whether it is executing in a read-side
>>> critical section or not.
>>>
>>> With PREEMPT_AUTO=y, there can be configurations with (PREEMPT_COUNT=y,
>>> PREEMPT_RCU=n). This means that cond_resched() is a stub which does
>>> not provide for quiescent states via rcu_all_qs().
>>>
>>> So, use the availability of preempt_count() to report quiescent states
>>> in rcu_flavor_sched_clock_irq().
>>>
>>> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
>>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>> ---
>>>  kernel/rcu/tree_plugin.h | 11 +++++++----
>>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>> index 26c79246873a..9b72e9d2b6fe 100644
>>> --- a/kernel/rcu/tree_plugin.h
>>> +++ b/kernel/rcu/tree_plugin.h
>>> @@ -963,13 +963,16 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
>>>   */
>>>  static void rcu_flavor_sched_clock_irq(int user)
>>>  {
>>> -	if (user || rcu_is_cpu_rrupt_from_idle()) {
>>> +	if (user || rcu_is_cpu_rrupt_from_idle() ||
>>> +	    (IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
>>> +	     !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)))) {
>>
>> I was wondering if it makes sense to even support !PREEMPT_RCU under
>> CONFIG_PREEMPT_AUTO.
>>
>> AFAIU, this CONFIG_PREEMPT_AUTO series preempts the kernel on
>> the next tick boundary in the worst case, with all preempt modes including
>> the preempt=none mode.
>>
>> Considering this, does it makes sense for RCU to be non-preemptible in
>> CONFIG_PREEMPT_AUTO=y? Because if that were the case, and a read-side critical
>> section extended beyond the tick, then it prevents the PREEMPT_AUTO preemption
>> from happening, because rcu_read_lock() would preempt_disable().
> 
> Yes, it does make sense for RCU to be non-preemptible in kernels
> built with CONFIG_PREEMPT_AUTO=y and either CONFIG_PREEMPT_NONE=y or
> CONFIG_PREEMPT_VOLUNTARY=y.
> As noted in earlier discussions, there are

Sorry if I missed a discussion, appreciate a link.

> systems that are adequately but not abundantly endowed with memory.
> Such systems need non-preemptible RCU to avoid preempted-reader OOMs.

Then why don't such systems have a problem with CONFIG_PREEMPT_DYNAMIC=y and
preempt=none mode? CONFIG_PREEMPT_DYNAMIC forces CONFIG_PREEMPT_RCU=y. There's
no way to set CONFIG_PREEMPT_RCU=n with CONFIG_PREEMPT_DYNAMIC=y and
preempt=none boot parameter.  IMHO, if this feature is inconsistent with
CONFIG_PREEMPT_DYNAMIC, that makes it super confusing.  In fact, I feel
CONFIG_PREEMPT_AUTO should instead just be another "preempt=auto" boot parameter
mode added to CONFIG_PREEMPT_DYNAMIC feature, otherwise the proliferation of
CONFIG_PREEMPT config options is getting a bit insane. And likely going to be
burden to the users configuring the PREEMPT Kconfig option IMHO.

> Note well that non-preemptible RCU explicitly disables preemption across
> all RCU readers.

Yes, I mentioned this 'disabling preemption' aspect in my last email. My point
being, unlike CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_AUTO allows for kernel
preemption in preempt=none. So the "Don't preempt the kernel" behavior has
changed. That is, preempt=none under CONFIG_PREEMPT_AUTO is different from
CONFIG_PREEMPT_NONE=y already. Here we *are* preempting. And RCU is getting on
the way. It is like saying, you want an option for CONFIG_PREEMPT_RCU to be set
to =n for CONFIG_PREEMPT=y kernels, sighting users who want a fully-preemptible
kernel but are worried about reader preemptions.

That aside, as such, I do agree your point of view, that preemptible readers
presents a problem to folks using preempt=none in this series and we could
decide to keep CONFIG_PREEMPT_RCU optional for whoever wants it that way. I was
just saying that I want CONFIG_PREEMPT_AUTO's preempt=none mode to be somewhat
consistent with CONFIG_PREEMPT_DYNAMIC's preempt=none. Because I'm pretty sure a
week from now, no one will likely be able to tell the difference ;-). So IMHO
either CONFIG_PREEMPT_DYNAMIC should be changed to make CONFIG_PREEMPT_RCU
optional, or this series should be altered to force CONFIG_PREEMPT_RCU=y.

Let me know if I missed something.

Thanks!

 - Joel

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Ankur Arora 1 year, 11 months ago

Joel Fernandes <joel@joelfernandes.org> writes:

> On 3/10/2024 2:56 PM, Paul E. McKenney wrote:
>> On Sun, Mar 10, 2024 at 06:03:30AM -0400, Joel Fernandes wrote:
>>> Hello Ankur and Paul,
>>>
>>> On Mon, Feb 12, 2024 at 09:55:39PM -0800, Ankur Arora wrote:
>>>> With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
>>>> states for read-side critical sections via rcu_all_qs().
>>>> One reason why this was necessary: lacking preempt-count, the tick
>>>> handler has no way of knowing whether it is executing in a read-side
>>>> critical section or not.
>>>>
>>>> With PREEMPT_AUTO=y, there can be configurations with (PREEMPT_COUNT=y,
>>>> PREEMPT_RCU=n). This means that cond_resched() is a stub which does
>>>> not provide for quiescent states via rcu_all_qs().
>>>>
>>>> So, use the availability of preempt_count() to report quiescent states
>>>> in rcu_flavor_sched_clock_irq().
>>>>
>>>> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
>>>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>>> ---
>>>>  kernel/rcu/tree_plugin.h | 11 +++++++----
>>>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>>> index 26c79246873a..9b72e9d2b6fe 100644
>>>> --- a/kernel/rcu/tree_plugin.h
>>>> +++ b/kernel/rcu/tree_plugin.h
>>>> @@ -963,13 +963,16 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
>>>>   */
>>>>  static void rcu_flavor_sched_clock_irq(int user)
>>>>  {
>>>> -	if (user || rcu_is_cpu_rrupt_from_idle()) {
>>>> +	if (user || rcu_is_cpu_rrupt_from_idle() ||
>>>> +	    (IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
>>>> +	     !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)))) {
>>>
>>> I was wondering if it makes sense to even support !PREEMPT_RCU under
>>> CONFIG_PREEMPT_AUTO.
>>>
>>> AFAIU, this CONFIG_PREEMPT_AUTO series preempts the kernel on
>>> the next tick boundary in the worst case, with all preempt modes including
>>> the preempt=none mode.
>>>
>>> Considering this, does it makes sense for RCU to be non-preemptible in
>>> CONFIG_PREEMPT_AUTO=y? Because if that were the case, and a read-side critical
>>> section extended beyond the tick, then it prevents the PREEMPT_AUTO preemption
>>> from happening, because rcu_read_lock() would preempt_disable().
>>
>> Yes, it does make sense for RCU to be non-preemptible in kernels
>> built with CONFIG_PREEMPT_AUTO=y and either CONFIG_PREEMPT_NONE=y or
>> CONFIG_PREEMPT_VOLUNTARY=y.
>> As noted in earlier discussions, there are
>
> Sorry if I missed a discussion, appreciate a link.
>
>> systems that are adequately but not abundantly endowed with memory.
>> Such systems need non-preemptible RCU to avoid preempted-reader OOMs.
>
> Then why don't such systems have a problem with CONFIG_PREEMPT_DYNAMIC=y and
> preempt=none mode? CONFIG_PREEMPT_DYNAMIC forces CONFIG_PREEMPT_RCU=y. There's
> no way to set CONFIG_PREEMPT_RCU=n with CONFIG_PREEMPT_DYNAMIC=y and
> preempt=none boot parameter.  IMHO, if this feature is inconsistent with
> CONFIG_PREEMPT_DYNAMIC, that makes it super confusing.  In fact, I feel
> CONFIG_PREEMPT_AUTO should instead just be another "preempt=auto" boot parameter
> mode added to CONFIG_PREEMPT_DYNAMIC feature, otherwise the proliferation of
> CONFIG_PREEMPT config options is getting a bit insane. And likely going to be
> burden to the users configuring the PREEMPT Kconfig option IMHO.
>
>> Note well that non-preemptible RCU explicitly disables preemption across
>> all RCU readers.
>
> Yes, I mentioned this 'disabling preemption' aspect in my last email. My point
> being, unlike CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_AUTO allows for kernel
> preemption in preempt=none. So the "Don't preempt the kernel" behavior has
> changed. That is, preempt=none under CONFIG_PREEMPT_AUTO is different from
> CONFIG_PREEMPT_NONE=y already. Here we *are* preempting. And RCU is getting on

I think that's a view from too close to the implementation. Someone
using the kernel is not necessarily concered with whether tasks are
preempted or not. They are concerned with throughput and latency.

Framed thus:

preempt=none: tasks typically run to completion, might result in high latency
preempt=full: preempt at the earliest opportunity, low latency
preempt=voluntary: somewhere between these two

In that case you could argue that CONFIG_PREEMPT_NONE,
(CONFIG_PREEMPT_DYNAMIC, preempt=none) and (CONFIG_PREEMPT_AUTO,
preempt=none) have broadly similar behaviour.

> the way. It is like saying, you want an option for CONFIG_PREEMPT_RCU to be set
> to =n for CONFIG_PREEMPT=y kernels, sighting users who want a fully-preemptible
> kernel but are worried about reader preemptions.
>
> That aside, as such, I do agree your point of view, that preemptible readers
> presents a problem to folks using preempt=none in this series and we could
> decide to keep CONFIG_PREEMPT_RCU optional for whoever wants it that way. I was
> just saying that I want CONFIG_PREEMPT_AUTO's preempt=none mode to be somewhat
> consistent with CONFIG_PREEMPT_DYNAMIC's preempt=none. Because I'm pretty sure a

PREEMPT_DYNAMIC and PREEMPT_AUTO are trying to do different tasks.

PREEMPT_DYNAMIC: allow dynamic switching between the original
PREEMPT_NONE, PREEMPT_VOLUNTARY, PREEEMPT models.

PREEMPT_AUTO: remove the need for explicit preemption points, by
 - bringing the scheduling model under complete control of the
   scheduler
 - always having unconditional preemption (and using it to varying
   degrees of strictness based on the preemption model in effect.)

So, even though PREEMPT_AUTO does use PREEMPT_NONE, PREEMPT_VOLUNTARY,
PREEMPT options, those are just meant to loosely identify with Linux's
preemption models, and the intent is not to be identical to it -- they
can't be identical because the underlying implementation is completely
different.

The eventual hope is that we can get rid of explicit preemption points.

> week from now, no one will likely be able to tell the difference ;-). So IMHO
> either CONFIG_PREEMPT_DYNAMIC should be changed to make CONFIG_PREEMPT_RCU
> optional, or this series should be altered to force CONFIG_PREEMPT_RCU=y.
>
I think that's a patch for CONFIG_PREEMPT_DYNAMIC :).

From earlier discussions on this, I'm convinced that PREEMPT_AUTO, at
least where the user has explicitly opted for throughput,
(PREEMPT_{NONE,VOLUNTARY}), should support the non-preemptible RCU variant.

Thanks

--
ankur

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Joel Fernandes 1 year, 11 months ago


On 3/11/2024 1:18 AM, Ankur Arora wrote:
> 
> Joel Fernandes <joel@joelfernandes.org> writes:
> 
>> On 3/10/2024 2:56 PM, Paul E. McKenney wrote:
>>> On Sun, Mar 10, 2024 at 06:03:30AM -0400, Joel Fernandes wrote:
>>>> Hello Ankur and Paul,
>>>>
>>>> On Mon, Feb 12, 2024 at 09:55:39PM -0800, Ankur Arora wrote:
>>>>> With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
>>>>> states for read-side critical sections via rcu_all_qs().
>>>>> One reason why this was necessary: lacking preempt-count, the tick
>>>>> handler has no way of knowing whether it is executing in a read-side
>>>>> critical section or not.
>>>>>
>>>>> With PREEMPT_AUTO=y, there can be configurations with (PREEMPT_COUNT=y,
>>>>> PREEMPT_RCU=n). This means that cond_resched() is a stub which does
>>>>> not provide for quiescent states via rcu_all_qs().
>>>>>
>>>>> So, use the availability of preempt_count() to report quiescent states
>>>>> in rcu_flavor_sched_clock_irq().
>>>>>
>>>>> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
>>>>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>>>> ---
>>>>>  kernel/rcu/tree_plugin.h | 11 +++++++----
>>>>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>>>> index 26c79246873a..9b72e9d2b6fe 100644
>>>>> --- a/kernel/rcu/tree_plugin.h
>>>>> +++ b/kernel/rcu/tree_plugin.h
>>>>> @@ -963,13 +963,16 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
>>>>>   */
>>>>>  static void rcu_flavor_sched_clock_irq(int user)
>>>>>  {
>>>>> -	if (user || rcu_is_cpu_rrupt_from_idle()) {
>>>>> +	if (user || rcu_is_cpu_rrupt_from_idle() ||
>>>>> +	    (IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
>>>>> +	     !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)))) {
>>>>
>>>> I was wondering if it makes sense to even support !PREEMPT_RCU under
>>>> CONFIG_PREEMPT_AUTO.
>>>>
>>>> AFAIU, this CONFIG_PREEMPT_AUTO series preempts the kernel on
>>>> the next tick boundary in the worst case, with all preempt modes including
>>>> the preempt=none mode.
>>>>
>>>> Considering this, does it makes sense for RCU to be non-preemptible in
>>>> CONFIG_PREEMPT_AUTO=y? Because if that were the case, and a read-side critical
>>>> section extended beyond the tick, then it prevents the PREEMPT_AUTO preemption
>>>> from happening, because rcu_read_lock() would preempt_disable().
>>>
>>> Yes, it does make sense for RCU to be non-preemptible in kernels
>>> built with CONFIG_PREEMPT_AUTO=y and either CONFIG_PREEMPT_NONE=y or
>>> CONFIG_PREEMPT_VOLUNTARY=y.
>>> As noted in earlier discussions, there are
>>
>> Sorry if I missed a discussion, appreciate a link.
>>
>>> systems that are adequately but not abundantly endowed with memory.
>>> Such systems need non-preemptible RCU to avoid preempted-reader OOMs.
>>
>> Then why don't such systems have a problem with CONFIG_PREEMPT_DYNAMIC=y and
>> preempt=none mode? CONFIG_PREEMPT_DYNAMIC forces CONFIG_PREEMPT_RCU=y. There's
>> no way to set CONFIG_PREEMPT_RCU=n with CONFIG_PREEMPT_DYNAMIC=y and
>> preempt=none boot parameter.  IMHO, if this feature is inconsistent with
>> CONFIG_PREEMPT_DYNAMIC, that makes it super confusing.  In fact, I feel
>> CONFIG_PREEMPT_AUTO should instead just be another "preempt=auto" boot parameter
>> mode added to CONFIG_PREEMPT_DYNAMIC feature, otherwise the proliferation of
>> CONFIG_PREEMPT config options is getting a bit insane. And likely going to be
>> burden to the users configuring the PREEMPT Kconfig option IMHO.
>>
>>> Note well that non-preemptible RCU explicitly disables preemption across
>>> all RCU readers.
>>
>> Yes, I mentioned this 'disabling preemption' aspect in my last email. My point
>> being, unlike CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_AUTO allows for kernel
>> preemption in preempt=none. So the "Don't preempt the kernel" behavior has
>> changed. That is, preempt=none under CONFIG_PREEMPT_AUTO is different from
>> CONFIG_PREEMPT_NONE=y already. Here we *are* preempting. And RCU is getting on
> 
> I think that's a view from too close to the implementation. Someone
> using the kernel is not necessarily concered with whether tasks are
> preempted or not. They are concerned with throughput and latency.

No, we are not only talking about that (throughput/latency). We are also talking
about the issue related to RCU reader-preemption causing OOM (well and that
could hurt both throughput and latency as well).

With CONFIG_PREEMPT_AUTO=y, you now preempt in the preempt=none mode. Something
very different from the classical CONFIG_PREEMPT_NONE=y.

Essentially this means preemption is now more aggressive from the point of view
of a preempt=none user. I was suggesting that, a point of view could be RCU
should always support preepmtiblity (don't give PREEEMPT_RCU=n option) because
AUTO *does preempt* unlike classic CONFIG_PREEMPT_NONE. Otherwise it is
inconsistent -- say with CONFIG_PREEMPT=y (another *preemption mode*) which
forces CONFIG_PREEMPT_RCU. However to Paul's point, we need to worry about those
users who are concerned with running out of memory due to reader preemption.

In that vain, maybe we should also support CONFIG_PREEMPT_RCU=n for
CONFIG_PREEMPT=y as well. There are plenty of popular systems with relatively
low memory that need low latency (like some low-end devices / laptops :-)).

> Framed thus:
> 
> preempt=none: tasks typically run to completion, might result in high latency
> preempt=full: preempt at the earliest opportunity, low latency
> preempt=voluntary: somewhere between these two
> 
> In that case you could argue that CONFIG_PREEMPT_NONE,
> (CONFIG_PREEMPT_DYNAMIC, preempt=none) and (CONFIG_PREEMPT_AUTO,
> preempt=none) have broadly similar behaviour.

Yes, in that respect I agree.

> 
>> the way. It is like saying, you want an option for CONFIG_PREEMPT_RCU to be set
>> to =n for CONFIG_PREEMPT=y kernels, sighting users who want a fully-preemptible
>> kernel but are worried about reader preemptions.
>>
>> That aside, as such, I do agree your point of view, that preemptible readers
>> presents a problem to folks using preempt=none in this series and we could
>> decide to keep CONFIG_PREEMPT_RCU optional for whoever wants it that way. I was
>> just saying that I want CONFIG_PREEMPT_AUTO's preempt=none mode to be somewhat
>> consistent with CONFIG_PREEMPT_DYNAMIC's preempt=none. Because I'm pretty sure a
> 
> PREEMPT_DYNAMIC and PREEMPT_AUTO are trying to do different tasks.
> 
> PREEMPT_DYNAMIC: allow dynamic switching between the original
> PREEMPT_NONE, PREEMPT_VOLUNTARY, PREEEMPT models.
> 
> PREEMPT_AUTO: remove the need for explicit preemption points, by
>  - bringing the scheduling model under complete control of the
>    scheduler
>  - always having unconditional preemption (and using it to varying
>    degrees of strictness based on the preemption model in effect.)
> 
> So, even though PREEMPT_AUTO does use PREEMPT_NONE, PREEMPT_VOLUNTARY,
> PREEMPT options, those are just meant to loosely identify with Linux's
> preemption models, and the intent is not to be identical to it -- they
> can't be identical because the underlying implementation is completely
> different.
> 
> The eventual hope is that we can get rid of explicit preemption points.

Sounds good.

>> week from now, no one will likely be able to tell the difference ;-). So IMHO
>> either CONFIG_PREEMPT_DYNAMIC should be changed to make CONFIG_PREEMPT_RCU
>> optional, or this series should be altered to force CONFIG_PREEMPT_RCU=y.
>>
> I think that's a patch for CONFIG_PREEMPT_DYNAMIC :).

Yes, I'm curious to explore it more. Specifically, with DYNAMIC=y, I'll explore
during my free time about how different does preempt=none behave from an RCU PoV
(say versus CONFIG_PREEMPT_NONE). I will also continue reviewing these patches.

Thanks!

 - Joel

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Thomas Gleixner 1 year, 11 months ago

On Mon, Mar 11 2024 at 11:25, Joel Fernandes wrote:
> On 3/11/2024 1:18 AM, Ankur Arora wrote:
>>> Yes, I mentioned this 'disabling preemption' aspect in my last email. My point
>>> being, unlike CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_AUTO allows for kernel
>>> preemption in preempt=none. So the "Don't preempt the kernel" behavior has
>>> changed. That is, preempt=none under CONFIG_PREEMPT_AUTO is different from
>>> CONFIG_PREEMPT_NONE=y already. Here we *are* preempting. And RCU is getting on
>> 
>> I think that's a view from too close to the implementation. Someone
>> using the kernel is not necessarily concered with whether tasks are
>> preempted or not. They are concerned with throughput and latency.
>
> No, we are not only talking about that (throughput/latency). We are also talking
> about the issue related to RCU reader-preemption causing OOM (well and that
> could hurt both throughput and latency as well).

That happens only when PREEMPT_RCU=y. For PREEMPT_RCU=n the read side
critical sections still have preemption disabled.

> With CONFIG_PREEMPT_AUTO=y, you now preempt in the preempt=none mode. Something
> very different from the classical CONFIG_PREEMPT_NONE=y.

In PREEMPT_RCU=y and preempt=none mode this happens only when really
required, i.e. when the task does not schedule out or returns to user
space on time, or when a higher scheduling class task gets runnable. For
the latter the jury is still out whether this should be done or just
lazily defered like the SCHED_OTHER preemption requests.

In any case for that to matter this forced preemption would need to
preempt a RCU read side critical section and then keep the preempted
task away from the CPU for a long time.

That's very different from the unconditional kernel preemption model which
preempt=full provides and only marginally different from the existing
PREEMPT_NONE model. I know there might be dragons, but I'm not convinced
yet that this is an actual problem.

OTOH, doesn't PREEMPT_RCU=y have mechanism to mitigate that already?

> Essentially this means preemption is now more aggressive from the point of view
> of a preempt=none user. I was suggesting that, a point of view could be RCU
> should always support preepmtiblity (don't give PREEEMPT_RCU=n option) because
> AUTO *does preempt* unlike classic CONFIG_PREEMPT_NONE. Otherwise it is
> inconsistent -- say with CONFIG_PREEMPT=y (another *preemption mode*) which
> forces CONFIG_PREEMPT_RCU. However to Paul's point, we need to worry about those
> users who are concerned with running out of memory due to reader
> preemption.

What's wrong with the combination of PREEMPT_AUTO=y and PREEMPT_RCU=n?
Paul and me agreed long ago that this needs to be supported.

> In that vain, maybe we should also support CONFIG_PREEMPT_RCU=n for
> CONFIG_PREEMPT=y as well. There are plenty of popular systems with relatively
> low memory that need low latency (like some low-end devices / laptops
> :-)).

I'm not sure whether that's useful as the goal is to get rid of all the
CONFIG_PREEMPT_FOO options, no?

I'd rather spend brain cycles on figuring out whether RCU can be flipped
over between PREEMPT_RCU=n/y at boot or obviously run-time.

Thanks,

        tglx

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Joel Fernandes 1 year, 11 months ago

Hi, Thomas,
Thanks for your reply! I replied below.

On 3/11/2024 3:12 PM, Thomas Gleixner wrote:
> On Mon, Mar 11 2024 at 11:25, Joel Fernandes wrote:
>> On 3/11/2024 1:18 AM, Ankur Arora wrote:
>>>> Yes, I mentioned this 'disabling preemption' aspect in my last email. My point
>>>> being, unlike CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_AUTO allows for kernel
>>>> preemption in preempt=none. So the "Don't preempt the kernel" behavior has
>>>> changed. That is, preempt=none under CONFIG_PREEMPT_AUTO is different from
>>>> CONFIG_PREEMPT_NONE=y already. Here we *are* preempting. And RCU is getting on
>>>
>>> I think that's a view from too close to the implementation. Someone
>>> using the kernel is not necessarily concered with whether tasks are
>>> preempted or not. They are concerned with throughput and latency.
>>
>> No, we are not only talking about that (throughput/latency). We are also talking
>> about the issue related to RCU reader-preemption causing OOM (well and that
>> could hurt both throughput and latency as well).
> 
> That happens only when PREEMPT_RCU=y. For PREEMPT_RCU=n the read side
> critical sections still have preemption disabled.

Sorry, let me clarify. And please forgive my noise but it is just a point of
view. CONFIG_PREEMPT_AUTO always preempts sooner or later, even for
preempt=none. A point of view could be, if you are preempting anyway (like
CONFIG_PREEMPT=y), then why bother with disabling CONFIG_PREEMPT_RCU or even
give it as an option. After all, with CONFIG_PREEMPT=y, you cannot do
CONFIG_PREEMPT_RCU=n.  It is just a point of view, while we are still discussing
this patch series ahead of its potential merge.

>> With CONFIG_PREEMPT_AUTO=y, you now preempt in the preempt=none mode. Something
>> very different from the classical CONFIG_PREEMPT_NONE=y.
> 
> In PREEMPT_RCU=y and preempt=none mode this happens only when really
> required, i.e. when the task does not schedule out or returns to user
> space on time, or when a higher scheduling class task gets runnable. For
> the latter the jury is still out whether this should be done or just
> lazily defered like the SCHED_OTHER preemption requests.
> 
> In any case for that to matter this forced preemption would need to
> preempt a RCU read side critical section and then keep the preempted
> task away from the CPU for a long time.
> 
> That's very different from the unconditional kernel preemption model which
> preempt=full provides and only marginally different from the existing
> PREEMPT_NONE model. I know there might be dragons, but I'm not convinced
> yet that this is an actual problem.

Sure it is less aggressive than a full preemption, still a preemption
nonetheless, so its quirky in the regard of whether or not RCU preemption is
provided as an option (as I mentioned as a point-of-view above).

> OTOH, doesn't PREEMPT_RCU=y have mechanism to mitigate that already?

It does. But that sounds more in favor of forcing PREEMPT_RCU=y for AUTO since
such mitigation will help those concerns that AUTO users would need
PREEMPT_RCU=y (?).

>> Essentially this means preemption is now more aggressive from the point of view
>> of a preempt=none user. I was suggesting that, a point of view could be RCU
>> should always support preepmtiblity (don't give PREEEMPT_RCU=n option) because
>> AUTO *does preempt* unlike classic CONFIG_PREEMPT_NONE. Otherwise it is
>> inconsistent -- say with CONFIG_PREEMPT=y (another *preemption mode*) which
>> forces CONFIG_PREEMPT_RCU. However to Paul's point, we need to worry about those
>> users who are concerned with running out of memory due to reader
>> preemption.
> 
> What's wrong with the combination of PREEMPT_AUTO=y and PREEMPT_RCU=n?
> Paul and me agreed long ago that this needs to be supported.

There's nothing wrong with it. Its just a bit quirky (again just a point of
view), that for a configuration that causes preemption (similar to
CONFIG_PREEMPT=y), that PREEMPT_RCU can be disabled. After all, again with
CONFIG_PREEMPT=y, PREEMPT_RCU cannot be currently disabled.

>> In that vain, maybe we should also support CONFIG_PREEMPT_RCU=n for
>> CONFIG_PREEMPT=y as well. There are plenty of popular systems with relatively
>> low memory that need low latency (like some low-end devices / laptops
>> :-)).
> 
> I'm not sure whether that's useful as the goal is to get rid of all the
> CONFIG_PREEMPT_FOO options, no?

I think I may have lost you here, how does forcing or not forcing
CONFIG_PREEMPT_RCU relate to getting rid of CONFIG options? There's no new
CONFIG options added one way or the other.

> I'd rather spend brain cycles on figuring out whether RCU can be flipped
> over between PREEMPT_RCU=n/y at boot or obviously run-time.

Yes I agree with that actually, and I see Paul provided some detailed thoughts
in a reply to you in your quest to get him to write a novel as you put it ;-). I
am Ok with our providing preemptible-RCU as an option with AUTO and it could be
documented well that this is possible. I am fully on board also with the
sentiment of getting rid of the zoo of CONFIG_PREEMPT options!

thanks,

-  Joel

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Ankur Arora 1 year, 11 months ago

Joel Fernandes <joel@joelfernandes.org> writes:

> Hi, Thomas,
> Thanks for your reply! I replied below.
>
> On 3/11/2024 3:12 PM, Thomas Gleixner wrote:
>> On Mon, Mar 11 2024 at 11:25, Joel Fernandes wrote:

   [ ... ]

>> What's wrong with the combination of PREEMPT_AUTO=y and PREEMPT_RCU=n?
>> Paul and me agreed long ago that this needs to be supported.
>
> There's nothing wrong with it. Its just a bit quirky (again just a point of
> view), that for a configuration that causes preemption (similar to
> CONFIG_PREEMPT=y), that PREEMPT_RCU can be disabled. After all, again with
> CONFIG_PREEMPT=y, PREEMPT_RCU cannot be currently disabled.

I think the argument was that PREEMPT_RCU=y is suboptimal for certain
workloads, and those configurations might prefer the stronger
forward-progress guarantees that PREEMPT_RCU=n provides.

See this:
https://lore.kernel.org/lkml/73ecce1c-d321-4579-b892-13b1e0a0620a@paulmck-laptop/T/#m6aab5a6fd5f1fd4c3dc9282ce564e64f2fa6cdc3

and the surrounding thread.

Thanks

--
ankur

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Joel Fernandes 1 year, 11 months ago


On 3/11/2024 11:16 PM, Ankur Arora wrote:
> 
> Joel Fernandes <joel@joelfernandes.org> writes:
> 
>> Hi, Thomas,
>> Thanks for your reply! I replied below.
>>
>> On 3/11/2024 3:12 PM, Thomas Gleixner wrote:
>>> On Mon, Mar 11 2024 at 11:25, Joel Fernandes wrote:
> 
>    [ ... ]
> 
>>> What's wrong with the combination of PREEMPT_AUTO=y and PREEMPT_RCU=n?
>>> Paul and me agreed long ago that this needs to be supported.
>>
>> There's nothing wrong with it. Its just a bit quirky (again just a point of
>> view), that for a configuration that causes preemption (similar to
>> CONFIG_PREEMPT=y), that PREEMPT_RCU can be disabled. After all, again with
>> CONFIG_PREEMPT=y, PREEMPT_RCU cannot be currently disabled.
> 
> I think the argument was that PREEMPT_RCU=y is suboptimal for certain
> workloads, and those configurations might prefer the stronger
> forward-progress guarantees that PREEMPT_RCU=n provides.
> 
> See this:
> https://lore.kernel.org/lkml/73ecce1c-d321-4579-b892-13b1e0a0620a@paulmck-laptop/T/#m6aab5a6fd5f1fd4c3dc9282ce564e64f2fa6cdc3
> 
> and the surrounding thread.

Thanks for the link. Sorry for any noise due to being late to the party. Based
on the discussions, I concur with everyone on the goal of getting rid of
CONFIG_PREEMPT_DYNAMIC and the various cond_resched()/might_sleep() things. I'll
also go look harder at what else we need to get CONFIG_PREEMPT_RCU=y/n working
with CONFIG_PREEMPT_AUTO=y.

thanks,

- Joel

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Ankur Arora 1 year, 11 months ago

Joel Fernandes <joel@joelfernandes.org> writes:

> On 3/11/2024 11:16 PM, Ankur Arora wrote:
>>
>> Joel Fernandes <joel@joelfernandes.org> writes:
>>
>>> Hi, Thomas,
>>> Thanks for your reply! I replied below.
>>>
>>> On 3/11/2024 3:12 PM, Thomas Gleixner wrote:
>>>> On Mon, Mar 11 2024 at 11:25, Joel Fernandes wrote:
>>
>>    [ ... ]
>>
>>>> What's wrong with the combination of PREEMPT_AUTO=y and PREEMPT_RCU=n?
>>>> Paul and me agreed long ago that this needs to be supported.
>>>
>>> There's nothing wrong with it. Its just a bit quirky (again just a point of
>>> view), that for a configuration that causes preemption (similar to
>>> CONFIG_PREEMPT=y), that PREEMPT_RCU can be disabled. After all, again with
>>> CONFIG_PREEMPT=y, PREEMPT_RCU cannot be currently disabled.
>>
>> I think the argument was that PREEMPT_RCU=y is suboptimal for certain
>> workloads, and those configurations might prefer the stronger
>> forward-progress guarantees that PREEMPT_RCU=n provides.
>>
>> See this:
>> https://lore.kernel.org/lkml/73ecce1c-d321-4579-b892-13b1e0a0620a@paulmck-laptop/T/#m6aab5a6fd5f1fd4c3dc9282ce564e64f2fa6cdc3
>>
>> and the surrounding thread.
>
> Thanks for the link. Sorry for any noise due to being late to the party. Based
> on the discussions, I concur with everyone on the goal of getting rid of

No worries. Given the unending context, easy enough to miss.

> CONFIG_PREEMPT_DYNAMIC and the various cond_resched()/might_sleep() things. I'll
> also go look harder at what else we need to get CONFIG_PREEMPT_RCU=y/n working
> with CONFIG_PREEMPT_AUTO=y.

Sounds great. Thanks.

And, please keep the review comments coming.

--
ankur

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Paul E. McKenney 1 year, 11 months ago

On Mon, Mar 11, 2024 at 08:12:58PM +0100, Thomas Gleixner wrote:
> On Mon, Mar 11 2024 at 11:25, Joel Fernandes wrote:
> > On 3/11/2024 1:18 AM, Ankur Arora wrote:
> >>> Yes, I mentioned this 'disabling preemption' aspect in my last email. My point
> >>> being, unlike CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_AUTO allows for kernel
> >>> preemption in preempt=none. So the "Don't preempt the kernel" behavior has
> >>> changed. That is, preempt=none under CONFIG_PREEMPT_AUTO is different from
> >>> CONFIG_PREEMPT_NONE=y already. Here we *are* preempting. And RCU is getting on
> >> 
> >> I think that's a view from too close to the implementation. Someone
> >> using the kernel is not necessarily concered with whether tasks are
> >> preempted or not. They are concerned with throughput and latency.
> >
> > No, we are not only talking about that (throughput/latency). We are also talking
> > about the issue related to RCU reader-preemption causing OOM (well and that
> > could hurt both throughput and latency as well).
> 
> That happens only when PREEMPT_RCU=y. For PREEMPT_RCU=n the read side
> critical sections still have preemption disabled.
> 
> > With CONFIG_PREEMPT_AUTO=y, you now preempt in the preempt=none mode. Something
> > very different from the classical CONFIG_PREEMPT_NONE=y.
> 
> In PREEMPT_RCU=y and preempt=none mode this happens only when really
> required, i.e. when the task does not schedule out or returns to user
> space on time, or when a higher scheduling class task gets runnable. For
> the latter the jury is still out whether this should be done or just
> lazily defered like the SCHED_OTHER preemption requests.
> 
> In any case for that to matter this forced preemption would need to
> preempt a RCU read side critical section and then keep the preempted
> task away from the CPU for a long time.
> 
> That's very different from the unconditional kernel preemption model which
> preempt=full provides and only marginally different from the existing
> PREEMPT_NONE model. I know there might be dragons, but I'm not convinced
> yet that this is an actual problem.
> 
> OTOH, doesn't PREEMPT_RCU=y have mechanism to mitigate that already?

You are right, it does, CONFIG_RCU_BOOST=y.

> > Essentially this means preemption is now more aggressive from the point of view
> > of a preempt=none user. I was suggesting that, a point of view could be RCU
> > should always support preepmtiblity (don't give PREEEMPT_RCU=n option) because
> > AUTO *does preempt* unlike classic CONFIG_PREEMPT_NONE. Otherwise it is
> > inconsistent -- say with CONFIG_PREEMPT=y (another *preemption mode*) which
> > forces CONFIG_PREEMPT_RCU. However to Paul's point, we need to worry about those
> > users who are concerned with running out of memory due to reader
> > preemption.
> 
> What's wrong with the combination of PREEMPT_AUTO=y and PREEMPT_RCU=n?
> Paul and me agreed long ago that this needs to be supported.
> 
> > In that vain, maybe we should also support CONFIG_PREEMPT_RCU=n for
> > CONFIG_PREEMPT=y as well. There are plenty of popular systems with relatively
> > low memory that need low latency (like some low-end devices / laptops
> > :-)).
> 
> I'm not sure whether that's useful as the goal is to get rid of all the
> CONFIG_PREEMPT_FOO options, no?
> 
> I'd rather spend brain cycles on figuring out whether RCU can be flipped
> over between PREEMPT_RCU=n/y at boot or obviously run-time.

Well, it is just software, so anything is possible.  But there can
be a wide gap between "possible" and "sensible".  ;-)

In theory, one boot-time approach would be build preemptible RCU,
and then to boot-time binary-rewrite calls to __rcu_read_lock()
and __rcu_read_unlock() to preempt_disable() and preempt_enable(),
respectively.  Because preemptible RCU has to treat preemption-disabled
regions of code as RCU readers, this Should Just Work.  However, there
would then be a lot of needless branches in the grace-period code.
Only the ones on fastpaths (for example, context switch) would need
to be static-branchified, but there would likely need to be other
restructuring, given the need for current preemptible RCU to do a better
job of emulating non-preemptible RCU.  (Emulating non-preemptible RCU
is of course currently a complete non-goal for preemptible RCU.)

So maybe?

But this one needs careful design and review up front, as in step through
all the code and check assumptions and changes in behavior.  After all,
this stuff is way easier to break than to debug and fix.  ;-)

On the other hand, making RCU switch at runtime is...  Tricky.

For example, if the system was in non-preemptible mode at rcu_read_lock()
time, the corresponding rcu_read_unlock() needs to be aware that it needs
to act as if the system was still in non-preemptible mode, and vice versa.
Grace period processing during the switch needs to be aware that different
CPUs will be switching at different times.  Also, it will be common for a
given CPU's switch to span more than one grace period.  So any approach
based on either binary rewrite or static branches will need to be set
up in a multi-phase multi-grace-period state machine.  Sort of like
Frederic's runtime-switched callback offloading, but rather more complex,
and way more performance sensitive.

But do we even need to switch RCU at runtime, other than to say that
we did it?  What is the use case?  Or is this just a case of "it would
be cool!"?  Don't get me wrong, I am a sucker for "it would be cool",
as you well know, but even for me there are limits.  ;-)

At the moment, I would prioritize improving quiescent-state forcing for
existing RCU over this, especially perhaps given the concerns from the
MM folks.

But what is motivating the desire to boot-time/run-time switch RCU
between preemptible and non-preemptible?

							Thanx, Paul

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Thomas Gleixner 1 year, 11 months ago

Paul!

On Mon, Mar 11 2024 at 12:53, Paul E. McKenney wrote:
> On Mon, Mar 11, 2024 at 08:12:58PM +0100, Thomas Gleixner wrote:
>> I'd rather spend brain cycles on figuring out whether RCU can be flipped
>> over between PREEMPT_RCU=n/y at boot or obviously run-time.
>
> Well, it is just software, so anything is possible.  But there can
> be a wide gap between "possible" and "sensible".  ;-)

Indeed.

> In theory, one boot-time approach would be build preemptible RCU,
> and then to boot-time binary-rewrite calls to __rcu_read_lock()
> and __rcu_read_unlock() to preempt_disable() and preempt_enable(),
> respectively.  Because preemptible RCU has to treat preemption-disabled
> regions of code as RCU readers, this Should Just Work.  However, there
> would then be a lot of needless branches in the grace-period code.
> Only the ones on fastpaths (for example, context switch) would need
> to be static-branchified, but there would likely need to be other
> restructuring, given the need for current preemptible RCU to do a better
> job of emulating non-preemptible RCU.  (Emulating non-preemptible RCU
> is of course currently a complete non-goal for preemptible RCU.)

Sure, but that might be a path to have a more unified RCU model in the
longer term with the tweak of patching the flavor once at boot.

> But this one needs careful design and review up front, as in step through
> all the code and check assumptions and changes in behavior.  After all,
> this stuff is way easier to break than to debug and fix.  ;-)

Isn't that where all the real fun is? :)

> On the other hand, making RCU switch at runtime is...  Tricky.

I was only half serious about that. Just thought to mention if for
completeness sake and of course to make you write a novel. :)

> For example, if the system was in non-preemptible mode at rcu_read_lock()
> time, the corresponding rcu_read_unlock() needs to be aware that it needs
> to act as if the system was still in non-preemptible mode, and vice versa.
> Grace period processing during the switch needs to be aware that different
> CPUs will be switching at different times.  Also, it will be common for a
> given CPU's switch to span more than one grace period.  So any approach
> based on either binary rewrite or static branches will need to be set
> up in a multi-phase multi-grace-period state machine.  Sort of like
> Frederic's runtime-switched callback offloading, but rather more complex,
> and way more performance sensitive.

Of course it would be a complex endavour at the scale of run-time
switching to or from PREEMPT_RT, which will never happen. Even the boot
time switch for RT would be way harder than the RCU one :)

> But do we even need to switch RCU at runtime, other than to say that
> we did it?  What is the use case?  Or is this just a case of "it would
> be cool!"?  Don't get me wrong, I am a sucker for "it would be cool",
> as you well know, but even for me there are limits.  ;-)

There is no need for runtime switching other than "it would be cool" and
I'm happy that even you have limits. :)

> At the moment, I would prioritize improving quiescent-state forcing for
> existing RCU over this, especially perhaps given the concerns from the
> MM folks.
>
> But what is motivating the desire to boot-time/run-time switch RCU
> between preemptible and non-preemptible?

The goal of PREEMPT_AUTO is to get rid of the zoo of preemption models
and their associated horrors. As we debated some time ago we still need
to have the distinction of preemptible and non-preemptible RCU even with
that.

So it's a pretty obvious consequence to make this decision at boot time
to reduce the number of kernel flavors for distros to build.

Nothing urgent, but it just came to my mind when reading through this
thread and replying to Joel.

Thanks,

        tglx

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Paul E. McKenney 1 year, 11 months ago

On Mon, Mar 11, 2024 at 09:29:14PM +0100, Thomas Gleixner wrote:
> Paul!
> 
> On Mon, Mar 11 2024 at 12:53, Paul E. McKenney wrote:
> > On Mon, Mar 11, 2024 at 08:12:58PM +0100, Thomas Gleixner wrote:
> >> I'd rather spend brain cycles on figuring out whether RCU can be flipped
> >> over between PREEMPT_RCU=n/y at boot or obviously run-time.
> >
> > Well, it is just software, so anything is possible.  But there can
> > be a wide gap between "possible" and "sensible".  ;-)
> 
> Indeed.
> 
> > In theory, one boot-time approach would be build preemptible RCU,
> > and then to boot-time binary-rewrite calls to __rcu_read_lock()
> > and __rcu_read_unlock() to preempt_disable() and preempt_enable(),
> > respectively.  Because preemptible RCU has to treat preemption-disabled
> > regions of code as RCU readers, this Should Just Work.  However, there
> > would then be a lot of needless branches in the grace-period code.
> > Only the ones on fastpaths (for example, context switch) would need
> > to be static-branchified, but there would likely need to be other
> > restructuring, given the need for current preemptible RCU to do a better
> > job of emulating non-preemptible RCU.  (Emulating non-preemptible RCU
> > is of course currently a complete non-goal for preemptible RCU.)
> 
> Sure, but that might be a path to have a more unified RCU model in the
> longer term with the tweak of patching the flavor once at boot.

I am pretty sure that it can be made to happen.  The big question in my
mind is "Is it worth it?"

> > But this one needs careful design and review up front, as in step through
> > all the code and check assumptions and changes in behavior.  After all,
> > this stuff is way easier to break than to debug and fix.  ;-)
> 
> Isn't that where all the real fun is? :)
> 
> > On the other hand, making RCU switch at runtime is...  Tricky.
> 
> I was only half serious about that. Just thought to mention if for
> completeness sake and of course to make you write a novel. :)

A short novel.  ;-)

> > For example, if the system was in non-preemptible mode at rcu_read_lock()
> > time, the corresponding rcu_read_unlock() needs to be aware that it needs
> > to act as if the system was still in non-preemptible mode, and vice versa.
> > Grace period processing during the switch needs to be aware that different
> > CPUs will be switching at different times.  Also, it will be common for a
> > given CPU's switch to span more than one grace period.  So any approach
> > based on either binary rewrite or static branches will need to be set
> > up in a multi-phase multi-grace-period state machine.  Sort of like
> > Frederic's runtime-switched callback offloading, but rather more complex,
> > and way more performance sensitive.
> 
> Of course it would be a complex endavour at the scale of run-time
> switching to or from PREEMPT_RT, which will never happen. Even the boot
> time switch for RT would be way harder than the RCU one :)

Run-time switching of either RCU or PREEMPT_RT would likely provide many
opportunities along the way for black hats.  ;-)

> > But do we even need to switch RCU at runtime, other than to say that
> > we did it?  What is the use case?  Or is this just a case of "it would
> > be cool!"?  Don't get me wrong, I am a sucker for "it would be cool",
> > as you well know, but even for me there are limits.  ;-)
> 
> There is no need for runtime switching other than "it would be cool" and
> I'm happy that even you have limits. :)

So *that* was the point of this whole email exchange.  ;-)

> > At the moment, I would prioritize improving quiescent-state forcing for
> > existing RCU over this, especially perhaps given the concerns from the
> > MM folks.
> >
> > But what is motivating the desire to boot-time/run-time switch RCU
> > between preemptible and non-preemptible?
> 
> The goal of PREEMPT_AUTO is to get rid of the zoo of preemption models
> and their associated horrors. As we debated some time ago we still need
> to have the distinction of preemptible and non-preemptible RCU even with
> that.
> 
> So it's a pretty obvious consequence to make this decision at boot time
> to reduce the number of kernel flavors for distros to build.

Very well, we can look to the distros for requirements, when and if.

> Nothing urgent, but it just came to my mind when reading through this
> thread and replying to Joel.

Got it, thank you!

							Thanx, Paul

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Paul E. McKenney 1 year, 11 months ago

On Sun, Mar 10, 2024 at 08:48:28PM -0400, Joel Fernandes wrote:
> On 3/10/2024 2:56 PM, Paul E. McKenney wrote:
> > On Sun, Mar 10, 2024 at 06:03:30AM -0400, Joel Fernandes wrote:
> >> Hello Ankur and Paul,
> >>
> >> On Mon, Feb 12, 2024 at 09:55:39PM -0800, Ankur Arora wrote:
> >>> With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
> >>> states for read-side critical sections via rcu_all_qs().
> >>> One reason why this was necessary: lacking preempt-count, the tick
> >>> handler has no way of knowing whether it is executing in a read-side
> >>> critical section or not.
> >>>
> >>> With PREEMPT_AUTO=y, there can be configurations with (PREEMPT_COUNT=y,
> >>> PREEMPT_RCU=n). This means that cond_resched() is a stub which does
> >>> not provide for quiescent states via rcu_all_qs().
> >>>
> >>> So, use the availability of preempt_count() to report quiescent states
> >>> in rcu_flavor_sched_clock_irq().
> >>>
> >>> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> >>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> >>> ---
> >>>  kernel/rcu/tree_plugin.h | 11 +++++++----
> >>>  1 file changed, 7 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> >>> index 26c79246873a..9b72e9d2b6fe 100644
> >>> --- a/kernel/rcu/tree_plugin.h
> >>> +++ b/kernel/rcu/tree_plugin.h
> >>> @@ -963,13 +963,16 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
> >>>   */
> >>>  static void rcu_flavor_sched_clock_irq(int user)
> >>>  {
> >>> -	if (user || rcu_is_cpu_rrupt_from_idle()) {
> >>> +	if (user || rcu_is_cpu_rrupt_from_idle() ||
> >>> +	    (IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
> >>> +	     !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)))) {
> >>
> >> I was wondering if it makes sense to even support !PREEMPT_RCU under
> >> CONFIG_PREEMPT_AUTO.
> >>
> >> AFAIU, this CONFIG_PREEMPT_AUTO series preempts the kernel on
> >> the next tick boundary in the worst case, with all preempt modes including
> >> the preempt=none mode.
> >>
> >> Considering this, does it makes sense for RCU to be non-preemptible in
> >> CONFIG_PREEMPT_AUTO=y? Because if that were the case, and a read-side critical
> >> section extended beyond the tick, then it prevents the PREEMPT_AUTO preemption
> >> from happening, because rcu_read_lock() would preempt_disable().
> > 
> > Yes, it does make sense for RCU to be non-preemptible in kernels
> > built with CONFIG_PREEMPT_AUTO=y and either CONFIG_PREEMPT_NONE=y or
> > CONFIG_PREEMPT_VOLUNTARY=y.
> > As noted in earlier discussions, there are
> 
> Sorry if I missed a discussion, appreciate a link.

It is part of the discussion of the first version of this patch series,
if I recall correctly.

> > systems that are adequately but not abundantly endowed with memory.
> > Such systems need non-preemptible RCU to avoid preempted-reader OOMs.
> 
> Then why don't such systems have a problem with CONFIG_PREEMPT_DYNAMIC=y and
> preempt=none mode? CONFIG_PREEMPT_DYNAMIC forces CONFIG_PREEMPT_RCU=y. There's
> no way to set CONFIG_PREEMPT_RCU=n with CONFIG_PREEMPT_DYNAMIC=y and
> preempt=none boot parameter.  IMHO, if this feature is inconsistent with
> CONFIG_PREEMPT_DYNAMIC, that makes it super confusing.  In fact, I feel
> CONFIG_PREEMPT_AUTO should instead just be another "preempt=auto" boot parameter
> mode added to CONFIG_PREEMPT_DYNAMIC feature, otherwise the proliferation of
> CONFIG_PREEMPT config options is getting a bit insane. And likely going to be
> burden to the users configuring the PREEMPT Kconfig option IMHO.

Because such systems are built with CONFIG_PREEMPT_DYNAMIC=n.

You could argue that we should just build with CONFIG_PREEMPT_AUTO=n,
but the long-term goal of eliminating cond_resched() will make that
ineffective.

> > Note well that non-preemptible RCU explicitly disables preemption across
> > all RCU readers.
> 
> Yes, I mentioned this 'disabling preemption' aspect in my last email. My point
> being, unlike CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_AUTO allows for kernel
> preemption in preempt=none. So the "Don't preempt the kernel" behavior has
> changed. That is, preempt=none under CONFIG_PREEMPT_AUTO is different from
> CONFIG_PREEMPT_NONE=y already. Here we *are* preempting. And RCU is getting on
> the way. It is like saying, you want an option for CONFIG_PREEMPT_RCU to be set
> to =n for CONFIG_PREEMPT=y kernels, sighting users who want a fully-preemptible
> kernel but are worried about reader preemptions.

Such users can simply avoid building with either CONFIG_PREEMPT_NONE=y
or with CONFIG_PREEMPT_VOLUNTARY=y.  They might also experiment with
CONFIG_RCU_BOOST=y, and also with short timeouts until boosting.
If that doesn't do what they need, we talk with them and either help
them configure their kernels, make RCU do what they need, or help work
out some other way for them to get their jobs done.

> That aside, as such, I do agree your point of view, that preemptible readers
> presents a problem to folks using preempt=none in this series and we could
> decide to keep CONFIG_PREEMPT_RCU optional for whoever wants it that way. I was
> just saying that I want CONFIG_PREEMPT_AUTO's preempt=none mode to be somewhat
> consistent with CONFIG_PREEMPT_DYNAMIC's preempt=none. Because I'm pretty sure a
> week from now, no one will likely be able to tell the difference ;-). So IMHO
> either CONFIG_PREEMPT_DYNAMIC should be changed to make CONFIG_PREEMPT_RCU
> optional, or this series should be altered to force CONFIG_PREEMPT_RCU=y.
> 
> Let me know if I missed something.

Why not key off of the value of CONFIG_PREEMPT_DYNAMIC?  That way,
if both CONFIG_PREEMPT_AUTO=y and CONFIG_PREEMPT_DYNAMIC=y, RCU is
always preemptible.  Then CONFIG_PREEMPT_DYNAMIC=y enables boot-time
(and maybe even run-time) switching between preemption flavors, while
CONFIG_PREEMPT_AUTO instead enables unconditional preemption of any
region of code that has not explicitly disabled preemption (or irq or
bh or whatever).

But one way or another, we really do need non-preemptible RCU in
conjunction with CONFIG_PREEMPT_AUTO=y.

Also, I don't yet see CONFIG_PREEMPT_AUTO in -next.

							Thanx, Paul

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Joel Fernandes 1 year, 11 months ago


On 3/10/2024 11:56 PM, Paul E. McKenney wrote:
> On Sun, Mar 10, 2024 at 08:48:28PM -0400, Joel Fernandes wrote:
>> On 3/10/2024 2:56 PM, Paul E. McKenney wrote:
>>> On Sun, Mar 10, 2024 at 06:03:30AM -0400, Joel Fernandes wrote:
>>>> Hello Ankur and Paul,
>>>>
>>>> On Mon, Feb 12, 2024 at 09:55:39PM -0800, Ankur Arora wrote:
>>>>> With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
>>>>> states for read-side critical sections via rcu_all_qs().
>>>>> One reason why this was necessary: lacking preempt-count, the tick
>>>>> handler has no way of knowing whether it is executing in a read-side
>>>>> critical section or not.
>>>>>
>>>>> With PREEMPT_AUTO=y, there can be configurations with (PREEMPT_COUNT=y,
>>>>> PREEMPT_RCU=n). This means that cond_resched() is a stub which does
>>>>> not provide for quiescent states via rcu_all_qs().
>>>>>
>>>>> So, use the availability of preempt_count() to report quiescent states
>>>>> in rcu_flavor_sched_clock_irq().
>>>>>
>>>>> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
>>>>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>>>> ---
>>>>>  kernel/rcu/tree_plugin.h | 11 +++++++----
>>>>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>>>> index 26c79246873a..9b72e9d2b6fe 100644
>>>>> --- a/kernel/rcu/tree_plugin.h
>>>>> +++ b/kernel/rcu/tree_plugin.h
>>>>> @@ -963,13 +963,16 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
>>>>>   */
>>>>>  static void rcu_flavor_sched_clock_irq(int user)
>>>>>  {
>>>>> -	if (user || rcu_is_cpu_rrupt_from_idle()) {
>>>>> +	if (user || rcu_is_cpu_rrupt_from_idle() ||
>>>>> +	    (IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
>>>>> +	     !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)))) {
>>>>
>>>> I was wondering if it makes sense to even support !PREEMPT_RCU under
>>>> CONFIG_PREEMPT_AUTO.
>>>>
>>>> AFAIU, this CONFIG_PREEMPT_AUTO series preempts the kernel on
>>>> the next tick boundary in the worst case, with all preempt modes including
>>>> the preempt=none mode.
>>>>
>>>> Considering this, does it makes sense for RCU to be non-preemptible in
>>>> CONFIG_PREEMPT_AUTO=y? Because if that were the case, and a read-side critical
>>>> section extended beyond the tick, then it prevents the PREEMPT_AUTO preemption
>>>> from happening, because rcu_read_lock() would preempt_disable().
>>>
>>> Yes, it does make sense for RCU to be non-preemptible in kernels
>>> built with CONFIG_PREEMPT_AUTO=y and either CONFIG_PREEMPT_NONE=y or
>>> CONFIG_PREEMPT_VOLUNTARY=y.
>>> As noted in earlier discussions, there are
>>
>> Sorry if I missed a discussion, appreciate a link.
> 
> It is part of the discussion of the first version of this patch series,
> if I recall correctly.
> 
>>> systems that are adequately but not abundantly endowed with memory.
>>> Such systems need non-preemptible RCU to avoid preempted-reader OOMs.
>>
>> Then why don't such systems have a problem with CONFIG_PREEMPT_DYNAMIC=y and
>> preempt=none mode? CONFIG_PREEMPT_DYNAMIC forces CONFIG_PREEMPT_RCU=y. There's
>> no way to set CONFIG_PREEMPT_RCU=n with CONFIG_PREEMPT_DYNAMIC=y and
>> preempt=none boot parameter.  IMHO, if this feature is inconsistent with
>> CONFIG_PREEMPT_DYNAMIC, that makes it super confusing.  In fact, I feel
>> CONFIG_PREEMPT_AUTO should instead just be another "preempt=auto" boot parameter
>> mode added to CONFIG_PREEMPT_DYNAMIC feature, otherwise the proliferation of
>> CONFIG_PREEMPT config options is getting a bit insane. And likely going to be
>> burden to the users configuring the PREEMPT Kconfig option IMHO.
> 
> Because such systems are built with CONFIG_PREEMPT_DYNAMIC=n.
> 
> You could argue that we should just build with CONFIG_PREEMPT_AUTO=n,
> but the long-term goal of eliminating cond_resched() will make that
> ineffective.

I see what you mean. We/I could also highlight some of the differences in RCU
between DYNAMIC vs AUTO.

> 
>>> Note well that non-preemptible RCU explicitly disables preemption across
>>> all RCU readers.
>>
>> Yes, I mentioned this 'disabling preemption' aspect in my last email. My point
>> being, unlike CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_AUTO allows for kernel
>> preemption in preempt=none. So the "Don't preempt the kernel" behavior has
>> changed. That is, preempt=none under CONFIG_PREEMPT_AUTO is different from
>> CONFIG_PREEMPT_NONE=y already. Here we *are* preempting. And RCU is getting on
>> the way. It is like saying, you want an option for CONFIG_PREEMPT_RCU to be set
>> to =n for CONFIG_PREEMPT=y kernels, sighting users who want a fully-preemptible
>> kernel but are worried about reader preemptions.
> 
> Such users can simply avoid building with either CONFIG_PREEMPT_NONE=y
> or with CONFIG_PREEMPT_VOLUNTARY=y.  They might also experiment with
> CONFIG_RCU_BOOST=y, and also with short timeouts until boosting.
> If that doesn't do what they need, we talk with them and either help
> them configure their kernels, make RCU do what they need, or help work
> out some other way for them to get their jobs done.

Makes sense.

>> That aside, as such, I do agree your point of view, that preemptible readers
>> presents a problem to folks using preempt=none in this series and we could
>> decide to keep CONFIG_PREEMPT_RCU optional for whoever wants it that way. I was
>> just saying that I want CONFIG_PREEMPT_AUTO's preempt=none mode to be somewhat
>> consistent with CONFIG_PREEMPT_DYNAMIC's preempt=none. Because I'm pretty sure a
>> week from now, no one will likely be able to tell the difference ;-). So IMHO
>> either CONFIG_PREEMPT_DYNAMIC should be changed to make CONFIG_PREEMPT_RCU
>> optional, or this series should be altered to force CONFIG_PREEMPT_RCU=y.
>>
>> Let me know if I missed something.
> 
> Why not key off of the value of CONFIG_PREEMPT_DYNAMIC?  That way,
> if both CONFIG_PREEMPT_AUTO=y and CONFIG_PREEMPT_DYNAMIC=y, RCU is
> always preemptible.  Then CONFIG_PREEMPT_DYNAMIC=y enables boot-time
> (and maybe even run-time) switching between preemption flavors, while
> CONFIG_PREEMPT_AUTO instead enables unconditional preemption of any
> region of code that has not explicitly disabled preemption (or irq or
> bh or whatever).

That could be done. But currently, these patches disable DYNAMIC if AUTO is
enabled in the config. I think the reason is the 2 features are incompatible.
i.e. DYNAMIC wants to override the preemption mode at boot time, where as AUTO
wants the scheduler to have a say in it using the need-resched LAZY bit.

> But one way or another, we really do need non-preemptible RCU in
> conjunction with CONFIG_PREEMPT_AUTO=y.

Ok, lets go with that. Thanks,

 - Joel

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Ankur Arora 1 year, 11 months ago

Joel Fernandes <joel@joelfernandes.org> writes:

> On 3/10/2024 11:56 PM, Paul E. McKenney wrote:
>> On Sun, Mar 10, 2024 at 08:48:28PM -0400, Joel Fernandes wrote:
>>> On 3/10/2024 2:56 PM, Paul E. McKenney wrote:
>>>> On Sun, Mar 10, 2024 at 06:03:30AM -0400, Joel Fernandes wrote:
>>>>> Hello Ankur and Paul,
>>>>>
>>>>> On Mon, Feb 12, 2024 at 09:55:39PM -0800, Ankur Arora wrote:
>>>>>> With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
>>>>>> states for read-side critical sections via rcu_all_qs().
>>>>>> One reason why this was necessary: lacking preempt-count, the tick
>>>>>> handler has no way of knowing whether it is executing in a read-side
>>>>>> critical section or not.
>>>>>>
>>>>>> With PREEMPT_AUTO=y, there can be configurations with (PREEMPT_COUNT=y,
>>>>>> PREEMPT_RCU=n). This means that cond_resched() is a stub which does
>>>>>> not provide for quiescent states via rcu_all_qs().
>>>>>>
>>>>>> So, use the availability of preempt_count() to report quiescent states
>>>>>> in rcu_flavor_sched_clock_irq().
>>>>>>
>>>>>> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
>>>>>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>>>>> ---
>>>>>>  kernel/rcu/tree_plugin.h | 11 +++++++----
>>>>>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>>>>> index 26c79246873a..9b72e9d2b6fe 100644
>>>>>> --- a/kernel/rcu/tree_plugin.h
>>>>>> +++ b/kernel/rcu/tree_plugin.h
>>>>>> @@ -963,13 +963,16 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
>>>>>>   */
>>>>>>  static void rcu_flavor_sched_clock_irq(int user)
>>>>>>  {
>>>>>> -	if (user || rcu_is_cpu_rrupt_from_idle()) {
>>>>>> +	if (user || rcu_is_cpu_rrupt_from_idle() ||
>>>>>> +	    (IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
>>>>>> +	     !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)))) {
>>>>>
>>>>> I was wondering if it makes sense to even support !PREEMPT_RCU under
>>>>> CONFIG_PREEMPT_AUTO.
>>>>>
>>>>> AFAIU, this CONFIG_PREEMPT_AUTO series preempts the kernel on
>>>>> the next tick boundary in the worst case, with all preempt modes including
>>>>> the preempt=none mode.
>>>>>
>>>>> Considering this, does it makes sense for RCU to be non-preemptible in
>>>>> CONFIG_PREEMPT_AUTO=y? Because if that were the case, and a read-side critical
>>>>> section extended beyond the tick, then it prevents the PREEMPT_AUTO preemption
>>>>> from happening, because rcu_read_lock() would preempt_disable().
>>>>
>>>> Yes, it does make sense for RCU to be non-preemptible in kernels
>>>> built with CONFIG_PREEMPT_AUTO=y and either CONFIG_PREEMPT_NONE=y or
>>>> CONFIG_PREEMPT_VOLUNTARY=y.
>>>> As noted in earlier discussions, there are
>>>
>>> Sorry if I missed a discussion, appreciate a link.
>>
>> It is part of the discussion of the first version of this patch series,
>> if I recall correctly.
>>
>>>> systems that are adequately but not abundantly endowed with memory.
>>>> Such systems need non-preemptible RCU to avoid preempted-reader OOMs.
>>>
>>> Then why don't such systems have a problem with CONFIG_PREEMPT_DYNAMIC=y and
>>> preempt=none mode? CONFIG_PREEMPT_DYNAMIC forces CONFIG_PREEMPT_RCU=y. There's
>>> no way to set CONFIG_PREEMPT_RCU=n with CONFIG_PREEMPT_DYNAMIC=y and
>>> preempt=none boot parameter.  IMHO, if this feature is inconsistent with
>>> CONFIG_PREEMPT_DYNAMIC, that makes it super confusing.  In fact, I feel
>>> CONFIG_PREEMPT_AUTO should instead just be another "preempt=auto" boot parameter
>>> mode added to CONFIG_PREEMPT_DYNAMIC feature, otherwise the proliferation of
>>> CONFIG_PREEMPT config options is getting a bit insane. And likely going to be
>>> burden to the users configuring the PREEMPT Kconfig option IMHO.
>>
>> Because such systems are built with CONFIG_PREEMPT_DYNAMIC=n.
>>
>> You could argue that we should just build with CONFIG_PREEMPT_AUTO=n,
>> but the long-term goal of eliminating cond_resched() will make that
>> ineffective.
>
> I see what you mean. We/I could also highlight some of the differences in RCU
> between DYNAMIC vs AUTO.
>
>>
>>>> Note well that non-preemptible RCU explicitly disables preemption across
>>>> all RCU readers.
>>>
>>> Yes, I mentioned this 'disabling preemption' aspect in my last email. My point
>>> being, unlike CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_AUTO allows for kernel
>>> preemption in preempt=none. So the "Don't preempt the kernel" behavior has
>>> changed. That is, preempt=none under CONFIG_PREEMPT_AUTO is different from
>>> CONFIG_PREEMPT_NONE=y already. Here we *are* preempting. And RCU is getting on
>>> the way. It is like saying, you want an option for CONFIG_PREEMPT_RCU to be set
>>> to =n for CONFIG_PREEMPT=y kernels, sighting users who want a fully-preemptible
>>> kernel but are worried about reader preemptions.
>>
>> Such users can simply avoid building with either CONFIG_PREEMPT_NONE=y
>> or with CONFIG_PREEMPT_VOLUNTARY=y.  They might also experiment with
>> CONFIG_RCU_BOOST=y, and also with short timeouts until boosting.
>> If that doesn't do what they need, we talk with them and either help
>> them configure their kernels, make RCU do what they need, or help work
>> out some other way for them to get their jobs done.
>
> Makes sense.
>
>>> That aside, as such, I do agree your point of view, that preemptible readers
>>> presents a problem to folks using preempt=none in this series and we could
>>> decide to keep CONFIG_PREEMPT_RCU optional for whoever wants it that way. I was
>>> just saying that I want CONFIG_PREEMPT_AUTO's preempt=none mode to be somewhat
>>> consistent with CONFIG_PREEMPT_DYNAMIC's preempt=none. Because I'm pretty sure a
>>> week from now, no one will likely be able to tell the difference ;-). So IMHO
>>> either CONFIG_PREEMPT_DYNAMIC should be changed to make CONFIG_PREEMPT_RCU
>>> optional, or this series should be altered to force CONFIG_PREEMPT_RCU=y.
>>>
>>> Let me know if I missed something.
>>
>> Why not key off of the value of CONFIG_PREEMPT_DYNAMIC?  That way,
>> if both CONFIG_PREEMPT_AUTO=y and CONFIG_PREEMPT_DYNAMIC=y, RCU is
>> always preemptible.  Then CONFIG_PREEMPT_DYNAMIC=y enables boot-time
>> (and maybe even run-time) switching between preemption flavors, while
>> CONFIG_PREEMPT_AUTO instead enables unconditional preemption of any
>> region of code that has not explicitly disabled preemption (or irq or
>> bh or whatever).

Currently CONFIG_PREEMPT_DYNAMIC does a few things:

1. dynamic selection of preemption model
2. dynamically toggling explicit preemption points
3. PREEMPT_RCU=y (though maybe this should be fixed to also
   also allow PREEMPT_RCU=n)

Of these 3, PREEMPT_AUTO only really needs (1).

Maybe combining gives us the option of switching between the old and the
new models:
  preempt=none | voluntary | full | auto-none | auto-voluntary

Where the last two provide the new auto semantics. But, the mixture
seems too rich.
This just complicates all the CONFIG_PREEMPT_* configurations more than
they were before when the end goal is to actually reduce and simplify
the number of options.

> That could be done. But currently, these patches disable DYNAMIC if AUTO is
> enabled in the config. I think the reason is the 2 features are incompatible.
> i.e. DYNAMIC wants to override the preemption mode at boot time, where as AUTO
> wants the scheduler to have a say in it using the need-resched LAZY bit.

Yeah exactly. That's why I originally made PREEMPT_AUTO and
PREEMPT_DYNAMIC exclusive of each other.

Thanks

--
ankur

Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

Posted by Thomas Gleixner 1 year, 11 months ago

On Mon, Mar 11 2024 at 13:51, Ankur Arora wrote:
> Joel Fernandes <joel@joelfernandes.org> writes:
>>> Why not key off of the value of CONFIG_PREEMPT_DYNAMIC?  That way,
>>> if both CONFIG_PREEMPT_AUTO=y and CONFIG_PREEMPT_DYNAMIC=y, RCU is
>>> always preemptible.  Then CONFIG_PREEMPT_DYNAMIC=y enables boot-time
>>> (and maybe even run-time) switching between preemption flavors, while
>>> CONFIG_PREEMPT_AUTO instead enables unconditional preemption of any
>>> region of code that has not explicitly disabled preemption (or irq or
>>> bh or whatever).
>
> Currently CONFIG_PREEMPT_DYNAMIC does a few things:
>
> 1. dynamic selection of preemption model
> 2. dynamically toggling explicit preemption points
> 3. PREEMPT_RCU=y (though maybe this should be fixed to also
>    also allow PREEMPT_RCU=n)
>
> Of these 3, PREEMPT_AUTO only really needs (1).
>
> Maybe combining gives us the option of switching between the old and the
> new models:
>   preempt=none | voluntary | full | auto-none | auto-voluntary
>
> Where the last two provide the new auto semantics. But, the mixture
> seems too rich.
> This just complicates all the CONFIG_PREEMPT_* configurations more than
> they were before when the end goal is to actually reduce and simplify
> the number of options.
>
>> That could be done. But currently, these patches disable DYNAMIC if AUTO is
>> enabled in the config. I think the reason is the 2 features are incompatible.
>> i.e. DYNAMIC wants to override the preemption mode at boot time, where as AUTO
>> wants the scheduler to have a say in it using the need-resched LAZY bit.
>
> Yeah exactly. That's why I originally made PREEMPT_AUTO and
> PREEMPT_DYNAMIC exclusive of each other.

Rightfully so. The purpose of PREEMPT_AUTO is to get rid of
PREEMPT_DYNAMIC and not to proliferate the existance of it.

There is no point. All what AUTO wants to provide at configuration time
is the default model. So what would DYNAMIC buy what AUTO does not
provide trivially with a single sysfs knob which only affects the
scheduler where the decisions are made and nothing else?

The only extra config knob is PREEMPT_RCU which as we concluded long ago
needs to support both no and yes when AUTO is selected up to the point
where that model can be switched at boot time too.

Seriously, keep this stuff simple and straight forward and keep the real
goals in focus:

   1) Simplify the preemption model zoo

   2) Get rid of the ill defined cond_resched()/might_sleep() hackery

All the extra - pardon my french - ivory tower wankery on top is not
helpful at all. We can debate this forever on a theoretical base and
never get anywhere and anything done.

Please focus on getting the base mechanics in place with the required
fixes for the fallout for preemptible and non-preemtible RCU (selected
at compile time) and work it out from there.

Perfect it the enemy of good. Especially when nobody can come up with a
perfect definition what 'perfect' actually means.

Thanks,

        tglx