[PATCH v4 1/3] softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel

Posted by K Prateek Nayak 3 weeks, 4 days ago
do_softirq_post_smp_call_flush() on PREEMPT_RT kernels carries a
WARN_ON_ONCE() for any SOFTIRQ being raised from an SMP-call-function.
Since do_softirq_post_smp_call_flush() is called with preemption
disabled, raising a SOFTIRQ during flush_smp_call_function_queue() can
lead to longer preempt-disabled sections.
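
For context, a simplified sketch of the caller side, paraphrasing
flush_smp_call_function_queue() in kernel/smp.c (not verbatim; details
may differ between kernel versions), shows where was_pending is captured
and where the post-flush check runs:

void flush_smp_call_function_queue(void)
{
	unsigned int was_pending;
	unsigned long flags;

	if (llist_empty(this_cpu_ptr(&call_single_queue)))
		return;

	local_irq_save(flags);
	/* Softirqs already pending before running the queued functions */
	was_pending = local_softirq_pending();
	__flush_smp_call_function_queue(true);

	/* Anything newly raised is handled (and warned about) here on RT */
	if (local_softirq_pending())
		do_softirq_post_smp_call_flush(was_pending);

	local_irq_restore(flags);
}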

Since commit b2a02fc43a1f ("smp: Optimize
send_call_function_single_ipi()"), IPIs to an idle CPU in
TIF_POLLING_NRFLAG mode can be optimized out by instead setting the
TIF_NEED_RESCHED bit in the idle task's thread_info and relying on
flush_smp_call_function_queue() in the idle-exit path to run the
SMP-call-function.
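
The elision works roughly as below; a paraphrased sketch of
send_call_function_single_ipi() in kernel/sched/core.c (not verbatim):

void send_call_function_single_ipi(int cpu)
{
	struct rq *rq = cpu_rq(cpu);

	/*
	 * If the remote idle task is polling (TIF_POLLING_NRFLAG), setting
	 * its TIF_NEED_RESCHED bit is enough to make it leave the polling
	 * loop and process the queued SMP-call-functions on idle exit, so
	 * no hardware IPI needs to be sent.
	 */
	if (!set_nr_if_polling(rq->idle))
		arch_send_call_function_single_ipi(cpu);
	else
		trace_sched_wake_idle_without_ipi(cpu);
}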

To trigger idle load balancing, the scheduler queues
nohz_csd_function(), which is responsible for kicking off the balancing,
on a target nohz-idle CPU and sends it an IPI. With the optimization
above, this IPI can be elided, and the SMP-call-function is instead
executed from flush_smp_call_function_queue() in do_idle(), which can
raise a SCHED_SOFTIRQ to trigger the balancing.

So far this went undetected: the need_resched() check in
nohz_csd_function() would make it bail out of idle load balancing early,
because TIF_NEED_RESCHED (set in lieu of the IPI while the idle CPU was
in TIF_POLLING_NRFLAG mode) is still set when
flush_smp_call_function_queue() runs in the idle-exit path. The
need_resched() check was added with the intent to catch a new task
wakeup; however, it has recently been found to be unnecessary and will
be removed soon. After that, nohz_csd_function() will raise a
SCHED_SOFTIRQ from flush_smp_call_function_queue() to trigger an idle
load balance on an idle target.

nohz_csd_function() still bails out early if the idle_cpu() check for
the target CPU returns false, so it will not delay a newly woken task
from running. Account for this and prevent a WARN_ON_ONCE() when
SCHED_SOFTIRQ is raised from flush_smp_call_function_queue().
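
For reference, a paraphrased sketch of nohz_csd_function() from
kernel/sched/fair.c as it looks before the need_resched() check is
removed (not verbatim; details may differ):

static void nohz_csd_function(void *info)
{
	struct rq *rq = info;
	int cpu = cpu_of(rq);
	unsigned int flags;

	/* Release rq::nohz_csd */
	flags = atomic_fetch_andnot(NOHZ_KICK_MASK | NOHZ_NEWILB_KICK,
				    nohz_flags(cpu));
	WARN_ON(!(flags & NOHZ_KICK_MASK));

	rq->idle_balance = idle_cpu(cpu);
	if (rq->idle_balance && !need_resched()) {
		/*
		 * The need_resched() part of the condition is what has hidden
		 * the problem so far; once it is removed, idle_cpu() alone
		 * guards the softirq below.
		 */
		rq->nohz_idle_balance = flags;
		raise_softirq_irqoff(SCHED_SOFTIRQ);
	}
}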

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
v3..v4:

o No changes.
---
 kernel/softirq.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index b756d6b3fd09..d89be0affe46 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -280,17 +280,24 @@ static inline void invoke_softirq(void)
 		wakeup_softirqd();
 }
 
+#define SCHED_SOFTIRQ_MASK	BIT(SCHED_SOFTIRQ)
+
 /*
  * flush_smp_call_function_queue() can raise a soft interrupt in a function
- * call. On RT kernels this is undesired and the only known functionality
- * in the block layer which does this is disabled on RT. If soft interrupts
- * get raised which haven't been raised before the flush, warn so it can be
+ * call. On RT kernels this is undesired and the only known functionalities
+ * are in the block layer which is disabled on RT, and in the scheduler for
+ * idle load balancing. If soft interrupts get raised which haven't been
+ * raised before the flush, warn if it is not a SCHED_SOFTIRQ so it can be
  * investigated.
  */
 void do_softirq_post_smp_call_flush(unsigned int was_pending)
 {
-	if (WARN_ON_ONCE(was_pending != local_softirq_pending()))
+	unsigned int is_pending = local_softirq_pending();
+
+	if (unlikely(was_pending != is_pending)) {
+		WARN_ON_ONCE(was_pending != (is_pending & ~SCHED_SOFTIRQ_MASK));
 		invoke_softirq();
+	}
 }
 
 #else /* CONFIG_PREEMPT_RT */
-- 
2.34.1
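
To illustrate the new check, here is a standalone sketch in plain C (not
kernel code), assuming SCHED_SOFTIRQ is bit 7 as in
include/linux/interrupt.h:

#include <stdio.h>

#define BIT(n)			(1U << (n))
#define SCHED_SOFTIRQ		7	/* assumed bit position */
#define SCHED_SOFTIRQ_MASK	BIT(SCHED_SOFTIRQ)

/* Mirrors the condition in the patched do_softirq_post_smp_call_flush() */
static void check(unsigned int was_pending, unsigned int is_pending)
{
	if (was_pending != is_pending) {
		if (was_pending != (is_pending & ~SCHED_SOFTIRQ_MASK))
			printf("0x%02x -> 0x%02x: WARN (unexpected softirq)\n",
			       was_pending, is_pending);
		else
			printf("0x%02x -> 0x%02x: tolerated (SCHED_SOFTIRQ only)\n",
			       was_pending, is_pending);
		/* invoke_softirq() still runs in both cases */
	}
}

int main(void)
{
	check(BIT(1), BIT(1) | BIT(7));	/* TIMER pending, SCHED raised: no warn */
	check(BIT(1), BIT(1) | BIT(4));	/* TIMER pending, BLOCK raised: warns */
	return 0;
}
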
Re: [PATCH v4 1/3] softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel
Posted by Sebastian Andrzej Siewior 2 weeks, 2 days ago
On 2024-10-30 07:15:55 [+0000], K Prateek Nayak wrote:
…
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -280,17 +280,24 @@ static inline void invoke_softirq(void)
>  		wakeup_softirqd();
>  }
>  
> +#define SCHED_SOFTIRQ_MASK	BIT(SCHED_SOFTIRQ)
> +
>  /*
>   * flush_smp_call_function_queue() can raise a soft interrupt in a function
> - * call. On RT kernels this is undesired and the only known functionality
> - * in the block layer which does this is disabled on RT. If soft interrupts
> - * get raised which haven't been raised before the flush, warn so it can be
> + * call. On RT kernels this is undesired and the only known functionalities
> + * are in the block layer which is disabled on RT, and in the scheduler for
> + * idle load balancing. If soft interrupts get raised which haven't been
> + * raised before the flush, warn if it is not a SCHED_SOFTIRQ so it can be
>   * investigated.
>   */
>  void do_softirq_post_smp_call_flush(unsigned int was_pending)
>  {
> -	if (WARN_ON_ONCE(was_pending != local_softirq_pending()))
> +	unsigned int is_pending = local_softirq_pending();
> +
> +	if (unlikely(was_pending != is_pending)) {
> +		WARN_ON_ONCE(was_pending != (is_pending & ~SCHED_SOFTIRQ_MASK));
>  		invoke_softirq();

This behaviour also happens with threadirqs on !PREEMPT_RT but without
the warning. I haven't checked it but I expect invoke_softirq() to wake
ksoftirqd here, too.
This only happens because of 2/3 in the series as far as I can tell.

Now I am curious to hear from the sched / NOHZ folks whether it makes
sense to invoke SCHED_SOFTIRQ from within ksoftirqd because, unlike on
an idle CPU, the CPU is then not seen as idle due to ksoftirqd running
on it. There is code that checks rq->nr_running and/or idle_cpu().

> +	}
>  }

Sebastian
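
The rq->nr_running / idle_cpu() checks mentioned above refer to code
like the following, a paraphrased sketch of idle_cpu() from kernel/sched/
(not verbatim): a CPU currently running ksoftirqd fails the rq->curr and
rq->nr_running tests and is therefore not seen as idle.

int idle_cpu(int cpu)
{
	struct rq *rq = cpu_rq(cpu);

	/* ksoftirqd (or any other task) running means rq->curr != rq->idle */
	if (rq->curr != rq->idle)
		return 0;

	if (rq->nr_running)
		return 0;

#ifdef CONFIG_SMP
	if (rq->ttwu_pending)
		return 0;
#endif

	return 1;
}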