[RFC PATCH v3 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT

John Stultz posted 3 patches 3 years, 6 months ago
There is a newer version of this series
Posted by John Stultz 3 years, 6 months ago
From: Pavankumar Kondeti <pkondeti@codeaurora.org>

Defer softirq processing to ksoftirqd if an RT task is
running or queued on the current CPU. This complements the RT
task placement algorithm, which tries to find a CPU that is not
currently busy with softirqs.

Currently, only the NET_TX, NET_RX, BLOCK and TASKLET softirqs are
deferred, as they can potentially run for a long time.

Additionally, in the CONFIG_RT_SOFTIRQ_OPTIMIZATION case, this patch
stubs out the ksoftirqd_running() logic. Deferring the potentially
long-running softirqs wakes ksoftirqd, and while ksoftirqd is running
that logic would also keep the shorter-running softirqs from being
processed immediately. With it stubbed out, the potentially
long-running softirqs are deferred, but the shorter-running ones can
still run immediately.

This patch includes folded-in fixes by:
  Lingutla Chandrasekhar <clingutla@codeaurora.org>
  Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
  J. Avila <elavila@google.com>

Cc: John Dias <joaodias@google.com>
Cc: Connor O'Brien <connoro@google.com>
Cc: Rick Yiu <rickyiu@google.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Qais Yousef <qais.yousef@arm.com>
Cc: Chris Redpath <chris.redpath@arm.com>
Cc: Abhijeet Dharmapurikar <adharmap@quicinc.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@android.com
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
[satyap@codeaurora.org: trivial merge conflict resolution.]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
[elavila: Port to mainline, squash with bugfix]
Signed-off-by: J. Avila <elavila@google.com>
[jstultz: Rebase to linus/HEAD, minor rearranging of code,
 included bug fix Reported-by: Qais Yousef <qais.yousef@arm.com> ]
Signed-off-by: John Stultz <jstultz@google.com>
---
 include/linux/sched.h | 10 ++++++++++
 kernel/sched/cpupri.c | 13 +++++++++++++
 kernel/softirq.c      | 25 +++++++++++++++++++++++--
 3 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e7b2f8a5c711..7f76371cbbb0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1826,6 +1826,16 @@ current_restore_flags(unsigned long orig_flags, unsigned long flags)
 
 extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
 extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus);
+
+#ifdef CONFIG_RT_SOFTIRQ_OPTIMIZATION
+extern bool cpupri_check_rt(void);
+#else
+static inline bool cpupri_check_rt(void)
+{
+	return false;
+}
+#endif
+
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
index fa9ce9d83683..18dc75d16951 100644
--- a/kernel/sched/cpupri.c
+++ b/kernel/sched/cpupri.c
@@ -64,6 +64,19 @@ static int convert_prio(int prio)
 	return cpupri;
 }
 
+#ifdef CONFIG_RT_SOFTIRQ_OPTIMIZATION
+/*
+ * cpupri_check_rt - check if the current CPU has an RT task;
+ * should be called from an RCU-sched read-side section.
+ */
+bool cpupri_check_rt(void)
+{
+	int cpu = raw_smp_processor_id();
+
+	return cpu_rq(cpu)->rd->cpupri.cpu_to_pri[cpu] > CPUPRI_NORMAL;
+}
+#endif
+
 static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
 				struct cpumask *lowest_mask, int idx)
 {
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 35ee79dd8786..203a70dc9459 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -87,6 +87,7 @@ static void wakeup_softirqd(void)
 		wake_up_process(tsk);
 }
 
+#ifndef CONFIG_RT_SOFTIRQ_OPTIMIZATION
 /*
  * If ksoftirqd is scheduled, we do not want to process pending softirqs
  * right now. Let ksoftirqd handle this at its own rate, to get fairness,
@@ -101,6 +102,9 @@ static bool ksoftirqd_running(unsigned long pending)
 		return false;
 	return tsk && task_is_running(tsk) && !__kthread_should_park(tsk);
 }
+#else
+#define ksoftirqd_running(pending) (false)
+#endif /* CONFIG_RT_SOFTIRQ_OPTIMIZATION */
 
 #ifdef CONFIG_TRACE_IRQFLAGS
 DEFINE_PER_CPU(int, hardirqs_enabled);
@@ -532,6 +536,17 @@ static inline bool lockdep_softirq_start(void) { return false; }
 static inline void lockdep_softirq_end(bool in_hardirq) { }
 #endif
 
+static __u32 softirq_deferred_for_rt(__u32 *pending)
+{
+	__u32 deferred = 0;
+
+	if (cpupri_check_rt()) {
+		deferred = *pending & LONG_SOFTIRQ_MASK;
+		*pending &= ~LONG_SOFTIRQ_MASK;
+	}
+	return deferred;
+}
+
 asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
 	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
@@ -539,6 +554,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	int max_restart = MAX_SOFTIRQ_RESTART;
 	struct softirq_action *h;
 	bool in_hardirq;
+	__u32 deferred;
 	__u32 pending;
 	int softirq_bit;
 
@@ -551,13 +567,15 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 
 	pending = local_softirq_pending();
 
+	deferred = softirq_deferred_for_rt(&pending);
 	softirq_handle_begin();
+
 	in_hardirq = lockdep_softirq_start();
 	account_softirq_enter(current);
 
 restart:
 	/* Reset the pending bitmask before enabling irqs */
-	set_softirq_pending(0);
+	set_softirq_pending(deferred);
 	__this_cpu_write(active_softirqs, pending);
 
 	local_irq_enable();
@@ -596,13 +614,16 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	local_irq_disable();
 
 	pending = local_softirq_pending();
+	deferred = softirq_deferred_for_rt(&pending);
+
 	if (pending) {
 		if (time_before(jiffies, end) && !need_resched() &&
 		    --max_restart)
 			goto restart;
+	}
 
+	if (pending | deferred)
 		wakeup_softirqd();
-	}
 
 	account_softirq_exit(current);
 	lockdep_softirq_end(in_hardirq);
-- 
2.37.3.968.ga6b4b080e4-goog
Re: [RFC PATCH v3 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT
Posted by Qais Yousef 3 years, 6 months ago
On 09/21/22 01:25, John Stultz wrote:
> From: Pavankumar Kondeti <pkondeti@codeaurora.org>
> 
> Defer the softirq processing to ksoftirqd if a RT task is
> running or queued on the current CPU. This complements the RT
> task placement algorithm which tries to find a CPU that is not
> currently busy with softirqs.
> 
> Currently NET_TX, NET_RX, BLOCK and TASKLET softirqs are only

Should we mention IRQ_POLL?

I think TASKLET is debatable as I mentioned in my other email.

> deferred as they can potentially run for long time.
> 
> Additionally, this patch stubs out ksoftirqd_running() logic,
> in the CONFIG_RT_SOFTIRQ_OPTIMIZATION case, as deferring
> potentially long-running softirqs will cause the logic to not
> process shorter-running softirqs immediately. By stubbing it out
> the potentially long running softirqs are deferred, but the
> shorter running ones can still run immediately.
> 
> This patch includes folded-in fixes by:
>   Lingutla Chandrasekhar <clingutla@codeaurora.org>
>   Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
>   J. Avila <elavila@google.com>
> 
> Cc: John Dias <joaodias@google.com>
> Cc: Connor O'Brien <connoro@google.com>
> Cc: Rick Yiu <rickyiu@google.com>
> Cc: John Kacur <jkacur@redhat.com>
> Cc: Qais Yousef <qais.yousef@arm.com>
> Cc: Chris Redpath <chris.redpath@arm.com>
> Cc: Abhijeet Dharmapurikar <adharmap@quicinc.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: kernel-team@android.com
> Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
> [satyap@codeaurora.org: trivial merge conflict resolution.]
> Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
> [elavila: Port to mainline, squash with bugfix]
> Signed-off-by: J. Avila <elavila@google.com>
> [jstultz: Rebase to linus/HEAD, minor rearranging of code,
>  included bug fix Reported-by: Qais Yousef <qais.yousef@arm.com> ]
> Signed-off-by: John Stultz <jstultz@google.com>
> ---
>  include/linux/sched.h | 10 ++++++++++
>  kernel/sched/cpupri.c | 13 +++++++++++++
>  kernel/softirq.c      | 25 +++++++++++++++++++++++--
>  3 files changed, 46 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index e7b2f8a5c711..7f76371cbbb0 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1826,6 +1826,16 @@ current_restore_flags(unsigned long orig_flags, unsigned long flags)
>  
>  extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
>  extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus);
> +
> +#ifdef CONFIG_RT_SOFTIRQ_OPTIMIZATION
> +extern bool cpupri_check_rt(void);
> +#else
> +static inline bool cpupri_check_rt(void)
> +{
> +	return false;
> +}
> +#endif
> +
>  #ifdef CONFIG_SMP
>  extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
>  extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
> diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
> index fa9ce9d83683..18dc75d16951 100644
> --- a/kernel/sched/cpupri.c
> +++ b/kernel/sched/cpupri.c
> @@ -64,6 +64,19 @@ static int convert_prio(int prio)
>  	return cpupri;
>  }
>  
> +#ifdef CONFIG_RT_SOFTIRQ_OPTIMIZATION
> +/*
> + * cpupri_check_rt - check if CPU has a RT task
> + * should be called from rcu-sched read section.
> + */
> +bool cpupri_check_rt(void)
> +{
> +	int cpu = raw_smp_processor_id();
> +
> +	return cpu_rq(cpu)->rd->cpupri.cpu_to_pri[cpu] > CPUPRI_NORMAL;
> +}

Priorities always mess with my brain! I always forget which direction to
look at :D

Hmm, I was wondering why not just use rt_task(current). But if the RT task
is not the one running (which can only mean a DL or stopper task is
preempting it), that won't work. I think your code has a similar problem,
though: it returns true even if only a DL task is running, since we set the
priority to CPUPRI_HIGHER, which makes your condition true.

This makes me wonder whether we should enable this optimization for DL
tasks too. Hmm...

That said, is there a reason why we can't remove this function and just call
rt_task(current) directly in softirq_deferred_for_rt()?

If we decide to care about DL, we can use rt_task(current) || dl_task(current).


Thanks

--
Qais Yousef

> +#endif
> +
>  static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
>  				struct cpumask *lowest_mask, int idx)
>  {
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 35ee79dd8786..203a70dc9459 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -87,6 +87,7 @@ static void wakeup_softirqd(void)
>  		wake_up_process(tsk);
>  }
>  
> +#ifndef CONFIG_RT_SOFTIRQ_OPTIMIZATION
>  /*
>   * If ksoftirqd is scheduled, we do not want to process pending softirqs
>   * right now. Let ksoftirqd handle this at its own rate, to get fairness,
> @@ -101,6 +102,9 @@ static bool ksoftirqd_running(unsigned long pending)
>  		return false;
>  	return tsk && task_is_running(tsk) && !__kthread_should_park(tsk);
>  }
> +#else
> +#define ksoftirqd_running(pending) (false)
> +#endif /* CONFIG_RT_SOFTIRQ_OPTIMIZATION */
>  
>  #ifdef CONFIG_TRACE_IRQFLAGS
>  DEFINE_PER_CPU(int, hardirqs_enabled);
> @@ -532,6 +536,17 @@ static inline bool lockdep_softirq_start(void) { return false; }
>  static inline void lockdep_softirq_end(bool in_hardirq) { }
>  #endif
>  
> +static __u32 softirq_deferred_for_rt(__u32 *pending)
> +{
> +	__u32 deferred = 0;
> +
> +	if (cpupri_check_rt()) {
> +		deferred = *pending & LONG_SOFTIRQ_MASK;
> +		*pending &= ~LONG_SOFTIRQ_MASK;
> +	}
> +	return deferred;
> +}
> +
>  asmlinkage __visible void __softirq_entry __do_softirq(void)
>  {
>  	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
> @@ -539,6 +554,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  	int max_restart = MAX_SOFTIRQ_RESTART;
>  	struct softirq_action *h;
>  	bool in_hardirq;
> +	__u32 deferred;
>  	__u32 pending;
>  	int softirq_bit;
>  
> @@ -551,13 +567,15 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  
>  	pending = local_softirq_pending();
>  
> +	deferred = softirq_deferred_for_rt(&pending);
>  	softirq_handle_begin();
> +
>  	in_hardirq = lockdep_softirq_start();
>  	account_softirq_enter(current);
>  
>  restart:
>  	/* Reset the pending bitmask before enabling irqs */
> -	set_softirq_pending(0);
> +	set_softirq_pending(deferred);
>  	__this_cpu_write(active_softirqs, pending);
>  
>  	local_irq_enable();
> @@ -596,13 +614,16 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  	local_irq_disable();
>  
>  	pending = local_softirq_pending();
> +	deferred = softirq_deferred_for_rt(&pending);
> +
>  	if (pending) {
>  		if (time_before(jiffies, end) && !need_resched() &&
>  		    --max_restart)
>  			goto restart;
> +	}
>  
> +	if (pending | deferred)
>  		wakeup_softirqd();
> -	}
>  
>  	account_softirq_exit(current);
>  	lockdep_softirq_end(in_hardirq);
> -- 
> 2.37.3.968.ga6b4b080e4-goog
>
Re: [RFC PATCH v3 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT
Posted by John Stultz 3 years, 6 months ago
On Wed, Sep 28, 2022 at 5:56 AM Qais Yousef <qais.yousef@arm.com> wrote:
>
> On 09/21/22 01:25, John Stultz wrote:
> > From: Pavankumar Kondeti <pkondeti@codeaurora.org>
> >
> > Defer the softirq processing to ksoftirqd if a RT task is
> > running or queued on the current CPU. This complements the RT
> > task placement algorithm which tries to find a CPU that is not
> > currently busy with softirqs.
> >
> > Currently NET_TX, NET_RX, BLOCK and TASKLET softirqs are only
>
> Should we mention IRQ_POLL?

Ah, yes. Thank you for pointing that out.

> I think TASKLET is debatable as I mentioned in my other email.

Yeah, I've dropped it for now.


> > +#ifdef CONFIG_RT_SOFTIRQ_OPTIMIZATION
> > +/*
> > + * cpupri_check_rt - check if CPU has a RT task
> > + * should be called from rcu-sched read section.
> > + */
> > +bool cpupri_check_rt(void)
> > +{
> > +     int cpu = raw_smp_processor_id();
> > +
> > +     return cpu_rq(cpu)->rd->cpupri.cpu_to_pri[cpu] > CPUPRI_NORMAL;
> > +}
>
> Priorities always mess up with my brain! I always forget which direction to
> look at :D

Yeah, especially the cpupri logic, since it also depends on which version
you're looking at (the original version of this patch, against an
older kernel, had an off-by-one error that took a while to find).

> Hmm I was wondering why not do rt_task(current), but if the task is not running
> (which can only indicate there's a DL or a stopper task preempting it), that
> won't work. But I think your code has a similar problem; you'll return true
> even if there's only a DL task running since we set the priority to
> CPUPRI_HIGHER which will cause your condition to return true.
>
> This makes me think if we should enable this optimization for DL tasks too.
> Hmm...
>
> That said, is there a reason why we can't remove this function and just call
> rt_task(current) directly in softirq_deferred_for_rt()?
>

I had thought similarly, but hesitated to switch in case there was
some subtlety I wasn't seeing.
You've persuaded me to simplify this, though.

Thanks again for the feedback and suggestions!
-john