[RFC][PATCH v3 0/3] Softirq -rt Optimizations

John Stultz posted 3 patches 3 years, 6 months ago
There is a newer version of this series
[RFC][PATCH v3 0/3] Softirq -rt Optimizations
Posted by John Stultz 3 years, 6 months ago
Hey all,

This series is a set of patches that optimize scheduler decisions around
realtime tasks and softirqs.  This series is a rebased and reworked set
of changes that have been shipping on Android devices for a number of
years, originally created to resolve audio glitches seen on devices
caused by softirqs for network or storage drivers.

Long running softirqs cause issues because they aren’t currently taken
into account when a realtime task is woken up, but they will delay
realtime tasks from running if the realtime tasks are placed on a cpu
currently running a softirq.

This can easily be seen on some devices by running cyclictest* along
with some heavy background filesystem noise:

Without the patches:
T: 0 ( 7596) P:99 I:1000 C:  59980 Min:      7 Act:   13 Avg:   29 Max: 4107
T: 1 ( 7597) P:99 I:1500 C:  39990 Min:     14 Act:   26 Avg:   36 Max: 8994
T: 2 ( 7598) P:99 I:2000 C:  29995 Min:      7 Act:   22 Avg:   35 Max: 3616
T: 3 ( 7599) P:99 I:2500 C:  23915 Min:      7 Act:   25 Avg:   49 Max: 40273
T: 4 ( 7600) P:99 I:3000 C:  19995 Min:      8 Act:   22 Avg:   38 Max: 10510
T: 5 ( 7601) P:99 I:3500 C:  17135 Min:      7 Act:   26 Avg:   39 Max: 13194
T: 6 ( 7602) P:99 I:4000 C:  14990 Min:      7 Act:   26 Avg:   40 Max: 9470
T: 7 ( 7603) P:99 I:4500 C:  13318 Min:      8 Act:   29 Avg:   44 Max: 20101

Which you can visually see in the image here:
 https://github.com/johnstultz-work/misc/raw/main/images/2022-08-09-softirq-rt-big-latency.png

Which is from the perfetto trace captured here:
 https://ui.perfetto.dev/#!/?s=33661aec8ec82c2da0a59263f36f7d72b4a2f4e7a99b28b222bd12ad872f

The first patch adds a bit of generic infrastructure to get the per-cpu
softirq_pending flag.

The second patch in the series adds logic to account for when softirqs
are running and then, conditionally on CONFIG_RT_SOFTIRQ_OPTIMIZATION,
makes rt-task placement aware of whether the current softirq might be a
long-running one, so the rt task can potentially be placed on another
free core.
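
To make the placement idea concrete, here is a minimal userspace sketch in C of a softirq-aware CPU choice. It is not the code from the patch: `struct cpu_state`, `cpu_busy_with_long_softirqs()`, `pick_cpu_for_rt()` and the choice of NET_TX, NET_RX and BLOCK as the "long" vectors are assumptions for illustration, picked because the cover letter blames network and storage softirqs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Softirq vector numbers, in the same order as the kernel's enum. */
enum {
    HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
    BLOCK_SOFTIRQ, IRQ_POLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ,
    HRTIMER_SOFTIRQ, RCU_SOFTIRQ, NR_SOFTIRQS
};

/* Assumption: these are the vectors treated as potentially long-running. */
#define LONG_SOFTIRQ_MASK \
    ((1u << NET_TX_SOFTIRQ) | (1u << NET_RX_SOFTIRQ) | (1u << BLOCK_SOFTIRQ))

/* Hypothetical per-CPU snapshot; the kernel keeps this in per-cpu data. */
struct cpu_state {
    uint32_t active_softirqs;  /* vectors being handled right now */
    uint32_t pending_softirqs; /* vectors raised but not yet handled */
    bool running_rt;           /* an RT task already occupies this CPU */
};

static bool cpu_busy_with_long_softirqs(const struct cpu_state *c)
{
    return ((c->active_softirqs | c->pending_softirqs) & LONG_SOFTIRQ_MASK) != 0;
}

/*
 * Prefer a CPU with no RT task and no long softirq in flight. If every
 * candidate is busy with softirqs, fall back to the first CPU without an
 * RT task. Returns -1 when all CPUs already run RT tasks.
 */
int pick_cpu_for_rt(const struct cpu_state *cpus, int ncpus)
{
    int fallback = -1;

    assert(cpus != NULL || ncpus == 0);
    for (int i = 0; i < ncpus; i++) {
        if (cpus[i].running_rt)
            continue;
        if (!cpu_busy_with_long_softirqs(&cpus[i]))
            return i;
        if (fallback < 0)
            fallback = i;
    }
    return fallback;
}
```

The fallback keeps the sketch from refusing placement outright when every core is handling network or storage softirqs; whether that matches the real policy is left open here.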

The third patch in the series adds logic in __do_softirq(), also under
CONFIG_RT_SOFTIRQ_OPTIMIZATION, to defer some of the potentially long
running softirqs to ksoftirqd if a -rt task is currently running on the
cpu. This patch also includes a folded down fix that stubs out
ksoftirqd_running() based on CONFIG_RT_SOFTIRQ_OPTIMIZATION, since in
changing to more frequently defer long running softirqs, the logic using
ksoftirqd_running() will end up being too conservative and needlessly
delay shorter-running softirqs.
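
The deferral described above can be pictured as splitting the pending softirq mask in two. The following is a userspace sketch only, not the patch itself: `split_pending()`, `struct softirq_split` and the set of vectors in `LONG_SOFTIRQ_MASK` are hypothetical names and choices made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Softirq vector numbers, in the same order as the kernel's enum. */
enum {
    HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
    BLOCK_SOFTIRQ, IRQ_POLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ,
    HRTIMER_SOFTIRQ, RCU_SOFTIRQ, NR_SOFTIRQS
};

/* Assumption: these are the vectors treated as potentially long-running. */
#define LONG_SOFTIRQ_MASK \
    ((1u << NET_TX_SOFTIRQ) | (1u << NET_RX_SOFTIRQ) | (1u << BLOCK_SOFTIRQ))

struct softirq_split {
    uint32_t run_now;  /* handled inline, as before */
    uint32_t deferred; /* left pending for ksoftirqd to pick up */
};

/*
 * If an RT task is running on this CPU, hold back the potentially long
 * vectors and only run the short ones inline. Otherwise run everything.
 */
struct softirq_split split_pending(uint32_t pending, bool rt_task_running)
{
    struct softirq_split s = { .run_now = pending, .deferred = 0 };

    assert((pending >> NR_SOFTIRQS) == 0); /* only valid vector bits */
    if (rt_task_running) {
        s.deferred = pending & LONG_SOFTIRQ_MASK;
        s.run_now = pending & ~LONG_SOFTIRQ_MASK;
    }
    return s;
}
```

With TIMER, NET_RX and BLOCK pending while an RT task runs, only TIMER would be handled inline in this sketch; the other two would wait for ksoftirqd.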

With these patches we see dramatic improvements in the worst case
latencies in the cyclictest* + filesystem noise test above:

With the patches:
T: 0 ( 7527) P:99 I:1000 C:  59998 Min:      6 Act:   29 Avg:   35 Max: 1734
T: 1 ( 7528) P:99 I:1500 C:  40000 Min:      7 Act:   39 Avg:   35 Max: 1181
T: 2 ( 7529) P:99 I:2000 C:  30000 Min:      7 Act:   25 Avg:   25 Max: 444
T: 3 ( 7530) P:99 I:2500 C:  24000 Min:      7 Act:   34 Avg:   36 Max: 1729
T: 4 ( 7531) P:99 I:3000 C:  20000 Min:      7 Act:   36 Avg:   25 Max: 406
T: 5 ( 7532) P:99 I:3500 C:  17143 Min:      7 Act:   38 Avg:   34 Max: 1264
T: 6 ( 7533) P:99 I:4000 C:  15000 Min:      7 Act:   27 Avg:   33 Max: 2351
T: 7 ( 7534) P:99 I:4500 C:  13334 Min:      7 Act:   41 Avg:   29 Max: 2285
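
For a quick comparison of the two runs, the Max columns can be reduced to a single worst-case figure each. This is a small sketch with the values copied from the two tables; `worst_case()` and the array names are made up for illustration, and the values are taken to be in cyclictest's default unit of microseconds.

```c
#include <assert.h>
#include <stddef.h>

/* Max column of each cyclictest run, copied from the tables above. */
const unsigned int max_without[8] = {
    4107, 8994, 3616, 40273, 10510, 13194, 9470, 20101
};
const unsigned int max_with[8] = {
    1734, 1181, 444, 1729, 406, 1264, 2351, 2285
};

/* Largest per-thread maximum, i.e. the worst latency seen in the run. */
unsigned int worst_case(const unsigned int *max, size_t n)
{
    unsigned int worst;

    assert(max != NULL && n > 0);
    worst = max[0];
    for (size_t i = 1; i < n; i++) {
        if (max[i] > worst)
            worst = max[i];
    }
    return worst;
}
```

Applied to the two arrays, this gives 40273 without the patches and 2351 with them, roughly a 17x reduction in the worst observed latency.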

Since these patches have been carried along for years, and have at times
badly collided with upstream, I wanted to submit them for some initial
review, discussion and feedback so we could hopefully eventually find a
reasonable solution that might land upstream.


* Unfortunately cyclictest had a bug that caused it to always affine
threads to cpus, preventing them from being migrated. So you’ll need
to update to the latest version (which includes a fix) to reproduce.

Let me know what you think!

thanks
-john

Changes in v3:
* Added generic __cpu_softirq_pending() accessor to avoid s390 build
  trouble

Cc: John Dias <joaodias@google.com>
Cc: Connor O'Brien <connoro@google.com>
Cc: Rick Yiu <rickyiu@google.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Qais Yousef <qais.yousef@arm.com>
Cc: Chris Redpath <chris.redpath@arm.com>
Cc: Abhijeet Dharmapurikar <adharmap@quicinc.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: kernel-team@android.com


Connor O'Brien (1):
  sched: Avoid placing RT threads on cores handling long softirqs

John Stultz (1):
  softirq: Add generic accessor to percpu softirq_pending data

Pavankumar Kondeti (1):
  softirq: defer softirq processing to ksoftirqd if CPU is busy with RT

 arch/s390/include/asm/hardirq.h |  6 ++++
 include/linux/interrupt.h       | 18 ++++++++++
 include/linux/sched.h           | 10 ++++++
 init/Kconfig                    | 10 ++++++
 kernel/sched/cpupri.c           | 13 +++++++
 kernel/sched/rt.c               | 64 ++++++++++++++++++++++++++++-----
 kernel/softirq.c                | 34 ++++++++++++++++--
 7 files changed, 144 insertions(+), 11 deletions(-)

-- 
2.37.3.968.ga6b4b080e4-goog
Re: [RFC][PATCH v3 0/3] Softirq -rt Optimizations
Posted by Qais Yousef 3 years, 6 months ago
Hi John

On 09/21/22 01:25, John Stultz wrote:
> Hey all,
> 
> This series is a set of patches that optimize scheduler decisions around
> realtime tasks and softirqs.  This series is a rebased and reworked set
> of changes that have been shipping on Android devices for a number of
> years, originally created to resolve audio glitches seen on devices
> caused by softirqs for network or storage drivers.
> 
> Long running softirqs cause issues because they aren’t currently taken
> into account when a realtime task is woken up, but they will delay
> realtime tasks from running if the realtime tasks are placed on a cpu
> currently running a softirq.

Thanks a lot for sending this series. I've raised this problem in various
venues in the past, but it seems it is hard to do something better than what
you propose here.

Borrowing some behaviours from PREEMPT_RT (like threadedirqs) won't cut it
outside PREEMPT_RT AFAIU.

Peter did suggest an alternative at one point in the past to be more aggressive
in limiting softirqs [1] but I never managed to find the time to verify it
- especially its impact on network throughput as this seems to be the tricky
trade-off (and a tricky thing to verify, for me at least). I'm not sure if BLOCK
softirqs are as sensitive.

I think the proposed approach is not intrusive and offers a good balance that
is well contained and easy to improve upon in the future. It's protected with
a configuration option so users that don't want it can easily disable it.

[1] https://gitlab.arm.com/linux-arm/linux-qy/-/commits/core/softirq/


Thanks

--
Qais Yousef
RE: [RFC][PATCH v3 0/3] Softirq -rt Optimizations
Posted by David Laight 3 years, 6 months ago
From: Qais Yousef
> Sent: 28 September 2022 14:01
> 
> Hi John
> 
> On 09/21/22 01:25, John Stultz wrote:
> > Hey all,
> >
> > This series is a set of patches that optimize scheduler decisions around
> > realtime tasks and softirqs.  This series is a rebased and reworked set
> > of changes that have been shipping on Android devices for a number of
> > years, originally created to resolve audio glitches seen on devices
> > caused by softirqs for network or storage drivers.
> >
> > Long running softirqs cause issues because they aren’t currently taken
> > into account when a realtime task is woken up, but they will delay
> > realtime tasks from running if the realtime tasks are placed on a cpu
> > currently running a softirq.
> 
> Thanks a lot for sending this series. I've raised this problem in various
> venues in the past, but it seems it is hard to do something better than what
> you propose here.
> 
> Borrowing some behaviours from PREEMPT_RT (like threadedirqs) won't cut it
> outside PREEMPT_RT AFAIU.
> 
> Peter did suggest an alternative at one point in the past to be more aggressive
> in limiting softirqs [1] but I never managed to find the time to verify it
> - especially its impact on network throughput as this seems to be the tricky
> trade-off (and a tricky thing to verify, for me at least). I'm not sure if BLOCK
> softirqs are as sensitive.

I've had issues with the opposite problem.
Long running RT tasks stopping the softint code running.

If an RT task is running, the softint will run in the context of the
RT task - so has priority over it.
If the RT task isn't running the softint stops the RT task being scheduled.
This is really just the same.

If the softint defers back to thread context it won't be scheduled
until any RT task finishes. This is the opposite priority.

IIRC there is another strange case where the RT thread has been woken
but isn't yet running - can't remember the exact details.

I can (mostly) handle the RT task being delayed (there are a lot of RT
threads sharing the work) but it is paramount that the ethernet receive
code actually runs - I can't afford to drop packets (they contain audio
the RT threads are processing).

In my case threaded NAPI (mostly) fixes it - provided the NAPI threads are RT.

	David


> 
> I think the proposed approach is not intrusive and offers a good balance that
> is well contained and easy to improve upon in the future. It's protected with
> a configuration option so users that don't want it can easily disable it.
> 
> [1] https://gitlab.arm.com/linux-arm/linux-qy/-/commits/core/softirq/
> 
> 
> Thanks
> 
> --
> Qais Yousef

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Re: [RFC][PATCH v3 0/3] Softirq -rt Optimizations
Posted by Qais Yousef 3 years, 6 months ago
On 09/28/22 13:51, David Laight wrote:
> From: Qais Yousef
> > Sent: 28 September 2022 14:01
> > 
> > Hi John
> > 
> > On 09/21/22 01:25, John Stultz wrote:
> > > Hey all,
> > >
> > > This series is a set of patches that optimize scheduler decisions around
> > > realtime tasks and softirqs.  This series is a rebased and reworked set
> > > of changes that have been shipping on Android devices for a number of
> > > years, originally created to resolve audio glitches seen on devices
> > > caused by softirqs for network or storage drivers.
> > >
> > > Long running softirqs cause issues because they aren’t currently taken
> > > into account when a realtime task is woken up, but they will delay
> > > realtime tasks from running if the realtime tasks are placed on a cpu
> > > currently running a softirq.
> > 
> > Thanks a lot for sending this series. I've raised this problem in various
> > venues in the past, but it seems it is hard to do something better than what
> > you propose here.
> > 
> > Borrowing some behaviours from PREEMPT_RT (like threadedirqs) won't cut it
> > outside PREEMPT_RT AFAIU.
> > 
> > Peter did suggest an alternative at one point in the past to be more aggressive
> > in limiting softirqs [1] but I never managed to find the time to verify it
> > - especially its impact on network throughput as this seems to be the tricky
> > trade-off (and a tricky thing to verify, for me at least). I'm not sure if BLOCK
> > softirqs are as sensitive.
> 
> I've had issues with the opposite problem.
> Long running RT tasks stopping the softint code running.
> 
> If an RT task is running, the softint will run in the context of the
> RT task - so has priority over it.
> If the RT task isn't running the softint stops the RT task being scheduled.
> This is really just the same.
> 
> If the softint defers back to thread context it won't be scheduled
> until any RT task finishes. This is the opposite priority.

If we can get a subset of threadedirqs (call it threadedsoftirqs) from
PREEMPT_RT where softirqs can be converted into RT kthreads, that'll alleviate
both sides of the problem IMO. But last I checked with Thomas this won't be
possible. But things might have changed since then..

> 
> IIRC there is another strange case where the RT thread has been woken
> but isn't yet running - can't remember the exact details.
> 
> I can (mostly) handle the RT task being delayed (there are a lot of RT
> threads sharing the work) but it is paramount that the ethernet receive
> code actually runs - I can't afford to drop packets (they contain audio
> the RT threads are processing).
> 
> In my case threaded NAPI (mostly) fixes it - provided the NAPI threads are RT.

There are netdev_budget and netdev_budget_usecs params in procfs that control
how long NET_RX spends in the softirq. Maybe you need to tweak those too.
In your case, you probably want to increase the budget.
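
As a starting point for checking those two knobs, here is a hedged C sketch that reads them from procfs. The paths are the standard locations of the `net.core.netdev_budget` and `net.core.netdev_budget_usecs` sysctls; the helper names `read_long_from_file()` and `print_net_rx_budget()` are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

#define NETDEV_BUDGET_PATH       "/proc/sys/net/core/netdev_budget"
#define NETDEV_BUDGET_USECS_PATH "/proc/sys/net/core/netdev_budget_usecs"

/* Read a single integer from a procfs file. Returns 0 on success, -1 on error. */
int read_long_from_file(const char *path, long *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Print both NET_RX limits. Returns 0 only if both could be read. */
int print_net_rx_budget(void)
{
    long packets, usecs;

    if (read_long_from_file(NETDEV_BUDGET_PATH, &packets) != 0 ||
        read_long_from_file(NETDEV_BUDGET_USECS_PATH, &usecs) != 0)
        return -1;
    printf("netdev_budget=%ld packets, netdev_budget_usecs=%ld us\n",
           packets, usecs);
    return 0;
}
```

Changing the values needs root (for example via `sysctl -w net.core.netdev_budget=...`), and a suitable value depends on the workload, so none is suggested here.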

Note that in Android the BLOCK layer seems to cause similar problems which
don't have these NET facilities. So NET is only one side of the problem.


Thanks

--
Qais Yousef
RE: [RFC][PATCH v3 0/3] Softirq -rt Optimizations
Posted by David Laight 3 years, 6 months ago
From: Qais Yousef
> Sent: 28 September 2022 16:56
> 
> On 09/28/22 13:51, David Laight wrote:
> > From: Qais Yousef
> > > Sent: 28 September 2022 14:01
> > >
> > > Hi John
> > >
> > > On 09/21/22 01:25, John Stultz wrote:
> > > > Hey all,
> > > >
> > > > This series is a set of patches that optimize scheduler decisions around
> > > > realtime tasks and softirqs.  This series is a rebased and reworked set
> > > > of changes that have been shipping on Android devices for a number of
> > > > years, originally created to resolve audio glitches seen on devices
> > > > caused by softirqs for network or storage drivers.
> > > >
> > > > Long running softirqs cause issues because they aren’t currently taken
> > > > into account when a realtime task is woken up, but they will delay
> > > > realtime tasks from running if the realtime tasks are placed on a cpu
> > > > currently running a softirq.
> > >
> > > Thanks a lot for sending this series. I've raised this problem in various
> > > venues in the past, but it seems it is hard to do something better than what
> > > you propose here.
> > >
> > > Borrowing some behaviours from PREEMPT_RT (like threadedirqs) won't cut it
> > > outside PREEMPT_RT AFAIU.
> > >
> > > Peter did suggest an alternative at one point in the past to be more aggressive
> > > in limiting softirqs [1] but I never managed to find the time to verify it
> > > - especially its impact on network throughput as this seems to be the tricky
> > > trade-off (and a tricky thing to verify, for me at least). I'm not sure if BLOCK
> > > softirqs are as sensitive.
> >
> > I've had issues with the opposite problem.
> > Long running RT tasks stopping the softint code running.
> >
> > If an RT task is running, the softint will run in the context of the
> > RT task - so has priority over it.
> > If the RT task isn't running the softint stops the RT task being scheduled.
> > This is really just the same.
> >
> > If the softint defers back to thread context it won't be scheduled
> > until any RT task finishes. This is the opposite priority.
> 
> If we can get a subset of threadedirqs (call it threadedsoftirqs) from
> PREEMPT_RT where softirqs can be converted into RT kthreads, that'll alleviate
> both sides of the problem IMO. But last I checked with Thomas this won't be
> possible. But things might have changed since then..

Part of the problem is that can significantly increase latency.
Some softirq calls will be latency sensitive.

> > IIRC there is another strange case where the RT thread has been woken
> > but isn't yet running - can't remember the exact details.
> >
> > I can (mostly) handle the RT task being delayed (there are a lot of RT
> > threads sharing the work) but it is paramount that the ethernet receive
> > code actually runs - I can't afford to drop packets (they contain audio
> > the RT threads are processing).
> >
> > In my case threaded NAPI (mostly) fixes it - provided the NAPI threads are RT.
> 
> There are netdev_budget and netdev_budget_usecs params in procfs that control
> how long NET_RX spends in the softirq. Maybe you need to tweak those too.
> In your case, you probably want to increase the budget.

Maybe, but the problem is that the softint code is far too willing
to drop to kthread context.
Eric made a change to reduce that (to avoid losing ethernet packets)
but the original test got added back - there are now two tests, but
the original one dominates. Eric's bug fix got reverted (with extra
tests that make the code slower).

I did test with that changed, but still got some lost packets.
Trying to receive 500000 UDP packets/sec is quite hard!
They are also split across 10k unconnected sockets.

> Note that in Android the BLOCK layer seems to cause similar problems which
> don't have these NET facilities. So NET is only one side of the problem.

Aren't the block layer softints stopping other code?
I've really got the other problem.
Although I do have a 10ms timer wakeup that really needs not to be delayed.

	David

Re: [RFC][PATCH v3 0/3] Softirq -rt Optimizations
Posted by Qais Yousef 3 years, 6 months ago
On 09/28/22 16:19, David Laight wrote:
> From: Qais Yousef
> > Sent: 28 September 2022 16:56
> > 
> > On 09/28/22 13:51, David Laight wrote:
> > > From: Qais Yousef
> > > > Sent: 28 September 2022 14:01
> > > >
> > > > Hi John
> > > >
> > > > On 09/21/22 01:25, John Stultz wrote:
> > > > > Hey all,
> > > > >
> > > > > This series is a set of patches that optimize scheduler decisions around
> > > > > realtime tasks and softirqs.  This series is a rebased and reworked set
> > > > > of changes that have been shipping on Android devices for a number of
> > > > > years, originally created to resolve audio glitches seen on devices
> > > > > caused by softirqs for network or storage drivers.
> > > > >
> > > > > Long running softirqs cause issues because they aren’t currently taken
> > > > > into account when a realtime task is woken up, but they will delay
> > > > > realtime tasks from running if the realtime tasks are placed on a cpu
> > > > > currently running a softirq.
> > > >
> > > > Thanks a lot for sending this series. I've raised this problem in various
> > > > venues in the past, but it seems it is hard to do something better than what
> > > > you propose here.
> > > >
> > > > Borrowing some behaviours from PREEMPT_RT (like threadedirqs) won't cut it
> > > > outside PREEMPT_RT AFAIU.
> > > >
> > > > Peter did suggest an alternative at one point in the past to be more aggressive
> > > > in limiting softirqs [1] but I never managed to find the time to verify it
> > > > - especially its impact on network throughput as this seems to be the tricky
> > > > trade-off (and a tricky thing to verify, for me at least). I'm not sure if BLOCK
> > > > softirqs are as sensitive.
> > >
> > > I've had issues with the opposite problem.
> > > Long running RT tasks stopping the softint code running.
> > >
> > > If an RT task is running, the softint will run in the context of the
> > > RT task - so has priority over it.
> > > If the RT task isn't running the softint stops the RT task being scheduled.
> > > This is really just the same.
> > >
> > > If the softint defers back to thread context it won't be scheduled
> > > until any RT task finishes. This is the opposite priority.
> > 
> > If we can get a subset of threadedirqs (call it threadedsoftirqs) from
> > PREEMPT_RT where softirqs can be converted into RT kthreads, that'll alleviate
> > both sides of the problem IMO. But last I checked with Thomas this won't be
> > possible. But things might have changed since then..
> 
> Part of the problem is that can significantly increase latency.
> Some softirq calls will be latency sensitive.

Probably part of the reason why it can't be made available outside PREEMPT_RT
:)

> 
> > > IIRC there is another strange case where the RT thread has been woken
> > > but isn't yet running - can't remember the exact details.
> > >
> > > I can (mostly) handle the RT task being delayed (there are a lot of RT
> > > threads sharing the work) but it is paramount that the ethernet receive
> > > code actually runs - I can't afford to drop packets (they contain audio
> > > the RT threads are processing).
> > >
> > > In my case threaded NAPI (mostly) fixes it - provided the NAPI threads are RT.
> > 
> > There are netdev_budget and netdev_budget_usecs params in procfs that control
> > how long NET_RX spends in the softirq. Maybe you need to tweak those too.
> > In your case, you probably want to increase the budget.
> 
> Maybe, but the problem is that the softint code is far too willing
> to drop to kthread context.
> Eric made a change to reduce that (to avoid losing ethernet packets)
> but the original test got added back - there are now two tests, but
> the original one dominates. Eric's bug fix got reverted (with extra
> tests that make the code slower).

Would be good to know what fix you're referring to.

> I did test with that changed, but still got some lost packets.
> Trying to receive 500000 UDP packets/sec is quite hard!
> They are also split across 10k unconnected sockets.

There's a hardcoded value in kernel/softirq.c::MAX_SOFTIRQ_TIME which is set to
2ms.
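
To show how that limit bounds inline softirq processing, here is a rough userspace model of the restart decision. It is a sketch under assumptions, not the kernel code: the kernel counts in jiffies while this uses microseconds, the restart cap of 10 mirrors MAX_SOFTIRQ_RESTART as I understand kernel/softirq.c, and `struct softirq_budget`, `softirq_budget_start()` and `softirq_may_restart()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SOFTIRQ_TIME_US 2000 /* the 2ms limit, expressed in microseconds */
#define MAX_SOFTIRQ_RESTART 10   /* assumed cap on restart attempts */

struct softirq_budget {
    uint64_t deadline_us; /* inline processing must stop at this time */
    int restarts_left;    /* decremented on each restart check */
};

struct softirq_budget softirq_budget_start(uint64_t now_us)
{
    struct softirq_budget b;

    b.deadline_us = now_us + MAX_SOFTIRQ_TIME_US;
    b.restarts_left = MAX_SOFTIRQ_RESTART;
    return b;
}

/*
 * After one pass over the pending vectors: loop again only if more work
 * is pending, the deadline has not passed, nothing else needs the CPU,
 * and the restart counter is not exhausted. When this returns false with
 * work still pending, the remainder would be handed to ksoftirqd.
 */
bool softirq_may_restart(struct softirq_budget *b, bool pending,
                         uint64_t now_us, bool need_resched)
{
    assert(b != NULL);
    if (!pending)
        return false;
    if (now_us >= b->deadline_us)
        return false;
    if (need_resched)
        return false;
    return --b->restarts_left != 0;
}
```

For scale, at the 500000 packets/sec mentioned above, about 1000 packets arrive during a single 2ms window.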

It might be worth bringing your problem up with the networking community. I
don't think your use case is unique, but they'd know better what needs to be
done to achieve it.

Note there's a physical upper limit that will be dictated by the hardware;
whether it's the number of cores, max frequencies, memory (speed and size) etc.

I'm assuming this is not a problem, but it's worth highlighting.

> > Note that in Android the BLOCK layer seems to cause similar problems which
> > don't have these NET facilities. So NET is only one side of the problem.
> 
> Aren't the block layer softints stopping other code?
> I've really got the other problem.
> Although I do have a 10ms timer wakeup that really needs not to be delayed.

I was just trying to highlight that this series is concerned with more than
just networking.

I thought you had concerns about this series, but it seems you're trying to
highlight another type of relevant problem.


Cheers

--
Qais Yousef