[PATCH 0/5] Fix SCHED_DEADLINE bandwidth accounting during suspend

Juri Lelli posted 5 patches 1 year ago
There is a newer version of this series
Posted by Juri Lelli 1 year ago
Hello!

Jon reported [1] a suspend regression on a Tegra board configured to
boot with isolcpus and bisected it to commit 53916d5fd3c0
("sched/deadline: Check bandwidth overflow earlier for hotplug").

Root cause analysis pointed out that we are currently failing to
correctly clear and restore bandwidth accounting on root domains after
changes that originate from partition_sched_domains(), as is the case
for suspend operations on that board.

The way we currently make sure that accounting properly follows root
domain changes is quite convoluted and was indeed missing some corner
cases. So, instead of adding yet more fragile operations, I thought we
could simplify things by always clearing and rebuilding bandwidth
information on all domains after an update is complete. Also, we should
be ignoring DEADLINE special tasks when doing so (e.g. sugov), since we
ignore them already for runtime enforcement and admission control
anyway.
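
To illustrate the "clear and rebuild" idea above, here is a minimal
user-space sketch. It is not the code from this series: the toy_* types
and functions are hypothetical stand-ins, and the real kernel structures,
locking and iteration over tasks are considerably more involved. The
sketch only shows the two steps described: zero the accounted bandwidth
on every root domain, then re-add the bandwidth of each DEADLINE task
while skipping special ones (such as sugov).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a root domain. */
struct toy_root_domain {
	uint64_t total_bw;	/* sum of accounted DEADLINE bandwidth */
};

/* Hypothetical, simplified stand-in for a DEADLINE task. */
struct toy_dl_task {
	uint64_t dl_bw;		/* task bandwidth (runtime/period, fixed point) */
	bool special;		/* true for special tasks, e.g. a sugov kthread */
	size_t rd;		/* index of the root domain the task runs in */
};

/* Step 1: forget whatever was accounted before the domain update. */
static void toy_clear_all(struct toy_root_domain *rds, size_t nr_rds)
{
	for (size_t i = 0; i < nr_rds; i++)
		rds[i].total_bw = 0;
}

/* Step 2: rebuild accounting from scratch, ignoring special tasks. */
static void toy_rebuild_all(struct toy_root_domain *rds, size_t nr_rds,
			    const struct toy_dl_task *tasks, size_t nr_tasks)
{
	toy_clear_all(rds, nr_rds);

	for (size_t i = 0; i < nr_tasks; i++) {
		if (tasks[i].special)
			continue;	/* not subject to admission control */
		if (tasks[i].rd >= nr_rds)
			continue;	/* defensive: skip out-of-range index */
		rds[tasks[i].rd].total_bw += tasks[i].dl_bw;
	}
}
```

Because the rebuild always starts from zero, stale values left over from
a previous domain layout cannot leak into the new accounting, which is
the property the simplification relies on.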

The following implements the approach by:

- 01/05: filter out DEADLINE special tasks
- 02/05: preparatory wrappers to be able to grab sched_domains_mutex on
         UP
- 03/05: generalize unique visiting of root domains so that we can
         re-use the mechanism elsewhere
- 04/05: the bulk of the approach, clean and rebuild after changes
- 05/05: clean up a now redundant call
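
As a rough illustration of what "unique visiting of root domains" (03/05)
can look like, the sketch below uses a per-pass cookie so that a root
domain shared by several CPUs is processed only once per pass. This is an
assumption about the general technique, not a copy of the patch; the
toy_* names are hypothetical and the real code also has to deal with
locking.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: a root domain remembers the last pass that saw it. */
struct toy_rd {
	uint64_t visit_cookie;
};

/* Global pass counter; 0 is never handed out, so fresh domains are unvisited. */
static uint64_t toy_generation;

/* Start a new pass and return its cookie. */
static uint64_t toy_new_pass(void)
{
	return ++toy_generation;
}

/* Return true if rd was already seen in this pass, else mark it as seen. */
static bool toy_rd_visited(struct toy_rd *rd, uint64_t cookie)
{
	if (rd->visit_cookie == cookie)
		return true;

	rd->visit_cookie = cookie;
	return false;
}

/* Count the distinct root domains reachable through per-CPU pointers. */
static unsigned int toy_count_unique(struct toy_rd **cpu_rd,
				     unsigned int nr_cpus)
{
	uint64_t cookie = toy_new_pass();
	unsigned int n = 0;

	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++) {
		if (!toy_rd_visited(cpu_rd[cpu], cookie))
			n++;
	}

	return n;
}
```

Taking a fresh cookie for each pass avoids having to reset a "visited"
flag on every root domain before iterating again.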

Please test and review. The set is also available at

git@github.com:jlelli/linux.git upstream/deadline/domains-suspend

Waiman, could you please double check this doesn't break the cpuset
kselftest? It returns PASS on my end, but you never know.

Best,
Juri

1 - https://lore.kernel.org/lkml/ba51a43f-796d-4b79-808a-b8185905638a@nvidia.com/

Juri Lelli (5):
  sched/deadline: Ignore special tasks when rebuilding domains
  sched/topology: Wrappers for sched_domains_mutex
  sched/deadline: Generalize unique visiting of root domains
  sched/deadline: Rebuild root domain accounting after every update
  sched/topology: Remove redundant dl_clear_root_domain call

 include/linux/sched.h          |  2 ++
 include/linux/sched/deadline.h |  7 +++++++
 include/linux/sched/topology.h |  2 ++
 kernel/cgroup/cpuset.c         | 20 ++++++++++---------
 kernel/sched/core.c            |  4 ++--
 kernel/sched/deadline.c        | 36 ++++++++++++++++++++--------------
 kernel/sched/debug.c           |  8 ++++----
 kernel/sched/rt.c              |  2 ++
 kernel/sched/sched.h           |  2 +-
 kernel/sched/topology.c        | 33 +++++++++++++++----------------
 10 files changed, 68 insertions(+), 48 deletions(-)


base-commit: d082ecbc71e9e0bf49883ee4afd435a77a5101b6
-- 
2.48.1
Re: [PATCH 0/5] Fix SCHED_DEADLINE bandwidth accounting during suspend
Posted by Jon Hunter 1 year ago
Hi Juri,

On 04/03/2025 08:40, Juri Lelli wrote:
> Hello!
> 
> Jon reported [1] a suspend regression on a Tegra board configured to
> boot with isolcpus and bisected it to commit 53916d5fd3c0
> ("sched/deadline: Check bandwidth overflow earlier for hotplug").
> 
> Root cause analysis pointed out that we are currently failing to
> correctly clear and restore bandwidth accounting on root domains after
> changes that initiate from partition_sched_domains(), as it is the case
> for suspend operations on that board.
> 
> The way we currently make sure that accounting properly follows root
> domain changes is quite convoluted and was indeed missing some corner
> cases. So, instead of adding yet more fragile operations, I thought we
> could simplify things by always clearing and rebuilding bandwidth
> information on all domains after an update is complete. Also, we should
> be ignoring DEADLINE special tasks when doing so (e.g. sugov), since we
> ignore them already for runtime enforcement and admission control
> anyway.
> 
> The following implements the approach by:
> 
> - 01/05: filter out DEADLINE special tasks
> - 02/05: preparatory wrappers to be able to grab sched_domains_mutex on
>           UP
> - 03/05: generalize unique visiting of root domains so that we can
>           re-use the mechanism elsewhere
> - 04/05: the bulk of the approach, clean and rebuild after changes
> - 05/05: clean up a now redundant call
> 
> Please test and review. The set is also available at
> 
> git@github.com:jlelli/linux.git upstream/deadline/domains-suspend


I know that this is still under review, but I have tested on my side and 
it is working for me, so feel free to include my ...

Tested-by: Jon Hunter <jonathanh@nvidia.com>

Thanks!
Jon

-- 
nvpublic
Re: [PATCH 0/5] Fix SCHED_DEADLINE bandwidth accounting during suspend
Posted by Juri Lelli 1 year ago
Hi Jon,

On 04/03/25 15:32, Jon Hunter wrote:
> Hi Juri,
> 
> On 04/03/2025 08:40, Juri Lelli wrote:
> > Hello!
> > 
> > Jon reported [1] a suspend regression on a Tegra board configured to
> > boot with isolcpus and bisected it to commit 53916d5fd3c0
> > ("sched/deadline: Check bandwidth overflow earlier for hotplug").
> > 
> > Root cause analysis pointed out that we are currently failing to
> > correctly clear and restore bandwidth accounting on root domains after
> > changes that initiate from partition_sched_domains(), as it is the case
> > for suspend operations on that board.
> > 
> > The way we currently make sure that accounting properly follows root
> > domain changes is quite convoluted and was indeed missing some corner
> > cases. So, instead of adding yet more fragile operations, I thought we
> > could simplify things by always clearing and rebuilding bandwidth
> > information on all domains after an update is complete. Also, we should
> > be ignoring DEADLINE special tasks when doing so (e.g. sugov), since we
> > ignore them already for runtime enforcement and admission control
> > anyway.
> > 
> > The following implements the approach by:
> > 
> > - 01/05: filter out DEADLINE special tasks
> > - 02/05: preparatory wrappers to be able to grab sched_domains_mutex on
> >           UP
> > - 03/05: generalize unique visiting of root domains so that we can
> >           re-use the mechanism elsewhere
> > - 04/05: the bulk of the approach, clean and rebuild after changes
> > - 05/05: clean up a now redundant call
> > 
> > Please test and review. The set is also available at
> > 
> > git@github.com:jlelli/linux.git upstream/deadline/domains-suspend
> 
> 
> I know that this is still under review, but I have tested on my side and it
> is working for me, so feel free to include my ...
> 
> Tested-by: Jon Hunter <jonathanh@nvidia.com>

Great to hear this, and thanks for the super quick turnaround with
testing. I will implement the changes that Waiman (and possibly
others) suggested and post a new version soon.

Best,
Juri