[PATCH v2 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend

Juri Lelli posted 8 patches 11 months, 1 week ago
There is a newer version of this series
[PATCH v2 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend
Posted by Juri Lelli 11 months, 1 week ago
Hello!

Jon reported [1] a suspend regression on a Tegra board configured to
boot with isolcpus and bisected it to commit 53916d5fd3c0
("sched/deadline: Check bandwidth overflow earlier for hotplug").

Root cause analysis showed that we currently fail to correctly clear
and restore bandwidth accounting on root domains after changes that
originate from partition_sched_domains(), as is the case for suspend
operations on that board.

This is v2 of the proposed approach to fix the issue (v1 is at [2]). The
series implements the approach as follows, with changes relative to v1
noted in parentheses:

- 01: filter out DEADLINE special tasks
- 02: preparatory wrappers to be able to grab sched_domains_mutex on
      UP (remove !SMP wrappers - Waiman)
- 03: generalize unique visiting of root domains so that we can
      reuse the mechanism elsewhere
- 04: the bulk of the approach, clear and rebuild accounting after
      every change
- 05: clean up a now redundant call
- 06: remove partition_and_rebuild_sched_domains() (Waiman)
- 07: stop exposing partition_sched_domains_locked (Waiman)
- 08: move dl_rebuild_rd_accounting to cpuset.h

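Items 01, 03 and 04 above can be illustrated together with a small user-space model. Several CPUs may share one root domain, so a walk over CPUs has to handle each domain only once; the sketch does this with a global generation counter and a per-domain cookie, and it skips "special" DEADLINE tasks when re-adding bandwidth. This is a hypothetical sketch, not the series' code: whether the patches use exactly this scheme is an assumption, and every identifier (struct rd_model, visit_cookie, rd_visited(), rebuild_rd_accounting(), is_special, NR_CPUS_MODEL) is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4 /* hypothetical CPU count for the model */

/* Hypothetical root domain: accounted bandwidth plus a visit marker. */
struct rd_model {
	uint64_t total_bw;
	uint64_t visit_cookie; /* generation of the last walk that visited it */
};

/* Hypothetical DEADLINE task placed on a CPU. */
struct task_model {
	uint64_t bw;
	int cpu;
	bool is_special; /* kernel-internal task, excluded from accounting */
};

static uint64_t visit_generation;

/* Mark @rd as visited in generation @gen; report whether it already was. */
static bool rd_visited(struct rd_model *rd, uint64_t gen)
{
	if (rd->visit_cookie == gen)
		return true;
	rd->visit_cookie = gen;
	return false;
}

/*
 * Rebuild accounting from scratch. Several CPUs may point at the same root
 * domain, so the clearing pass uses the generation cookie to handle each
 * domain once. Returns the number of distinct domains cleared.
 */
int rebuild_rd_accounting(struct rd_model *cpu_rd[NR_CPUS_MODEL],
			  const struct task_model *tasks, int nr_tasks)
{
	uint64_t gen = ++visit_generation;
	int nr_cleared = 0;

	/* Pass 1: clear each root domain once, however many CPUs share it. */
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (rd_visited(cpu_rd[cpu], gen))
			continue;
		cpu_rd[cpu]->total_bw = 0;
		nr_cleared++;
	}

	/* Pass 2: re-add bandwidth of regular DEADLINE tasks only. */
	for (int i = 0; i < nr_tasks; i++) {
		assert(tasks[i].cpu >= 0 && tasks[i].cpu < NR_CPUS_MODEL);
		if (tasks[i].is_special)
			continue; /* special tasks are not accounted */
		cpu_rd[tasks[i].cpu]->total_bw += tasks[i].bw;
	}

	return nr_cleared;
}
```

A monotonically increasing cookie avoids resetting a "visited" flag on every domain before each walk: bumping the generation implicitly marks all domains as unvisited. With CPUs 0-1 mapped to one domain and CPUs 2-3 to another, each call clears exactly two domains, and any stale total_bw is replaced by the sum of the non-special tasks' bandwidth.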
Please test and review. The set is also available at

git@github.com:jlelli/linux.git upstream/deadline/domains-suspend

Best,
Juri

1 - https://lore.kernel.org/lkml/ba51a43f-796d-4b79-808a-b8185905638a@nvidia.com/
2 - v1 https://lore.kernel.org/lkml/20250304084045.62554-1-juri.lelli@redhat.com

Juri Lelli (8):
  sched/deadline: Ignore special tasks when rebuilding domains
  sched/topology: Wrappers for sched_domains_mutex
  sched/deadline: Generalize unique visiting of root domains
  sched/deadline: Rebuild root domain accounting after every update
  sched/topology: Remove redundant dl_clear_root_domain call
  cgroup/cpuset: Remove partition_and_rebuild_sched_domains
  sched/topology: Stop exposing partition_sched_domains_locked
  include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h

 include/linux/cpuset.h         |  5 +++++
 include/linux/sched.h          |  2 ++
 include/linux/sched/deadline.h |  7 +++++++
 include/linux/sched/topology.h | 10 ---------
 kernel/cgroup/cpuset.c         | 27 +++++++++----------------
 kernel/sched/core.c            |  4 ++--
 kernel/sched/deadline.c        | 37 ++++++++++++++++++++--------------
 kernel/sched/debug.c           |  8 ++++----
 kernel/sched/rt.c              |  2 ++
 kernel/sched/sched.h           |  2 +-
 kernel/sched/topology.c        | 32 +++++++++++++----------------
 11 files changed, 69 insertions(+), 67 deletions(-)


base-commit: 48a5eed9ad584315c30ed35204510536235ce402
-- 
2.48.1
Re: [PATCH v2 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend
Posted by Waiman Long 11 months ago
On 3/6/25 9:10 AM, Juri Lelli wrote:
> Hello!
>
> Jon reported [1] a suspend regression on a Tegra board configured to
> boot with isolcpus and bisected it to commit 53916d5fd3c0
> ("sched/deadline: Check bandwidth overflow earlier for hotplug").

...

I have run my cpuset test and it completed successfully without any issue.

Tested-by: Waiman Long <longman@redhat.com>
Re: [PATCH v2 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend
Posted by Juri Lelli 11 months ago
On 07/03/25 14:00, Waiman Long wrote:
> On 3/6/25 9:10 AM, Juri Lelli wrote:
> > Hello!
> > 
> > Jon reported [1] a suspend regression on a Tegra board configured to
> > boot with isolcpus and bisected it to commit 53916d5fd3c0
> > ("sched/deadline: Check bandwidth overflow earlier for hotplug").

...

> I have run my cpuset test and it completed successfully without any issue.
> 
> Tested-by: Waiman Long <longman@redhat.com>
> 

Thanks!
Juri
Re: [PATCH v2 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend
Posted by Jon Hunter 11 months ago
Hi Juri,

On 06/03/2025 14:10, Juri Lelli wrote:
> Hello!
> 
> Jon reported [1] a suspend regression on a Tegra board configured to
> boot with isolcpus and bisected it to commit 53916d5fd3c0
> ("sched/deadline: Check bandwidth overflow earlier for hotplug").

...

Tested-by: Jon Hunter <jonathanh@nvidia.com>

Thanks!
Jon

Re: [PATCH v2 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend
Posted by Juri Lelli 11 months ago
Hi Jon,

On 07/03/25 11:40, Jon Hunter wrote:
> Hi Juri,
> 
> On 06/03/2025 14:10, Juri Lelli wrote:
> > Hello!
> > 
> > Jon reported [1] a suspend regression on a Tegra board configured to
> > boot with isolcpus and bisected it to commit 53916d5fd3c0
> > ("sched/deadline: Check bandwidth overflow earlier for hotplug").

...

> Tested-by: Jon Hunter <jonathanh@nvidia.com>

Thanks!

Best,
Juri