This is v7 of Peter's SCHED_DEADLINE server infrastructure
implementation [1].
SCHED_DEADLINE servers can help fix starvation issues of low-priority
tasks (e.g., SCHED_OTHER) when higher-priority tasks monopolize CPU
cycles. Today we have RT Throttling; DEADLINE servers should be able to
replace and improve upon it.
In v1 there was discussion about the consequences of using deadline-based
servers on fixed-priority workloads. As a demonstration, here is the
baseline of timerlat scheduling latency as-is, with a kernel build as
background workload:
# rtla timerlat top -u -d 10m
--------------------- %< ------------------------
Timer Latency
0 01:42:24 | IRQ Timer Latency (us) | Thread Timer Latency (us) | Ret user Timer Latency (us)
CPU COUNT | cur min avg max | cur min avg max | cur min avg max
0 #6143559 | 0 0 0 92 | 2 1 3 98 | 4 1 5 100
1 #6143559 | 1 0 0 97 | 7 1 5 101 | 9 1 7 103
2 #6143559 | 0 0 0 88 | 3 1 5 95 | 5 1 7 99
3 #6143559 | 0 0 0 90 | 6 1 5 103 | 10 1 7 126
4 #6143558 | 1 0 0 81 | 7 1 4 86 | 9 1 7 90
5 #6143558 | 0 0 0 74 | 3 1 5 79 | 4 1 7 83
6 #6143558 | 0 0 0 83 | 2 1 5 89 | 3 0 7 108
7 #6143558 | 0 0 0 85 | 3 1 4 126 | 5 1 6 137
--------------------- >% ------------------------
And this is the same test with the DL server activating without any delay:
--------------------- %< ------------------------
0 00:10:01 | IRQ Timer Latency (us) | Thread Timer Latency (us) | Ret user Timer Latency (us)
CPU COUNT | cur min avg max | cur min avg max | cur min avg max
0 #579147 | 0 0 0 54 | 2 1 52 61095 | 2 2 56 61102
1 #578766 | 0 0 0 83 | 2 1 49 55824 | 3 2 53 55831
2 #578559 | 0 0 1 59 | 2 1 50 55760 | 3 2 54 55770
3 #578318 | 0 0 0 76 | 2 1 49 55751 | 3 2 54 55760
4 #578611 | 0 0 0 64 | 2 1 49 55811 | 3 2 53 55820
5 #578347 | 0 0 1 40 | 2 1 50 56121 | 3 2 55 56133
6 #578938 | 0 0 1 75 | 2 1 49 55755 | 3 2 53 55764
7 #578631 | 0 0 1 36 | 3 1 51 55528 | 4 2 55 55541
--------------------- >% ------------------------
The problem with a DL-server-only implementation is that FIFO tasks might
suffer preemption from NORMAL tasks even when spare CPU cycles are
available. In fact, the fair deadline server is enqueued right away when
NORMAL tasks wake up, and they are scheduled by the server first, thus
potentially preempting a well-behaving FIFO task. This is of course not
ideal.
We had discussions about it, and one of the possibilities would be
using a different scheduling algorithm for this. But IMHO that is
overkill.
Juri and I discussed this and thought about delaying the server
activation by (period - runtime), thus enabling the server
only if the fair scheduler is about to starve. We called it
the defer server.
The defer mechanism postpones the server start to the
(absolute deadline - runtime) point in time. This is achieved by
starting the dl server throttled, with the next replenishing time set
to activate the server at (absolute deadline - runtime).
The server is enqueued with its runtime replenished. As the fair
scheduler runs without boost, the server's runtime is consumed. If the
fair scheduler consumes the server's runtime before the
(absolute deadline - runtime) point in time, a new period is set, and
the timer is armed for the new (deadline - runtime).
The interface is per CPU and has two knobs:
 - fair_server_runtime (950 ms)
 - fair_server_period  (1 s)
With defer enabled on CPUs [0:3], the results get better, showing
behavior similar to the one we have with RT throttling.
--------------------- %< ------------------------
Timer Latency
0 00:10:01 | IRQ Timer Latency (us) | Thread Timer Latency (us) | Ret user Timer Latency (us)
CPU COUNT | cur min avg max | cur min avg max | cur min avg max
0 #599979 | 0 0 0 64 | 4 1 4 67 | 6 1 5 69
1 #599979 | 0 0 1 17 | 6 1 5 50 | 10 2 7 71
2 #599984 | 1 0 1 22 | 4 1 5 78 | 5 2 7 107
3 #599986 | 0 0 1 72 | 7 1 5 79 | 10 2 7 82
4 #581580 | 1 0 1 37 | 6 1 38 52797 | 10 2 41 52805
5 #583270 | 1 0 1 41 | 9 1 36 52617 | 12 2 38 52623
6 #581240 | 0 0 1 25 | 7 1 39 52870 | 11 2 41 52876
7 #581208 | 0 0 1 69 | 6 1 39 52917 | 9 2 41 52923
--------------------- >% ------------------------
Here are some osnoise measurements, with osnoise threads running as FIFO:1
in different setups (defer enabled):
- CPU 2 isolated
- CPU 3 isolated shared with a CFS busy loop task
- CPU 8 non-isolated
- CPU 9 non-isolated shared with a CFS busy loop task
--------------------- %< ------------------------
~# pgrep ktimer | while read pid; do chrt -p -f 2 $pid; done # for RT kernel
~# tuna isolate -c 2
~# tuna isolate -c 3
~# taskset -c 3 ./f &
~# taskset -c 9 ./f &
~# osnoise -P f:1 -c 2,3,8,9 -T 1 -d 10m -H 1
Operating System Noise
duration: 0 00:10:00 | time is in us
CPU Period Runtime Noise % CPU Aval Max Noise Max Single HW NMI IRQ Softirq Thread
2 #599 599000000 178 99.99997 18 2 0 0 270 0 0
3 #598 598054434 31351553 94.75774 104442 104442 0 0 2837523 0 1794
8 #599 599000001 567456 99.90526 3260 2375 2 89 620490 0 13539
9 #598 598021196 31742537 94.69207 71707 53357 0 90 3411023 0 1762
--------------------- >% ------------------------
The system runs fine!
- no crashes (famous last words)
- the FIFO property is kept
- per-CPU interface, because it is more flexible - and to detach this from
  the throttling concept.
In addition:
- This version has been on my korg repo for three weeks without any
  0-day robot complaining.
- no regressions found on basic cfs tests, like kernel compilation.
- also tested with PREEMPT_RT 6.9-rt.
The global FIFO ordering breaks only if the fair server activates (which
is fair, as the RT tasks are not behaving anyway).
The selftest mentioned in the sched/core patches is here:
https://lore.kernel.org/all/20240313012451.1693807-8-joel@joelfernandes.org/
Thanks to the people at Google for testing/suggesting:
- Suleiman Souhlal <suleiman@google.com>
- Youssef Esmat <youssefesmat@google.com>
- Joel Fernandes (Google) <joel@joelfernandes.org>
- Vineeth Pillai <vineeth@bitbyteword.org>
Changes from v6:
- Rebased on top of v6.10-rc1
- Improved comments (Daniel)
- Fix division by 0 on adding bw to the rq (Daniel)
- Use guard and scoped_guard (Peter)
- Remove the defer knob (Peter)
- Adjusted comments and code styling (Peter)
- Be aware of cfs throttling (Vineeth)
- Split the fixes and the feature from the basic fair server (Peter)
Changes from V5:
- Fixes DL server for core scheduling (patches 3 and 4)
(Joel/Vineeth/Suleiman)
- Add a function to attach the fair server bandwidth to the
new root domain when the rq changes root domain (Daniel)
- Postpone the replenishment timer of the server if the defer reservation
could consume runtime while waiting to boost (patch 2) (Daniel)
- Add the running state to defer mode to avoid forcing the defer mechanism
  if the server continues to be activated due to starvation (patch 2)
  (Daniel)
- Consider idle time as time for the defer server to avoid penalty on RT
tasks (patch 2) (Daniel)
- Mark DL server as unthrottled before enqueue (patch 2)(Joel)
- Make start_dl_timer callers more robust (patch 2) (Joel)
- Do not restart the DL server on replenish from timer (patch 2)(Joel)
- Fix reversed args to dl_time_before() in replenish (patch 2) (Suleiman Souhlal)
- Removed the negative runtime optimization (patch 2)
- Start the dl server as disabled in patch 1, enabling it only after
  removing RT throttling, to avoid having two mechanisms enabled together
  by default (patch 1) (Daniel).
- Added a need-resched check after dl_server_start (patch 1) (Daniel)
- reset dl_server pointer at put_prev_task_balance (patch 1) (Joel)
- Do not include the already merged patches
- Rebased to 6.9-rc2
Changes from V4:
- Enable the server when nr fair tasks is > 0 (peter)
- Consume runtime if the zerolax server is not boosted (peterz)
- Adjust interface to deal with admission control (peterz)
- Rebased to 6.6
Changes from V3:
- Add the defer server (Daniel)
- Add a per-rq interface (Daniel, with Peter's feedback)
- Add an option to not defer the server
- Typos and 1-liner fixes (Valentin, Luca, Peter)
- The fair scheduler running on the dl server does not account as an RT task (Daniel)
- Changed the condition to enable the server (RT & fair tasks) (Daniel)
Changes from v2:
- Refactor/rephrase/typos changes
- Deferrable server using throttling
- The server starts when RT && Fair tasks are enqueued
- Interface with runtime/period/defer option
Changes from v1:
- rebased on 6.4-rc1 tip/sched/core
Daniel Bristot de Oliveira (3):
sched/deadline: Comment sched_dl_entity::dl_server variable
sched/deadline: Deferrable dl server
sched/fair: Fair server interface
Joel Fernandes (Google) (3):
sched/core: Add clearing of ->dl_server in put_prev_task_balance()
sched/core: Fix priority checking for DL server picks
sched/core: Fix picking of tasks for core scheduling with DL server
Peter Zijlstra (2):
sched/fair: Add trivial fair server
sched/rt: Remove default bandwidth control
Youssef Esmat (1):
sched/core: Clear prev->dl_server in CFS pick fast path
include/linux/sched.h | 17 +-
kernel/sched/core.c | 56 +++--
kernel/sched/deadline.c | 449 ++++++++++++++++++++++++++++++++++------
kernel/sched/debug.c | 162 +++++++++++++++
kernel/sched/fair.c | 75 ++++++-
kernel/sched/idle.c | 2 +
kernel/sched/rt.c | 242 ++++++++++------------
kernel/sched/sched.h | 17 +-
kernel/sched/topology.c | 8 +
9 files changed, 813 insertions(+), 215 deletions(-)
--
2.45.1
Hi Daniel,

On 27/05/24 14:06, Daniel Bristot de Oliveira wrote:
> This is v7 of Peter's SCHED_DEADLINE server infrastructure
> implementation [1].

I finally managed to give this a go and can report that it works great
for what I've seen. :)

So, please consider this reply a

Tested-by: Juri Lelli <juri.lelli@redhat.com>

[...]

> Juri and I discussed this and though about delaying the server
> activation for the (period - runtime), thus enabling the server
> only if the fair scheduler is about to starve. We called it
> the defer server.

[...]

I also wanted to pay particular attention to this part implementing the
deferred server, but failed to find enough focus time for now. I will
keep trying.

One thing that I wondered though is if this change (and the move towards
this replacing current RT throttling) would call for a Doc update. What
do you think?

Thanks!
Juri
On 6/21/24 15:37, Juri Lelli wrote:
> I finally managed to give this a go and can report that it works great
> for what I've seen. :)
>
> So, please consider this reply a
>
> Tested-by: Juri Lelli <juri.lelli@redhat.com>

Thanks!

[...]

> I also wanted to pay particular attention to this part implementing the
> deferred server, but failed to find enough focus time for now. I will
> keep trying. One thing that I wondered though is if this change (and the
> move towards this replacing current RT throttling) would call for a Doc
> update. What do you think?

Yeah, I am planning a v8 for next week. It has no code changes, just a
rebase and the addition of documentation.

I am not mentioning RT throttling in the documentation. Instead, I am
treating this as a new feature on its own, which is in line with the
comments over the code.

I will add an rv monitor to it, extending the documentation, but I will
do it in another series... once we get this done.

Thoughts?

Peter/Ingo, which branch should I rebase it on?

-- Daniel
On 21/06/24 15:43, Daniel Bristot de Oliveira wrote:
> Yeah, I am planning a v8 for next week. It has no code changes, just a
> rebase and the addition of documentation.
>
> I am not mentioning RT throttling in the documentation. Instead, I am
> treating this as a new feature on its own, which is in line with the
> comments over the code.
>
> I will add an rv monitor to it, extending the documentation, but I will
> do it in another series... once we get this done.
>
> Thoughts?

Works for me! Guess we can deal with the RT throttling references in the
future when that gets eventually pruned.
Hi Daniel,
>
> This is v7 of Peter's SCHED_DEADLINE server infrastructure
> implementation [1].
>
Thanks for the v7 :-)
Sorry that I could not get to reviewing and testing this revision. In
v6 we experienced a minor bug where suspend/resume had issues with the
dlserver. Since suspend does not dequeue tasks, the dlserver is not
stopped, and this causes premature wakeups. I haven't looked at v7 in
detail, but I think the issue might still be present. We have a
workaround patch for this in our 5.15 kernel based on v5, which I am
attaching for reference. It might not apply cleanly on v7 and is
possibly not the best solution, but I thought of sharing it to give
some insight into the issue.
Thanks,
Vineeth
Attached patch
-----------------------
Subject: [PATCH] sched/dlserver: Freeze dlserver on system suspend.
dlserver is stopped only if a dequeue or cfs rq throttle results in no
runnable cfs tasks. But this doesn't happen during a system suspend and
can cause the dl server to stay active and break suspend/resume.
Freeze the dlserver on system suspend. Freezing is stopping the
dlserver, but maintaining the dl_server_active state so as to not
confuse the enqueue/dequeue path.
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/5528103
Reviewed-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Vineeth Pillai <vineethrp@google.com>
Commit-Queue: Vineeth Pillai <vineethrp@google.com>
---
include/linux/sched.h | 27 +++++++++++++
kernel/power/suspend.c | 3 ++
kernel/sched/deadline.c | 87 +++++++++++++++++++++++++++++++++++++----
3 files changed, 110 insertions(+), 7 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 123ef7804d95..23beff5e48a2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -650,6 +650,15 @@ struct sched_dl_entity {
unsigned int dl_defer : 1;
unsigned int dl_defer_armed : 1;
unsigned int dl_server_active : 1;
+ /*
+ * dl_server is marked as frozen when the system suspends. Frozen
+ * means that dl_server is stopped, but the dl_server_active state
+ * is maintained so that the enqueue/dequeue path is not confused.
+ * We need this separate state other than dl_server_active because
+ * suspend doesn't dequeue the tasks and hence does not stop the
+ * dl_server during suspend. And this may lead to spurious resumes.
+ */
+ unsigned int dl_server_frozen : 1;
/*
* Bandwidth enforcement timer. Each -deadline task has its
@@ -690,6 +699,24 @@ struct sched_dl_entity {
#endif
};
+/*
+ * Power management related actions for dl_server
+ */
+enum dl_server_pm_action {
+ dl_server_pm_freeze = 0,
+ dl_server_pm_thaw = 1
+};
+extern void freeze_thaw_dl_server(enum dl_server_pm_action action);
+static inline void freeze_dl_server(void)
+{
+ freeze_thaw_dl_server(dl_server_pm_freeze);
+}
+static inline void thaw_dl_server(void)
+{
+ freeze_thaw_dl_server(dl_server_pm_thaw);
+}
+
+
#ifdef CONFIG_UCLAMP_TASK
/* Number of utilization clamp buckets (shorter alias) */
#define UCLAMP_BUCKETS CONFIG_UCLAMP_BUCKETS_COUNT
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 55235bf52c7e..a6d5f8f3072e 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -592,6 +592,8 @@ static int enter_state(suspend_state_t state)
if (error)
goto Unlock;
+ freeze_dl_server();
+
if (suspend_test(TEST_FREEZER))
goto Finish;
@@ -602,6 +604,7 @@ static int enter_state(suspend_state_t state)
pm_restore_gfp_mask();
Finish:
+ thaw_dl_server();
events_check_enabled = false;
pm_pr_dbg("Finishing wakeup.\n");
suspend_finish();
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 5fc40caf20d7..f95a375af329 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1478,10 +1478,11 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec)
{
- update_curr_dl_se(dl_se->rq, dl_se, delta_exec);
+ if (!dl_se->dl_server_frozen)
+ update_curr_dl_se(dl_se->rq, dl_se, delta_exec);
}
-void dl_server_start(struct sched_dl_entity *dl_se)
+static inline void __dl_server_start(struct sched_dl_entity *dl_se)
{
/*
* XXX: the apply do not work fine at the init phase for the
@@ -1500,25 +1501,97 @@ void dl_server_start(struct sched_dl_entity *dl_se)
setup_new_dl_entity(dl_se);
}
+ enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
+}
+
+void dl_server_start(struct sched_dl_entity *dl_se)
+{
+ if (dl_se->dl_server_frozen)
+ goto set_active;
+
if (WARN_ON_ONCE(dl_se->dl_server_active))
return;
- enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
+ __dl_server_start(dl_se);
+
+set_active:
dl_se->dl_server_active = 1;
}
-void dl_server_stop(struct sched_dl_entity *dl_se)
+static inline void __dl_server_stop(struct sched_dl_entity *dl_se)
{
- if (WARN_ON_ONCE(!dl_se->dl_server_active))
- return;
-
dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);
hrtimer_try_to_cancel(&dl_se->dl_timer);
dl_se->dl_defer_armed = 0;
dl_se->dl_throttled = 0;
+}
+
+void dl_server_stop(struct sched_dl_entity *dl_se)
+{
+ if (dl_se->dl_server_frozen)
+ goto reset_active;
+
+ if (WARN_ON_ONCE(!dl_se->dl_server_active))
+ return;
+
+ __dl_server_stop(dl_se);
+
+reset_active:
dl_se->dl_server_active = 0;
}
+void dl_server_freeze(struct sched_dl_entity *dl_se)
+{
+ if (dl_se->dl_server_active) {
+ update_rq_clock(dl_se->rq);
+ __dl_server_stop(dl_se);
+ }
+ dl_se->dl_server_frozen = 1;
+}
+
+void dl_server_thaw(struct sched_dl_entity *dl_se)
+{
+ if (dl_se->dl_server_active) {
+ update_rq_clock(dl_se->rq);
+ __dl_server_start(dl_se);
+ }
+ dl_se->dl_server_frozen = 0;
+}
+
+void freeze_thaw_dl_server(enum dl_server_pm_action action)
+{
+ int cpu;
+
+ cpus_read_lock();
+ for_each_online_cpu(cpu) {
+ struct rq_flags rf;
+ struct rq *rq = cpu_rq(cpu);
+ struct sched_dl_entity *dl_se;
+
+ sched_clock_tick();
+ rq_lock_irqsave(rq, &rf);
+ dl_se = &rq->fair_server;
+ switch (action) {
+ case dl_server_pm_freeze:
+ if (WARN_ON_ONCE(dl_se->dl_server_frozen))
+ break;
+
+ dl_server_freeze(dl_se);
+ break;
+ case dl_server_pm_thaw:
+ if (WARN_ON_ONCE(!dl_se->dl_server_frozen))
+ break;
+
+
+ dl_server_thaw(dl_se);
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ }
+ rq_unlock_irqrestore(rq, &rf);
+ }
+ cpus_read_unlock();
+}
+
void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
dl_server_has_tasks_f has_tasks,
dl_server_pick_f pick_next,
--
2.45.2.741.gdbec12cfda-goog
On Fri, Jun 21, 2024 at 10:41:35AM -0400, Vineeth Remanan Pillai wrote:
> Sorry that I could not get to reviewing and testing this revision. In
> v6 we had experienced a minor bug where suspend/resume had issues with
> dlserver. Since suspend does not do dequeue, dlserver is not stopped
> and this causes the premature wakeups. I haven't looked at v7 in
> detail, but I think the issue might still be present.
It is not.
> We have a workaround patch for this in our 5.15 kernel
That is the problem... your necro kernel doesn't yet have the freezer
rewrite I imagine:
f5d39b020809 ("freezer,sched: Rewrite core freezer logic")
That would cause all frozen tasks to be dequeued, and once all tasks
are dequeued, the deadline server stops itself too.
Juri did some testing to double check and no suspend / resume issues
were found.
Anyway, I've merged the lot into tip/sched/core.
Thanks all!
On Mon, Jul 29, 2024 at 6:32 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jun 21, 2024 at 10:41:35AM -0400, Vineeth Remanan Pillai wrote:
>
> > Sorry that I could not get to reviewing and testing this revision. In
> > v6 we had experienced a minor bug where suspend/resume had issues with
> > dlserver. Since suspend does not do dequeue, dlserver is not stopped
> > and this causes the premature wakeups. I haven't looked at v7 in
> > detail, but I think the issue might still be present.
>
> It is not.
>
> > We have a workaround patch for this in our 5.15 kernel
>
> That is the problem... your necro kernel doesn't yet have the freezer
> rewrite I imagine:
>
> f5d39b020809 ("freezer,sched: Rewrite core freezer logic")
>
> That would cause all frozen tasks to be dequeued, and once all tasks
> are dequeued, the deadline server stops itself too.
>
You're right, we are on 5.15 kernel and do not have this fix. Thanks
for pointing this out.
> Juri did some testing to double check and no suspend / resume issues
> were found.
>
> Anyway, I've merged the lot into tip/sched/core.
>
Thanks, I shall port it to chromeos kernel and run through the usual
round of tests and update the details soon.
Thanks,
Vineeth
On 6/21/24 16:41, Vineeth Remanan Pillai wrote:
> Sorry that I could not get to reviewing and testing this revision. In
> v6 we had experienced a minor bug where suspend/resume had issues with
> dlserver. Since suspend does not do dequeue, dlserver is not stopped
> and this causes the premature wakeups.

Ouch! I will have a look next week at this. Do you guys know of any
other bug?

An earlier report, without necessarily a fix/workaround, is a good thing
for us, so we can try to reproduce it/think about it as early as we can...

-- Daniel
On Fri, Jun 21, 2024 at 10:59 AM Daniel Bristot de Oliveira
<bristot@kernel.org> wrote:
> Ouch! I will have a look next week on this. Do you guys know any other
> bug?
>
> an earlier report without necessarily a fix/work around is a good thing
> for us to try to reproduce it/think about it as earlier as we can...

Sorry, my mistake. I was buried in other things and missed reporting
this earlier.

There was another minor regression seen lately after we fixed the above
issue: an idle CPU was spending more time in C7 than C10 with the
dlserver changes. This was reported very recently and we haven't
investigated it much yet. Just a heads up; I will keep you posted as we
know more.

Thanks,
Vineeth
On 6/21/24 17:09, Vineeth Remanan Pillai wrote:
> Sorry my mistake, I was buried in other things and missed reporting
> this earlier.

No worries.

> There was another minor regression seen lately after we fixed the
> above issue- idle cpu was spending more time in C7 than C10 with the
> dlserver changes. This was reported very recently and we haven't
> investigated this much yet. Just a heads up and will keep you posted
> as we know more.

Maybe that is an expected side effect of the timer for the server;
AFAIR you guys are using a short period / large runtime (25ms/50ms)?