[RFC PATCH 14/17] sched: Add deadline tracepoints

Gabriele Monaco posted 17 patches 1 month, 3 weeks ago
Posted by Gabriele Monaco 1 month, 3 weeks ago
Add the following tracepoints:

* sched_dl_throttle(dl):
    Called when a deadline entity is throttled
* sched_dl_replenish(dl):
    Called when a deadline entity's runtime is replenished
* sched_dl_server_start(dl):
    Called when a deadline server is started
* sched_dl_server_stop(dl, hard):
    Called when a deadline server is stopped (hard) or put to idle
    waiting for the next period (!hard)

Those tracepoints can be useful to validate the deadline scheduler with
RV and are not exported to tracefs.

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 include/trace/events/sched.h | 55 ++++++++++++++++++++++++++++++++++++
 kernel/sched/deadline.c      |  8 ++++++
 2 files changed, 63 insertions(+)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 7b2645b50e78..f34cc1dc4a13 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -609,6 +609,45 @@ TRACE_EVENT(sched_pi_setprio,
 			__entry->oldprio, __entry->newprio)
 );
 
+/*
+DECLARE_EVENT_CLASS(sched_dl_template,
+
+	TP_PROTO(struct sched_dl_entity *dl),
+
+	TP_ARGS(dl),
+
+	TP_STRUCT__entry(
+		__field(  struct task_struct *,	tsk		)
+		__string( comm,		dl->dl_server ? "server" : container_of(dl, struct task_struct, dl)->comm	)
+		__field(  pid_t,	pid		)
+		__field(  s64,		runtime		)
+		__field(  u64,		deadline	)
+		__field(  int,		dl_yielded	)
+	),
+
+	TP_fast_assign(
+		__assign_str(comm);
+		__entry->pid		= dl->dl_server ? -1 : container_of(dl, struct task_struct, dl)->pid;
+		__entry->runtime	= dl->runtime;
+		__entry->deadline	= dl->deadline;
+		__entry->dl_yielded	= dl->dl_yielded;
+	),
+
+	TP_printk("comm=%s pid=%d runtime=%lld deadline=%lld yielded=%d",
+			__get_str(comm), __entry->pid,
+			__entry->runtime, __entry->deadline,
+			__entry->dl_yielded)
+);
+
+DEFINE_EVENT(sched_dl_template, sched_dl_throttle,
+	TP_PROTO(struct sched_dl_entity *dl),
+	TP_ARGS(dl));
+
+DEFINE_EVENT(sched_dl_template, sched_dl_replenish,
+	TP_PROTO(struct sched_dl_entity *dl),
+	TP_ARGS(dl));
+*/
+
 #ifdef CONFIG_DETECT_HUNG_TASK
 TRACE_EVENT(sched_process_hang,
 	TP_PROTO(struct task_struct *tsk),
@@ -896,6 +935,22 @@ DECLARE_TRACE(sched_set_need_resched,
 	TP_PROTO(struct task_struct *tsk, int cpu, int tif),
 	TP_ARGS(tsk, cpu, tif));
 
+DECLARE_TRACE(sched_dl_throttle,
+	TP_PROTO(struct sched_dl_entity *dl),
+	TP_ARGS(dl));
+
+DECLARE_TRACE(sched_dl_replenish,
+	TP_PROTO(struct sched_dl_entity *dl),
+	TP_ARGS(dl));
+
+DECLARE_TRACE(sched_dl_server_start,
+	TP_PROTO(struct sched_dl_entity *dl),
+	TP_ARGS(dl));
+
+DECLARE_TRACE(sched_dl_server_stop,
+	TP_PROTO(struct sched_dl_entity *dl, bool hard),
+	TP_ARGS(dl, hard));
+
 #endif /* _TRACE_SCHED_H */
 
 /* This part must be outside protection */
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index e2d51f4306b3..f8284accb6b4 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -742,6 +742,7 @@ static inline void replenish_dl_new_period(struct sched_dl_entity *dl_se,
 		dl_se->dl_throttled = 1;
 		dl_se->dl_defer_armed = 1;
 	}
+	trace_sched_dl_replenish_tp(dl_se);
 }
 
 /*
@@ -852,6 +853,9 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se)
 	if (dl_time_before(dl_se->deadline, rq_clock(rq))) {
 		printk_deferred_once("sched: DL replenish lagged too much\n");
 		replenish_dl_new_period(dl_se, rq);
+	} else {
+		/* replenish_dl_new_period is also tracing */
+		trace_sched_dl_replenish_tp(dl_se);
 	}
 
 	if (dl_se->dl_yielded)
@@ -1482,6 +1486,7 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
 
 throttle:
 	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
+		trace_sched_dl_throttle_tp(dl_se);
 		dl_se->dl_throttled = 1;
 
 		/* If requested, inform the user about runtime overruns. */
@@ -1590,6 +1595,7 @@ void dl_server_start(struct sched_dl_entity *dl_se)
 	if (!dl_server(dl_se) || dl_se->dl_server_active)
 		return;
 
+	trace_sched_dl_server_start_tp(dl_se);
 	dl_se->dl_server_active = 1;
 	enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
 	if (!dl_task(dl_se->rq->curr) || dl_entity_preempt(dl_se, &rq->curr->dl))
@@ -1601,6 +1607,7 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
 	if (!dl_server(dl_se) || !dl_server_active(dl_se))
 		return;
 
+	trace_sched_dl_server_stop_tp(dl_se, true);
 	dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);
 	hrtimer_try_to_cancel(&dl_se->dl_timer);
 	dl_se->dl_defer_armed = 0;
@@ -1618,6 +1625,7 @@ static bool dl_server_stopped(struct sched_dl_entity *dl_se)
 		return true;
 	}
 
+	trace_sched_dl_server_stop_tp(dl_se, false);
 	dl_se->dl_server_idle = 1;
 	return false;
 }
-- 
2.50.1
Re: [RFC PATCH 14/17] sched: Add deadline tracepoints
Posted by Juri Lelli 1 month, 2 weeks ago
Hi!

On 14/08/25 17:08, Gabriele Monaco wrote:
> Add the following tracepoints:
> 
> * sched_dl_throttle(dl):
>     Called when a deadline entity is throttled
> * sched_dl_replenish(dl):
>     Called when a deadline entity's runtime is replenished
> * sched_dl_server_start(dl):
>     Called when a deadline server is started
> * sched_dl_server_stop(dl, hard):
>     Called when a deadline server is stopped (hard) or put to idle
>     waiting for the next period (!hard)
> 
> Those tracepoints can be useful to validate the deadline scheduler with
> RV and are not exported to tracefs.
> 
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
>  include/trace/events/sched.h | 55 ++++++++++++++++++++++++++++++++++++
>  kernel/sched/deadline.c      |  8 ++++++
>  2 files changed, 63 insertions(+)
> 
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index 7b2645b50e78..f34cc1dc4a13 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -609,6 +609,45 @@ TRACE_EVENT(sched_pi_setprio,
>  			__entry->oldprio, __entry->newprio)
>  );
>  
> +/*
> +DECLARE_EVENT_CLASS(sched_dl_template,
> +
> +	TP_PROTO(struct sched_dl_entity *dl),
> +
> +	TP_ARGS(dl),
> +
> +	TP_STRUCT__entry(
> +		__field(  struct task_struct *,	tsk		)
> +		__string( comm,		dl->dl_server ? "server" : container_of(dl, struct task_struct, dl)->comm	)
> +		__field(  pid_t,	pid		)
> +		__field(  s64,		runtime		)
> +		__field(  u64,		deadline	)
> +		__field(  int,		dl_yielded	)

I wonder if, while we are at it, we want to print all the other fields
as well (they might turn out to be useful). That would be

 .:: static (easier to retrieve with just a trace)
 - dl_runtime
 - dl_deadline
 - dl_period

 .:: behaviour (RECLAIM)
 - flags

 .:: state
 - dl_ bool flags in addition to dl_yielded

> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(comm);
> +		__entry->pid		= dl->dl_server ? -1 : container_of(dl, struct task_struct, dl)->pid;
> +		__entry->runtime	= dl->runtime;
> +		__entry->deadline	= dl->deadline;
> +		__entry->dl_yielded	= dl->dl_yielded;
> +	),
> +
> +	TP_printk("comm=%s pid=%d runtime=%lld deadline=%lld yielded=%d",
                                                        ^^^
							llu ?

> +			__get_str(comm), __entry->pid,
> +			__entry->runtime, __entry->deadline,
> +			__entry->dl_yielded)
> +);

...

> @@ -1482,6 +1486,7 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
>  
>  throttle:
>  	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
> +		trace_sched_dl_throttle_tp(dl_se);
>  		dl_se->dl_throttled = 1;

I believe we also need to trace the dl_check_constrained_dl() throttle,
please take a look.

Also - we discussed this point a little already offline - but I still
wonder if we have to do anything special for dl-server defer. Those
entities are started as throttled until 0-lag, so maybe we should still
trace them explicitly as so?

In addition, since it's related, maybe we should do something about
sched_switch event, that is currently not aware of deadlines, runtimes,
etc.

Thanks,
Juri
Re: [RFC PATCH 14/17] sched: Add deadline tracepoints
Posted by Peter Zijlstra 1 month, 2 weeks ago
On Tue, Aug 19, 2025 at 11:56:57AM +0200, Juri Lelli wrote:
> Hi!
> 
> On 14/08/25 17:08, Gabriele Monaco wrote:
> > Add the following tracepoints:
> > 
> > * sched_dl_throttle(dl):
> >     Called when a deadline entity is throttled
> > * sched_dl_replenish(dl):
> >     Called when a deadline entity's runtime is replenished
> > * sched_dl_server_start(dl):
> >     Called when a deadline server is started
> > * sched_dl_server_stop(dl, hard):
> >     Called when a deadline server is stopped (hard) or put to idle
> >     waiting for the next period (!hard)
> > 
> > Those tracepoints can be useful to validate the deadline scheduler with
> > RV and are not exported to tracefs.
> > 
> > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> > ---
> >  include/trace/events/sched.h | 55 ++++++++++++++++++++++++++++++++++++
> >  kernel/sched/deadline.c      |  8 ++++++
> >  2 files changed, 63 insertions(+)
> > 
> > diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> > index 7b2645b50e78..f34cc1dc4a13 100644
> > --- a/include/trace/events/sched.h
> > +++ b/include/trace/events/sched.h
> > @@ -609,6 +609,45 @@ TRACE_EVENT(sched_pi_setprio,
> >  			__entry->oldprio, __entry->newprio)
> >  );
> >  
> > +/*
> > +DECLARE_EVENT_CLASS(sched_dl_template,
> > +
> > +	TP_PROTO(struct sched_dl_entity *dl),
> > +
> > +	TP_ARGS(dl),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(  struct task_struct *,	tsk		)
> > +		__string( comm,		dl->dl_server ? "server" : container_of(dl, struct task_struct, dl)->comm	)
> > +		__field(  pid_t,	pid		)
> > +		__field(  s64,		runtime		)
> > +		__field(  u64,		deadline	)
> > +		__field(  int,		dl_yielded	)
> 
> I wonder if, while we are at it, we want to print all the other fields
> as well (they might turn out to be useful). That would be
> 
>  .:: static (easier to retrieve with just a trace)
>  - dl_runtime
>  - dl_deadline
>  - dl_period
> 
>  .:: behaviour (RECLAIM)
>  - flags
> 
>  .:: state
>  - dl_ bool flags in addition to dl_yielded

All these things are used as _tp(). That means they don't have trace
buffer entries ever, why fill out fields?


> > +	),
> > +
> > +	TP_fast_assign(
> > +		__assign_str(comm);
> > +		__entry->pid		= dl->dl_server ? -1 : container_of(dl, struct task_struct, dl)->pid;
> > +		__entry->runtime	= dl->runtime;
> > +		__entry->deadline	= dl->deadline;
> > +		__entry->dl_yielded	= dl->dl_yielded;
> > +	),
> > +
> > +	TP_printk("comm=%s pid=%d runtime=%lld deadline=%lld yielded=%d",
>                                                         ^^^
> 							llu ?
> 
> > +			__get_str(comm), __entry->pid,
> > +			__entry->runtime, __entry->deadline,
> > +			__entry->dl_yielded)
> > +);
> 
> ...
> 
> > @@ -1482,6 +1486,7 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
> >  
> >  throttle:
> >  	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
> > +		trace_sched_dl_throttle_tp(dl_se);
> >  		dl_se->dl_throttled = 1;
> 
> I believe we also need to trace the dl_check_constrained_dl() throttle,
> please take a look.
> 
> Also - we discussed this point a little already offline - but I still
> wonder if we have to do anything special for dl-server defer. Those
> entities are started as throttled until 0-lag, so maybe we should still
> trace them explicitly as so?
> 
> In addition, since it's related, maybe we should do something about
> sched_switch event, that is currently not aware of deadlines, runtimes,
> etc.

As per the whole _tp() thing, you can attach to the actual
sched_switch tracepoint with a module and read whatever you want.
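
Something like the following untested sketch would do (assuming the new
_tp symbols get the EXPORT_TRACEPOINT_SYMBOL_GPL() treatment so modules
can see them; the exact register_trace_* names depend on how
DECLARE_TRACE() expands in this tree):

```c
/* Sketch only, not part of this series: attach a probe to the bare
 * sched_dl_throttle tracepoint and dump whatever we want to the
 * trace buffer via trace_printk().
 */
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/tracepoint.h>
#include <trace/events/sched.h>

/* Probe signature is (void *data, <TP_PROTO args>). */
static void probe_dl_throttle(void *data, struct sched_dl_entity *dl)
{
	trace_printk("dl throttle: runtime=%lld deadline=%llu\n",
		     dl->runtime, dl->deadline);
}

static int __init dl_probe_init(void)
{
	return register_trace_sched_dl_throttle_tp(probe_dl_throttle, NULL);
}

static void __exit dl_probe_exit(void)
{
	unregister_trace_sched_dl_throttle_tp(probe_dl_throttle);
	/* Make sure no probe is still running before unload. */
	tracepoint_synchronize_unregister();
}

module_init(dl_probe_init);
module_exit(dl_probe_exit);
MODULE_LICENSE("GPL");
```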
Re: [RFC PATCH 14/17] sched: Add deadline tracepoints
Posted by Gabriele Monaco 1 month, 2 weeks ago

On Tue, 2025-08-19 at 12:12 +0200, Peter Zijlstra wrote:
> On Tue, Aug 19, 2025 at 11:56:57AM +0200, Juri Lelli wrote:
> > Hi!
> > 
> > On 14/08/25 17:08, Gabriele Monaco wrote:
> > > Add the following tracepoints:
> > > 
> > > * sched_dl_throttle(dl):
> > >     Called when a deadline entity is throttled
> > > * sched_dl_replenish(dl):
> > >     Called when a deadline entity's runtime is replenished
> > > * sched_dl_server_start(dl):
> > >     Called when a deadline server is started
> > > * sched_dl_server_stop(dl, hard):
> > >     Called when a deadline server is stopped (hard) or put to
> > > idle
> > >     waiting for the next period (!hard)
> > > 
> > > Those tracepoints can be useful to validate the deadline
> > > scheduler with
> > > RV and are not exported to tracefs.
> > > 
> > > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> > > ---
> > >  include/trace/events/sched.h | 55
> > > ++++++++++++++++++++++++++++++++++++
> > >  kernel/sched/deadline.c      |  8 ++++++
> > >  2 files changed, 63 insertions(+)
> > > 
> > > diff --git a/include/trace/events/sched.h
> > > b/include/trace/events/sched.h
> > > index 7b2645b50e78..f34cc1dc4a13 100644
> > > --- a/include/trace/events/sched.h
> > > +++ b/include/trace/events/sched.h
> > > @@ -609,6 +609,45 @@ TRACE_EVENT(sched_pi_setprio,
> > >  			__entry->oldprio, __entry->newprio)
> > >  );
> > >  
> > > +/*
> > > +DECLARE_EVENT_CLASS(sched_dl_template,
> > > +
> > > +	TP_PROTO(struct sched_dl_entity *dl),
> > > +
> > > +	TP_ARGS(dl),
> > > +
> > > +	TP_STRUCT__entry(
> > > +		__field(  struct task_struct
> > > *,	tsk		)
> > > +		__string( comm,		dl->dl_server ?
> > > "server" : container_of(dl, struct task_struct, dl)-
> > > >comm	)
> > > +		__field(  pid_t,	pid		)
> > > +		__field( 
> > > s64,		runtime		)
> > > +		__field(  u64,		deadline	)
> > > +		__field(  int,		dl_yielded	)
> > 
> > I wonder if, while we are at it, we want to print all the other
> > fields
> > as well (they might turn out to be useful). That would be
> > 
> >  .:: static (easier to retrieve with just a trace)
> >  - dl_runtime
> >  - dl_deadline
> >  - dl_period
> > 
> >  .:: behaviour (RECLAIM)
> >  - flags
> > 
> >  .:: state
> >  - dl_ bool flags in addition to dl_yielded
> 
> All these things are used as _tp(). That means they don't have trace
> buffer entries ever, why fill out fields?
> 

Right, that is a relic of the way I put it initially; this whole thing
is commented out (which is indeed confusing and barely noticeable in
the patch).
The tracepoints are in fact not exported to tracefs and do not use
the print format.

I should have removed this, the real ones are at the bottom of the
file.

> 
> > > +	),
> > > +
> > > +	TP_fast_assign(
> > > +		__assign_str(comm);
> > > +		__entry->pid		= dl->dl_server ? -1 :
> > > container_of(dl, struct task_struct, dl)->pid;
> > > +		__entry->runtime	= dl->runtime;
> > > +		__entry->deadline	= dl->deadline;
> > > +		__entry->dl_yielded	= dl->dl_yielded;
> > > +	),
> > > +
> > > +	TP_printk("comm=%s pid=%d runtime=%lld deadline=%lld
> > > yielded=%d",
> >                                                         ^^^
> > 							llu ?
> > 

As above, this should all go away.

> > > +			__get_str(comm), __entry->pid,
> > > +			__entry->runtime, __entry->deadline,
> > > +			__entry->dl_yielded)
> > > +);
> > 
> > ...
> > 
> > > @@ -1482,6 +1486,7 @@ static void update_curr_dl_se(struct rq
> > > *rq, struct sched_dl_entity *dl_se, s64
> > >  
> > >  throttle:
> > >  	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
> > > +		trace_sched_dl_throttle_tp(dl_se);
> > >  		dl_se->dl_throttled = 1;
> > 
> > I believe we also need to trace the dl_check_constrained_dl()
> > throttle, please take a look.

Probably yes; strangely, I couldn't see failures without it, but it may
be down to my test setup. I'm going to have a look.

> > Also - we discussed this point a little already offline - but I
> > still wonder if we have to do anything special for dl-server defer.
> > Those entities are started as throttled until 0-lag, so maybe we
> > should still trace them explicitly as so?

The naming might need a bit of a consistency check here, but for the
monitor, the server is running, armed or preempted. Before the 0-lag,
it will never be running, so it will stay as armed (fair tasks running)
or preempted (rt tasks running).

armed and preempted have the _throttled version just to indicate an
explicit throttle event occurred.

We might want to start it in the armed_throttled if we are really
guaranteed not to see a throttle event, but I think that would
complicate the model considerably.

We could instead validate the 0-lag concept in a separate, server-
specific model.

Does this initial model feel particularly wrong for the server case?

> > In addition, since it's related, maybe we should do something about
> > sched_switch event, that is currently not aware of deadlines,
> > runtimes, etc.

I'm not sure I follow you here: what relation between switch and
runtime/deadline should we enforce?

We don't really force the switch to occur timely after throttling, is
that what you mean?
Or a switch must occur again timely after replenishing?

> As per the whole _tp() thing, you can attach to the actual
> sched_switch tracepoint with a module and read whatever you want.

Yeah I believe Juri referred to model constraints on the already
existing events rather than new tracepoints here.

Thanks both,
Gabriele
Re: [RFC PATCH 14/17] sched: Add deadline tracepoints
Posted by Juri Lelli 1 month, 2 weeks ago
On 19/08/25 12:34, Gabriele Monaco wrote:
> 
> 
> On Tue, 2025-08-19 at 12:12 +0200, Peter Zijlstra wrote:
> > On Tue, Aug 19, 2025 at 11:56:57AM +0200, Juri Lelli wrote:
> > > Hi!
> > > 
> > > On 14/08/25 17:08, Gabriele Monaco wrote:

...

> > > > @@ -1482,6 +1486,7 @@ static void update_curr_dl_se(struct rq
> > > > *rq, struct sched_dl_entity *dl_se, s64
> > > >  
> > > >  throttle:
> > > >  	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
> > > > +		trace_sched_dl_throttle_tp(dl_se);
> > > >  		dl_se->dl_throttled = 1;
> > > 
> > > I believe we also need to trace the dl_check_constrained_dl()
> > > throttle, please take a look.
> 
> Probably yes, strangely I couldn't see failures without it, but it may
> be down to my test setup. I'm going to have a look.

Not sure if you tested with constrained (deadline != period) tasks.

> > > Also - we discussed this point a little already offline - but I
> > > still wonder if we have to do anything special for dl-server defer.
> > > Those entities are started as throttled until 0-lag, so maybe we
> > > should still trace them explicitly as so?
> 
> The naming might need a bit of a consistency check here, but for the
> monitor, the server is running, armed or preempted. Before the 0-lag,
> it will never be running, so it will stay as armed (fair tasks running)
> or preempted (rt tasks running).
> 
> armed and preempted have the _throttled version just to indicate an
> explicit throttle event occurred.
> 
> We might want to start it in the armed_throttled if we are really
> guaranteed not to see a throttle event, but I think that would
> complicate the model considerably.
> 
> We could instead validate the 0-lag concept in a separate, server-
> specific model.
> 
> Does this initial model feel particularly wrong for the server case?

No it doesn't atm. :-) Thanks for the additional information.

> > > In addition, since it's related, maybe we should do something about
> > > sched_switch event, that is currently not aware of deadlines,
> > > runtimes, etc.
> 
> I'm not sure I follow you here, what relation with switch and
> runtime/deadline should we enforce?
> 
> We don't really force the switch to occur timely after throttling, is
> that what you mean?
> Or a switch must occur again timely after replenishing?

Hummm, yeah I was wondering if we need something along these lines, but
we can also maybe leave it for the future.

> > As per the whole _tp() thing, you can attach to the actual
> > sched_switch tracepoint with a module and read whatever you want.
> 
> Yeah I believe Juri referred to model constraints on the already
> existing events rather than new tracepoints here.

Separately from this series, maybe we should put such a module/bpf thing
somewhere shared, so it's easier to use it when needed.
Re: [RFC PATCH 14/17] sched: Add deadline tracepoints
Posted by Juri Lelli 1 month ago
On 19/08/25 16:02, Juri Lelli wrote:
> On 19/08/25 12:34, Gabriele Monaco wrote:

...

> > > As per the whole _tp() thing, you can attach to the actual
> > > sched_switch tracepoint with a module and read whatever you want.
> > 
> > Yeah I believe Juri referred to model constraints on the already
> > existing events rather than new tracepoints here.
> 
> Separately from this series, maybe we should put such a module/bpf thing
> somewhere shared, so it's easier to use it when needed.

Maybe we could

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 383cfc684e8e..994b6973d77d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -117,6 +117,10 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_cfs_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_se_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_update_nr_running_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp);
+EXPORT_TRACEPOINT_SYMBOL_GPL(sched_dl_throttle_tp);
+EXPORT_TRACEPOINT_SYMBOL_GPL(sched_dl_replenish_tp);
+EXPORT_TRACEPOINT_SYMBOL_GPL(sched_dl_server_start_tp);
+EXPORT_TRACEPOINT_SYMBOL_GPL(sched_dl_server_stop_tp);
 
 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
---

so that the new tps can be used from modules like sched_tp Phil
mentioned?

Thanks,
Juri
Re: [RFC PATCH 14/17] sched: Add deadline tracepoints
Posted by Phil Auld 1 month, 2 weeks ago
Hi Juri,

On Tue, Aug 19, 2025 at 04:02:04PM +0200 Juri Lelli wrote:
> On 19/08/25 12:34, Gabriele Monaco wrote:
> > 
> > 
> > On Tue, 2025-08-19 at 12:12 +0200, Peter Zijlstra wrote:
> > > On Tue, Aug 19, 2025 at 11:56:57AM +0200, Juri Lelli wrote:
> > > > Hi!
> > > > 
> > > > On 14/08/25 17:08, Gabriele Monaco wrote:
> 
> ...
> 
> > > > > @@ -1482,6 +1486,7 @@ static void update_curr_dl_se(struct rq
> > > > > *rq, struct sched_dl_entity *dl_se, s64
> > > > >  
> > > > >  throttle:
> > > > >  	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
> > > > > +		trace_sched_dl_throttle_tp(dl_se);
> > > > >  		dl_se->dl_throttled = 1;
> > > > 
> > > > I believe we also need to trace the dl_check_constrained_dl()
> > > > throttle, please take a look.
> > 
> > Probably yes, strangely I couldn't see failures without it, but it may
> > be down to my test setup. I'm going to have a look.
> 
> Not sure if you tested with constrained (deadline != period) tasks.
> 
> > > > Also - we discussed this point a little already offline - but I
> > > > still wonder if we have to do anything special for dl-server defer.
> > > > Those entities are started as throttled until 0-lag, so maybe we
> > > > should still trace them explicitly as so?
> > 
> > The naming might need a bit of a consistency check here, but for the
> > monitor, the server is running, armed or preempted. Before the 0-lag,
> > it will never be running, so it will stay as armed (fair tasks running)
> > or preempted (rt tasks running).
> > 
> > armed and preempted have the _throttled version just to indicate an
> > explicit throttle event occurred.
> > 
> > We might want to start it in the armed_throttled if we are really
> > guaranteed not to see a throttle event, but I think that would
> > complicate the model considerably.
> > 
> > We could instead validate the 0-lag concept in a separate, server-
> > specific model.
> > 
> > Does this initial model feel particularly wrong for the server case?
> 
> No it doesn't atm. :-) Thanks for the additional information.
> 
> > > > In addition, since it's related, maybe we should do something about
> > > > sched_switch event, that is currently not aware of deadlines,
> > > > runtimes, etc.
> > 
> > I'm not sure I follow you here, what relation with switch and
> > runtime/deadline should we enforce?
> > 
> > We don't really force the switch to occur timely after throttling, is
> > that what you mean?
> > Or a switch must occur again timely after replenishing?
> 
> Hummm, yeah I was wondering if we need something along these lines, but
> we can also maybe leave it for the future.
> 
> > > As per the whole _tp() thing, you can attach to the actual
> > > sched_switch tracepoint with a module and read whatever you want.
> > 
> > Yeah I believe Juri referred to model constraints on the already
> > existing events rather than new tracepoints here.
> 
> Separately from this series, maybe we should put such a module/bpf thing
> somewhere shared, so it's easier to use it when needed.
> 
> 

A few of us use: https://github.com/qais-yousef/sched_tp.git

This has all the current scheduler raw tps exposed, I believe, but would
need updates for these new ones, of course. 

I have a gitlab fork which our perf team uses to get at the ones they use
(mostly the nr_running ones to make their heat maps).

Fwiw...



Cheers,
Phil

-- 
Re: [RFC PATCH 14/17] sched: Add deadline tracepoints
Posted by Juri Lelli 1 month, 2 weeks ago
On 19/08/25 10:38, Phil Auld wrote:

...

> A few of us use: https://github.com/qais-yousef/sched_tp.git
> 
> This has all the current scheduler raw tps exposed, I believe, but would
> need updates for these new ones, of course. 
> 
> I have a gitlab fork which our perf team uses to get at the ones they use
> (mostly the nr_running ones to make their heat maps).

Ah, cool. Didn't know about this, thanks for sharing! I'll take a look.

Best,
Juri
Re: [RFC PATCH 14/17] sched: Add deadline tracepoints
Posted by Gabriele Monaco 1 month, 2 weeks ago
On Tue, 2025-08-19 at 16:02 +0200, Juri Lelli wrote:
> On 19/08/25 12:34, Gabriele Monaco wrote:
> > 
> > 
> > On Tue, 2025-08-19 at 12:12 +0200, Peter Zijlstra wrote:
> > > On Tue, Aug 19, 2025 at 11:56:57AM +0200, Juri Lelli wrote:
> > > > Hi!
> > > > 
> > > > On 14/08/25 17:08, Gabriele Monaco wrote:
> 
> ...
> 
> > > > > @@ -1482,6 +1486,7 @@ static void update_curr_dl_se(struct rq
> > > > > *rq, struct sched_dl_entity *dl_se, s64
> > > > >  
> > > > >  throttle:
> > > > >  	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded)
> > > > > {
> > > > > +		trace_sched_dl_throttle_tp(dl_se);
> > > > >  		dl_se->dl_throttled = 1;
> > > > 
> > > > I believe we also need to trace the dl_check_constrained_dl()
> > > > throttle, please take a look.
> > 
> > Probably yes, strangely I couldn't see failures without it, but it
> > may
> > be down to my test setup. I'm going to have a look.
> 
> Not sure if you tested with constrained (deadline != period) tasks.

Not much actually.. I should start.

> > > > Also - we discussed this point a little already offline - but I
> > > > still wonder if we have to do anything special for dl-server
> > > > defer.
> > > > Those entities are started as throttled until 0-lag, so maybe
> > > > we
> > > > should still trace them explicitly as so?
> > 
> > The naming might need a bit of a consistency check here, but for
> > the
> > monitor, the server is running, armed or preempted. Before the 0-
> > lag,
> > it will never be running, so it will stay as armed (fair tasks
> > running)
> > or preempted (rt tasks running).
> > 
> > armed and preempted have the _throttled version just to indicate an
> > explicit throttle event occurred.
> > 
> > We might want to start it in the armed_throttled if we are really
> > guaranteed not to see a throttle event, but I think that would
> > complicate the model considerably.
> > 
> > We could instead validate the 0-lag concept in a separate, server-
> > specific model.
> > 
> > Does this initial model feel particularly wrong for the server
> > case?
> 
> No it doesn't atm. :-) Thanks for the additional information.

Perfect, I guess I need to write this a bit more clearly in the model
description.

> 
> > > > In addition, since it's related, maybe we should do something
> > > > about
> > > > sched_switch event, that is currently not aware of deadlines,
> > > > runtimes, etc.
> > 
> > I'm not sure I follow you here, what relation with switch and
> > runtime/deadline should we enforce?
> > 
> > We don't really force the switch to occur timely after throttling,
> > is
> > that what you mean?
> > Or a switch must occur again timely after replenishing?
> 
> Hummm, yeah I was wondering if we need something along these lines,
> but we can also maybe leave it for the future.

I'll have a thought about this, perhaps it's as simple as adding a few
more constraints on the edges.

> 
> > > As per the whole _tp() thing, you can attach to the actual
> > > sched_switch tracepoint with a module and read whatever you want.
> > 
> > Yeah I believe Juri referred to model constraints on the already
> > existing events rather than new tracepoints here.
> 
> Separately from this series, maybe we should put such a module/bpf
> thing somewhere shared, so it's easier to use it when needed.

You mean some module/bpf to print those tracepoints to the ftrace
buffer? Yeah that might help, but it might be ugly and tracepoint-
specific.

Also, perf probe doesn't yet support this type of tracepoint, but
once it does, I guess it would do the job quite nicely.

Thanks,
Gabriele