[PATCH v2] sched_ext: Add trace point to track sched_ext core events

Changwoo Min posted 1 patch 9 months, 2 weeks ago
include/trace/events/sched_ext.h | 19 +++++++++++++++++++
kernel/sched/ext.c               |  2 ++
2 files changed, 21 insertions(+)
[PATCH v2] sched_ext: Add trace point to track sched_ext core events
Posted by Changwoo Min 9 months, 2 weeks ago
Add tracing support to track sched_ext core events
(/sched_ext/sched_ext_event). This may be useful for debugging sched_ext
schedulers that trigger a particular event.

The trace point can be used as other trace points, so it can be used in,
for example, `perf trace` and BPF programs, as follows:

======
$> sudo perf trace -e sched_ext:sched_ext_event --filter 'name == "SCX_EV_ENQ_SLICE_DFL"'
======

======
struct tp_sched_ext_event {
	struct trace_entry ent;
	u32 __data_loc_name;
	u64 delta;
};

SEC("tracepoint/sched_ext/sched_ext_event")
int rtp_add_event(struct tp_sched_ext_event *ctx)
{
	char event_name[128];
	unsigned short offset = ctx->__data_loc_name & 0xFFFF;
        bpf_probe_read_str((void *)event_name, 128, (char *)ctx + offset);

	bpf_printk("name %s   delta %llu", event_name, ctx->delta);
	return 0;
}
======

Signed-off-by: Changwoo Min <changwoo@igalia.com>
---

ChangeLog v1 -> v2:
 - Rename @added field to @delta for clarity.
 - Rename sched_ext_add_event to sched_ext_event.
 - Drop the @offset field to avoid the potential misuse of non-portable numbers.

 include/trace/events/sched_ext.h | 19 +++++++++++++++++++
 kernel/sched/ext.c               |  2 ++
 2 files changed, 21 insertions(+)

diff --git a/include/trace/events/sched_ext.h b/include/trace/events/sched_ext.h
index fe19da7315a9..b73499981682 100644
--- a/include/trace/events/sched_ext.h
+++ b/include/trace/events/sched_ext.h
@@ -26,6 +26,25 @@ TRACE_EVENT(sched_ext_dump,
 	)
 );
 
+TRACE_EVENT(sched_ext_event,
+	    TP_PROTO(const char *name, __u64 delta),
+	    TP_ARGS(name, delta),
+
+	TP_STRUCT__entry(
+		__string(name, name)
+		__field(	__u64,		delta		)
+	),
+
+	TP_fast_assign(
+		__assign_str(name);
+		__entry->delta		= delta;
+	),
+
+	TP_printk("name %s delta %llu",
+		  __get_str(name), __entry->delta
+	)
+);
+
 #endif /* _TRACE_SCHED_EXT_H */
 
 /* This part must be outside protection */
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 986b655911df..53729c584b63 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1554,6 +1554,7 @@ static DEFINE_PER_CPU(struct scx_event_stats, event_stats_cpu);
  */
 #define scx_add_event(name, cnt) do {						\
 	this_cpu_add(event_stats_cpu.name, cnt);				\
+	trace_sched_ext_event(#name, cnt);					\
 } while(0)
 
 /**
@@ -1565,6 +1566,7 @@ static DEFINE_PER_CPU(struct scx_event_stats, event_stats_cpu);
  */
 #define __scx_add_event(name, cnt) do {						\
 	__this_cpu_add(event_stats_cpu.name, cnt);				\
+	trace_sched_ext_event(#name, cnt);					\
 } while(0)
 
 /**
-- 
2.48.1
Re: [PATCH v2] sched_ext: Add trace point to track sched_ext core events
Posted by Andrea Righi 9 months, 2 weeks ago
On Fri, Feb 28, 2025 at 05:59:44PM +0900, Changwoo Min wrote:
> Add tracing support to track sched_ext core events
> (/sched_ext/sched_ext_event). This may be useful for debugging sched_ext
> schedulers that trigger a particular event.
> 
> The trace point can be used as other trace points, so it can be used in,
> for example, `perf trace` and BPF programs, as follows:
> 
> ======
> $> sudo perf trace -e sched_ext:sched_ext_event --filter 'name == "SCX_EV_ENQ_SLICE_DFL"'
> ======
> 
> ======
> struct tp_sched_ext_event {
> 	struct trace_entry ent;
> 	u32 __data_loc_name;
> 	u64 delta;
> };
> 
> SEC("tracepoint/sched_ext/sched_ext_event")
> int rtp_add_event(struct tp_sched_ext_event *ctx)
> {
> 	char event_name[128];
> 	unsigned short offset = ctx->__data_loc_name & 0xFFFF;
>         bpf_probe_read_str((void *)event_name, 128, (char *)ctx + offset);
> 
> 	bpf_printk("name %s   delta %llu", event_name, ctx->delta);
> 	return 0;
> }
> ======
> 
> Signed-off-by: Changwoo Min <changwoo@igalia.com>
> ---
> 
> ChangeLog v1 -> v2:
>  - Rename @added field to @delta for clarity.
>  - Rename sched_ext_add_event to sched_ext_event.
>  - Drop the @offset field to avoid the potential misuse of non-portable numbers.
> 
>  include/trace/events/sched_ext.h | 19 +++++++++++++++++++
>  kernel/sched/ext.c               |  2 ++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/include/trace/events/sched_ext.h b/include/trace/events/sched_ext.h
> index fe19da7315a9..b73499981682 100644
> --- a/include/trace/events/sched_ext.h
> +++ b/include/trace/events/sched_ext.h
> @@ -26,6 +26,25 @@ TRACE_EVENT(sched_ext_dump,
>  	)
>  );
>  
> +TRACE_EVENT(sched_ext_event,
> +	    TP_PROTO(const char *name, __u64 delta),
> +	    TP_ARGS(name, delta),
> +
> +	TP_STRUCT__entry(
> +		__string(name, name)
> +		__field(	__u64,		delta		)

I'm wondering if we should use a __s64 here (and %lld below). We don't have
negative deltas right now, but in the future who knows...

Apart than that, everything else looks good to me.

Thanks,
-Andrea

> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(name);
> +		__entry->delta		= delta;
> +	),
> +
> +	TP_printk("name %s delta %llu",
> +		  __get_str(name), __entry->delta
> +	)
> +);
> +
>  #endif /* _TRACE_SCHED_EXT_H */
>  
>  /* This part must be outside protection */
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 986b655911df..53729c584b63 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -1554,6 +1554,7 @@ static DEFINE_PER_CPU(struct scx_event_stats, event_stats_cpu);
>   */
>  #define scx_add_event(name, cnt) do {						\
>  	this_cpu_add(event_stats_cpu.name, cnt);				\
> +	trace_sched_ext_event(#name, cnt);					\
>  } while(0)
>  
>  /**
> @@ -1565,6 +1566,7 @@ static DEFINE_PER_CPU(struct scx_event_stats, event_stats_cpu);
>   */
>  #define __scx_add_event(name, cnt) do {						\
>  	__this_cpu_add(event_stats_cpu.name, cnt);				\
> +	trace_sched_ext_event(#name, cnt);					\
>  } while(0)
>  
>  /**
> -- 
> 2.48.1
>
Re: [PATCH v2] sched_ext: Add trace point to track sched_ext core events
Posted by Tejun Heo 9 months, 2 weeks ago
On Fri, Feb 28, 2025 at 11:03:54AM +0100, Andrea Righi wrote:
> > +TRACE_EVENT(sched_ext_event,
> > +	    TP_PROTO(const char *name, __u64 delta),
> > +	    TP_ARGS(name, delta),
> > +
> > +	TP_STRUCT__entry(
> > +		__string(name, name)
> > +		__field(	__u64,		delta		)
> 
> I'm wondering if we should use a __s64 here (and %lld below). We don't have
> negative deltas right now, but in the future who knows...
> 
> Apart than that, everything else looks good to me.

And let's also print out the updated value.

Thanks.

-- 
tejun
Re: [PATCH v2] sched_ext: Add trace point to track sched_ext core events
Posted by Changwoo Min 9 months, 2 weeks ago
Hi Tejun and Andrea,

On 25. 3. 1. 02:31, Tejun Heo wrote:
> On Fri, Feb 28, 2025 at 11:03:54AM +0100, Andrea Righi wrote:
>>> +TRACE_EVENT(sched_ext_event,
>>> +	    TP_PROTO(const char *name, __u64 delta),
>>> +	    TP_ARGS(name, delta),
>>> +
>>> +	TP_STRUCT__entry(
>>> +		__string(name, name)
>>> +		__field(	__u64,		delta		)
>>
>> I'm wondering if we should use a __s64 here (and %lld below). We don't have
>> negative deltas right now, but in the future who knows...

That makes sense. I will change it to __s64 in the tracepoint.
Also, I will make corresponding changes (u64 -> s64) in other
places, including struct scx_event_stats.

> 
> And let's also print out the updated value.

You might have two options here: 1) returning per-CPU event
counter or 2) returning aggregated event counter. The first opion
will be fast but less meaningful from user's point of view
compared to the second option. Assuming the tracepoint are not in
the hot path, I think the second option will be better choice.
I will add an @event field and a special version of
scx_bpf_events() for faster aggregation.

Thanks!
Changwoo Min
Re: [PATCH v2] sched_ext: Add trace point to track sched_ext core events
Posted by Tejun Heo 9 months, 2 weeks ago
Hello,

On Sat, Mar 01, 2025 at 08:33:36AM +0900, Changwoo Min wrote:
...
> You might have two options here: 1) returning per-CPU event
> counter or 2) returning aggregated event counter. The first opion
> will be fast but less meaningful from user's point of view
> compared to the second option. Assuming the tracepoint are not in
> the hot path, I think the second option will be better choice.
> I will add an @event field and a special version of
> scx_bpf_events() for faster aggregation.

Ah, right, let's forget about printing aggregate for now. That's not
difficult for anyone to find out if necessary after all.

Thanks.

-- 
tejun
Re: [PATCH v2] sched_ext: Add trace point to track sched_ext core events
Posted by Changwoo Min 9 months, 2 weeks ago

On 25. 3. 1. 08:50, Tejun Heo wrote:
> Hello,
> 
> On Sat, Mar 01, 2025 at 08:33:36AM +0900, Changwoo Min wrote:
> ...
>> You might have two options here: 1) returning per-CPU event
>> counter or 2) returning aggregated event counter. The first opion
>> will be fast but less meaningful from user's point of view
>> compared to the second option. Assuming the tracepoint are not in
>> the hot path, I think the second option will be better choice.
>> I will add an @event field and a special version of
>> scx_bpf_events() for faster aggregation.
> 
> Ah, right, let's forget about printing aggregate for now. That's not
> difficult for anyone to find out if necessary after all.

Okay. I will send a new version with __s64 changes.

Thanks!