[v1] sched_ext: Implement core event counters

[PATCH 5/7] sched_ext: Add an event, SCX_EVENT_RQ_BYPASSING_OPS

Posted by Changwoo Min 1 year ago

Add a core event, SCX_EVENT_RQ_BYPASSING_OPS, which represents how many
operations are bypassed once the bypass mode is set.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
 kernel/sched/ext.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 094e19f5fb78..44b44d963a0c 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1485,6 +1485,11 @@ struct scx_event_stat {
 	 * is dispatched to a local DSQ when exiting.
 	 */
 	u64		ENQ_LOCAL_EXITING;
+
+	/*
+	 * When the bypassing mode is set, the number of bypassed operations.
+	 */
+	u64		RQ_BYPASSING_OPS;
 };
 
 #define SCX_EVENT_IDX(e)	(offsetof(struct scx_event_stat, e)/sizeof(u64))
@@ -1500,6 +1505,7 @@ enum scx_event_kind {
 	SCX_EVENT_DEFINE(OFFLINE_LOCAL_DSQ),
 	SCX_EVENT_DEFINE(CNTD_RUN_WO_ENQ),
 	SCX_EVENT_DEFINE(ENQ_LOCAL_EXITING),
+	SCX_EVENT_DEFINE(RQ_BYPASSING_OPS),
 	SCX_EVENT_END = SCX_EVENT_END_IDX(),
 };
 
@@ -1508,6 +1514,7 @@ static const char *scx_event_stat_str[] = {
 	[SCX_EVENT_OFFLINE_LOCAL_DSQ]	= "offline_local_dsq",
 	[SCX_EVENT_CNTD_RUN_WO_ENQ]	= "cntd_run_wo_enq",
 	[SCX_EVENT_ENQ_LOCAL_EXITING]	= "enq_local_exiting",
+	[SCX_EVENT_RQ_BYPASSING_OPS]	= "rq_bypassing_ops",
 };
 
 /*
@@ -2087,8 +2094,10 @@ static void do_enqueue_task(struct rq *rq, struct task_struct *p, u64 enq_flags,
 	if (!scx_rq_online(rq))
 		goto local;
 
-	if (scx_rq_bypassing(rq))
+	if (scx_rq_bypassing(rq)) {
+		scx_add_event(RQ_BYPASSING_OPS, 1);
 		goto global;
+	}
 
 	if (p->scx.ddsp_dsq_id != SCX_DSQ_INVALID)
 		goto direct;
@@ -2933,6 +2942,8 @@ static int balance_one(struct rq *rq, struct task_struct *prev)
 	     scx_rq_bypassing(rq))) {
 		rq->scx.flags |= SCX_RQ_BAL_KEEP;
 		scx_add_event(CNTD_RUN_WO_ENQ, 1);
+		if (scx_rq_bypassing(rq))
+			scx_add_event(RQ_BYPASSING_OPS, 1);
 		goto has_tasks;
 	}
 	rq->scx.flags &= ~SCX_RQ_IN_BALANCE;
@@ -3708,6 +3719,9 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
 			p->scx.slice = SCX_SLICE_DFL;
 			p->scx.ddsp_dsq_id = SCX_DSQ_LOCAL;
 		}
+
+		if (scx_rq_bypassing(task_rq(p)))
+			scx_add_event(RQ_BYPASSING_OPS, 1);
 		return cpu;
 	}
 }
@@ -3799,6 +3813,8 @@ void __scx_update_idle(struct rq *rq, bool idle, bool do_notify)
 	 */
 	if (SCX_HAS_OP(update_idle) && do_notify && !scx_rq_bypassing(rq))
 		SCX_CALL_OP(SCX_KF_REST, update_idle, cpu_of(rq), idle);
+	else if (scx_rq_bypassing(rq))
+		scx_add_event(RQ_BYPASSING_OPS, 1);
 
 	/*
 	 * Update the idle masks:
@@ -3940,6 +3956,7 @@ static void task_tick_scx(struct rq *rq, struct task_struct *curr, int queued)
 	if (scx_rq_bypassing(rq)) {
 		curr->scx.slice = 0;
 		touch_core_sched(rq, curr);
+		scx_add_event(RQ_BYPASSING_OPS, 1);
 	} else if (SCX_HAS_OP(tick)) {
 		SCX_CALL_OP(SCX_KF_REST, tick, curr);
 	}
@@ -7131,8 +7148,10 @@ __bpf_kfunc void scx_bpf_kick_cpu(s32 cpu, u64 flags)
 	 * lead to irq_work_queue() malfunction such as infinite busy wait for
 	 * IRQ status update. Suppress kicking.
 	 */
-	if (scx_rq_bypassing(this_rq))
+	if (scx_rq_bypassing(this_rq)) {
+		scx_add_event(RQ_BYPASSING_OPS, 1);
 		goto out;
+	}
 
 	/*
 	 * Actual kicking is bounced to kick_cpus_irq_workfn() to avoid nesting
-- 
2.48.1

Re: [PATCH 5/7] sched_ext: Add an event, SCX_EVENT_RQ_BYPASSING_OPS

Posted by Tejun Heo 1 year ago

On Fri, Jan 17, 2025 at 12:15:41AM +0900, Changwoo Min wrote:
> Add a core event, SCX_EVENT_RQ_BYPASSING_OPS, which represents how many
> operations are bypassed once the bypass mode is set.
> 
> Signed-off-by: Changwoo Min <changwoo@igalia.com>
> ---
>  kernel/sched/ext.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 094e19f5fb78..44b44d963a0c 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -1485,6 +1485,11 @@ struct scx_event_stat {
>  	 * is dispatched to a local DSQ when exiting.
>  	 */
>  	u64		ENQ_LOCAL_EXITING;
> +
> +	/*
> +	 * When the bypassing mode is set, the number of bypassed operations.
> +	 */
> +	u64		RQ_BYPASSING_OPS;
>  };

I'm not sure this is a particularly good way to account for bypass mode.
Maybe account the total duration of bypass modes, the number of times it was
activated and the number of times tasks were dispatched in bypass mode?

Thanks.

-- 
tejun

Re: [PATCH 5/7] sched_ext: Add an event, SCX_EVENT_RQ_BYPASSING_OPS

Posted by Changwoo Min 1 year ago

Hello,

On 25. 1. 17. 10:41, Tejun Heo wrote:
>> +
>> +	/*
>> +	 * When the bypassing mode is set, the number of bypassed operations.
>> +	 */
>> +	u64		RQ_BYPASSING_OPS;
>>   };
> 
> I'm not sure this is a particularly good way to account for bypass mode.
> Maybe account the total duration of bypass modes,the number of times it was
> activated and the number of times tasks were dispatched in bypass mode?
I think it is a good idea to further specialize RQ_BYPASSING events.

For the number of times the bypassing mode activated, what about
BYPASS_NR_ACTIVATED?

For the number of task dispatched,what about
BYPASS_NR_TASK_DISPATCHED?

I think BYPASS_NR_ACTIVATED and BYPASS_NR_TASK_DISPATCHED will be
a good proxy for the total duration, so we can skip it until we
have a clear user case. If we need the total duration now (maybe
BYPASS_DURATION?), we can directly measure it in the
scx_ops_bypass() directly. What do you think?

Regards,
Changwoo Min

Re: [PATCH 5/7] sched_ext: Add an event, SCX_EVENT_RQ_BYPASSING_OPS

Posted by Tejun Heo 1 year ago

Hello,

On Fri, Jan 17, 2025 at 04:31:55PM +0900, Changwoo Min wrote:
...
> For the number of times the bypassing mode activated, what about
> BYPASS_NR_ACTIVATED?
> 
> For the number of task dispatched,what about
> BYPASS_NR_TASK_DISPATCHED?

Those names are fine but we used simple imperatives for other names, so the
followings might be more consistent:

- BYPASS_ACTIVATE
- BYPASS_DISPATCH

> I think BYPASS_NR_ACTIVATED and BYPASS_NR_TASK_DISPATCHED will be
> a good proxy for the total duration, so we can skip it until we
> have a clear user case. If we need the total duration now (maybe
> BYPASS_DURATION?), we can directly measure it in the
> scx_ops_bypass() directly. What do you think?

I think it'd be a useful counter to have and measuring from scx_ops_bypass()
makes sense to me.

Thanks.

-- 
tejun

[PATCH 1/7] sched_ext: Implement event counter infrastructure and add an event
[PATCH 2/7] sched_ext: Add an event, SCX_EVENT_OFFLINE_LOCAL_DSQ
[PATCH 3/7] sched_ext: Add an event, SCX_EVENT_CNTD_RUN_WO_ENQ
[PATCH 4/7] sched_ext: Add an event, SCX_EVENT_ENQ_LOCAL_EXITING
[PATCH 5/7] sched_ext: Add an event, SCX_EVENT_RQ_BYPASSING_OPS
[PATCH 6/7] sched_ext: Add scx_bpf_event_stat() and scx_read_event() for BPF schedulers
[PATCH 7/7] sched_ext: Print core event count in scx_central scheduler