From nobody Sun Feb 8 06:21:50 2026
From: Ryan Newton
To: linux-kernel@vger.kernel.org
Cc: tj@kernel.org, arighi@nvidia.com, rrnewton@gmail.com, newton@meta.com
Subject: [PATCH v2 1/3] sched_ext: Add lockless peek operation for DSQs
Date: Fri, 3 Oct 2025 15:54:06 -0400
Message-ID: <20251003195408.675527-2-rrnewton@gmail.com>
In-Reply-To: <20251003195408.675527-1-rrnewton@gmail.com>
References: <20251003195408.675527-1-rrnewton@gmail.com>

From: Ryan Newton

The DSQ data structures built into sched_ext are meant to be used by a wide
range of sched_ext schedulers that place different demands on them. A DSQ
might be a per-CPU queue with low contention or a shared queue with high
contention.

Unfortunately, each DSQ has a single coarse-grained lock around the whole
data structure. Without going all the way to a lock-free, more scalable
implementation, a small step we can take to reduce lock contention is to
allow a lockless, small-fixed-cost peek at the head of the queue.

This change allows certain custom SCX schedulers to cheaply peek at queues,
e.g. during load balancing, before locking them. The cost is a few extra
memory operations to update the pointer each time the DSQ is modified,
including a memory barrier on ARM so that the write appears correctly
ordered.
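For readers less familiar with the pattern, the following is a minimal
userspace sketch of the publish-then-peek idea, written with C11 atomics
rather than the kernel's RCU primitives. Every name in it (struct task,
struct queue, queue_init, queue_push_head, queue_pop_head, queue_peek,
peek_demo) is a hypothetical stand-in and not part of this patch. The lock
that writers would hold is omitted and the demo is single-threaded, so it
only illustrates the ordering of the stores and loads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued task. */
struct task {
	int pid;
	struct task *next;
};

/* Hypothetical stand-in for a queue; the writer-side lock is omitted. */
struct queue {
	struct task *head;			/* only touched by writers */
	_Atomic(struct task *) first_task;	/* published copy of head */
};

static void queue_init(struct queue *q)
{
	q->head = NULL;
	atomic_init(&q->first_task, NULL);
}

/* Writer side: a real implementation would hold the queue lock here. */
static void queue_push_head(struct queue *q, struct task *t)
{
	assert(t != NULL);
	t->next = q->head;
	q->head = t;
	/* release: the task's fields become visible before the pointer */
	atomic_store_explicit(&q->first_task, t, memory_order_release);
}

/* Writer side: unlink first, then republish the new head. */
static struct task *queue_pop_head(struct queue *q)
{
	struct task *t = q->head;

	if (!t)
		return NULL;
	q->head = t->next;
	t->next = NULL;
	atomic_store_explicit(&q->first_task, q->head, memory_order_release);
	return t;
}

/* Reader side: no lock taken; the result may be stale right away. */
static struct task *queue_peek(struct queue *q)
{
	return atomic_load_explicit(&q->first_task, memory_order_acquire);
}

/* Single-threaded walk-through; returns 0 if every peek matches. */
static int peek_demo(void)
{
	struct queue q;
	struct task a = { .pid = 1, .next = NULL };
	struct task b = { .pid = 2, .next = NULL };

	queue_init(&q);
	if (queue_peek(&q) != NULL)
		return -1;
	queue_push_head(&q, &a);
	queue_push_head(&q, &b);
	if (queue_peek(&q) != &b)
		return -1;
	if (queue_pop_head(&q) != &b || queue_peek(&q) != &a)
		return -1;
	if (queue_pop_head(&q) != &a || queue_peek(&q) != NULL)
		return -1;
	return 0;
}
```

In this sketch the release store on every modification corresponds to the
extra cost described above. A peeked pointer is best treated as a hint: it
can be out of date as soon as it is read, so a caller that acts on it would
normally re-check once it holds the lock.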

This commit adds a first_task pointer field that is updated with
rcu_assign_pointer() whenever a user DSQ is modified, allowing any thread
to peek at the head of the queue without holding the lock.

Signed-off-by: Ryan Newton
---
 include/linux/sched/ext.h                |  1 +
 kernel/sched/ext.c                       | 59 ++++++++++++++++++++++++
 tools/sched_ext/include/scx/common.bpf.h |  1 +
 tools/sched_ext/include/scx/compat.bpf.h | 19 ++++++++
 4 files changed, 80 insertions(+)

diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
index d82b7a9b0658..81478d4ae782 100644
--- a/include/linux/sched/ext.h
+++ b/include/linux/sched/ext.h
@@ -58,6 +58,7 @@ enum scx_dsq_id_flags {
  */
 struct scx_dispatch_q {
 	raw_spinlock_t		lock;
+	struct task_struct __rcu *first_task; /* lockless peek at head */
 	struct list_head	list;	/* tasks in dispatch order */
 	struct rb_root		priq;	/* used to order by p->scx.dsq_vtime */
 	u32			nr;
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 2b0e88206d07..ea0fe4019eca 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -885,6 +885,27 @@ static void refill_task_slice_dfl(struct scx_sched *sch, struct task_struct *p)
 	__scx_add_event(sch, SCX_EV_REFILL_SLICE_DFL, 1);
 }
 
+/* Caller must hold dsq->lock; does nothing on builtin DSQs. */
+static inline void dsq_set_first_task(struct scx_dispatch_q *dsq,
+				      struct task_struct *p)
+{
+	if (dsq->id & SCX_DSQ_FLAG_BUILTIN)
+		return;
+	rcu_assign_pointer(dsq->first_task, p);
+}
+
+/* Caller must hold dsq->lock; does nothing on builtin DSQs. */
+static void dsq_update_first_task(struct scx_dispatch_q *dsq)
+{
+	struct task_struct *first_task;
+
+	if (dsq->id & SCX_DSQ_FLAG_BUILTIN)
+		return;
+
+	first_task = nldsq_next_task(dsq, NULL, false);
+	rcu_assign_pointer(dsq->first_task, first_task);
+}
+
 static void dispatch_enqueue(struct scx_sched *sch, struct scx_dispatch_q *dsq,
 			     struct task_struct *p, u64 enq_flags)
 {
@@ -959,6 +980,9 @@ static void dispatch_enqueue(struct scx_sched *sch, struct scx_dispatch_q *dsq,
 			list_add_tail(&p->scx.dsq_list.node, &dsq->list);
 	}
 
+	/* even the add_tail code path may have changed the first element */
+	dsq_update_first_task(dsq);
+
 	/* seq records the order tasks are queued, used by BPF DSQ iterator */
 	dsq->seq++;
 	p->scx.dsq_seq = dsq->seq;
@@ -1013,6 +1037,7 @@ static void task_unlink_from_dsq(struct task_struct *p,
 
 	list_del_init(&p->scx.dsq_list.node);
 	dsq_mod_nr(dsq, -1);
+	dsq_update_first_task(dsq);
 }
 
 static void dispatch_dequeue(struct rq *rq, struct task_struct *p)
@@ -6084,6 +6109,39 @@ __bpf_kfunc void bpf_iter_scx_dsq_destroy(struct bpf_iter_scx_dsq *it)
 	kit->dsq = NULL;
 }
 
+/**
+ * scx_bpf_dsq_peek - Lockless peek at the first element.
+ * @dsq_id: DSQ to examine.
+ *
+ * Read the first element in the DSQ. This is semantically equivalent to using
+ * the DSQ iterator, but is lock-free.
+ *
+ * Returns the first task, or NULL if the queue is empty or an error occurred.
+ */
+__bpf_kfunc struct task_struct *scx_bpf_dsq_peek(u64 dsq_id)
+{
+	struct scx_sched *sch;
+	struct scx_dispatch_q *dsq;
+
+	sch = rcu_dereference(scx_root);
+	/* Rather than return ERR_PTR(-ENODEV), we follow the precedent of
+	 * other functions and let the peek fail but we continue on.
+	 */
+	if (unlikely(!sch))
+		return NULL;
+
+	dsq = find_user_dsq(sch, dsq_id);
+	if (unlikely(!dsq)) {
+		scx_error(sch, "peek on non-existent DSQ 0x%llx", dsq_id);
+		return NULL;
+	}
+	if (unlikely((dsq->id & SCX_DSQ_FLAG_BUILTIN))) {
+		scx_error(sch, "peek disallowed on builtin DSQ 0x%llx", dsq_id);
+		return NULL;
+	}
+	return rcu_dereference(dsq->first_task);
+}
+
 __bpf_kfunc_end_defs();
 
 static s32 __bstr_format(struct scx_sched *sch, u64 *data_buf, char *line_buf,
@@ -6641,6 +6699,7 @@ BTF_KFUNCS_START(scx_kfunc_ids_any)
 BTF_ID_FLAGS(func, scx_bpf_kick_cpu)
 BTF_ID_FLAGS(func, scx_bpf_dsq_nr_queued)
 BTF_ID_FLAGS(func, scx_bpf_destroy_dsq)
+BTF_ID_FLAGS(func, scx_bpf_dsq_peek, KF_RCU_PROTECTED | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_scx_dsq_new, KF_ITER_NEW | KF_RCU_PROTECTED)
 BTF_ID_FLAGS(func, bpf_iter_scx_dsq_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_scx_dsq_destroy, KF_ITER_DESTROY)
diff --git a/tools/sched_ext/include/scx/common.bpf.h b/tools/sched_ext/include/scx/common.bpf.h
index 06e2551033cb..fbf3e7f9526c 100644
--- a/tools/sched_ext/include/scx/common.bpf.h
+++ b/tools/sched_ext/include/scx/common.bpf.h
@@ -75,6 +75,7 @@ u32 scx_bpf_reenqueue_local(void) __ksym;
 void scx_bpf_kick_cpu(s32 cpu, u64 flags) __ksym;
 s32 scx_bpf_dsq_nr_queued(u64 dsq_id) __ksym;
 void scx_bpf_destroy_dsq(u64 dsq_id) __ksym;
+struct task_struct *scx_bpf_dsq_peek(u64 dsq_id) __ksym __weak;
 int bpf_iter_scx_dsq_new(struct bpf_iter_scx_dsq *it, u64 dsq_id, u64 flags) __ksym __weak;
 struct task_struct *bpf_iter_scx_dsq_next(struct bpf_iter_scx_dsq *it) __ksym __weak;
 void bpf_iter_scx_dsq_destroy(struct bpf_iter_scx_dsq *it) __ksym __weak;
diff --git a/tools/sched_ext/include/scx/compat.bpf.h b/tools/sched_ext/include/scx/compat.bpf.h
index dd9144624dc9..97b10c184b2c 100644
--- a/tools/sched_ext/include/scx/compat.bpf.h
+++ b/tools/sched_ext/include/scx/compat.bpf.h
@@ -130,6 +130,25 @@ int bpf_cpumask_populate(struct cpumask *dst, void *src, size_t src__sz) __ksym
 		false;							\
 })
 
+
+/*
+ * v6.19: Introduce lockless peek API for user DSQs.
+ *
+ * Preserve the following helper until v6.21.
+ */
+static inline struct task_struct *__COMPAT_scx_bpf_dsq_peek(u64 dsq_id)
+{
+	struct task_struct *p = NULL;
+	struct bpf_iter_scx_dsq it;
+
+	if (bpf_ksym_exists(scx_bpf_dsq_peek))
+		return scx_bpf_dsq_peek(dsq_id);
+	if (!bpf_iter_scx_dsq_new(&it, dsq_id, 0))
+		p = bpf_iter_scx_dsq_next(&it);
+	bpf_iter_scx_dsq_destroy(&it);
+	return p;
+}
+
 /**
  * __COMPAT_is_enq_cpu_selected - Test if SCX_ENQ_CPU_SELECTED is on
  * in a compatible way. We will preserve this __COMPAT helper until v6.16.
-- 
2.51.0

From nobody Sun Feb 8 06:21:50 2026
From: Ryan Newton
To: linux-kernel@vger.kernel.org
Cc: tj@kernel.org, arighi@nvidia.com, rrnewton@gmail.com, newton@meta.com
Subject: [PATCH v2 2/3] sched_ext: optimize first_task update logic
Date: Fri, 3 Oct 2025 15:54:07 -0400
Message-ID: <20251003195408.675527-3-rrnewton@gmail.com>
In-Reply-To: <20251003195408.675527-1-rrnewton@gmail.com>
References: <20251003195408.675527-1-rrnewton@gmail.com>

From: Ryan Newton

This is a follow-on optimization to the prior commit, which added a
lockless peek operation on DSQs. That implementation is correct and
simple, but forgoes several optimizations.

Previously, we recomputed first_task through the same slowpath regardless
of where the task was enqueued. With this change, we instead base the
update on what we know about the calling context. On both insertion and
removal we can determine whether the change (1) always, (2) never, or
(3) sometimes changes the first task. In some cases we know what the new
first task will be and can set it directly.

Signed-off-by: Ryan Newton
---
 kernel/sched/ext.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index ea0fe4019eca..53d7ba64a483 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -965,8 +965,11 @@ static void dispatch_enqueue(struct scx_sched *sch, struct scx_dispatch_q *dsq,
 				container_of(rbp, struct task_struct,
 					     scx.dsq_priq);
 			list_add(&p->scx.dsq_list.node, &prev->scx.dsq_list.node);
+			/* first task unchanged - no update needed */
 		} else {
 			list_add(&p->scx.dsq_list.node, &dsq->list);
+			/* not builtin and new task is at head - use fastpath */
+			rcu_assign_pointer(dsq->first_task, p);
 		}
 	} else {
 		/* a FIFO DSQ shouldn't be using PRIQ enqueuing */
@@ -974,15 +977,20 @@ static void dispatch_enqueue(struct scx_sched *sch, struct scx_dispatch_q *dsq,
 			scx_error(sch, "DSQ ID 0x%016llx already had PRIQ-enqueued tasks",
 				  dsq->id);
 
-		if (enq_flags & (SCX_ENQ_HEAD | SCX_ENQ_PREEMPT))
+		if (enq_flags & (SCX_ENQ_HEAD | SCX_ENQ_PREEMPT)) {
 			list_add(&p->scx.dsq_list.node, &dsq->list);
-		else
+			/* new task inserted at head - use fastpath */
+			dsq_set_first_task(dsq, p);
+		} else {
+			bool was_empty;
+
+			was_empty = list_empty(&dsq->list);
 			list_add_tail(&p->scx.dsq_list.node, &dsq->list);
+			if (was_empty)
+				dsq_set_first_task(dsq, p);
+		}
 	}
 
-	/* even the add_tail code path may have changed the first element */
-	dsq_update_first_task(dsq);
-
 	/* seq records the order tasks are queued, used by BPF DSQ iterator */
 	dsq->seq++;
 	p->scx.dsq_seq = dsq->seq;
@@ -1035,9 +1043,11 @@ static void task_unlink_from_dsq(struct task_struct *p,
 		p->scx.dsq_flags &= ~SCX_TASK_DSQ_ON_PRIQ;
 	}
 
 	list_del_init(&p->scx.dsq_list.node);
 	dsq_mod_nr(dsq, -1);
-	dsq_update_first_task(dsq);
+
+	if (rcu_access_pointer(dsq->first_task) == p)
+		dsq_update_first_task(dsq);
 }
 
 static void dispatch_dequeue(struct rq *rq, struct task_struct *p)
-- 
2.51.0

From nobody Sun Feb 8 06:21:50 2026
From: Ryan Newton
To: linux-kernel@vger.kernel.org
Cc: tj@kernel.org, arighi@nvidia.com, rrnewton@gmail.com, newton@meta.com
Subject: [PATCH v2 3/3] sched_ext: Add a selftest for scx_bpf_dsq_peek
Date: Fri, 3 Oct 2025 15:54:08 -0400
Message-ID: <20251003195408.675527-4-rrnewton@gmail.com>
In-Reply-To: <20251003195408.675527-1-rrnewton@gmail.com>
References: <20251003195408.675527-1-rrnewton@gmail.com>

From: Ryan Newton

This is the most basic unit test: make sure an empty queue peeks as empty,
and when we put one element in the queue, make sure peek returns that
element.

However, even this simple test is complicated slightly by the different
behavior of scx_bpf_dsq_insert in different calling contexts:
 - insert is for direct dispatch in enqueue
 - insert is delayed when called from select_cpu

We therefore split the test: the insert happens in enqueue, and the peek
that verifies the result happens in dispatch.

Note: An alternative would be to call `scx_bpf_dsq_move_to_local` on an
empty queue, which in turn calls `flush_dispatch_buf`, in order to flush
the buffered insert.
Unfortunately, this is not viable within the enqueue path, as it attempts
a voluntary context switch within an RCU read-side critical section.

Signed-off-by: Ryan Newton
---
 tools/testing/selftests/sched_ext/Makefile    |   1 +
 .../selftests/sched_ext/peek_dsq.bpf.c        | 268 ++++++++++++++++++
 tools/testing/selftests/sched_ext/peek_dsq.c  | 235 +++++++++++++++
 3 files changed, 504 insertions(+)
 create mode 100644 tools/testing/selftests/sched_ext/peek_dsq.bpf.c
 create mode 100644 tools/testing/selftests/sched_ext/peek_dsq.c

diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile
index 9d9d6b4c38b0..5fe45f9c5f8f 100644
--- a/tools/testing/selftests/sched_ext/Makefile
+++ b/tools/testing/selftests/sched_ext/Makefile
@@ -174,6 +174,7 @@ auto-test-targets :=			\
 	minimal				\
 	numa				\
 	allowed_cpus			\
+	peek_dsq			\
 	prog_run			\
 	reload_loop			\
 	select_cpu_dfl			\
diff --git a/tools/testing/selftests/sched_ext/peek_dsq.bpf.c b/tools/testing/selftests/sched_ext/peek_dsq.bpf.c
new file mode 100644
index 000000000000..977ecf2c6ffc
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/peek_dsq.bpf.c
@@ -0,0 +1,268 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A BPF program for testing DSQ operations including create, destroy,
+ * and peek operations. Uses a hybrid approach:
+ * - Syscall program for DSQ lifecycle (create/destroy)
+ * - Struct ops scheduler for task insertion/dequeue testing
+ *
+ * Copyright (c) 2025 Meta Platforms, Inc. and affiliates.
+ * Copyright (c) 2025 Ryan Newton
+ */
+
+#include
+#include
+
+char _license[] SEC("license") = "GPL";
+
+#define MAX_SAMPLES 100
+#define MAX_CPUS 512
+#define DSQ_POOL_SIZE 8
+int max_samples = MAX_SAMPLES;
+int max_cpus = MAX_CPUS;
+int dsq_pool_size = DSQ_POOL_SIZE;
+
+/* Global variables to store test results */
+int dsq_create_result = -1;
+int dsq_destroy_result = -1;
+int dsq_peek_result1 = -1;
+long dsq_inserted_pid = -1;
+int insert_test_cpu = -1; /* Set to the cpu that performs the test */
+long dsq_peek_result2 = -1;
+long dsq_peek_result2_pid = -1;
+long dsq_peek_result2_expected = -1;
+int test_dsq_id = 1234; /* Use a simple ID like create_dsq example */
+int real_dsq_id = 1235; /* DSQ for normal operation */
+int enqueue_count = -1;
+int dispatch_count = -1;
+int debug_ksym_exists = -1;
+
+/* DSQ pool for stress testing */
+int dsq_pool_base_id = 2000;
+int phase1_complete = -1;
+int total_peek_attempts = -1;
+int successful_peeks = -1;
+
+/* BPF map for sharing peek results with userspace */
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, MAX_SAMPLES);
+	__type(key, u32);
+	__type(value, long);
+} peek_results SEC(".maps");
+
+/* Test if we're actually using the native or compat version */
+int check_dsq_insert_ksym(void)
+{
+	return bpf_ksym_exists(scx_bpf_dsq_insert) ? 1 : 0;
+}
+
+int check_dsq_peek_ksym(void)
+{
+	return bpf_ksym_exists(scx_bpf_dsq_peek) ? 1 : 0;
+}
+
+static inline int get_random_dsq_id(void)
+{
+	u64 time = bpf_ktime_get_ns();
+
+	return dsq_pool_base_id + (time % DSQ_POOL_SIZE);
+}
+
+static inline void record_peek_result(long pid)
+{
+	u32 slot_key;
+	long *slot_pid_ptr;
+	int ix;
+
+	if (pid <= 0)
+		return;
+
+	/* Find an empty slot or one with the same PID */
+	bpf_for(ix, 0, 10) {
+		slot_key = (pid + ix) % MAX_SAMPLES;
+		slot_pid_ptr = bpf_map_lookup_elem(&peek_results, &slot_key);
+		if (!slot_pid_ptr)
+			continue;
+
+		if (*slot_pid_ptr == -1 || *slot_pid_ptr == pid) {
+			*slot_pid_ptr = pid;
+			break;
+		}
+	}
+}
+
+/* Scan all DSQs in the pool and try to move a task to local */
+static inline int scan_dsq_pool(void)
+{
+	struct task_struct *task;
+	int moved = 0;
+	int i;
+
+	bpf_for(i, 0, DSQ_POOL_SIZE) {
+		int dsq_id = dsq_pool_base_id + i;
+
+		total_peek_attempts++;
+
+		task = __COMPAT_scx_bpf_dsq_peek(dsq_id);
+		if (task) {
+			successful_peeks++;
+			record_peek_result(task->pid);
+
+			/* Try to move this task to local */
+			if (!moved && scx_bpf_dsq_move_to_local(dsq_id) == 0) {
+				moved = 1;
+				break;
+			}
+		}
+	}
+	return moved;
+}
+
+/* Struct_ops scheduler for testing DSQ peek operations */
+void BPF_STRUCT_OPS(peek_dsq_enqueue, struct task_struct *p, u64 enq_flags)
+{
+	struct task_struct *peek_result;
+	int last_insert_test_cpu, cpu;
+
+	enqueue_count++;
+	cpu = bpf_get_smp_processor_id();
+	last_insert_test_cpu = __sync_val_compare_and_swap(
+		&insert_test_cpu, -1, cpu);
+
+	/* Phase 1: Simple insert-then-peek test (only on first task) */
+	if (last_insert_test_cpu == -1) {
+		bpf_printk("peek_dsq_enqueue beginning phase 1 peek test on cpu %d\n", cpu);
+
+		/* Test 1: Peek empty DSQ - should return NULL */
+		peek_result = __COMPAT_scx_bpf_dsq_peek(test_dsq_id);
+		dsq_peek_result1 = (long)peek_result; /* Should be 0 (NULL) */
+
+		/* Test 2: Insert task into test DSQ for testing in dispatch callback */
+		dsq_inserted_pid = p->pid;
+		scx_bpf_dsq_insert(p, test_dsq_id, 0, enq_flags);
+		dsq_peek_result2_expected = (long)p; /* Expected the task we just inserted */
+	} else if (!phase1_complete) {
+		/* Still in phase 1, use real DSQ */
+		scx_bpf_dsq_insert(p, real_dsq_id, 0, enq_flags);
+	} else {
+		/* Phase 2: Random DSQ insertion for stress testing */
+		int random_dsq_id = get_random_dsq_id();
+
+		scx_bpf_dsq_insert(p, random_dsq_id, 0, enq_flags);
+	}
+}
+
+void BPF_STRUCT_OPS(peek_dsq_dispatch, s32 cpu, struct task_struct *prev)
+{
+	dispatch_count++;
+
+	/* Phase 1: Complete the simple peek test if we inserted a task but
+	 * haven't tested peek yet
+	 */
+	if (insert_test_cpu == cpu && dsq_peek_result2 == -1) {
+		struct task_struct *peek_result;
+
+		bpf_printk("peek_dsq_dispatch completing phase 1 peek test on cpu %d\n", cpu);
+
+		/* Test 3: Peek DSQ after insert - should return the task we inserted */
+		peek_result = __COMPAT_scx_bpf_dsq_peek(test_dsq_id);
+		/* Store the PID of the peeked task for comparison */
+		dsq_peek_result2 = (long)peek_result;
+		dsq_peek_result2_pid = peek_result ? peek_result->pid : -1;
+
+		/* Now consume the task since we've peeked at it */
+		scx_bpf_dsq_move_to_local(test_dsq_id);
+
+		/* Mark phase 1 as complete */
+		phase1_complete = 1;
+		bpf_printk("Phase 1 complete, starting phase 2 stress testing\n");
+	} else if (!phase1_complete) {
+		/* Still in phase 1, use real DSQ */
+		scx_bpf_dsq_move_to_local(real_dsq_id);
+	} else {
+		/* Phase 2: Scan all DSQs in the pool and try to move a task */
+		if (!scan_dsq_pool()) {
+			/* No tasks found in DSQ pool, fall back to real DSQ */
+			scx_bpf_dsq_move_to_local(real_dsq_id);
+		}
+	}
+}
+
+s32 BPF_STRUCT_OPS_SLEEPABLE(peek_dsq_init)
+{
+	s32 err;
+	int i;
+
+	/* Always set debug values so we can see which version we're using */
+	debug_ksym_exists = check_dsq_peek_ksym();
+
+	/* Initialize state first */
+	insert_test_cpu = -1;
+	enqueue_count = 0;
+	dispatch_count = 0;
+	phase1_complete = 0;
+	total_peek_attempts = 0;
+	successful_peeks = 0;
+	dsq_create_result = 0; /* Reset to 0 before attempting */
+
+	/* Create the test and real DSQs */
+	err = scx_bpf_create_dsq(test_dsq_id, -1);
+	if (!err)
+		err = scx_bpf_create_dsq(real_dsq_id, -1);
+	if (err) {
+		dsq_create_result = err;
+		scx_bpf_error("Failed to create primary DSQ %d: %d", test_dsq_id, err);
+		return err;
+	}
+
+	/* Create the DSQ pool for stress testing */
+	bpf_for(i, 0, DSQ_POOL_SIZE) {
+		int dsq_id = dsq_pool_base_id + i;
+
+		err = scx_bpf_create_dsq(dsq_id, -1);
+		if (err) {
+			dsq_create_result = err;
+			scx_bpf_error("Failed to create DSQ pool entry %d: %d", dsq_id, err);
+			return err;
+		}
+	}
+
+	dsq_create_result = 1; /* Success */
+
+	/* Initialize the peek results map */
+	bpf_for(i, 0, MAX_SAMPLES) {
+		u32 key = i;
+		long pid = -1;
+
+		bpf_map_update_elem(&peek_results, &key, &pid, BPF_ANY);
+	}
+
+	return 0;
+}
+
+void BPF_STRUCT_OPS(peek_dsq_exit, struct scx_exit_info *ei)
+{
+	int i;
+
+	/* Destroy the primary DSQs */
+	scx_bpf_destroy_dsq(test_dsq_id);
+	scx_bpf_destroy_dsq(real_dsq_id);
+
+	/* Destroy the DSQ pool */
+	bpf_for(i, 0, DSQ_POOL_SIZE) {
+		int dsq_id = dsq_pool_base_id + i;
+
+		scx_bpf_destroy_dsq(dsq_id);
+	}
+
+	dsq_destroy_result = 1;
+}
+
+SEC(".struct_ops.link")
+struct sched_ext_ops peek_dsq_ops = {
+	.enqueue = (void *)peek_dsq_enqueue,
+	.dispatch = (void *)peek_dsq_dispatch,
+	.init = (void *)peek_dsq_init,
+	.exit = (void *)peek_dsq_exit,
+	.name = "peek_dsq",
+};
diff --git a/tools/testing/selftests/sched_ext/peek_dsq.c b/tools/testing/selftests/sched_ext/peek_dsq.c
new file mode 100644
index 000000000000..1e326d494d1f
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/peek_dsq.c
@@ -0,0 +1,235 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test for DSQ operations including create, destroy, and peek operations.
+ *
+ * Copyright (c) 2025 Meta Platforms, Inc. and affiliates.
+ * Copyright (c) 2025 Ryan Newton
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "peek_dsq.bpf.skel.h"
+#include "scx_test.h"
+
+#define NUM_WORKERS 100
+
+static bool workload_running = true;
+static pthread_t workload_threads[NUM_WORKERS];
+
+/**
+ * Background workload thread that sleeps and wakes rapidly to exercise
+ * the scheduler's enqueue operations and ensure DSQ operations get tested.
+ */
+static void *workload_thread_fn(void *arg)
+{
+	while (workload_running) {
+		/* Sleep for a very short time to trigger scheduler activity */
+		usleep(1000); /* 1ms sleep */
+		/* Yield to ensure we go through the scheduler */
+		sched_yield();
+	}
+	return NULL;
+}
+
+static enum scx_test_status setup(void **ctx)
+{
+	struct peek_dsq *skel;
+
+	skel = peek_dsq__open();
+	SCX_FAIL_IF(!skel, "Failed to open");
+	SCX_ENUM_INIT(skel);
+	SCX_FAIL_IF(peek_dsq__load(skel), "Failed to load skel");
+
+	*ctx = skel;
+
+	return SCX_TEST_PASS;
+}
+
+static int print_observed_pids(struct bpf_map *map, int max_samples, const char *dsq_name)
+{
+	long count = 0;
+
+	printf("Observed %s DSQ peek pids:\n", dsq_name);
+	for (int i = 0; i < max_samples; i++) {
+		long pid;
+		int err;
+
+		err = bpf_map_lookup_elem(bpf_map__fd(map), &i, &pid);
+		if (err == 0) {
+			if (pid == 0) {
+				printf(" Sample %d: NULL peek\n", i);
+			} else if (pid > 0) {
+				printf(" Sample %d: pid %ld\n", i, pid);
+				count++;
+			}
+		} else {
+			printf(" Sample %d: error reading pid (err=%d)\n", i, err);
+		}
+	}
+	printf("Observed ~%ld pids in the %s DSQ(s)\n", count, dsq_name);
+	return count;
+}
+
+static enum scx_test_status run(void *ctx)
+{
+	struct peek_dsq *skel = ctx;
+	bool failed = false;
+	int seconds = 3;
+	int err;
+
+	/* Enable the scheduler to test DSQ operations */
+	printf("Enabling scheduler to test DSQ insert operations...\n");
+
+	struct bpf_link *link =
+		bpf_map__attach_struct_ops(skel->maps.peek_dsq_ops);
+
+	if (!link) {
+		SCX_ERR("Failed to attach struct_ops");
+		return SCX_TEST_FAIL;
+	}
+
+	printf("Starting %d background workload threads...\n", NUM_WORKERS);
+	workload_running = true;
+	for (int i = 0; i < NUM_WORKERS; i++) {
+		err = pthread_create(&workload_threads[i], NULL, workload_thread_fn, NULL);
+		if (err) {
+			SCX_ERR("Failed to create workload thread %d: %s", i, strerror(err));
+			/* Stop already created threads */
+			workload_running = false;
+
for (int j =3D 0; j < i; j++) + pthread_join(workload_threads[j], NULL); + bpf_link__destroy(link); + return SCX_TEST_FAIL; + } + } + + printf("Waiting for enqueue events.\n"); + sleep(seconds); + while (skel->data->enqueue_count <=3D 0) { + printf("."); + fflush(stdout); + sleep(1); + seconds++; + if (seconds >=3D 30) { + printf("\n\u2717 Timeout waiting for enqueue events\n"); + /* Stop workload threads and cleanup */ + workload_running =3D false; + for (int i =3D 0; i < NUM_WORKERS; i++) + pthread_join(workload_threads[i], NULL); + bpf_link__destroy(link); + return SCX_TEST_FAIL; + } + } + + workload_running =3D false; + for (int i =3D 0; i < NUM_WORKERS; i++) { + err =3D pthread_join(workload_threads[i], NULL); + if (err) { + SCX_ERR("Failed to join workload thread %d: %s", i, strerror(err)); + bpf_link__destroy(link); + return SCX_TEST_FAIL; + } + } + printf("Background workload threads stopped.\n"); + + /* Detach the scheduler */ + bpf_link__destroy(link); + + /* Check if DSQ creation succeeded */ + if (skel->data->dsq_create_result !=3D 1) { + printf("\u2717 DSQ create failed: got %d, expected 1\n", + skel->data->dsq_create_result); + failed =3D true; + } else { + printf("\u2713 DSQ create succeeded\n"); + } + + printf("Enqueue/dispatch count over %d seconds: %d / %d\n", seconds, + skel->data->enqueue_count, skel->data->dispatch_count); + printf("Debug: ksym_exists=3D%d\n", + skel->data->debug_ksym_exists); + + /* Check DSQ insert result */ + printf("DSQ insert test done on cpu: %d\n", skel->data->insert_test_cpu); + if (skel->data->insert_test_cpu !=3D -1) + printf("\u2713 DSQ insert succeeded !\n"); + else { + printf("\u2717 DSQ insert failed or not attempted\n"); + failed =3D true; + } + + /* Check DSQ peek results */ + printf(" DSQ peek result 1 (before insert): %d\n", + skel->data->dsq_peek_result1); + if (skel->data->dsq_peek_result1 =3D=3D 0) + printf("\u2713 DSQ peek verification success: peek returned NULL!\n"); + else { + printf("\u2717 DSQ peek 
verification failed\n"); + failed =3D true; + } + + printf(" DSQ peek result 2 (after insert): %ld\n", + skel->data->dsq_peek_result2); + printf(" DSQ peek result 2, expected: %ld\n", + skel->data->dsq_peek_result2_expected); + if (skel->data->dsq_peek_result2 =3D=3D + skel->data->dsq_peek_result2_expected) + printf("\u2713 DSQ peek verification success: peek returned the inserted= task!\n"); + else { + printf("\u2717 DSQ peek verification failed\n"); + failed =3D true; + } + + printf(" Inserted test task -> pid: %ld\n", skel->data->dsq_inserted_pid= ); + printf(" DSQ peek result 2 -> pid: %ld\n", skel->data->dsq_peek_result2_= pid); + + if (skel->data->dsq_destroy_result !=3D 1) { + printf("\u2717 DSQ destroy failed: got %d, expected 1\n", + skel->data->dsq_destroy_result); + failed =3D true; + } + + int pid_count; + + pid_count =3D print_observed_pids(skel->maps.peek_results, + skel->data->max_samples, "DSQ pool"); + + if (skel->data->debug_ksym_exists && pid_count =3D=3D 0) { + printf("\u2717 DSQ pool test failed: no successful peeks in native mode\= n"); + failed =3D true; + } + if (skel->data->debug_ksym_exists && pid_count > 0) + printf("\u2713 DSQ pool test success: observed successful peeks in nativ= e mode\n"); + + if (failed) + return SCX_TEST_FAIL; + else + return SCX_TEST_PASS; +} + +static void cleanup(void *ctx) +{ + struct peek_dsq *skel =3D ctx; + + if (workload_running) { + workload_running =3D false; + for (int i =3D 0; i < NUM_WORKERS; i++) + pthread_join(workload_threads[i], NULL); + } + + peek_dsq__destroy(skel); +} + +struct scx_test peek_dsq =3D { + .name =3D "peek_dsq", + .description =3D + "Test DSQ create/destroy operations and future peek functionality", + .setup =3D setup, + .run =3D run, + .cleanup =3D cleanup, +}; +REGISTER_SCX_TEST(&peek_dsq) --=20 2.51.0