From nobody Wed Dec 17 16:08:10 2025 Received: from mail-il1-f172.google.com (mail-il1-f172.google.com [209.85.166.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E99CF2877DF for ; Thu, 2 Oct 2025 02:57:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759373854; cv=none; b=VsGa7tSXwuL4dmngIs7iFG1M93BgHn1JpQh8gTqee7cEmyKBm8/VNPuX1eQHqhuvj4JHl2jWaK94QdtrgxJenckfin/PH7cPsgZ0lg/c974hsmrkpFWlX8MaZqmdmSGWvZoHOTx+4NfQASaN4Ckr95aCJqLEBkO05nfI0Xb3fGA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759373854; c=relaxed/simple; bh=m8irqQQrWS/AppJmzBHn9/1/BqdXLBMn02Dh6cMBMkA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=f4pynWLXhiYN7RzpxrgdBFCYCOeHVnw+mSm8a8/VB3eZntzCXJ8bsOKJPGU0y6qWBm++Tn2dpSjQtrAfQEuM3ql4MBWHWNhqD6ru/8uDkFmxSJ14V4bkyu1ThEa2IFojPN97oQM5zy1WXslv21ogbfOzBCCeJnqISDTTUB4rC8w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=CAN9zvYd; arc=none smtp.client-ip=209.85.166.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="CAN9zvYd" Received: by mail-il1-f172.google.com with SMTP id e9e14a558f8ab-42d8ad71a51so7100365ab.1 for ; Wed, 01 Oct 2025 19:57:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1759373851; x=1759978651; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=iRRc4K6LmuroEV6vB89FwgV28B/ksAOPI2h8bfGG5Gg=; b=CAN9zvYd3S1N+gK8J0jX3/Oin7S1irHlXO5eMHz/ArQcB2yT0bpGnMBfNHhoxgjSXl Yh8uDhgyox911TPZvlWFqG8eLXiBVge5GfeJ1xcbEdL2nqvdeoUiKA4xdYR/QIrpcjgo PJCo1L3ISZxLgX/bYNVRFVKWknsG8PFYL8YX7bRey3lkChMbxLUCUdrBn2Omjbiki657 dfOUR6SXw/sHbU23xwLoHWvuuTJIzg7akkOn/+joXQFGxkb59PzeAfySQiDll71KOeeS ybbL3ZuA/xQNSenl3E3Yn2Z+bq7wWdpFEEzOGQ8ZE/WOWVbaHXtx84zfchqFiFtbuAwb d4Pw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759373851; x=1759978651; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=iRRc4K6LmuroEV6vB89FwgV28B/ksAOPI2h8bfGG5Gg=; b=oowJ7n7qHlYk23VM8GSbRVxL5z9pveteXTgz4jO5MfnmiBww7thwuHQ6vLFoDgzKUC vVyNBVJlOnbGTuVk7JvnFU59ohWtIMP5T6HgwJp62tKrwHxBRGN0xuy1ALaSd5NV7UK6 KbjjpZKzOEeQhlRzOfe1FS9dbt5psiwG6YEDIEa7n9xjqta4bNqaHD+4iJr6KN+Eq0B/ N4yX2E5vGngcUJnEkhrCE4wnJwyRdw4vThs5OTf3CHknWujyI8hjSBFhTn8MX0T2JcuJ wa2w+S2UehURB2cO/GAWnXzTwjkZPouotImYaQtAJpslgX6PYNHvN4cro16d4SSa/QQD dkiA== X-Gm-Message-State: AOJu0Yzt4ppvRy696S+Vx7oWQUMboM7e13V0NeCtX15HmgWN+cnGfwdW /qsonSJ2aC8GpOD2X5hTelSlfq5kiNTwEpyDDxgSbQnsEMpAtnIEdtR8pHQaJKoB X-Gm-Gg: ASbGncu6SVsvcihCYs1W4hZhEbY2iJrVaYrif0V+KPcjgb4uxE/eYB0tSYSqY4/eLRj 7ZjlRfLPLodekOhezTTbPTdEtfm/mxScE7K+fVDRIu0nm9PsWNZ0reOSGbDLY65zc4LNCWW0TRk mRTf8HdI0BK89AvrLjL7+lB/qIQlfJi9YsqI6kbBWFyBlfsRWvuZS8Y1KhbFlqFQ/CLE+veGt6b Tn0zngFeJQsWynV5lg1crdBYsqeipBgv8/hBDHkG5pmDZUVOKi7I6Eo9IiEm2matUHLjf8dgZhx IDYIEq/uDBpcbHikvGcOKAlTtbeN9CN0QTYII+cbatnBs/ToJPb8k83mlwjKgzAo3SUZjre+OQP rgXnUPzju3tTu8LIJ71A+bHaVbG6WApt+eimH+Ln6X1iUgPMb7TjOG0RScvSPqNqHgyijP8ZnD/ BHiWe9bqu7Bnm4zrGsRYfYpKI+tuY0tCdy7nbpzhXX7CLD6qcn1tK/6/O0 X-Google-Smtp-Source: AGHT+IHod9HShDFHtXl45kRZhfQ2iuWNmcX7V08Eo581zjASSLepAjNEYaNgYu5TPBCq5xLIfaHLdw== X-Received: by 2002:a05:6e02:3105:b0:42d:8a3f:ec97 with SMTP id e9e14a558f8ab-42d8a3ff12emr29217695ab.7.1759373851351; Wed, 01 Oct 2025 19:57:31 -0700 (PDT) Received: from newton-fedora-MZ01GC9H (c-68-45-22-229.hsd1.in.comcast.net. [68.45.22.229]) by smtp.gmail.com with ESMTPSA id e9e14a558f8ab-42d8b1f4f0asm5250255ab.2.2025.10.01.19.57.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Oct 2025 19:57:28 -0700 (PDT) From: Ryan Newton To: linux-kernel@vger.kernel.org Cc: tj@kernel.org, arighi@nvidia.com, newton@meta.com Subject: [PATCH 1/3] sched_ext: Add lockless peek operation for DSQs Date: Wed, 1 Oct 2025 22:57:19 -0400 Message-ID: <20251002025722.3420916-2-rrnewton@gmail.com> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20251002025722.3420916-1-rrnewton@gmail.com> References: <20251002025722.3420916-1-rrnewton@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Ryan Newton The builtin DSQ queue data structures are meant to be used by a wide range of different sched_ext schedulers with different demands on these data structures. They might be per-cpu with low-contention, or high-contention shared queues. Unfortunately, DSQs have a coarse-grained lock around the whole data structure. Without going all the way to a lock-free, more scalable implementation, a small step we can take to reduce lock contention is to allow a lockless, small-fixed-cost peek at the head of the queue. This change allows certain custom SCX schedulers to cheaply peek at queues, e.g. during load balancing, before locking them. But it represents a few extra memory operations to update the pointer each time the DSQ is modified, including a memory barrier on ARM so the write appears correctly ordered. This commit adds a first_task pointer field which is updated atomically when the DSQ is modified, and allows any thread to peek at the head of the queue without holding the lock. Signed-off-by: Ryan Newton --- include/linux/sched/ext.h | 1 + kernel/sched/ext.c | 37 ++++++++++++++++++++++++ tools/sched_ext/include/scx/common.bpf.h | 1 + tools/sched_ext/include/scx/compat.bpf.h | 19 ++++++++++++ 4 files changed, 58 insertions(+) diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h index d82b7a9b0658..81478d4ae782 100644 --- a/include/linux/sched/ext.h +++ b/include/linux/sched/ext.h @@ -58,6 +58,7 @@ enum scx_dsq_id_flags { */ struct scx_dispatch_q { raw_spinlock_t lock; + struct task_struct __rcu *first_task; /* lockless peek at head */ struct list_head list; /* tasks in dispatch order */ struct rb_root priq; /* used to order by p->scx.dsq_vtime */ u32 nr; diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c index 2b0e88206d07..fd0121c03311 100644 --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -885,6 +885,15 @@ static void refill_task_slice_dfl(struct scx_sched *sc= h, struct task_struct *p) __scx_add_event(sch, SCX_EV_REFILL_SLICE_DFL, 1); } =20 +/* while holding dsq->lock */ +static void dsq_update_first_task(struct scx_dispatch_q *dsq) +{ + struct task_struct *first_task; + + first_task =3D nldsq_next_task(dsq, NULL, false); + rcu_assign_pointer(dsq->first_task, first_task); +} + static void dispatch_enqueue(struct scx_sched *sch, struct scx_dispatch_q = *dsq, struct task_struct *p, u64 enq_flags) { @@ -959,6 +968,9 @@ static void dispatch_enqueue(struct scx_sched *sch, str= uct scx_dispatch_q *dsq, list_add_tail(&p->scx.dsq_list.node, &dsq->list); } =20 + /* even the add_tail code path may have changed the first element */ + dsq_update_first_task(dsq); + /* seq records the order tasks are queued, used by BPF DSQ iterator */ dsq->seq++; p->scx.dsq_seq =3D dsq->seq; @@ -1013,6 +1025,7 @@ static void task_unlink_from_dsq(struct task_struct *= p, =20 list_del_init(&p->scx.dsq_list.node); dsq_mod_nr(dsq, -1); + dsq_update_first_task(dsq); } =20 static void dispatch_dequeue(struct rq *rq, struct task_struct *p) @@ -6084,6 +6097,29 @@ __bpf_kfunc void bpf_iter_scx_dsq_destroy(struct bpf= _iter_scx_dsq *it) kit->dsq =3D NULL; } =20 +/** + * scx_bpf_dsq_peek - Lockless peek at the first element. + * @dsq_id: DSQ to examine. + * + * Read the first element in the DSQ. This is semantically equivalent to u= sing + * the DSQ iterator, but is lockfree. + * + * Returns the pointer, or uses ERR_PTR() to encode an error as the pointe= r. + */ +__bpf_kfunc struct task_struct *scx_bpf_dsq_peek(u64 dsq_id) +{ + struct scx_sched *sch; + struct scx_dispatch_q *dsq; + + /* KF_RCU_PROTECTED means no need to guard(rcu)() */ + sch =3D rcu_dereference(scx_root); + + if (unlikely(!sch)) + return ERR_PTR(-ENODEV); + dsq =3D find_user_dsq(sch, dsq_id); + return rcu_dereference(dsq->first_task); +} + __bpf_kfunc_end_defs(); =20 static s32 __bstr_format(struct scx_sched *sch, u64 *data_buf, char *line_= buf, @@ -6641,6 +6677,7 @@ BTF_KFUNCS_START(scx_kfunc_ids_any) BTF_ID_FLAGS(func, scx_bpf_kick_cpu) BTF_ID_FLAGS(func, scx_bpf_dsq_nr_queued) BTF_ID_FLAGS(func, scx_bpf_destroy_dsq) +BTF_ID_FLAGS(func, scx_bpf_dsq_peek, KF_RCU_PROTECTED | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_scx_dsq_new, KF_ITER_NEW | KF_RCU_PROTECTED) BTF_ID_FLAGS(func, bpf_iter_scx_dsq_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_scx_dsq_destroy, KF_ITER_DESTROY) diff --git a/tools/sched_ext/include/scx/common.bpf.h b/tools/sched_ext/inc= lude/scx/common.bpf.h index 06e2551033cb..fbf3e7f9526c 100644 --- a/tools/sched_ext/include/scx/common.bpf.h +++ b/tools/sched_ext/include/scx/common.bpf.h @@ -75,6 +75,7 @@ u32 scx_bpf_reenqueue_local(void) __ksym; void scx_bpf_kick_cpu(s32 cpu, u64 flags) __ksym; s32 scx_bpf_dsq_nr_queued(u64 dsq_id) __ksym; void scx_bpf_destroy_dsq(u64 dsq_id) __ksym; +struct task_struct *scx_bpf_dsq_peek(u64 dsq_id) __ksym __weak; int bpf_iter_scx_dsq_new(struct bpf_iter_scx_dsq *it, u64 dsq_id, u64 flag= s) __ksym __weak; struct task_struct *bpf_iter_scx_dsq_next(struct bpf_iter_scx_dsq *it) __k= sym __weak; void bpf_iter_scx_dsq_destroy(struct bpf_iter_scx_dsq *it) __ksym __weak; diff --git a/tools/sched_ext/include/scx/compat.bpf.h b/tools/sched_ext/inc= lude/scx/compat.bpf.h index dd9144624dc9..0af1922d66a8 100644 --- a/tools/sched_ext/include/scx/compat.bpf.h +++ b/tools/sched_ext/include/scx/compat.bpf.h @@ -130,6 +130,25 @@ int bpf_cpumask_populate(struct cpumask *dst, void *sr= c, size_t src__sz) __ksym false; \ }) =20 + +/* + * v6.19: Introduce lockless peek API for user DSQs. + * + * Preserve the following macro until v6.20. + */ +static inline struct task_struct *__COMPAT_scx_bpf_dsq_peek(u64 dsq_id) +{ + struct task_struct *p =3D NULL; + struct bpf_iter_scx_dsq it; + + if (bpf_ksym_exists(scx_bpf_dsq_peek)) + return scx_bpf_dsq_peek(dsq_id); + if (!bpf_iter_scx_dsq_new(&it, dsq_id, 0)) + p =3D bpf_iter_scx_dsq_next(&it); + bpf_iter_scx_dsq_destroy(&it); + return p; +} + /** * __COMPAT_is_enq_cpu_selected - Test if SCX_ENQ_CPU_SELECTED is on * in a compatible way. We will preserve this __COMPAT helper until v6.16. --=20 2.51.0 From nobody Wed Dec 17 16:08:10 2025 Received: from mail-il1-f180.google.com (mail-il1-f180.google.com [209.85.166.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BADF228C00C for ; Thu, 2 Oct 2025 02:57:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759373856; cv=none; b=dBXRTHJZjKDI4ya6l2EO2o4KQDDcsGcKnLDzDbFagaNetqw6sDR8ERoBAPcFrJ5tb8ZDVyE9qdF0+TlMdQSTwAd5Uglymc+sS5Fi4iYL4f6eJ4pPXlFEfEz8YtPbkdN5CuXQh69Eq2JWNb2zbKM1jY/UUeg1MvSxD8/PCQeLeUs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759373856; c=relaxed/simple; bh=y3ipr8m9u67gmE4hZu7xPVTPdtoye4u5ez1ARduQHRE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=o+8+IGzLmZ3tFOqY7f4xU78hNZDVWcccnjcZZylploGV45lqX954DX7E2v4T4W11ncQGcFPwHPj/WpTiQOPocQmZ3kiPuln7n5bq4y9Cj5e4auONNrXNkPxPGmSDSO5aXN/ZfOdhnvX3pfNuywU3Ar+rb0mbSfBaDGbIEvVmff8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=lN8eEqrV; arc=none smtp.client-ip=209.85.166.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="lN8eEqrV" Received: by mail-il1-f180.google.com with SMTP id e9e14a558f8ab-42791510fe3so1715485ab.0 for ; Wed, 01 Oct 2025 19:57:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1759373853; x=1759978653; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=0BGFl3iaVg20wY/A9rj2Ykarg4EVcq5v3jqlKvBxJUM=; b=lN8eEqrVO1xhNSL15GjOb1jwf/BEUMcfo6ToTTIUKygCVdCcVrkfL2yK57JkJv2L3k ScEMSspGcF238y4Jb4Zgo/Yn0IfarescyqacvFg254L7JCGBpMMhxhVaZHKB3KhA+cUp ci7x0pOnPdKadY7MAJjMovdG1hY/ZWSgFxguNfT+9pDI4FxMgYwzPXxs6K/CL+YqbjCe 05Hz4UUsqtdw/ytCp0eqB/SA9fF8kHf/3D08g60dfqfM7sRtiIbcUBrwqIxQ7XzmN9x0 LTvfYQ8D7Uih9kf9YB+R6K3fJ72n4+L/HZaHyk9weHcIDHCUkZWEvvBBwqucj7uSReYU DIvw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759373853; x=1759978653; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=0BGFl3iaVg20wY/A9rj2Ykarg4EVcq5v3jqlKvBxJUM=; b=cItgpGYD53bwHw8mqGQ5/iN2TsOF3yJJBye8V6njf3kCIgIF61nHpkC+Eh97HH3w09 z9bSoJeS208IYRAKfs48RysHIqTcNVLqEDbyW0tGqF/TtPXnGfnZPdfSPC/k63OXu/Er Sz0buQ7VyMfAWE+OUeWFqoS9tXi0uyx35+SL3GR2SLdGOESJAAHDeXFt8HNpr3z3QVmt H/4DvRuwHBRfy992XIKgwGue4YwoPM3hydylIIRtSQ+goSSxxIlrtJkuvD6KNAyzjj+/ bi3uyFz8zcixoKO+wrm8NiELcLIMTvVNVospxe3NoviPXFZxAgZ3EGBU5GJSF8bbU7tX eO9w== X-Gm-Message-State: AOJu0YwBLLzVmnj0vo6EtHJdczC/0QUI1qInxYfU1vIEH3Psy8G6cCwZ KtLs+pSXM6uo6RGKc/BNBEVSJFNRcKzSPoYVJc0+5orUqKe3VeKRVddE2Q0w9iAL X-Gm-Gg: ASbGncvqNzaSsthZzxbqUOsiOOaPzf1sZ9WEELSxUaAv3fR9JpVH6s8WwWMik7bgnRi +lpVzM3/EL6eenQ672ZBFfFYm5Yv/ApjIo7Z77xim/jLOiCTry1E7+uwZZuZaj084v+hEnzKhHF taqiofBUKXst98tmLMSE0iDwjXiehivsVsLPlXRHQjv3Etl6J9QpnienX8VwkdnUH56DQvjBgbq l3rcDT1Nw1F0l0aFj+C4yN1axl5J2R0icLpexFXBFPaNBc3j9GcfaYKelYLmgN6/WbtskLuvgJD V54ZmApaCWw+BRxu8xqGQC4j6XCfHjiBoFKVtpD6D5DY3aFrAXF+8v23ppdHZAOEiXC9gwywBJu XlUUy4M1e4GL25pizzhiIowven+4qu5qTy3RWFXdkolQgCpLzrLhsj8RyRcmMO7oWVaWDRD1xyc T1c3ADxXtS0mn0Tp1jN4qvYKlWMg+t5jK6sZNxY1f1GAQ2wg== X-Google-Smtp-Source: AGHT+IE/l+b3Sl9MkR7vAc6b+0igSxttuvanyaCpevYeUwXmM+kpMYtDx4mO1eB0Mn2+ddSPIIpRfw== X-Received: by 2002:a05:6e02:3108:b0:42d:89d8:20ad with SMTP id e9e14a558f8ab-42d8b1d6a26mr22991445ab.14.1759373853197; Wed, 01 Oct 2025 19:57:33 -0700 (PDT) Received: from newton-fedora-MZ01GC9H (c-68-45-22-229.hsd1.in.comcast.net. [68.45.22.229]) by smtp.gmail.com with ESMTPSA id e9e14a558f8ab-42d8b1f4f0asm5250255ab.2.2025.10.01.19.57.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Oct 2025 19:57:31 -0700 (PDT) From: Ryan Newton To: linux-kernel@vger.kernel.org Cc: tj@kernel.org, arighi@nvidia.com, newton@meta.com Subject: [PATCH 2/3] sched_ext: optimize first_task update logic Date: Wed, 1 Oct 2025 22:57:20 -0400 Message-ID: <20251002025722.3420916-3-rrnewton@gmail.com> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20251002025722.3420916-1-rrnewton@gmail.com> References: <20251002025722.3420916-1-rrnewton@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Ryan Newton This is a follow-on optimization to the prior commit which added a lockless peek operation on DSQs. That implementation is correct and simple, but elides several optimizations. Previously, we read the first_task using the same slowpath, irrespective of where we enqueue the task. With this change, we instead base the update on what we know about the calling context. On both insert and removal we can break down whether the change (1) definitely, (2) never, or (3) sometimes changes first task. In some cases we know what the new first task will be, and can set it more directly. Signed-off-by: Ryan Newton --- kernel/sched/ext.c | 26 ++++++++++++++++++++------ 1 file changed, 20 insertions(+), 6 deletions(-) diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c index fd0121c03311..1cb10aa9913a 100644 --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -953,8 +953,11 @@ static void dispatch_enqueue(struct scx_sched *sch, st= ruct scx_dispatch_q *dsq, container_of(rbp, struct task_struct, scx.dsq_priq); list_add(&p->scx.dsq_list.node, &prev->scx.dsq_list.node); + /* first task unchanged - no update needed */ } else { list_add(&p->scx.dsq_list.node, &dsq->list); + /* new task is at head - use fastpath */ + rcu_assign_pointer(dsq->first_task, p); } } else { /* a FIFO DSQ shouldn't be using PRIQ enqueuing */ @@ -962,15 +965,20 @@ static void dispatch_enqueue(struct scx_sched *sch, s= truct scx_dispatch_q *dsq, scx_error(sch, "DSQ ID 0x%016llx already had PRIQ-enqueued tasks", dsq->id); =20 - if (enq_flags & (SCX_ENQ_HEAD | SCX_ENQ_PREEMPT)) + if (enq_flags & (SCX_ENQ_HEAD | SCX_ENQ_PREEMPT)) { list_add(&p->scx.dsq_list.node, &dsq->list); - else + /* new task inserted at head - use fastpath */ + rcu_assign_pointer(dsq->first_task, p); + } else { + bool was_empty; + + was_empty =3D list_empty(&dsq->list); list_add_tail(&p->scx.dsq_list.node, &dsq->list); + if (was_empty) + rcu_assign_pointer(dsq->first_task, p); + } } =20 - /* even the add_tail code path may have changed the first element */ - dsq_update_first_task(dsq); - /* seq records the order tasks are queued, used by BPF DSQ iterator */ dsq->seq++; p->scx.dsq_seq =3D dsq->seq; @@ -1023,9 +1031,15 @@ static void task_unlink_from_dsq(struct task_struct = *p, p->scx.dsq_flags &=3D ~SCX_TASK_DSQ_ON_PRIQ; } =20 + if (dsq->first_task =3D=3D p) { + if (dsq->id & SCX_DSQ_FLAG_BUILTIN) + rcu_assign_pointer(dsq->first_task, + list_next_entry(p, scx.dsq_list.node)); + else + dsq_update_first_task(dsq); + } list_del_init(&p->scx.dsq_list.node); dsq_mod_nr(dsq, -1); - dsq_update_first_task(dsq); } =20 static void dispatch_dequeue(struct rq *rq, struct task_struct *p) --=20 2.51.0 From nobody Wed Dec 17 16:08:10 2025 Received: from mail-il1-f173.google.com (mail-il1-f173.google.com [209.85.166.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 643AB29BDBC for ; Thu, 2 Oct 2025 02:57:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759373859; cv=none; b=cwPdnltSpN8qDWYNkhrqQmAo2q4uLG7RAghRfYwq736yWb/3dxdbb1+NqUrjTXV+9h7sxnjWwQrQ//Pw9j0oDI65RNTfENnQ33QGQfG9/CtVFnnahIX0rBgUKpEmUPKxS5kPX3e+SKzwEkqaWkvMRKiGYJE2dHF/dLg3xSIWmp4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759373859; c=relaxed/simple; bh=NsT/7leM+Z+LkGB8o+Jjf49sZXdF/46YFSdWNa66/AI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=nrTHgK9c4KM/EKBHvhaVc4dFG+SH55NOpfuFWE+1JoORJO09d6PJMhihhmyCaN/4GIvoXuoPjKATE5/PtXO4u+evDhH+CPeOY7+0MbbssZK2KOilHzkem5Jqg1UUTkXeMRdzXfBw51fl1FDLoEtZW6Q0khYQkisGFF245PqjZgE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=LytkQdS5; arc=none smtp.client-ip=209.85.166.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="LytkQdS5" Received: by mail-il1-f173.google.com with SMTP id e9e14a558f8ab-42486ed0706so2979895ab.0 for ; Wed, 01 Oct 2025 19:57:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1759373856; x=1759978656; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=wU35K+1akWNuALAzb08WKRtex5CvCVX+avkesvY8u3M=; b=LytkQdS5pOo9dJNg1xk56wDA88Y4AOP9SN/NpobM96ezl0B0P3AsF2LM1JUmoDuagT hcsBZ9X3WStPre2VejjXijx3z0CBQOaA1VqEEP9dmHpObJd37LXNiMdRop/eSJzjgWEC O4o+tcrtVHyXpiw+4rOs/GrKXft2F22gRYaJWq8q70yiWDSXaE9/jtSV2q7V1ZvhYTfB qGcf7q76CnsOxLyToEZpul7J6wJGxb7Hu5ySo6hTvvxdFUxQ+zZl5eHnMVh9JsjUaTA9 2uhkGZNn1Gratcov5RDAv1lF5ShoeznPClt54rByXQc90h3Py7xXdiOEJyaymdimoOR2 GyPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759373856; x=1759978656; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=wU35K+1akWNuALAzb08WKRtex5CvCVX+avkesvY8u3M=; b=EBamHjkCHRF9GGRxpmwhpBp0JFtD/UGJTNU7QFKGJQ114GqQbmkH3x1FWdopZ16fPd R3Oyq+vLii2Jm8vJQ0hpAuJmZc4d0sJuwImuGDX9qmIaN/zfF6hs643Sw1H9Bvs2n4Xb zPFeeVEzTzeSMpFfEaPWyKnengMRcsNrdrzTV5ibUkDNZBaVmpYtA1kUXEEq/IIL1Pkc 4z9p1syX3naOsMsL3dF2v1j24y7EMTXa110z0s3L6kZrzuF1Y4oGRBgU40KM78zTV3LU JVMFf4zZBx4uVSlH0iYIGrDuX8KqilY+U+RFxzRgCocIVB9nexFWyPAS/jb0zkhP++b2 kBBQ== X-Gm-Message-State: AOJu0YwZ2ClUUEg7J74xYtBB1UYoV9UgIq0FV3sWwjBfo6HlFfygPQb1 BFzCjiYH5eGPs9y0zP/CogacBq54p0t7BfV/pv+6jsSQIvQFyQEMtCvMEztpT2Hd X-Gm-Gg: ASbGncvulHWILaCmQGHJP+N1pwT2O9GYR4bhhJVjqrnmMNZmxIhb+2NSDlD5UkAMm0R 1WR+ZnZvexrpewsKUPQIOdlvtUhUzL3X/1t5a0aNC169Mi9ylv7nqKtdzo1FTaqrErRbnnzYVNU F03cHjyk0GZrpglmHpTF7sGQcwl9mTTxLc7QQQFsB/99PprwDmnqiCeZWz2dE4NnWABTq3yF2MJ TE/g3UG2V+IrsPnxo5U4wiQBF1y9H+6LtQT0lXpdpwUPCFCYLIC8YWTWzUx2m5nB2uohATMTdwS X/X+Xft0FDBgZIM9JDEokTQmyjEZZ4dFiDO3vEaAzsHzgWT3QD/HB4AoD4hT3e0gGw9oM4W4+aI bi+DHZN8IGRF/2Ovr6yFgQw5ndQ9fGexvPxh574MIIoevoArNSltEJbfbNQwd+taMDDgeJETwFe q7TKs12x1LhNI3sBeygkVONLyU8YXK07T5+depuNP0yCCiwQ== X-Google-Smtp-Source: AGHT+IGFJnbkla4p6ugYoueYkRXj84y9YL9EJyBOrh5P+LWZK3apsFtDtdfikGHF20BbFxGDDzykSQ== X-Received: by 2002:a05:6e02:1fc4:b0:42a:cc5e:bde5 with SMTP id e9e14a558f8ab-42d8161765bmr71964395ab.21.1759373855573; Wed, 01 Oct 2025 19:57:35 -0700 (PDT) Received: from newton-fedora-MZ01GC9H (c-68-45-22-229.hsd1.in.comcast.net. [68.45.22.229]) by smtp.gmail.com with ESMTPSA id e9e14a558f8ab-42d8b1f4f0asm5250255ab.2.2025.10.01.19.57.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Oct 2025 19:57:34 -0700 (PDT) From: Ryan Newton To: linux-kernel@vger.kernel.org Cc: tj@kernel.org, arighi@nvidia.com, newton@meta.com Subject: [PATCH 3/3] sched_ext: Add a selftest for scx_bpf_dsq_peek Date: Wed, 1 Oct 2025 22:57:21 -0400 Message-ID: <20251002025722.3420916-4-rrnewton@gmail.com> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20251002025722.3420916-1-rrnewton@gmail.com> References: <20251002025722.3420916-1-rrnewton@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: Ryan Newton This is the most basic unit test: make sure an empty queue peeks as empty, and when we put one element in the queue, make sure peek returns that element. However, even this simple test is a little complicated by the different behavior of scx_bpf_dsq_insert in different calling contexts: - insert is for direct dispatch in enqueue - insert is delayed when called from select_cpu In this case we split the insert and the peek that verifies the result between enqueue/dispatch. Note: An alternative would be to call `scx_bpf_dsq_move_to_local` on an empty queue, which in turn calls `flush_dispatch_buf`, in order to flush the buffered insert. Unfortunately, this is not viable within the enqueue path, as it attempts a voluntary context switch within an RCU read-side critical section. Signed-off-by: Ryan Newton --- tools/testing/selftests/sched_ext/Makefile | 1 + .../selftests/sched_ext/peek_dsq.bpf.c | 133 +++++++++++++ tools/testing/selftests/sched_ext/peek_dsq.c | 188 ++++++++++++++++++ 3 files changed, 322 insertions(+) create mode 100644 tools/testing/selftests/sched_ext/peek_dsq.bpf.c create mode 100644 tools/testing/selftests/sched_ext/peek_dsq.c diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/sel= ftests/sched_ext/Makefile index 9d9d6b4c38b0..5fe45f9c5f8f 100644 --- a/tools/testing/selftests/sched_ext/Makefile +++ b/tools/testing/selftests/sched_ext/Makefile @@ -174,6 +174,7 @@ auto-test-targets :=3D \ minimal \ numa \ allowed_cpus \ + peek_dsq \ prog_run \ reload_loop \ select_cpu_dfl \ diff --git a/tools/testing/selftests/sched_ext/peek_dsq.bpf.c b/tools/testi= ng/selftests/sched_ext/peek_dsq.bpf.c new file mode 100644 index 000000000000..6bbd98799503 --- /dev/null +++ b/tools/testing/selftests/sched_ext/peek_dsq.bpf.c @@ -0,0 +1,133 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * A BPF program for testing DSQ operations including create, destroy, + * and peek operations. Uses a hybrid approach: + * - Syscall program for DSQ lifecycle (create/destroy) + * - Struct ops scheduler for task insertion/dequeue testing + * + * Copyright (c) 2025 Meta Platforms, Inc. and affiliates. + * Copyright (c) 2025 Ryan Newton + */ + +#include +#include + +char _license[] SEC("license") =3D "GPL"; + +/* Global variables to store test results */ +int dsq_create_result =3D -1; +int dsq_destroy_result =3D -1; +int dsq_peek_result1 =3D -1; +long dsq_inserted_pid =3D -1; +int insert_test_cpu =3D -1; /* Set to the cpu that performs the test */ +long dsq_peek_result2 =3D -1; +long dsq_peek_result2_pid =3D -1; +long dsq_peek_result2_expected =3D -1; +int test_dsq_id =3D 1234; /* Use a simple ID like create_dsq example */ +int real_dsq_id =3D 1235; /* DSQ for normal operation */ +int enqueue_count =3D -1; +int dispatch_count =3D -1; +int debug_ksym_exists =3D -1; + + +/* Test if we're actually using the native or compat version */ +int check_dsq_insert_ksym(void) +{ + return bpf_ksym_exists(scx_bpf_dsq_insert) ? 1 : 0; +} + +int check_dsq_peek_ksym(void) +{ + return bpf_ksym_exists(scx_bpf_dsq_peek) ? 1 : 0; +} + +/* Struct_ops scheduler for testing DSQ peek operations */ +void BPF_STRUCT_OPS(peek_dsq_enqueue, struct task_struct *p, u64 enq_flags) +{ + struct task_struct *peek_result; + int last_insert_test_cpu, cpu; + + enqueue_count++; + cpu =3D bpf_get_smp_processor_id(); + last_insert_test_cpu =3D __sync_val_compare_and_swap( + &insert_test_cpu, -1, cpu); + + /* On the first task, just do the empty DSQ test and insert into test DSQ= */ + if (last_insert_test_cpu =3D=3D -1) { + bpf_printk("peek_dsq_enqueue beginning peek test on cpu %d\n", cpu); + + /* Test 1: Peek empty DSQ - should return NULL */ + peek_result =3D __COMPAT_scx_bpf_dsq_peek(test_dsq_id); + dsq_peek_result1 =3D (long)peek_result; /* Should be 0 (NULL) */ + + /* Test 2: Insert task into test DSQ for testing in dispatch callback */ + dsq_inserted_pid =3D p->pid; + scx_bpf_dsq_insert(p, test_dsq_id, 0, enq_flags); + dsq_peek_result2_expected =3D (long)p; /* Expected the task we just inse= rted */ + } else + scx_bpf_dsq_insert(p, real_dsq_id, 0, enq_flags); +} + +void BPF_STRUCT_OPS(peek_dsq_dispatch, s32 cpu, struct task_struct *prev) +{ + dispatch_count++; + /* Complete the peek test if we inserted a task but haven't tested peek y= et */ + if (insert_test_cpu =3D=3D cpu && dsq_peek_result2 =3D=3D -1) { + struct task_struct *peek_result; + + bpf_printk("peek_dsq_dispatch completing second half of peek test on cpu= %d\n", + cpu); + + /* Test 3: Peek DSQ after insert - should return the task we inserted */ + peek_result =3D __COMPAT_scx_bpf_dsq_peek(test_dsq_id); + /* Store the PID of the peeked task for comparison */ + dsq_peek_result2 =3D (long)peek_result; + dsq_peek_result2_pid =3D peek_result ? peek_result->pid : -1; + + /* Now consume the task since we've peeked at it */ + scx_bpf_dsq_move_to_local(test_dsq_id); + } else + scx_bpf_dsq_move_to_local(real_dsq_id); +} + +s32 BPF_STRUCT_OPS_SLEEPABLE(peek_dsq_init) +{ + s32 err; + + /* Always set debug values so we can see which version we're using */ + debug_ksym_exists =3D check_dsq_peek_ksym(); + + /* Initialize state first */ + insert_test_cpu =3D -1; + enqueue_count =3D 0; + dsq_create_result =3D 0; /* Reset to 0 before attempting */ + + /* Create a DSQ */ + err =3D scx_bpf_create_dsq(test_dsq_id, -1); + if (!err) + err =3D scx_bpf_create_dsq(real_dsq_id, -1); + if (err) { + dsq_create_result =3D err; + scx_bpf_error("Failed to create DSQ %d: %d", test_dsq_id, err); + return err; + } + + dsq_create_result =3D 1; /* Success */ + + return 0; +} + +void BPF_STRUCT_OPS(peek_dsq_exit, struct scx_exit_info *ei) +{ + scx_bpf_destroy_dsq(test_dsq_id); + dsq_destroy_result =3D 1; +} + +SEC(".struct_ops.link") +struct sched_ext_ops peek_dsq_ops =3D { + .enqueue =3D (void *)peek_dsq_enqueue, + .dispatch =3D (void *)peek_dsq_dispatch, + .init =3D (void *)peek_dsq_init, + .exit =3D (void *)peek_dsq_exit, + .name =3D "peek_dsq", +}; diff --git a/tools/testing/selftests/sched_ext/peek_dsq.c b/tools/testing/s= elftests/sched_ext/peek_dsq.c new file mode 100644 index 000000000000..ba9e03c2bd49 --- /dev/null +++ b/tools/testing/selftests/sched_ext/peek_dsq.c @@ -0,0 +1,188 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Test for DSQ operations including create, destroy, and peek operations. + * + * Copyright (c) 2025 Meta Platforms, Inc. and affiliates. + * Copyright (c) 2025 Ryan Newton + */ +#include +#include +#include +#include +#include +#include +#include +#include "peek_dsq.bpf.skel.h" +#include "scx_test.h" + +static bool workload_running =3D true; +static pthread_t workload_thread; + +/** + * Background workload thread that sleeps and wakes rapidly to exercise + * the scheduler's enqueue operations and ensure DSQ operations get tested. + */ +static void *workload_thread_fn(void *arg) +{ + while (workload_running) { + /* Sleep for a very short time to trigger scheduler activity */ + usleep(1000); /* 1ms sleep */ + /* Yield to ensure we go through the scheduler */ + sched_yield(); + } + return NULL; +} + +static enum scx_test_status setup(void **ctx) +{ + struct peek_dsq *skel; + + skel =3D peek_dsq__open(); + SCX_FAIL_IF(!skel, "Failed to open"); + SCX_ENUM_INIT(skel); + SCX_FAIL_IF(peek_dsq__load(skel), "Failed to load skel"); + + *ctx =3D skel; + + return SCX_TEST_PASS; +} + +static enum scx_test_status run(void *ctx) +{ + struct peek_dsq *skel =3D ctx; + bool failed =3D false; + int seconds =3D 2; + int err; + + /* Enable the scheduler to test DSQ operations */ + printf("Enabling scheduler to test DSQ insert operations...\n"); + + struct bpf_link *link =3D + bpf_map__attach_struct_ops(skel->maps.peek_dsq_ops); + + if (!link) { + SCX_ERR("Failed to attach struct_ops"); + return SCX_TEST_FAIL; + } + + /* Start background workload thread to exercise the scheduler */ + printf("Starting background workload thread...\n"); + workload_running =3D true; + err =3D pthread_create(&workload_thread, NULL, workload_thread_fn, NULL); + if (err) { + SCX_ERR("Failed to create workload thread: %s", strerror(err)); + bpf_link__destroy(link); + return SCX_TEST_FAIL; + } + + printf("Waiting for enqueue events.\n"); + sleep(2); + while (skel->data->enqueue_count <=3D 0) { + printf("."); + fflush(stdout); + sleep(1); + seconds++; + if (seconds >=3D 30) { + printf("\n=E2=9C=97 Timeout waiting for enqueue events\n"); + /* Stop workload thread and cleanup */ + workload_running =3D false; + pthread_join(workload_thread, NULL); + bpf_link__destroy(link); + return SCX_TEST_FAIL; + } + } + + workload_running =3D false; + err =3D pthread_join(workload_thread, NULL); + if (err) { + SCX_ERR("Failed to join workload thread: %s", strerror(err)); + bpf_link__destroy(link); + return SCX_TEST_FAIL; + } + printf("Background workload thread stopped.\n"); + + /* Detach the scheduler */ + bpf_link__destroy(link); + + /* Check if DSQ creation succeeded */ + if (skel->data->dsq_create_result !=3D 1) { + printf("=E2=9C=97 DSQ create failed: got %d, expected 1\n", + skel->data->dsq_create_result); + failed =3D true; + } else { + printf("=E2=9C=93 DSQ create succeeded\n"); + } + + printf("Enqueue/dispatch count over %d seconds: %d / %d\n", seconds, + skel->data->enqueue_count, skel->data->dispatch_count); + printf("Debug: ksym_exists=3D%d\n", + skel->data->debug_ksym_exists); + + /* Check DSQ insert result */ + printf("DSQ insert test done on cpu: %d\n", skel->data->insert_test_cpu); + if (skel->data->insert_test_cpu !=3D -1) + printf("=E2=9C=93 DSQ insert succeeded !\n"); + else { + printf("=E2=9C=97 DSQ insert failed or not attempted\n"); + failed =3D true; + } + + /* Check DSQ peek results */ + printf(" DSQ peek result 1 (before insert): %d\n", + skel->data->dsq_peek_result1); + if (skel->data->dsq_peek_result1 =3D=3D 0) + printf("=E2=9C=93 DSQ peek verification succeeded - peek returned NULL!\= n"); + else { + printf("=E2=9C=97 DSQ peek verification failed\n"); + failed =3D true; + } + + printf(" DSQ peek result 2 (after insert): %ld\n", + skel->data->dsq_peek_result2); + printf(" DSQ peek result 2, expected: %ld\n", + skel->data->dsq_peek_result2_expected); + if (skel->data->dsq_peek_result2 =3D=3D + skel->data->dsq_peek_result2_expected) + printf("=E2=9C=93 DSQ peek verification succeeded - peek returned the in= serted task!\n"); + else { + printf("=E2=9C=97 DSQ peek verification failed\n"); + failed =3D true; + } + + printf(" Inserted test task -> pid: %ld\n", skel->data->dsq_inserted_pid= ); + printf(" DSQ peek result 2 -> pid: %ld\n", skel->data->dsq_peek_result2_= pid); + + if (skel->data->dsq_destroy_result !=3D 1) { + printf("=E2=9C=97 DSQ destroy failed: got %d, expected 1\n", + skel->data->dsq_destroy_result); + failed =3D true; + } + + if (failed) + return SCX_TEST_FAIL; + else + return SCX_TEST_PASS; +} + +static void cleanup(void *ctx) +{ + struct peek_dsq *skel =3D ctx; + + /* Ensure workload thread is stopped */ + if (workload_running) { + workload_running =3D false; + pthread_join(workload_thread, NULL); + } + + peek_dsq__destroy(skel); +} + +struct scx_test peek_dsq =3D { + .name =3D "peek_dsq", + .description =3D + "Test DSQ create/destroy operations and future peek functionality", + .setup =3D setup, + .run =3D run, + .cleanup =3D cleanup, +}; +REGISTER_SCX_TEST(&peek_dsq) --=20 2.51.0