[PATCH v3 0/2] sched_ext: lockless peek operation for DSQs

Ryan Newton posted 2 patches 2 months ago
There is a newer version of this series
[PATCH v3 0/2] sched_ext: lockless peek operation for DSQs
Posted by Ryan Newton 2 months ago
This series gives sched_ext schedulers an inexpensive operation to peek
at the first element of a dispatch queue (DSQ) without creating an
iterator or acquiring the lock on that queue.
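
To illustrate the general idea only (this is not the code in the patch), here
is a minimal userspace sketch of a lockless peek. It assumes writers update
the queue under a lock and republish the head through an atomic pointer that
readers load without taking the lock. All toy_* names are hypothetical, and
the sketch ignores object lifetime, which a kernel implementation would have
to handle (for example with RCU).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for a task and a queue; not the kernel structures. */
struct toy_task {
	unsigned long long vtime;
	struct toy_task *next;
};

struct toy_dsq {
	atomic_flag lock;                  /* tiny spinlock, taken by writers only */
	struct toy_task *head;             /* vtime-ordered list, protected by lock */
	_Atomic(struct toy_task *) first;  /* published copy of head for lockless readers */
};

static void toy_dsq_init(struct toy_dsq *q)
{
	atomic_flag_clear(&q->lock);
	q->head = NULL;
	atomic_init(&q->first, NULL);
}

static void toy_dsq_lock(struct toy_dsq *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void toy_dsq_unlock(struct toy_dsq *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Insert in ascending vtime order under the lock, then republish the head. */
static void toy_dsq_insert(struct toy_dsq *q, struct toy_task *t)
{
	struct toy_task **pos;

	assert(t != NULL);
	toy_dsq_lock(q);
	pos = &q->head;
	while (*pos && (*pos)->vtime <= t->vtime)
		pos = &(*pos)->next;
	t->next = *pos;
	*pos = t;
	atomic_store_explicit(&q->first, q->head, memory_order_release);
	toy_dsq_unlock(q);
}

/* Remove and return the head under the lock, then republish the new head. */
static struct toy_task *toy_dsq_pop(struct toy_dsq *q)
{
	struct toy_task *t;

	toy_dsq_lock(q);
	t = q->head;
	if (t) {
		q->head = t->next;
		t->next = NULL;
	}
	atomic_store_explicit(&q->first, q->head, memory_order_release);
	toy_dsq_unlock(q);
	return t;
}

/* Lockless peek: a single acquire load, no lock and no iterator. */
static struct toy_task *toy_dsq_peek(struct toy_dsq *q)
{
	return atomic_load_explicit(&q->first, memory_order_acquire);
}
```

The peeked pointer is only a snapshot: by the time the caller acts on it,
another CPU may already have consumed that task, so the result is best
treated as a hint.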

Manual testing so far has used a modified version of the example qmap
scheduler and a modified LAVD (from the SCX repo), both of which
exercise peek. The attached test passes >1000 stress-test runs, both in
concurrent VMs and sequentially on the host kernel. So far it has been
tested on the following workstation and server processors:
- AMD Ryzen Threadripper PRO 7975WX 32-Cores
- AMD EPYC 9D64 88-Core Processor

Initial experiments indicate a substantial speedup (on schbench) when
running an SCX scheduler with per-cpu DSQs and peeking each queue to
retrieve the task with the minimum vruntime across all the CPUs.
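
As a rough sketch of that usage pattern (again plain userspace C, not the
BPF scheduler code used in the experiments), the loop below peeks one
published head pointer per CPU and picks the CPU whose head task has the
smallest vruntime. The toy_* types, the vtime_before() helper and the
per-CPU array layout are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task: only the field needed for the comparison. */
struct toy_task {
	uint64_t vtime;
};

/* Hypothetical per-CPU queue: just the published head pointer. */
struct toy_cpu_dsq {
	_Atomic(struct toy_task *) first;
};

/* Wraparound-safe "a is earlier than b" comparison on 64-bit vtimes. */
static int vtime_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/*
 * Peek every per-CPU queue without locking and return the index of the
 * CPU whose head task has the minimum vtime, or -1 if all are empty.
 */
static int toy_pick_min_vtime_cpu(struct toy_cpu_dsq *dsqs, int nr_cpus)
{
	int best = -1;
	uint64_t best_vtime = 0;

	assert(dsqs != NULL);
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		struct toy_task *t =
			atomic_load_explicit(&dsqs[cpu].first, memory_order_acquire);

		if (!t)
			continue; /* empty queue, nothing to compare */
		if (best < 0 || vtime_before(t->vtime, best_vtime)) {
			best = cpu;
			best_vtime = t->vtime;
		}
	}
	return best;
}
```

Because each peek is a snapshot, a scheduler following this pattern would
still need to cope with the chosen queue's head having changed before it
actually consumes from that queue.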

---
Changes in v3:
 - inline helpers and simplify
 - coding style tweaks

Changes in v2:
 - make peek() only work for user DSQs and error otherwise
 - added a stress test component to the selftest that performs many peeks
 - responded to review comments from tj@kernel.org and arighi@nvidia.com 
 - link: https://lore.kernel.org/lkml/20251003195408.675527-1-rrnewton@gmail.com/
 
v1 link: https://lore.kernel.org/lkml/20251002025722.3420916-1-rrnewton@gmail.com/

Ryan Newton (2):
  sched_ext: Add lockless peek operation for DSQs
  sched_ext: Add a selftest for scx_bpf_dsq_peek

 include/linux/sched/ext.h                     |   1 +
 kernel/sched/ext.c                            |  56 +++-
 tools/sched_ext/include/scx/common.bpf.h      |   1 +
 tools/sched_ext/include/scx/compat.bpf.h      |  19 ++
 tools/testing/selftests/sched_ext/Makefile    |   1 +
 .../selftests/sched_ext/peek_dsq.bpf.c        | 265 ++++++++++++++++++
 tools/testing/selftests/sched_ext/peek_dsq.c  | 230 +++++++++++++++
 7 files changed, 571 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/sched_ext/peek_dsq.bpf.c
 create mode 100644 tools/testing/selftests/sched_ext/peek_dsq.c

-- 
2.51.0
Re: [PATCH v3 0/2] sched_ext: lockless peek operation for DSQs
Posted by Christian Loehle 2 months ago
On 10/6/25 18:04, Ryan Newton wrote:
> This allows sched_ext schedulers an inexpensive operation to peek
> at the first element in a queue (DSQ), without creating an iterator 
> and acquiring the lock on that queue.
> 
> Note that manual testing has thus far included a modified version of the
> example qmap scheduler that exercises peek, as well as a modified
> LAVD (from the SCX repo) that exercises peek. The attached test
> passes >1000 stress tests when run in concurrent VMs, and when run
> sequentially on the host kernel. Presently, tested on the below
> workstation and server processors.
> - AMD Ryzen Threadripper PRO 7975WX 32-Cores
> - AMD EPYC 9D64 88-Core Processor

Are the adapted qmap and lavd available somewhere?

> 
> Initial experiments indicate a substantial speedup (on schbench) when
> running an SCX scheduler with per-cpu DSQs and peeking each queue to
> retrieve the task with the minimum vruntime across all the CPUs.
> 
> ---
> Changes in v3:
>  - inline helpers and simplify
>  - coding style tweaks
> 
> Changes in v2:
>  - make peek() only work for user DSQs and error otherwise
>  - added a stress test component to the selftest that performs many peeks
>  - responded to review comments from tj@kernel.org and arighi@nvidia.com 
>  - link: https://lore.kernel.org/lkml/20251003195408.675527-1-rrnewton@gmail.com/
>  
> v1 link: https://lore.kernel.org/lkml/20251002025722.3420916-1-rrnewton@gmail.com/
> 
> Ryan Newton (2):
>   sched_ext: Add lockless peek operation for DSQs
>   sched_ext: Add a selftest for scx_bpf_dsq_peek
> 
>  include/linux/sched/ext.h                     |   1 +
>  kernel/sched/ext.c                            |  56 +++-
>  tools/sched_ext/include/scx/common.bpf.h      |   1 +
>  tools/sched_ext/include/scx/compat.bpf.h      |  19 ++
>  tools/testing/selftests/sched_ext/Makefile    |   1 +
>  .../selftests/sched_ext/peek_dsq.bpf.c        | 265 ++++++++++++++++++
>  tools/testing/selftests/sched_ext/peek_dsq.c  | 230 +++++++++++++++
>  7 files changed, 571 insertions(+), 2 deletions(-)
>  create mode 100644 tools/testing/selftests/sched_ext/peek_dsq.bpf.c
>  create mode 100644 tools/testing/selftests/sched_ext/peek_dsq.c
>
Re: [PATCH v3 0/2] sched_ext: lockless peek operation for DSQs
Posted by Ryan Newton 2 months ago
Hello Christian,

Sure, the lavd with peek is here:
https://github.com/sched-ext/scx/pull/2675

Beerland with peek is here:
https://github.com/sched-ext/scx/commit/c2a0f185051c06cc1ebae1dc40e5fe2bd3022c1e

The qmap one was not a meaningful change to the scheduler but just
extra peeks thrown in with debug prints.

Cheers,
  -Ryan

P.S. Good catch on the two newlines being split into the wrong diff.
Argh, I was playing with Sapling's `sl absorb` and I swear it said it
amended the earlier commit. Apologies for not catching it. FWIW here's
a tip with that correction:
  https://github.com/rrnewton/linux/commit/db426f852813e2b6deeae0869d20df1bea647a07




On Mon, Oct 6, 2025 at 1:20 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 10/6/25 18:04, Ryan Newton wrote:
> > This allows sched_ext schedulers an inexpensive operation to peek
> > at the first element in a queue (DSQ), without creating an iterator
> > and acquiring the lock on that queue.
> >
> > Note that manual testing has thus far included a modified version of the
> > example qmap scheduler that exercises peek, as well as a modified
> > LAVD (from the SCX repo) that exercises peek. The attached test
> > passes >1000 stress tests when run in concurrent VMs, and when run
> > sequentially on the host kernel. Presently, tested on the below
> > workstation and server processors.
> > - AMD Ryzen Threadripper PRO 7975WX 32-Cores
> > - AMD EPYC 9D64 88-Core Processor
>
> Are the adapted qmap and lavd available somewhere?
>
> >
> > Initial experiments indicate a substantial speedup (on schbench) when
> > running an SCX scheduler with per-cpu DSQs and peeking each queue to
> > retrieve the task with the minimum vruntime across all the CPUs.
> >
> > ---
> > Changes in v3:
> >  - inline helpers and simplify
> >  - coding style tweaks
> >
> > Changes in v2:
> >  - make peek() only work for user DSQs and error otherwise
> >  - added a stress test component to the selftest that performs many peeks
> >  - responded to review comments from tj@kernel.org and arighi@nvidia.com
> >  - link: https://lore.kernel.org/lkml/20251003195408.675527-1-rrnewton@gmail.com/
> >
> > v1 link: https://lore.kernel.org/lkml/20251002025722.3420916-1-rrnewton@gmail.com/
> >
> > Ryan Newton (2):
> >   sched_ext: Add lockless peek operation for DSQs
> >   sched_ext: Add a selftest for scx_bpf_dsq_peek
> >
> >  include/linux/sched/ext.h                     |   1 +
> >  kernel/sched/ext.c                            |  56 +++-
> >  tools/sched_ext/include/scx/common.bpf.h      |   1 +
> >  tools/sched_ext/include/scx/compat.bpf.h      |  19 ++
> >  tools/testing/selftests/sched_ext/Makefile    |   1 +
> >  .../selftests/sched_ext/peek_dsq.bpf.c        | 265 ++++++++++++++++++
> >  tools/testing/selftests/sched_ext/peek_dsq.c  | 230 +++++++++++++++
> >  7 files changed, 571 insertions(+), 2 deletions(-)
> >  create mode 100644 tools/testing/selftests/sched_ext/peek_dsq.bpf.c
> >  create mode 100644 tools/testing/selftests/sched_ext/peek_dsq.c
> >
>