[RFC PATCH 0/5] sched: NUMA-aware concurrency IDs

Mathieu Desnoyers posted 5 patches 1 year, 5 months ago
There is a newer version of this series
Posted by Mathieu Desnoyers 1 year, 5 months ago
The issue addressed by this series is the non-locality of NUMA accesses
to data structures indexed by concurrency IDs: for example, in a
scenario where a process has two threads, and they periodically run one
after the other on different NUMA nodes, each will be assigned mm_cid=0.
As a consequence, they will end up accessing the same pages, and thus at
least one of the threads will need to perform remote NUMA accesses,
which is inefficient.
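
To make the scenario above concrete, here is a minimal user-space sketch
of a data structure indexed by concurrency ID. It is an illustration
only, not code from this series: `struct percid_counter`, `percid_inc()`
and `NR_SLOTS` are hypothetical names, and the concurrency ID is passed
in by the caller rather than read from the rseq area.

```c
#define NR_SLOTS 4	/* hypothetical upper bound on concurrency IDs */

/* One slot per concurrency ID; hypothetical layout for illustration. */
struct percid_counter {
	long count;
};

/*
 * Increment the slot selected by the caller-supplied concurrency ID.
 * Returns 0 on success, -1 if the ID is out of range.
 */
static int percid_inc(struct percid_counter *slots, unsigned int cid)
{
	if (cid >= NR_SLOTS)
		return -1;
	slots[cid].count++;
	return 0;
}
```

If two threads running on different NUMA nodes are both handed cid 0,
both calls land on `slots[0]`, so the memory backing that slot is remote
for at least one of them.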

Solve this by making the rseq concurrency ID (mm_cid) NUMA-aware. On
NUMA systems, when a NUMA-aware concurrency ID is observed by user-space
to be associated with a NUMA node, guarantee that it never changes NUMA
node unless either a kernel-level NUMA configuration change happens, or
scheduler migrations end up migrating tasks across NUMA nodes.
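
The guarantee above can be sketched as a user-space check: remember the
first NUMA node each concurrency ID was observed on, and flag any later
observation on a different node. This is a sketch under assumptions, not
the selftest from this series: `struct cid_node_map`, its helpers and
`MAX_CIDS` are hypothetical, and the (cid, node) pairs are supplied by
the caller. A flagged mismatch is only a violation if neither of the two
exceptions listed above occurred.

```c
#define MAX_CIDS	64	/* hypothetical bound for the sketch */
#define NODE_UNSET	(-1)

/* First NUMA node observed for each concurrency ID. */
struct cid_node_map {
	int node[MAX_CIDS];
};

static void cid_node_map_init(struct cid_node_map *m)
{
	for (unsigned int i = 0; i < MAX_CIDS; i++)
		m->node[i] = NODE_UNSET;
}

/*
 * Record one (cid, node) observation.
 * Returns 0 if it is consistent with earlier observations, -1 if the
 * cid is out of range or was previously seen on a different node.
 */
static int cid_node_map_observe(struct cid_node_map *m,
				unsigned int cid, int node)
{
	if (cid >= MAX_CIDS)
		return -1;
	if (m->node[cid] == NODE_UNSET) {
		m->node[cid] = node;
		return 0;
	}
	return m->node[cid] == node ? 0 : -1;
}
```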

There is a tradeoff between NUMA locality and compactness of the
concurrency ID allocation. Favor compactness over NUMA locality when
the scheduler migrates tasks across NUMA nodes, as this does not cause
frequent remote NUMA accesses. This is done by limiting the
concurrency ID range to the minimum of the number of threads belonging
to the process and the number of allowed CPUs.
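
As a rough illustration of the bound described above, the limit on the
concurrency ID range is the smaller of two counts. The helper below is a
hypothetical user-space sketch, not the kernel code from this series;
`cid_range_limit()` and its parameter names are assumptions made for the
example.

```c
/*
 * Hypothetical helper, not taken from the series: upper bound on the
 * concurrency ID range of a process, so that IDs fall in [0, limit).
 */
static unsigned int cid_range_limit(unsigned int nr_threads,
				    unsigned int nr_allowed_cpus)
{
	return nr_threads < nr_allowed_cpus ? nr_threads : nr_allowed_cpus;
}
```

For instance, a process with 2 threads allowed on 64 CPUs gets a limit
of 2, while a process with 128 threads restricted to 8 CPUs gets a limit
of 8.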

This series applies on top of v6.10.3.

Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Shuah Khan <skhan@linuxfoundation.org>

Mathieu Desnoyers (5):
  lib: Implement find_{first,next,nth}_notandnot_bit,
    find_first_andnot_bit
  cpumask: Implement cpumask_{first,next}_{not,}andnot
  sched: NUMA-aware per-memory-map concurrency IDs
  selftests/rseq: x86: Implement rseq_load_u32_u32
  selftests/rseq: Implement NUMA node id vs mm_cid invariant test

 include/linux/cpumask.h                       |  60 ++++++++
 include/linux/find.h                          | 122 ++++++++++++++-
 include/linux/mm_types.h                      |  57 ++++++-
 kernel/sched/core.c                           |  10 +-
 kernel/sched/sched.h                          | 139 +++++++++++++++--
 lib/find_bit.c                                |  42 +++++
 tools/testing/selftests/rseq/.gitignore       |   1 +
 tools/testing/selftests/rseq/Makefile         |   2 +-
 .../testing/selftests/rseq/basic_numa_test.c  | 144 ++++++++++++++++++
 tools/testing/selftests/rseq/rseq-x86-bits.h  |  43 ++++++
 tools/testing/selftests/rseq/rseq.h           |  14 ++
 11 files changed, 613 insertions(+), 21 deletions(-)
 create mode 100644 tools/testing/selftests/rseq/basic_numa_test.c

-- 
2.39.2
Re: [RFC PATCH 0/5] sched: NUMA-aware concurrency IDs
Posted by Shuah Khan 1 year, 5 months ago
On 8/19/24 08:24, Mathieu Desnoyers wrote:
> The issue addressed by this series is the non-locality of NUMA accesses
> to data structures indexed by concurrency IDs: for example, in a
> scenario where a process has two threads, and they periodically run one
> after the other on different NUMA nodes, each will be assigned mm_cid=0.
> As a consequence, they will end up accessing the same pages, and thus at
> least one of the threads will need to perform remote NUMA accesses,
> which is inefficient.
> 
> Solve this by making the rseq concurrency ID (mm_cid) NUMA-aware. On
> NUMA systems, when a NUMA-aware concurrency ID is observed by user-space
> to be associated with a NUMA node, guarantee that it never changes NUMA
> node unless either a kernel-level NUMA configuration change happens, or
> scheduler migrations end up migrating tasks across NUMA nodes.
> 
> There is a tradeoff between NUMA locality and compactness of the
> concurrency ID allocation. Favor compactness over NUMA locality when
> the scheduler migrates tasks across NUMA nodes, as this does not cause
> frequent remote NUMA accesses. This is done by limiting the
> concurrency ID range to the minimum of the number of threads belonging
> to the process and the number of allowed CPUs.
> 
> This series applies on top of v6.10.3.
> 
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Yury Norov <yury.norov@gmail.com>
> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> Cc: Shuah Khan <skhan@linuxfoundation.org>
> 
> Mathieu Desnoyers (5):
>    lib: Implement find_{first,next,nth}_notandnot_bit,
>      find_first_andnot_bit
>    cpumask: Implement cpumask_{first,next}_{not,}andnot
>    sched: NUMA-aware per-memory-map concurrency IDs
>    selftests/rseq: x86: Implement rseq_load_u32_u32
>    selftests/rseq: Implement NUMA node id vs mm_cid invariant test
> 
>   include/linux/cpumask.h                       |  60 ++++++++
>   include/linux/find.h                          | 122 ++++++++++++++-
>   include/linux/mm_types.h                      |  57 ++++++-
>   kernel/sched/core.c                           |  10 +-
>   kernel/sched/sched.h                          | 139 +++++++++++++++--
>   lib/find_bit.c                                |  42 +++++
>   tools/testing/selftests/rseq/.gitignore       |   1 +
>   tools/testing/selftests/rseq/Makefile         |   2 +-
>   .../testing/selftests/rseq/basic_numa_test.c  | 144 ++++++++++++++++++
>   tools/testing/selftests/rseq/rseq-x86-bits.h  |  43 ++++++
>   tools/testing/selftests/rseq/rseq.h           |  14 ++
>   11 files changed, 613 insertions(+), 21 deletions(-)
>   create mode 100644 tools/testing/selftests/rseq/basic_numa_test.c
> 

Looks good to me - for selftests:

Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>

thanks,
-- Shuah