[PATCH v3 0/2] sched/numa: Skip VMA scanning on memory pinned

Libo Chen posted 2 patches 9 months, 3 weeks ago
include/trace/events/sched.h | 30 ++++++++++++++++++++++++++++++
kernel/sched/fair.c          |  9 +++++++++
2 files changed, 39 insertions(+)
[PATCH v3 0/2] sched/numa: Skip VMA scanning on memory pinned
Posted by Libo Chen 9 months, 3 weeks ago
v1->v2:
1. add perf improvement numbers in commit log. Yet to find a perf diff on
will-it-scale, so not included here. Plan to run more workloads.
2. add tracepoint.
3. Re peterz's comment: this will make it impossible to attract tasks to
that memory, just like the other VMA skips. This is the current
implementation; I think we can improve that in the future, but at the
moment it's probably better to keep it consistent.

v2->v3:
1. add a cpusets_enabled() check based on Mel's suggestion, but again I
think it's redundant
2. print out nodemask with %*p.. format in the tracepoint
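
For readers unfamiliar with the bitmap format specifiers: the kernel's printk supports `%*pb` (hex bitmap) and `%*pbl` (range list) for bitmaps such as nodemasks. The cover letter does not say which variant the tracepoint uses. The userspace sketch below only illustrates what range-list style output looks like; `format_node_list()` and its 64-bit mask are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the set bits of a 64-bit mask as a
 * comma-separated range list, e.g. bits 0,1,2,5 -> "0-2,5".
 * Returns the string length, or -1 if buf is too small.
 */
int format_node_list(uint64_t mask, char *buf, size_t size)
{
    size_t len = 0;
    int bit = 0;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';

    while (bit < 64) {
        int start, end, n;

        if (!(mask & (UINT64_C(1) << bit))) {
            bit++;
            continue;
        }
        /* Start of a run: extend it while the next bit is also set. */
        start = end = bit;
        while (end + 1 < 64 && (mask & (UINT64_C(1) << (end + 1))))
            end++;

        if (start == end)
            n = snprintf(buf + len, size - len, "%s%d",
                         len ? "," : "", start);
        else
            n = snprintf(buf + len, size - len, "%s%d-%d",
                         len ? "," : "", start, end);
        if (n < 0 || (size_t)n >= size - len)
            return -1;  /* output would not fit */
        len += (size_t)n;
        bit = end + 1;
    }
    return (int)len;
}

/* Usage example: nodes 0-2 and 5 set in the mask. */
void format_node_list_example(void)
{
    char buf[32];
    int n = format_node_list(UINT64_C(0x27), buf, sizeof(buf));

    assert(n == 5);
    assert(strcmp(buf, "0-2,5") == 0);
}
```

A range list stays short on machines with many nodes, which is one reason it may be preferable to a raw hex mask in trace output.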

Libo Chen (2):
  sched/numa: Skip VMA scanning on memory pinned to one NUMA node via
    cpuset.mems
  sched/numa: Add tracepoint that tracks the skipping of numa balancing
    due to cpuset memory pinning

 include/trace/events/sched.h | 30 ++++++++++++++++++++++++++++++
 kernel/sched/fair.c          |  9 +++++++++
 2 files changed, 39 insertions(+)

-- 
2.43.5
Re: [PATCH v3 0/2] sched/numa: Skip VMA scanning on memory pinned
Posted by Andrew Morton 9 months, 3 weeks ago
On Thu, 17 Apr 2025 12:15:41 -0700 Libo Chen <libo.chen@oracle.com> wrote:

> v1->v2:
> 1. add perf improvement numbers in commit log. Yet to find a perf diff on
> will-it-scale, so not included here. Plan to run more workloads.
> 2. add tracepoint.
> 3. Re peterz's comment: this will make it impossible to attract tasks to
> that memory, just like the other VMA skips. This is the current
> implementation; I think we can improve that in the future, but at the
> moment it's probably better to keep it consistent.
> 
> v2->v3:
> 1. add a cpusets_enabled() check based on Mel's suggestion, but again I
> think it's redundant
> 2. print out nodemask with %*p.. format in the tracepoint

I do agree with Mel - bitmap_weight() is somewhat expensive and
cpusets_enabled() is super fast.  So the benefit to
cpusets_enabled()=false kernels will exceed the cost to
cpusets_enabled()=true kernels.
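
The ordering argument above can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the patch itself: `nodemask_sketch_t`, `mask_weight()`, `skip_scan_pinned()` and the `weight_calls` counter are hypothetical stand-ins for the kernel's nodemask, bitmap_weight() and the cpusets_enabled() test. The point is only that `&&` short-circuits, so the bit count never runs when the cheap check is false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a nodemask: one bit per node, up to 64 nodes. */
typedef uint64_t nodemask_sketch_t;

/* Counts how often the "expensive" bit count ran (illustration only). */
static unsigned long weight_calls;

/* Stand-in for bitmap_weight(): number of set bits in the mask. */
static unsigned int mask_weight(nodemask_sketch_t mask)
{
    unsigned int n = 0;

    weight_calls++;
    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Skip scanning only when cpusets are in use AND memory is pinned to
 * exactly one node. The cheap flag is tested first, so mask_weight()
 * is not evaluated at all when cpusets_on is false.
 */
static bool skip_scan_pinned(bool cpusets_on, nodemask_sketch_t mems_allowed)
{
    return cpusets_on && mask_weight(mems_allowed) == 1;
}

/* Usage example: shows when the bit count is and is not evaluated. */
void skip_scan_example(void)
{
    bool skip;

    weight_calls = 0;

    skip = skip_scan_pinned(false, 0x1);    /* cpusets off: no bit count */
    assert(!skip && weight_calls == 0);

    skip = skip_scan_pinned(true, 0x4);     /* pinned to node 2 only */
    assert(skip && weight_calls == 1);

    skip = skip_scan_pinned(true, 0x6);     /* nodes 1 and 2: keep scanning */
    assert(!skip && weight_calls == 2);
}
```

With the flag tested first, systems that do not use cpusets pay only for one boolean test, while systems that do use them pay that test on top of the bit count.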

This isn't traditionally mm.git material, but it's close.  I'll grab
the patchset for some testing, and shall drop it again if it turns up
via another tree.
Re: [PATCH v3 0/2] sched/numa: Skip VMA scanning on memory pinned
Posted by Libo Chen 9 months, 3 weeks ago

On 4/17/25 13:12, Andrew Morton wrote:
> On Thu, 17 Apr 2025 12:15:41 -0700 Libo Chen <libo.chen@oracle.com> wrote:
> 
>> v1->v2:
>> 1. add perf improvement numbers in commit log. Yet to find a perf diff on
>> will-it-scale, so not included here. Plan to run more workloads.
>> 2. add tracepoint.
>> 3. Re peterz's comment: this will make it impossible to attract tasks to
>> that memory, just like the other VMA skips. This is the current
>> implementation; I think we can improve that in the future, but at the
>> moment it's probably better to keep it consistent.
>>
>> v2->v3:
>> 1. add a cpusets_enabled() check based on Mel's suggestion, but again I
>> think it's redundant
>> 2. print out nodemask with %*p.. format in the tracepoint
> 
> I do agree with Mel - bitmap_weight() is somewhat expensive and
> cpusets_enabled() is super fast.  So the benefit to
> cpusets_enabled()=false kernels will exceed the cost to
> cpusets_enabled()=true kernels.
> 
Ah yes, that's right. Thanks for grabbing it~

Libo

> This isn't traditionally mm.git material, but it's close.  I'll grab
> the patchset for some testing, and shall drop it again if it turns up
> via another tree.
>