[PATCH v4 0/2] sched/numa: add statistics of numa balance task migration

Chen Yu posted 2 patches 7 months, 1 week ago
There is a newer version of this series
[PATCH v4 0/2] sched/numa: add statistics of numa balance task migration
Posted by Chen Yu 7 months, 1 week ago
This series introduces NUMA balancing task migration and swap statistics
in the following places:
/sys/fs/cgroup/{GROUP}/memory.stat
/proc/{PID}/sched
/proc/vmstat

These statistics facilitate a rapid evaluation of the performance and resource
utilization of the target workload.
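As a rough illustration of how a tool might consume these counters: /proc/vmstat and memory.stat list one "key value" pair per line, while /proc/{PID}/sched uses "key : value". The sketch below is a hypothetical user-space helper (stat_lookup() is not part of this series). The counter names used with it, such as numa_task_migrated and numa_task_swapped, are assumptions; patch 2/2 defines the exact names.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: look up an unsigned counter in a text buffer with
 * one entry per line, either "key value" (as in /proc/vmstat and
 * memory.stat) or "key : value" (as in /proc/<pid>/sched).
 *
 * Returns 0 and stores the value in *out if @key is found with a numeric
 * value, or -1 if the key is absent, has no number, or overflows.
 */
static int stat_lookup(const char *buf, const char *key,
		       unsigned long long *out)
{
	const char *line = buf;
	size_t klen;

	assert(buf != NULL && key != NULL && out != NULL);
	klen = strlen(key);

	while (*line != '\0') {
		const char *eol = strchr(line, '\n');
		const char *next = eol ? eol + 1 : line + strlen(line);

		/* The key must be a whole token at the start of the line. */
		if (strncmp(line, key, klen) == 0 &&
		    (line[klen] == ' ' || line[klen] == '\t' ||
		     line[klen] == ':')) {
			const char *p = line + klen;
			unsigned long long v;

			/* Skip the separator: blanks and an optional ':'. */
			while (*p == ' ' || *p == '\t' || *p == ':')
				p++;
			if (!isdigit((unsigned char)*p))
				return -1;

			errno = 0;
			v = strtoull(p, NULL, 10);
			if (errno == ERANGE)
				return -1;

			*out = v;
			return 0;
		}
		line = next;
	}
	return -1;
}
```

A caller would read the whole file into a buffer (for example with fread()) and then call stat_lookup(buf, "numa_task_swapped", &val). Parsing a buffer rather than the file directly keeps the helper easy to test.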

Patch 1 is a fix from Libo that stops kernel threads from being chosen
for task swapping, because NUMA balancing only handles user pages reached
via VMAs.
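The selection rule that patch 1 adds can be modelled in user space as follows. This is a simplified, hypothetical sketch (struct task_model, numa_swap_eligible() and pick_swap_candidate() are made-up names), not the code in task_numa_compare(). It only captures the rule that a task without an mm_struct, i.e. a kernel thread, is never chosen as the swap partner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for mm_struct and task_struct. */
struct mm_model {
	int nr_vmas;                /* user mappings NUMA balancing could scan */
};

struct task_model {
	int pid;
	struct mm_model *mm;        /* NULL for kernel threads */
};

/* A task may be swap partner B only if it has user memory, i.e. an mm. */
static bool numa_swap_eligible(const struct task_model *t)
{
	assert(t != NULL);
	return t->mm != NULL;
}

/*
 * Task A's preferred node has no idle CPU, so look for a task B running
 * there to swap with. Return the index of the first eligible task, or -1.
 * The kernel scores every candidate in task_numa_compare(); taking the
 * first eligible one is a simplification made for this sketch.
 */
static int pick_swap_candidate(const struct task_model *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (numa_swap_eligible(&tasks[i]))
			return (int)i;
	}
	return -1;
}
```

Filtering on the mm pointer alone keeps the rule simple; the commit message of patch 1 notes that a finer check (e.g. on numa_preferred_nid and numa_faults) could be added later.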

Patch 2 is the main change: it exposes the task migration and swap
statistics in the files listed above.

Patches 1 and 2 are combined into one series because patch 2 depends on
patch 1: without it, patch 2 could access the NULL mm_struct of a kernel
thread and trigger a NULL pointer dereference.
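To see why patch 2 depends on patch 1, the sketch below models the three accounting levels (system-wide, per task, per cgroup) with hypothetical structures and names. It assumes, as the paragraph above states, that the per-cgroup update goes through the task's mm_struct. A kernel thread has none, so the assert() marks the precondition that patch 1 is meant to guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's real structures. */
struct memcg_model {
	unsigned long long numa_task_migrated;  /* models a memory.stat entry */
};

struct mm_model {
	struct memcg_model *memcg;  /* cgroup charged for this address space */
};

struct task_model {
	struct mm_model *mm;                    /* NULL for kernel threads */
	unsigned long long numa_task_migrated;  /* models a /proc/<pid>/sched entry */
};

/* Models a system-wide /proc/vmstat event counter. */
static unsigned long long vm_numa_task_migrated;

/*
 * Account one NUMA-balancing migration of @p at all three levels.
 * The per-cgroup counter is reached through p->mm, so @p must not be a
 * kernel thread; in this model that is a precondition of the caller.
 */
static void account_numa_task_migrated(struct task_model *p)
{
	assert(p != NULL && p->mm != NULL);

	vm_numa_task_migrated++;
	p->numa_task_migrated++;
	if (p->mm->memcg != NULL)  /* no memory cgroup attached: skip */
		p->mm->memcg->numa_task_migrated++;
}
```

Without the precondition, passing a kernel thread would dereference p->mm == NULL, which is the class of crash the series avoids by keeping kernel threads out of the swap path.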

The Tested-by and Acked-by tags are preserved because they were given on
version 1, which already had the p->mm check.

Previous version:
v3:
https://lore.kernel.org/lkml/20250430103623.3349842-1-yu.c.chen@intel.com/
v2:
https://lore.kernel.org/lkml/20250408101444.192519-1-yu.c.chen@intel.com/
v1:
https://lore.kernel.org/lkml/20250402010611.3204674-1-yu.c.chen@intel.com/

Chen Yu (1):
  sched/numa: add statistics of numa balance task migration

Libo Chen (1):
  sched/numa: fix task swap by skipping kernel threads

 Documentation/admin-guide/cgroup-v2.rst | 6 ++++++
 include/linux/sched.h                   | 4 ++++
 include/linux/vm_event_item.h           | 2 ++
 kernel/sched/core.c                     | 9 +++++++--
 kernel/sched/debug.c                    | 4 ++++
 kernel/sched/fair.c                     | 3 ++-
 mm/memcontrol.c                         | 2 ++
 mm/vmstat.c                             | 2 ++
 8 files changed, 29 insertions(+), 3 deletions(-)

-- 
2.25.1
Re: [PATCH v4 0/2] sched/numa: add statistics of numa balance task migration
Posted by Venkat Rao Bagalkote 7 months, 1 week ago
On 07/05/25 4:44 pm, Chen Yu wrote:
> Patches 1 and 2 are combined into one series because patch 2 depends on
> patch 1: without it, patch 2 could access the NULL mm_struct of a kernel
> thread and trigger a NULL pointer dereference.

Hello Chenyu,


I tested this series on top of next-20250507, and it fixes the NULL
pointer exception on an IBM Power9 system. Hence,


Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>



Regards,

Venkat.
Re: [PATCH v4 0/2] sched/numa: add statistics of numa balance task migration
Posted by Venkat Rao Bagalkote 7 months, 1 week ago
Hello Chenyu,


On 07/05/25 4:44 pm, Chen Yu wrote:
> The Tested-by and Acked-by tags are preserved because they were given on
> version 1, which already had the p->mm check.

I see that the tags below from version 1 are missing. I think this
contradicts the line above. Please correct me if I am wrong.


Tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>

Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>


For some reason, I am not able to apply this series on top of
next-20250506. Patch 2/2 fails to apply. Please find the errors below.


Also, I see that the tags have changed, especially Tested-by.


Errors:


b4 am cover.1746611892.git.yu.c.chen@intel.com
Grabbing thread from lore.kernel.org/all/cover.1746611892.git.yu.c.chen@intel.com/t.mbox.gz
Analyzing 3 messages in the thread
Looking for additional code-review trailers on lore.kernel.org
Analyzing 0 code-review messages
Checking attestation on all messages, may take a moment...
---
   ✓ [PATCH v4 1/2] sched/numa: fix task swap by skipping kernel threads
   ✓ [PATCH v4 2/2] sched/numa: add statistics of numa balance task migration
   ---
   ✓ Signed: DKIM/intel.com
---
Total patches: 2
---
Cover: ./v4_20250507_yu_c_chen_sched_numa_add_statistics_of_numa_balance_task_migration.cover
  Link: https://lore.kernel.org/r/cover.1746611892.git.yu.c.chen@intel.com
  Base: not specified
        git am ./v4_20250507_yu_c_chen_sched_numa_add_statistics_of_numa_balance_task_migration.mbx

# git am -i v4_20250507_yu_c_chen_sched_numa_add_statistics_of_numa_balance_task_migration.mbx
Commit Body is:
--------------------------
sched/numa: fix task swap by skipping kernel threads

Task swapping is triggered when there are no idle CPUs in
task A's preferred node. In this case, the NUMA load balancer
chooses a task B on A's preferred node and swaps B with A. This
helps improve NUMA locality without introducing load imbalance
between nodes.

In the current implementation, B's NUMA node preference is not
mandatory, and it aims not to increase load imbalance. That is
to say, a kernel thread might be chosen as B. However, kernel
threads are not supposed to be covered by NUMA balancing because
NUMA balancing only considers user pages via VMAs.

Fix this by not considering kernel threads as swap targets in
task_numa_compare(). This can be extended beyond kernel threads
in the future by checking if a swap candidate has a valid NUMA
preference through checking the candidate's numa_preferred_nid
and numa_faults. For now, keep the code simple.

Suggested-by: Michal Koutny <mkoutny@suse.com>
Tested-by: Ayush Jain <Ayush.jain3@amd.com>
Signed-off-by: Libo Chen <libo.chen@oracle.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
--------------------------
Apply? [y]es/[n]o/[e]dit/[v]iew patch/[a]ccept all: a
Applying: sched/numa: fix task swap by skipping kernel threads
Applying: sched/numa: add statistics of numa balance task migration
error: patch failed: Documentation/admin-guide/cgroup-v2.rst:1670
error: Documentation/admin-guide/cgroup-v2.rst: patch does not apply
error: patch failed: include/linux/sched.h:549
error: include/linux/sched.h: patch does not apply
error: patch failed: include/linux/vm_event_item.h:66
error: include/linux/vm_event_item.h: patch does not apply
error: patch failed: kernel/sched/core.c:3352
error: kernel/sched/core.c: patch does not apply
error: patch failed: kernel/sched/debug.c:1206
error: kernel/sched/debug.c: patch does not apply
error: patch failed: mm/memcontrol.c:463
error: mm/memcontrol.c: patch does not apply
error: patch failed: mm/vmstat.c:1347
error: mm/vmstat.c: patch does not apply
Patch failed at 0002 sched/numa: add statistics of numa balance task migration



Am I missing anything? Please advise.


Regards,

Venkat.

Re: [PATCH v4 0/2] sched/numa: add statistics of numa balance task migration
Posted by Chen, Yu C 7 months, 1 week ago
Hi Venkat,

On 5/7/2025 10:32 PM, Venkat Rao Bagalkote wrote:
> On 07/05/25 4:44 pm, Chen Yu wrote:
>> The Tested-by and Acked-by tags are preserved because they were given
>> on version 1, which already had the p->mm check.
> 
> I see that the tags below from version 1 are missing. I think this
> contradicts the line above. Please correct me if I am wrong.
> 
> 
> Tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> 

These tags are in patch 2/2, because Madadi and Prateek mainly
tested patch 2/2.

> 
> For some reason, I am not able to apply this series on top of
> next-20250506. Patch 2/2 fails to apply. Please find the errors below.
> 

next-20250507 should be OK (I just checked on top of commit 08710e696081).
next-20250506 might still contain the old version of patch 2/2, which
next-20250507 has reverted.

thanks,
Chenyu
Re: [PATCH v4 0/2] sched/numa: add statistics of numa balance task migration
Posted by Venkat Rao Bagalkote 7 months, 1 week ago
On 07/05/25 8:22 pm, Chen, Yu C wrote:
> On 5/7/2025 10:32 PM, Venkat Rao Bagalkote wrote:
>> Tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
>> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
>>
>
> These tags are in patch 2/2, because Madadi and Prateek mainly
> tested patch 2/2.


Understood. Thanks for clarification.

> next-20250507 should be OK (I just checked on top of commit 08710e696081).
> next-20250506 might still contain the old version of patch 2/2, which
> next-20250507 has reverted.
>
With next-20250507, there is a build issue [1]. I will test this once the
build issue is fixed.

[1] https://lore.kernel.org/all/1bcc235f-b139-4423-a7bd-2dd16065e08c@linux.ibm.com/


Regards,

Venkat.
