[v2] memory tiering: Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled

[PATCH v2] memory tiering: Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled

Posted by Donet Tom 1 week, 4 days ago

In the current implementation, if NUMA_BALANCING_MEMORY_TIERING is
disabled and the pages are on the lower tier, the pages may still be
promoted.

This happens because task_numa_work() updates the last_cpupid field to
record the last access time only when NUMA_BALANCING_MEMORY_TIERING is
enabled and the folio is on the lower tier. If
NUMA_BALANCING_MEMORY_TIERING is disabled, the last_cpupid field
can retain a valid last CPU id.

In should_numa_migrate_memory(), the migration is rejected only when
NUMA_BALANCING_MEMORY_TIERING is disabled, the folio is on a lower
tier, and last_cpupid is invalid. Because last_cpupid can still be
valid when NUMA_BALANCING_MEMORY_TIERING is disabled, the condition
evaluates to false and the migration is allowed.
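A minimal user-space model of the check, before and after this patch, may make the difference easier to see. It is a sketch under stated assumptions, not kernel code: struct gate_input and the gate_rejects_*() helpers are hypothetical, and last_cpupid validity is reduced to a boolean instead of the packed cpu/pid value that cpupid_valid() decodes. Only the two mode bits are modelled, using the same values as NUMA_BALANCING_NORMAL (0x1) and NUMA_BALANCING_MEMORY_TIERING (0x2). Getting past this check does not mean the folio is migrated; the rest of should_numa_migrate_memory() still applies.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit values as the kernel's sysctl_numa_balancing_mode flags. */
#define NUMA_BALANCING_NORMAL         0x1
#define NUMA_BALANCING_MEMORY_TIERING 0x2

/* Hypothetical, simplified inputs to the tier check. */
struct gate_input {
	unsigned int mode;      /* sysctl_numa_balancing_mode */
	bool src_toptier;       /* node_is_toptier(src_nid) */
	bool last_cpupid_valid; /* cpupid_valid(last_cpupid) */
};

static inline void check_mode(unsigned int mode)
{
	/* Only the two mode bits above are modelled. */
	assert((mode & ~(NUMA_BALANCING_NORMAL |
			 NUMA_BALANCING_MEMORY_TIERING)) == 0);
}

/* Old condition; true means the check rejects the migration. */
static inline bool gate_rejects_before(const struct gate_input *in)
{
	check_mode(in->mode);
	return !(in->mode & NUMA_BALANCING_MEMORY_TIERING) &&
	       !in->src_toptier && !in->last_cpupid_valid;
}

/* New condition: last_cpupid no longer takes part in the decision. */
static inline bool gate_rejects_after(const struct gate_input *in)
{
	check_mode(in->mode);
	return !(in->mode & NUMA_BALANCING_MEMORY_TIERING) &&
	       !in->src_toptier;
}
```

For mode = NUMA_BALANCING_NORMAL, a lower-tier source node and a valid last_cpupid, gate_rejects_before() returns false while gate_rejects_after() returns true, which is the case this patch targets.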

This patch prevents promotion when NUMA_BALANCING_MEMORY_TIERING is
disabled and the folio is on the lower tier.

Behavior before this change:
============================
  - If NUMA_BALANCING_NORMAL is enabled, migration occurs between
    nodes within the same memory tier, and promotion from lower
    tier to higher tier may also happen.

  - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from
    lower tier to higher tier nodes is allowed.

Behavior after this change:
===========================
  - If NUMA_BALANCING_NORMAL is enabled, migration will occur only
    between nodes within the same memory tier.

  - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from lower
    tier to higher tier nodes will be allowed.

  - If both NUMA_BALANCING_MEMORY_TIERING and NUMA_BALANCING_NORMAL are
    enabled, both migration (same tier) and promotion (cross tier) are
    allowed.
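Read as a policy, the "after" cases treat the two mode bits as independent switches. The sketch below encodes only that stated intent; struct numa_policy and policy_after_patch() are hypothetical names, and the kernel's actual scanning and hint-fault paths are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit values as the kernel's sysctl_numa_balancing_mode flags. */
#define NUMA_BALANCING_NORMAL         0x1
#define NUMA_BALANCING_MEMORY_TIERING 0x2

/* Hypothetical summary of what a mode permits under the proposed semantics. */
struct numa_policy {
	bool same_tier_migration; /* e.g. DRAM node -> DRAM node */
	bool promotion;           /* lower tier -> higher tier */
};

static inline struct numa_policy policy_after_patch(unsigned int mode)
{
	struct numa_policy p;

	/* Only the two mode bits above are modelled. */
	assert((mode & ~(NUMA_BALANCING_NORMAL |
			 NUMA_BALANCING_MEMORY_TIERING)) == 0);
	p.same_tier_migration = mode & NUMA_BALANCING_NORMAL;
	p.promotion = mode & NUMA_BALANCING_MEMORY_TIERING;
	return p;
}
```

Mode 0 (NUMA balancing disabled) yields neither kind of migration, and mode 3 yields both, matching the third bullet above.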

Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
v1 -> v2
========
1. Dropped the changes to task_numa_fault(), since the existing code
   already handles NUMA_BALANCING_MEMORY_TIERING being disabled at runtime.

v1 -> https://lore.kernel.org/all/20260320092251.1290207-1-donettom@linux.ibm.com/
---
 kernel/sched/fair.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db905ed..4b43809a3fb1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2024,8 +2024,12 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
 	this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);
 	last_cpupid = folio_xchg_last_cpupid(folio, this_cpupid);
 
+	/*
+	 * Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled
+	 * and the pages are on the lower tier.
+	 */
 	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
-	    !node_is_toptier(src_nid) && !cpupid_valid(last_cpupid))
+	    !node_is_toptier(src_nid))
 		return false;
 
 	/*
-- 
2.47.1

Re: [PATCH v2] memory tiering: Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled

Posted by Huang, Ying 1 day, 15 hours ago

Donet Tom <donettom@linux.ibm.com> writes:

> In the current implementation, if NUMA_BALANCING_MEMORY_TIERING is
> disabled and the pages are on the lower tier, the pages may still be
> promoted.
>
> This happens because task_numa_work() updates the last_cpupid field to
> record the last access time only when NUMA_BALANCING_MEMORY_TIERING is
> enabled and the folio is on the lower tier. If
> NUMA_BALANCING_MEMORY_TIERING is disabled, the last_cpupid field
> can retain a valid last CPU id.
>
> In should_numa_migrate_memory(), the migration is rejected only when
> NUMA_BALANCING_MEMORY_TIERING is disabled, the folio is on a lower
> tier, and last_cpupid is invalid. Because last_cpupid can still be
> valid when NUMA_BALANCING_MEMORY_TIERING is disabled, the condition
> evaluates to false and the migration is allowed.
>
> This patch prevents promotion when NUMA_BALANCING_MEMORY_TIERING is
> disabled and the folio is on the lower tier.
>
> Behavior before this change:
> ============================
>   - If NUMA_BALANCING_NORMAL is enabled, migration occurs between
>     nodes within the same memory tier, and promotion from lower
>     tier to higher tier may also happen.
>
>   - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from
>     lower tier to higher tier nodes is allowed.
>
> Behavior after this change:
> ===========================
>   - If NUMA_BALANCING_NORMAL is enabled, migration will occur only
>     between nodes within the same memory tier.
>
>   - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from lower
>     tier to higher tier nodes will be allowed.
>
>   - If both NUMA_BALANCING_MEMORY_TIERING and NUMA_BALANCING_NORMAL are
>     enabled, both migration (same tier) and promotion (cross tier) are
>     allowed.
>
> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> ---
> v1 -> v2
> ========
> 1. Dropped the changes to task_numa_fault(), since the existing code
>    already handles NUMA_BALANCING_MEMORY_TIERING being disabled at runtime.
>
> v1 -> https://lore.kernel.org/all/20260320092251.1290207-1-donettom@linux.ibm.com/
> ---
>  kernel/sched/fair.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bf948db905ed..4b43809a3fb1 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2024,8 +2024,12 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
>  	this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);
>  	last_cpupid = folio_xchg_last_cpupid(folio, this_cpupid);
>  
> +	/*
> +	 * Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled
> +	 * and the pages are on the lower tier.
> +	 */
>  	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
> -	    !node_is_toptier(src_nid) && !cpupid_valid(last_cpupid))
> +	    !node_is_toptier(src_nid))
>  		return false;
>  
>  	/*

No.  Even if NUMA_BALANCING_MEMORY_TIERING is disabled, we should still
allow migrating pages from lower tier to higher tier via
NUMA_BALANCING_NORMAL.  If we have precious DDR, why waste it?  This
follows the semantics of NUMA_BALANCING_NORMAL before introducing
NUMA_BALANCING_MEMORY_TIERING.
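For reference, the pre-patch behaviour described here can be modelled as a short sequence of hint faults. This is a simplified sketch, assuming a freshly allocated lower-tier folio whose last_cpupid starts out unset: struct folio_model, MODEL_CPUPID_UNSET and the helpers are hypothetical, xchg_last_cpupid() only mimics the store-and-return-old semantics of folio_xchg_last_cpupid(), and only the tier check is modelled. With only NUMA_BALANCING_NORMAL set, the first fault records the faulting cpupid and is rejected; a later fault sees a valid last_cpupid, gets past the check, and the folio remains a candidate for promotion.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit values as the kernel's sysctl_numa_balancing_mode flags. */
#define NUMA_BALANCING_NORMAL         0x1
#define NUMA_BALANCING_MEMORY_TIERING 0x2

/* Hypothetical marker for "no valid last_cpupid recorded yet". */
#define MODEL_CPUPID_UNSET (-1)

/* Hypothetical stand-in for the folio state used by the check. */
struct folio_model {
	int last_cpupid;
	bool on_toptier;
};

/* Mimics folio_xchg_last_cpupid(): store the new value, return the old one. */
static inline int xchg_last_cpupid(struct folio_model *f, int cpupid)
{
	int old = f->last_cpupid;

	f->last_cpupid = cpupid;
	return old;
}

/*
 * Pre-patch tier check only. Returns true if this hint fault gets past
 * the check; the real function applies further filters afterwards.
 */
static inline bool passes_check_pre_patch(struct folio_model *f,
					  unsigned int mode, int this_cpupid)
{
	int last_cpupid;

	/* A faulting task always supplies a real cpupid in this model. */
	assert(this_cpupid != MODEL_CPUPID_UNSET);
	last_cpupid = xchg_last_cpupid(f, this_cpupid);

	if (!(mode & NUMA_BALANCING_MEMORY_TIERING) && !f->on_toptier &&
	    last_cpupid == MODEL_CPUPID_UNSET)
		return false;
	return true;
}
```

With the patch applied, the same sequence would be rejected on every fault, because the last_cpupid term is dropped from the condition.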

---
Best Regards,
Huang, Ying

Re: [PATCH v2] memory tiering: Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled

Posted by Donet Tom 1 day, 14 hours ago

Hi

On 4/2/26 8:57 AM, Huang, Ying wrote:
> No.  Even if NUMA_BALANCING_MEMORY_TIERING is disabled, we should still
> allow migrating pages from lower tier to higher tier via
> NUMA_BALANCING_NORMAL.  If we have precious DDR, why waste it?  This
> follows the semantics of NUMA_BALANCING_NORMAL before introducing
> NUMA_BALANCING_MEMORY_TIERING.

Thank you for the review comments.

One thing I am trying to understand: page promotion appears to
happen regardless of whether NUMA_BALANCING_MEMORY_TIERING is
enabled or disabled. In that case, what is the specific role of
NUMA_BALANCING_MEMORY_TIERING? Do we get better performance
when it is enabled?

My initial understanding was that disabling
NUMA_BALANCING_MEMORY_TIERING could be used to turn off
promotion. However, it seems that currently we cannot control
promotion independently. If NUMA_BALANCING_NORMAL is disabled,
neither migration nor promotion happens, and if it is enabled,
both migration and promotion can occur.

I was under the impression that:
- NUMA_BALANCING_NORMAL would handle migration within the same tier,
- NUMA_BALANCING_MEMORY_TIERING would handle promotion across tiers,
- and enabling both would allow both migration and promotion.

This would provide more fine-grained control. Is my
understanding correct, or am I missing something here?


Re: [PATCH v2] memory tiering: Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled

Posted by Huang, Ying 1 day, 12 hours ago

Donet Tom <donettom@linux.ibm.com> writes:

> Hi

Hi, Donet,

> On 4/2/26 8:57 AM, Huang, Ying wrote:
>> No.  Even if NUMA_BALANCING_MEMORY_TIERING is disabled, we should still
>> allow migrating pages from lower tier to higher tier via
>> NUMA_BALANCING_NORMAL.  If we have precious DDR, why waste it?  This
>> follows the semantics of NUMA_BALANCING_NORMAL before introducing
>> NUMA_BALANCING_MEMORY_TIERING.
>
> Thank you for the review comments.
>
> One thing I am trying to understand: page promotion appears to
> happen regardless of whether NUMA_BALANCING_MEMORY_TIERING is
> enabled or disabled. In that case, what is the specific role of
> NUMA_BALANCING_MEMORY_TIERING? Do we get better performance
> when it is enabled?

You can search for NUMA_BALANCING_MEMORY_TIERING in the source to find
out what it does.  It can give better performance, as the original
commit message explains.

When NUMA_BALANCING_MEMORY_TIERING was introduced, we didn't change the
original behavior of NUMA_BALANCING_NORMAL because we had no good reason
to do that.  Your patch changes that behavior, so you should provide
supporting data or a bug report to justify the change.
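Roughly, what NUMA_BALANCING_MEMORY_TIERING adds is the hot page selection named in the Fixes: commit: a lower-tier folio counts as hot if the hint page fault arrives soon after the scan that armed it. The sketch below models only that latency test. The 20-bit mask, the 1000 ms threshold (assumed to match the default hot threshold) and all names are assumptions; the kernel code also adapts the threshold and applies a promotion rate limit, both of which are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timestamp width; the kernel's mask depends on the last_cpupid field size. */
#define MODEL_ACCESS_TIME_MASK 0xfffffu

/* Assumed default hot threshold, in milliseconds. */
#define MODEL_HOT_THRESHOLD_MS 1000u

/*
 * Latency between the scan that armed the hint fault (scan_ms) and the
 * fault itself (fault_ms), both truncated to the mask width. Unsigned
 * subtraction plus masking stays correct across wraparound as long as
 * the real gap fits in the mask.
 */
static inline unsigned int model_hint_latency(unsigned int scan_ms,
					      unsigned int fault_ms)
{
	return (fault_ms - scan_ms) & MODEL_ACCESS_TIME_MASK;
}

/* A folio counts as hot if it was touched soon after being scanned. */
static inline bool model_is_hot(unsigned int scan_ms, unsigned int fault_ms,
				unsigned int threshold_ms)
{
	assert(threshold_ms > 0);
	return model_hint_latency(scan_ms, fault_ms) < threshold_ms;
}
```

For example, a folio scanned at 100 ms and faulted at 350 ms has a latency of 250 ms and counts as hot; one faulted at 5100 ms does not.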

> My initial understanding was that disabling
> NUMA_BALANCING_MEMORY_TIERING could be used to turn off
> promotion. However, it seems that currently we cannot control
> promotion independently. If NUMA_BALANCING_NORMAL is disabled,
> neither migration nor promotion happens, and if it is enabled,
> both migration and promotion can occur.
>
> I was under the impression that:
> - NUMA_BALANCING_NORMAL would handle migration within the same tier,
> - NUMA_BALANCING_MEMORY_TIERING would handle promotion across tiers,
> - and enabling both would allow both migration and promotion.
>
> This would provide more fine-grained control. Is my
> understanding correct, or am I missing something here?

You can change this if you have supporting data or a bug report.

---
Best Regards,
Huang, Ying

Re: [PATCH v2] memory tiering: Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled

Posted by Andrew Morton 1 day, 19 hours ago

On Mon, 23 Mar 2026 04:48:49 -0500 Donet Tom <donettom@linux.ibm.com> wrote:

> In the current implementation, if NUMA_BALANCING_MEMORY_TIERING is
> disabled and the pages are on the lower tier, the pages may still be
> promoted.
> 
> This happens because task_numa_work() updates the last_cpupid field to
> record the last access time only when NUMA_BALANCING_MEMORY_TIERING is
> enabled and the folio is on the lower tier. If
> NUMA_BALANCING_MEMORY_TIERING is disabled, the last_cpupid field
> can retain a valid last CPU id.
> 
> In should_numa_migrate_memory(), the migration is rejected only when
> NUMA_BALANCING_MEMORY_TIERING is disabled, the folio is on a lower
> tier, and last_cpupid is invalid. Because last_cpupid can still be
> valid when NUMA_BALANCING_MEMORY_TIERING is disabled, the condition
> evaluates to false and the migration is allowed.
> 
> This patch prevents promotion when NUMA_BALANCING_MEMORY_TIERING is
> disabled and the folio is on the lower tier.
> 
> Behavior before this change:
> ============================
>   - If NUMA_BALANCING_NORMAL is enabled, migration occurs between
>     nodes within the same memory tier, and promotion from lower
>     tier to higher tier may also happen.
> 
>   - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from
>     lower tier to higher tier nodes is allowed.
> 
> Behavior after this change:
> ===========================
>   - If NUMA_BALANCING_NORMAL is enabled, migration will occur only
>     between nodes within the same memory tier.
> 
>   - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from lower
>     tier to higher tier nodes will be allowed.
> 
>   - If both NUMA_BALANCING_MEMORY_TIERING and NUMA_BALANCING_NORMAL are
>     enabled, both migration (same tier) and promotion (cross tier) are
>     allowed.

There was no feedback on this, nor on your v1.

> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")

Ying Huang seems to have moved around a bit - let me add a couple more
email addresses.  Apologies if we have multiple Ying Huangs!

Rik, Mel?  It's a bugfix.

Thanks.



From: Donet Tom <donettom@linux.ibm.com>
Subject: memory tiering: do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled
Date: Mon, 23 Mar 2026 04:48:49 -0500

In the current implementation, if NUMA_BALANCING_MEMORY_TIERING is
disabled and the pages are on the lower tier, the pages may still be
promoted.

This happens because task_numa_work() updates the last_cpupid field to
record the last access time only when NUMA_BALANCING_MEMORY_TIERING is
enabled and the folio is on the lower tier.  If
NUMA_BALANCING_MEMORY_TIERING is disabled, the last_cpupid field can
retain a valid last CPU id.

In should_numa_migrate_memory(), the migration is rejected only when
NUMA_BALANCING_MEMORY_TIERING is disabled, the folio is on a lower tier,
and last_cpupid is invalid.  Because last_cpupid can still be valid when
NUMA_BALANCING_MEMORY_TIERING is disabled, the condition evaluates to
false and the migration is allowed.

This patch prevents promotion when NUMA_BALANCING_MEMORY_TIERING is
disabled and the folio is on the lower tier.

Behavior before this change:
============================
  - If NUMA_BALANCING_NORMAL is enabled, migration occurs between
    nodes within the same memory tier, and promotion from lower
    tier to higher tier may also happen.

  - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from
    lower tier to higher tier nodes is allowed.

Behavior after this change:
===========================
  - If NUMA_BALANCING_NORMAL is enabled, migration will occur only
    between nodes within the same memory tier.

  - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from lower
    tier to higher tier nodes will be allowed.

  - If both NUMA_BALANCING_MEMORY_TIERING and NUMA_BALANCING_NORMAL are
    enabled, both migration (same tier) and promotion (cross tier) are
    allowed.

Link: https://lkml.kernel.org/r/20260323094849.3903-1-donettom@linux.ibm.com
Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Ben Segall <bsegall@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: "Huang, Ying" <huang.ying.caritas@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/sched/fair.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/kernel/sched/fair.c~memory-tiering-do-not-allow-promotion-if-numa_balancing_memory_tiering-is-disabled
+++ a/kernel/sched/fair.c
@@ -2024,8 +2024,12 @@ bool should_numa_migrate_memory(struct t
 	this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);
 	last_cpupid = folio_xchg_last_cpupid(folio, this_cpupid);
 
+	/*
+	 * Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled
+	 * and the pages are on the lower tier.
+	 */
 	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
-	    !node_is_toptier(src_nid) && !cpupid_valid(last_cpupid))
+	    !node_is_toptier(src_nid))
 		return false;
 
 	/*
_

Re: [PATCH v2] memory tiering: Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled

Posted by Huang, Ying 1 day, 15 hours ago

Hi, Andrew,

Andrew Morton <akpm@linux-foundation.org> writes:

> On Mon, 23 Mar 2026 04:48:49 -0500 Donet Tom <donettom@linux.ibm.com> wrote:
>
>> In the current implementation, if NUMA_BALANCING_MEMORY_TIERING is
>> disabled and the pages are on the lower tier, the pages may still be
>> promoted.
>> 
>> This happens because task_numa_work() updates the last_cpupid field to
>> record the last access time only when NUMA_BALANCING_MEMORY_TIERING is
>> enabled and the folio is on the lower tier. If
>> NUMA_BALANCING_MEMORY_TIERING is disabled, the last_cpupid field
>> can retain a valid last CPU id.
>> 
>> In should_numa_migrate_memory(), the migration is rejected only when
>> NUMA_BALANCING_MEMORY_TIERING is disabled, the folio is on a lower
>> tier, and last_cpupid is invalid. Because last_cpupid can still be
>> valid when NUMA_BALANCING_MEMORY_TIERING is disabled, the condition
>> evaluates to false and the migration is allowed.
>> 
>> This patch prevents promotion when NUMA_BALANCING_MEMORY_TIERING is
>> disabled and the folio is on the lower tier.
>> 
>> Behavior before this change:
>> ============================
>>   - If NUMA_BALANCING_NORMAL is enabled, migration occurs between
>>     nodes within the same memory tier, and promotion from lower
>>     tier to higher tier may also happen.
>> 
>>   - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from
>>     lower tier to higher tier nodes is allowed.
>> 
>> Behavior after this change:
>> ===========================
>>   - If NUMA_BALANCING_NORMAL is enabled, migration will occur only
>>     between nodes within the same memory tier.
>> 
>>   - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from lower
>>     tier to higher tier nodes will be allowed.
>> 
>>   - If both NUMA_BALANCING_MEMORY_TIERING and NUMA_BALANCING_NORMAL are
>>     enabled, both migration (same tier) and promotion (cross tier) are
>>     allowed.
>
> There was no feedback on this, nor on your v1.
>
>> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
>
> Ying Huang seems to have moved around a bit - let me add a couple more
> email addresses.  Apologies if we have multiple Ying Huangs!

Thanks!  I haven't found another Ying Huang in the mm community yet.

I now use the following email addresses:

"Huang, Ying" <ying.huang@linux.alibaba.com>
Ying Huang <huang.ying.caritas@gmail.com>

and have stopped using the following email address:

ying.huang@intel.com

> Rik, Mel?  It's a bugfix.
>
> Thanks.

---
Best Regards,
Huang, Ying