In the current implementation, if NUMA_BALANCING_MEMORY_TIERING is
disabled and the pages are on the lower tier, the pages may still be
promoted.
This happens because task_numa_work() updates the last_cpupid field to
record the last access time only when NUMA_BALANCING_MEMORY_TIERING is
enabled and the folio is on the lower tier. If
NUMA_BALANCING_MEMORY_TIERING is disabled, the last_cpupid field
can retain a valid last CPU id.
In should_numa_migrate_memory(), the decision checks whether
NUMA_BALANCING_MEMORY_TIERING is disabled, the folio is on the lower
tier, and last_cpupid is invalid. However, the last_cpupid can be
valid when NUMA_BALANCING_MEMORY_TIERING is disabled, in which case the
condition evaluates to false and migration is allowed.
This patch prevents promotion when NUMA_BALANCING_MEMORY_TIERING is
disabled and the folio is on the lower tier.
Behavior before this change:
============================
- If NUMA_BALANCING_NORMAL is enabled, migration occurs between
nodes within the same memory tier, and promotion from lower
tier to higher tier may also happen.
- If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from
lower tier to higher tier nodes is allowed.
Behavior after this change:
===========================
- If NUMA_BALANCING_NORMAL is enabled, migration will occur only
between nodes within the same memory tier.
- If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from lower
tier to higher tier nodes will be allowed.
- If both NUMA_BALANCING_MEMORY_TIERING and NUMA_BALANCING_NORMAL are
enabled, both migration (same tier) and promotion (cross tier) are
allowed.
Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
v1 -> v2
========
1. Dropped changes in task_numa_fault() since the original changes
already handle runtime disabling of NUMA_BALANCING_MEMORY_TIERING.
v1 -> https://lore.kernel.org/all/20260320092251.1290207-1-donettom@linux.ibm.com/
---
kernel/sched/fair.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db905ed..4b43809a3fb1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2024,8 +2024,12 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);
last_cpupid = folio_xchg_last_cpupid(folio, this_cpupid);
+ /*
+ * Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled
+ * and the pages are on the lower tier.
+ */
if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
- !node_is_toptier(src_nid) && !cpupid_valid(last_cpupid))
+ !node_is_toptier(src_nid))
return false;
/*
--
2.47.1
Donet Tom <donettom@linux.ibm.com> writes:
> In the current implementation, if NUMA_BALANCING_MEMORY_TIERING is
> disabled and the pages are on the lower tier, the pages may still be
> promoted.
>
> This happens because task_numa_work() updates the last_cpupid field to
> record the last access time only when NUMA_BALANCING_MEMORY_TIERING is
> enabled and the folio is on the lower tier. If
> NUMA_BALANCING_MEMORY_TIERING is disabled, the last_cpupid field
> can retain a valid last CPU id.
>
> In should_numa_migrate_memory(), the decision checks whether
> NUMA_BALANCING_MEMORY_TIERING is disabled, the folio is on the lower
> tier, and last_cpupid is invalid. However, the last_cpupid can be
> valid when NUMA_BALANCING_MEMORY_TIERING is disabled, in which case the
> condition evaluates to false and migration is allowed.
>
> This patch prevents promotion when NUMA_BALANCING_MEMORY_TIERING is
> disabled and the folio is on the lower tier.
>
> Behavior before this change:
> ============================
> - If NUMA_BALANCING_NORMAL is enabled, migration occurs between
> nodes within the same memory tier, and promotion from lower
> tier to higher tier may also happen.
>
> - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from
> lower tier to higher tier nodes is allowed.
>
> Behavior after this change:
> ===========================
> - If NUMA_BALANCING_NORMAL is enabled, migration will occur only
> between nodes within the same memory tier.
>
> - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from lower
> tier to higher tier nodes will be allowed.
>
> - If both NUMA_BALANCING_MEMORY_TIERING and NUMA_BALANCING_NORMAL are
> enabled, both migration (same tier) and promotion (cross tier) are
> allowed.
>
> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> ---
> v1 -> v2
> ========
> 1. Dropped changes in task_numa_fault() since the original changes
> already handle runtime disabling of NUMA_BALANCING_MEMORY_TIERING.
>
> v1 -> https://lore.kernel.org/all/20260320092251.1290207-1-donettom@linux.ibm.com/
> ---
> kernel/sched/fair.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bf948db905ed..4b43809a3fb1 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2024,8 +2024,12 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
> this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);
> last_cpupid = folio_xchg_last_cpupid(folio, this_cpupid);
>
> + /*
> + * Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled
> + * and the pages are on the lower tier.
> + */
> if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
> - !node_is_toptier(src_nid) && !cpupid_valid(last_cpupid))
> + !node_is_toptier(src_nid))
> return false;
>
> /*
No. Even if NUMA_BALANCING_MEMORY_TIERING is disabled, we should still
allow migrating pages from lower tier to higher tier via
NUMA_BALANCING_NORMAL. If we have precious DDR, why waste it? This
follows the semantics of NUMA_BALANCING_NORMAL before introducing
NUMA_BALANCING_MEMORY_TIERING.
---
Best Regards,
Huang, Ying
Hi
On 4/2/26 8:57 AM, Huang, Ying wrote:
> Donet Tom <donettom@linux.ibm.com> writes:
>
>> In the current implementation, if NUMA_BALANCING_MEMORY_TIERING is
>> disabled and the pages are on the lower tier, the pages may still be
>> promoted.
>>
>> This happens because task_numa_work() updates the last_cpupid field to
>> record the last access time only when NUMA_BALANCING_MEMORY_TIERING is
>> enabled and the folio is on the lower tier. If
>> NUMA_BALANCING_MEMORY_TIERING is disabled, the last_cpupid field
>> can retain a valid last CPU id.
>>
>> In should_numa_migrate_memory(), the decision checks whether
>> NUMA_BALANCING_MEMORY_TIERING is disabled, the folio is on the lower
>> tier, and last_cpupid is invalid. However, the last_cpupid can be
>> valid when NUMA_BALANCING_MEMORY_TIERING is disabled, in which case the
>> condition evaluates to false and migration is allowed.
>>
>> This patch prevents promotion when NUMA_BALANCING_MEMORY_TIERING is
>> disabled and the folio is on the lower tier.
>>
>> Behavior before this change:
>> ============================
>> - If NUMA_BALANCING_NORMAL is enabled, migration occurs between
>> nodes within the same memory tier, and promotion from lower
>> tier to higher tier may also happen.
>>
>> - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from
>> lower tier to higher tier nodes is allowed.
>>
>> Behavior after this change:
>> ===========================
>> - If NUMA_BALANCING_NORMAL is enabled, migration will occur only
>> between nodes within the same memory tier.
>>
>> - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from lower
>> tier to higher tier nodes will be allowed.
>>
>> - If both NUMA_BALANCING_MEMORY_TIERING and NUMA_BALANCING_NORMAL are
>> enabled, both migration (same tier) and promotion (cross tier) are
>> allowed.
>>
>> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
>> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
>> ---
>> v1 -> v2
>> ========
>> 1. Dropped changes in task_numa_fault() since the original changes
>> already handle runtime disabling of NUMA_BALANCING_MEMORY_TIERING.
>>
>> v1 -> https://lore.kernel.org/all/20260320092251.1290207-1-donettom@linux.ibm.com/
>> ---
>> kernel/sched/fair.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index bf948db905ed..4b43809a3fb1 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2024,8 +2024,12 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
>> this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);
>> last_cpupid = folio_xchg_last_cpupid(folio, this_cpupid);
>>
>> + /*
>> + * Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled
>> + * and the pages are on the lower tier.
>> + */
>> if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
>> - !node_is_toptier(src_nid) && !cpupid_valid(last_cpupid))
>> + !node_is_toptier(src_nid))
>> return false;
>>
>> /*
> No. Even if NUMA_BALANCING_MEMORY_TIERING is disabled, we should still
> allow migrating pages from lower tier to higher tier via
> NUMA_BALANCING_NORMAL. If we have precious DDR, why waste it? This
> follows the semantics of NUMA_BALANCING_NORMAL before introducing
> NUMA_BALANCING_MEMORY_TIERING.
Thank you for the review comments.
One thing I am trying to understand: page promotion
appears to happen regardless of whether
NUMA_BALANCING_MEMORY_TIERING is enabled or disabled. In that
case, what is the specific role of
NUMA_BALANCING_MEMORY_TIERING? Do we get better performance
when it is enabled?
My initial understanding was that disabling
NUMA_BALANCING_MEMORY_TIERING could be used to turn off
promotion. However, it seems that currently we cannot control
promotion independently. If NUMA_BALANCING_NORMAL is disabled,
neither migration nor promotion happens, and if it is enabled,
both migration and promotion can occur.
I was under the impression that:
- NUMA_BALANCING_NORMAL would handle migration within the same tier,
- NUMA_BALANCING_MEMORY_TIERING would handle promotion across tiers,
- and enabling both would allow both migration and promotion.
This would provide more fine-grained control. Is my
understanding correct, or am I missing something here?
>
> ---
> Best Regards,
> Huang, Ying
Donet Tom <donettom@linux.ibm.com> writes:
> Hi
Hi, Donet,
> On 4/2/26 8:57 AM, Huang, Ying wrote:
>> Donet Tom <donettom@linux.ibm.com> writes:
>>
>>> In the current implementation, if NUMA_BALANCING_MEMORY_TIERING is
>>> disabled and the pages are on the lower tier, the pages may still be
>>> promoted.
>>>
>>> This happens because task_numa_work() updates the last_cpupid field to
>>> record the last access time only when NUMA_BALANCING_MEMORY_TIERING is
>>> enabled and the folio is on the lower tier. If
>>> NUMA_BALANCING_MEMORY_TIERING is disabled, the last_cpupid field
>>> can retain a valid last CPU id.
>>>
>>> In should_numa_migrate_memory(), the decision checks whether
>>> NUMA_BALANCING_MEMORY_TIERING is disabled, the folio is on the lower
>>> tier, and last_cpupid is invalid. However, the last_cpupid can be
>>> valid when NUMA_BALANCING_MEMORY_TIERING is disabled, in which case the
>>> condition evaluates to false and migration is allowed.
>>>
>>> This patch prevents promotion when NUMA_BALANCING_MEMORY_TIERING is
>>> disabled and the folio is on the lower tier.
>>>
>>> Behavior before this change:
>>> ============================
>>> - If NUMA_BALANCING_NORMAL is enabled, migration occurs between
>>> nodes within the same memory tier, and promotion from lower
>>> tier to higher tier may also happen.
>>>
>>> - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from
>>> lower tier to higher tier nodes is allowed.
>>>
>>> Behavior after this change:
>>> ===========================
>>> - If NUMA_BALANCING_NORMAL is enabled, migration will occur only
>>> between nodes within the same memory tier.
>>>
>>> - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from lower
>>> tier to higher tier nodes will be allowed.
>>>
>>> - If both NUMA_BALANCING_MEMORY_TIERING and NUMA_BALANCING_NORMAL are
>>> enabled, both migration (same tier) and promotion (cross tier) are
>>> allowed.
>>>
>>> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
>>> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
>>> ---
>>> v1 -> v2
>>> ========
>>> 1. Dropped changes in task_numa_fault() since the original changes
>>> already handle runtime disabling of NUMA_BALANCING_MEMORY_TIERING.
>>>
>>> v1 -> https://lore.kernel.org/all/20260320092251.1290207-1-donettom@linux.ibm.com/
>>> ---
>>> kernel/sched/fair.c | 6 +++++-
>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index bf948db905ed..4b43809a3fb1 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -2024,8 +2024,12 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
>>> this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);
>>> last_cpupid = folio_xchg_last_cpupid(folio, this_cpupid);
>>> + /*
>>> + * Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled
>>> + * and the pages are on the lower tier.
>>> + */
>>> if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
>>> - !node_is_toptier(src_nid) && !cpupid_valid(last_cpupid))
>>> + !node_is_toptier(src_nid))
>>> return false;
>>> /*
>> No. Even if NUMA_BALANCING_MEMORY_TIERING is disabled, we should still
>> allow migrating pages from lower tier to higher tier via
>> NUMA_BALANCING_NORMAL. If we have precious DDR, why waste it? This
>> follows the semantics of NUMA_BALANCING_NORMAL before introducing
>> NUMA_BALANCING_MEMORY_TIERING.
>
> Thank you for the review comments.
>
> One thing I am trying to understand: page promotion
> appears to happen regardless of whether
> NUMA_BALANCING_MEMORY_TIERING is enabled or disabled. In that
> case, what is the specific role of
> NUMA_BALANCING_MEMORY_TIERING? Do we get better performance
> when it is enabled?
You can search for NUMA_BALANCING_MEMORY_TIERING in the source to see
what it does. It can give better performance, as the original commit
message describes.
When NUMA_BALANCING_MEMORY_TIERING was introduced, we didn't change the
original behavior of NUMA_BALANCING_NORMAL because we had no good
reason to do that. Here, you are changing its behavior, so you should
provide some supporting data or a bug report to justify the change.
> My initial understanding was that disabling
> NUMA_BALANCING_MEMORY_TIERING could be used to turn off
> promotion. However, it seems that currently we cannot control
> promotion independently. If NUMA_BALANCING_NORMAL is disabled,
> neither migration nor promotion happens, and if it is enabled,
> both migration and promotion can occur.
>
> I was under the impression that:
> - NUMA_BALANCING_NORMAL would handle migration within the same tier,
> - NUMA_BALANCING_MEMORY_TIERING would handle promotion across tiers,
> - and enabling both would allow both migration and promotion.
>
> This would provide more fine-grained control. Is my
> understanding correct, or am I missing something here?
You can change this if you have some supporting data or a bug report.
---
Best Regards,
Huang, Ying
On Mon, 23 Mar 2026 04:48:49 -0500 Donet Tom <donettom@linux.ibm.com> wrote:
> In the current implementation, if NUMA_BALANCING_MEMORY_TIERING is
> disabled and the pages are on the lower tier, the pages may still be
> promoted.
>
> This happens because task_numa_work() updates the last_cpupid field to
> record the last access time only when NUMA_BALANCING_MEMORY_TIERING is
> enabled and the folio is on the lower tier. If
> NUMA_BALANCING_MEMORY_TIERING is disabled, the last_cpupid field
> can retain a valid last CPU id.
>
> In should_numa_migrate_memory(), the decision checks whether
> NUMA_BALANCING_MEMORY_TIERING is disabled, the folio is on the lower
> tier, and last_cpupid is invalid. However, the last_cpupid can be
> valid when NUMA_BALANCING_MEMORY_TIERING is disabled, in which case the
> condition evaluates to false and migration is allowed.
>
> This patch prevents promotion when NUMA_BALANCING_MEMORY_TIERING is
> disabled and the folio is on the lower tier.
>
> Behavior before this change:
> ============================
> - If NUMA_BALANCING_NORMAL is enabled, migration occurs between
> nodes within the same memory tier, and promotion from lower
> tier to higher tier may also happen.
>
> - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from
> lower tier to higher tier nodes is allowed.
>
> Behavior after this change:
> ===========================
> - If NUMA_BALANCING_NORMAL is enabled, migration will occur only
> between nodes within the same memory tier.
>
> - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from lower
> tier to higher tier nodes will be allowed.
>
> - If both NUMA_BALANCING_MEMORY_TIERING and NUMA_BALANCING_NORMAL are
> enabled, both migration (same tier) and promotion (cross tier) are
> allowed.
There was no feedback on this, nor on your v1.
> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
Ying Huang seems to have moved around a bit - let me add a couple more
email addresses. Apologies if we have multiple Ying Huangs!
Rik, Mel? It's a bugfix.
Thanks.
From: Donet Tom <donettom@linux.ibm.com>
Subject: memory tiering: do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled
Date: Mon, 23 Mar 2026 04:48:49 -0500
In the current implementation, if NUMA_BALANCING_MEMORY_TIERING is
disabled and the pages are on the lower tier, the pages may still be
promoted.
This happens because task_numa_work() updates the last_cpupid field to
record the last access time only when NUMA_BALANCING_MEMORY_TIERING is
enabled and the folio is on the lower tier. If
NUMA_BALANCING_MEMORY_TIERING is disabled, the last_cpupid field can
retain a valid last CPU id.
In should_numa_migrate_memory(), the decision checks whether
NUMA_BALANCING_MEMORY_TIERING is disabled, the folio is on the lower tier,
and last_cpupid is invalid. However, the last_cpupid can be valid when
NUMA_BALANCING_MEMORY_TIERING is disabled, in which case the condition
evaluates to false and migration is allowed.
This patch prevents promotion when NUMA_BALANCING_MEMORY_TIERING is
disabled and the folio is on the lower tier.
Behavior before this change:
============================
- If NUMA_BALANCING_NORMAL is enabled, migration occurs between
nodes within the same memory tier, and promotion from lower
tier to higher tier may also happen.
- If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from
lower tier to higher tier nodes is allowed.
Behavior after this change:
===========================
- If NUMA_BALANCING_NORMAL is enabled, migration will occur only
between nodes within the same memory tier.
- If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from lower
tier to higher tier nodes will be allowed.
- If both NUMA_BALANCING_MEMORY_TIERING and NUMA_BALANCING_NORMAL are
enabled, both migration (same tier) and promotion (cross tier) are
allowed.
Link: https://lkml.kernel.org/r/20260323094849.3903-1-donettom@linux.ibm.com
Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Ben Segall <bsegall@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: "Huang, Ying" <huang.ying.caritas@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
kernel/sched/fair.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/kernel/sched/fair.c~memory-tiering-do-not-allow-promotion-if-numa_balancing_memory_tiering-is-disabled
+++ a/kernel/sched/fair.c
@@ -2024,8 +2024,12 @@ bool should_numa_migrate_memory(struct t
this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);
last_cpupid = folio_xchg_last_cpupid(folio, this_cpupid);
+ /*
+ * Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled
+ * and the pages are on the lower tier.
+ */
if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
- !node_is_toptier(src_nid) && !cpupid_valid(last_cpupid))
+ !node_is_toptier(src_nid))
return false;
/*
_
Hi, Andrew,
Andrew Morton <akpm@linux-foundation.org> writes:
> On Mon, 23 Mar 2026 04:48:49 -0500 Donet Tom <donettom@linux.ibm.com> wrote:
>
>> In the current implementation, if NUMA_BALANCING_MEMORY_TIERING is
>> disabled and the pages are on the lower tier, the pages may still be
>> promoted.
>>
>> This happens because task_numa_work() updates the last_cpupid field to
>> record the last access time only when NUMA_BALANCING_MEMORY_TIERING is
>> enabled and the folio is on the lower tier. If
>> NUMA_BALANCING_MEMORY_TIERING is disabled, the last_cpupid field
>> can retain a valid last CPU id.
>>
>> In should_numa_migrate_memory(), the decision checks whether
>> NUMA_BALANCING_MEMORY_TIERING is disabled, the folio is on the lower
>> tier, and last_cpupid is invalid. However, the last_cpupid can be
>> valid when NUMA_BALANCING_MEMORY_TIERING is disabled, in which case the
>> condition evaluates to false and migration is allowed.
>>
>> This patch prevents promotion when NUMA_BALANCING_MEMORY_TIERING is
>> disabled and the folio is on the lower tier.
>>
>> Behavior before this change:
>> ============================
>> - If NUMA_BALANCING_NORMAL is enabled, migration occurs between
>> nodes within the same memory tier, and promotion from lower
>> tier to higher tier may also happen.
>>
>> - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from
>> lower tier to higher tier nodes is allowed.
>>
>> Behavior after this change:
>> ===========================
>> - If NUMA_BALANCING_NORMAL is enabled, migration will occur only
>> between nodes within the same memory tier.
>>
>> - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from lower
>> tier to higher tier nodes will be allowed.
>>
>> - If both NUMA_BALANCING_MEMORY_TIERING and NUMA_BALANCING_NORMAL are
>> enabled, both migration (same tier) and promotion (cross tier) are
>> allowed.
>
> There was no feedback on this, nor on your v1.
>
>> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
>
> Ying Huang seems to have moved around a bit - let me add a couple more
> email addresses. Apologies if we have multiple Ying Huangs!
Thanks! I haven't come across another Ying Huang in the mm community yet.
I now use the following email addresses:
"Huang, Ying" <ying.huang@linux.alibaba.com>
Ying Huang <huang.ying.caritas@gmail.com>
and have stopped using the following one:
ying.huang@intel.com
> Rik, Mel? It's a bugfix.
>
> Thanks.
---
Best Regards,
Huang, Ying