include/linux/damon.h | 6 ++ mm/damon/core.c | 158 ++++++++++++++++++++++++++++++++++++--- mm/damon/sysfs-schemes.c | 12 +++ 3 files changed, 164 insertions(+), 12 deletions(-)
This patch introduces two new DAMON quota goal metrics for controlling
memory distribution in heterogeneous memory systems (e.g., DRAM and CXL
memory tiering) using physical address (PA) mode monitoring.
v3: https://lore.kernel.org/linux-mm/20260223123232.12851-1-ravis.opensrc@gmail.com/
Changes since v3:
=================
- The first two patches from v3 (goal_tuner initialization fix and
esz=0 quota bypass fix) are now in damon/next. This submission
contains only the core metrics patch, rebased on top of those fixes.
- Simplified implementation: removed per-node eligible_bytes array, now
iterates scheme-eligible regions directly for each goal evaluation.
- Handle regions crossing node boundaries: uses damon_get_folio() to
determine actual NUMA node placement of each folio rather than
assuming uniform node placement within a region.
- Pass scheme pointer directly to metric calculation functions, avoiding
container_of() derivation from quota pointer.
- Fixed 80-column wrapping issues.
Background and Motivation
=========================
In heterogeneous memory systems, controlling memory distribution across
NUMA nodes is essential for performance optimization. This patch enables
system-wide page distribution with target-state goals like "maintain 30%
of scheme-eligible memory on CXL" using PA-mode DAMON schemes.
What These Metrics Measure
==========================
node_eligible_mem_bp:
scheme_eligible_bytes_on_node / total_scheme_eligible_bytes * 10000
node_ineligible_mem_bp:
(total - scheme_eligible_bytes_on_node) / total * 10000
The metrics are complementary: eligible_bp + ineligible_bp = 10000 bp.
Two-Scheme Setup for Hot Page Distribution
==========================================
For maintaining 30% of hot memory on CXL (node 1):
PUSH scheme: migrate_hot from node 0 -> node 1
goal: node_ineligible_mem_bp, nid=0, target=3000
"Push hot pages out until 30% of hot memory is NOT on DRAM"
PULL scheme: migrate_hot from node 1 -> node 0
goal: node_eligible_mem_bp, nid=0, target=7000
"Pull hot pages back until 70% of hot memory IS on DRAM"
The complementary goals create a feedback loop that converges to the
target distribution.
Dependencies
============
This patch is based on SJ's damon/next branch which includes the
TEMPORAL goal tuner required for these metrics.
Testing Results
===============
Functionally tested on a two-node heterogeneous memory system with DRAM
(node 0) and CXL memory (node 1). Used PUSH+PULL scheme configuration
with migrate_hot action to maintain a target hot memory ratio between
the two tiers.
With the TEMPORAL goal tuner, the system converges quickly to the target
distribution. The tuner drives esz to maximum when under goal and to
zero once the goal is met, forming a simple on/off feedback loop that
stabilizes at the desired ratio.
With the CONSIST tuner, the scheme still converges but more slowly, as
it migrates and then throttles itself based on quota feedback. The time
to reach the goal varies depending on workload intensity.
Ravi Jonnalagadda (1):
mm/damon: add node_eligible_mem_bp and node_ineligible_mem_bp goal
metrics
include/linux/damon.h | 6 ++
mm/damon/core.c | 158 ++++++++++++++++++++++++++++++++++++---
mm/damon/sysfs-schemes.c | 12 +++
3 files changed, 164 insertions(+), 12 deletions(-)
--
2.43.0
Hello Ravi, Thank you for this patch! TL; DR: Other than trivial things I commented below and to the patch, I believe it is time to drop the RFC tag, and work on merging this. On Fri, 20 Mar 2026 12:04:52 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote: > This patch introduces two new DAMON quota goal metrics for controlling s/DAMON/DAMOS/ ? > memory distribution in heterogeneous memory systems (e.g., DRAM and CXL > memory tiering) using physical address (PA) mode monitoring. > > v3: https://lore.kernel.org/linux-mm/20260223123232.12851-1-ravis.opensrc@gmail.com/ The above link would better to be put on 'Chage since v3' section below. > > Changes since v3: > ================= > > - The first two patches from v3 (goal_tuner initialization fix and > esz=0 quota bypass fix) are now in damon/next. This submission It is not also in mm-unstable :) > contains only the core metrics patch, rebased on top of those fixes. > > - Simplified implementation: removed per-node eligible_bytes array, now > iterates scheme-eligible regions directly for each goal evaluation. > > - Handle regions crossing node boundaries: uses damon_get_folio() to > determine actual NUMA node placement of each folio rather than > assuming uniform node placement within a region. > > - Pass scheme pointer directly to metric calculation functions, avoiding > container_of() derivation from quota pointer. > > - Fixed 80-column wrapping issues. Thank you for addressing all my comments! > > Background and Motivation > ========================= > > In heterogeneous memory systems, controlling memory distribution across > NUMA nodes is essential for performance optimization. This patch enables > system-wide page distribution with target-state goals like "maintain 30% > of scheme-eligible memory on CXL" using PA-mode DAMON schemes. > > What These Metrics Measure > ========================== > > node_eligible_mem_bp: > scheme_eligible_bytes_on_node / total_scheme_eligible_bytes * 10000 > > node_ineligible_mem_bp: > (total - scheme_eligible_bytes_on_node) / total * 10000 > > The metrics are complementary: eligible_bp + ineligible_bp = 10000 bp. > > Two-Scheme Setup for Hot Page Distribution > ========================================== > > For maintaining 30% of hot memory on CXL (node 1): I think it could help easy reading if the above sentence also explains node 0 is DRAM. For example, For maintaining hot memory on DRAM (node 0) and CXL (node 1) in 7:3 ratio: > > PUSH scheme: migrate_hot from node 0 -> node 1 > goal: node_ineligible_mem_bp, nid=0, target=3000 > "Push hot pages out until 30% of hot memory is NOT on DRAM" Seems the sentence assumes the actor is in DRAM. It was not very clear to me. How about making it clear? E.g., "Move hot pages from DRAM to CXL, if more than 70% of hot data is in DRAM" > > PULL scheme: migrate_hot from node 1 -> node 0 > goal: node_eligible_mem_bp, nid=0, target=7000 > "Pull hot pages back until 70% of hot memory IS on DRAM" If the above example is good for you, to be consistent with it, how about rewording like below? "Move hot pages from CXL to DRAM, if less than 70% of hot data is in DRAM" > > The complementary goals create a feedback loop that converges to the > target distribution. > > Dependencies > ============ > > This patch is based on SJ's damon/next branch which includes the > TEMPORAL goal tuner required for these metrics. Your test might be depend on the feature. But this patch series itself is not, as users could also use it with CONSIST tuner? Also, as I mentioned above, the feature is now also in mm-unstable tree. > > Testing Results > =============== > > Functionally tested on a two-node heterogeneous memory system with DRAM > (node 0) and CXL memory (node 1). Used PUSH+PULL scheme configuration > with migrate_hot action to maintain a target hot memory ratio between > the two tiers. > > With the TEMPORAL goal tuner, the system converges quickly to the target > distribution. The tuner drives esz to maximum when under goal and to > zero once the goal is met, forming a simple on/off feedback loop that > stabilizes at the desired ratio. > > With the CONSIST tuner, the scheme still converges but more slowly, as > it migrates and then throttles itself based on quota feedback. The time > to reach the goal varies depending on workload intensity. Sounds reasonable! Do you plan to further evaluate some performance metrics? I'd not strongly request that, but it would be very nice if we can have that. Regardless of your answer to the above question, I think the current code and the test is good enough to consider merging this. I suggest dropping the RFC tag from the next spin. Thank you for doing this, Ravi! Thanks, SJ [...]
On Sat, Mar 21, 2026 at 9:57 AM SeongJae Park <sj@kernel.org> wrote: > > Hello Ravi, > > > Thank you for this patch! TL; DR: Other than trivial things I commented below > and to the patch, I believe it is time to drop the RFC tag, and work on merging > this. > Thanks, SJ, for the prompt and detailed feedback! > On Fri, 20 Mar 2026 12:04:52 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote: > > > This patch introduces two new DAMON quota goal metrics for controlling > > s/DAMON/DAMOS/ ? > Will fix it. > > memory distribution in heterogeneous memory systems (e.g., DRAM and CXL > > memory tiering) using physical address (PA) mode monitoring. > > > > v3: https://lore.kernel.org/linux-mm/20260223123232.12851-1-ravis.opensrc@gmail.com/ > > The above link would better to be put on 'Chage since v3' section below. > Got it. Will take care of it next time. > > > > Changes since v3: > > ================= > > > > - The first two patches from v3 (goal_tuner initialization fix and > > esz=0 quota bypass fix) are now in damon/next. This submission > > It is not also in mm-unstable :) Good to know. Will mention this in the next version. > > > contains only the core metrics patch, rebased on top of those fixes. > > > > - Simplified implementation: removed per-node eligible_bytes array, now > > iterates scheme-eligible regions directly for each goal evaluation. > > > > - Handle regions crossing node boundaries: uses damon_get_folio() to > > determine actual NUMA node placement of each folio rather than > > assuming uniform node placement within a region. > > > > - Pass scheme pointer directly to metric calculation functions, avoiding > > container_of() derivation from quota pointer. > > > > - Fixed 80-column wrapping issues. > > Thank you for addressing all my comments! > > > > > Background and Motivation > > ========================= > > > > In heterogeneous memory systems, controlling memory distribution across > > NUMA nodes is essential for performance optimization. This patch enables > > system-wide page distribution with target-state goals like "maintain 30% > > of scheme-eligible memory on CXL" using PA-mode DAMON schemes. > > > > What These Metrics Measure > > ========================== > > > > node_eligible_mem_bp: > > scheme_eligible_bytes_on_node / total_scheme_eligible_bytes * 10000 > > > > node_ineligible_mem_bp: > > (total - scheme_eligible_bytes_on_node) / total * 10000 > > > > The metrics are complementary: eligible_bp + ineligible_bp = 10000 bp. > > > > Two-Scheme Setup for Hot Page Distribution > > ========================================== > > > > For maintaining 30% of hot memory on CXL (node 1): > > I think it could help easy reading if the above sentence also explains > node 0 is DRAM. For example, > > For maintaining hot memory on DRAM (node 0) and CXL (node 1) in 7:3 ratio: Good suggestion, will clarify the node mapping. > > > > > PUSH scheme: migrate_hot from node 0 -> node 1 > > goal: node_ineligible_mem_bp, nid=0, target=3000 > > "Push hot pages out until 30% of hot memory is NOT on DRAM" > > Seems the sentence assumes the actor is in DRAM. It was not very clear to me. > How about making it clear? E.g., > > "Move hot pages from DRAM to CXL, if more than 70% of hot data is in DRAM" Got it. Will use your suggested wording. > > > > > PULL scheme: migrate_hot from node 1 -> node 0 > > goal: node_eligible_mem_bp, nid=0, target=7000 > > "Pull hot pages back until 70% of hot memory IS on DRAM" > > If the above example is good for you, to be consistent with it, how about > rewording like below? > > "Move hot pages from CXL to DRAM, if less than 70% of hot data is in DRAM" Agreed. Will reword this too. > > > > > The complementary goals create a feedback loop that converges to the > > target distribution. > > > > Dependencies > > ============ > > > > This patch is based on SJ's damon/next branch which includes the > > TEMPORAL goal tuner required for these metrics. > > Your test might be depend on the feature. But this patch series itself is not, > as users could also use it with CONSIST tuner? > Correct, the metrics work with both tuners. Will reword to clarify that testing used TEMPORAL but the patch itself does not depend on it. > Also, as I mentioned above, the feature is now also in mm-unstable tree. > > > > > Testing Results > > =============== > > > > Functionally tested on a two-node heterogeneous memory system with DRAM > > (node 0) and CXL memory (node 1). Used PUSH+PULL scheme configuration > > with migrate_hot action to maintain a target hot memory ratio between > > the two tiers. > > > > With the TEMPORAL goal tuner, the system converges quickly to the target > > distribution. The tuner drives esz to maximum when under goal and to > > zero once the goal is met, forming a simple on/off feedback loop that > > stabilizes at the desired ratio. > > > > With the CONSIST tuner, the scheme still converges but more slowly, as > > it migrates and then throttles itself based on quota feedback. The time > > to reach the goal varies depending on workload intensity. > > Sounds reasonable! > > Do you plan to further evaluate some performance metrics? I'd not strongly > request that, but it would be very nice if we can have that. > Yes, I am planning to run additional tests. I will send v5 addressing all the review comments and dropping the RFC tag. Results will follow as the testing progresses. > Regardless of your answer to the above question, I think the current code and > the test is good enough to consider merging this. I suggest dropping the RFC > tag from the next spin. > > Thank you for doing this, Ravi! > Thank you! Will drop the RFC tag for v5. > > Thanks, > SJ > Best Regards, Ravi. > [...]
On Mon, 23 Mar 2026 12:23:49 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote: > On Sat, Mar 21, 2026 at 9:57 AM SeongJae Park <sj@kernel.org> wrote: [...] > > > > > > Changes since v3: > > > ================= > > > > > > - The first two patches from v3 (goal_tuner initialization fix and > > > esz=0 quota bypass fix) are now in damon/next. This submission > > > > It is not also in mm-unstable :) What I really wanted to say is, s/not/now/ > > Good to know. Will mention this in the next version. I think Ravi understood what I really wanted to mean, though. [...] > Thank you! Will drop the RFC tag for v5. Looking forward to! Thanks, SJ [...]
© 2016 - 2026 Red Hat, Inc.