[RFC v2 5/7] mm/memory: implement MM_CP_DAMON

SeongJae Park posted 7 patches 2 months, 1 week ago
[RFC v2 5/7] mm/memory: implement MM_CP_DAMON
Posted by SeongJae Park 2 months, 1 week ago
DAMON is using Accessed bits of page table entries as the major source
of the access information.  It lacks some additional information such as
which CPU was making the access.  Page faults could be another source of
information for such additional information.

Implement another change_protection() flag for such use case, namely
MM_CP_DAMON.  DAMON will install PAGE_NONE protections using the flag.
To avoid interfering with NUMA_BALANCING, which is also using PAGE_NON
protection, pass the faults to DAMON only when NUMA_BALANCING is
disabled.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 include/linux/mm.h |  1 +
 mm/memory.c        | 53 ++++++++++++++++++++++++++++++++++++++++++++--
 mm/mprotect.c      |  5 +++++
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 21270f1664a4..ad92b77bf782 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2567,6 +2567,7 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen);
 #define  MM_CP_UFFD_WP_RESOLVE             (1UL << 3) /* Resolve wp */
 #define  MM_CP_UFFD_WP_ALL                 (MM_CP_UFFD_WP | \
 					    MM_CP_UFFD_WP_RESOLVE)
+#define MM_CP_DAMON                        (1UL << 4)
 
 bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t pte);
diff --git a/mm/memory.c b/mm/memory.c
index 92fd18a5d8d1..656e610867b0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -75,6 +75,7 @@
 #include <linux/ptrace.h>
 #include <linux/vmalloc.h>
 #include <linux/sched/sysctl.h>
+#include <linux/damon.h>
 
 #include <trace/events/kmem.h>
 
@@ -5972,6 +5973,47 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
 	return VM_FAULT_FALLBACK;
 }
 
+static vm_fault_t do_damon_page(struct vm_fault *vmf, bool huge_pmd)
+{
+	struct damon_access_report access_report = {
+		.addr = vmf->address,
+		.size = 1,
+	};
+	struct vm_area_struct *vma = vmf->vma;
+	struct folio *folio;
+	pte_t pte, old_pte;
+	bool writable = false, ignore_writable = false;
+	bool pte_write_upgrade = vma_wants_manual_pte_write_upgrade(vma);
+
+	if (huge_pmd)
+		access_report.addr = PFN_PHYS(pmd_pfn(vmf->orig_pmd));
+	else
+		access_report.addr = PFN_PHYS(pte_pfn(vmf->orig_pte));
+
+	spin_lock(vmf->ptl);
+	old_pte = ptep_get(vmf->pte);
+	if (unlikely(!pte_same(old_pte, vmf->orig_pte))) {
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+		return 0;
+	}
+	pte = pte_modify(old_pte, vma->vm_page_prot);
+	writable = pte_write(pte);
+	if (!writable && pte_write_upgrade &&
+			can_change_pte_writable(vma, vmf->address, pte))
+		writable = true;
+	folio = vm_normal_folio(vma, vmf->address, pte);
+	if (folio && folio_test_large(folio))
+		numa_rebuild_large_mapping(vmf, vma, folio, pte,
+				ignore_writable, pte_write_upgrade);
+	else
+		numa_rebuild_single_mapping(vmf, vma, vmf->address, vmf->pte,
+				writable);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
+
+	damon_report_access(&access_report);
+	return 0;
+}
+
 /*
  * These routines also need to handle stuff like marking pages dirty
  * and/or accessed for architectures that don't do it in hardware (most
@@ -6036,8 +6078,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	if (!pte_present(vmf->orig_pte))
 		return do_swap_page(vmf);
 
-	if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
+	if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) {
+		if (sysctl_numa_balancing_mode == NUMA_BALANCING_DISABLED)
+			return do_damon_page(vmf, false);
 		return do_numa_page(vmf);
+	}
 
 	spin_lock(vmf->ptl);
 	entry = vmf->orig_pte;
@@ -6159,8 +6204,12 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 			return 0;
 		}
 		if (pmd_trans_huge(vmf.orig_pmd)) {
-			if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
+			if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma)) {
+				if (sysctl_numa_balancing_mode ==
+						NUMA_BALANCING_DISABLED)
+					return do_damon_page(&vmf, true);
 				return do_huge_pmd_numa_page(&vmf);
+			}
 
 			if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) &&
 			    !pmd_write(vmf.orig_pmd)) {
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 78bded7acf79..e8a76114e4f9 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -714,6 +714,11 @@ long change_protection(struct mmu_gather *tlb,
 	WARN_ON_ONCE(cp_flags & MM_CP_PROT_NUMA);
 #endif
 
+#ifdef CONFIG_ARCH_SUPPORTS_NUMA_BALANCING
+	if (cp_flags & MM_CP_DAMON)
+		newprot = PAGE_NONE;
+#endif
+
 	if (is_vm_hugetlb_page(vma))
 		pages = hugetlb_change_protection(vma, start, end, newprot,
 						  cp_flags);
-- 
2.39.5
Re: [RFC v2 5/7] mm/memory: implement MM_CP_DAMON
Posted by Lorenzo Stoakes 2 months, 1 week ago
On Sun, Jul 27, 2025 at 01:18:11PM -0700, SeongJae Park wrote:
> DAMON is using Accessed bits of page table entries as the major source
> of the access information.  It lacks some additional information such as
> which CPU was making the access.  Page faults could be another source of
> information for such additional information.
>
> Implement another change_protection() flag for such use case, namely
> MM_CP_DAMON.  DAMON will install PAGE_NONE protections using the flag.
> To avoid interfering with NUMA_BALANCING, which is also using PAGE_NON
> protection, pass the faults to DAMON only when NUMA_BALANCING is
> disabled.
>
> Signed-off-by: SeongJae Park <sj@kernel.org>

This seems to not be an upstreamable series right now unless I'm missing
something.

Firstly, you're making a change in behaviour even when CONFIG_DAMON is not
specified, and Linus already told you we can't have that default-on.

I'm very very much not happy with us doing something for 'damon'
unconditionaly when !CONFIG_DAMON on the assumption that an acessible
mapping has PROT_NONE set.

This change makes me nervous in general ANYWAY as you are now changing core
mm and introducing a new faulting mechanism specifically for DAMON.

And we are assuming that NUMA balancing being disabled is not racey in a
way that will cause things to break.

I really also dislike the idea of an _implicit_ assumption that we are good
to use the NUMA balancing faulting mechanism to 'tack on' DAMON stuff.

Is it really all that useful to provide a method that requires NUMA
balancing to be diabled?

Finally, you're making it so DAMON can mprotect() something to use the
DAMON/NUMA balancing fault handler, which doesn't appaer to check to see if
NUMA balacing is disabled, but anyway it could be re-enabled?

And then now DAMON is making stuff fault as NUMA balancing faults
incorrectly?

This just seems broken.

Unless there's really good justification I'm really not inclined for us to
merge this as-is right now unfortunately.

> ---
>  include/linux/mm.h |  1 +
>  mm/memory.c        | 53 ++++++++++++++++++++++++++++++++++++++++++++--
>  mm/mprotect.c      |  5 +++++
>  3 files changed, 57 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 21270f1664a4..ad92b77bf782 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2567,6 +2567,7 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen);
>  #define  MM_CP_UFFD_WP_RESOLVE             (1UL << 3) /* Resolve wp */
>  #define  MM_CP_UFFD_WP_ALL                 (MM_CP_UFFD_WP | \
>  					    MM_CP_UFFD_WP_RESOLVE)

/* Comment here :) */

> +#define MM_CP_DAMON                        (1UL << 4)

Shouldn't this be something more specific like MM_CP_DAMON_REPORT_FAULT ?

>
>  bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
>  			     pte_t pte);
> diff --git a/mm/memory.c b/mm/memory.c
> index 92fd18a5d8d1..656e610867b0 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -75,6 +75,7 @@
>  #include <linux/ptrace.h>
>  #include <linux/vmalloc.h>
>  #include <linux/sched/sysctl.h>
> +#include <linux/damon.h>
>
>  #include <trace/events/kmem.h>
>
> @@ -5972,6 +5973,47 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
>  	return VM_FAULT_FALLBACK;
>  }
>
> +static vm_fault_t do_damon_page(struct vm_fault *vmf, bool huge_pmd)

You need to explain what this function is doing at least.

> +{
> +	struct damon_access_report access_report = {
> +		.addr = vmf->address,
> +		.size = 1,
> +	};
> +	struct vm_area_struct *vma = vmf->vma;
> +	struct folio *folio;
> +	pte_t pte, old_pte;
> +	bool writable = false, ignore_writable = false;
> +	bool pte_write_upgrade = vma_wants_manual_pte_write_upgrade(vma);
> +
> +	if (huge_pmd)
> +		access_report.addr = PFN_PHYS(pmd_pfn(vmf->orig_pmd));
> +	else
> +		access_report.addr = PFN_PHYS(pte_pfn(vmf->orig_pte));
> +
> +	spin_lock(vmf->ptl);
> +	old_pte = ptep_get(vmf->pte);
> +	if (unlikely(!pte_same(old_pte, vmf->orig_pte))) {
> +		pte_unmap_unlock(vmf->pte, vmf->ptl);
> +		return 0;
> +	}
> +	pte = pte_modify(old_pte, vma->vm_page_prot);

This is little weird, you're applying VMA protection bits to the PTE then
later asking about protection bits? Can't you just tell from the VMA flags?

Seems like do_numa_page() copy/pasta.

> +	writable = pte_write(pte);
> +	if (!writable && pte_write_upgrade &&
> +			can_change_pte_writable(vma, vmf->address, pte))
> +		writable = true;
> +	folio = vm_normal_folio(vma, vmf->address, pte);
> +	if (folio && folio_test_large(folio))
> +		numa_rebuild_large_mapping(vmf, vma, folio, pte,
> +				ignore_writable, pte_write_upgrade);
> +	else
> +		numa_rebuild_single_mapping(vmf, vma, vmf->address, vmf->pte,
> +				writable);
> +	pte_unmap_unlock(vmf->pte, vmf->ptl);
> +

All of this seems to be duplicating aspects of do_numa_page().

Seems more sensible, if it makes sense to accept this change (I'm still a
bit dubious about changing the faulting mechanism here), we should probably
implement a 'do_non_present_acessible()' function that avoids duplication +
bit-rot, with #ifdef CONFIG_DAMON's in place, or even better, just a hook
into damon code that's static inline void {} if not enabled.

Note that the numa fault handling logic is called _even if numa balancing
is disabled_. So we'd avoid all the other changes too.

> +	damon_report_access(&access_report);
> +	return 0;
> +}
> +
>  /*
>   * These routines also need to handle stuff like marking pages dirty
>   * and/or accessed for architectures that don't do it in hardware (most
> @@ -6036,8 +6078,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  	if (!pte_present(vmf->orig_pte))
>  		return do_swap_page(vmf);
>
> -	if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
> +	if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) {
> +		if (sysctl_numa_balancing_mode == NUMA_BALANCING_DISABLED)
> +			return do_damon_page(vmf, false);

Making an assumption here that, even if damon is disabled, we should handle
a 'damon' fault is just icky, sorry.

Also are we 100% sure that there's no races here? When we disable numa
balancing do we correctly ensure that absolutely no racing NUMA balancing
faults can happen whatsever at this juncture?

Have you checked that and ensured that to be the case?

You really have to be 100% certain you're not going to wrongly handle NUMA
page faults this way, on a potentially non-CONFIG_DAMON kernel to boot.

Keep in mind fault handling is verrrry racey (purposefully) and can be done
under VMA read lock alone (and that's only very loosely a lock!).

This is just, yeah, concerning.

>  		return do_numa_page(vmf);
> +	}
>
>  	spin_lock(vmf->ptl);
>  	entry = vmf->orig_pte;
> @@ -6159,8 +6204,12 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>  			return 0;
>  		}
>  		if (pmd_trans_huge(vmf.orig_pmd)) {
> -			if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
> +			if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma)) {
> +				if (sysctl_numa_balancing_mode ==
> +						NUMA_BALANCING_DISABLED)
> +					return do_damon_page(&vmf, true);

Now we have _more_ weirdness around what CONFIG_TRANSPARENT_HUGEPAGE
enables/disables.

>  				return do_huge_pmd_numa_page(&vmf);

This function will be called only if THP is enabled, but now damon
overrides that...

Let's try and make this consistent.

> +			}
>
>  			if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) &&
>  			    !pmd_write(vmf.orig_pmd)) {
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 78bded7acf79..e8a76114e4f9 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -714,6 +714,11 @@ long change_protection(struct mmu_gather *tlb,
>  	WARN_ON_ONCE(cp_flags & MM_CP_PROT_NUMA);
>  #endif
>
> +#ifdef CONFIG_ARCH_SUPPORTS_NUMA_BALANCING
> +	if (cp_flags & MM_CP_DAMON)
> +		newprot = PAGE_NONE;
> +#endif

OK this just seems broken.

Firstly, again you're introducing behaviour that applies even if DAMON is
disabled. Let's please not.

And predicating on -the architecture even supporting NUMA balancing- seems
rather bizarre?

Secondly, somebody can disable/enable NUMA balancing, and thus you are now
allowing this function to, when somebody specifies 'DAMON', get NUMA
balancing fault handling??

If you don't bother checking whether NUMA balancing is disabled when you
set it, then boom - you've broken the faulting mechanism, but even if you
_do_, somebody can just switch it on again...

Sorry but unless I'm wildly missing something here we're going to need a
new faulting mechanism (if we even want to allow DAMON to have its own
fault handling that is!)

> +
>  	if (is_vm_hugetlb_page(vma))
>  		pages = hugetlb_change_protection(vma, start, end, newprot,
>  						  cp_flags);
> --
> 2.39.5

Cheers, Lorenzo
Re: [RFC v2 5/7] mm/memory: implement MM_CP_DAMON
Posted by SeongJae Park 2 months, 1 week ago
On Mon, 28 Jul 2025 06:19:35 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> On Sun, Jul 27, 2025 at 01:18:11PM -0700, SeongJae Park wrote:
> > DAMON is using Accessed bits of page table entries as the major source
> > of the access information.  It lacks some additional information such as
> > which CPU was making the access.  Page faults could be another source of
> > information for such additional information.
> >
> > Implement another change_protection() flag for such use case, namely
> > MM_CP_DAMON.  DAMON will install PAGE_NONE protections using the flag.
> > To avoid interfering with NUMA_BALANCING, which is also using PAGE_NON
> > protection, pass the faults to DAMON only when NUMA_BALANCING is
> > disabled.
> >
> > Signed-off-by: SeongJae Park <sj@kernel.org>
> 
> This seems to not be an upstreamable series right now unless I'm missing
> something.
> 
> Firstly, you're making a change in behaviour even when CONFIG_DAMON is not
> specified, and Linus already told you we can't have that default-on.
> 
> I'm very very much not happy with us doing something for 'damon'
> unconditionaly when !CONFIG_DAMON on the assumption that an acessible
> mapping has PROT_NONE set.
> 
> This change makes me nervous in general ANYWAY as you are now changing core
> mm and introducing a new faulting mechanism specifically for DAMON.
> 
> And we are assuming that NUMA balancing being disabled is not racey in a
> way that will cause things to break.
> 
> I really also dislike the idea of an _implicit_ assumption that we are good
> to use the NUMA balancing faulting mechanism to 'tack on' DAMON stuff.
> 
> Is it really all that useful to provide a method that requires NUMA
> balancing to be diabled?
> 
> Finally, you're making it so DAMON can mprotect() something to use the
> DAMON/NUMA balancing fault handler, which doesn't appaer to check to see if
> NUMA balacing is disabled, but anyway it could be re-enabled?
> 
> And then now DAMON is making stuff fault as NUMA balancing faults
> incorrectly?
> 
> This just seems broken.
> 
> Unless there's really good justification I'm really not inclined for us to
> merge this as-is right now unfortunately.

Thank you for review and comments, Lorenzo.  I fundamentally agree all your
points.  I don't aim to merge this as-is.  Actually this patch series is more
like POC, but apparently I was rushing.  I will try to adjust your concerns in
the next version.

> 
> > ---
> >  include/linux/mm.h |  1 +
> >  mm/memory.c        | 53 ++++++++++++++++++++++++++++++++++++++++++++--
> >  mm/mprotect.c      |  5 +++++
> >  3 files changed, 57 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 21270f1664a4..ad92b77bf782 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2567,6 +2567,7 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen);
> >  #define  MM_CP_UFFD_WP_RESOLVE             (1UL << 3) /* Resolve wp */
> >  #define  MM_CP_UFFD_WP_ALL                 (MM_CP_UFFD_WP | \
> >  					    MM_CP_UFFD_WP_RESOLVE)
> 
> /* Comment here :) */

I will add the comments on the next version.

> 
> > +#define MM_CP_DAMON                        (1UL << 4)
> 
> Shouldn't this be something more specific like MM_CP_DAMON_REPORT_FAULT ?

Thank you for the suggestion, I will use it.

> 
> >
> >  bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
> >  			     pte_t pte);
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 92fd18a5d8d1..656e610867b0 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -75,6 +75,7 @@
> >  #include <linux/ptrace.h>
> >  #include <linux/vmalloc.h>
> >  #include <linux/sched/sysctl.h>
> > +#include <linux/damon.h>
> >
> >  #include <trace/events/kmem.h>
> >
> > @@ -5972,6 +5973,47 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
> >  	return VM_FAULT_FALLBACK;
> >  }
> >
> > +static vm_fault_t do_damon_page(struct vm_fault *vmf, bool huge_pmd)
> 
> You need to explain what this function is doing at least.

This function lifts the protection and reports the access information to DAMON.
I will add more comments on the next version.

> 
> > +{
> > +	struct damon_access_report access_report = {
> > +		.addr = vmf->address,
> > +		.size = 1,
> > +	};
> > +	struct vm_area_struct *vma = vmf->vma;
> > +	struct folio *folio;
> > +	pte_t pte, old_pte;
> > +	bool writable = false, ignore_writable = false;
> > +	bool pte_write_upgrade = vma_wants_manual_pte_write_upgrade(vma);
> > +
> > +	if (huge_pmd)
> > +		access_report.addr = PFN_PHYS(pmd_pfn(vmf->orig_pmd));
> > +	else
> > +		access_report.addr = PFN_PHYS(pte_pfn(vmf->orig_pte));
> > +
> > +	spin_lock(vmf->ptl);
> > +	old_pte = ptep_get(vmf->pte);
> > +	if (unlikely(!pte_same(old_pte, vmf->orig_pte))) {
> > +		pte_unmap_unlock(vmf->pte, vmf->ptl);
> > +		return 0;
> > +	}
> > +	pte = pte_modify(old_pte, vma->vm_page_prot);
> 
> This is little weird, you're applying VMA protection bits to the PTE then
> later asking about protection bits? Can't you just tell from the VMA flags?
> 
> Seems like do_numa_page() copy/pasta.
> 
> > +	writable = pte_write(pte);
> > +	if (!writable && pte_write_upgrade &&
> > +			can_change_pte_writable(vma, vmf->address, pte))
> > +		writable = true;
> > +	folio = vm_normal_folio(vma, vmf->address, pte);
> > +	if (folio && folio_test_large(folio))
> > +		numa_rebuild_large_mapping(vmf, vma, folio, pte,
> > +				ignore_writable, pte_write_upgrade);
> > +	else
> > +		numa_rebuild_single_mapping(vmf, vma, vmf->address, vmf->pte,
> > +				writable);
> > +	pte_unmap_unlock(vmf->pte, vmf->ptl);
> > +
> 
> All of this seems to be duplicating aspects of do_numa_page().

You are correct.

> 
> Seems more sensible, if it makes sense to accept this change (I'm still a
> bit dubious about changing the faulting mechanism here), we should probably
> implement a 'do_non_present_acessible()' function that avoids duplication +
> bit-rot, with #ifdef CONFIG_DAMON's in place, or even better, just a hook
> into damon code that's static inline void {} if not enabled.

I will do so in the next version.

> 
> Note that the numa fault handling logic is called _even if numa balancing
> is disabled_. So we'd avoid all the other changes too.
> 
> > +	damon_report_access(&access_report);
> > +	return 0;
> > +}
> > +
> >  /*
> >   * These routines also need to handle stuff like marking pages dirty
> >   * and/or accessed for architectures that don't do it in hardware (most
> > @@ -6036,8 +6078,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> >  	if (!pte_present(vmf->orig_pte))
> >  		return do_swap_page(vmf);
> >
> > -	if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
> > +	if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) {
> > +		if (sysctl_numa_balancing_mode == NUMA_BALANCING_DISABLED)
> > +			return do_damon_page(vmf, false);
> 
> Making an assumption here that, even if damon is disabled, we should handle
> a 'damon' fault is just icky, sorry.

Agreed.  I will make this happen only when CONFIG_DAMON is enabled.

> 
> Also are we 100% sure that there's no races here? When we disable numa
> balancing do we correctly ensure that absolutely no racing NUMA balancing
> faults can happen whatsever at this juncture?

Enabling CONFIG_DAMON will not automatically invoke change_protection() with
MM_CP_DAMON.  Such invocations will be made only if the user disables NUMA
balancing and run DAMON in the reporting mode.

So there can be two racy cases.

If the user enables NUMA balancing and then disables it after a while, page
faults for MM_CP_PROT_NUMA can be handled by DAMON.  That could look odd, but
there should be no real problem in practice.  DAMON's fault handling will
cleanup the PAGE_NONE protection like NUMA balancing, and DAMON has no problem
at receiving such additional reports from MM_CP_PROT_NUMA-caused faults.  DAMON
may show a few more than expected accesses, but that's no problem for DAMON.

Similarly, if the user starts DAMON in the report mode, stops DAMON, and then
enables NUMA balancing, faults for MM_CP_DAMON that installed while DAMON was
running in the report mode can be handled by NUMA balancing.  This should also
not make real problems in practice in my opinion.  NUMA balancing could see
more accesses and migrate pages little bit more than expected, but that should
be just for a moment.

Maybe I'm missing something, or not well explaining my thoughts.  I will be
happy to learn what I'm missing, or get chances to further clarify my points.

> 
> Have you checked that and ensured that to be the case?
> 
> You really have to be 100% certain you're not going to wrongly handle NUMA
> page faults this way, on a potentially non-CONFIG_DAMON kernel to boot.

I will ensure that never happens on CONFIG_DAMON disabled kernels, in the next
version.

> 
> Keep in mind fault handling is verrrry racey (purposefully) and can be done
> under VMA read lock alone (and that's only very loosely a lock!).
> 
> This is just, yeah, concerning.

Thank you for the caution.  DAMON's fault handling code only saves the
information in its internal ring buffer.  It doesn't touch vmas.  So I think
there should be no such problems.  I will add the clarification on the next
version.

> 
> >  		return do_numa_page(vmf);
> > +	}
> >
> >  	spin_lock(vmf->ptl);
> >  	entry = vmf->orig_pte;
> > @@ -6159,8 +6204,12 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> >  			return 0;
> >  		}
> >  		if (pmd_trans_huge(vmf.orig_pmd)) {
> > -			if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
> > +			if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma)) {
> > +				if (sysctl_numa_balancing_mode ==
> > +						NUMA_BALANCING_DISABLED)
> > +					return do_damon_page(&vmf, true);
> 
> Now we have _more_ weirdness around what CONFIG_TRANSPARENT_HUGEPAGE
> enables/disables.
> 
> >  				return do_huge_pmd_numa_page(&vmf);
> 
> This function will be called only if THP is enabled, but now damon
> overrides that...
> 
> Let's try and make this consistent.

I will do so in the next version.

> 
> > +			}
> >
> >  			if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) &&
> >  			    !pmd_write(vmf.orig_pmd)) {
> > diff --git a/mm/mprotect.c b/mm/mprotect.c
> > index 78bded7acf79..e8a76114e4f9 100644
> > --- a/mm/mprotect.c
> > +++ b/mm/mprotect.c
> > @@ -714,6 +714,11 @@ long change_protection(struct mmu_gather *tlb,
> >  	WARN_ON_ONCE(cp_flags & MM_CP_PROT_NUMA);
> >  #endif
> >
> > +#ifdef CONFIG_ARCH_SUPPORTS_NUMA_BALANCING
> > +	if (cp_flags & MM_CP_DAMON)
> > +		newprot = PAGE_NONE;
> > +#endif
> 
> OK this just seems broken.
> 
> Firstly, again you're introducing behaviour that applies even if DAMON is
> disabled. Let's please not.
> 
> And predicating on -the architecture even supporting NUMA balancing- seems
> rather bizarre?

Agree, I will ensure there will be no change on CONFIG_DAMON disabled kernels,
in the next version.

> 
> Secondly, somebody can disable/enable NUMA balancing, and thus you are now
> allowing this function to, when somebody specifies 'DAMON', get NUMA
> balancing fault handling??
> 
> If you don't bother checking whether NUMA balancing is disabled when you
> set it, then boom - you've broken the faulting mechanism, but even if you
> _do_, somebody can just switch it on again...

As I explained on the two racy cases aboe, faults that caused by DAMON or NUMA
balancing can be handled by the other's handling code, but only for a limited
time under the user's controls.  But to my understanding that should not cause
real problems in the practice, and users wouldn't be suggested to operate the
system in such a way.  I will add more documents and cautions about that in the
next version.

Again, I may be missing something, and would be happy to be enlightened or
get more chances to clarify my points.

> 
> Sorry but unless I'm wildly missing something here we're going to need a
> new faulting mechanism (if we even want to allow DAMON to have its own
> fault handling that is!)

No worry, I completely agree to your main points.  I think I was bettr to
explain more about the racy cases and ensure no interference on CONFIG_DAMON
disabled case, and thank you for letting me know that.

Maybe we need more discussions on the racy cases, but anyway I will try to
address all your concerns in the next version.


Thanks,
SJ

[...]
Re: [RFC v2 5/7] mm/memory: implement MM_CP_DAMON
Posted by Lorenzo Stoakes 2 months, 1 week ago
On Mon, Jul 28, 2025 at 08:06:32PM -0700, SeongJae Park wrote:
> On Mon, 28 Jul 2025 06:19:35 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
>
> > On Sun, Jul 27, 2025 at 01:18:11PM -0700, SeongJae Park wrote:
> > > DAMON is using Accessed bits of page table entries as the major source
> > > of the access information.  It lacks some additional information such as
> > > which CPU was making the access.  Page faults could be another source of
> > > information for such additional information.
> > >
> > > Implement another change_protection() flag for such use case, namely
> > > MM_CP_DAMON.  DAMON will install PAGE_NONE protections using the flag.
> > > To avoid interfering with NUMA_BALANCING, which is also using PAGE_NON
> > > protection, pass the faults to DAMON only when NUMA_BALANCING is
> > > disabled.
> > >
> > > Signed-off-by: SeongJae Park <sj@kernel.org>
> >
> > This seems to not be an upstreamable series right now unless I'm missing
> > something.
> >
> > Firstly, you're making a change in behaviour even when CONFIG_DAMON is not
> > specified, and Linus already told you we can't have that default-on.
> >
> > I'm very very much not happy with us doing something for 'damon'
> > unconditionaly when !CONFIG_DAMON on the assumption that an acessible
> > mapping has PROT_NONE set.
> >
> > This change makes me nervous in general ANYWAY as you are now changing core
> > mm and introducing a new faulting mechanism specifically for DAMON.
> >
> > And we are assuming that NUMA balancing being disabled is not racey in a
> > way that will cause things to break.
> >
> > I really also dislike the idea of an _implicit_ assumption that we are good
> > to use the NUMA balancing faulting mechanism to 'tack on' DAMON stuff.
> >
> > Is it really all that useful to provide a method that requires NUMA
> > balancing to be diabled?
> >
> > Finally, you're making it so DAMON can mprotect() something to use the
> > DAMON/NUMA balancing fault handler, which doesn't appaer to check to see if
> > NUMA balacing is disabled, but anyway it could be re-enabled?
> >
> > And then now DAMON is making stuff fault as NUMA balancing faults
> > incorrectly?
> >
> > This just seems broken.
> >
> > Unless there's really good justification I'm really not inclined for us to
> > merge this as-is right now unfortunately.
>
> Thank you for review and comments, Lorenzo.  I fundamentally agree all your
> points.  I don't aim to merge this as-is.  Actually this patch series is more
> like POC, but apparently I was rushing.  I will try to adjust your concerns in
> the next version.

Thanks.

I do wonder whether we really can have a whole new faulting mechanism just for
DAMON. Because if in future, we wanted to change how this worked, we'd be
constrained, and it is a very specific user.

The issue is you need the PTE to be restored to its previous state, just like
NUMA balancing.

And I really really do not like this 'oh if you turn it off you can use it for
DAMON' thing, it's just really odd and asking for trouble.

I think the only _workable_ version of this would be to convert the numa
handling to a generic 'fault then restore' type mechanism that could be hooked
in to by either NUMA or DAMON.

But really I think you'd _need_ to not have significantly different behaviour
between the two and _not_ constrain this to being only when NUMA balancing is
disabled.

But then you'd need to know _this_ PTE is for NUMA balancing vs. another is for
this stuff.

I'm not really sure there is an upstreamable version of this, but it'd need to
be made generic like that if there were.

I think it might be worth taking some time to examine whether a version of this
that can be sensibly generic (which could have hooks for things) is _possible_
before sending a v2.

> >
> > Also are we 100% sure that there's no races here? When we disable numa
> > balancing do we correctly ensure that absolutely no racing NUMA balancing
> > faults can happen whatsever at this juncture?
>
> Enabling CONFIG_DAMON will not automatically invoke change_protection() with
> MM_CP_DAMON.  Such invocations will be made only if the user disables NUMA
> balancing and run DAMON in the reporting mode.
>
> So there can be two racy cases.
>
> If the user enables NUMA balancing and then disables it after a while, page
> faults for MM_CP_PROT_NUMA can be handled by DAMON.  That could look odd, but
> there should be no real problem in practice.  DAMON's fault handling will
> cleanup the PAGE_NONE protection like NUMA balancing, and DAMON has no problem
> at receiving such additional reports from MM_CP_PROT_NUMA-caused faults.  DAMON
> may show a few more than expected accesses, but that's no problem for DAMON.
>
> Similarly, if the user starts DAMON in the report mode, stops DAMON, and then
> enables NUMA balancing, faults for MM_CP_DAMON that installed while DAMON was
> running in the report mode can be handled by NUMA balancing.  This should also
> not make real problems in practice in my opinion.  NUMA balancing could see
> more accesses and migrate pages little bit more than expected, but that should
> be just for a moment.

I'm really concerned about these.

We're now introducing unexpected behaviour based on a race and allowing faults
to be mis-handled.

I'm maybe not as confident as you are that everything will 'just work' and it
seems like we're asking for obscure bugs in the future.

> > You really have to be 100% certain you're not going to wrongly handle NUMA
> > page faults this way, on a potentially non-CONFIG_DAMON kernel to boot.
>
> I will ensure that never happens on CONFIG_DAMON disabled kernels, in the next
> version.

Well, in the above you say that you can't help but do that when a race occurs?

>
> >
> > Keep in mind fault handling is verrrry racey (purposefully) and can be done
> > under VMA read lock alone (and that's only very loosely a lock!).
> >
> > This is just, yeah, concerning.
>
> Thank you for the caution.  DAMON's fault handling code only saves the
> information in its internal ring buffer.  It doesn't touch vmas.  So I think
> there should be no such problems.  I will add the clarification on the next
> version.

Right, I'm just saying that this all being super racey between NUMA
enabled/disabled seems pretty unavoidable, but we covered above.

> >
> > Secondly, somebody can disable/enable NUMA balancing, and thus you are now
> > allowing this function to, when somebody specifies 'DAMON', get NUMA
> > balancing fault handling??
> >
> > If you don't bother checking whether NUMA balancing is disabled when you
> > set it, then boom - you've broken the faulting mechanism, but even if you
> > _do_, somebody can just switch it on again...
>
> As I explained on the two racy cases aboe, faults that caused by DAMON or NUMA
> balancing can be handled by the other's handling code, but only for a limited
> time under the user's controls.  But to my understanding that should not cause
> real problems in the practice, and users wouldn't be suggested to operate the
> system in such a way.  I will add more documents and cautions about that in the
> next version.

I really don't think a version of the code that results in the wrong handler
being used is upstreamable, sorry.

I've not dug into the nitty gritty details on what would happen in both cases,
but even if it were 100% fine _now_, this is _exactly_ the kind of thing that
results in horrible hard-to-debug issues later, should something change.

Implicitly 'just having to know' that we might be in the wrong fault handler
seems like asking for trouble, and the RoI on an optional profiling tool (this
is not to take away from DAMON which is a _great_ utility, I'm saying it's
simply not part of the _core_) isn't there.
Re: [RFC v2 5/7] mm/memory: implement MM_CP_DAMON
Posted by SeongJae Park 2 months, 1 week ago
On Tue, 29 Jul 2025 10:40:11 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> On Mon, Jul 28, 2025 at 08:06:32PM -0700, SeongJae Park wrote:
> > On Mon, 28 Jul 2025 06:19:35 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> >
> > > On Sun, Jul 27, 2025 at 01:18:11PM -0700, SeongJae Park wrote:
> > > > DAMON is using Accessed bits of page table entries as the major source
> > > > of the access information.  It lacks some additional information such as
> > > > which CPU was making the access.  Page faults could be another source of
> > > > information for such additional information.
> > > >
> > > > Implement another change_protection() flag for such use case, namely
> > > > MM_CP_DAMON.  DAMON will install PAGE_NONE protections using the flag.
> > > > To avoid interfering with NUMA_BALANCING, which is also using PAGE_NON
> > > > protection, pass the faults to DAMON only when NUMA_BALANCING is
> > > > disabled.
> > > >
> > > > Signed-off-by: SeongJae Park <sj@kernel.org>
> > >
> > > This seems to not be an upstreamable series right now unless I'm missing
> > > something.
> > >
> > > Firstly, you're making a change in behaviour even when CONFIG_DAMON is not
> > > specified, and Linus already told you we can't have that default-on.
> > >
> > > I'm very very much not happy with us doing something for 'damon'
> > > unconditionaly when !CONFIG_DAMON on the assumption that an acessible
> > > mapping has PROT_NONE set.
> > >
> > > This change makes me nervous in general ANYWAY as you are now changing core
> > > mm and introducing a new faulting mechanism specifically for DAMON.
> > >
> > > And we are assuming that NUMA balancing being disabled is not racey in a
> > > way that will cause things to break.
> > >
> > > I really also dislike the idea of an _implicit_ assumption that we are good
> > > to use the NUMA balancing faulting mechanism to 'tack on' DAMON stuff.
> > >
> > > Is it really all that useful to provide a method that requires NUMA
> > > balancing to be diabled?
> > >
> > > Finally, you're making it so DAMON can mprotect() something to use the
> > > DAMON/NUMA balancing fault handler, which doesn't appaer to check to see if
> > > NUMA balacing is disabled, but anyway it could be re-enabled?
> > >
> > > And then now DAMON is making stuff fault as NUMA balancing faults
> > > incorrectly?
> > >
> > > This just seems broken.
> > >
> > > Unless there's really good justification I'm really not inclined for us to
> > > merge this as-is right now unfortunately.
> >
> > Thank you for review and comments, Lorenzo.  I fundamentally agree all your
> > points.  I don't aim to merge this as-is.  Actually this patch series is more
> > like POC, but apparently I was rushing.  I will try to adjust your concerns in
> > the next version.
> 
> Thanks.
> 
> I do wonder whether we really can have a whole new faulting mechanism just for
> DAMON. Because if in future, we wanted to change how this worked, we'd be
> constrained, and it is a very specific user.
> 
> The issue is you need the PTE to be restored to its previous state, just like
> NUMA balancing.
> 
> And I really really do not like this 'oh if you turn it off you can use it for
> DAMON' thing, it's just really odd and asking for trouble.
> 
> I think the only _workable_ version of this would be to convert the numa
> handling to a generic 'fault then restore' type mechanism that could be hooked
> in to by either NUMA or DAMON.

I agree, and this is my current plan for the next version of this patch.

> 
> But really I think you'd _need_ to not have significantly different behaviour
> between the two and _not_ constrain this to being only when NUMA balancing is
> disabled.

I agree all the points.  Especially the current interface is ambiguous and easy
to mistake.

> 
> But then you'd need to know _this_ PTE is for NUMA balancing vs. another is for
> this stuff.

Yes, this would be the ideal.  But, memorizing something in page level is ...
always an interesting challenge in my opinion, and I have no good idea to do
this for now :)

> 
> I'm not really sure there is an upstreamable version of this, but it'd need to
> be made generic like that if there were.
> 
> I think it might be worth taking some time to examine whether a version of this
> that can be sensibly generic (which could have hooks for things) is _possible_
> before sending a v2.

Agreed, I don't need to rush.  Let's take time and discuss sufficiently. :)

Nonetheless, I may post a followup version of this patch series that contains
this one, even before we get a conclusion about this specific one.  I think I
may have to do that, for sharing the future idea in a way easy to understand
and test.  I think it might also help us at understanding the real ROI of this
patch, and if there is another option to move forward.  In the case, I will of
course keep RFC tag and clearly note that this patch is still under the
discussion and not willing to be merged as is before the discussion is done.

> 
> > >
> > > Also are we 100% sure that there's no races here? When we disable numa
> > > balancing do we correctly ensure that absolutely no racing NUMA balancing
> > > faults can happen whatsever at this juncture?
> >
> > Enabling CONFIG_DAMON will not automatically invoke change_protection() with
> > MM_CP_DAMON.  Such invocations will be made only if the user disables NUMA
> > balancing and run DAMON in the reporting mode.
> >
> > So there can be two racy cases.
> >
> > If the user enables NUMA balancing and then disables it after a while, page
> > faults for MM_CP_PROT_NUMA can be handled by DAMON.  That could look odd, but
> > there should be no real problem in practice.  DAMON's fault handling will
> > cleanup the PAGE_NONE protection like NUMA balancing, and DAMON has no problem
> > at receiving such additional reports from MM_CP_PROT_NUMA-caused faults.  DAMON
> > may show a few more than expected accesses, but that's no problem for DAMON.
> >
> > Similarly, if the user starts DAMON in the report mode, stops DAMON, and then
> > enables NUMA balancing, faults for MM_CP_DAMON that installed while DAMON was
> > running in the report mode can be handled by NUMA balancing.  This should also
> > not make real problems in practice in my opinion.  NUMA balancing could see
> > more accesses and migrate pages little bit more than expected, but that should
> > be just for a moment.
> 
> I'm really concerned about these.
> 
> We're now introducing unexpected behaviour based on a race and allowing faults
> to be mis-handled.
> 
> I'm maybe not as confident as you are that everything will 'just work' and it
> seems like we're asking for obscure bugs in the future.
> 
> > > You really have to be 100% certain you're not going to wrongly handle NUMA
> > > page faults this way, on a potentially non-CONFIG_DAMON kernel to boot.
> >
> > I will ensure that never happens on CONFIG_DAMON disabled kernels, in the next
> > version.
> 
> Well, in the above you say that you can't help but do that when a race occurs?

I mean, I can't help when CONFIG_DAMON is enabled.

The race (or, handling faults that caused by other entity) can hppen if and
only if all below things happen.

1. CONFIG_DAMON is enbled.
2. CONFIG_NUMA_BALANCING is enabled.
3. The user repeatedly turns on and off NUMA balancing and fault-based mode
   DAMON in runtime.

If any of these are not true, the race can completely avoided.  I was saying
about the case that the first condition is not met.

> 
> >
> > >
> > > Keep in mind fault handling is verrrry racey (purposefully) and can be done
> > > under VMA read lock alone (and that's only very loosely a lock!).
> > >
> > > This is just, yeah, concerning.
> >
> > Thank you for the caution.  DAMON's fault handling code only saves the
> > information in its internal ring buffer.  It doesn't touch vmas.  So I think
> > there should be no such problems.  I will add the clarification on the next
> > version.
> 
> Right, I'm just saying that this all being super racey between NUMA
> enabled/disabled seems pretty unavoidable, but we covered above.
> 
> > >
> > > Secondly, somebody can disable/enable NUMA balancing, and thus you are now
> > > allowing this function to, when somebody specifies 'DAMON', get NUMA
> > > balancing fault handling??
> > >
> > > If you don't bother checking whether NUMA balancing is disabled when you
> > > set it, then boom - you've broken the faulting mechanism, but even if you
> > > _do_, somebody can just switch it on again...
> >
> > As I explained on the two racy cases aboe, faults that caused by DAMON or NUMA
> > balancing can be handled by the other's handling code, but only for a limited
> > time under the user's controls.  But to my understanding that should not cause
> > real problems in the practice, and users wouldn't be suggested to operate the
> > system in such a way.  I will add more documents and cautions about that in the
> > next version.
> 
> I really don't think a version of the code that results in the wrong handler
> being used is upstreamable, sorry.
> 
> I've not dug into the nitty gritty details on what would happen in both cases,
> but even if it were 100% fine _now_, this is _exactly_ the kind of thing that
> results in horrible hard-to-debug issues later, should something change.
> 
> Implicitly 'just having to know' that we might be in the wrong fault handler
> seems like asking for trouble, and the RoI on an optional profiling tool (this
> is not to take away from DAMON which is a _great_ utility, I'm saying it's
> simply not part of the _core_) isn't there.

I completely understand your concerns, thank you for nicely and patiently
keeping this discussion.  I don't need to upstream this in short term, and open
to every option.  So let's take sufficient time and discussions.

I will take more time to think about a few potential options to move forward,
and share those with a level of details that can help us easily further
discuss, hopefully in a few days.


Thanks,
SJ
Re: [RFC v2 5/7] mm/memory: implement MM_CP_DAMON
Posted by Lorenzo Stoakes 2 months ago
On Tue, Jul 29, 2025 at 09:21:06PM -0700, SeongJae Park wrote:
> > > Thank you for review and comments, Lorenzo.  I fundamentally agree all your
> > > points.  I don't aim to merge this as-is.  Actually this patch series is more
> > > like POC, but apparently I was rushing.  I will try to adjust your concerns in
> > > the next version.
> >
> > Thanks.
> >
> > I do wonder whether we really can have a whole new faulting mechanism just for
> > DAMON. Because if in future, we wanted to change how this worked, we'd be
> > constrained, and it is a very specific user.
> >
> > The issue is you need the PTE to be restored to its previous state, just like
> > NUMA balancing.
> >
> > And I really really do not like this 'oh if you turn it off you can use it for
> > DAMON' thing, it's just really odd and asking for trouble.
> >
> > I think the only _workable_ version of this would be to convert the numa
> > handling to a generic 'fault then restore' type mechanism that could be hooked
> > in to by either NUMA or DAMON.
>
> I agree, and this is my current plan for the next version of this patch.

I think we'd need some way to embed this information somehow, but I'll see what
you come up with :)

>
> >
> > But really I think you'd _need_ to not have significantly different behaviour
> > between the two and _not_ constrain this to being only when NUMA balancing is
> > disabled.
>
> I agree all the points.  Especially the current interface is ambiguous and easy
> to mistake.
>
> >
> > But then you'd need to know _this_ PTE is for NUMA balancing vs. another is for
> > this stuff.
>
> Yes, this would be the ideal.  But, memorizing something in page level is ...
> always an interesting challenge in my opinion, and I have no good idea to do
> this for now :)

Yes this is the crux, or another approach will need to be taken.

>
> >
> > I'm not really sure there is an upstreamable version of this, but it'd need to
> > be made generic like that if there were.
> >
> > I think it might be worth taking some time to examine whether a version of this
> > that can be sensibly generic (which could have hooks for things) is _possible_
> > before sending a v2.
>
> Agreed, I don't need to rush.  Let's take time and discuss sufficiently. :)

Worth just playing wtih different solution.

>
> Nonetheless, I may post a followup version of this patch series that contains
> this one, even before we get a conclusion about this specific one.  I think I
> may have to do that, for sharing the future idea in a way easy to understand
> and test.  I think it might also help us at understanding the real ROI of this
> patch, and if there is another option to move forward.  In the case, I will of
> course keep RFC tag and clearly note that this patch is still under the
> discussion and not willing to be merged as is before the discussion is done.

OK, as long as you please highlight this, indicating that you ack it's not
mergeable with the current approach.

> > Well, in the above you say that you can't help but do that when a race occurs?
>
> I mean, I can't help when CONFIG_DAMON is enabled.
>
> The race (or, handling faults that caused by other entity) can hppen if and
> only if all below things happen.
>
> 1. CONFIG_DAMON is enbled.
> 2. CONFIG_NUMA_BALANCING is enabled.
> 3. The user repeatedly turns on and off NUMA balancing and fault-based mode
>    DAMON in runtime.
>
> If any of these are not true, the race can completely avoided.  I was saying
> about the case that the first condition is not met.

Well not necessarily continually, just at any stage.

I do think you're limiting yourself rather by requiring NUMA balancing to be
switched off.

But in any case, for reasons previously mentioned, I don't like this approach
and I don't think it's at all workable, rather we either need a generic 'restore
PTE state' mechanism (that can somehow store 'both OR either' NUMA/DAMON state
with them), or an entirely different approach.

Another idea might be that of converting the current NUMA balancing to something
generic, whose semantics are such that:

- It restores PTE state
- It 'happens' to relocate the mapping to the node the CPU from which the access
  was made resides upon
- It also calls a hook.

So treat the NUMA balancing aspect as sort of implicit.

Then you can just use the machinery for marking ranges for NUMA balancing for
this, having defined a hook.

If you're indifferent as to whether you get the hook called for arbitrary NUMA
balancing faults, or have a way to identify whether or not those mappings were
marked for DAMON stats, then you have what you need without having to customise
_anything_ in the general faulting code for DAMON.

Instead you'd just set up the hook for DAMON.

This is just an untested and highly theoretical thought :)

> >
> > I really don't think a version of the code that results in the wrong handler
> > being used is upstreamable, sorry.
> >
> > I've not dug into the nitty gritty details on what would happen in both cases,
> > but even if it were 100% fine _now_, this is _exactly_ the kind of thing that
> > results in horrible hard-to-debug issues later, should something change.
> >
> > Implicitly 'just having to know' that we might be in the wrong fault handler
> > seems like asking for trouble, and the RoI on an optional profiling tool (this
> > is not to take away from DAMON which is a _great_ utility, I'm saying it's
> > simply not part of the _core_) isn't there.
>
> I completely understand your concerns, thank you for nicely and patiently
> keeping this discussion.  I don't need to upstream this in short term, and open
> to every option.  So let's take sufficient time and discussions.
>
> I will take more time to think about a few potential options to move forward,
> and share those with a level of details that can help us easily further
> discuss, hopefully in a few days.

Thanks!