TL; DR: Extend DAMON interface between core and operation sets for
operation set driven report-based monitoring such as per-CPU and
write-only access monitoring.  Further introduce an example physical
address space monitoring operation set that uses page faults as the
source of the information.

Background
----------

Existing DAMON operations set implementations, namely paddr, vaddr, and
fvaddr, use Accessed bits of page tables as the main source of the access
information.  Accessed bits have some restrictions, though.  For example,
they cannot tell which CPU or GPU made the access, whether the access was
a read or a write, and which part of the mapped entity was really
accessed.  Depending on the use case, the limitations can be problematic.

Because the issue stems from the nature of the page table Accessed bit,
utilizing access information from different sources can mitigate the
issue.  Page faults, memory access instructions sampling interrupts,
system calls, or any information from other kernel space friends such as
subsystems or device drivers of CXL or GPUs could be examples of the
different sources.

DAMON separates its core and operation set layer for easy extensions.
The operation set layer handles the low level access information
handling, and the core layer handles more high level work such as the
region-based overhead/accuracy control and access-aware system
operations.  Hence we can extend DAMON to use the different sources by
implementing and using another DAMON operations set.  The core layer
features will still be available with the new sources, without additional
changes.

Nevertheless, the current interface between the core and the operation
set layers is optimized for the Accessed bits case.  Specifically, the
interface asks the operations set if a given part of memory has been
accessed or not in a given time period (last sampling interval).  It is
easy for the Accessed bit use case, since the information is stored in
page tables.
The operation set can simply read the current value of the Accessed bit.
For some sources other than Accessed bits, such as page faults or
instruction sampling interrupts, the operations set may need to collect
and keep the access information in its internal memory until the core
layer asks for the access information.  The information can be dropped
only after answering the question.  Implementing such operation set
internal memory management would not be trivial.  Also, it could end up
with multiple similar operation set implementations each having their own
internal memory management code that is unnecessarily duplicated.

Core Layer Changes for Reporting-based Monitoring
-------------------------------------------------

Avoid such possible duplicated efforts by updating the DAMON core layer
to support real time access reporting.  The updated interface allows
operations set implementations to report (or, push) their information to
the core layer, on their preferred schedule.  The DAMON core layer will
handle the reports by managing meta data and updating the final
monitoring results (DAMON regions) accordingly.

Also add another operations set callback to determine if a given access
report is eligible to be used for a given operations set.  For example,
if the operations set implementation is for monitoring only a specific
CPU or only writes, the operations set could ask the core layer to ignore
reported accesses that were made by other CPUs, or were made for reads.

paddr_fault: Page Faults-based Physical Address Space Access Monitoring
-----------------------------------------------------------------------

Using the core layer changes, implement a new DAMON operation set, namely
paddr_fault.  It is the same as the page table Accessed bits based
physical address space monitoring, but uses page faults as the source of
the access information.

Specifically, it installs PAGE_NONE protection to access sampling pages
on damon_operations->prepare_access_checks() callback.
Then, it captures the following access to the page in the page fault
handling context, and directly reports the findings to DAMON, using
damon_report_access().

For the PAGE_NONE protection use case, introduce a new
change_protection() flag, namely MM_CP_DAMON.  To avoid interfering with
NUMA_BALANCING, the page fault handling invokes the fault handling logic
of either DAMON or NUMA_BALANCING, depending on whether NUMA_BALANCING is
enabled.

This operation set is only for giving examples of how
damon_report_access() can be used for multiple sources of the
information, and for easy testing.  It will not be merged into the
mainline as is.  I'm currently planning to further develop it for per-CPU
access monitoring by the final version of this patch series.

How Per-CPU or Write-only Monitoring Can Be Implemented
-------------------------------------------------------

The paddr_fault operation set can be extended for per-CPU or write-only
monitoring.  We can get the access source CPU or whether it was a write
access from the page fault information, and put that into the DAMON
report (struct damon_access_report).  Extending the damon_access_report
struct with a few fields for storing the information would be needed.
Then we can make a new DAMON operation set that is similar to
paddr_fault, but checks the eligibility of each access report, based on
the CPU or write information.  Of course, extending the existing
operation set could also be an option.  Then accesses made by CPUs of no
interest or reads can be ignored, and users can see the per-CPU or
write-only accesses using DAMON.

Expected Users: Scheduling, VM Live Migration and NUMA Page Migrations
----------------------------------------------------------------------

We have ongoing off-list discussions of expected use cases of this patch
series.  We expect this patch series can be used for implementing per-CPU
access monitoring, and it can be useful for L3 cache utilization-aware
threads/process scheduling.
Yet another expected use case is write-only monitoring, for finding VM
instances that are easier live migration targets.  Also I believe this
can be extended for not only per-CPU but any access entities including
GPU-like accelerators, which expose their memory as NUMA nodes in some
setups.  With that, I think we could make a holistic and efficient
access-aware NUMA page migration system.

Patches Sequence
----------------

The first patch introduces damon_report_access() that any kernel code
that can sleep can use, to report their access information on their
schedule.  The second patch adds a DAMON core-operations set interface
for ignoring specific types of data access reports for the given
operations set configuration.  The third patch further implements the
report eligibility check logic for vaddr.  The fourth patch updates the
core layer to really use the reported access information for making the
monitoring results (DAMON regions).  The fifth patch implements a new
change_protection() flag, MM_CP_DAMON, and its fault handling logic for
reporting the access to DAMON.  The sixth patch implements a new page
faults based physical address space access monitoring operation set,
namely paddr_fault, using MM_CP_DAMON.  Finally, the seventh patch
updates the DAMON sysfs interface to support paddr_fault.

Plan for Dropping RFC
---------------------

This patch series is an RFC for early sharing of the idea that was also
shared on the last LSFMMBPF[1], as the 'damon_report_access()' API plan.
We will further optimize the core layer implementation and add one or
more real operations set implementations that utilize the report-based
interface, by the final version of this patch series.

Of course, concerns we find on RFCs should be addressed.
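
The report-based flow described in "Core Layer Changes for
Reporting-based Monitoring" can be illustrated with a small user-space
sketch.  This is not the code in this patch series.  Every name below
(sketch_report, sketch_region, sketch_ops, sketch_report_access(),
sketch_read_reports(), sketch_write_only(), and the addr/cpu/is_write
fields) is a hypothetical stand-in for damon_access_report,
damon_report_access() and the eligible_report() callback.  It further
assumes that reports are kept in a fixed-size buffer and that a region
counts at most one access per sampling interval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct damon_access_report; fields assumed. */
struct sketch_report {
	unsigned long addr;	/* accessed address */
	int cpu;		/* CPU that made the access (assumed field) */
	bool is_write;		/* whether it was a write (assumed field) */
};

/* Hypothetical stand-in for a DAMON region. */
struct sketch_region {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	unsigned int nr_accesses;
};

/* Hypothetical stand-in for the eligible_report() operations callback. */
struct sketch_ops {
	bool (*eligible_report)(const struct sketch_report *report);
};

#define SKETCH_MAX_REPORTS 64

static struct sketch_report sketch_reports[SKETCH_MAX_REPORTS];
static size_t sketch_nr_reports;

/*
 * Push side: an access information source calls this on its own schedule.
 * Returns false if the buffer is full and the report was dropped.
 */
bool sketch_report_access(const struct sketch_report *report)
{
	if (sketch_nr_reports >= SKETCH_MAX_REPORTS)
		return false;
	sketch_reports[sketch_nr_reports++] = *report;
	return true;
}

/* Example eligibility filter for write-only monitoring: ignore reads. */
bool sketch_write_only(const struct sketch_report *report)
{
	return report->is_write;
}

/*
 * Core side: called once per sampling interval.  Each region that received
 * at least one eligible report counts one access.  All buffered reports
 * are dropped afterwards.
 */
void sketch_read_reports(const struct sketch_ops *ops,
		struct sketch_region *regions, size_t nr_regions)
{
	size_t i, j;

	assert(ops != NULL);
	for (i = 0; i < nr_regions; i++) {
		struct sketch_region *r = &regions[i];

		for (j = 0; j < sketch_nr_reports; j++) {
			const struct sketch_report *rep = &sketch_reports[j];

			if (ops->eligible_report && !ops->eligible_report(rep))
				continue;
			if (rep->addr < r->start || rep->addr >= r->end)
				continue;
			r->nr_accesses++;
			break;	/* at most one access per interval */
		}
	}
	sketch_nr_reports = 0;
}
```

With sketch_write_only() as the eligibility callback, a read reported in
one region and a write reported in another leave only the written region
with an incremented nr_accesses after sketch_read_reports().  A per-CPU
variant of the filter would compare the cpu field instead.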

Revision History
----------------

Changes from RFC v1
(https://lore.kernel.org/20250629201443.52569-1-sj@kernel.org)
- Fixup report reading logic for access absence accounting
- Implement page faults based operations set (paddr_fault)

[1] https://lwn.net/Articles/1016525/

SeongJae Park (7):
  mm/damon/core: introduce damon_report_access()
  mm/damon/core: add eligible_report() ops callback
  mm/damon/vaddr: implement eligible_report()
  mm/damon/core: read received access reports
  mm/memory: implement MM_CP_DAMON
  mm/damon: implement paddr_fault operations set
  mm/damon/sysfs: support paddr_fault

 include/linux/damon.h |  34 ++++++++++++++
 include/linux/mm.h    |   1 +
 mm/damon/core.c       | 101 ++++++++++++++++++++++++++++++++++++++++++
 mm/damon/paddr.c      |  77 +++++++++++++++++++++++++++++++-
 mm/damon/sysfs.c      |   4 ++
 mm/damon/vaddr.c      |   7 +++
 mm/memory.c           |  53 +++++++++++++++++++++-
 mm/mprotect.c         |   5 +++
 8 files changed, 279 insertions(+), 3 deletions(-)

base-commit: 3452e05f01b2a3dd126bd08961cc0df8daa5beee
--
2.39.5

On 27/07/2025, SeongJae Park wrote:
> TL; DR: Extend DAMON interface between core and operation sets for
> operation set driven report-based monitoring such as per-CPU and
> write-only access monitoring.  Further introduce an example physical
> address space monitoring operation set that uses page faults as the
> source of the information.

Thank you very much for starting this update. RFC mentions write-only
monitoring, this feature particularly would be really helpful in some of
our use cases such as lightweight live migration target selection, so we
are looking forward to collaborate in development and testing activity!

On Sun, 3 Aug 2025 19:47:41 -0700 Andrew Paniakin <apanyaki@amazon.com> wrote:

> On 27/07/2025, SeongJae Park wrote:
> > TL; DR: Extend DAMON interface between core and operation sets for
> > operation set driven report-based monitoring such as per-CPU and
> > write-only access monitoring.  Further introduce an example physical
> > address space monitoring operation set that uses page faults as the
> > source of the information.
>
> Thank you very much for starting this update. RFC mentions write-only
> monitoring, this feature particularly would be really helpful in some of
> our use cases such as lightweight live migration target selection, so we
> are looking forward to collaborate in development and testing activity!

Thank you for letting us know your interest, Andrew.  This should be
helpful at better prioritizations.

Now development trees of DAMON and DAMON user-space tool support[1]
write-only monitoring.  The implementation is dirty and not upstreamable
for now, but please feel free to test and let me know what you see if you
don't mind.  I will continue working on more testing and making it
upstreamable.

[1] https://damonitor.github.io/posts/write_only_cpus_only_monitoring/

Thanks,
SJ