Hi,

This is an attempt towards having a single subsystem that accumulates
hot page information from lower memory tiers and does hot page
promotion.

At the heart of this subsystem is a kernel daemon named kpromoted that
does the following:

1. Exposes an API that other subsystems which detect/generate memory
   access information can use to inform the daemon about memory
   accesses from lower memory tiers.
2. Maintains the list of hot pages and attempts to promote them to
   toptiers.

Currently I have added the AMD IBS driver as one source that provides
page access information, as an example. This driver feeds info to
kpromoted in this RFC patchset. More sources were discussed in a
similar context here at [1].

This is just an early attempt to check what it takes to maintain
a single source of page hotness info and also to separate hot page
detection mechanisms from the promotion mechanism. There are many
open ends right now and I have listed a few of them below.

- The API that is provided to register a memory access expects
  the PFN, NID and time of access at the minimum. This is
  described more in patch 2/4. This API can currently be called
  only from contexts that allow sleeping, which rules out using
  it from PTE scanning paths. The API needs to be more flexible
  in this respect.
- Some sources like PTE A bit scanning can't provide the precise
  time of access or the NID that is accessing the page. The latter
  has been an open problem for which I haven't come across a good
  and acceptable solution.
- The way the hot page information is maintained is pretty
  primitive right now. Ideally we would like to store hotness info
  in such a way that it is easily possible to look up, say, the N
  hottest pages.
- If PTE A bit scanners are considered as hotness sources, we will
  be bombarded with accesses. Do we want to accommodate all those
  accesses, or just keep hotness info for a fixed number of pages
  (possibly as a ratio of lower tier memory capacity)?
- Undoubtedly the mechanism to classify a page as hot, and the
  subsequent promotion, needs to be more sophisticated than what
  I have right now.

This is just an early RFC, posted now to ignite some discussion
in the context of LSFMM [2].

I am also working with Raghu to integrate his kmmscand [3] as the
hotness source and use kpromoted for migration.

Also, I had posted the IBS driver earlier as an alternative to
hint-fault-based NUMA Balancing [4]. However, here I am using it
as a generic page hotness source.

[1] https://lore.kernel.org/linux-mm/de31971e-98fc-4baf-8f4f-09d153902e2e@amd.com/
[2] https://lore.kernel.org/linux-mm/20250123105721.424117-1-raghavendra.kt@amd.com/
[3] https://lore.kernel.org/all/20241201153818.2633616-1-raghavendra.kt@amd.com/
[4] https://lore.kernel.org/lkml/20230208073533.715-2-bharata@amd.com/

Regards,
Bharata.

Bharata B Rao (4):
  mm: migrate: Allow misplaced migration without VMA too
  mm: kpromoted: Hot page info collection and promotion daemon
  x86: ibs: In-kernel IBS driver for memory access profiling
  x86: ibs: Enable IBS profiling for memory accesses

 arch/x86/events/amd/ibs.c           |  11 +
 arch/x86/include/asm/entry-common.h |   3 +
 arch/x86/include/asm/hardirq.h      |   2 +
 arch/x86/include/asm/ibs.h          |   9 +
 arch/x86/include/asm/msr-index.h    |  16 ++
 arch/x86/mm/Makefile                |   3 +-
 arch/x86/mm/ibs.c                   | 344 ++++++++++++++++++++++++++++
 include/linux/kpromoted.h           |  54 +++++
 include/linux/mmzone.h              |   4 +
 include/linux/vm_event_item.h       |  30 +++
 mm/Kconfig                          |   7 +
 mm/Makefile                         |   1 +
 mm/kpromoted.c                      | 305 ++++++++++++++++++++++++
 mm/migrate.c                        |   5 +-
 mm/mm_init.c                        |  10 +
 mm/vmstat.c                         |  30 +++
 16 files changed, 831 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/include/asm/ibs.h
 create mode 100644 arch/x86/mm/ibs.c
 create mode 100644 include/linux/kpromoted.h
 create mode 100644 mm/kpromoted.c

-- 
2.34.1
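The cover letter's split between recording accesses (PFN, NID, time of access) and classifying pages as hot can be illustrated with a small userspace sketch. This is only a hedged approximation of the idea, not the patchset's code: the function name `record_access`, the hash layout, and the thresholds below are all hypothetical.

```c
#include <stdint.h>

#define HOT_HASH_SIZE 1024
#define HOT_FREQ_THRESHOLD 2 /* accesses before a page counts as hot */
#define HOT_WINDOW 1000      /* ms; an older last access makes the entry stale */

struct hot_page {
	uint64_t pfn;         /* 0 means an empty slot */
	int nid;              /* node that accessed the page */
	uint64_t last_access; /* time of most recent access, ms */
	unsigned int freq;    /* accesses seen within the window */
};

static struct hot_page hot_hash[HOT_HASH_SIZE];

/* Record one access; returns 1 if the page is now considered hot. */
int record_access(uint64_t pfn, int nid, uint64_t now)
{
	struct hot_page *h = &hot_hash[pfn % HOT_HASH_SIZE];

	if (h->pfn != pfn || now - h->last_access > HOT_WINDOW) {
		/* New page or stale entry: restart the count. */
		h->pfn = pfn;
		h->freq = 0;
	}
	h->nid = nid;
	h->last_access = now;
	h->freq++;
	return h->freq >= HOT_FREQ_THRESHOLD;
}
```

A real implementation would additionally need locking or per-CPU structures, collision handling, eviction, and the actual promotion step; this only shows the record-and-classify core.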
On 3/6/25 16:45, Bharata B Rao wrote:
> Hi,
> 
> This is an attempt towards having a single subsystem that accumulates
> hot page information from lower memory tiers and does hot page
> promotion.
> 
> At the heart of this subsystem is a kernel daemon named kpromoted that
> does the following:
> 
> 1. Exposes an API that other subsystems which detect/generate memory
>    access information can use to inform the daemon about memory
>    accesses from lower memory tiers.
> 2. Maintains the list of hot pages and attempts to promote them to
>    toptiers.
> 
> Currently I have added AMD IBS driver as one source that provides
> page access information as an example. This driver feeds info to
> kpromoted in this RFC patchset. More sources were discussed in a
> similar context here at [1].
> 

Is hot page promotion mandated or good to have? Memory tiers today
are a function of latency and bandwidth, specifically in
mt_perf_to_adistance():

	adist ~ k * R(B)/R(L)

where R(x) is the relative performance of the memory w.r.t. DRAM.
Do we want hot pages in the top tier all the time? Are we optimizing
for bandwidth or latency?

> This is just an early attempt to check what it takes to maintain
> a single source of page hotness info and also separate hot page
> detection mechanisms from the promotion mechanism. There are too
> many open ends right now and I have listed a few of them below.
> 

<snip>

> This is just an early RFC posted now to ignite some discussion
> in the context of LSFMM [2].
> 

I look forward to any summary of the discussions.

Balbir Singh
Hi Balbir,

On 18-Mar-25 10:58 AM, Balbir Singh wrote:
> On 3/6/25 16:45, Bharata B Rao wrote:
>> Hi,
>>
>> This is an attempt towards having a single subsystem that accumulates
>> hot page information from lower memory tiers and does hot page
>> promotion.
>>
>> At the heart of this subsystem is a kernel daemon named kpromoted that
>> does the following:
>>
>> 1. Exposes an API that other subsystems which detect/generate memory
>>    access information can use to inform the daemon about memory
>>    accesses from lower memory tiers.
>> 2. Maintains the list of hot pages and attempts to promote them to
>>    toptiers.
>>
>> Currently I have added AMD IBS driver as one source that provides
>> page access information as an example. This driver feeds info to
>> kpromoted in this RFC patchset. More sources were discussed in a
>> similar context here at [1].
>>
> 
> Is hot page promotion mandated or good to have?

If you look at the current hot page promotion (NUMAB=2) logic, IIUC an
accessed lower tier page is directly promoted to toptier if enough
space exists in the toptier node. In such cases, it doesn't even bother
about the hot threshold (a measure of how recently the page was
accessed) or migration rate limiting. This tells me that in a tiered
memory setup, having an accessed page in toptier is preferable.

> Memory tiers today
> are a function of latency and bandwidth, specifically in
> mt_perf_to_adistance()
> 
> adist ~ k * R(B)/R(L) where R(x) is the relative performance of the
> memory w.r.t DRAM. Do we want hot pages in the top tier all the time?
> Are we optimizing for bandwidth or latency?

When the memory tiering code converts BW and latency numbers into an
opaque metric, adistance, based on which the node gets placed at an
appropriate position in the tiering hierarchy, I wonder if it is still
possible to say whether we are optimizing for bandwidth or latency
separately?
>> This is just an early attempt to check what it takes to maintain
>> a single source of page hotness info and also separate hot page
>> detection mechanisms from the promotion mechanism. There are too
>> many open ends right now and I have listed a few of them below.
>> 
> 
> <snip>
> 
>> This is just an early RFC posted now to ignite some discussion
>> in the context of LSFMM [2].
>> 
> 
> I look forward to any summary of the discussions

Sure.

Thanks,
Bharata.
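The NUMAB=2 behaviour described in this exchange — promote an accessed lower-tier page immediately when the top-tier node has enough free space, and only otherwise consult the hot threshold — can be modelled roughly as below. This is a simplified sketch of the decision, not the kernel's actual code (the real logic lives around the hint-fault path and helpers such as pgdat_free_space_enough()); the function and parameter names here are illustrative.

```c
#include <stdint.h>

/*
 * Simplified model of the NUMA_BALANCING_MEMORY_TIERING promotion
 * decision: if the top-tier node has enough free pages, promote on
 * first access without consulting the hot threshold; otherwise only
 * pages accessed within the threshold window qualify.
 */
int promote_candidate(long free_pages, long enough_wmark,
		      uint64_t now_ms, uint64_t last_access_ms,
		      uint64_t hot_threshold_ms)
{
	if (free_pages > enough_wmark)
		return 1; /* fast path: top tier has room, promote */

	/* Slow path: only recently-accessed (hot) pages qualify. */
	return (now_ms - last_access_ms) < hot_threshold_ms;
}
```

The point of contention in the thread is exactly this fast path: it makes "accessed lower-tier page" sufficient for promotion whenever space exists, regardless of hotness.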
On 3/20/25 20:07, Bharata B Rao wrote:
> Hi Balbir,
> 
> On 18-Mar-25 10:58 AM, Balbir Singh wrote:
>> On 3/6/25 16:45, Bharata B Rao wrote:
>>> Hi,
>>>
>>> This is an attempt towards having a single subsystem that accumulates
>>> hot page information from lower memory tiers and does hot page
>>> promotion.
>>>
>>> At the heart of this subsystem is a kernel daemon named kpromoted that
>>> does the following:
>>>
>>> 1. Exposes an API that other subsystems which detect/generate memory
>>>    access information can use to inform the daemon about memory
>>>    accesses from lower memory tiers.
>>> 2. Maintains the list of hot pages and attempts to promote them to
>>>    toptiers.
>>>
>>> Currently I have added AMD IBS driver as one source that provides
>>> page access information as an example. This driver feeds info to
>>> kpromoted in this RFC patchset. More sources were discussed in a
>>> similar context here at [1].
>>>
>>
>> Is hot page promotion mandated or good to have?
> 
> If you look at the current hot page promotion (NUMAB=2) logic, IIUC an
> accessed lower tier page is directly promoted to toptier if enough
> space exists in the toptier node. In such cases, it doesn't even
> bother about the hot threshold (a measure of how recently the page was
> accessed) or migration rate limiting. This tells me that in a tiered
> memory setup, having an accessed page in toptier is preferable.
> 

I'll review the patches. I don't agree with toptier; I think DRAM is
the right tier.

>> Memory tiers today
>> are a function of latency and bandwidth, specifically in
>> mt_perf_to_adistance()
>>
>> adist ~ k * R(B)/R(L) where R(x) is the relative performance of the
>> memory w.r.t DRAM. Do we want hot pages in the top tier all the time?
>> Are we optimizing for bandwidth or latency?
> 
> When the memory tiering code converts BW and latency numbers into an
> opaque metric, adistance, based on which the node gets placed at an
> appropriate position in the tiering hierarchy, I wonder if it is still
> possible to say whether we are optimizing for bandwidth or latency
> separately?

I think we need a notion of that; just higher tiers may not be right.
IOW, I think we need to promote to at most the DRAM tier, not above it.

Balbir Singh
+ Harry, who was called Hyeonggon before.

Hello,

Thank you very much for sharing this great patchset.

On Thu, 6 Mar 2025 11:15:28 +0530 Bharata B Rao <bharata@amd.com> wrote:

> Hi,
> 
> This is an attempt towards having a single subsystem that accumulates
> hot page information from lower memory tiers and does hot page
> promotion.

That is one of DAMON's goals, too. DAMON aims to be a kernel subsystem
that can provide access information accumulated from multiple sources,
useful for multiple use cases including profiling and access-aware
system operations. Hot page information and promoting those pages are
examples of such information and operations.

SK hynix developed their CXL memory tiering solution[1] using DAMON. I
also shared an auto-tuning based memory tiering solution idea[2]
before. At LSFMMBPF 2025, I may share its prototype implementation and
evaluation results on CXL memory devices that I recently gained access
to.

Of course, DAMON is still in the middle of its journey towards the
North Star. I'm looking for what is really required of DAMON for the
goal, what is [not] available with today's DAMON, and what would be
good future plans. My LSFMMBPF 2025 topic proposals are for those.
Hence, this patchset is very helpful to me in showing what can be added
and improved in DAMON.

I specifically see support for access information sources other than
page tables' accessed bits, such as AMD IBS, as the main thing. I admit
that DAMON today supports only page tables' accessed bit as the primary
source of the information. But the DAMON of the future would be
different. Let me share more thoughts below.

> 
> At the heart of this subsystem is a kernel daemon named kpromoted that
> does the following:
> 
> 1. Exposes an API that other subsystems which detect/generate memory
>    access information can use to inform the daemon about memory
>    accesses from lower memory tiers.
DAMON also provides such an API, namely its monitoring operations set
layer interface[3]. Nevertheless, only page tables' accessed bit use
cases exist today, hence the interface may have hidden problems when
extending it for other sources.

> 2. Maintains the list of hot pages and attempts to promote them to
>    toptiers.

DAMON provides its other half, DAMOS[4], for this kind of usage.

> 
> Currently I have added AMD IBS driver as one source that provides
> page access information as an example. This driver feeds info to
> kpromoted in this RFC patchset. More sources were discussed in a
> similar context here at [1].

I was imagining how I would be able to do this with DAMON via the
operations set layer interface. And I find the current interface is not
very optimized for AMD IBS like sources that catch the access on the
line. That is, in a way, we could say AMD IBS like primitives are
push-oriented, while page tables' accessed bit information is
pull-oriented. The DAMON operations set layer interface is easier to
use in the pull-oriented case. I don't think it cannot be used for the
push-oriented case, but the interface would definitely be better if
more optimized for that use case.

I'm curious if you also tried doing this by extending DAMON, and
whether you found some hidden problems.

> 
> This is just an early attempt to check what it takes to maintain
> a single source of page hotness info and also separate hot page
> detection mechanisms from the promotion mechanism. There are too
> many open ends right now and I have listed a few of them below.
> 
> - The API that is provided to register memory access expects
>   the PFN, NID and time of access at the minimum. This is
>   described more in patch 2/4. This API currently can be called
>   only from contexts that allow sleeping and hence this rules
>   out using it from PTE scanning paths. The API needs to be
>   more flexible with respect to this.
> - Some sources like PTE A bit scanning can't provide the precise
>   time of access or the NID that is accessing the page. The latter
>   has been an open problem to which I haven't come across a good
>   and acceptable solution.

Agree. PTE A bit scanning could be useful in many cases, but not every
case. There was an RFC patchset[7] that extends DAMON for NID. I'm
planning to do that again using the DAMON operations layer interface.
My current plan is to implement the prototype using prot_none page
faults, and later extend it for AMD IBS like h/w features. Hopefully I
will share a prototype, or at least a more detailed idea, at LSFMMBPF
2025.

> - The way the hot page information is maintained is pretty
>   primitive right now. Ideally we would like to store hotness info
>   in such a way that it should be easily possible to lookup say N
>   most hot pages.

DAMON provides a feature for looking up the N hottest pages, namely
DAMOS quotas' access pattern based regions prioritization[5].

> - If PTE A bit scanners are considered as hotness sources, we will
>   be bombarded with accesses. Do we want to accommodate all those
>   accesses or just go with hotness info for a fixed number of pages
>   (possibly as a ratio of lower tier memory capacity)?

I understand you're talking about memory space overhead. Correct me if
I'm wrong, please.

Doesn't the same issue exist for the current implementation if the
sampling frequency is high and/or the aggregation window is long?

To me, hence, this looks like not a problem of the information source,
but of how to maintain the information. The current implementation
maintains it per page, so I think the problem is inherent.

DAMON maintains the information in a region abstraction that can cover
multiple pages with one data structure. The maximum number of regions
can be set by users, so the space overhead can be controlled.

> - Undoubtedly the mechanism to classify a page as hot and subsequent
>   promotion needs to be more sophisticated than what I have right now.
DAMON provides aim-based DAMOS aggressiveness auto-tuning[6] and
monitoring intervals auto-tuning[8] for this purpose.

> 
> This is just an early RFC posted now to ignite some discussion
> in the context of LSFMM [2].

This is really helpful. I appreciate it, and am looking forward to more
discussions at LSFMM and on the mailing lists.

> 
> I am also working with Raghu to integrate his kmmscand [3] as the
> hotness source and use kpromoted for migration.

Raghu also mentioned he would try to take time to look into DAMON to
see if there is anything he could reuse for the purpose. I'm curious if
he was able to find something there.

> 
> Also, I had posted the IBS driver earlier as an alternative to
> hint faults based NUMA Balancing [4]. However here I am using
> it as generic page hotness source.

This will also be very helpful for understanding how IBS can be used.
Appreciate it!

> 
> [1] https://lore.kernel.org/linux-mm/de31971e-98fc-4baf-8f4f-09d153902e2e@amd.com/
> [2] https://lore.kernel.org/linux-mm/20250123105721.424117-1-raghavendra.kt@amd.com/
> [3] https://lore.kernel.org/all/20241201153818.2633616-1-raghavendra.kt@amd.com/
> [4] https://lore.kernel.org/lkml/20230208073533.715-2-bharata@amd.com/

[1] https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion
[2] https://lore.kernel.org/all/20231112195602.61525-1-sj@kernel.org/
[3] https://origin.kernel.org/doc/html/latest/mm/damon/design.html#operations-set-layer
[4] https://origin.kernel.org/doc/html/latest/mm/damon/design.html#operation-schemes
[5] https://origin.kernel.org/doc/html/latest/mm/damon/design.html#prioritization
[6] https://origin.kernel.org/doc/html/latest/mm/damon/design.html#aim-oriented-feedback-driven-auto-tuning
[7] https://lore.kernel.org/linux-mm/cover.1645024354.git.xhao@linux.alibaba.com/
[8] https://origin.kernel.org/doc/html/next/mm/damon/design.html#monitoring-intervals-auto-tuning

Thanks,
SJ

> 
> Regards,
> Bharata.
> 
> Bharata B Rao (4):
>   mm: migrate: Allow misplaced migration without VMA too
>   mm: kpromoted: Hot page info collection and promotion daemon
>   x86: ibs: In-kernel IBS driver for memory access profiling
>   x86: ibs: Enable IBS profiling for memory accesses
> 
>  arch/x86/events/amd/ibs.c           |  11 +
>  arch/x86/include/asm/entry-common.h |   3 +
>  arch/x86/include/asm/hardirq.h      |   2 +
>  arch/x86/include/asm/ibs.h          |   9 +
>  arch/x86/include/asm/msr-index.h    |  16 ++
>  arch/x86/mm/Makefile                |   3 +-
>  arch/x86/mm/ibs.c                   | 344 ++++++++++++++++++++++++++++
>  include/linux/kpromoted.h           |  54 +++++
>  include/linux/mmzone.h              |   4 +
>  include/linux/vm_event_item.h       |  30 +++
>  mm/Kconfig                          |   7 +
>  mm/Makefile                         |   1 +
>  mm/kpromoted.c                      | 305 ++++++++++++++++++++++++
>  mm/migrate.c                        |   5 +-
>  mm/mm_init.c                        |  10 +
>  mm/vmstat.c                         |  30 +++
>  16 files changed, 831 insertions(+), 3 deletions(-)
>  create mode 100644 arch/x86/include/asm/ibs.h
>  create mode 100644 arch/x86/mm/ibs.c
>  create mode 100644 include/linux/kpromoted.h
>  create mode 100644 mm/kpromoted.c
> 
> -- 
> 2.34.1
Hi SJ,

Thanks for your detailed points; this surely sets up a good context for
discussion at LSFMM. Please see my replies to a few of your questions
below:

On 17-Mar-25 3:30 AM, SeongJae Park wrote:
>> 
>> Currently I have added AMD IBS driver as one source that provides
>> page access information as an example. This driver feeds info to
>> kpromoted in this RFC patchset. More sources were discussed in a
>> similar context here at [1].
> 
> I was imagining how I would be able to do this with DAMON via the
> operations set layer interface. And I find the current interface is
> not very optimized for AMD IBS like sources that catch the access on
> the line. That is, in a way, we could say AMD IBS like primitives are
> push-oriented, while page tables' accessed bit information is
> pull-oriented. The DAMON operations set layer interface is easier to
> use in the pull-oriented case. I don't think it cannot be used for the
> push-oriented case, but the interface would definitely be better if
> more optimized for that use case.
> 
> I'm curious if you also tried doing this by extending DAMON, and
> whether you found some hidden problems.

I remember discussing this with you during the DAMON BoF at one of the
earlier LPC events, but I didn't get to try it. Guess now is the time
:-)

I see the challenge with the current DAMON interfaces to integrate IBS
provided access info. If you check my IBS driver, I store the incoming
access info from IBS into per-cpu buffers before pushing it on to the
subsystem that acts on it. I would think pull-based DAMON interfaces
could consume those buffered samples rather than IBS pushing samples
into DAMON. But I am yet to get clarity on how to honor the region
based sampling that is inherent to DAMON's functioning. Maybe only
using samples that are of interest to the region being tracked could be
one way.
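The per-CPU buffering described above — samples produced by the IBS handler, drained later by a pull-based consumer — could, under many simplifying assumptions, look like a single-producer/single-consumer ring. The sketch below is illustrative userspace C, not the driver's code; real kernel code would need memory barriers (or an existing lockless ring such as the perf ring buffer) and truly per-CPU instances. All names here are hypothetical.

```c
#include <stdint.h>

#define NR_SAMPLES 512 /* ring capacity; per-CPU in a real driver */

struct access_sample {
	uint64_t pfn;
	int nid;
};

/* Single-producer/single-consumer ring of access samples. */
struct sample_ring {
	struct access_sample buf[NR_SAMPLES];
	unsigned int head; /* producer index, only ever incremented */
	unsigned int tail; /* consumer index, only ever incremented */
};

/* Producer side (the interrupt handler): drop rather than block. */
int ring_push(struct sample_ring *r, uint64_t pfn, int nid)
{
	if (r->head - r->tail == NR_SAMPLES)
		return 0; /* full: drop the sample */
	r->buf[r->head % NR_SAMPLES] = (struct access_sample){ pfn, nid };
	r->head++;
	return 1;
}

/* Consumer side (kpromoted, or a pull-based DAMON operations set). */
int ring_pop(struct sample_ring *r, struct access_sample *out)
{
	if (r->head == r->tail)
		return 0; /* empty */
	*out = r->buf[r->tail % NR_SAMPLES];
	r->tail++;
	return 1;
}
```

A pull-oriented consumer would simply call `ring_pop()` on its own schedule, which is the inversion Bharata suggests for feeding IBS samples into DAMON.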
> 
>> 
>> This is just an early attempt to check what it takes to maintain
>> a single source of page hotness info and also separate hot page
>> detection mechanisms from the promotion mechanism. There are too
>> many open ends right now and I have listed a few of them below.
>> 
>> - The API that is provided to register memory access expects
>>   the PFN, NID and time of access at the minimum. This is
>>   described more in patch 2/4. This API currently can be called
>>   only from contexts that allow sleeping and hence this rules
>>   out using it from PTE scanning paths. The API needs to be
>>   more flexible with respect to this.
>> - Some sources like PTE A bit scanning can't provide the precise
>>   time of access or the NID that is accessing the page. The latter
>>   has been an open problem to which I haven't come across a good
>>   and acceptable solution.
> 
> Agree. PTE A bit scanning could be useful in many cases, but not every
> case. There was an RFC patchset[7] that extends DAMON for NID. I'm
> planning to do that again using the DAMON operations layer interface.
> My current plan is to implement the prototype using prot_none page
> faults, and later extend it for AMD IBS like h/w features. Hopefully I
> will share a prototype, or at least a more detailed idea, at LSFMMBPF
> 2025.
> 
>> - The way the hot page information is maintained is pretty
>>   primitive right now. Ideally we would like to store hotness info
>>   in such a way that it should be easily possible to lookup say N
>>   most hot pages.
> 
> DAMON provides a feature for looking up the N hottest pages, namely
> DAMOS quotas' access pattern based regions prioritization[5].
> 
>> - If PTE A bit scanners are considered as hotness sources, we will
>>   be bombarded with accesses. Do we want to accommodate all those
>>   accesses or just go with hotness info for a fixed number of pages
>>   (possibly as a ratio of lower tier memory capacity)?
> 
> I understand you're talking about memory space overhead. Correct me if
> I'm wrong, please.
Correct, and also the overhead of managing so much data. What I see is
that if I start pushing all the access info obtained from LRU pgtable
scanning, kpromoted would end up spending a lot of time in operations
like lookups, walking the list of hot pages etc. So maybe it would be
better to do some sort of early processing and/or filtering at the
hotness source level itself before letting kpromoted-like subsystems do
further tracking and action.

> 
> Doesn't the same issue exist for the current implementation if the
> sampling frequency is high and/or the aggregation window is long?
> 
> To me, hence, this looks like not a problem of the information source,
> but of how to maintain the information. The current implementation
> maintains it per page, so I think the problem is inherent.

Well yes, but the goal could be to do better than NUMAB=2, which does
per-page level tracking.

> 
> DAMON maintains the information in a region abstraction that can cover
> multiple pages with one data structure. The maximum number of regions
> can be set by users, so the space overhead can be controlled.

The granularity of tracking - per-page vs range/region - is a topic of
discussion, I suppose.

Regards,
Bharata.
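The per-page vs region trade-off discussed in this exchange can be made concrete with a toy aggregation sketch: folding page accesses into fixed-size regions bounds the metadata at a fixed number of entries no matter how many accesses arrive. The names and sizes below are hypothetical, and DAMON's actual region handling (adaptive splitting and merging of regions) is considerably more sophisticated than this.

```c
#include <stdint.h>

#define PAGES_PER_REGION 256 /* hypothetical aggregation granularity */
#define MAX_REGIONS 128      /* cap on tracked regions, bounding memory */

struct region {
	uint64_t start_pfn;        /* first pfn of the region */
	unsigned long nr_accesses; /* accesses seen anywhere in the region */
};

static struct region regions[MAX_REGIONS];
static int nr_regions;

/*
 * Fold a per-page access into its region; returns the region's new
 * access count, or 0 if the table is full and the access is dropped.
 */
unsigned long region_record(uint64_t pfn)
{
	uint64_t start = pfn - (pfn % PAGES_PER_REGION);
	int i;

	for (i = 0; i < nr_regions; i++) {
		if (regions[i].start_pfn == start)
			return ++regions[i].nr_accesses;
	}
	if (nr_regions < MAX_REGIONS) {
		regions[nr_regions].start_pfn = start;
		regions[nr_regions].nr_accesses = 1;
		return regions[nr_regions++].nr_accesses;
	}
	return 0; /* over budget: space overhead stays bounded */
}
```

The cost is resolution: a region-level count cannot say which page within the region is hot, which is exactly the granularity question raised above.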
On 3/17/2025 3:30 AM, SeongJae Park wrote:
> + Harry, who was called Hyeonggon before.
>> 
>> I am also working with Raghu to integrate his kmmscand [3] as the
>> hotness source and use kpromoted for migration.
> 
> Raghu also mentioned he would try to take time to look into DAMON to
> see if there is anything he could reuse for the purpose. I'm curious
> if he was able to find something there.
> 

[...]

Hello SJ,

I did take a look at the DAMON vaddr and paddr implementations. I am
also wondering how I can optimize the hotness data collected by
kmmscand. DAMON regions should be very helpful here, but I am not there
yet. Will surely need a brainstorming session post my next RFC.

Thanks and Regards
- Raghu
> Hi,
> 
> This is an attempt towards having a single subsystem that accumulates
> hot page information from lower memory tiers and does hot page
> promotion.
> 
> At the heart of this subsystem is a kernel daemon named kpromoted that
> does the following:
> 
> 1. Exposes an API that other subsystems which detect/generate memory
>    access information can use to inform the daemon about memory
>    accesses from lower memory tiers.
> 2. Maintains the list of hot pages and attempts to promote them to
>    toptiers.
> 
> Currently I have added AMD IBS driver as one source that provides
> page access information as an example. This driver feeds info to
> kpromoted in this RFC patchset.

FWIW, here are some numbers from kpromoted-driven hot page promotion
with IBS as the hotness source:

Test 1
======
Memory allocated on DRAM and CXL nodes explicitly; no demotion activity
is seen.

Benchmark details
-----------------
* Memory is allocated initially on DRAM and CXL nodes separately.
* Two threads: one accessing DRAM-allocated and the other CXL-allocated
  memory.
* Divides the memory area into regions and accesses pages within each
  region randomly and repetitively. In the test config shown below, the
  allocated memory is divided into regions of 1GB size and each such
  region is repetitively (512 times) accessed with 21474836480 random
  accesses in each repetition.
* Benchmark score is the time taken for accesses to complete; lower is
  better.
* Data accesses from the CXL node are expected to trigger promotion.
* Test system has 2 DRAM nodes (128G each) and a CXL node (128G).

kernel.numa_balancing        2 for base, 0 for kpromoted
demotion                     true
Threads run on               Node 1
Memory allocated on          Node 1 (DRAM) and Node 2 (CXL)
Initial allocation ratio     75% on DRAM
Allocated memory size        160G (mmap, MAP_POPULATE)
Initial memory on DRAM node  120G
Initial memory on CXL node   40G
Hot region size              1G
Access pattern               random
Access granularity           4K
Load/store ratio             50% loads + 50% stores
Number of accesses           21474836480
Nr access repetitions        512

Benchmark completion time
-------------------------
Base, NUMAB=2           261s
kpromoted-ibs, NUMAB=0  281s

Stats comparison
----------------
                             Base,NUMAB=2  kpromoted-IBS,NUMAB=0
pgdemote_kswapd              0             0
pgdemote_direct              0             0
numa_pte_updates             10485760      0
numa_hint_faults             4427809       0
numa_pages_migrated          388229        374765
kpromoted_recorded_accesses                1651130  /* nr accesses reported to kpromoted */
kpromoted_recorded_hwhints                 1651130  /* nr accesses coming from IBS */
kpromoted_record_toptier                   1269697  /* nr accesses from toptier/DRAM */
kpromoted_record_added                     378090   /* nr accesses considered for promotion */
kpromoted_mig_promoted                     374765   /* nr pages promoted */
hwhint_nr_events                           1674227  /* nr events reported by IBS */
hwhint_dram_accesses                       1269626  /* nr DRAM accesses reported by IBS */
hwhint_cxl_accesses                        381435   /* nr Extmem (CXL) accesses reported by IBS */
hwhint_useful_samples                      1651110  /* nr actionable samples as per IBS driver */

Test 2
======
Memory is allocated with DRAM and CXL nodes in the affinity mask with
MPOL_BIND + MPOL_F_NUMA_BALANCING.

Benchmark details
-----------------
* Initially, allocated memory spreads over from DRAM to CXL, which
  involves demotion.
* A single thread accesses the memory.
* Divides the memory area into regions and accesses pages within each
  region randomly and repetitively.
  In the test config shown below, the allocated memory is divided into
  regions of 1GB size and each such region is repetitively (512 times)
  accessed with 21474836480 random accesses in each repetition.
* Benchmark score is the time taken for accesses to complete; lower is
  better.
* Data accesses from the CXL node are expected to trigger promotion.
* Test system has 2 DRAM nodes (128G each) and a CXL node (128G).

kernel.numa_balancing   2 for base, 0 for kpromoted
demotion                true
Threads run on          Node 1
Memory allocated on     Node 1 (DRAM) and Node 2 (CXL)
Allocated memory size   192G (mmap, MAP_POPULATE)
Hot region size         1G
Access pattern          random
Access granularity      4K
Load/store ratio        50% loads + 50% stores
Number of accesses      21474836480
Nr access repetitions   512

Benchmark completion time
-------------------------
Base, NUMAB=2           628s
kpromoted-ibs, NUMAB=0  626s

Stats comparison
----------------
                             Base,NUMAB=2  kpromoted-IBS,NUMAB=0
pgdemote_kswapd              73187         2196028
pgdemote_direct              0             0
numa_pte_updates             27511631      0
numa_hint_faults             10010852      0
numa_pages_migrated          14            611177  /* such a low number of promotions is unexpected in Base; need to recheck */
kpromoted_recorded_accesses                1883570
kpromoted_recorded_hwhints                 1883570
kpromoted_record_toptier                   1262088
kpromoted_record_added                     616273
kpromoted_mig_promoted                     611077
hwhint_nr_events                           1904619
hwhint_dram_accesses                       1261758
hwhint_cxl_accesses                        621428
hwhint_useful_samples                      1883543