Introduction:
=============
This patchset is an outcome of an ongoing collaboration between AMD and Meta.
Meta wanted to explore an alternative page promotion technique, as they
observe high latency spikes in their workloads that access CXL memory.

In the current hot page promotion, all the activities, including
process address space scanning, NUMA hint fault handling and page
migration, are performed in process context, i.e., the scanning overhead is
borne by applications.

This is an early RFC patch series to do (slow tier) CXL page promotion.
The approach in this patchset addresses the issue by adding PTE
Accessed bit scanning.

Scanning is done by a global kernel thread which routinely scans all
the processes' address spaces and checks for accesses by reading the
PTE A bit. It then migrates/promotes the pages to the toptier node
(node 0 in the current approach).

Thus, the approach pushes the overhead of scanning, NUMA hint faults and
migrations out of process context.
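To make the mechanism concrete, below is a minimal, hypothetical sketch of
the per-PTE scan step as described above: test-and-clear the Accessed bit
and queue slow-tier folios for promotion. The callback name and the
migration list are illustrative, not the actual kmmscand code; only the
kernel helpers used (ptep_test_and_clear_young(), vm_normal_folio(),
node_is_toptier(), folio_isolate_lru()) are existing APIs. PTE lock
handling is elided here (see note 6 below).

/*
 * Illustrative sketch only: a pte_entry callback, suitable for
 * walk_page_range(), implementing the A-bit test described above.
 */
static int kmmscand_pte_entry(pte_t *pte, unsigned long addr,
			      unsigned long next, struct mm_walk *walk)
{
	struct list_head *migrate_list = walk->private;
	pte_t ptent = ptep_get(pte);
	struct folio *folio;

	if (!pte_present(ptent))
		return 0;

	folio = vm_normal_folio(walk->vma, addr, ptent);
	if (!folio)
		return 0;

	/* Only slow-tier (e.g. CXL) folios are promotion candidates. */
	if (node_is_toptier(folio_nid(folio)))
		return 0;

	/* A set Accessed bit means the page was touched since the last scan. */
	if (ptep_test_and_clear_young(walk->vma, addr, pte) &&
	    folio_isolate_lru(folio))
		list_add_tail(&folio->lru, migrate_list);

	return 0;
}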
Initial results show promising numbers on a microbenchmark.

Experiment:
============
Abench microbenchmark:
- Allocates 8GB/32GB of memory on the CXL node.
- 64 threads are created, and each thread randomly accesses pages at 4K
  granularity.
- 512 iterations, with a delay of 1 us between two successive iterations.

SUT: 512 CPU, 2 node, 256GB, AMD EPYC.

3 runs, command: abench -m 2 -d 1 -i 512 -s <size>

Abench measures how much time is taken to complete the task; lower is
better. The expectation is that CXL node memory is migrated as fast as
possible.

Base case:    6.11-rc6 w/ numab mode = 2 (hot page promotion is enabled).
Patched case: 6.11-rc6 w/ numab mode = 0 (NUMA balancing is disabled);
we expect the daemon to do page promotion.
Result [*]:
===========
        base                    patched
        time in sec (%stdev)    time in sec (%stdev)    %gain
8GB     133.66 ( 0.38 )         113.77 ( 1.83 )         14.88
32GB    584.77 ( 0.19 )         542.79 ( 0.11 )          7.17
[*] Please note the current patchset applies on 6.13-rc, but these results
are older because the latest kernel has issues populating CXL node memory.
I will email findings/a fix on that soon.
Overhead:
=========
The below time is calculated using patch 10. Actual overhead for the
patched case may be even lower.

                (scan + migration) time in sec
Total memory    base kernel    patched kernel    %gain
8GB                  65.743             13.93    78.81
32GB                153.95             132.12    14.18

Breakup for 8GB              base    patched
numa_task_work_oh           0.883          0
numa_hf_migration_oh        64.86          0
kmmscand_scan_oh                0       2.74
kmmscand_migration_oh           0      11.19

Breakup for 32GB             base    patched
numa_task_work_oh            4.79          0
numa_hf_migration_oh       149.16          0
kmmscand_scan_oh                0       23.4
kmmscand_migration_oh           0     108.72
Limitations:
============
The PTE A bit scanning approach lacks information about the exact
destination node to migrate to.
Notes/Observations on design/implementation/alternatives/TODOs
===============================================================
1. Fine-tuning of scan throttling (see the throttling sketch after this
   list).

2. Use migrate_balanced_pgdat() to balance the toptier node before migration,
   OR use migrate_misplaced_folio_prepare() directly.
   But it may need some optimizations (e.g., invoke it only occasionally so
   that the overhead is not incurred for every migration).

3. Explore whether a separate PAGE_EXT flag is needed instead of reusing
   the PAGE_IDLE flag (cons: complicates PTE A bit handling in the system).
   But practically it does not look like a good idea.

4. Use timestamp-information-based migration (similar to numab mode=2)
   instead of migrating immediately when the PTE A bit is set.
   (cons:
    - It will not be accurate, since it is done outside of process
      context.
    - The performance benefit may be lost.)

5. Explore whether we need to use PFN information + a hash list instead of a
   simple migration list. Here scanning is done directly on PFNs belonging
   to the CXL node.

6. Holding the PTE lock before migration.

7. Solve: how to find the target toptier node for migration.

8. Use DAMON APIs, or reuse the part of DAMON that already tracks ranges of
   physical addresses accessed.

9. Gregory has nicely mentioned some details/ideas on different approaches
   in [1] (development notes), in the context of promoting unmapped page
   cache folios.

10. SJ had raised concerns about kernel-thread-based approaches, as with
    kstaled [2]. So the current patchset has tried to address the issue with
    simple algorithms to reduce CPU overhead. Migration throttling, running
    the daemon at NICE priority, and parallelizing migration with scanning
    could help further.

11. Toptier pages scanned can be used to assist the current NUMAB by
    providing information on hot VMAs.
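For item 1, patches 5 and 6 throttle scanning using scan_period and
scan_size. As a rough illustration of such a feedback loop (the exact
policy in the series may differ; the constants and the helper name below
are assumptions modeled on NUMAB/khugepaged-style adaptive scanning):

/*
 * Assumed adaptive-throttling sketch, not the actual patch code: back
 * off when scans find few hot pages, speed up when they find many.
 */
static unsigned int scan_period_ms = 2000;	/* sleep between scan rounds */

static void kmmscand_update_scan_period(unsigned long nr_accessed,
					unsigned long nr_scanned)
{
	if (!nr_scanned)
		return;

	if (nr_accessed < nr_scanned / 100)
		/* <1% of scanned PTEs were hot: halve the scan rate. */
		scan_period_ms = min(scan_period_ms * 2, 60000U);
	else
		/* Plenty of hot pages: scan (and promote) more aggressively. */
		scan_period_ms = max(scan_period_ms / 2, 100U);
}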
Credits
=======
Thanks to Bharata, Joannes, Gregory, SJ and Chris for their valuable comments
and support.

The kernel thread skeleton and some parts of the code are hugely inspired by
the khugepaged implementation and by parts of the IBS patches from Bharata [3].

Looking forward to your comments on whether the current approach in this
*early* RFC looks promising, or whether there are any alternative ideas etc.
Links:
[1] https://lore.kernel.org/lkml/20241127082201.1276-1-gourry@gourry.net/
[2] kstaled: https://lore.kernel.org/lkml/1317170947-17074-3-git-send-email-walken@google.com/#r
[3] https://lore.kernel.org/lkml/Y+Pj+9bbBbHpf6xM@hirez.programming.kicks-ass.net/
I might have unintentionally CCed more or fewer people than needed.
Raghavendra K T (10):
mm: Add kmmscand kernel daemon
mm: Maintain mm_struct list in the system
mm: Scan the mm and create a migration list
mm/migration: Migrate accessed folios to toptier node
mm: Add throttling of mm scanning using scan_period
mm: Add throttling of mm scanning using scan_size
sysfs: Add sysfs support to tune scanning
vmstat: Add vmstat counters
trace/kmmscand: Add tracing of scanning and migration
kmmscand: Add scanning
fs/exec.c | 4 +
include/linux/kmmscand.h | 30 +
include/linux/mm.h | 14 +
include/linux/mm_types.h | 4 +
include/linux/vm_event_item.h | 14 +
include/trace/events/kmem.h | 99 +++
kernel/fork.c | 4 +
kernel/sched/fair.c | 13 +-
mm/Kconfig | 7 +
mm/Makefile | 1 +
mm/huge_memory.c | 1 +
mm/kmmscand.c | 1144 +++++++++++++++++++++++++++++++++
mm/memory.c | 12 +-
mm/vmstat.c | 14 +
14 files changed, 1352 insertions(+), 9 deletions(-)
create mode 100644 include/linux/kmmscand.h
create mode 100644 mm/kmmscand.c
base-commit: bcc8eda6d34934d80b96adb8dc4ff5dfc632a53a
--
2.39.3
Hello Raghavendra,

Thank you for posting this nice patch series. I gave you some feedback
offline; adding it here again for transparency in this public discussion.

On Sun, 1 Dec 2024 15:38:08 +0000 Raghavendra K T <raghavendra.kt@amd.com> wrote:

> Introduction:
> =============
> This patchset is an outcome of an ongoing collaboration between AMD and Meta.
> Meta wanted to explore an alternative page promotion technique, as they
> observe high latency spikes in their workloads that access CXL memory.
>
> In the current hot page promotion, all the activities, including
> process address space scanning, NUMA hint fault handling and page
> migration, are performed in process context, i.e., the scanning overhead is
> borne by applications.

Yet another approach is using DAMON. DAMON does access monitoring, and further
allows users to request access-pattern-driven system operations in the name of
DAMOS (Data Access Monitoring-based Operation Schemes). Using it, users can
request DAMON to find hot pages and promote them, while finding cold pages and
demoting them. SK hynix has made their CXL-based memory capacity expansion
solution in this way (https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion).
We collaboratively developed new DAMON features for that, and those are all in
the mainline since Linux v6.11.

I also proposed an idea for advancing it using DAMOS auto-tuning on a more
general (>2 tiers) setup
(https://lore.kernel.org/20231112195602.61525-1-sj@kernel.org). I haven't had
time to further implement and test the idea so far, though.

> This is an early RFC patch series to do (slow tier) CXL page promotion.
> The approach in this patchset addresses the issue by adding PTE
> Accessed bit scanning.
>
> Scanning is done by a global kernel thread which routinely scans all
> the processes' address spaces and checks for accesses by reading the
> PTE A bit. It then migrates/promotes the pages to the toptier node
> (node 0 in the current approach).
>
> Thus, the approach pushes the overhead of scanning, NUMA hint faults and
> migrations out of process context.

DAMON also uses the PTE A bit as a major source of the access information.
And DAMON does both access monitoring and promotion/demotion in a global
kernel thread, namely kdamond. Hence a DAMON-based approach would also
offload the overheads from process context. So I feel your approach has a
sort of similarity with the DAMON-based one, and we might have a chance to
avoid unnecessary duplicates.

[...]
> Limitations:
> ============
> The PTE A bit scanning approach lacks information about the exact
> destination node to migrate to.

This is the same for a DAMON-based approach, since DAMON also uses the PTE A
bit as the major source of the information. We aim to extend DAMON to be
aware of the access source CPU, and to use that for solving this problem,
though. Utilizing page faults or AMD IBS-like h/w features are on the table
of ideas.

> Notes/Observations on design/implementation/alternatives/TODOs
> ===============================================================
> 1. Fine-tuning of scan throttling

DAMON allows users to set the upper limit of the monitoring overhead, using
the max_nr_regions parameter. It then provides its best-effort accuracy. We
also have ongoing projects for making it more accurate and easier to tune.

> 2. Use migrate_balanced_pgdat() to balance the toptier node before migration,
>    OR use migrate_misplaced_folio_prepare() directly.
>    But it may need some optimizations (e.g., invoke it only occasionally so
>    that the overhead is not incurred for every migration).
>
> 3. Explore whether a separate PAGE_EXT flag is needed instead of reusing
>    the PAGE_IDLE flag (cons: complicates PTE A bit handling in the system).
>    But practically it does not look like a good idea.
>
> 4. Use timestamp-information-based migration (similar to numab mode=2)
>    instead of migrating immediately when the PTE A bit is set.
>    (cons:
>     - It will not be accurate, since it is done outside of process
>       context.
>     - The performance benefit may be lost.)

DAMON provides a sort of time-based aggregated monitoring results. And DAMOS
provides prioritization of pages based on their access temperature. Hence, a
DAMON-based approach can also be used for a similar purpose (promoting not
every accessed page, but pages that are more frequently accessed for a longer
time).

> 5. Explore whether we need to use PFN information + a hash list instead of a
>    simple migration list. Here scanning is done directly on PFNs belonging
>    to the CXL node.

DAMON supports physical address space monitoring, and maintains the access
monitoring results in its own data structure called damon_region. So I think
a similar benefit can be achieved using DAMON?

[...]
> 8. Use DAMON APIs, or reuse the part of DAMON that already tracks ranges of
>    physical addresses accessed.

My biased humble opinion is that it would be very nice to explore this
opportunity, since I see some similarities and opportunities to solve some of
the challenges of your approach in an easier way. Even if it turns out that
DAMON cannot be used for your use case, failing earlier is a good thing, I'd
say :)

> 9. Gregory has nicely mentioned some details/ideas on different approaches
>    in [1] (development notes), in the context of promoting unmapped page
>    cache folios.

DAMON supports monitoring accesses to unmapped page cache folios, so
hopefully DAMON-based approaches can also solve this issue.

> 10. SJ had raised concerns about kernel-thread-based approaches, as with
>     kstaled [2]. So the current patchset has tried to address the issue with
>     simple algorithms to reduce CPU overhead. Migration throttling, running
>     the daemon at NICE priority, and parallelizing migration with scanning
>     could help further.
>
> 11. Toptier pages scanned can be used to assist the current NUMAB by
>     providing information on hot VMAs.
>
> Credits
> =======
> Thanks to Bharata, Joannes, Gregory, SJ and Chris for their valuable comments
> and support.

I also learned many things from the great discussions, thank you :)

[...]
> Links:
> [1] https://lore.kernel.org/lkml/20241127082201.1276-1-gourry@gourry.net/
> [2] kstaled: https://lore.kernel.org/lkml/1317170947-17074-3-git-send-email-walken@google.com/#r
> [3] https://lore.kernel.org/lkml/Y+Pj+9bbBbHpf6xM@hirez.programming.kicks-ass.net/
>
> I might have unintentionally CCed more or fewer people than needed.

Thanks,
SJ

[...]
On 12/11/2024 12:23 AM, SeongJae Park wrote:
> Hello Raghavendra,
>
> Thank you for posting this nice patch series. I gave you some feedback
> offline; adding it here again for transparency in this public discussion.
>
[...]
> DAMON supports monitoring accesses to unmapped page cache folios, so
> hopefully DAMON-based approaches can also solve this issue.

Hello SJ,

Thank you for the detailed explanation again. (Sorry for the late
acknowledgement; I was looking forward to the MM alignment discussion when
this message came.)

I think once the direction is fixed, we could surely use/reuse a lot of
source code from DAMON and MGLRU. The amazing design of DAMON should surely
help. I will keep in mind all the points raised here.

Thanks and Regards
- Raghu
On Sun, 01 Dec 2024, Raghavendra K T wrote:

> 6. Holding the PTE lock before migration.

fyi, I tried testing this series with 'perf bench numa mem' and got a soft
lockup, unable to take the PTL (and lost the machine to debug further atm), ie:

[ 3852.217675] CPU: 127 UID: 0 PID: 12537 Comm: watch-numa-sche Tainted: G      D L     6.14.0-rc2-kmmscand-v1+ #3
[ 3852.217677] Tainted: [D]=DIE, [L]=SOFTLOCKUP
[ 3852.217678] RIP: 0010:native_queued_spin_lock_slowpath+0x64/0x290
[ 3852.217683] Code: 77 7b f0 0f ba 2b 08 0f 92 c2 8b 03 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 77 57 85 c0 74 10 0f b6 03 84 c0 74 09 f3 90 <0f> b6 03 84 c0 75 f7 b8 01 00 00 00 66 89 03 5b 5d 41 5c 41 5d c3
[ 3852.217684] RSP: 0018:ff274259b3c9f988 EFLAGS: 00000202
[ 3852.217685] RAX: 0000000000000001 RBX: ffbd2efd8c08c9a8 RCX: 000ffffffffff000
[ 3852.217686] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffbd2efd8c08c9a8
[ 3852.217687] RBP: ff161328422c1328 R08: ff274259b3c9fb90 R09: ff161328422c1000
[ 3852.217688] R10: 00000000ffffffff R11: 0000000000000004 R12: 00007f52cca00000
[ 3852.217688] R13: ff274259b3c9fa00 R14: ff16132842326000 R15: ff161328422c1328
[ 3852.217689] FS:  00007f32b6f92b80(0000) GS:ff161423bfd80000(0000) knlGS:0000000000000000
[ 3852.217691] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3852.217692] CR2: 0000564ddbf68008 CR3: 00000080a81cc005 CR4: 0000000000773ef0
[ 3852.217693] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3852.217694] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 3852.217694] PKRU: 55555554
[ 3852.217695] Call Trace:
[ 3852.217696]  <IRQ>
[ 3852.217697]  ? watchdog_timer_fn+0x21b/0x2a0
[ 3852.217699]  ? __pfx_watchdog_timer_fn+0x10/0x10
[ 3852.217702]  ? __hrtimer_run_queues+0x10f/0x2a0
[ 3852.217704]  ? hrtimer_interrupt+0xfb/0x240
[ 3852.217706]  ? __sysvec_apic_timer_interrupt+0x4e/0x110
[ 3852.217709]  ? sysvec_apic_timer_interrupt+0x68/0x90
[ 3852.217712]  </IRQ>
[ 3852.217712]  <TASK>
[ 3852.217713]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 3852.217717]  ? native_queued_spin_lock_slowpath+0x64/0x290
[ 3852.217720]  _raw_spin_lock+0x25/0x30
[ 3852.217723]  __pte_offset_map_lock+0x9a/0x110
[ 3852.217726]  gather_pte_stats+0x1e3/0x2c0
[ 3852.217730]  walk_pgd_range+0x528/0xbb0
[ 3852.217733]  __walk_page_range+0x71/0x1d0
[ 3852.217736]  walk_page_vma+0x98/0xf0
[ 3852.217738]  show_numa_map+0x11a/0x3a0
[ 3852.217741]  seq_read_iter+0x2a6/0x470
[ 3852.217745]  seq_read+0x12b/0x170
[ 3852.217748]  vfs_read+0xe0/0x370
[ 3852.217751]  ? syscall_exit_to_user_mode+0x49/0x210
[ 3852.217755]  ? do_syscall_64+0x8a/0x190
[ 3852.217758]  ksys_read+0x6a/0xe0
[ 3852.217762]  do_syscall_64+0x7e/0x190
[ 3852.217765]  ? __memcg_slab_free_hook+0xd4/0x120
[ 3852.217768]  ? __x64_sys_close+0x38/0x80
[ 3852.217771]  ? kmem_cache_free+0x3bf/0x3e0
[ 3852.217774]  ? syscall_exit_to_user_mode+0x49/0x210
[ 3852.217777]  ? do_syscall_64+0x8a/0x190
[ 3852.217780]  ? do_syscall_64+0x8a/0x190
[ 3852.217783]  ? __irq_exit_rcu+0x3e/0xe0
[ 3852.217785]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
On 2/12/2025 10:32 PM, Davidlohr Bueso wrote:
> On Sun, 01 Dec 2024, Raghavendra K T wrote:
>
>> 6. Holding the PTE lock before migration.
>
> fyi, I tried testing this series with 'perf bench numa mem' and got a soft
> lockup, unable to take the PTL (and lost the machine to debug further atm), ie:
>
> [ 3852.217720]  _raw_spin_lock+0x25/0x30
> [ 3852.217723]  __pte_offset_map_lock+0x9a/0x110
> [ 3852.217726]  gather_pte_stats+0x1e3/0x2c0
> [ 3852.217730]  walk_pgd_range+0x528/0xbb0
[...]

Hello David,

Thanks for the report and the details; reproducer information helps me to
stabilize the code quickly. The micro-benchmark I used did not show any
issues. I will add the PTL lock and also check the issue from my side.

(With multiple scanning threads, it could cause even more issues because of
the additional migration pressure; I am wondering if I should go ahead with
the more stabilized single-thread scanning version in the coming post.)

Thanks and Regards
- Raghu
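For reference, one possible direction for that fix (item 6 of the cover
letter) is to take the PTE lock around the A-bit walk, so the scanner
serializes with other page-table walkers such as gather_pte_stats() in the
trace above. The following is only a sketch under that assumption, not the
actual fix; pte_offset_map_lock()/pte_unmap_unlock() are existing kernel
helpers, while kmmscand_scan_pmd() is a hypothetical name.

/*
 * Sketch of the locking direction only: scan one PTE table with the
 * PTL held, as note 6 of the cover letter suggests.
 */
static int kmmscand_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
			     pmd_t *pmd, unsigned long addr, unsigned long end)
{
	spinlock_t *ptl;
	pte_t *pte;

	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	if (!pte)
		return 0;	/* PTE table changed under us; retry later */

	for (; addr < end; pte++, addr += PAGE_SIZE) {
		pte_t ptent = ptep_get(pte);

		if (pte_present(ptent) &&
		    ptep_test_and_clear_young(vma, addr, pte)) {
			/* record the folio for later (unlocked) migration */
		}
	}
	pte_unmap_unlock(pte - 1, ptl);

	return 0;
}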