Batch migration for NUMA balancing

[RFC PATCH v0 0/2] Batch migration for NUMA balancing

Posted by Bharata B Rao 8 months, 3 weeks ago

Hi,

This is an attempt to convert NUMA balancing to do batched
migration instead of migrating one folio at a time. The basic
idea is to collect the folios to be migrated in a list (from the
hint fault handler) and batch-migrate them from task_work context.
More details about the specifics are present in patch 2/2.
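
To make the collect-then-drain flow concrete, here is a minimal
userspace model in C. It is not kernel code and not the code from
patch 2/2: struct model_folio, struct model_task, BATCH_THRESHOLD
and both functions are hypothetical names, and queueing the deferred
work once a threshold is reached is an assumption made for the
sketch.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_THRESHOLD 4	/* hypothetical value, for illustration only */

/* Stand-in for a folio; 'next' plays the role of the folio->lru linkage. */
struct model_folio {
	unsigned long pfn;
	struct model_folio *next;
};

/* Per-task state: the pending list plus a flag modelling queued task_work. */
struct model_task {
	struct model_folio *pending;
	size_t nr_pending;
	int work_queued;
	size_t nr_migrated;
};

/* Modelled hint fault path: link the folio instead of migrating it now. */
static void model_hint_fault(struct model_task *t, struct model_folio *f)
{
	f->next = t->pending;
	t->pending = f;
	t->nr_pending++;
	if (t->nr_pending >= BATCH_THRESHOLD)
		t->work_queued = 1;	/* assumption: threshold triggers the work */
}

/* Modelled task_work callback: drain the whole list in one pass. */
static size_t model_task_work(struct model_task *t)
{
	size_t n = 0;

	if (!t->work_queued)
		return 0;
	while (t->pending) {
		struct model_folio *f = t->pending;

		t->pending = f->next;
		f->next = NULL;		/* a real batch would be migrated here */
		n++;
	}
	assert(n == t->nr_pending);	/* list length and counter must agree */
	t->nr_pending = 0;
	t->work_queued = 0;
	t->nr_migrated += n;
	return n;
}
```

In this model, three faults leave the list untouched and the fourth
one makes the next model_task_work() call drain all four folios.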

During LSFMM[1] and subsequent discussions in MM alignment calls[2],
it was suggested that separate migration threads to handle migration
or promotion requests may be desirable. Existing NUMA balancing, hot
page promotion and other future promotion techniques could off-load
the migration part to these threads. Or if we manage to have a single
source of hotness truth like kpromoted[3], then that too can hand
over migration requests to the migration threads. I am envisaging
that different hotness sources like kmmscand[4], MGLRU[5], IBS[6]
and CXL HMU would push hot page info to kpromoted, which would
then isolate the folios to be promoted and push them to the migrator
thread.

As a first step, this is an attempt to batch and perform NUMAB
migrations in an async manner. Separate migration threads aren't
yet implemented but I am using Gregory's patch[7] that provides
the migrate_misplaced_folio_batch() API to do batch migration of
misplaced folios.

Some points for discussion
--------------------------
1. To isolate the misplaced folios or not?

To do batch migration, the misplaced folios need to be stored in
some manner. I thought isolating them and using the folio->lru
field to link them up would be the most straightforward way. But
then there were concerns expressed about folios remaining isolated
for a long time before they get migrated.

Or should we just maintain the PFNs instead of folios and
isolate the folios only just prior to migrating them?
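
The PFN-based alternative could look roughly like the userspace
sketch below. It only models the idea: struct pfn_batch, MAX_BATCH,
the isolate_fn hook and isolate_even_only() are all hypothetical and
do not correspond to any existing kernel interface. The point it
illustrates is that a PFN recorded at fault time may no longer be
isolatable when the batch is processed, so late isolation has to
tolerate failures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BATCH 8	/* hypothetical capacity */

/* Batch of candidate PFNs; nothing is isolated while they sit here. */
struct pfn_batch {
	unsigned long pfns[MAX_BATCH];
	size_t nr;
};

/* Record a PFN at hint fault time; fails when the batch is full. */
static bool pfn_batch_add(struct pfn_batch *b, unsigned long pfn)
{
	if (b->nr == MAX_BATCH)
		return false;
	b->pfns[b->nr++] = pfn;
	return true;
}

/* Hypothetical hook: returns false if the page can no longer be isolated. */
typedef bool (*isolate_fn)(unsigned long pfn);

/*
 * Just before migration: try to isolate each recorded PFN and keep only
 * the ones that succeed. Returns how many entries were written to 'out'.
 */
static size_t pfn_batch_isolate(struct pfn_batch *b, isolate_fn isolate,
				unsigned long *out)
{
	size_t n = 0;

	assert(b->nr <= MAX_BATCH);
	for (size_t i = 0; i < b->nr; i++)
		if (isolate(b->pfns[i]))
			out[n++] = b->pfns[i];
	b->nr = 0;
	return n;
}

/* Demo hook only: pretend that odd PFNs were freed in the meantime. */
static bool isolate_even_only(unsigned long pfn)
{
	return (pfn & 1) == 0;
}
```

The trade-off in this model is that no folio stays isolated while it
waits in the batch, at the cost of revalidating every entry later.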

2. Managing target_nid for misplaced pages

NUMAB provides an accurate target_nid for each folio that is
detected as misplaced. However, when we don't migrate the folio
right away, but instead want to batch and do async migration later,
where do we keep track of the target_nid for each folio?

In this implementation, I am using the last_cpupid field as it appeared
that this field could be reused (with some challenges mentioned
in 2/2) for isolated folios. This approach may be specific to NUMAB,
but each sub-system that hands over pages to the migrator thread
should also provide a target_nid, and hence each sub-system should be
free to maintain and track the target_nid of folios that it has
isolated/batched for migration in its own specific manner.
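
As a rough illustration of reusing a per-folio field to carry the
target node, here is a small userspace model. struct nid_folio,
MODEL_NO_NID and both helpers are hypothetical, and the real
last_cpupid encoding and the challenges described in 2/2 are not
modelled. The sketch only shows the basic contract: the field means
"target_nid" while the folio is isolated, and whatever it held
before is overwritten.

```c
#include <assert.h>

#define MODEL_NO_NID (-1)	/* hypothetical "no target recorded" value */

/* Stand-in for a folio with one reusable field. */
struct nid_folio {
	int last_cpupid;	/* models the field; real encoding differs */
	int isolated;
};

/* Store the target node; only valid once the folio is isolated. */
static void stash_target_nid(struct nid_folio *f, int nid)
{
	assert(f->isolated);
	f->last_cpupid = nid;	/* previous contents are lost here */
}

/* Read the target node back at migration time. */
static int fetch_target_nid(const struct nid_folio *f)
{
	return f->isolated ? f->last_cpupid : MODEL_NO_NID;
}
```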

3. How many folios to batch?

Currently I have a fixed threshold for the number of folios to batch.
It could be a sysctl that allows a setting between a min and a max. It
could also be auto-tuned if required.
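
If the threshold became tunable, bounding it could be as simple as
the following userspace sketch. BATCH_MIN, BATCH_MAX and
clamp_batch() are hypothetical, and the limits are placeholders
rather than proposed values.

```c
#include <assert.h>

#define BATCH_MIN 1u	/* placeholder lower bound */
#define BATCH_MAX 512u	/* placeholder upper bound */

/* Clamp a requested batch size into [BATCH_MIN, BATCH_MAX]. */
static unsigned int clamp_batch(unsigned int requested)
{
	assert(BATCH_MIN <= BATCH_MAX);
	if (requested < BATCH_MIN)
		return BATCH_MIN;
	if (requested > BATCH_MAX)
		return BATCH_MAX;
	return requested;
}
```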

The state of the patchset
-------------------------
* Still raw and very lightly tested
* Just posted to serve as a base for subsequent discussions
here and in MM alignment calls.

References
----------
[1] LSFMM LWN summary - https://lwn.net/Articles/1016519/
[2] MM alignment call summary - https://lore.kernel.org/linux-mm/263d7140-c343-e82e-b836-ec85c52b54eb@google.com/
[3] kpromoted patchset - https://lore.kernel.org/linux-mm/20250306054532.221138-1-bharata@amd.com/
[4] Kmmscand: PTE A bit scanning - https://lore.kernel.org/linux-mm/20250319193028.29514-1-raghavendra.kt@amd.com/
[5] MGLRU scanning for page promotion - https://lore.kernel.org/lkml/20250324220301.1273038-1-kinseyho@google.com/
[6] IBS-based hot page promotion - https://lore.kernel.org/linux-mm/20250306054532.221138-4-bharata@amd.com/
[7] Unmapped page cache folio promotion patchset - https://lore.kernel.org/linux-mm/20250411221111.493193-1-gourry@gourry.net/

Bharata B Rao (1):
mm: sched: Batch-migrate misplaced pages

Gregory Price (1):
migrate: implement migrate_misplaced_folio_batch

--
2.34.1