From: Gregory Price <gourry@gourry.net>
Tiered memory systems often require migrating multiple folios at once.
Currently, migrate_misplaced_folio() handles only one folio per call,
which is inefficient for batch operations. Introduce
migrate_misplaced_folios_batch(), a batch variant that uses
migrate_pages() internally for improved performance.
The caller must isolate folios beforehand using
migrate_misplaced_folio_prepare(). On return, the folio list will be
empty regardless of success or failure.
This function will be used by the pghot kmigrated thread.
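
For illustration, the expected calling pattern looks roughly like the
following (untested sketch; the NULL vma and the dst_nid variable are
assumptions for an unmapped/kthread context, not part of this patch):

```c
LIST_HEAD(migrate_list);

/* Isolate each candidate folio first; this takes an extra reference. */
if (!migrate_misplaced_folio_prepare(folio, NULL, dst_nid))
	list_add(&folio->lru, &migrate_list);

/* Batch-migrate; @migrate_list is emptied regardless of the outcome. */
if (migrate_misplaced_folios_batch(&migrate_list, dst_nid))
	;	/* -EAGAIN: some or all folios failed to migrate */
```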
Signed-off-by: Gregory Price <gourry@gourry.net>
[Rewrote commit description]
Signed-off-by: Bharata B Rao <bharata@amd.com>
---
include/linux/migrate.h | 6 ++++++
mm/migrate.c | 48 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index d5af2b7f577b..5c1e2691cec2 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -111,6 +111,7 @@ static inline void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *p
int migrate_misplaced_folio_prepare(struct folio *folio,
struct vm_area_struct *vma, int node);
int migrate_misplaced_folio(struct folio *folio, int node);
+int migrate_misplaced_folios_batch(struct list_head *folio_list, int node);
#else
static inline int migrate_misplaced_folio_prepare(struct folio *folio,
struct vm_area_struct *vma, int node)
@@ -121,6 +122,11 @@ static inline int migrate_misplaced_folio(struct folio *folio, int node)
{
return -EAGAIN; /* can't migrate now */
}
+static inline int migrate_misplaced_folios_batch(struct list_head *folio_list,
+ int node)
+{
+ return -EAGAIN; /* can't migrate now */
+}
#endif /* CONFIG_NUMA_BALANCING */
#ifdef CONFIG_MIGRATION
diff --git a/mm/migrate.c b/mm/migrate.c
index a15184950e65..94daec0f49ef 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2751,5 +2751,53 @@ int migrate_misplaced_folio(struct folio *folio, int node)
BUG_ON(!list_empty(&migratepages));
return nr_remaining ? -EAGAIN : 0;
}
+
+/**
+ * migrate_misplaced_folios_batch() - Batch variant of migrate_misplaced_folio()
+ * @folio_list: Isolated list of folios to be batch-migrated.
+ * @node: The NUMA node ID to which the folios should be migrated.
+ *
+ * Attempts to migrate a list of folios to the specified destination node.
+ * The caller is expected to have isolated the folios by calling
+ * migrate_misplaced_folio_prepare(), which will result in an
+ * elevated reference count on the folio. All the isolated folios
+ * in the list must belong to the same memcg so that NUMA_PAGE_MIGRATE
+ * stat can be attributed correctly to the memcg.
+ *
+ * This function will un-isolate the folios, drop the elevated reference
+ * and remove them from the list before returning. This is called
+ * only for batched promotion of hot pages from lower tier nodes.
+ *
+ * Return: 0 on success and -EAGAIN on failure or partial migration.
+ * On return, @folio_list will be empty regardless of success/failure.
+ */
+int migrate_misplaced_folios_batch(struct list_head *folio_list, int node)
+{
+ pg_data_t *pgdat = NODE_DATA(node);
+ struct mem_cgroup *memcg = NULL;
+ unsigned int nr_succeeded = 0;
+ int nr_remaining;
+
+ if (!list_empty(folio_list)) {
+ struct folio *first = list_first_entry(folio_list, struct folio, lru);
+ memcg = get_mem_cgroup_from_folio(first);
+ }
+
+ nr_remaining = migrate_pages(folio_list, alloc_misplaced_dst_folio,
+ NULL, node, MIGRATE_ASYNC,
+ MR_NUMA_MISPLACED, &nr_succeeded);
+ if (nr_remaining)
+ putback_movable_pages(folio_list);
+
+ if (nr_succeeded) {
+ count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
+ mod_node_page_state(pgdat, PGPROMOTE_SUCCESS, nr_succeeded);
+ count_memcg_events(memcg, NUMA_PAGE_MIGRATE, nr_succeeded);
+ }
+
+ mem_cgroup_put(memcg);
+ WARN_ON(!list_empty(folio_list));
+ return nr_remaining ? -EAGAIN : 0;
+}
#endif /* CONFIG_NUMA_BALANCING */
#endif /* CONFIG_NUMA */
--
2.34.1
Replies to review from Sashiko
[https://sashiko.dev/#/patchset/20260323095104.238982-1-bharata%40amd.com]
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>[ ... ]
>> @@ -2751,5 +2751,53 @@ int migrate_misplaced_folio(struct folio *folio, int node)
>[ ... ]
>> + if (nr_remaining)
>> + putback_movable_pages(folio_list);
>> +
>> + if (nr_succeeded) {
>> + count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
>> + mod_node_page_state(pgdat, PGPROMOTE_SUCCESS, nr_succeeded);
>
>Should this use mod_lruvec_state() instead of mod_node_page_state()?
>Using mod_node_page_state() updates the global NUMA node statistic but bypasses
>the memcg-specific lruvec statistic. This can cause the pgpromote_success
>metric in memory.stat to undercount for batched promotions.
That's correct. Earlier, the memcg wasn't available in this routine, but
that is no longer the case in v6. Hence I can obtain the lruvec and use
mod_lruvec_state() for PGPROMOTE_SUCCESS accounting. This will be part
of the next iteration.
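For reference, the planned change might look roughly like this (untested
sketch only; the exact form is subject to the next iteration):

```c
	if (nr_succeeded) {
		struct lruvec *lruvec;

		count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
		/*
		 * Charge promotion success to the memcg's lruvec so that
		 * pgpromote_success in memory.stat is not undercounted,
		 * rather than updating only the global node counter.
		 */
		lruvec = mem_cgroup_lruvec(memcg, pgdat);
		mod_lruvec_state(lruvec, PGPROMOTE_SUCCESS, nr_succeeded);
		count_memcg_events(memcg, NUMA_PAGE_MIGRATE, nr_succeeded);
	}
```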
>
>Are there missing topology and sysctl checks before incrementing
>PGPROMOTE_SUCCESS here?
>
>In the single-folio variant, this statistic is only incremented when:
> if ((sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
> && !node_is_toptier(folio_nid(folio))
> && node_is_toptier(node))
> mod_lruvec_state(lruvec, PGPROMOTE_SUCCESS, nr_succeeded);
>Unconditionally recording the stat inside the migration helper could
>corrupt promotion statistics if the function is ever used for lateral NUMA
>migrations in the future.
This routine is called only for hot page promotion by the pghot kmigrated
thread and has been documented as such. So the unconditional call should
be fine.
Regards,
Bharata.