The preferred demotion node should be the one closest to the source
node to minimize migration latency. However, if the preferred node
is not set in mems_allowed, demote_folio_list() currently selects
one of the allowed nodes at random as the new preferred node. This
can result in selecting a very distant node.

Update demote_folio_list() to traverse the demotion targets
hierarchically until the preferred node is set in mems_allowed,
ensuring the preferred target is always the closest available node.
Signed-off-by: Bing Jiao <bingjiao@google.com>
---
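For reviewers, the traversal can be modelled in userspace roughly as
follows. This is only a sketch under stated assumptions: every sketch_*
name and the tier table are hypothetical, a plain bitmask stands in for
nodemask_t, and SKETCH_NO_NODE stands in for NUMA_NO_NODE. It is not the
kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_NODES 8
#define SKETCH_NO_NODE (-1) /* stands in for NUMA_NO_NODE */

/*
 * Hypothetical topology: entry nid holds the preferred demotion target
 * of nid in the next tier. Here 0 -> 2 -> 4 and 1 -> 3 -> 5.
 */
static const int sketch_next_tier[SKETCH_MAX_NODES] = {
	2, 3, 4, 5,
	SKETCH_NO_NODE, SKETCH_NO_NODE, SKETCH_NO_NODE, SKETCH_NO_NODE,
};

/* Hypothetical stand-in for next_demotion_node(). */
static int sketch_next_demotion_node(int nid)
{
	assert(nid >= 0 && nid < SKETCH_MAX_NODES);
	return sketch_next_tier[nid];
}

/* Hypothetical stand-in for node_isset() on a plain bitmask. */
static bool sketch_node_isset(int nid, unsigned int mask)
{
	return (mask >> nid) & 1u;
}

/*
 * Walk down the tiers from the preferred target until a node in
 * allowed_mask is found. Returns SKETCH_NO_NODE if the chain ends
 * without reaching an allowed node.
 */
static int sketch_pick_target(int target_nid, unsigned int allowed_mask)
{
	while (target_nid != SKETCH_NO_NODE &&
	       !sketch_node_isset(target_nid, allowed_mask))
		target_nid = sketch_next_demotion_node(target_nid);

	return target_nid;
}
```

With this table, a preferred target of node 2 and only node 4 allowed
resolves to node 4, while a preferred target of node 3 with only node 4
allowed ends with no target, which corresponds to the early return in
the patch.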
mm/vmscan.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index db2413c4bd26..d452974c946e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1052,8 +1052,18 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
if (nodes_empty(allowed_mask))
return false;
- if (!node_isset(target_nid, allowed_mask))
- target_nid = node_random(&allowed_mask);
+ while (target_nid != NUMA_NO_NODE &&
+ !node_isset(target_nid, allowed_mask)) {
+ /* Get the preferred demotion target from the next tier. */
+ target_nid = next_demotion_node(target_nid);
+ }
+
+ /*
+ * The preferred node lookup can race with events such as hot-unplug
+ * of nodes in the next tier, which may leave no valid target.
+ */
+ if (target_nid == NUMA_NO_NODE)
+ return 0;
mtc.nid = target_nid;
/* Demotion ignores all cpuset and mempolicy settings */
--
2.52.0.358.g0dd7633a29-goog