From nobody Tue Apr 7 23:42:41 2026
From: Alexandre Ghiti <alex@ghiti.fr>
To: akpm@linux-foundation.org
Cc: alexghiti@kernel.org, kernel-team@meta.com, akinobu.mita@gmail.com,
 david@kernel.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
 vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com,
 hannes@cmpxchg.org, zhengqi.arch@bytedance.com, shakeel.butt@linux.dev,
 axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
 gourry@gourry.net, apopple@nvidia.com, byungchul@sk.com,
 joshua.hahnjy@gmail.com, matthew.brost@intel.com, rakie.kim@sk.com,
 ying.huang@linux.alibaba.com, ziy@nvidia.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Alexandre Ghiti <alex@ghiti.fr>
Subject: [PATCH 1/4] mm: Move demotion related functions in memory-tiers.c
Date: Wed, 11 Mar 2026 12:02:40 +0100
Message-ID: <20260311110314.237315-2-alex@ghiti.fr>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260311110314.237315-1-alex@ghiti.fr>
References: <20260311110314.237315-1-alex@ghiti.fr>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Let's have all the demotion functions in this file, no functional change
intended.

Suggested-by: Gregory Price
Signed-off-by: Alexandre Ghiti
---
 include/linux/memory-tiers.h | 18 ++++++++
 mm/memory-tiers.c            | 75 +++++++++++++++++++++++++++++++++
 mm/vmscan.c                  | 80 +-----------------------------------
 3 files changed, 94 insertions(+), 79 deletions(-)

diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 96987d9d95a8..0bf0d002939e 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -56,6 +56,9 @@ void mt_put_memory_types(struct list_head *memory_types);
 int next_demotion_node(int node, const nodemask_t *allowed_mask);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
 bool node_is_toptier(int node);
+unsigned int mt_demote_folios(struct list_head *demote_folios,
+			      struct pglist_data *pgdat,
+			      struct mem_cgroup *memcg);
 #else
 static inline int next_demotion_node(int node, const nodemask_t *allowed_mask)
 {
@@ -71,6 +74,14 @@ static inline bool node_is_toptier(int node)
 {
 	return true;
 }
+
+static inline unsigned int mt_demote_folios(struct list_head *demote_folios,
+					    struct pglist_data *pgdat,
+					    struct mem_cgroup *memcg)
+{
+	return 0;
+}
+
 #endif

 #else
@@ -116,6 +127,13 @@ static inline bool node_is_toptier(int node)
 	return true;
 }

+static inline unsigned int mt_demote_folios(struct list_head *demote_folios,
+					    struct pglist_data *pgdat,
+					    struct mem_cgroup *memcg)
+{
+	return 0;
+}
+
 static inline int register_mt_adistance_algorithm(struct notifier_block *nb)
 {
 	return 0;
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 986f809376eb..afdf21738a54 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include

 #include "internal.h"

@@ -373,6 +374,80 @@ int next_demotion_node(int node, const nodemask_t *allowed_mask)
 	return find_next_best_node(node, &mask);
 }

+static struct folio *alloc_demote_folio(struct folio *src,
+					unsigned long private)
+{
+	struct folio *dst;
+	nodemask_t *allowed_mask;
+	struct migration_target_control *mtc;
+
+	mtc = (struct migration_target_control *)private;
+
+	allowed_mask = mtc->nmask;
+	/*
+	 * make sure we allocate from the target node first also trying to
+	 * demote or reclaim pages from the target node via kswapd if we are
+	 * low on free memory on target node. If we don't do this and if
+	 * we have free memory on the slower(lower) memtier, we would start
+	 * allocating pages from slower(lower) memory tiers without even forcing
+	 * a demotion of cold pages from the target memtier. This can result
+	 * in the kernel placing hot pages in slower(lower) memory tiers.
+	 */
+	mtc->nmask = NULL;
+	mtc->gfp_mask |= __GFP_THISNODE;
+	dst = alloc_migration_target(src, (unsigned long)mtc);
+	if (dst)
+		return dst;
+
+	mtc->gfp_mask &= ~__GFP_THISNODE;
+	mtc->nmask = allowed_mask;
+
+	return alloc_migration_target(src, (unsigned long)mtc);
+}
+
+unsigned int mt_demote_folios(struct list_head *demote_folios,
+			      struct pglist_data *pgdat,
+			      struct mem_cgroup *memcg)
+{
+	int target_nid;
+	unsigned int nr_succeeded;
+	nodemask_t allowed_mask;
+
+	struct migration_target_control mtc = {
+		/*
+		 * Allocate from 'node', or fail quickly and quietly.
+		 * When this happens, 'page' will likely just be discarded
+		 * instead of migrated.
+		 */
+		.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
+			__GFP_NOMEMALLOC | GFP_NOWAIT,
+		.nmask = &allowed_mask,
+		.reason = MR_DEMOTION,
+	};
+
+	if (list_empty(demote_folios))
+		return 0;
+
+	node_get_allowed_targets(pgdat, &allowed_mask);
+	mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
+	if (nodes_empty(allowed_mask))
+		return 0;
+
+	target_nid = next_demotion_node(pgdat->node_id, &allowed_mask);
+	if (target_nid == NUMA_NO_NODE)
+		/* No lower-tier nodes or nodes were hot-unplugged. */
+		return 0;
+
+	mtc.nid = target_nid;
+
+	/* Demotion ignores all cpuset and mempolicy settings */
+	migrate_pages(demote_folios, alloc_demote_folio, NULL,
+		      (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
+		      &nr_succeeded);
+
+	return nr_succeeded;
+}
+
 static void disable_all_demotion_targets(void)
 {
 	struct memory_tier *memtier;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0fc9373e8251..5e0138b94480 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -983,84 +983,6 @@ static void folio_check_dirty_writeback(struct folio *folio,
 	mapping->a_ops->is_dirty_writeback(folio, dirty, writeback);
 }

-static struct folio *alloc_demote_folio(struct folio *src,
-					unsigned long private)
-{
-	struct folio *dst;
-	nodemask_t *allowed_mask;
-	struct migration_target_control *mtc;
-
-	mtc = (struct migration_target_control *)private;
-
-	allowed_mask = mtc->nmask;
-	/*
-	 * make sure we allocate from the target node first also trying to
-	 * demote or reclaim pages from the target node via kswapd if we are
-	 * low on free memory on target node. If we don't do this and if
-	 * we have free memory on the slower(lower) memtier, we would start
-	 * allocating pages from slower(lower) memory tiers without even forcing
-	 * a demotion of cold pages from the target memtier. This can result
-	 * in the kernel placing hot pages in slower(lower) memory tiers.
-	 */
-	mtc->nmask = NULL;
-	mtc->gfp_mask |= __GFP_THISNODE;
-	dst = alloc_migration_target(src, (unsigned long)mtc);
-	if (dst)
-		return dst;
-
-	mtc->gfp_mask &= ~__GFP_THISNODE;
-	mtc->nmask = allowed_mask;
-
-	return alloc_migration_target(src, (unsigned long)mtc);
-}
-
-/*
- * Take folios on @demote_folios and attempt to demote them to another node.
- * Folios which are not demoted are left on @demote_folios.
- */
-static unsigned int demote_folio_list(struct list_head *demote_folios,
-				      struct pglist_data *pgdat,
-				      struct mem_cgroup *memcg)
-{
-	int target_nid;
-	unsigned int nr_succeeded;
-	nodemask_t allowed_mask;
-
-	struct migration_target_control mtc = {
-		/*
-		 * Allocate from 'node', or fail quickly and quietly.
-		 * When this happens, 'page' will likely just be discarded
-		 * instead of migrated.
-		 */
-		.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
-			__GFP_NOMEMALLOC | GFP_NOWAIT,
-		.nmask = &allowed_mask,
-		.reason = MR_DEMOTION,
-	};
-
-	if (list_empty(demote_folios))
-		return 0;
-
-	node_get_allowed_targets(pgdat, &allowed_mask);
-	mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
-	if (nodes_empty(allowed_mask))
-		return 0;
-
-	target_nid = next_demotion_node(pgdat->node_id, &allowed_mask);
-	if (target_nid == NUMA_NO_NODE)
-		/* No lower-tier nodes or nodes were hot-unplugged. */
-		return 0;
-
-	mtc.nid = target_nid;
-
-	/* Demotion ignores all cpuset and mempolicy settings */
-	migrate_pages(demote_folios, alloc_demote_folio, NULL,
-		      (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
-		      &nr_succeeded);
-
-	return nr_succeeded;
-}
-
 static bool may_enter_fs(struct folio *folio, gfp_t gfp_mask)
 {
 	if (gfp_mask & __GFP_FS)
@@ -1573,7 +1495,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 	/* 'folio_list' is always empty here */

 	/* Migrate folios selected for demotion */
-	nr_demoted = demote_folio_list(&demote_folios, pgdat, memcg);
+	nr_demoted = mt_demote_folios(&demote_folios, pgdat, memcg);
 	nr_reclaimed += nr_demoted;
 	stat->nr_demoted += nr_demoted;
 	/* Folios that could not be demoted are still in @demote_folios */
-- 
2.53.0