From: Joshua Hahn
To: Joshua Hahn
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	"Liam R . Howlett", Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song, Waiman Long,
	Chen Ridong, Tejun Heo, Michal Koutny, linux-mm@kvack.org,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@meta.com
Subject: [RFC PATCH 3/6] mm/memory-tiers, memcontrol: Introduce toptier capacity updates
Date: Mon, 23 Feb 2026 14:38:26 -0800
Message-ID: <20260223223830.586018-4-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.47.3
In-Reply-To: <20260223223830.586018-1-joshua.hahnjy@gmail.com>
References: <20260223223830.586018-1-joshua.hahnjy@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

What a memcg considers to be a valid toptier node is defined by three
criteria: (1) The node has CPUs, (2) The node has online memory, and
(3) The node is within the cgroup's cpuset.mems.

Of the three, the second and third criteria are the only ones that can
change dynamically during runtime, via memory hotplug events and
cpuset.mems changes, respectively.

Introduce functions to calculate and update toptier capacity, and call
them during cpuset.mems changes and memory hotplug events.

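For illustration only, the three criteria above can be modelled in
userspace as follows. This is a minimal sketch, not code from this
series: struct model_node, model_capacity() and the 64-bit mask are
hypothetical stand-ins for NODE_DATA(), the capacity helpers and
nodemask_t, and the sketch applies the "has CPUs" test literally
rather than calling node_is_toptier().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of one NUMA node; not a kernel structure. */
struct model_node {
	int has_cpus;			/* criterion (1): node has CPUs */
	unsigned long present_pages;	/* criterion (2): 0 means no online memory */
};

/*
 * Sum the pages of every node that has online memory and is set in
 * *allowed (criterion (3)). A NULL mask means "no restriction", which
 * is how this sketch models the NULL passed at css allocation time.
 * With toptier_only set, nodes without CPUs are skipped as well.
 */
static unsigned long model_capacity(const struct model_node *nodes, int nr,
				    const uint64_t *allowed, int toptier_only)
{
	unsigned long capacity = 0;
	int nid;

	assert(nr >= 0 && nr <= 64);	/* the mask is a single 64-bit word */

	for (nid = 0; nid < nr; nid++) {
		if (!nodes[nid].present_pages)
			continue;	/* no online memory */
		if (toptier_only && !nodes[nid].has_cpus)
			continue;	/* CPU-less node, not toptier */
		if (allowed && !(*allowed & (UINT64_C(1) << nid)))
			continue;	/* outside the allowed mask */
		capacity += nodes[nid].present_pages;
	}
	return capacity;
}
```

With nodes 0 and 1 having CPUs and 1000 pages each, and node 2 being
CPU-less with 4000 pages, a mask of {0, 2} gives a toptier capacity of
1000 and a total capacity of 5000. Widening the mask to all three nodes
gives 2000 and 6000, which is the kind of change the cpuset.mems hook
has to propagate.
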
Signed-off-by: Joshua Hahn
---
 include/linux/memcontrol.h   |  6 ++++++
 include/linux/memory-tiers.h | 29 +++++++++++++++++++++++++
 include/linux/page_counter.h |  2 ++
 kernel/cgroup/cpuset.c       |  2 +-
 mm/memcontrol.c              | 17 +++++++++++++++
 mm/memory-tiers.c            | 41 ++++++++++++++++++++++++++++++++++++
 mm/page_counter.c            |  8 +++++++
 7 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5173a9f16721..900a36112b62 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -608,6 +608,8 @@ static inline void mem_cgroup_protection(struct mem_cgroup *root,
 void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 				     struct mem_cgroup *memcg);
 
+void update_memcg_toptier_capacity(void);
+
 static inline bool mem_cgroup_unprotected(struct mem_cgroup *target,
 					  struct mem_cgroup *memcg)
 {
@@ -1116,6 +1118,10 @@ static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 {
 }
 
+static inline void update_memcg_toptier_capacity(void)
+{
+}
+
 static inline bool mem_cgroup_unprotected(struct mem_cgroup *target,
 					  struct mem_cgroup *memcg)
 {
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 85440473effb..cf616885e0db 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -53,6 +53,9 @@ int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
 struct memory_dev_type *mt_find_alloc_memory_type(int adist,
 						  struct list_head *memory_types);
 void mt_put_memory_types(struct list_head *memory_types);
+void mt_get_toptier_nodemask(nodemask_t *mask, const nodemask_t *allowed);
+unsigned long mt_get_toptier_capacity(const nodemask_t *allowed);
+unsigned long mt_get_total_capacity(const nodemask_t *allowed);
 #ifdef CONFIG_MIGRATION
 int next_demotion_node(int node, const nodemask_t *allowed_mask);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
@@ -152,5 +155,31 @@ static inline struct memory_dev_type *mt_find_alloc_memory_type(int adist,
 static inline void mt_put_memory_types(struct list_head *memory_types)
 {
 }
+
+static inline void mt_get_toptier_nodemask(nodemask_t *mask,
+					   const nodemask_t *allowed)
+{
+	*mask = node_states[N_MEMORY];
+	if (allowed)
+		nodes_and(*mask, *mask, *allowed);
+}
+
+static inline unsigned long mt_get_toptier_capacity(const nodemask_t *allowed)
+{
+	int nid;
+	unsigned long capacity = 0;
+
+	for_each_node_state(nid, N_MEMORY) {
+		if (allowed && !node_isset(nid, *allowed))
+			continue;
+		capacity += NODE_DATA(nid)->node_present_pages;
+	}
+	return capacity;
+}
+
+static inline unsigned long mt_get_total_capacity(const nodemask_t *allowed)
+{
+	return mt_get_toptier_capacity(allowed);
+}
 #endif /* CONFIG_NUMA */
 #endif /* _LINUX_MEMORY_TIERS_H */
diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index 128c1272c88c..ada5f1dd75d4 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -121,6 +121,8 @@ static inline void page_counter_reset_watermark(struct page_counter *counter)
 void page_counter_calculate_protection(struct page_counter *root,
 				       struct page_counter *counter,
 				       bool recursive_protection);
+void page_counter_update_toptier_capacity(struct page_counter *counter,
+					  const nodemask_t *allowed);
 unsigned long page_counter_toptier_high(struct page_counter *counter);
 unsigned long page_counter_toptier_low(struct page_counter *counter);
 #else
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 7607dfe516e6..e5641dc1af88 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2620,7 +2620,6 @@ static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
 	rcu_read_lock();
 	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
 		struct cpuset *parent = parent_cs(cp);
-
 		bool has_mems = nodes_and(*new_mems, cp->mems_allowed, parent->effective_mems);
 
 		/*
@@ -2701,6 +2700,7 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
 
 	/* use trialcs->mems_allowed as a temp variable */
 	update_nodemasks_hier(cs, &trialcs->mems_allowed);
+	update_memcg_toptier_capacity();
 	return 0;
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0be1e823d813..f3e4a6ce7181 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -54,6 +54,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -3906,6 +3907,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 
 		page_counter_init(&memcg->memory, &parent->memory, memcg_on_dfl);
 		page_counter_init(&memcg->swap, &parent->swap, false);
+		page_counter_update_toptier_capacity(&memcg->memory, NULL);
 #ifdef CONFIG_MEMCG_V1
 		memcg->memory.track_failcnt = !memcg_on_dfl;
 		WRITE_ONCE(memcg->oom_kill_disable, READ_ONCE(parent->oom_kill_disable));
@@ -3917,6 +3919,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		init_memcg_events();
 		page_counter_init(&memcg->memory, NULL, true);
 		page_counter_init(&memcg->swap, NULL, false);
+		page_counter_update_toptier_capacity(&memcg->memory, NULL);
 #ifdef CONFIG_MEMCG_V1
 		page_counter_init(&memcg->kmem, NULL, false);
 		page_counter_init(&memcg->tcpmem, NULL, false);
@@ -4804,6 +4807,20 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 	page_counter_calculate_protection(&root->memory, &memcg->memory, recursive_protection);
 }
 
+void update_memcg_toptier_capacity(void)
+{
+	struct mem_cgroup *memcg;
+	nodemask_t allowed;
+
+	for_each_mem_cgroup(memcg) {
+		if (memcg == root_mem_cgroup)
+			continue;
+
+		cpuset_nodes_allowed(memcg->css.cgroup, &allowed);
+		page_counter_update_toptier_capacity(&memcg->memory, &allowed);
+	}
+}
+
 static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
 			gfp_t gfp)
 {
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index a88256381519..259caaf4be8f 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -889,6 +889,7 @@ static int __meminit memtier_hotplug_callback(struct notifier_block *self,
 		mutex_lock(&memory_tier_lock);
 		if (clear_node_memory_tier(nn->nid))
 			establish_demotion_targets();
+		update_memcg_toptier_capacity();
 		mutex_unlock(&memory_tier_lock);
 		break;
 	case NODE_ADDED_FIRST_MEMORY:
@@ -896,6 +897,7 @@ static int __meminit memtier_hotplug_callback(struct notifier_block *self,
 		memtier = set_node_memory_tier(nn->nid);
 		if (!IS_ERR(memtier))
 			establish_demotion_targets();
+		update_memcg_toptier_capacity();
 		mutex_unlock(&memory_tier_lock);
 		break;
 	}
@@ -941,6 +943,45 @@ bool numa_demotion_enabled = false;
 
 bool tier_aware_memcg_limits;
 
+void mt_get_toptier_nodemask(nodemask_t *mask, const nodemask_t *allowed)
+{
+	int nid;
+
+	*mask = NODE_MASK_NONE;
+	for_each_node_state(nid, N_MEMORY) {
+		if (node_is_toptier(nid))
+			node_set(nid, *mask);
+	}
+	if (allowed)
+		nodes_and(*mask, *mask, *allowed);
+}
+
+unsigned long mt_get_toptier_capacity(const nodemask_t *allowed)
+{
+	int nid;
+	unsigned long capacity = 0;
+	nodemask_t mask;
+
+	mt_get_toptier_nodemask(&mask, allowed);
+	for_each_node_mask(nid, mask)
+		capacity += NODE_DATA(nid)->node_present_pages;
+
+	return capacity;
+}
+
+unsigned long mt_get_total_capacity(const nodemask_t *allowed)
+{
+	int nid;
+	unsigned long capacity = 0;
+
+	for_each_node_state(nid, N_MEMORY) {
+		if (allowed && !node_isset(nid, *allowed))
+			continue;
+		capacity += NODE_DATA(nid)->node_present_pages;
+	}
+	return capacity;
+}
+
 #ifdef CONFIG_MIGRATION
 #ifdef CONFIG_SYSFS
 static ssize_t demotion_enabled_show(struct kobject *kobj,
diff --git a/mm/page_counter.c b/mm/page_counter.c
index 5ec97811c418..cf21c72bfd4e 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 
 static bool track_protection(struct page_counter *c)
@@ -463,6 +464,13 @@ void page_counter_calculate_protection(struct page_counter *root,
 						  recursive_protection));
 }
 
+void page_counter_update_toptier_capacity(struct page_counter *counter,
+					  const nodemask_t *allowed)
+{
+	counter->toptier_capacity = mt_get_toptier_capacity(allowed);
+	counter->total_capacity = mt_get_total_capacity(allowed);
+}
+
 unsigned long page_counter_toptier_high(struct page_counter *counter)
 {
 	unsigned long high = READ_ONCE(counter->high);
-- 
2.47.3