From: Joshua Hahn
To: linux-mm@kvack.org
Cc: Tejun Heo, Johannes Weiner, Michal Koutný, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [RFC PATCH 1/9 v2] cgroup: Introduce memory_tiered_limits cgroup mount option
Date: Thu, 23 Apr 2026 13:34:35 -0700
Message-ID: <20260423203445.2914963-2-joshua.hahnjy@gmail.com>
In-Reply-To: <20260423203445.2914963-1-joshua.hahnjy@gmail.com>

Introduce a cgroup mount option, memory_tiered_limits, to enable
tier-proportional scaling of the memory cgroup controller limits
memory.{min, low, high, max}.

The mount option currently has no effect. Later commits will scale memcg
limits in proportion to the system's toptier:total capacity ratio.

Signed-off-by: Joshua Hahn
---
 include/linux/cgroup-defs.h |  5 +++++
 include/linux/memcontrol.h  | 14 ++++++++++++++
 kernel/cgroup/cgroup.c      | 12 ++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index bb92f5c169ca2..0b6861f4faece 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -128,6 +128,11 @@ enum {
 	 * Enable legacy local pids.events.
 	 */
 	CGRP_ROOT_PIDS_LOCAL_EVENTS = (1 << 20),
+
+	/*
+	 * Enable tier-proportional scaling of limits for the memory controller.
+	 */
+	CGRP_ROOT_MEMORY_TIERED_LIMITS = (1 << 21),
 };
 
 /* cftype->flags */
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index dc3fa687759b4..be45641e890e4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -533,6 +533,15 @@ static inline bool mem_cgroup_disabled(void)
 	return !cgroup_subsys_enabled(memory_cgrp_subsys);
 }
 
+static inline bool mem_cgroup_tiered_limits(void)
+{
+#ifdef CONFIG_NUMA
+	return cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_TIERED_LIMITS;
+#else
+	return false;
+#endif
+}
+
 static inline void mem_cgroup_protection(struct mem_cgroup *root,
 					 struct mem_cgroup *memcg,
 					 unsigned long *min,
@@ -1084,6 +1093,11 @@ static inline bool mem_cgroup_disabled(void)
 	return true;
 }
 
+static inline bool mem_cgroup_tiered_limits(void)
+{
+	return false;
+}
+
 static inline void memcg_memory_event(struct mem_cgroup *memcg,
 				      enum memcg_memory_event event)
 {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index babf7b4560488..6a34d0e179dc5 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1989,6 +1989,7 @@ enum cgroup2_param {
 	Opt_memory_recursiveprot,
 	Opt_memory_hugetlb_accounting,
 	Opt_pids_localevents,
+	Opt_memory_tiered_limits,
 	nr__cgroup2_params
 };
 
@@ -1999,6 +2000,7 @@ static const struct fs_parameter_spec cgroup2_fs_parameters[] = {
 	fsparam_flag("memory_recursiveprot", Opt_memory_recursiveprot),
 	fsparam_flag("memory_hugetlb_accounting", Opt_memory_hugetlb_accounting),
 	fsparam_flag("pids_localevents", Opt_pids_localevents),
+	fsparam_flag("memory_tiered_limits", Opt_memory_tiered_limits),
 	{}
 };
 
@@ -2031,6 +2033,9 @@ static int cgroup2_parse_param(struct fs_context *fc, struct fs_parameter *param
 	case Opt_pids_localevents:
 		ctx->flags |= CGRP_ROOT_PIDS_LOCAL_EVENTS;
 		return 0;
+	case Opt_memory_tiered_limits:
+		ctx->flags |= CGRP_ROOT_MEMORY_TIERED_LIMITS;
+		return 0;
 	}
 	return -EINVAL;
 }
@@ -2072,6 +2077,11 @@ static void apply_cgroup_root_flags(unsigned int root_flags)
 			cgrp_dfl_root.flags |= CGRP_ROOT_PIDS_LOCAL_EVENTS;
 		else
 			cgrp_dfl_root.flags &= ~CGRP_ROOT_PIDS_LOCAL_EVENTS;
+
+		if (root_flags & CGRP_ROOT_MEMORY_TIERED_LIMITS)
+			cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_TIERED_LIMITS;
+		else
+			cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_TIERED_LIMITS;
 	}
 }
 
@@ -2089,6 +2099,8 @@ static int cgroup_show_options(struct seq_file *seq, struct kernfs_root *kf_root
 		seq_puts(seq, ",memory_hugetlb_accounting");
 	if (cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
 		seq_puts(seq, ",pids_localevents");
+	if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_TIERED_LIMITS)
+		seq_puts(seq, ",memory_tiered_limits");
 	return 0;
 }
 
-- 
2.52.0
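The option plumbing in patch 1 follows the usual cgroup2 pattern: the parser sets a bit in the fs_context flags, apply_cgroup_root_flags() copies that bit into cgrp_dfl_root.flags (setting it when requested and clearing it otherwise), and cgroup_show_options() reports it back. Below is a minimal userspace sketch of that round trip, not kernel code. A plain strcmp() stands in for fs_parse(), the global dfl_root_flags is a hypothetical stand-in for cgrp_dfl_root.flags, and the additional condition that the real apply_cgroup_root_flags() wraps around these updates is omitted. Only the bit value `(1 << 21)` is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Bit value taken from the patch (include/linux/cgroup-defs.h). */
#define CGRP_ROOT_MEMORY_TIERED_LIMITS (1u << 21)

/* Hypothetical stand-in for cgrp_dfl_root.flags. */
static unsigned int dfl_root_flags;

/* Like cgroup2_parse_param(): map a mount option name to a flag bit. */
static int parse_param(const char *name, unsigned int *ctx_flags)
{
	if (strcmp(name, "memory_tiered_limits") == 0) {
		*ctx_flags |= CGRP_ROOT_MEMORY_TIERED_LIMITS;
		return 0;
	}
	return -EINVAL; /* unknown option */
}

/* Like apply_cgroup_root_flags(): set the bit if requested, else clear it. */
static void apply_root_flags(unsigned int root_flags)
{
	if (root_flags & CGRP_ROOT_MEMORY_TIERED_LIMITS)
		dfl_root_flags |= CGRP_ROOT_MEMORY_TIERED_LIMITS;
	else
		dfl_root_flags &= ~CGRP_ROOT_MEMORY_TIERED_LIMITS;
}

/* Like mem_cgroup_tiered_limits() on a CONFIG_NUMA build. */
static bool tiered_limits_enabled(void)
{
	return dfl_root_flags & CGRP_ROOT_MEMORY_TIERED_LIMITS;
}

/* Like cgroup_show_options(): report the option in the mount string. */
static void show_options(char *buf, size_t len)
{
	assert(len > 0);
	buf[0] = '\0';
	if (dfl_root_flags & CGRP_ROOT_MEMORY_TIERED_LIMITS)
		snprintf(buf, len, ",memory_tiered_limits");
}
```

Because the apply step clears the bit when it is absent, applying a flag set without CGRP_ROOT_MEMORY_TIERED_LIMITS switches the feature off again. On a real system the option would be passed at mount time, for example `mount -t cgroup2 -o memory_tiered_limits none /sys/fs/cgroup`.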
From: Joshua Hahn
To: linux-mm@kvack.org
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [RFC PATCH 2/9 v2] mm/memory-tiers: Introduce toptier utility functions
Date: Thu, 23 Apr 2026 13:34:36 -0700
Message-ID: <20260423203445.2914963-3-joshua.hahnjy@gmail.com>
In-Reply-To: <20260423203445.2914963-1-joshua.hahnjy@gmail.com>

This patch introduces two toptier-related utility functions,
get_toptier_nodemask() and mt_scale_by_toptier().

Tier-aware limits will introduce new memcg thresholds on toptier nodes
for systems with multiple memory tiers. To simplify the calculation of
these new thresholds, introduce mt_scale_by_toptier(), which scales a
memory limit by the ratio of toptier capacity to the total capacity
available on the system. For single-node / single-tier systems, the
scaling operation is a no-op, since capacity updates are hooked into
establish_demotion_targets().

Note that the ratio is static for the entire system. In particular, it
does not take a cgroup's cpuset.mems into consideration, so even cgroups
restricted to toptier nodes will still get a scaled-down toptier limit.
This ensures that every cgroup is limited to its fair share of toptier
memory, regardless of which nodes it is restricted to. It also has the
added benefit of preventing accidental overcommit of toptier memory,
since every cgroup shares the same toptier ratio.

get_toptier_nodemask() extends the existing node_is_toptier() check to
return a nodemask of all N_MEMORY nodes on the top tier. On
!CONFIG_NUMA_MIGRATION or !CONFIG_NUMA systems, it simply returns all
N_MEMORY nodes.
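As a worked example of the ratio described above: mt_scale_by_toptier() returns `mult_frac(val, toptier_capacity, totalram_pages())`, i.e. val * toptier / total, rounded down. With 4 KiB pages, a hypothetical machine with 64 GiB of toptier DRAM (16777216 pages) out of 256 GiB total (67108864 pages) would scale a memory.max of 8 GiB (2097152 pages) down to 2 GiB (524288 pages). The sketch below reproduces that arithmetic in userspace. mult_frac_u64() reimplements the quotient/remainder split used by the kernel's mult_frac() macro, and scale_by_toptier() is a hypothetical stand-in that takes both capacities as arguments instead of reading kernel state.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace copy of the arithmetic in the kernel's mult_frac(x, n, d):
 * split x into quotient and remainder by d so the intermediate products
 * stay smaller than x * n would be. The result is floor(x * n / d).
 */
static uint64_t mult_frac_u64(uint64_t x, uint64_t n, uint64_t d)
{
	assert(d != 0);
	uint64_t q = x / d;
	uint64_t r = x % d;

	return q * n + r * n / d;
}

/*
 * Hypothetical stand-in for mt_scale_by_toptier(): the kernel reads
 * toptier_capacity and totalram_pages() itself; here both are arguments.
 * All values are in pages.
 */
static uint64_t scale_by_toptier(uint64_t val, uint64_t toptier_pages,
				 uint64_t total_pages)
{
	if (!total_pages)
		return 0;
	return mult_frac_u64(val, toptier_pages, total_pages);
}
```

When toptier capacity equals total capacity, as on the single-tier systems mentioned above, the ratio is 1 and the limit comes back unchanged.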
Signed-off-by: Joshua Hahn
---
 include/linux/memory-tiers.h | 17 ++++++++++++++++
 mm/memory-tiers.c            | 38 ++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 7999c58629eeb..f21525c50a5ff 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -52,10 +52,12 @@ int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
 struct memory_dev_type *mt_find_alloc_memory_type(int adist,
 						  struct list_head *memory_types);
 void mt_put_memory_types(struct list_head *memory_types);
+unsigned long mt_scale_by_toptier(unsigned long val);
 #ifdef CONFIG_NUMA_MIGRATION
 int next_demotion_node(int node, const nodemask_t *allowed_mask);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
 bool node_is_toptier(int node);
+void get_toptier_nodemask(nodemask_t *mask);
 #else
 static inline int next_demotion_node(int node, const nodemask_t *allowed_mask)
 {
@@ -71,6 +73,11 @@ static inline bool node_is_toptier(int node)
 {
 	return true;
 }
+
+static inline void get_toptier_nodemask(nodemask_t *mask)
+{
+	*mask = node_states[N_MEMORY];
+}
 #endif
 
 #else
@@ -116,6 +123,11 @@ static inline bool node_is_toptier(int node)
 	return true;
 }
 
+static inline void get_toptier_nodemask(nodemask_t *mask)
+{
+	*mask = node_states[N_MEMORY];
+}
+
 static inline int register_mt_adistance_algorithm(struct notifier_block *nb)
 {
 	return 0;
@@ -151,5 +163,10 @@ static inline struct memory_dev_type *mt_find_alloc_memory_type(int adist,
 static inline void mt_put_memory_types(struct list_head *memory_types)
 {
 }
+
+static inline unsigned long mt_scale_by_toptier(unsigned long val)
+{
+	return val;
+}
 #endif /* CONFIG_NUMA */
 #endif /* _LINUX_MEMORY_TIERS_H */
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 54851d8a195b0..acc02679e312d 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -46,6 +46,8 @@ static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
 struct memory_dev_type *default_dram_type;
 nodemask_t default_dram_nodes __initdata = NODE_MASK_NONE;
 
+static unsigned long toptier_capacity;
+
 static const struct bus_type memory_tier_subsys = {
 	.name = "memory_tiering",
 	.dev_name = "memory_tier",
@@ -299,6 +301,17 @@ bool node_is_toptier(int node)
 	return toptier;
 }
 
+void get_toptier_nodemask(nodemask_t *mask)
+{
+	int node;
+
+	nodes_clear(*mask);
+	for_each_node_state(node, N_MEMORY) {
+		if (node_is_toptier(node))
+			node_set(node, *mask);
+	}
+}
+
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
 {
 	struct memory_tier *memtier;
@@ -428,6 +441,7 @@ static void establish_demotion_targets(void)
 	struct demotion_nodes *nd;
 	int target = NUMA_NO_NODE, node;
 	int distance, best_distance;
+	int i;
 	nodemask_t tier_nodes, lower_tier;
 
 	lockdep_assert_held_once(&memory_tier_lock);
@@ -496,6 +510,19 @@ static void establish_demotion_targets(void)
 				break;
 		}
 	}
+
+	toptier_capacity = 0;
+	for_each_node_state(node, N_MEMORY) {
+		if (!node_is_toptier(node))
+			continue;
+
+		for (i = 0; i < MAX_NR_ZONES; i++) {
+			struct zone *z = &NODE_DATA(node)->node_zones[i];
+
+			toptier_capacity += zone_managed_pages(z);
+		}
+	}
+
 	/*
 	 * Now build the lower_tier mask for each node collecting node mask from
 	 * all memory tier below it. This allows us to fallback demotion page
@@ -878,6 +905,16 @@ int mt_calc_adistance(int node, int *adist)
 }
 EXPORT_SYMBOL_GPL(mt_calc_adistance);
 
+unsigned long mt_scale_by_toptier(unsigned long val)
+{
+	unsigned long total_capacity = totalram_pages();
+
+	if (!total_capacity)
+		return 0;
+
+	return mult_frac(val, toptier_capacity, total_capacity);
+}
+
 static int __meminit memtier_hotplug_callback(struct notifier_block *self,
 					      unsigned long action, void *_arg)
 {
@@ -932,6 +969,7 @@ static int __init memory_tier_init(void)
 					 node_states[N_CPU]);
 
 	hotplug_node_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
+	toptier_capacity = totalram_pages();
 	return 0;
 }
 subsys_initcall(memory_tier_init);
-- 
2.52.0
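get_toptier_nodemask() and the capacity loop added to establish_demotion_targets() apply the same filter: N_MEMORY nodes for which node_is_toptier() is true. Below is a minimal userspace model of both, not kernel code. It assumes a hypothetical struct node_info that stands in for membership in node_states[N_MEMORY], the result of node_is_toptier(), and the sum of a node's zone_managed_pages(), and it uses a 64-bit integer in place of nodemask_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-node summary; not a kernel structure. */
struct node_info {
	bool has_memory;        /* node is in node_states[N_MEMORY] */
	bool toptier;           /* node_is_toptier(node) */
	uint64_t managed_pages; /* sum of zone_managed_pages() over its zones */
};

/* Model of get_toptier_nodemask(): bit i is set iff node i qualifies. */
static uint64_t toptier_nodemask(const struct node_info *nodes, int nr)
{
	uint64_t mask = 0;

	assert(nr >= 0 && nr <= 64);
	for (int i = 0; i < nr; i++)
		if (nodes[i].has_memory && nodes[i].toptier)
			mask |= UINT64_C(1) << i;
	return mask;
}

/* Model of the toptier_capacity loop in establish_demotion_targets(). */
static uint64_t toptier_capacity(const struct node_info *nodes, int nr)
{
	uint64_t pages = 0;

	for (int i = 0; i < nr; i++)
		if (nodes[i].has_memory && nodes[i].toptier)
			pages += nodes[i].managed_pages;
	return pages;
}
```

For a layout with two DRAM nodes, one lower-tier (for example CXL) node and one memoryless node, only the two DRAM nodes end up in the mask, and only their pages count toward toptier capacity.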
From: Joshua Hahn
To: linux-mm@kvack.org
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Andrew Morton, Muchun Song, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [RFC PATCH 3/9 v2] mm/memcontrol: Refactor page_counter charging in try_charge_memcg
Date: Thu, 23 Apr 2026 13:34:37 -0700
Message-ID: <20260423203445.2914963-4-joshua.hahnjy@gmail.com>
In-Reply-To: <20260423203445.2914963-1-joshua.hahnjy@gmail.com>

In preparation for adding charging and uncharging of a new toptier
page_counter to try_charge_memcg(), refactor the code so that it is
easier to roll back partial charges when any one of the three
page_counters fails to charge.

No functional changes intended.

Signed-off-by: Joshua Hahn
---
 mm/memcontrol.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7de23ecd7cef6..8f7bedb55dbb1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2385,18 +2385,22 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 
 retry:
 	reclaim_options = MEMCG_RECLAIM_MAY_SWAP;
-	if (!do_memsw_account() ||
-	    page_counter_try_charge(&memcg->memsw, nr_pages, &counter)) {
-		if (page_counter_try_charge(&memcg->memory, nr_pages, &counter))
-			goto done_restock;
-		if (do_memsw_account())
-			page_counter_uncharge(&memcg->memsw, nr_pages);
-		mem_over_limit = mem_cgroup_from_counter(counter, memory);
-	} else {
+
+	if (do_memsw_account() &&
+	    !page_counter_try_charge(&memcg->memsw, nr_pages, &counter)) {
 		mem_over_limit = mem_cgroup_from_counter(counter, memsw);
 		reclaim_options &= ~MEMCG_RECLAIM_MAY_SWAP;
+		goto reclaim;
 	}
 
+	if (page_counter_try_charge(&memcg->memory, nr_pages, &counter))
+		goto done_restock;
+
+	if (do_memsw_account())
+		page_counter_uncharge(&memcg->memsw, nr_pages);
+	mem_over_limit = mem_cgroup_from_counter(counter, memory);
+
+reclaim:
 	/*
 	 * Prevent unbounded recursion when reclaim operations need to
 	 * allocate memory. This might exceed the limits temporarily,
-- 
2.52.0
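The refactor turns the nested memsw/memory logic into a flat sequence: try each counter in turn and, on failure, uncharge whatever was already charged before going to reclaim. The sketch below shows that pattern over an arbitrary ordered list of counters. It is a simplification: struct counter is a hypothetical single-level counter, whereas the kernel's struct page_counter is hierarchical and updated atomically, and the real try_charge_memcg() also adjusts its reclaim options before retrying. Undoing the charges in reverse order is a choice made in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-level counter standing in for struct page_counter. */
struct counter {
	unsigned long usage;
	unsigned long max;
};

static bool counter_try_charge(struct counter *c, unsigned long nr)
{
	if (c->usage + nr > c->max)
		return false;
	c->usage += nr;
	return true;
}

static void counter_uncharge(struct counter *c, unsigned long nr)
{
	assert(c->usage >= nr);
	c->usage -= nr;
}

/*
 * Charge nr to each counter in order. On the first failure, uncharge the
 * counters that were already charged (in reverse order) and return the one
 * that hit its limit, much as try_charge_memcg() records mem_over_limit.
 * Returns NULL when every counter was charged.
 */
static struct counter *charge_all(struct counter **cs, size_t n,
				  unsigned long nr)
{
	for (size_t i = 0; i < n; i++) {
		if (counter_try_charge(cs[i], nr))
			continue;

		struct counter *over = cs[i];

		while (i-- > 0)
			counter_uncharge(cs[i], nr);
		return over;
	}
	return NULL;
}
```

With memsw ordered before memory, a charge that fits under memsw but not under memory leaves both counters unchanged. This mirrors the page_counter_uncharge(&memcg->memsw, nr_pages) call after a failed memory charge in the refactored code.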
From: Joshua Hahn
To: linux-mm@kvack.org
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Andrew Morton, Muchun Song, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [RFC PATCH 4/9 v2] mm/memcontrol: charge/uncharge toptier memory to mem_cgroup
Date: Thu, 23 Apr 2026 13:34:38 -0700
Message-ID: <20260423203445.2914963-5-joshua.hahnjy@gmail.com>
In-Reply-To: <20260423203445.2914963-1-joshua.hahnjy@gmail.com>

Memory cgroup limits currently offer a way to isolate memory as a
resource, but they treat the cost and value of all memory as equal,
regardless of whether it resides on a toptier node or not. To better
capture the asymmetric utility of toptier memory compared to "lowtier"
memory, account toptier memory usage in parallel to the existing memory
accounting mechanisms.

To do this, add a new page_counter, "toptier", to struct mem_cgroup. In
simplified terms, this works by checking the physical location of a folio
when the memory page_counter is updated and deciding whether to also
charge it to toptier. Add a new "toptier" parameter to try_charge_memcg(),
which callers must determine.

However, as of this RFC, this simplified model only works for LRU folios
(callers of try_charge_memcg() from charge_memcg()).
The other two sites, obj_cgroup_charge_pages() and mem_cgroup_sk_charge(), will be addressed in future patches that transition enum memcg_stat_item to a per-lruvec counter (enum memcg_stat_item). Enforcement mechanisms are not present at this point. Failing the toptier limit check leads to nothing, but the charges are accumulated. Signed-off-by: Joshua Hahn --- include/linux/memcontrol.h | 1 + mm/memcontrol.c | 63 ++++++++++++++++++++++++++++++++++---- 2 files changed, 58 insertions(+), 6 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index be45641e890e4..0cdb6cd1955dc 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -206,6 +206,7 @@ struct mem_cgroup { =20 /* Accounted resources */ struct page_counter memory; /* Both v1 & v2 */ + struct page_counter toptier; /* v2 only */ =20 union { struct page_counter swap; /* v2 only */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 8f7bedb55dbb1..d891cf77cf6d6 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include #include @@ -2096,6 +2097,7 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu) =20 for_each_mem_cgroup(memcg) { page_counter_drain_cpu(&memcg->memory, cpu); + page_counter_drain_cpu(&memcg->toptier, cpu); page_counter_drain_cpu(&memcg->memsw, cpu); } =20 @@ -2370,7 +2372,7 @@ void __mem_cgroup_handle_over_high(gfp_t gfp_mask) } =20 static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, - unsigned int nr_pages) + unsigned int nr_pages, bool toptier) { int nr_retries =3D MAX_RECLAIM_RETRIES; struct mem_cgroup *mem_over_limit; @@ -2382,9 +2384,11 @@ static int try_charge_memcg(struct mem_cgroup *memcg= , gfp_t gfp_mask, bool raised_max_event =3D false; unsigned long pflags; bool allow_spinning =3D gfpflags_allow_spinning(gfp_mask); + bool toptier_charged; =20 retry: reclaim_options =3D MEMCG_RECLAIM_MAY_SWAP; + toptier_charged =3D false; =20 if (do_memsw_account() && 
 	    !page_counter_try_charge(&memcg->memsw, nr_pages, &counter)) {
@@ -2393,11 +2397,18 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		goto reclaim;
 	}
 
+	if (toptier &&
+	    page_counter_try_charge(&memcg->toptier, nr_pages, &counter))
+		toptier_charged = true;
+
 	if (page_counter_try_charge(&memcg->memory, nr_pages, &counter))
 		goto done_restock;
 
+	if (toptier_charged)
+		page_counter_uncharge(&memcg->toptier, nr_pages);
 	if (do_memsw_account())
 		page_counter_uncharge(&memcg->memsw, nr_pages);
+
 	mem_over_limit = mem_cgroup_from_counter(counter, memory);
 
 reclaim:
@@ -2490,6 +2501,8 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * being freed very soon. Allow memory usage go over the limit
 	 * temporarily by force charging it.
 	 */
+	if (toptier)
+		page_counter_charge(&memcg->toptier, nr_pages);
 	page_counter_charge(&memcg->memory, nr_pages);
 	if (do_memsw_account())
 		page_counter_charge(&memcg->memsw, nr_pages);
@@ -2559,7 +2572,7 @@ static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (mem_cgroup_is_root(memcg))
 		return 0;
 
-	return try_charge_memcg(memcg, gfp_mask, nr_pages);
+	return try_charge_memcg(memcg, gfp_mask, nr_pages, false);
 }
 
 static void commit_charge(struct folio *folio, struct obj_cgroup *objcg)
@@ -2859,7 +2872,7 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
 
 	memcg = get_mem_cgroup_from_objcg(objcg);
 
-	ret = try_charge_memcg(memcg, gfp, nr_pages);
+	ret = try_charge_memcg(memcg, gfp, nr_pages, false);
 	if (ret)
 		goto out;
 
@@ -2888,6 +2901,11 @@ static void page_set_objcg(struct page *page, const struct obj_cgroup *objcg)
 	page->memcg_data = (unsigned long)objcg | MEMCG_DATA_KMEM;
 }
 
+static bool should_charge_toptier(struct folio *folio)
+{
+	return mem_cgroup_tiered_limits() && node_is_toptier(folio_nid(folio));
+}
+
 /**
  * __memcg_kmem_charge_page: charge a kmem page to the current memory cgroup
  * @page: page to charge
@@ -3760,6 +3778,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 static void mem_cgroup_free(struct mem_cgroup *memcg)
 {
 	page_counter_free_stock(&memcg->memory);
+	page_counter_free_stock(&memcg->toptier);
 	page_counter_free_stock(&memcg->memsw);
 	lru_gen_exit_memcg(memcg);
 	memcg_wb_domain_exit(memcg);
@@ -3866,6 +3885,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		WRITE_ONCE(memcg->swappiness, mem_cgroup_swappiness(parent));
 
 		page_counter_init(&memcg->memory, &parent->memory, memcg_on_dfl);
+		page_counter_init(&memcg->toptier, &parent->toptier, memcg_on_dfl);
 		page_counter_init(&memcg->swap, &parent->swap, false);
 #ifdef CONFIG_MEMCG_V1
 		memcg->memory.track_failcnt = !memcg_on_dfl;
@@ -3877,6 +3897,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		init_memcg_stats();
 		init_memcg_events();
 		page_counter_init(&memcg->memory, NULL, true);
+		page_counter_init(&memcg->toptier, NULL, true);
 		page_counter_init(&memcg->swap, NULL, false);
 #ifdef CONFIG_MEMCG_V1
 		page_counter_init(&memcg->kmem, NULL, false);
@@ -3936,6 +3957,7 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 
 	/* failure is nonfatal, charges fall back to direct hierarchy */
 	page_counter_enable_stock(&memcg->memory, MEMCG_CHARGE_BATCH);
+	page_counter_enable_stock(&memcg->toptier, MEMCG_CHARGE_BATCH);
 	if (do_memsw_account())
 		page_counter_enable_stock(&memcg->memsw, MEMCG_CHARGE_BATCH);
 
@@ -4013,6 +4035,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 
 	drain_all_stock(memcg);
 	page_counter_disable_stock(&memcg->memory);
+	page_counter_disable_stock(&memcg->toptier);
 	page_counter_disable_stock(&memcg->memsw);
 
 	mem_cgroup_private_id_put(memcg, 1);
@@ -4825,7 +4848,8 @@ static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
 	objcg = get_obj_cgroup_from_memcg(memcg);
 	/* Do not account at the root objcg level. */
 	if (!obj_cgroup_is_root(objcg))
-		ret = try_charge_memcg(memcg, gfp, folio_nr_pages(folio));
+		ret = try_charge_memcg(memcg, gfp, folio_nr_pages(folio),
+				       should_charge_toptier(folio));
 	if (ret) {
 		obj_cgroup_put(objcg);
 		return ret;
@@ -4922,6 +4946,7 @@ struct uncharge_gather {
 	unsigned long nr_memory;
 	unsigned long pgpgout;
 	unsigned long nr_kmem;
+	unsigned long nr_toptier;
 	int nid;
 };
 
@@ -4942,6 +4967,8 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 		mod_memcg_state(memcg, MEMCG_KMEM, -ug->nr_kmem);
 		memcg1_account_kmem(memcg, -ug->nr_kmem);
 	}
+	if (ug->nr_toptier)
+		page_counter_uncharge(&memcg->toptier, ug->nr_toptier);
 	memcg1_oom_recover(memcg);
 }
 
@@ -4987,8 +5014,11 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 		ug->nr_kmem += nr_pages;
 	} else {
 		/* LRU pages aren't accounted at the root level */
-		if (!obj_cgroup_is_root(objcg))
+		if (!obj_cgroup_is_root(objcg)) {
 			ug->nr_memory += nr_pages;
+			if (should_charge_toptier(folio))
+				ug->nr_toptier += nr_pages;
+		}
 		ug->pgpgout++;
 
 		WARN_ON_ONCE(folio_unqueue_deferred_split(folio));
@@ -5063,6 +5093,10 @@ void mem_cgroup_replace_folio(struct folio *old, struct folio *new)
 		page_counter_charge(&memcg->memory, nr_pages);
 		if (do_memsw_account())
 			page_counter_charge(&memcg->memsw, nr_pages);
+
+		/* old folio's toptier usage will be uncharged on free */
+		if (should_charge_toptier(new))
+			page_counter_charge(&memcg->toptier, nr_pages);
 	}
 
 	obj_cgroup_get(objcg);
@@ -5105,6 +5139,23 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new)
 	if (!objcg)
 		return;
 
+	if (!obj_cgroup_is_root(objcg)) {
+		struct mem_cgroup *memcg;
+		unsigned long nr_pages = folio_nr_pages(old);
+		bool old_toptier, new_toptier;
+
+		rcu_read_lock();
+		memcg = obj_cgroup_memcg(objcg);
+		old_toptier = should_charge_toptier(old);
+		new_toptier = should_charge_toptier(new);
+
+		if (old_toptier && !new_toptier)
+			page_counter_uncharge(&memcg->toptier, nr_pages);
+		else if (!old_toptier && new_toptier)
+			page_counter_charge(&memcg->toptier, nr_pages);
+		rcu_read_unlock();
+	}
+
 	/* Transfer the charge and the objcg ref */
 	commit_charge(new, objcg);
 
@@ -5180,7 +5231,7 @@ bool mem_cgroup_sk_charge(const struct sock *sk, unsigned int nr_pages,
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return memcg1_charge_skmem(memcg, nr_pages, gfp_mask);
 
-	if (try_charge_memcg(memcg, gfp_mask, nr_pages) == 0) {
+	if (try_charge_memcg(memcg, gfp_mask, nr_pages, false) == 0) {
 		mod_memcg_state(memcg, MEMCG_SOCK, nr_pages);
 		return true;
 	}
-- 
2.52.0
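The charge ordering in try_charge_memcg() and the toptier transfer in mem_cgroup_migrate() can be illustrated with a small userspace model. This is a sketch, not kernel code. struct counter, struct memcg, charge() and migrate() are hypothetical stand-ins for struct page_counter and the memcg paths. The toptier counter is charged unconditionally here, reflecting the commit message's note that a failed toptier check is not yet enforced; this simplifies the patch, which uses page_counter_try_charge() for that counter. Reclaim, memsw and per-CPU stocks are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page_counter. */
struct counter {
	unsigned long usage;
	unsigned long max;
};

static bool counter_try_charge(struct counter *c, unsigned long nr)
{
	if (c->usage + nr > c->max)
		return false;
	c->usage += nr;
	return true;
}

static void counter_charge(struct counter *c, unsigned long nr)
{
	c->usage += nr;		/* forced charge: may exceed max */
}

static void counter_uncharge(struct counter *c, unsigned long nr)
{
	assert(c->usage >= nr);	/* underflow would mean unbalanced accounting */
	c->usage -= nr;
}

/* Hypothetical: only the two counters this sketch needs. */
struct memcg {
	struct counter memory;
	struct counter toptier;
};

/*
 * Charge nr pages. The toptier counter is touched only when the folio
 * sits on a toptier node. Returns 0 or -ENOMEM; the real
 * try_charge_memcg() would attempt reclaim instead of failing here.
 */
static int charge(struct memcg *m, unsigned long nr, bool toptier)
{
	if (toptier)
		counter_charge(&m->toptier, nr);	/* not enforced yet */

	if (counter_try_charge(&m->memory, nr))
		return 0;

	/* memory.max would be exceeded: undo the toptier charge. */
	if (toptier)
		counter_uncharge(&m->toptier, nr);
	return -ENOMEM;
}

/* Move nr pages of toptier charge when a folio changes tier. */
static void migrate(struct memcg *m, unsigned long nr,
		    bool old_toptier, bool new_toptier)
{
	if (old_toptier && !new_toptier)
		counter_uncharge(&m->toptier, nr);
	else if (!old_toptier && new_toptier)
		counter_charge(&m->toptier, nr);
}
```

Rolling back the toptier charge when the memory charge fails keeps the two counters consistent. This corresponds to the toptier_charged check in the patch. Migration adjusts only the toptier counter, because the total memory charge does not depend on which node a folio lives on.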
From: Joshua Hahn
To: linux-mm@kvack.org
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Andrew Morton, David Hildenbrand, Muchun Song, Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [RFC PATCH 5/9 v2] mm/memcontrol: Set toptier limits proportional to memory limits
Date: Thu, 23 Apr 2026 13:34:39 -0700
Message-ID: <20260423203445.2914963-6-joshua.hahnjy@gmail.com>
In-Reply-To: <20260423203445.2914963-1-joshua.hahnjy@gmail.com>
References: <20260423203445.2914963-1-joshua.hahnjy@gmail.com>

Derive toptier limits proportionally from the memory limits. Recompute them whenever a user writes to one of the memory.{min,low,high,max} cgroup files, and whenever memory hotplug changes the ratio of toptier capacity to total capacity.

Also introduce the read-only cgroup files memory.toptier_{min,low,high,max}, which expose the derived toptier limits.
Signed-off-by: Joshua Hahn
---
 include/linux/memcontrol.h | 12 +++++
 mm/memcontrol.c            | 93 ++++++++++++++++++++++++++++++++++++++
 mm/memory-tiers.c          |  8 +++-
 3 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0cdb6cd1955dc..6bcb866440075 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -543,6 +543,14 @@ static inline bool mem_cgroup_tiered_limits(void)
 #endif
 }
 
+#ifdef CONFIG_NUMA
+void update_memcg_toptier_limits(void);
+#else
+static inline void update_memcg_toptier_limits(void)
+{
+}
+#endif
+
 static inline void mem_cgroup_protection(struct mem_cgroup *root,
 					 struct mem_cgroup *memcg,
 					 unsigned long *min,
@@ -1099,6 +1107,10 @@ static inline bool mem_cgroup_tiered_limits(void)
 	return false;
 }
 
+static inline void update_memcg_toptier_limits(void)
+{
+}
+
 static inline void memcg_memory_event(struct mem_cgroup *memcg,
 				      enum memcg_memory_event event)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d891cf77cf6d6..3acb06388405c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3875,6 +3875,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		return ERR_CAST(memcg);
 
 	page_counter_set_high(&memcg->memory, PAGE_COUNTER_MAX);
+	page_counter_set_high(&memcg->toptier, PAGE_COUNTER_MAX);
 	memcg1_soft_limit_reset(memcg);
 #ifdef CONFIG_ZSWAP
 	memcg->zswap_max = PAGE_COUNTER_MAX;
@@ -4092,6 +4093,7 @@ static void mem_cgroup_css_reset(struct cgroup_subsys_state *css)
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 
 	page_counter_set_max(&memcg->memory, PAGE_COUNTER_MAX);
+	page_counter_set_max(&memcg->toptier, PAGE_COUNTER_MAX);
 	page_counter_set_max(&memcg->swap, PAGE_COUNTER_MAX);
 #ifdef CONFIG_MEMCG_V1
 	page_counter_set_max(&memcg->kmem, PAGE_COUNTER_MAX);
@@ -4100,6 +4102,9 @@ static void mem_cgroup_css_reset(struct cgroup_subsys_state *css)
 	page_counter_set_min(&memcg->memory, 0);
 	page_counter_set_low(&memcg->memory, 0);
 	page_counter_set_high(&memcg->memory, PAGE_COUNTER_MAX);
+	page_counter_set_min(&memcg->toptier, 0);
+	page_counter_set_low(&memcg->toptier, 0);
+	page_counter_set_high(&memcg->toptier, PAGE_COUNTER_MAX);
 	memcg1_soft_limit_reset(memcg);
 	page_counter_set_high(&memcg->swap, PAGE_COUNTER_MAX);
 	memcg_wb_domain_size_changed(memcg);
@@ -4438,12 +4443,51 @@ static ssize_t memory_peak_write(struct kernfs_open_file *of, char *buf,
 
 #undef OFP_PEAK_UNSET
 
+static inline unsigned long page_counter_max_or_scale(unsigned long val)
+{
+	return val == PAGE_COUNTER_MAX ? PAGE_COUNTER_MAX :
+					 mt_scale_by_toptier(val);
+}
+
+void update_memcg_toptier_limits(void)
+{
+	struct mem_cgroup *memcg;
+
+	if (!mem_cgroup_tiered_limits())
+		return;
+
+	for_each_mem_cgroup(memcg) {
+		unsigned long old_min = READ_ONCE(memcg->memory.min);
+		unsigned long old_low = READ_ONCE(memcg->memory.low);
+		unsigned long old_high = READ_ONCE(memcg->memory.high);
+		unsigned long old_max = READ_ONCE(memcg->memory.max);
+
+		if (memcg == root_mem_cgroup)
+			continue;
+
+		page_counter_set_min(&memcg->toptier,
+				page_counter_max_or_scale(old_min));
+		page_counter_set_low(&memcg->toptier,
+				page_counter_max_or_scale(old_low));
+		page_counter_set_high(&memcg->toptier,
+				page_counter_max_or_scale(old_high));
+		xchg(&memcg->toptier.max,
+				page_counter_max_or_scale(old_max));
+	}
+}
+
 static int memory_min_show(struct seq_file *m, void *v)
 {
 	return seq_puts_memcg_tunable(m,
 		READ_ONCE(mem_cgroup_from_seq(m)->memory.min));
 }
 
+static int toptier_min_show(struct seq_file *m, void *v)
+{
+	return seq_puts_memcg_tunable(m,
+		READ_ONCE(mem_cgroup_from_seq(m)->toptier.min));
+}
+
 static ssize_t memory_min_write(struct kernfs_open_file *of,
 				char *buf, size_t nbytes, loff_t off)
 {
@@ -4457,6 +4501,9 @@ static ssize_t memory_min_write(struct kernfs_open_file *of,
 		return err;
 
 	page_counter_set_min(&memcg->memory, min);
+	if (mem_cgroup_tiered_limits())
+		page_counter_set_min(&memcg->toptier,
+				     page_counter_max_or_scale(min));
 
 	return nbytes;
 }
@@ -4467,6 +4514,12 @@ static int memory_low_show(struct seq_file *m, void *v)
 		READ_ONCE(mem_cgroup_from_seq(m)->memory.low));
 }
 
+static int toptier_low_show(struct seq_file *m, void *v)
+{
+	return seq_puts_memcg_tunable(m,
+		READ_ONCE(mem_cgroup_from_seq(m)->toptier.low));
+}
+
 static ssize_t memory_low_write(struct kernfs_open_file *of,
 				char *buf, size_t nbytes, loff_t off)
 {
@@ -4480,6 +4533,9 @@ static ssize_t memory_low_write(struct kernfs_open_file *of,
 		return err;
 
 	page_counter_set_low(&memcg->memory, low);
+	if (mem_cgroup_tiered_limits())
+		page_counter_set_low(&memcg->toptier,
+				     page_counter_max_or_scale(low));
 
 	return nbytes;
 }
@@ -4490,6 +4546,12 @@ static int memory_high_show(struct seq_file *m, void *v)
 		READ_ONCE(mem_cgroup_from_seq(m)->memory.high));
 }
 
+static int toptier_high_show(struct seq_file *m, void *v)
+{
+	return seq_puts_memcg_tunable(m,
+		READ_ONCE(mem_cgroup_from_seq(m)->toptier.high));
+}
+
 static ssize_t memory_high_write(struct kernfs_open_file *of,
 				 char *buf, size_t nbytes, loff_t off)
 {
@@ -4505,6 +4567,9 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 		return err;
 
 	page_counter_set_high(&memcg->memory, high);
+	if (mem_cgroup_tiered_limits())
+		page_counter_set_high(&memcg->toptier,
+				      page_counter_max_or_scale(high));
 
 	if (of->file->f_flags & O_NONBLOCK)
 		goto out;
@@ -4542,6 +4607,12 @@ static int memory_max_show(struct seq_file *m, void *v)
 		READ_ONCE(mem_cgroup_from_seq(m)->memory.max));
 }
 
+static int toptier_max_show(struct seq_file *m, void *v)
+{
+	return seq_puts_memcg_tunable(m,
+		READ_ONCE(mem_cgroup_from_seq(m)->toptier.max));
+}
+
 static ssize_t memory_max_write(struct kernfs_open_file *of,
 				char *buf, size_t nbytes, loff_t off)
 {
@@ -4557,6 +4628,8 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 		return err;
 
 	xchg(&memcg->memory.max, max);
+	if (mem_cgroup_tiered_limits())
+		xchg(&memcg->toptier.max, page_counter_max_or_scale(max));
 
 	if (of->file->f_flags & O_NONBLOCK)
 		goto out;
@@ -4762,6 +4835,26 @@ static struct cftype memory_files[] = {
 		.seq_show = memory_max_show,
 		.write = memory_max_write,
 	},
+	{
+		.name = "toptier_min",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = toptier_min_show,
+	},
+	{
+		.name = "toptier_low",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = toptier_low_show,
+	},
+	{
+		.name = "toptier_high",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = toptier_high_show,
+	},
+	{
+		.name = "toptier_max",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = toptier_max_show,
+	},
 	{
 		.name = "events",
 		.flags = CFTYPE_NOT_ON_ROOT,
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index acc02679e312d..ddcc11e3919da 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -924,15 +924,19 @@ static int __meminit memtier_hotplug_callback(struct notifier_block *self,
 	switch (action) {
 	case NODE_REMOVED_LAST_MEMORY:
 		mutex_lock(&memory_tier_lock);
-		if (clear_node_memory_tier(nn->nid))
+		if (clear_node_memory_tier(nn->nid)) {
 			establish_demotion_targets();
+			update_memcg_toptier_limits();
+		}
 		mutex_unlock(&memory_tier_lock);
 		break;
 	case NODE_ADDED_FIRST_MEMORY:
 		mutex_lock(&memory_tier_lock);
 		memtier = set_node_memory_tier(nn->nid);
-		if (!IS_ERR(memtier))
+		if (!IS_ERR(memtier)) {
 			establish_demotion_targets();
+			update_memcg_toptier_limits();
+		}
 		mutex_unlock(&memory_tier_lock);
 		break;
 	}
-- 
2.52.0
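The limit derivation in this patch relies on mt_scale_by_toptier(), which is not defined in this patch. Assume it scales a page count by the ratio of toptier capacity to total capacity, the ratio the commit message says memory hotplug can shift. Under that assumption, the derivation can be sketched in userspace C. struct tier_capacity, struct limits, COUNTER_MAX and all helper names are hypothetical, and the real PAGE_COUNTER_MAX value differs.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for PAGE_COUNTER_MAX, meaning "no limit". */
#define COUNTER_MAX ((unsigned long)LONG_MAX)

/* Assumed model: capacities in pages, recomputed on memory hotplug. */
struct tier_capacity {
	unsigned long toptier;
	unsigned long total;
};

struct limits {
	unsigned long min, low, high, max;
};

/*
 * Assumed behaviour of mt_scale_by_toptier(): val * toptier / total,
 * rounded down. Splitting val into quotient and remainder keeps the
 * intermediate products small for large values of val.
 */
static unsigned long scale_by_toptier(const struct tier_capacity *cap,
				      unsigned long val)
{
	if (cap->total == 0)
		return val;
	assert(cap->toptier <= cap->total);
	return (val / cap->total) * cap->toptier +
	       (val % cap->total) * cap->toptier / cap->total;
}

/* Mirrors page_counter_max_or_scale(): "unlimited" stays unlimited. */
static unsigned long max_or_scale(const struct tier_capacity *cap,
				  unsigned long val)
{
	return val == COUNTER_MAX ? COUNTER_MAX : scale_by_toptier(cap, val);
}

/* Derive all four toptier limits, as done per memcg on hotplug. */
static struct limits derive_toptier_limits(const struct tier_capacity *cap,
					   const struct limits *mem)
{
	struct limits top = {
		.min = max_or_scale(cap, mem->min),
		.low = max_or_scale(cap, mem->low),
		.high = max_or_scale(cap, mem->high),
		.max = max_or_scale(cap, mem->max),
	};
	return top;
}
```

With 25% of capacity in the top tier, memory.high = 400 pages yields a toptier high of 100 pages. An unlimited memory.max stays unlimited instead of being scaled down to a finite value.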
From: Joshua Hahn
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Kairui Song, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Youngjun Park, Muchun Song, Qi Zheng, Axel Rasmussen, Yuanchu Xie, Wei Xu, David Hildenbrand, Lorenzo Stoakes, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [RFC PATCH 6/9 v2] mm/vmscan, memcontrol: Add nodemask to try_to_free_mem_cgroup_pages
Date: Thu, 23 Apr 2026 13:34:40 -0700
Message-ID: <20260423203445.2914963-7-joshua.hahnjy@gmail.com>
In-Reply-To: <20260423203445.2914963-1-joshua.hahnjy@gmail.com>
References: <20260423203445.2914963-1-joshua.hahnjy@gmail.com>

Add a nodemask parameter to try_to_free_mem_cgroup_pages() so that reclaim can be restricted to specific nodes. Future patches can use the new signature to reclaim selectively from toptier nodes. This places downward pressure on toptier usage when a toptier limit is breached but the memcg-wide limits are not yet breached.

All callers pass NULL for the new nodemask, which leaves reclaim unrestricted, so this patch introduces no functional change.
Signed-off-by: Joshua Hahn
---
 include/linux/swap.h |  3 ++-
 mm/memcontrol-v1.c   |  6 ++++--
 mm/memcontrol.c      | 11 +++++++----
 mm/vmscan.c          | 11 ++++++-----
 4 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1930f81e6be4d..493dd99f3165a 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -367,7 +367,8 @@ extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 						  unsigned long nr_pages,
 						  gfp_t gfp_mask,
 						  unsigned int reclaim_options,
-						  int *swappiness);
+						  int *swappiness,
+						  nodemask_t *allowed);
 extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem,
 						gfp_t gfp_mask, bool noswap,
 						pg_data_t *pgdat,
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 433bba9dfe715..03df1cc71842c 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -1500,7 +1500,8 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
 		}
 
 		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
-				memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP, NULL)) {
+				memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP,
+				NULL, NULL)) {
 			ret = -EBUSY;
 			break;
 		}
@@ -1532,7 +1533,8 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
 			return -EINTR;
 
 		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
-						  MEMCG_RECLAIM_MAY_SWAP, NULL))
+						  MEMCG_RECLAIM_MAY_SWAP,
+						  NULL, NULL))
 			nr_retries--;
 	}
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3acb06388405c..3fb1ee1d18603 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2123,7 +2123,7 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
 		nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages,
 							gfp_mask,
 							MEMCG_RECLAIM_MAY_SWAP,
-							NULL);
+							NULL, NULL);
 		psi_memstall_leave(&pflags);
 	} while ((memcg = parent_mem_cgroup(memcg)) &&
 		 !mem_cgroup_is_root(memcg));
@@ -2432,7 +2432,8 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 
 	psi_memstall_enter(&pflags);
 	nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
-						    gfp_mask, reclaim_options, NULL);
+						    gfp_mask, reclaim_options,
+						    NULL, NULL);
 	psi_memstall_leave(&pflags);
 
 	if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
@@ -4591,7 +4592,8 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 		}
 
 		reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
-					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, NULL);
+					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP,
+					NULL, NULL);
 
 		if (!reclaimed && !nr_retries--)
 			break;
@@ -4651,7 +4653,8 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 
 		if (nr_reclaims) {
 			if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max,
-					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, NULL))
+					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP,
+					NULL, NULL))
 				nr_reclaims--;
 			continue;
 		}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5a8c8fcccbfc9..615aa0c899dad 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6807,7 +6807,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 					   unsigned long nr_pages,
 					   gfp_t gfp_mask,
 					   unsigned int reclaim_options,
-					   int *swappiness)
+					   int *swappiness, nodemask_t *allowed)
 {
 	unsigned long nr_reclaimed;
 	unsigned int noreclaim_flag;
@@ -6823,6 +6823,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 		.may_unmap = 1,
 		.may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP),
 		.proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE),
+		.nodemask = allowed,
 	};
 	/*
 	 * Traverse the ZONELIST_FALLBACK zonelist of the current node to put
@@ -6848,7 +6849,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 					   unsigned long nr_pages,
 					   gfp_t gfp_mask,
 					   unsigned int reclaim_options,
-					   int *swappiness)
+					   int *swappiness, nodemask_t *allowed)
 {
 	return 0;
 }
@@ -7964,9 +7965,9 @@ int user_proactive_reclaim(char *buf,
 			reclaim_options = MEMCG_RECLAIM_MAY_SWAP |
 					  MEMCG_RECLAIM_PROACTIVE;
 			reclaimed = try_to_free_mem_cgroup_pages(memcg,
-						batch_size, gfp_mask,
-						reclaim_options,
-						swappiness == -1 ? NULL : &swappiness);
+					batch_size, gfp_mask, reclaim_options,
+					swappiness == -1 ? NULL : &swappiness,
+					NULL);
 		} else {
 			struct scan_control sc = {
 				.gfp_mask = current_gfp_context(gfp_mask),
-- 
2.52.0
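The effect of the new nodemask argument can be modeled in userspace. A NULL mask leaves every node eligible, matching what all existing callers get after this patch, while a non-NULL mask confines reclaim to the listed nodes. In this sketch the per-node page counts, the node_bits type and reclaim_from_nodes() are hypothetical. The kernel's scan_control.nodemask and zonelist walk are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 4

/* Hypothetical stand-in for nodemask_t: bit n set => node n may be reclaimed. */
typedef unsigned long node_bits;

/*
 * Reclaim up to nr_to_reclaim pages from per-node reclaimable counts,
 * visiting nodes in order. A NULL mask means "no restriction".
 */
static unsigned long reclaim_from_nodes(unsigned long reclaimable[NR_NODES],
					unsigned long nr_to_reclaim,
					const node_bits *allowed)
{
	unsigned long reclaimed = 0;

	for (int nid = 0; nid < NR_NODES && reclaimed < nr_to_reclaim; nid++) {
		unsigned long want, take;

		if (allowed && !(*allowed & (1UL << nid)))
			continue;	/* node outside the mask: skip it */

		want = nr_to_reclaim - reclaimed;
		take = reclaimable[nid] < want ? reclaimable[nid] : want;
		reclaimable[nid] -= take;
		reclaimed += take;
	}
	return reclaimed;
}
```

If nodes 0 and 1 form the top tier, passing a mask of 0x3 reclaims only from them, even when lower-tier nodes have reclaimable pages. This is the pressure the commit message describes for a breached toptier limit.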
O3AE0dy5j52cfC8DMh4ObkTg0cHY0pjk0y4+nefYA8XUmCCzXt9Bv4j1RSJ653Ayz79vUrRd+TP 73w+cvwIxjWAiED37MnN7xPd9w/rhlmc4= X-Received: by 2002:a05:6830:82e3:b0:7dc:d7e5:8d43 with SMTP id 46e09a7af769-7dcd7e5a2e6mr8038210a34.2.1776976497705; Thu, 23 Apr 2026 13:34:57 -0700 (PDT) Received: from localhost ([2a03:2880:10ff:8::]) by smtp.gmail.com with ESMTPSA id 46e09a7af769-7dcc953e39asm10485258a34.15.2026.04.23.13.34.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 Apr 2026 13:34:57 -0700 (PDT) From: Joshua Hahn To: linux-mm@kvack.org Cc: Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Andrew Morton , Muchun Song , cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com Subject: [RFC PATCH 7/9 v2] mm/memcontrol: Make memory.low and memory.min tier-aware Date: Thu, 23 Apr 2026 13:34:41 -0700 Message-ID: <20260423203445.2914963-8-joshua.hahnjy@gmail.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260423203445.2914963-1-joshua.hahnjy@gmail.com> References: <20260423203445.2914963-1-joshua.hahnjy@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" On machines serving multiple workloads whose memory is isolated via the memory cgroup controller, it is currently impossible to enforce a fair distribution of toptier memory among the workloads, as the only enforceable limits have to do with total memory footprint, but not where that memory resides. This makes ensuring a consistent and baseline performance difficult, as each workload's performance is heavily impacted by workload-external factors such as which other workloads are co-located in the same host, and the order at which different workloads are started. 
Extend the existing memory.{low, min} protection to be tier-aware in order to enforce proportional best-effort and guaranteed memory protection of toptier memory. Signed-off-by: Joshua Hahn --- include/linux/memcontrol.h | 8 ++++++++ mm/memcontrol.c | 3 +++ 2 files changed, 11 insertions(+) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 6bcb866440075..2222b390ebf10 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -624,6 +624,10 @@ static inline bool mem_cgroup_below_low(struct mem_cgr= oup *target, if (mem_cgroup_unprotected(target, memcg)) return false; =20 + if (mem_cgroup_tiered_limits() && READ_ONCE(memcg->toptier.elow) >=3D + page_counter_read(&memcg->toptier)) + return true; + return READ_ONCE(memcg->memory.elow) >=3D page_counter_read(&memcg->memory); } @@ -634,6 +638,10 @@ static inline bool mem_cgroup_below_min(struct mem_cgr= oup *target, if (mem_cgroup_unprotected(target, memcg)) return false; =20 + if (mem_cgroup_tiered_limits() && READ_ONCE(memcg->toptier.emin) >=3D + page_counter_read(&memcg->toptier)) + return true; + return READ_ONCE(memcg->memory.emin) >=3D page_counter_read(&memcg->memory); } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3fb1ee1d18603..b115ff40e268d 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4933,6 +4933,9 @@ void mem_cgroup_calculate_protection(struct mem_cgrou= p *root, root =3D root_mem_cgroup; =20 page_counter_calculate_protection(&root->memory, &memcg->memory, recursiv= e_protection); + if (mem_cgroup_tiered_limits()) + page_counter_calculate_protection(&root->toptier, + &memcg->toptier, recursive_protection); } =20 static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg, --=20 2.52.0 From nobody Wed Jun 17 07:22:45 2026 Received: from mail-oi1-f179.google.com (mail-oi1-f179.google.com [209.85.167.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org 
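The two mem_cgroup_below_*() hunks in this patch effectively OR two checks together: with tiered limits enabled, a cgroup is treated as protected if its toptier usage fits within its effective toptier protection, and otherwise the existing total-memory comparison decides. A minimal userspace sketch of that predicate follows. It is a model, not kernel code: struct prot_counter and model_below_prot() are hypothetical names, plain fields stand in for page_counter_read() and READ_ONCE() on elow/emin, and the mem_cgroup_unprotected() early return is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for one page_counter: current usage plus the
 * effective protection (elow or emin) computed for it.
 */
struct prot_counter {
	unsigned long usage;
	unsigned long eprot;
};

/*
 * Models the tier-aware mem_cgroup_below_low()/mem_cgroup_below_min():
 * with tiered limits on, a toptier usage within the effective toptier
 * protection short-circuits to "protected"; otherwise the total-memory
 * comparison decides, exactly as before the patch.
 */
static bool model_below_prot(const struct prot_counter *memory,
			     const struct prot_counter *toptier,
			     bool tiered_limits)
{
	assert(memory != NULL && toptier != NULL);

	if (tiered_limits && toptier->eprot >= toptier->usage)
		return true;

	return memory->eprot >= memory->usage;
}
```

For example, with 6 pages charged in total against an effective protection of 4, the total-memory check alone fails; if only 1 of those pages is on toptier and the effective toptier protection is 2, the model reports the cgroup as protected. Because both comparisons use >=, the model (like the hunks it follows) also reports a cgroup with zero toptier usage and zero effective toptier protection as protected.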

From: Joshua Hahn
To: linux-mm@kvack.org
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Andrew Morton, Muchun Song, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [RFC PATCH 8/9 v2] mm/memcontrol: Make memory.high tier-aware
Date: Thu, 23 Apr 2026 13:34:42 -0700
Message-ID: <20260423203445.2914963-9-joshua.hahnjy@gmail.com>
In-Reply-To: <20260423203445.2914963-1-joshua.hahnjy@gmail.com>

On machines serving multiple workloads whose memory is isolated via the memory cgroup controller, it is currently impossible to enforce a fair distribution of toptier memory among the workloads, as the limits only enforce the total memory footprint, not where that memory resides. This makes it difficult to ensure consistent baseline performance, as each workload's performance is heavily impacted by workload-external factors such as which other workloads are co-located on the same host, and the order in which the workloads are started.

Extend the existing memory.high limit to be tier-aware. Depending on which limits are breached, reclaim selectively: when memory.high is breached, perform reclaim on all nodes; when memory.high is not breached but toptier.high is, perform targeted reclaim on toptier nodes only.

Also throttle allocations when toptier.high is breached, taking care not to double-penalize: the toptier penalty is only applied when no memory.high penalty has already been accounted for.

Signed-off-by: Joshua Hahn
---
 mm/memcontrol.c | 82 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 72 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b115ff40e268d..e5f39830d250d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2112,10 +2112,25 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
 
 	do {
 		unsigned long pflags;
+		nodemask_t toptier_nodes;
+		nodemask_t *reclaim_targets = NULL;
 
 		if (page_counter_read(&memcg->memory) <=
-		    READ_ONCE(memcg->memory.high))
-			continue;
+		    READ_ONCE(memcg->memory.high)) {
+			if (!mem_cgroup_tiered_limits())
+				continue;
+
+			/*
+			 * Even if the memcg is under the memory limit, toptier
+			 * may have breached the toptier limit. Engage
+			 * targeted reclaim on toptier nodes if so.
+			 */
+			if (page_counter_read(&memcg->toptier) <=
+			    READ_ONCE(memcg->toptier.high))
+				continue;
+			get_toptier_nodemask(&toptier_nodes);
+			reclaim_targets = &toptier_nodes;
+		}
 
 		memcg_memory_event(memcg, MEMCG_HIGH);
 
@@ -2123,7 +2138,7 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
 		nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages,
 							gfp_mask,
 							MEMCG_RECLAIM_MAY_SWAP,
-							NULL, NULL);
+							NULL, reclaim_targets);
 		psi_memstall_leave(&pflags);
 	} while ((memcg = parent_mem_cgroup(memcg)) &&
 		 !mem_cgroup_is_root(memcg));
@@ -2224,6 +2239,23 @@ static u64 mem_find_max_overage(struct mem_cgroup *memcg)
 	return max_overage;
 }
 
+static u64 toptier_find_max_overage(struct mem_cgroup *memcg)
+{
+	u64 overage, max_overage = 0;
+
+	if (!mem_cgroup_tiered_limits())
+		return 0;
+
+	do {
+		overage = calculate_overage(page_counter_read(&memcg->toptier),
+					    READ_ONCE(memcg->toptier.high));
+		max_overage = max(overage, max_overage);
+	} while ((memcg = parent_mem_cgroup(memcg)) &&
+		 !mem_cgroup_is_root(memcg));
+
+	return max_overage;
+}
+
 static u64 swap_find_max_overage(struct mem_cgroup *memcg)
 {
 	u64 overage, max_overage = 0;
@@ -2326,6 +2358,14 @@ void __mem_cgroup_handle_over_high(gfp_t gfp_mask)
 	penalty_jiffies = calculate_high_delay(memcg, nr_pages,
 					       mem_find_max_overage(memcg));
 
+	/*
+	 * Don't double-penalize for toptier high overage if memory.high
+	 * overage penalization has already been accounted for.
+	 */
+	if (!penalty_jiffies)
+		penalty_jiffies += calculate_high_delay(memcg, nr_pages,
+					toptier_find_max_overage(memcg));
+
 	penalty_jiffies += calculate_high_delay(memcg, nr_pages,
 						swap_find_max_overage(memcg));
 
@@ -2522,22 +2562,26 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 */
 	do {
 		bool mem_high, swap_high;
+		bool toptier_high = false;
 
 		mem_high = page_counter_read(&memcg->memory) >
 			READ_ONCE(memcg->memory.high);
 		swap_high = page_counter_read(&memcg->swap) >
 			READ_ONCE(memcg->swap.high);
+		toptier_high = mem_cgroup_tiered_limits() &&
+			page_counter_read(&memcg->toptier) >
+			READ_ONCE(memcg->toptier.high);
 
 		/* Don't bother a random interrupted task */
 		if (!in_task()) {
-			if (mem_high) {
+			if (mem_high || toptier_high) {
 				schedule_work(&memcg->high_work);
 				break;
 			}
 			continue;
 		}
 
-		if (mem_high || swap_high) {
+		if (mem_high || swap_high || toptier_high) {
 			/*
 			 * The allocating tasks in this cgroup will need to do
 			 * reclaim or be throttled to prevent further growth
@@ -4577,10 +4621,28 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 
 	for (;;) {
 		unsigned long nr_pages = page_counter_read(&memcg->memory);
-		unsigned long reclaimed;
+		unsigned long reclaimed, charge;
+		nodemask_t toptier_nodes;
+		nodemask_t *reclaim_targets = NULL;
 
-		if (nr_pages <= high)
-			break;
+		if (nr_pages <= high) {
+			unsigned long toptier_nr_pages, toptier_high;
+
+			if (!mem_cgroup_tiered_limits())
+				break;
+
+			toptier_nr_pages = page_counter_read(&memcg->toptier);
+			toptier_high = READ_ONCE(memcg->toptier.high);
+
+			if (toptier_nr_pages <= toptier_high)
+				break;
+
+			get_toptier_nodemask(&toptier_nodes);
+			reclaim_targets = &toptier_nodes;
+			charge = toptier_nr_pages - toptier_high;
+		} else {
+			charge = nr_pages - high;
+		}
 
 		if (signal_pending(current))
 			break;
@@ -4591,9 +4653,9 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 			continue;
 		}
 
-		reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
+		reclaimed = try_to_free_mem_cgroup_pages(memcg, charge,
 					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP,
-					NULL, NULL);
+					NULL, reclaim_targets);
 
 		if (!reclaimed && !nr_retries--)
 			break;
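The reclaim and throttling decisions in this patch reduce to a small decision function. The sketch below is a userspace model, not kernel code: enum reclaim_scope, model_high_reclaim() and model_high_penalty() are hypothetical names, plain integers stand in for the page counters, and the three delay arguments stand in for the values calculate_high_delay() would return for the memory, toptier and swap overages (the real delay formula is not reproduced). The scope selection follows memory_high_write(); the penalty combination follows __mem_cgroup_handle_over_high().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where reclaim should run, as selected by the hunks above. */
enum reclaim_scope {
	RECLAIM_NONE,		/* both high limits respected */
	RECLAIM_ALL_NODES,	/* memory.high breached: nodemask is NULL */
	RECLAIM_TOPTIER_ONLY,	/* only toptier.high breached */
};

/*
 * Mirrors the loop body of memory_high_write(): a memory.high breach
 * takes priority and reclaims the total excess from all nodes; otherwise
 * a toptier.high breach reclaims the toptier excess from toptier nodes.
 */
static enum reclaim_scope model_high_reclaim(unsigned long mem_usage,
					     unsigned long mem_high,
					     unsigned long top_usage,
					     unsigned long top_high,
					     bool tiered_limits,
					     unsigned long *nr_to_reclaim)
{
	assert(nr_to_reclaim != NULL);
	*nr_to_reclaim = 0;

	if (mem_usage > mem_high) {
		*nr_to_reclaim = mem_usage - mem_high;
		return RECLAIM_ALL_NODES;
	}
	if (tiered_limits && top_usage > top_high) {
		*nr_to_reclaim = top_usage - top_high;
		return RECLAIM_TOPTIER_ONLY;
	}
	return RECLAIM_NONE;
}

/*
 * Mirrors the penalty accumulation in __mem_cgroup_handle_over_high():
 * the toptier delay is only added when the memory.high delay is zero,
 * so a cgroup over both limits is not throttled twice. The swap delay
 * is always added.
 */
static unsigned long model_high_penalty(unsigned long mem_delay,
					unsigned long toptier_delay,
					unsigned long swap_delay)
{
	unsigned long penalty = mem_delay;

	if (!penalty)
		penalty += toptier_delay;
	penalty += swap_delay;
	return penalty;
}
```

For instance, with 80 pages charged against a memory.high of 100 but 70 of them on toptier against a toptier.high of 50, the model selects toptier-only reclaim of 20 pages; a cgroup already paying a memory.high delay of 10 and owed a toptier delay of 7 is throttled for 10 plus its swap delay, not 17.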

From: Joshua Hahn
To: linux-mm@kvack.org
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Andrew Morton, Muchun Song, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [RFC PATCH 9/9 v2] mm/memcontrol: Make memory.max tier-aware
Date: Thu, 23 Apr 2026 13:34:43 -0700
Message-ID: <20260423203445.2914963-10-joshua.hahnjy@gmail.com>
In-Reply-To: <20260423203445.2914963-1-joshua.hahnjy@gmail.com>

On machines serving multiple workloads whose memory is isolated via the memory cgroup controller, it is currently impossible to enforce a fair distribution of toptier memory among the workloads, as the limits only enforce the total memory footprint, not where that memory resides. This makes it difficult to ensure consistent baseline performance, as each workload's performance is heavily impacted by workload-external factors such as which other workloads are co-located on the same host, and the order in which the workloads are started.

Extend the existing memory.max limit to be tier-aware. Depending on which limits are breached, reclaim selectively: when memory.max is breached, perform reclaim on all nodes; when memory.max is not breached but toptier.max is, perform targeted reclaim on toptier nodes only.

Signed-off-by: Joshua Hahn
---
 mm/memcontrol.c | 56 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 44 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e5f39830d250d..d8d67ada993ff 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1518,6 +1518,15 @@ static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
 	if (count < limit)
 		margin = limit - count;
 
+	if (mem_cgroup_tiered_limits()) {
+		count = page_counter_read(&memcg->toptier);
+		limit = READ_ONCE(memcg->toptier.max);
+		if (count < limit)
+			margin = min(margin, limit - count);
+		else
+			margin = 0;
+	}
+
 	if (do_memsw_account()) {
 		count = page_counter_read(&memcg->memsw);
 		limit = READ_ONCE(memcg->memsw.max);
@@ -2424,11 +2433,12 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	bool raised_max_event = false;
 	unsigned long pflags;
 	bool allow_spinning = gfpflags_allow_spinning(gfp_mask);
-	bool toptier_charged;
+	nodemask_t toptier_nodes;
+	nodemask_t *reclaim_nodes;
 
 retry:
 	reclaim_options = MEMCG_RECLAIM_MAY_SWAP;
-	toptier_charged = false;
+	reclaim_nodes = NULL;
 
 	if (do_memsw_account() &&
 	    !page_counter_try_charge(&memcg->memsw, nr_pages, &counter)) {
@@ -2438,13 +2448,20 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	}
 
 	if (toptier &&
-	    page_counter_try_charge(&memcg->toptier, nr_pages, &counter))
-		toptier_charged = true;
+	    !page_counter_try_charge(&memcg->toptier, nr_pages, &counter)) {
+		get_toptier_nodemask(&toptier_nodes);
+		reclaim_nodes = &toptier_nodes;
+		mem_over_limit = mem_cgroup_from_counter(counter, toptier);
+
+		if (do_memsw_account())
+			page_counter_uncharge(&memcg->memsw, nr_pages);
+		goto reclaim;
+	}
 
 	if (page_counter_try_charge(&memcg->memory, nr_pages, &counter))
 		goto done_restock;
 
-	if (toptier_charged)
+	if (toptier)
 		page_counter_uncharge(&memcg->toptier, nr_pages);
 	if (do_memsw_account())
 		page_counter_uncharge(&memcg->memsw, nr_pages);
@@ -2473,7 +2490,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	psi_memstall_enter(&pflags);
 	nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
 						    gfp_mask, reclaim_options,
-						    NULL, NULL);
+						    NULL, reclaim_nodes);
 	psi_memstall_leave(&pflags);
 
 	if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
@@ -4683,7 +4700,8 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
 	unsigned int nr_reclaims = MAX_RECLAIM_RETRIES;
 	bool drained = false;
-	unsigned long max;
+	unsigned long max, toptier_max = PAGE_COUNTER_MAX;
+	nodemask_t toptier_nodes;
 	int err;
 
 	buf = strstrip(buf);
@@ -4692,16 +4710,30 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 		return err;
 
 	xchg(&memcg->memory.max, max);
-	if (mem_cgroup_tiered_limits())
-		xchg(&memcg->toptier.max, page_counter_max_or_scale(max));
+	if (mem_cgroup_tiered_limits()) {
+		toptier_max = page_counter_max_or_scale(max);
+		xchg(&memcg->toptier.max, toptier_max);
+		get_toptier_nodemask(&toptier_nodes);
+	}
 
 	if (of->file->f_flags & O_NONBLOCK)
 		goto out;
 
 	for (;;) {
 		unsigned long nr_pages = page_counter_read(&memcg->memory);
+		unsigned long nr_toptier = page_counter_read(&memcg->toptier);
+		unsigned long to_reclaim = 0;
+		nodemask_t *reclaim_nodes = NULL;
+
+		if (nr_pages > max) {
+			to_reclaim = nr_pages - max;
+		} else if (mem_cgroup_tiered_limits() &&
+			   nr_toptier > toptier_max) {
+			to_reclaim = nr_toptier - toptier_max;
+			reclaim_nodes = &toptier_nodes;
+		}
 
-		if (nr_pages <= max)
+		if (!to_reclaim)
 			break;
 
 		if (signal_pending(current))
@@ -4714,9 +4746,9 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 		}
 
 		if (nr_reclaims) {
-			if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max,
+			if (!try_to_free_mem_cgroup_pages(memcg, to_reclaim,
 					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP,
-					NULL, NULL))
+					NULL, reclaim_nodes))
 				nr_reclaims--;
 			continue;
 		}
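The mem_cgroup_margin() hunk makes the charge margin the smallest headroom across every enforced counter, with any counter at or above its max forcing the margin to zero. A minimal userspace model of that rule follows, assuming plain (usage, max) pairs in place of the page counters; struct limit_counter, headroom(), min_ul() and model_margin() are hypothetical names, and the two boolean flags stand in for mem_cgroup_tiered_limits() and do_memsw_account().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page_counter's usage and max. */
struct limit_counter {
	unsigned long usage;
	unsigned long max;
};

/* Pages that can still be charged before this counter reaches its max. */
static unsigned long headroom(const struct limit_counter *c)
{
	return c->usage < c->max ? c->max - c->usage : 0;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Models mem_cgroup_margin() after this patch: start from the memory
 * counter's headroom, then clamp it by the toptier counter (when tiered
 * limits are enabled) and by the memsw counter (when memsw accounting
 * is enabled). A full counter contributes a headroom of 0, matching the
 * "else margin = 0" branches in the kernel code.
 */
static unsigned long model_margin(const struct limit_counter *memory,
				  const struct limit_counter *toptier,
				  const struct limit_counter *memsw,
				  bool tiered_limits, bool memsw_account)
{
	unsigned long margin;

	assert(memory != NULL);
	margin = headroom(memory);
	if (tiered_limits && toptier)
		margin = min_ul(margin, headroom(toptier));
	if (memsw_account && memsw)
		margin = min_ul(margin, headroom(memsw));
	return margin;
}
```

With 40 pages of total headroom but only 5 pages of toptier headroom, the model's margin is 5; that smaller value is what the post-reclaim check mem_cgroup_margin(mem_over_limit) >= nr_pages in try_charge_memcg() would compare against once tiered limits are enabled.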