From: Gregory Price
To: linux-mm@kvack.org
Cc: kernel-team@meta.com,
	linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	nvdimm@lists.linux.dev,
	linux-fsdevel@vger.kernel.org,
	cgroups@vger.kernel.org,
	dave@stgolabs.net,
	jonathan.cameron@huawei.com,
	dave.jiang@intel.com,
	alison.schofield@intel.com,
	vishal.l.verma@intel.com,
	ira.weiny@intel.com,
	dan.j.williams@intel.com,
	longman@redhat.com,
	akpm@linux-foundation.org,
	david@redhat.com,
	lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com,
	vbabka@suse.cz,
	rppt@kernel.org,
	surenb@google.com,
	mhocko@suse.com,
	osalvador@suse.de,
	ziy@nvidia.com,
	matthew.brost@intel.com,
	joshua.hahnjy@gmail.com,
	rakie.kim@sk.com,
	byungchul@sk.com,
	gourry@gourry.net,
	ying.huang@linux.alibaba.com,
	apopple@nvidia.com,
	mingo@redhat.com,
	peterz@infradead.org,
	juri.lelli@redhat.com,
	vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com,
	rostedt@goodmis.org,
	bsegall@google.com,
	mgorman@suse.de,
	vschneid@redhat.com,
	tj@kernel.org,
	hannes@cmpxchg.org,
	mkoutny@suse.com,
	kees@kernel.org,
	muchun.song@linux.dev,
	roman.gushchin@linux.dev,
	shakeel.butt@linux.dev,
	rientjes@google.com,
	jackmanb@google.com,
	cl@gentwo.org,
	harry.yoo@oracle.com,
	axelrasmussen@google.com,
	yuanchu@google.com,
	weixugc@google.com,
	zhengqi.arch@bytedance.com,
	yosry.ahmed@linux.dev,
	nphamcs@gmail.com,
	chengming.zhou@linux.dev,
	fabio.m.de.francesco@linux.intel.com,
	rrichter@amd.com,
	ming.li@zohomail.com,
	usamaarif642@gmail.com,
	brauner@kernel.org,
	oleg@redhat.com,
	namcao@linutronix.de,
	escape@linux.alibaba.com,
	dongjoo.seo1@samsung.com
Subject: [RFC PATCH v2 05/11] mm: restrict slub, oom, compaction, and page_alloc to sysram by default
Date: Wed, 12 Nov 2025 14:29:21 -0500
Message-ID: <20251112192936.2574429-6-gourry@gourry.net>
X-Mailer: git-send-email 2.51.1
In-Reply-To: <20251112192936.2574429-1-gourry@gourry.net>
References: <20251112192936.2574429-1-gourry@gourry.net>

Restrict page allocation and zone iteration in mm to skip SPM Nodes,
via cpusets when enabled, or via mt_sysram_nodelist when cpusets are
disabled.

This constrains core users of nodemasks to mt_sysram_nodelist, which is
guaranteed to contain at least the set of nodes with sysram memory
blocks present at boot (or to be NULL if NUMA is compiled out).

If the sysram nodelist is empty (something in memory-tiers is broken),
return NULL, which still allows all zones to be iterated.

Signed-off-by: Gregory Price
---
 mm/compaction.c |  3 +++
 mm/oom_kill.c   |  5 ++++-
 mm/page_alloc.c | 18 ++++++++++++++----
 mm/slub.c       | 15 ++++++++++++---
 4 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index d2176935d3dd..7b73179d1fbf 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -2832,6 +2833,8 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		if ((alloc_flags & ALLOC_CPUSET) &&
 		    !cpuset_zone_allowed(zone, gfp_mask))
 			continue;
+		else if (!mt_node_allowed(zone_to_nid(zone), gfp_mask))
+			continue;
 
 		if (prio > MIN_COMPACT_PRIORITY
 					&& compaction_deferred(zone, order)) {
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index c145b0feecc1..386b4ceeaeb8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1118,6 +1119,8 @@ EXPORT_SYMBOL_GPL(unregister_oom_notifier);
 bool out_of_memory(struct oom_control *oc)
 {
 	unsigned long freed = 0;
+	if (!oc->nodemask)
+		oc->nodemask = mt_sysram_nodemask();
 
 	if (oom_killer_disabled)
 		return false;
@@ -1154,7 +1157,7 @@ bool out_of_memory(struct oom_control *oc)
 	 */
 	oc->constraint = constrained_alloc(oc);
 	if (oc->constraint != CONSTRAINT_MEMORY_POLICY)
-		oc->nodemask = NULL;
+		oc->nodemask = mt_sysram_nodemask();
 	check_panic_on_oom(oc);
 
 	if (!is_memcg_oom(oc) && sysctl_oom_kill_allocating_task &&
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bcaf1125d109..2ea6a50f6079 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -3753,6 +3754,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 		if ((alloc_flags & ALLOC_CPUSET) &&
 		    !cpuset_zone_allowed(zone, gfp_mask))
 			continue;
+		else if (!mt_node_allowed(zone_to_nid(zone), gfp_mask))
+			continue;
 		/*
 		 * When allocating a page cache page for writing, we
 		 * want to get it from a node that is within its dirty
@@ -4555,6 +4558,8 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 		if ((alloc_flags & ALLOC_CPUSET) &&
 		    !cpuset_zone_allowed(zone, gfp_mask))
 			continue;
+		else if (!mt_node_allowed(zone_to_nid(zone), gfp_mask))
+			continue;
 
 		available = reclaimable = zone_reclaimable_pages(zone);
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
@@ -4608,7 +4613,7 @@ check_retry_cpuset(int cpuset_mems_cookie, struct alloc_context *ac)
 	 */
 	if (cpusets_enabled() && ac->nodemask &&
 			!cpuset_nodemask_valid_mems_allowed(ac->nodemask)) {
-		ac->nodemask = NULL;
+		ac->nodemask = mt_sysram_nodemask();
 		return true;
 	}
 
@@ -4792,7 +4797,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * user oriented.
 	 */
 	if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
-		ac->nodemask = NULL;
+		ac->nodemask = mt_sysram_nodemask();
 		ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
 					ac->highest_zoneidx, ac->nodemask);
 	}
@@ -4944,7 +4949,8 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 			ac->nodemask = &cpuset_current_mems_allowed;
 		else
 			*alloc_flags |= ALLOC_CPUSET;
-	}
+	} else if (!ac->nodemask) /* sysram_nodes may be NULL during __init */
+		ac->nodemask = mt_sysram_nodemask();
 
 	might_alloc(gfp_mask);
 
@@ -5053,6 +5059,8 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
 		if ((alloc_flags & ALLOC_CPUSET) &&
 		    !cpuset_zone_allowed(zone, gfp))
 			continue;
+		else if (!mt_node_allowed(zone_to_nid(zone), gfp))
+			continue;
 
 		if (nr_online_nodes > 1 && zone != zonelist_zone(ac.preferred_zoneref) &&
 		    zone_to_nid(zone) != zonelist_node_idx(ac.preferred_zoneref)) {
@@ -5187,8 +5195,10 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
 	/*
 	 * Restore the original nodemask if it was potentially replaced with
 	 * &cpuset_current_mems_allowed to optimize the fast-path attempt.
+	 *
+	 * If not set, default to sysram nodes.
 	 */
-	ac.nodemask = nodemask;
+	ac.nodemask = nodemask ? nodemask : mt_sysram_nodemask();
 
 	page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
 
diff --git a/mm/slub.c b/mm/slub.c
index 1bf65c421325..c857db97c6a0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -3576,11 +3577,19 @@ static struct slab *get_any_partial(struct kmem_cache *s,
 		zonelist = node_zonelist(mempolicy_slab_node(), pc->flags);
 		for_each_zone_zonelist(zone, z, zonelist, highest_zoneidx) {
 			struct kmem_cache_node *n;
+			int nid = zone_to_nid(zone);
+			bool allowed;
 
-			n = get_node(s, zone_to_nid(zone));
+			n = get_node(s, nid);
+			if (!n)
+				continue;
+
+			if (cpusets_enabled())
+				allowed = __cpuset_zone_allowed(zone, pc->flags);
+			else
+				allowed = mt_node_allowed(nid, pc->flags);
 
-			if (n && cpuset_zone_allowed(zone, pc->flags) &&
-					n->nr_partial > s->min_partial) {
+			if (allowed && (n->nr_partial > s->min_partial)) {
 				slab = get_partial_node(s, n, pc);
 				if (slab) {
 					/*
-- 
2.51.1
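
For anyone who wants the nodemask defaulting rule from the commit message in isolation, here is a minimal userspace sketch in C. It is not the kernel implementation. model_nodemask, model_sysram_nodemask(), model_default_mask() and model_node_allowed() are hypothetical names made up for illustration, and the real mt_node_allowed() used in this patch takes a node id and gfp flags, not a mask. The sketch assumes only what the commit message states: an empty sysram mask is reported as NULL, and a NULL mask leaves every node eligible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit for this model; one bit per NUMA node. */
#define MODEL_MAX_NODES 64

/* Hypothetical stand-in for the kernel's nodemask_t. */
typedef struct {
	uint64_t bits;
} model_nodemask;

/*
 * Model of the rule "if the sysram nodelist is empty, return NULL".
 * A NULL input is treated the same as an empty mask.
 */
const model_nodemask *model_sysram_nodemask(const model_nodemask *sysram)
{
	return (sysram && sysram->bits) ? sysram : NULL;
}

/*
 * Model of the defaulting done at the allocator entry points in the diff:
 * a caller-supplied mask wins, otherwise fall back to the sysram mask.
 */
const model_nodemask *model_default_mask(const model_nodemask *requested,
					 const model_nodemask *sysram)
{
	return requested ? requested : model_sysram_nodemask(sysram);
}

/* A NULL mask places no restriction, so every node is allowed. */
bool model_node_allowed(int nid, const model_nodemask *mask)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	if (!mask)
		return true;
	return (mask->bits >> nid) & 1u;
}
```

With nodes 0 and 1 marked as sysram (bits = 0x3), a caller that passes no mask is allowed on node 0 but not on node 2. With an empty sysram mask, the same caller is allowed on every node, which matches the "still allows all zones to be iterated" fallback described above.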