From: Gregory Price <gourry@gourry.net>
To: linux-mm@kvack.org
Cc: kernel-team@meta.com, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
 nvdimm@lists.linux.dev, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org,
 dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com,
 alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com,
 dan.j.williams@intel.com, longman@redhat.com, akpm@linux-foundation.org,
 david@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
 vbabka@suse.cz, rppt@kernel.org, surenb@google.com, mhocko@suse.com,
 osalvador@suse.de, ziy@nvidia.com, matthew.brost@intel.com,
 joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
 ying.huang@linux.alibaba.com, apopple@nvidia.com, mingo@redhat.com,
 peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org,
 dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
 mgorman@suse.de, vschneid@redhat.com, tj@kernel.org, hannes@cmpxchg.org,
 mkoutny@suse.com, kees@kernel.org, muchun.song@linux.dev,
 roman.gushchin@linux.dev, shakeel.butt@linux.dev, rientjes@google.com,
 jackmanb@google.com, cl@gentwo.org, harry.yoo@oracle.com,
 axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
 zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com,
 chengming.zhou@linux.dev, fabio.m.de.francesco@linux.intel.com,
 rrichter@amd.com, ming.li@zohomail.com, usamaarif642@gmail.com,
 brauner@kernel.org, oleg@redhat.com, namcao@linutronix.de,
 escape@linux.alibaba.com, dongjoo.seo1@samsung.com
Subject: [RFC PATCH v2 01/11] mm: constify oom_control, scan_control, and alloc_context nodemask
Date: Wed, 12 Nov 2025 14:29:17 -0500
Message-ID: <20251112192936.2574429-2-gourry@gourry.net>
In-Reply-To: <20251112192936.2574429-1-gourry@gourry.net>
References: <20251112192936.2574429-1-gourry@gourry.net>

The nodemasks in these structures may come from a variety of sources,
including tasks and cpusets, and should never be modified by the code
they are passed to in another context.

Signed-off-by: Gregory Price <gourry@gourry.net>
Acked-by: Balbir Singh
---
 include/linux/cpuset.h | 4 ++--
 include/linux/mm.h     | 4 ++--
 include/linux/mmzone.h | 6 +++---
 include/linux/oom.h    | 2 +-
 include/linux/swap.h   | 2 +-
 kernel/cgroup/cpuset.c | 2 +-
 mm/internal.h          | 2 +-
 mm/mmzone.c            | 5 +++--
 mm/page_alloc.c        | 4 ++--
 mm/show_mem.c          | 9 ++++++---
 mm/vmscan.c            | 6 +++---
 11 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 2ddb256187b5..548eaf7ef8d0 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -80,7 +80,7 @@ extern bool cpuset_cpu_is_isolated(int cpu);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
 #define cpuset_current_mems_allowed (current->mems_allowed)
 void cpuset_init_current_mems_allowed(void);
-int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask);
+int cpuset_nodemask_valid_mems_allowed(const nodemask_t *nodemask);
 
 extern bool cpuset_current_node_allowed(int node, gfp_t gfp_mask);
 
@@ -219,7 +219,7 @@ static inline nodemask_t cpuset_mems_allowed(struct task_struct *p)
 #define cpuset_current_mems_allowed (node_states[N_MEMORY])
 static inline void cpuset_init_current_mems_allowed(void) {}
 
-static inline int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask)
+static inline int cpuset_nodemask_valid_mems_allowed(const nodemask_t *nodemask)
 {
 	return 1;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d16b33bacc32..1a874917eae6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3343,7 +3343,7 @@ extern int __meminit early_pfn_to_nid(unsigned long pfn);
 extern void mem_init(void);
 extern void __init mmap_init(void);
 
-extern void __show_mem(unsigned int flags, nodemask_t *nodemask, int max_zone_idx);
+extern void __show_mem(unsigned int flags, const nodemask_t *nodemask, int max_zone_idx);
 static inline void show_mem(void)
 {
 	__show_mem(0, NULL, MAX_NR_ZONES - 1);
@@ -3353,7 +3353,7 @@ extern void si_meminfo(struct sysinfo * val);
 extern void si_meminfo_node(struct sysinfo *val, int nid);
 
 extern __printf(3, 4)
-void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...);
+void warn_alloc(gfp_t gfp_mask, const nodemask_t *nodemask, const char *fmt, ...);
 
 extern void setup_per_cpu_pageset(void);
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7fb7331c5725..5c96b2c52817 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1725,7 +1725,7 @@ static inline int zonelist_node_idx(const struct zoneref *zoneref)
 
 struct zoneref *__next_zones_zonelist(struct zoneref *z,
 					enum zone_type highest_zoneidx,
-					nodemask_t *nodes);
+					const nodemask_t *nodes);
 
 /**
  * next_zones_zonelist - Returns the next zone at or below highest_zoneidx within the allowed nodemask using a cursor within a zonelist as a starting point
@@ -1744,7 +1744,7 @@ struct zoneref *__next_zones_zonelist(struct zoneref *z,
  */
 static __always_inline struct zoneref *next_zones_zonelist(struct zoneref *z,
 					enum zone_type highest_zoneidx,
-					nodemask_t *nodes)
+					const nodemask_t *nodes)
 {
 	if (likely(!nodes && zonelist_zone_idx(z) <= highest_zoneidx))
 		return z;
@@ -1770,7 +1770,7 @@ static __always_inline struct zoneref *next_zones_zonelist(struct zoneref *z,
  */
 static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
 					enum zone_type highest_zoneidx,
-					nodemask_t *nodes)
+					const nodemask_t *nodes)
 {
 	return next_zones_zonelist(zonelist->_zonerefs,
 					highest_zoneidx, nodes);
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 7b02bc1d0a7e..00da05d227e6 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -30,7 +30,7 @@ struct oom_control {
 	struct zonelist *zonelist;
 
 	/* Used to determine mempolicy */
-	nodemask_t *nodemask;
+	const nodemask_t *nodemask;
 
 	/* Memory cgroup in which oom is invoked, or NULL for global oom */
 	struct mem_cgroup *memcg;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index e818fbade1e2..f5154499bafd 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -381,7 +381,7 @@ extern void swap_setup(void);
 /* linux/mm/vmscan.c */
 extern unsigned long zone_reclaimable_pages(struct zone *zone);
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
-					gfp_t gfp_mask, nodemask_t *mask);
+					gfp_t gfp_mask, const nodemask_t *mask);
 
 #define MEMCG_RECLAIM_MAY_SWAP (1 << 1)
 #define MEMCG_RECLAIM_PROACTIVE (1 << 2)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 52468d2c178a..cd3e2ae83d70 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -4238,7 +4238,7 @@ nodemask_t cpuset_mems_allowed(struct task_struct *tsk)
  *
  * Are any of the nodes in the nodemask allowed in current->mems_allowed?
  */
-int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask)
+int cpuset_nodemask_valid_mems_allowed(const nodemask_t *nodemask)
 {
 	return nodes_intersects(*nodemask, current->mems_allowed);
 }
diff --git a/mm/internal.h b/mm/internal.h
index 1561fc2ff5b8..464e60dd7ba1 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -587,7 +587,7 @@ void page_alloc_sysctl_init(void);
  */
 struct alloc_context {
 	struct zonelist *zonelist;
-	nodemask_t *nodemask;
+	const nodemask_t *nodemask;
 	struct zoneref *preferred_zoneref;
 	int migratetype;
 
diff --git a/mm/mmzone.c b/mm/mmzone.c
index 0c8f181d9d50..59dc3f2076a6 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -43,7 +43,8 @@ struct zone *next_zone(struct zone *zone)
 	return zone;
 }
 
-static inline int zref_in_nodemask(struct zoneref *zref, nodemask_t *nodes)
+static inline int zref_in_nodemask(struct zoneref *zref,
+				   const nodemask_t *nodes)
 {
 #ifdef CONFIG_NUMA
 	return node_isset(zonelist_node_idx(zref), *nodes);
@@ -55,7 +56,7 @@ static inline int zref_in_nodemask(struct zoneref *zref, nodemask_t *nodes)
 /* Returns the next zone at or below highest_zoneidx in a zonelist */
 struct zoneref *__next_zones_zonelist(struct zoneref *z,
 					enum zone_type highest_zoneidx,
-					nodemask_t *nodes)
+					const nodemask_t *nodes)
 {
 	/*
 	 * Find the next suitable zone to use for the allocation.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 600d9e981c23..fd5401fb5e00 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3924,7 +3924,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 	return NULL;
 }
 
-static void warn_alloc_show_mem(gfp_t gfp_mask, nodemask_t *nodemask)
+static void warn_alloc_show_mem(gfp_t gfp_mask, const nodemask_t *nodemask)
 {
 	unsigned int filter = SHOW_MEM_FILTER_NODES;
 
@@ -3943,7 +3943,7 @@ static void warn_alloc_show_mem(gfp_t gfp_mask, nodemask_t *nodemask)
 	__show_mem(filter, nodemask, gfp_zone(gfp_mask));
 }
 
-void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
+void warn_alloc(gfp_t gfp_mask, const nodemask_t *nodemask, const char *fmt, ...)
 {
 	struct va_format vaf;
 	va_list args;
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 3a4b5207635d..24685b5c6dcf 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -116,7 +116,8 @@ void si_meminfo_node(struct sysinfo *val, int nid)
  * Determine whether the node should be displayed or not, depending on whether
  * SHOW_MEM_FILTER_NODES was passed to show_free_areas().
  */
-static bool show_mem_node_skip(unsigned int flags, int nid, nodemask_t *nodemask)
+static bool show_mem_node_skip(unsigned int flags, int nid,
+			       const nodemask_t *nodemask)
 {
 	if (!(flags & SHOW_MEM_FILTER_NODES))
 		return false;
@@ -177,7 +178,8 @@ static bool node_has_managed_zones(pg_data_t *pgdat, int max_zone_idx)
 * SHOW_MEM_FILTER_NODES: suppress nodes that are not allowed by current's
 * cpuset.
 */
-static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
+static void show_free_areas(unsigned int filter, const nodemask_t *nodemask,
+			    int max_zone_idx)
 {
 	unsigned long free_pcp = 0;
 	int cpu, nid;
@@ -399,7 +401,8 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
 	show_swap_cache_info();
 }
 
-void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
+void __show_mem(unsigned int filter, const nodemask_t *nodemask,
+		int max_zone_idx)
 {
 	unsigned long total = 0, reserved = 0, highmem = 0;
 	struct zone *zone;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b2fc8b626d3d..03e7f5206ad9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -80,7 +80,7 @@ struct scan_control {
 	 * Nodemask of nodes allowed by the caller. If NULL, all nodes
 	 * are scanned.
 	 */
-	nodemask_t *nodemask;
+	const nodemask_t *nodemask;
 
 	/*
 	 * The memory cgroup that hit its limit and as a result is the
@@ -6530,7 +6530,7 @@ static bool allow_direct_reclaim(pg_data_t *pgdat)
 * happens, the page allocator should not consider triggering the OOM killer.
 */
 static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
-					nodemask_t *nodemask)
+					const nodemask_t *nodemask)
 {
 	struct zoneref *z;
 	struct zone *zone;
@@ -6610,7 +6610,7 @@ static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
 }
 
 unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
-				gfp_t gfp_mask, nodemask_t *nodemask)
+				gfp_t gfp_mask, const nodemask_t *nodemask)
 {
 	unsigned long nr_reclaimed;
 	struct scan_control sc = {
-- 
2.51.1

From: Gregory Price <gourry@gourry.net>
To: linux-mm@kvack.org
Cc: kernel-team@meta.com, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
 nvdimm@lists.linux.dev, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org,
 dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com,
 alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com,
 dan.j.williams@intel.com, longman@redhat.com, akpm@linux-foundation.org,
 david@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
 vbabka@suse.cz, rppt@kernel.org, surenb@google.com, mhocko@suse.com,
 osalvador@suse.de, ziy@nvidia.com, matthew.brost@intel.com,
 joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
 ying.huang@linux.alibaba.com, apopple@nvidia.com, mingo@redhat.com,
 peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org,
 dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, tj@kernel.org,
 hannes@cmpxchg.org, mkoutny@suse.com, kees@kernel.org,
 muchun.song@linux.dev, roman.gushchin@linux.dev, shakeel.butt@linux.dev,
 rientjes@google.com, jackmanb@google.com, cl@gentwo.org,
 harry.yoo@oracle.com, axelrasmussen@google.com, yuanchu@google.com,
 weixugc@google.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev,
 nphamcs@gmail.com, chengming.zhou@linux.dev,
 fabio.m.de.francesco@linux.intel.com, rrichter@amd.com, ming.li@zohomail.com,
 usamaarif642@gmail.com, brauner@kernel.org, oleg@redhat.com,
 namcao@linutronix.de, escape@linux.alibaba.com, dongjoo.seo1@samsung.com
Subject: [RFC PATCH v2 02/11] mm: change callers of __cpuset_zone_allowed to cpuset_zone_allowed
Date: Wed, 12 Nov 2025 14:29:18 -0500
Message-ID: <20251112192936.2574429-3-gourry@gourry.net>
In-Reply-To: <20251112192936.2574429-1-gourry@gourry.net>
References: <20251112192936.2574429-1-gourry@gourry.net>

All current callers of __cpuset_zone_allowed() check cpusets_enabled()
first, which is already the first check performed by
cpuset_zone_allowed(). Switch these callers to cpuset_zone_allowed()
and drop the open-coded check.
Signed-off-by: Gregory Price <gourry@gourry.net>
---
 mm/compaction.c |  7 +++----
 mm/page_alloc.c | 19 ++++++++-----------
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 1e8f8eca318c..d2176935d3dd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2829,10 +2829,9 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 					ac->highest_zoneidx, ac->nodemask) {
 		enum compact_result status;
 
-		if (cpusets_enabled() &&
-		    (alloc_flags & ALLOC_CPUSET) &&
-		    !__cpuset_zone_allowed(zone, gfp_mask))
-			continue;
+		if ((alloc_flags & ALLOC_CPUSET) &&
+		    !cpuset_zone_allowed(zone, gfp_mask))
+			continue;
 
 		if (prio > MIN_COMPACT_PRIORITY
 					&& compaction_deferred(zone, order)) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fd5401fb5e00..bcaf1125d109 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3750,10 +3750,9 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 		struct page *page;
 		unsigned long mark;
 
-		if (cpusets_enabled() &&
-		    (alloc_flags & ALLOC_CPUSET) &&
-		    !__cpuset_zone_allowed(zone, gfp_mask))
-			continue;
+		if ((alloc_flags & ALLOC_CPUSET) &&
+		    !cpuset_zone_allowed(zone, gfp_mask))
+			continue;
 		/*
 		 * When allocating a page cache page for writing, we
 		 * want to get it from a node that is within its dirty
@@ -4553,10 +4552,9 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 		unsigned long min_wmark = min_wmark_pages(zone);
 		bool wmark;
 
-		if (cpusets_enabled() &&
-		    (alloc_flags & ALLOC_CPUSET) &&
-		    !__cpuset_zone_allowed(zone, gfp_mask))
-			continue;
+		if ((alloc_flags & ALLOC_CPUSET) &&
+		    !cpuset_zone_allowed(zone, gfp_mask))
+			continue;
 
 		available = reclaimable = zone_reclaimable_pages(zone);
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
@@ -5052,10 +5050,9 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
 	for_next_zone_zonelist_nodemask(zone, z, ac.highest_zoneidx, ac.nodemask) {
 		unsigned long mark;
 
-		if (cpusets_enabled() && (alloc_flags & ALLOC_CPUSET) &&
-		    !__cpuset_zone_allowed(zone, gfp)) {
+		if ((alloc_flags & ALLOC_CPUSET) &&
+		    !cpuset_zone_allowed(zone, gfp))
 			continue;
-		}
 
 		if (nr_online_nodes > 1 && zone != zonelist_zone(ac.preferred_zoneref) &&
 		    zone_to_nid(zone) != zonelist_node_idx(ac.preferred_zoneref)) {
-- 
2.51.1

From: Gregory Price <gourry@gourry.net>
To: linux-mm@kvack.org
Cc: kernel-team@meta.com, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
 nvdimm@lists.linux.dev, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org,
 dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com,
 alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com,
 dan.j.williams@intel.com, longman@redhat.com, akpm@linux-foundation.org,
 david@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
 vbabka@suse.cz, rppt@kernel.org, surenb@google.com, mhocko@suse.com,
 osalvador@suse.de, ziy@nvidia.com, matthew.brost@intel.com,
 joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
 ying.huang@linux.alibaba.com, apopple@nvidia.com, mingo@redhat.com,
 peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org,
 dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
 mgorman@suse.de, vschneid@redhat.com, tj@kernel.org, hannes@cmpxchg.org,
 mkoutny@suse.com, kees@kernel.org, muchun.song@linux.dev,
 roman.gushchin@linux.dev,
shakeel.butt@linux.dev, rientjes@google.com, jackmanb@google.com, cl@gentwo.org, harry.yoo@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev, fabio.m.de.francesco@linux.intel.com, rrichter@amd.com, ming.li@zohomail.com, usamaarif642@gmail.com, brauner@kernel.org, oleg@redhat.com, namcao@linutronix.de, escape@linux.alibaba.com, dongjoo.seo1@samsung.com Subject: [RFC PATCH v2 03/11] gfp: Add GFP_SPM_NODE for Specific Purpose Memory (SPM) allocations Date: Wed, 12 Nov 2025 14:29:19 -0500 Message-ID: <20251112192936.2574429-4-gourry@gourry.net> X-Mailer: git-send-email 2.51.1 In-Reply-To: <20251112192936.2574429-1-gourry@gourry.net> References: <20251112192936.2574429-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" GFP_SPM_NODE changes the nodemask checks in the page allocator to include the full set memory nodes, rather than just SysRAM nodes. Signed-off-by: Gregory Price --- include/linux/gfp_types.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h index 65db9349f905..525ae891420e 100644 --- a/include/linux/gfp_types.h +++ b/include/linux/gfp_types.h @@ -58,6 +58,7 @@ enum { #ifdef CONFIG_SLAB_OBJ_EXT ___GFP_NO_OBJ_EXT_BIT, #endif + ___GFP_SPM_NODE_BIT, ___GFP_LAST_BIT }; =20 @@ -103,6 +104,7 @@ enum { #else #define ___GFP_NO_OBJ_EXT 0 #endif +#define ___GFP_SPM_NODE BIT(___GFP_SPM_NODE_BIT) =20 /* * Physical address zone modifiers (see linux/mmzone.h - low four bits) @@ -145,6 +147,8 @@ enum { * %__GFP_ACCOUNT causes the allocation to be accounted to kmemcg. * * %__GFP_NO_OBJ_EXT causes slab allocation to have no object extension. 
+ *
+ * %__GFP_SPM_NODE allows the use of Specific Purpose Memory Nodes
  */
 #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE)
 #define __GFP_WRITE	((__force gfp_t)___GFP_WRITE)
@@ -152,6 +156,7 @@ enum {
 #define __GFP_THISNODE	((__force gfp_t)___GFP_THISNODE)
 #define __GFP_ACCOUNT	((__force gfp_t)___GFP_ACCOUNT)
 #define __GFP_NO_OBJ_EXT	((__force gfp_t)___GFP_NO_OBJ_EXT)
+#define __GFP_SPM_NODE	((__force gfp_t)___GFP_SPM_NODE)
 
 /**
  * DOC: Watermark modifiers
-- 
2.51.1

From nobody Sun Feb 8 22:43:21 2026
From: Gregory Price <gourry@gourry.net>
To: linux-mm@kvack.org
Subject: [RFC PATCH v2 04/11] memory-tiers: Introduce SysRAM and Specific Purpose Memory Nodes
Date: Wed, 12 Nov 2025 14:29:20 -0500
Message-ID: <20251112192936.2574429-5-gourry@gourry.net>
In-Reply-To: <20251112192936.2574429-1-gourry@gourry.net>
References: <20251112192936.2574429-1-gourry@gourry.net>

Create memory node "types" (SysRAM and Specific Purpose) which can be
set at memory hotplug time. SysRAM nodes present at __init time are
added to mt_sysram_nodelist, and memory hotplug decides whether a
hotplugged node is placed in mt_sysram_nodelist or mt_spm_nodelist.
SPM nodes are not included in demotion targets.

Setting a node type is permanent and cannot be changed once set; this
prevents type-change races on the global mt_sysram_nodelist.
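The one-way type assignment above can be sketched in plain C. This is an
illustrative userspace model, not kernel code: the nodemask is reduced to
a 64-bit mask, there is no locking, and the names simply mirror the
semantics this patch introduces:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum { MT_NODE_TYPE_SYSRAM, MT_NODE_TYPE_SPM };

/* Model the two global nodelists as simple bitmasks. */
static uint64_t sysram_nodes, spm_nodes;

/* One-way assignment: a node may join exactly one list, permanently. */
static int mt_set_node_type_model(int node, int type)
{
	uint64_t bit = 1ULL << node;

	if (type == MT_NODE_TYPE_SYSRAM) {
		if (spm_nodes & bit)	/* already claimed by the other type */
			return -EBUSY;
		sysram_nodes |= bit;	/* idempotent if already set */
		return 0;
	}
	if (type == MT_NODE_TYPE_SPM) {
		if (sysram_nodes & bit)
			return -EBUSY;
		spm_nodes |= bit;
		return 0;
	}
	return -ENODEV;			/* unknown type */
}
```

Re-setting a node to the type it already has succeeds, a conflicting type
returns -EBUSY, and an unknown type returns -ENODEV.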
Signed-off-by: Gregory Price <gourry@gourry.net>
---
 include/linux/memory-tiers.h | 47 +++++++++++++++++++++++++
 mm/memory-tiers.c            | 66 ++++++++++++++++++++++++++++++++++--
 2 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 7a805796fcfd..59443cbfaec3 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -35,10 +35,44 @@ struct memory_dev_type {
 
 struct access_coordinate;
 
+enum {
+	MT_NODE_TYPE_SYSRAM,
+	MT_NODE_TYPE_SPM
+};
+
 #ifdef CONFIG_NUMA
 extern bool numa_demotion_enabled;
 extern struct memory_dev_type *default_dram_type;
 extern nodemask_t default_dram_nodes;
+extern nodemask_t mt_sysram_nodelist;
+extern nodemask_t mt_spm_nodelist;
+static inline nodemask_t *mt_sysram_nodemask(void)
+{
+	if (nodes_empty(mt_sysram_nodelist))
+		return NULL;
+	return &mt_sysram_nodelist;
+}
+static inline void mt_nodemask_sysram_mask(nodemask_t *dst, nodemask_t *mask)
+{
+	/* If the sysram filter isn't available, this allows all */
+	if (nodes_empty(mt_sysram_nodelist)) {
+		nodes_or(*dst, *mask, NODE_MASK_NONE);
+		return;
+	}
+	nodes_and(*dst, *mask, mt_sysram_nodelist);
+}
+static inline bool mt_node_is_sysram(int nid)
+{
+	/* if sysram filter isn't setup, this allows all */
+	return nodes_empty(mt_sysram_nodelist) ||
+	       node_isset(nid, mt_sysram_nodelist);
+}
+static inline bool mt_node_allowed(int nid, gfp_t gfp_mask)
+{
+	if (gfp_mask & __GFP_SPM_NODE)
+		return true;
+	return mt_node_is_sysram(nid);
+}
 struct memory_dev_type *alloc_memory_type(int adistance);
 void put_memory_type(struct memory_dev_type *memtype);
 void init_node_memory_type(int node, struct memory_dev_type *default_type);
@@ -73,11 +107,19 @@ static inline bool node_is_toptier(int node)
 }
 #endif
 
+int mt_set_node_type(int node, int type);
+
 #else
 
 #define numa_demotion_enabled	false
 #define default_dram_type	NULL
 #define default_dram_nodes	NODE_MASK_NONE
+#define mt_sysram_nodelist	NODE_MASK_NONE
+#define mt_spm_nodelist	NODE_MASK_NONE
+static inline nodemask_t *mt_sysram_nodemask(void) { return NULL; }
+static inline void mt_nodemask_sysram_mask(nodemask_t *dst, nodemask_t *mask) {}
+static inline bool mt_node_is_sysram(int nid) { return true; }
+static inline bool mt_node_allowed(int nid, gfp_t gfp_mask) { return true; }
 /*
  * CONFIG_NUMA implementation returns non NULL error.
  */
@@ -151,5 +193,10 @@ static inline struct memory_dev_type *mt_find_alloc_memory_type(int adist,
 static inline void mt_put_memory_types(struct list_head *memory_types)
 {
 }
+
+static inline int mt_set_node_type(int node, int type)
+{
+	return 0;
+}
 #endif	/* CONFIG_NUMA */
 #endif	/* _LINUX_MEMORY_TIERS_H */
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 0ea5c13f10a2..dd6cfaa4c667 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -44,7 +44,15 @@ static LIST_HEAD(memory_tiers);
 static LIST_HEAD(default_memory_types);
 static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
 struct memory_dev_type *default_dram_type;
-nodemask_t default_dram_nodes __initdata = NODE_MASK_NONE;
+
+/* default_dram_nodes is the list of nodes with both CPUs and RAM */
+nodemask_t default_dram_nodes = NODE_MASK_NONE;
+
+/* mt_sysram_nodelist is the list of nodes with SysRAM */
+nodemask_t mt_sysram_nodelist = NODE_MASK_NONE;
+
+/* mt_spm_nodelist is the list of nodes with Specific Purpose Memory */
+nodemask_t mt_spm_nodelist = NODE_MASK_NONE;
 
 static const struct bus_type memory_tier_subsys = {
	.name = "memory_tiering",
@@ -427,6 +435,14 @@ static void establish_demotion_targets(void)
	disable_all_demotion_targets();
 
	for_each_node_state(node, N_MEMORY) {
+		/*
+		 * If this is not a sysram node, direct-demotion is not allowed
+		 * and must be managed by special logic that understands the
+		 * memory features of that particular node.
+		 */
+		if (!node_isset(node, mt_sysram_nodelist))
+			continue;
+
		best_distance = -1;
		nd = &node_demotion[node];
 
@@ -457,7 +473,8 @@ static void establish_demotion_targets(void)
				break;
 
			distance = node_distance(node, target);
-			if (distance == best_distance || best_distance == -1) {
+			if ((distance == best_distance || best_distance == -1) &&
+			    node_isset(target, mt_sysram_nodelist)) {
				best_distance = distance;
				node_set(target, nd->preferred);
			} else {
@@ -689,6 +706,48 @@ void mt_put_memory_types(struct list_head *memory_types)
 }
 EXPORT_SYMBOL_GPL(mt_put_memory_types);
 
+/**
+ * mt_set_node_type() - Set a NUMA node's memory type.
+ * @node: The node whose type to set
+ * @type: The type to set
+ *
+ * This is a one-way setting; once a type is assigned it cannot be cleared
+ * without resetting the system. This avoids race conditions associated
+ * with moving nodes from one type to another during memory hotplug.
+ *
+ * Once a node is added as a SysRAM node, it will be used by default in
+ * the page allocator as a valid target when the caller does not provide
+ * a node or nodemask. This is safe as the page allocator iterates through
+ * zones and uses this nodemask to filter zones - if a node is present but
+ * has no zones the node is ignored.
+ *
+ * Return: 0 if the node type is set successfully (or it's already set)
+ *         -EBUSY if the node has a different type already
+ *         -ENODEV if the type is invalid
+ */
+int mt_set_node_type(int node, int type)
+{
+	int err = 0;
+
+	mutex_lock(&memory_tier_lock);
+	if (type == MT_NODE_TYPE_SYSRAM)
+		err = node_isset(node, mt_spm_nodelist) ? -EBUSY : 0;
+	else if (type == MT_NODE_TYPE_SPM)
+		err = node_isset(node, mt_sysram_nodelist) ? -EBUSY : 0;
+	if (err)
+		goto out;
+
+	if (type == MT_NODE_TYPE_SYSRAM)
+		node_set(node, mt_sysram_nodelist);
+	else if (type == MT_NODE_TYPE_SPM)
+		node_set(node, mt_spm_nodelist);
+	else
+		err = -ENODEV;
+out:
+	mutex_unlock(&memory_tier_lock);
+	return err;
+}
+
 /*
  * This is invoked via `late_initcall()` to initialize memory tiers for
  * memory nodes, both with and without CPUs. After the initialization of
@@ -922,6 +981,9 @@ static int __init memory_tier_init(void)
	nodes_and(default_dram_nodes, node_states[N_MEMORY],
		  node_states[N_CPU]);
 
+	/* Record all nodes with non-hotplugged memory as default SYSRAM nodes */
+	mt_sysram_nodelist = node_states[N_MEMORY];
+
	hotplug_node_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
	return 0;
 }
-- 
2.51.1

From nobody Sun Feb 8 22:43:21 2026
From: Gregory Price <gourry@gourry.net>
To: linux-mm@kvack.org
Subject: [RFC PATCH v2 05/11] mm: restrict slub, oom, compaction, and page_alloc to sysram by default
Date: Wed, 12 Nov 2025 14:29:21 -0500
Message-ID: <20251112192936.2574429-6-gourry@gourry.net>
In-Reply-To: <20251112192936.2574429-1-gourry@gourry.net>
References: <20251112192936.2574429-1-gourry@gourry.net>

Restrict page allocation and zone iteration behavior in mm to skip SPM
nodes via cpusets, or via mt_sysram_nodelist when cpusets is disabled.

This constrains core users of nodemasks to the mt_sysram_nodelist, which
is guaranteed to at least contain the set of nodes with sysram memory
blocks present at boot (or NULL if NUMA is compiled out).

If the sysram nodelist is empty (i.e. memory-tiers setup is broken),
return NULL, which still allows all zones to be iterated.

Signed-off-by: Gregory Price <gourry@gourry.net>
---
 mm/compaction.c |  3 +++
 mm/oom_kill.c   |  5 ++++-
 mm/page_alloc.c | 18 ++++++++++++++----
 mm/slub.c       | 15 ++++++++++++---
 4 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index d2176935d3dd..7b73179d1fbf 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include <linux/memory-tiers.h>
 #include
 #include
 #include
@@ -2832,6 +2833,8 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
		if ((alloc_flags & ALLOC_CPUSET) &&
		    !cpuset_zone_allowed(zone, gfp_mask))
			continue;
+		else if (!mt_node_allowed(zone_to_nid(zone), gfp_mask))
+			continue;
 
		if (prio > MIN_COMPACT_PRIORITY
					&& compaction_deferred(zone, order)) {
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index c145b0feecc1..386b4ceeaeb8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include <linux/memory-tiers.h>
 #include
 #include
 #include
@@ -1118,6 +1119,8 @@ EXPORT_SYMBOL_GPL(unregister_oom_notifier);
 bool out_of_memory(struct oom_control *oc)
 {
	unsigned long freed = 0;
+	if (!oc->nodemask)
+		oc->nodemask = mt_sysram_nodemask();
 
	if (oom_killer_disabled)
		return false;
@@ -1154,7 +1157,7 @@ bool out_of_memory(struct oom_control *oc)
	 */
	oc->constraint = constrained_alloc(oc);
	if (oc->constraint != CONSTRAINT_MEMORY_POLICY)
-		oc->nodemask = NULL;
+		oc->nodemask = mt_sysram_nodemask();
	check_panic_on_oom(oc);
 
	if (!is_memcg_oom(oc) && sysctl_oom_kill_allocating_task &&
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bcaf1125d109..2ea6a50f6079 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include <linux/memory-tiers.h>
 #include
 #include
 #include
@@ -3753,6 +3754,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
		if ((alloc_flags & ALLOC_CPUSET) &&
		    !cpuset_zone_allowed(zone, gfp_mask))
			continue;
+		else if (!mt_node_allowed(zone_to_nid(zone), gfp_mask))
+			continue;
		/*
		 * When allocating a page cache page for writing, we
		 * want to get it from a node that is within its dirty
@@ -4555,6 +4558,8 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
		if ((alloc_flags & ALLOC_CPUSET) &&
		    !cpuset_zone_allowed(zone, gfp_mask))
			continue;
+		else if (!mt_node_allowed(zone_to_nid(zone), gfp_mask))
+			continue;
 
		available = reclaimable = zone_reclaimable_pages(zone);
		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
@@ -4608,7 +4613,7 @@ check_retry_cpuset(int cpuset_mems_cookie, struct alloc_context *ac)
	 */
	if (cpusets_enabled() && ac->nodemask &&
			!cpuset_nodemask_valid_mems_allowed(ac->nodemask)) {
-		ac->nodemask = NULL;
+		ac->nodemask = mt_sysram_nodemask();
		return true;
	}
 
@@ -4792,7 +4797,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
	 * user oriented.
	 */
	if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
-		ac->nodemask = NULL;
+		ac->nodemask = mt_sysram_nodemask();
		ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
				ac->highest_zoneidx, ac->nodemask);
	}
@@ -4944,7 +4949,8 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
			ac->nodemask = &cpuset_current_mems_allowed;
		else
			*alloc_flags |= ALLOC_CPUSET;
-	}
+	} else if (!ac->nodemask) /* sysram_nodes may be NULL during __init */
+		ac->nodemask = mt_sysram_nodemask();
 
	might_alloc(gfp_mask);
 
@@ -5053,6 +5059,8 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
		if ((alloc_flags & ALLOC_CPUSET) &&
		    !cpuset_zone_allowed(zone, gfp))
			continue;
+		else if (!mt_node_allowed(zone_to_nid(zone), gfp))
+			continue;
 
		if (nr_online_nodes > 1 && zone != zonelist_zone(ac.preferred_zoneref) &&
		    zone_to_nid(zone) != zonelist_node_idx(ac.preferred_zoneref)) {
@@ -5187,8 +5195,10 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
	/*
	 * Restore the original nodemask if it was potentially replaced with
	 * &cpuset_current_mems_allowed to optimize the fast-path attempt.
+	 *
+	 * If not set, default to sysram nodes.
	 */
-	ac.nodemask = nodemask;
+	ac.nodemask = nodemask ? nodemask : mt_sysram_nodemask();
 
	page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
 
diff --git a/mm/slub.c b/mm/slub.c
index 1bf65c421325..c857db97c6a0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include <linux/memory-tiers.h>
 #include
 #include
 #include
@@ -3576,11 +3577,19 @@ static struct slab *get_any_partial(struct kmem_cache *s,
	zonelist = node_zonelist(mempolicy_slab_node(), pc->flags);
	for_each_zone_zonelist(zone, z, zonelist, highest_zoneidx) {
		struct kmem_cache_node *n;
+		int nid = zone_to_nid(zone);
+		bool allowed;
 
-		n = get_node(s, zone_to_nid(zone));
+		n = get_node(s, nid);
+		if (!n)
+			continue;
+
+		if (cpusets_enabled())
+			allowed = __cpuset_zone_allowed(zone, pc->flags);
+		else
+			allowed = mt_node_allowed(nid, pc->flags);
 
-		if (n && cpuset_zone_allowed(zone, pc->flags) &&
-		    n->nr_partial > s->min_partial) {
+		if (allowed && (n->nr_partial > s->min_partial)) {
			slab = get_partial_node(s, n, pc);
			if (slab) {
				/*
-- 
2.51.1

From nobody Sun Feb 8 22:43:21 2026
[96.255.20.138]) by smtp.gmail.com with ESMTPSA id af79cd13be357-8b29aa0082esm243922885a.50.2025.11.12.11.30.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 12 Nov 2025 11:30:05 -0800 (PST) From: Gregory Price To: linux-mm@kvack.org Cc: kernel-team@meta.com, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, longman@redhat.com, akpm@linux-foundation.org, david@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org, surenb@google.com, mhocko@suse.com, osalvador@suse.de, ziy@nvidia.com, matthew.brost@intel.com, joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net, ying.huang@linux.alibaba.com, apopple@nvidia.com, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com, kees@kernel.org, muchun.song@linux.dev, roman.gushchin@linux.dev, shakeel.butt@linux.dev, rientjes@google.com, jackmanb@google.com, cl@gentwo.org, harry.yoo@oracle.com, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev, fabio.m.de.francesco@linux.intel.com, rrichter@amd.com, ming.li@zohomail.com, usamaarif642@gmail.com, brauner@kernel.org, oleg@redhat.com, namcao@linutronix.de, escape@linux.alibaba.com, dongjoo.seo1@samsung.com Subject: [RFC PATCH v2 06/11] mm,cpusets: rename task->mems_allowed to task->sysram_nodes Date: Wed, 12 Nov 2025 14:29:22 -0500 Message-ID: <20251112192936.2574429-7-gourry@gourry.net> X-Mailer: git-send-email 2.51.1 In-Reply-To: 
<20251112192936.2574429-1-gourry@gourry.net> References: <20251112192936.2574429-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" task->mems_allowed actually contains the value of cpuset.mems.effective. The value of cpuset.mems.effective is the intersection of the cpuset's mems_allowed and its parent's mems.effective. This creates a confusing naming scheme among task->mems_allowed and the cpuset's mems_allowed and effective_mems. With the intent of making this nodemask contain only SystemRAM nodes (i.e. omitting Specific Purpose Memory (SPM) nodes), rename task->mems_allowed to task->sysram_nodes. This accomplishes two things: 1) It detaches the task->mems_allowed naming from the cpuset.mems_allowed naming, making it clearer that the two may contain different values. 2) It enables cpuset.mems_allowed to contain SPM nodes, letting a cgroup still control whether SPM nodes are "allowed" in that context, even if those nodes are not reachable through existing means. 
Signed-off-by: Gregory Price --- fs/proc/array.c | 2 +- include/linux/cpuset.h | 54 ++++++++++++++------------- include/linux/mempolicy.h | 2 +- include/linux/sched.h | 6 +-- init/init_task.c | 2 +- kernel/cgroup/cpuset.c | 78 +++++++++++++++++++-------------------- kernel/fork.c | 2 +- kernel/sched/fair.c | 4 +- mm/hugetlb.c | 8 ++-- mm/mempolicy.c | 28 +++++++------- mm/oom_kill.c | 6 +-- mm/page_alloc.c | 16 ++++---- mm/show_mem.c | 2 +- 13 files changed, 106 insertions(+), 104 deletions(-) diff --git a/fs/proc/array.c b/fs/proc/array.c index 2ae63189091e..61ee857a5caf 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -456,7 +456,7 @@ int proc_pid_status(struct seq_file *m, struct pid_name= space *ns, task_cap(m, task); task_seccomp(m, task); task_cpus_allowed(m, task); - cpuset_task_status_allowed(m, task); + cpuset_task_status_sysram(m, task); task_context_switch_counts(m, task); arch_proc_pid_thread_features(m, task); return 0; diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h index 548eaf7ef8d0..9baaf19431b5 100644 --- a/include/linux/cpuset.h +++ b/include/linux/cpuset.h @@ -23,14 +23,14 @@ /* * Static branch rewrites can happen in an arbitrary order for a given * key. In code paths where we need to loop with read_mems_allowed_begin()= and - * read_mems_allowed_retry() to get a consistent view of mems_allowed, we = need - * to ensure that begin() always gets rewritten before retry() in the + * read_mems_allowed_retry() to get a consistent view of task->sysram_node= s, we + * need to ensure that begin() always gets rewritten before retry() in the * disabled -> enabled transition. If not, then if local irqs are disabled * around the loop, we can deadlock since retry() would always be - * comparing the latest value of the mems_allowed seqcount against 0 as + * comparing the latest value of the sysram_nodes seqcount against 0 as * begin() still would see cpusets_enabled() as false. 
The enabled -> disa= bled * transition should happen in reverse order for the same reasons (want to= stop - * looking at real value of mems_allowed.sequence in retry() first). + * looking at real value of sysram_nodes.sequence in retry() first). */ extern struct static_key_false cpusets_pre_enable_key; extern struct static_key_false cpusets_enabled_key; @@ -78,9 +78,10 @@ extern void cpuset_cpus_allowed(struct task_struct *p, s= truct cpumask *mask); extern bool cpuset_cpus_allowed_fallback(struct task_struct *p); extern bool cpuset_cpu_is_isolated(int cpu); extern nodemask_t cpuset_mems_allowed(struct task_struct *p); -#define cpuset_current_mems_allowed (current->mems_allowed) -void cpuset_init_current_mems_allowed(void); -int cpuset_nodemask_valid_mems_allowed(const nodemask_t *nodemask); +#define cpuset_current_sysram_nodes (current->sysram_nodes) +#define cpuset_current_mems_allowed (cpuset_current_sysram_nodes) +void cpuset_init_current_sysram_nodes(void); +int cpuset_nodemask_valid_sysram_nodes(const nodemask_t *nodemask); =20 extern bool cpuset_current_node_allowed(int node, gfp_t gfp_mask); =20 @@ -96,7 +97,7 @@ static inline bool cpuset_zone_allowed(struct zone *z, gf= p_t gfp_mask) return true; } =20 -extern int cpuset_mems_allowed_intersects(const struct task_struct *tsk1, +extern int cpuset_sysram_nodes_intersects(const struct task_struct *tsk1, const struct task_struct *tsk2); =20 #ifdef CONFIG_CPUSETS_V1 @@ -111,8 +112,8 @@ extern void __cpuset_memory_pressure_bump(void); static inline void cpuset_memory_pressure_bump(void) { } #endif =20 -extern void cpuset_task_status_allowed(struct seq_file *m, - struct task_struct *task); +extern void cpuset_task_status_sysram(struct seq_file *m, + struct task_struct *task); extern int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *tsk); =20 @@ -128,12 +129,12 @@ extern bool current_cpuset_is_being_rebound(void); extern void dl_rebuild_rd_accounting(void); 
extern void rebuild_sched_domains(void); =20 -extern void cpuset_print_current_mems_allowed(void); +extern void cpuset_print_current_sysram_nodes(void); extern void cpuset_reset_sched_domains(void); =20 /* - * read_mems_allowed_begin is required when making decisions involving - * mems_allowed such as during page allocation. mems_allowed can be update= d in + * read_mems_allowed_begin is required when making decisions involving a t= ask's + * sysram_nodes such as during page allocation. sysram_nodes can be update= d in * parallel and depending on the new value an operation can fail potential= ly * causing process failure. A retry loop with read_mems_allowed_begin and * read_mems_allowed_retry prevents these artificial failures. @@ -143,13 +144,13 @@ static inline unsigned int read_mems_allowed_begin(vo= id) if (!static_branch_unlikely(&cpusets_pre_enable_key)) return 0; =20 - return read_seqcount_begin(¤t->mems_allowed_seq); + return read_seqcount_begin(¤t->sysram_nodes_seq); } =20 /* * If this returns true, the operation that took place after * read_mems_allowed_begin may have failed artificially due to a concurrent - * update of mems_allowed. It is up to the caller to retry the operation if + * update of sysram_nodes. It is up to the caller to retry the operation if * appropriate. 
*/ static inline bool read_mems_allowed_retry(unsigned int seq) @@ -157,7 +158,7 @@ static inline bool read_mems_allowed_retry(unsigned int= seq) if (!static_branch_unlikely(&cpusets_enabled_key)) return false; =20 - return read_seqcount_retry(¤t->mems_allowed_seq, seq); + return read_seqcount_retry(¤t->sysram_nodes_seq, seq); } =20 static inline void set_mems_allowed(nodemask_t nodemask) @@ -166,9 +167,9 @@ static inline void set_mems_allowed(nodemask_t nodemask) =20 task_lock(current); local_irq_save(flags); - write_seqcount_begin(¤t->mems_allowed_seq); - current->mems_allowed =3D nodemask; - write_seqcount_end(¤t->mems_allowed_seq); + write_seqcount_begin(¤t->sysram_nodes_seq); + current->sysram_nodes =3D nodemask; + write_seqcount_end(¤t->sysram_nodes_seq); local_irq_restore(flags); task_unlock(current); } @@ -216,10 +217,11 @@ static inline nodemask_t cpuset_mems_allowed(struct t= ask_struct *p) return node_possible_map; } =20 -#define cpuset_current_mems_allowed (node_states[N_MEMORY]) -static inline void cpuset_init_current_mems_allowed(void) {} +#define cpuset_current_sysram_nodes (node_states[N_MEMORY]) +#define cpuset_current_mems_allowed (cpuset_current_sysram_nodes) +static inline void cpuset_init_current_sysram_nodes(void) {} =20 -static inline int cpuset_nodemask_valid_mems_allowed(const nodemask_t *nod= emask) +static inline int cpuset_nodemask_valid_sysram_nodes(const nodemask_t *nod= emask) { return 1; } @@ -234,7 +236,7 @@ static inline bool cpuset_zone_allowed(struct zone *z, = gfp_t gfp_mask) return true; } =20 -static inline int cpuset_mems_allowed_intersects(const struct task_struct = *tsk1, +static inline int cpuset_sysram_nodes_intersects(const struct task_struct = *tsk1, const struct task_struct *tsk2) { return 1; @@ -242,8 +244,8 @@ static inline int cpuset_mems_allowed_intersects(const = struct task_struct *tsk1, =20 static inline void cpuset_memory_pressure_bump(void) {} =20 -static inline void cpuset_task_status_allowed(struct seq_file 
*m, - struct task_struct *task) +static inline void cpuset_task_status_sysram(struct seq_file *m, + struct task_struct *task) { } =20 @@ -276,7 +278,7 @@ static inline void cpuset_reset_sched_domains(void) partition_sched_domains(1, NULL, NULL); } =20 -static inline void cpuset_print_current_mems_allowed(void) +static inline void cpuset_print_current_sysram_nodes(void) { } =20 diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 0fe96f3ab3ef..f9a2b1bed3fa 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -52,7 +52,7 @@ struct mempolicy { int home_node; /* Home node to use for MPOL_BIND and MPOL_PREFERRED_MANY= */ =20 union { - nodemask_t cpuset_mems_allowed; /* relative to these nodes */ + nodemask_t cpuset_sysram_nodes; /* relative to these nodes */ nodemask_t user_nodemask; /* nodemask passed by user */ } w; }; diff --git a/include/linux/sched.h b/include/linux/sched.h index b469878de25c..ad2d0cb00772 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1223,7 +1223,7 @@ struct task_struct { u64 parent_exec_id; u64 self_exec_id; =20 - /* Protection against (de-)allocation: mm, files, fs, tty, keyrings, mems= _allowed, mempolicy: */ + /* Protection against (de-)allocation: mm, files, fs, tty, keyrings, sysr= am_nodes, mempolicy: */ spinlock_t alloc_lock; =20 /* Protection of the PI data structures: */ @@ -1314,9 +1314,9 @@ struct task_struct { #endif #ifdef CONFIG_CPUSETS /* Protected by ->alloc_lock: */ - nodemask_t mems_allowed; + nodemask_t sysram_nodes; /* Sequence number to catch updates: */ - seqcount_spinlock_t mems_allowed_seq; + seqcount_spinlock_t sysram_nodes_seq; int cpuset_mem_spread_rotor; #endif #ifdef CONFIG_CGROUPS diff --git a/init/init_task.c b/init/init_task.c index a55e2189206f..857a5978d403 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -173,7 +173,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = =3D { .trc_blkd_node =3D LIST_HEAD_INIT(init_task.trc_blkd_node), 
#endif #ifdef CONFIG_CPUSETS - .mems_allowed_seq =3D SEQCNT_SPINLOCK_ZERO(init_task.mems_allowed_seq, + .sysram_nodes_seq =3D SEQCNT_SPINLOCK_ZERO(init_task.sysram_nodes_seq, &init_task.alloc_lock), #endif #ifdef CONFIG_RT_MUTEXES diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index cd3e2ae83d70..f0c59621a7f2 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -240,7 +240,7 @@ static struct cpuset top_cpuset =3D { * If a task is only holding callback_lock, then it has read-only * access to cpusets. * - * Now, the task_struct fields mems_allowed and mempolicy may be changed + * Now, the task_struct fields sysram_nodes and mempolicy may be changed * by other task, we use alloc_lock in the task_struct fields to protect * them. * @@ -2678,11 +2678,11 @@ static void schedule_flush_migrate_mm(void) } =20 /* - * cpuset_change_task_nodemask - change task's mems_allowed and mempolicy + * cpuset_change_task_nodemask - change task's sysram_nodes and mempolicy * @tsk: the task to change * @newmems: new nodes that the task will be set * - * We use the mems_allowed_seq seqlock to safely update both tsk->mems_all= owed + * We use the sysram_nodes_seq seqlock to safely update both tsk->sysram_n= odes * and rebind an eventual tasks' mempolicy. If the task is allocating in * parallel, it might temporarily see an empty intersection, which results= in * a seqlock check and retry before OOM or allocation failure. 
@@ -2693,13 +2693,13 @@ static void cpuset_change_task_nodemask(struct task= _struct *tsk, task_lock(tsk); =20 local_irq_disable(); - write_seqcount_begin(&tsk->mems_allowed_seq); + write_seqcount_begin(&tsk->sysram_nodes_seq); =20 - nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems); + nodes_or(tsk->sysram_nodes, tsk->sysram_nodes, *newmems); mpol_rebind_task(tsk, newmems); - tsk->mems_allowed =3D *newmems; + tsk->sysram_nodes =3D *newmems; =20 - write_seqcount_end(&tsk->mems_allowed_seq); + write_seqcount_end(&tsk->sysram_nodes_seq); local_irq_enable(); =20 task_unlock(tsk); @@ -2709,9 +2709,9 @@ static void *cpuset_being_rebound; =20 /** * cpuset_update_tasks_nodemask - Update the nodemasks of tasks in the cpu= set. - * @cs: the cpuset in which each task's mems_allowed mask needs to be chan= ged + * @cs: the cpuset in which each task's sysram_nodes mask needs to be chan= ged * - * Iterate through each task of @cs updating its mems_allowed to the + * Iterate through each task of @cs updating its sysram_nodes to the * effective cpuset's. As this function is called with cpuset_mutex held, * cpuset membership stays stable. */ @@ -3763,7 +3763,7 @@ static void cpuset_fork(struct task_struct *task) return; =20 set_cpus_allowed_ptr(task, current->cpus_ptr); - task->mems_allowed =3D current->mems_allowed; + task->sysram_nodes =3D current->sysram_nodes; return; } =20 @@ -4205,9 +4205,9 @@ bool cpuset_cpus_allowed_fallback(struct task_struct = *tsk) return changed; } =20 -void __init cpuset_init_current_mems_allowed(void) +void __init cpuset_init_current_sysram_nodes(void) { - nodes_setall(current->mems_allowed); + nodes_setall(current->sysram_nodes); } =20 /** @@ -4233,14 +4233,14 @@ nodemask_t cpuset_mems_allowed(struct task_struct *= tsk) } =20 /** - * cpuset_nodemask_valid_mems_allowed - check nodemask vs. current mems_al= lowed + * cpuset_nodemask_valid_sysram_nodes - check nodemask vs. 
current sysram_= nodes * @nodemask: the nodemask to be checked * - * Are any of the nodes in the nodemask allowed in current->mems_allowed? + * Are any of the nodes in the nodemask allowed in current->sysram_nodes? */ -int cpuset_nodemask_valid_mems_allowed(const nodemask_t *nodemask) +int cpuset_nodemask_valid_sysram_nodes(const nodemask_t *nodemask) { - return nodes_intersects(*nodemask, current->mems_allowed); + return nodes_intersects(*nodemask, current->sysram_nodes); } =20 /* @@ -4262,7 +4262,7 @@ static struct cpuset *nearest_hardwall_ancestor(struc= t cpuset *cs) * @gfp_mask: memory allocation flags * * If we're in interrupt, yes, we can always allocate. If @node is set in - * current's mems_allowed, yes. If it's not a __GFP_HARDWALL request and = this + * current's sysram_nodes, yes. If it's not a __GFP_HARDWALL request and = this * node is set in the nearest hardwalled cpuset ancestor to current's cpus= et, * yes. If current has access to memory reserves as an oom victim, yes. * Otherwise, no. @@ -4276,7 +4276,7 @@ static struct cpuset *nearest_hardwall_ancestor(struc= t cpuset *cs) * Scanning up parent cpusets requires callback_lock. The * __alloc_pages() routine only calls here with __GFP_HARDWALL bit * _not_ set if it's a GFP_KERNEL allocation, and all nodes in the - * current tasks mems_allowed came up empty on the first pass over + * current tasks sysram_nodes came up empty on the first pass over * the zonelist. So only GFP_KERNEL allocations, if all nodes in the * cpuset are short of memory, might require taking the callback_lock. 
* @@ -4304,7 +4304,7 @@ bool cpuset_current_node_allowed(int node, gfp_t gfp_= mask) =20 if (in_interrupt()) return true; - if (node_isset(node, current->mems_allowed)) + if (node_isset(node, current->sysram_nodes)) return true; /* * Allow tasks that have access to memory reserves because they have @@ -4375,13 +4375,13 @@ bool cpuset_node_allowed(struct cgroup *cgroup, int= nid) * certain page cache or slab cache pages such as used for file * system buffers and inode caches, then instead of starting on the * local node to look for a free page, rather spread the starting - * node around the tasks mems_allowed nodes. + * node around the tasks sysram_nodes nodes. * * We don't have to worry about the returned node being offline * because "it can't happen", and even if it did, it would be ok. * * The routines calling guarantee_online_mems() are careful to - * only set nodes in task->mems_allowed that are online. So it + * only set nodes in task->sysram_nodes that are online. So it * should not be possible for the following code to return an * offline node. But if it did, that would be ok, as this routine * is not returning the node where the allocation must be, only @@ -4392,7 +4392,7 @@ bool cpuset_node_allowed(struct cgroup *cgroup, int n= id) */ static int cpuset_spread_node(int *rotor) { - return *rotor =3D next_node_in(*rotor, current->mems_allowed); + return *rotor =3D next_node_in(*rotor, current->sysram_nodes); } =20 /** @@ -4402,35 +4402,35 @@ int cpuset_mem_spread_node(void) { if (current->cpuset_mem_spread_rotor =3D=3D NUMA_NO_NODE) current->cpuset_mem_spread_rotor =3D - node_random(¤t->mems_allowed); + node_random(¤t->sysram_nodes); =20 return cpuset_spread_node(¤t->cpuset_mem_spread_rotor); } =20 /** - * cpuset_mems_allowed_intersects - Does @tsk1's mems_allowed intersect @t= sk2's? + * cpuset_sysram_nodes_intersects - Does @tsk1's sysram_nodes intersect @t= sk2's? * @tsk1: pointer to task_struct of some task. 
* @tsk2: pointer to task_struct of some other task. * - * Description: Return true if @tsk1's mems_allowed intersects the - * mems_allowed of @tsk2. Used by the OOM killer to determine if + * Description: Return true if @tsk1's sysram_nodes intersects the + * sysram_nodes of @tsk2. Used by the OOM killer to determine if * one of the task's memory usage might impact the memory available * to the other. **/ =20 -int cpuset_mems_allowed_intersects(const struct task_struct *tsk1, +int cpuset_sysram_nodes_intersects(const struct task_struct *tsk1, const struct task_struct *tsk2) { - return nodes_intersects(tsk1->mems_allowed, tsk2->mems_allowed); + return nodes_intersects(tsk1->sysram_nodes, tsk2->sysram_nodes); } =20 /** - * cpuset_print_current_mems_allowed - prints current's cpuset and mems_al= lowed + * cpuset_print_current_sysram_nodes - prints current's cpuset and sysram_= nodes * * Description: Prints current's name, cpuset name, and cached copy of its - * mems_allowed to the kernel log. + * sysram_nodes to the kernel log. */ -void cpuset_print_current_mems_allowed(void) +void cpuset_print_current_sysram_nodes(void) { struct cgroup *cgrp; =20 @@ -4439,17 +4439,17 @@ void cpuset_print_current_mems_allowed(void) cgrp =3D task_cs(current)->css.cgroup; pr_cont(",cpuset=3D"); pr_cont_cgroup_name(cgrp); - pr_cont(",mems_allowed=3D%*pbl", - nodemask_pr_args(¤t->mems_allowed)); + pr_cont(",sysram_nodes=3D%*pbl", + nodemask_pr_args(¤t->sysram_nodes)); =20 rcu_read_unlock(); } =20 -/* Display task mems_allowed in /proc//status file. */ -void cpuset_task_status_allowed(struct seq_file *m, struct task_struct *ta= sk) +/* Display task sysram_nodes in /proc//status file. 
*/ +void cpuset_task_status_sysram(struct seq_file *m, struct task_struct *tas= k) { - seq_printf(m, "Mems_allowed:\t%*pb\n", - nodemask_pr_args(&task->mems_allowed)); - seq_printf(m, "Mems_allowed_list:\t%*pbl\n", - nodemask_pr_args(&task->mems_allowed)); + seq_printf(m, "Sysram_nodes:\t%*pb\n", + nodemask_pr_args(&task->sysram_nodes)); + seq_printf(m, "Sysram_nodes_list:\t%*pbl\n", + nodemask_pr_args(&task->sysram_nodes)); } diff --git a/kernel/fork.c b/kernel/fork.c index 3da0f08615a9..9ca2b59d7f0e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2120,7 +2120,7 @@ __latent_entropy struct task_struct *copy_process( #endif #ifdef CONFIG_CPUSETS p->cpuset_mem_spread_rotor =3D NUMA_NO_NODE; - seqcount_spinlock_init(&p->mems_allowed_seq, &p->alloc_lock); + seqcount_spinlock_init(&p->sysram_nodes_seq, &p->alloc_lock); #endif #ifdef CONFIG_TRACE_IRQFLAGS memset(&p->irqtrace, 0, sizeof(p->irqtrace)); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 5b752324270b..667c53fc3954 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3317,8 +3317,8 @@ static void task_numa_work(struct callback_head *work) * Memory is pinned to only one NUMA node via cpuset.mems, naturally * no page can be migrated. 
*/ - if (cpusets_enabled() && nodes_weight(cpuset_current_mems_allowed) =3D=3D= 1) { - trace_sched_skip_cpuset_numa(current, &cpuset_current_mems_allowed); + if (cpusets_enabled() && nodes_weight(cpuset_current_sysram_nodes) =3D=3D= 1) { + trace_sched_skip_cpuset_numa(current, &cpuset_current_sysram_nodes); return; } =20 diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 0455119716ec..0d16890c1a4f 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2366,7 +2366,7 @@ static nodemask_t *policy_mbind_nodemask(gfp_t gfp) */ if (mpol->mode =3D=3D MPOL_BIND && (apply_policy_zone(mpol, gfp_zone(gfp)) && - cpuset_nodemask_valid_mems_allowed(&mpol->nodes))) + cpuset_nodemask_valid_sysram_nodes(&mpol->nodes))) return &mpol->nodes; #endif return NULL; @@ -2389,9 +2389,9 @@ static int gather_surplus_pages(struct hstate *h, lon= g delta) =20 mbind_nodemask =3D policy_mbind_nodemask(htlb_alloc_mask(h)); if (mbind_nodemask) - nodes_and(alloc_nodemask, *mbind_nodemask, cpuset_current_mems_allowed); + nodes_and(alloc_nodemask, *mbind_nodemask, cpuset_current_sysram_nodes); else - alloc_nodemask =3D cpuset_current_mems_allowed; + alloc_nodemask =3D cpuset_current_sysram_nodes; =20 lockdep_assert_held(&hugetlb_lock); needed =3D (h->resv_huge_pages + delta) - h->free_huge_pages; @@ -5084,7 +5084,7 @@ static unsigned int allowed_mems_nr(struct hstate *h) gfp_t gfp_mask =3D htlb_alloc_mask(h); =20 mbind_nodemask =3D policy_mbind_nodemask(gfp_mask); - for_each_node_mask(node, cpuset_current_mems_allowed) { + for_each_node_mask(node, cpuset_current_sysram_nodes) { if (!mbind_nodemask || node_isset(node, *mbind_nodemask)) nr +=3D array[node]; } diff --git a/mm/mempolicy.c b/mm/mempolicy.c index eb83cff7db8c..735dabb9c50c 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -396,7 +396,7 @@ static int mpol_new_preferred(struct mempolicy *pol, co= nst nodemask_t *nodes) * any, for the new policy. mpol_new() has already validated the nodes * parameter with respect to the policy mode and flags. 
* - * Must be called holding task's alloc_lock to protect task's mems_allowed + * Must be called holding task's alloc_lock to protect task's sysram_nodes * and mempolicy. May also be called holding the mmap_lock for write. */ static int mpol_set_nodemask(struct mempolicy *pol, @@ -414,7 +414,7 @@ static int mpol_set_nodemask(struct mempolicy *pol, =20 /* Check N_MEMORY */ nodes_and(nsc->mask1, - cpuset_current_mems_allowed, node_states[N_MEMORY]); + cpuset_current_sysram_nodes, node_states[N_MEMORY]); =20 VM_BUG_ON(!nodes); =20 @@ -426,7 +426,7 @@ static int mpol_set_nodemask(struct mempolicy *pol, if (mpol_store_user_nodemask(pol)) pol->w.user_nodemask =3D *nodes; else - pol->w.cpuset_mems_allowed =3D cpuset_current_mems_allowed; + pol->w.cpuset_sysram_nodes =3D cpuset_current_sysram_nodes; =20 ret =3D mpol_ops[pol->mode].create(pol, &nsc->mask2); return ret; @@ -501,9 +501,9 @@ static void mpol_rebind_nodemask(struct mempolicy *pol,= const nodemask_t *nodes) else if (pol->flags & MPOL_F_RELATIVE_NODES) mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes); else { - nodes_remap(tmp, pol->nodes, pol->w.cpuset_mems_allowed, + nodes_remap(tmp, pol->nodes, pol->w.cpuset_sysram_nodes, *nodes); - pol->w.cpuset_mems_allowed =3D *nodes; + pol->w.cpuset_sysram_nodes =3D *nodes; } =20 if (nodes_empty(tmp)) @@ -515,14 +515,14 @@ static void mpol_rebind_nodemask(struct mempolicy *po= l, const nodemask_t *nodes) static void mpol_rebind_preferred(struct mempolicy *pol, const nodemask_t *nodes) { - pol->w.cpuset_mems_allowed =3D *nodes; + pol->w.cpuset_sysram_nodes =3D *nodes; } =20 /* * mpol_rebind_policy - Migrate a policy to a different set of nodes * * Per-vma policies are protected by mmap_lock. Allocations using per-task - * policies are protected by task->mems_allowed_seq to prevent a premature + * policies are protected by task->sysram_nodes_seq to prevent a premature * OOM/allocation failure due to parallel nodemask modification. 
*/ static void mpol_rebind_policy(struct mempolicy *pol, const nodemask_t *ne= wmask) @@ -530,7 +530,7 @@ static void mpol_rebind_policy(struct mempolicy *pol, c= onst nodemask_t *newmask) if (!pol || pol->mode =3D=3D MPOL_LOCAL) return; if (!mpol_store_user_nodemask(pol) && - nodes_equal(pol->w.cpuset_mems_allowed, *newmask)) + nodes_equal(pol->w.cpuset_sysram_nodes, *newmask)) return; =20 mpol_ops[pol->mode].rebind(pol, newmask); @@ -1086,7 +1086,7 @@ static long do_get_mempolicy(int *policy, nodemask_t = *nmask, return -EINVAL; *policy =3D 0; /* just so it's initialized */ task_lock(current); - *nmask =3D cpuset_current_mems_allowed; + *nmask =3D cpuset_current_sysram_nodes; task_unlock(current); return 0; } @@ -2029,7 +2029,7 @@ static unsigned int weighted_interleave_nodes(struct = mempolicy *policy) unsigned int cpuset_mems_cookie; =20 retry: - /* to prevent miscount use tsk->mems_allowed_seq to detect rebind */ + /* to prevent miscount use tsk->sysram_nodes_seq to detect rebind */ cpuset_mems_cookie =3D read_mems_allowed_begin(); node =3D current->il_prev; if (!current->il_weight || !node_isset(node, policy->nodes)) { @@ -2051,7 +2051,7 @@ static unsigned int interleave_nodes(struct mempolicy= *policy) unsigned int nid; unsigned int cpuset_mems_cookie; =20 - /* to prevent miscount, use tsk->mems_allowed_seq to detect rebind */ + /* to prevent miscount, use tsk->sysram_nodes_seq to detect rebind */ do { cpuset_mems_cookie =3D read_mems_allowed_begin(); nid =3D next_node_in(current->il_prev, policy->nodes); @@ -2118,7 +2118,7 @@ static unsigned int read_once_policy_nodemask(struct = mempolicy *pol, /* * barrier stabilizes the nodemask locally so that it can be iterated * over safely without concern for changes. Allocators validate node - * selection does not violate mems_allowed, so this is safe. + * selection does not violate sysram_nodes, so this is safe. 
*/ barrier(); memcpy(mask, &pol->nodes, sizeof(nodemask_t)); @@ -2210,7 +2210,7 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct = mempolicy *pol, case MPOL_BIND: /* Restrict to nodemask (but not on lower zones) */ if (apply_policy_zone(pol, gfp_zone(gfp)) && - cpuset_nodemask_valid_mems_allowed(&pol->nodes)) + cpuset_nodemask_valid_sysram_nodes(&pol->nodes)) nodemask =3D &pol->nodes; if (pol->home_node !=3D NUMA_NO_NODE) *nid =3D pol->home_node; @@ -2738,7 +2738,7 @@ int vma_dup_policy(struct vm_area_struct *src, struct= vm_area_struct *dst) /* * If mpol_dup() sees current->cpuset =3D=3D cpuset_being_rebound, then it * rebinds the mempolicy its copying by calling mpol_rebind_policy() - * with the mems_allowed returned by cpuset_mems_allowed(). This + * with the sysram_nodes returned by cpuset_mems_allowed(). This * keeps mempolicies cpuset relative after its cpuset moves. See * further kernel/cpuset.c update_nodemask(). * diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 386b4ceeaeb8..9d13580c21ef 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -110,7 +110,7 @@ static bool oom_cpuset_eligible(struct task_struct *sta= rt, * This is not a mempolicy constrained oom, so only * check the mems of tsk's cpuset. 
*/ - ret =3D cpuset_mems_allowed_intersects(current, tsk); + ret =3D cpuset_sysram_nodes_intersects(current, tsk); } if (ret) break; @@ -300,7 +300,7 @@ static enum oom_constraint constrained_alloc(struct oom= _control *oc) =20 if (cpuset_limited) { oc->totalpages =3D total_swap_pages; - for_each_node_mask(nid, cpuset_current_mems_allowed) + for_each_node_mask(nid, cpuset_current_sysram_nodes) oc->totalpages +=3D node_present_pages(nid); return CONSTRAINT_CPUSET; } @@ -451,7 +451,7 @@ static void dump_oom_victim(struct oom_control *oc, str= uct task_struct *victim) pr_info("oom-kill:constraint=3D%s,nodemask=3D%*pbl", oom_constraint_text[oc->constraint], nodemask_pr_args(oc->nodemask)); - cpuset_print_current_mems_allowed(); + cpuset_print_current_sysram_nodes(); mem_cgroup_print_oom_context(oc->memcg, victim); pr_cont(",task=3D%s,pid=3D%d,uid=3D%d\n", victim->comm, victim->pid, from_kuid(&init_user_ns, task_uid(victim))); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2ea6a50f6079..e1257cb7aea4 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3964,7 +3964,7 @@ void warn_alloc(gfp_t gfp_mask, const nodemask_t *nod= emask, const char *fmt, ... nodemask_pr_args(nodemask)); va_end(args); =20 - cpuset_print_current_mems_allowed(); + cpuset_print_current_sysram_nodes(); pr_cont("\n"); dump_stack(); warn_alloc_show_mem(gfp_mask, nodemask); @@ -4601,7 +4601,7 @@ static inline bool check_retry_cpuset(int cpuset_mems_cookie, struct alloc_context *ac) { /* - * It's possible that cpuset's mems_allowed and the nodemask from + * It's possible that cpuset's sysram_nodes and the nodemask from * mempolicy don't intersect. This should be normally dealt with by * policy_nodemask(), but it's possible to race with cpuset update in * such a way the check therein was true, and then it became false @@ -4612,13 +4612,13 @@ check_retry_cpuset(int cpuset_mems_cookie, struct a= lloc_context *ac) * caller can deal with a violated nodemask. 
*/ if (cpusets_enabled() && ac->nodemask && - !cpuset_nodemask_valid_mems_allowed(ac->nodemask)) { + !cpuset_nodemask_valid_sysram_nodes(ac->nodemask)) { ac->nodemask =3D mt_sysram_nodemask(); return true; } =20 /* - * When updating a task's mems_allowed or mempolicy nodemask, it is + * When updating a task's sysram_nodes or mempolicy nodemask, it is * possible to race with parallel threads in such a way that our * allocation can fail while the mask is being updated. If we are about * to fail, check if the cpuset changed during allocation and if so, @@ -4702,7 +4702,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int o= rder, if (cpusets_insane_config() && (gfp_mask & __GFP_HARDWALL)) { struct zoneref *z =3D first_zones_zonelist(ac->zonelist, ac->highest_zoneidx, - &cpuset_current_mems_allowed); + &cpuset_current_sysram_nodes); if (!zonelist_zone(z)) goto nopage; } @@ -4946,7 +4946,7 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask= , unsigned int order, * to the current task context. It means that any node ok. */ if (in_task() && !ac->nodemask) - ac->nodemask =3D &cpuset_current_mems_allowed; + ac->nodemask =3D &cpuset_current_sysram_nodes; else *alloc_flags |=3D ALLOC_CPUSET; } else if (!ac->nodemask) /* sysram_nodes may be NULL during __init */ @@ -5194,7 +5194,7 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, u= nsigned int order, =20 /* * Restore the original nodemask if it was potentially replaced with - * &cpuset_current_mems_allowed to optimize the fast-path attempt. + * &cpuset_current_sysram_nodes to optimize the fast-path attempt. * * If not set, default to sysram nodes. 
*/ @@ -5819,7 +5819,7 @@ build_all_zonelists_init(void) per_cpu_pages_init(&per_cpu(boot_pageset, cpu), &per_cpu(boot_zonestats,= cpu)); =20 mminit_verify_zonelist(); - cpuset_init_current_mems_allowed(); + cpuset_init_current_sysram_nodes(); } =20 /* diff --git a/mm/show_mem.c b/mm/show_mem.c index 24685b5c6dcf..ca7b6872c3d8 100644 --- a/mm/show_mem.c +++ b/mm/show_mem.c @@ -128,7 +128,7 @@ static bool show_mem_node_skip(unsigned int flags, int = nid, * have to be precise here. */ if (!nodemask) - nodemask =3D &cpuset_current_mems_allowed; + nodemask =3D &cpuset_current_sysram_nodes; =20 return !node_isset(nid, *nodemask); } --=20 2.51.1
From: Gregory Price <gourry@gourry.net>
To: linux-mm@kvack.org
Cc: kernel-team@meta.com, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
	nvdimm@lists.linux.dev, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org,
	dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com,
	alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com,
	dan.j.williams@intel.com, longman@redhat.com, akpm@linux-foundation.org,
	david@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	vbabka@suse.cz, rppt@kernel.org, surenb@google.com, mhocko@suse.com,
	osalvador@suse.de, ziy@nvidia.com, matthew.brost@intel.com,
	joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
	ying.huang@linux.alibaba.com, apopple@nvidia.com, mingo@redhat.com,
	peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
	mgorman@suse.de, vschneid@redhat.com, tj@kernel.org, hannes@cmpxchg.org,
	mkoutny@suse.com, kees@kernel.org, muchun.song@linux.dev,
	roman.gushchin@linux.dev, shakeel.butt@linux.dev, rientjes@google.com,
	jackmanb@google.com, cl@gentwo.org, harry.yoo@oracle.com,
	axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
	zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com,
	chengming.zhou@linux.dev, fabio.m.de.francesco@linux.intel.com,
	rrichter@amd.com, ming.li@zohomail.com, usamaarif642@gmail.com,
	brauner@kernel.org, oleg@redhat.com, namcao@linutronix.de,
	escape@linux.alibaba.com, dongjoo.seo1@samsung.com
Subject: [RFC PATCH v2 07/11] cpuset: introduce cpuset.mems.sysram
Date: Wed, 12 Nov 2025 14:29:23 -0500
Message-ID: <20251112192936.2574429-8-gourry@gourry.net>
X-Mailer: git-send-email 2.51.1
In-Reply-To:
 <20251112192936.2574429-1-gourry@gourry.net>
References: <20251112192936.2574429-1-gourry@gourry.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

mems_sysram contains only SystemRAM nodes (omitting SPM nodes). The
nodelist is effectively intersect(effective_mems, mt_sysram_nodelist).

When checking mems_allowed, check for GFP_SPM_NODE to determine whether
the check should be made against mems_sysram or mems_allowed, since
mems_sysram contains only sysram nodes.

This omits "Specific Purpose Memory" nodes from default mems_allowed
checks, making those nodes unreachable via "normal" allocation paths
(page faults, mempolicies, etc.).

Signed-off-by: Gregory Price <gourry@gourry.net>
---
 include/linux/cpuset.h          |  8 ++--
 kernel/cgroup/cpuset-internal.h |  8 ++++
 kernel/cgroup/cpuset-v1.c       |  7 +++
 kernel/cgroup/cpuset.c          | 84 ++++++++++++++++++++++++---------
 mm/memcontrol.c                 |  3 +-
 mm/mempolicy.c                  |  6 +--
 mm/migrate.c                    |  4 +-
 7 files changed, 88 insertions(+), 32 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 9baaf19431b5..375bf446b66e 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -77,7 +77,7 @@ extern void cpuset_unlock(void);
 extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
 extern bool cpuset_cpus_allowed_fallback(struct task_struct *p);
 extern bool cpuset_cpu_is_isolated(int cpu);
-extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
+extern nodemask_t cpuset_sysram_nodes_allowed(struct task_struct *p);
 #define cpuset_current_sysram_nodes (current->sysram_nodes)
 #define cpuset_current_mems_allowed (cpuset_current_sysram_nodes)
 void cpuset_init_current_sysram_nodes(void);
@@ -174,7 +174,7 @@ static inline void set_mems_allowed(nodemask_t nodemask)
	task_unlock(current);
 }

-extern bool cpuset_node_allowed(struct cgroup *cgroup, int nid);
+extern bool cpuset_sysram_node_allowed(struct cgroup *cgroup, int nid);
 #else /* !CONFIG_CPUSETS */

 static inline bool cpusets_enabled(void) { return false; }
@@ -212,7 +212,7 @@ static inline bool cpuset_cpu_is_isolated(int cpu)
	return false;
 }

-static inline nodemask_t cpuset_mems_allowed(struct task_struct *p)
+static inline nodemask_t cpuset_sysram_nodes_allowed(struct task_struct *p)
 {
	return node_possible_map;
 }
@@ -296,7 +296,7 @@ static inline bool read_mems_allowed_retry(unsigned int seq)
	return false;
 }

-static inline bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
+static inline bool cpuset_sysram_node_allowed(struct cgroup *cgroup, int nid)
 {
	return true;
 }
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 337608f408ce..64e48fe040ed 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -53,6 +53,7 @@ typedef enum {
	FILE_MEMORY_MIGRATE,
	FILE_CPULIST,
	FILE_MEMLIST,
+	FILE_MEMS_SYSRAM,
	FILE_EFFECTIVE_CPULIST,
	FILE_EFFECTIVE_MEMLIST,
	FILE_SUBPARTS_CPULIST,
@@ -104,6 +105,13 @@ struct cpuset {
	cpumask_var_t effective_cpus;
	nodemask_t effective_mems;

+	/*
+	 * SystemRAM Memory Nodes for tasks.
+	 * This is the intersection of effective_mems and mt_sysram_nodelist.
+	 * Tasks will have their sysram_nodes set to this value.
+	 */
+	nodemask_t mems_sysram;
+
	/*
	 * Exclusive CPUs dedicated to current cgroup (default hierarchy only)
	 *
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 12e76774c75b..c58215d7230e 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -293,6 +293,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
	cpumask_copy(cs->effective_cpus, new_cpus);
	cs->mems_allowed = *new_mems;
	cs->effective_mems = *new_mems;
+	cpuset_update_tasks_nodemask(cs);
	cpuset_callback_unlock_irq();

	/*
@@ -532,6 +533,12 @@ struct cftype cpuset1_files[] = {
		.private = FILE_EFFECTIVE_MEMLIST,
	},

+	{
+		.name = "mems_sysram",
+		.seq_show = cpuset_common_seq_show,
+		.private = FILE_MEMS_SYSRAM,
+	},
+
	{
		.name = "cpu_exclusive",
		.read_u64 = cpuset_read_u64,
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index f0c59621a7f2..e08b59a0cf99 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -428,11 +429,11 @@ static void guarantee_active_cpus(struct task_struct *tsk,
 *
 * Call with callback_lock or cpuset_mutex held.
 */
-static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
+static void guarantee_online_sysram_nodes(struct cpuset *cs, nodemask_t *pmask)
 {
-	while (!nodes_intersects(cs->effective_mems, node_states[N_MEMORY]))
+	while (!nodes_intersects(cs->mems_sysram, node_states[N_MEMORY]))
		cs = parent_cs(cs);
-	nodes_and(*pmask, cs->effective_mems, node_states[N_MEMORY]);
+	nodes_and(*pmask, cs->mems_sysram, node_states[N_MEMORY]);
 }

 /**
@@ -2723,7 +2724,7 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)

	cpuset_being_rebound = cs; /* causes mpol_dup() rebind */

-	guarantee_online_mems(cs, &newmems);
+	guarantee_online_sysram_nodes(cs, &newmems);

	/*
	 * The mpol_rebind_mm() call takes mmap_lock, which we couldn't
@@ -2748,7 +2749,7 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)

		migrate = is_memory_migrate(cs);

-		mpol_rebind_mm(mm, &cs->mems_allowed);
+		mpol_rebind_mm(mm, &cs->mems_sysram);
		if (migrate)
			cpuset_migrate_mm(mm, &cs->old_mems_allowed, &newmems);
		else
@@ -2808,6 +2809,7 @@ static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)

		spin_lock_irq(&callback_lock);
		cp->effective_mems = *new_mems;
+		mt_nodemask_sysram_mask(&cp->mems_sysram, &cp->effective_mems);
		spin_unlock_irq(&callback_lock);

		WARN_ON(!is_in_v2_mode() &&
@@ -3234,11 +3236,11 @@ static void cpuset_attach(struct cgroup_taskset *tset)
	 * by skipping the task iteration and update.
	 */
	if (cpuset_v2() && !cpus_updated && !mems_updated) {
-		cpuset_attach_nodemask_to = cs->effective_mems;
+		cpuset_attach_nodemask_to = cs->mems_sysram;
		goto out;
	}

-	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
+	guarantee_online_sysram_nodes(cs, &cpuset_attach_nodemask_to);

	cgroup_taskset_for_each(task, css, tset)
		cpuset_attach_task(cs, task);
@@ -3249,7 +3251,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
	 * if there is no change in effective_mems and CS_MEMORY_MIGRATE is
	 * not set.
	 */
-	cpuset_attach_nodemask_to = cs->effective_mems;
+	cpuset_attach_nodemask_to = cs->mems_sysram;
	if (!is_memory_migrate(cs) && !mems_updated)
		goto out;

@@ -3371,6 +3373,9 @@ int cpuset_common_seq_show(struct seq_file *sf, void *v)
	case FILE_EFFECTIVE_MEMLIST:
		seq_printf(sf, "%*pbl\n", nodemask_pr_args(&cs->effective_mems));
		break;
+	case FILE_MEMS_SYSRAM:
+		seq_printf(sf, "%*pbl\n", nodemask_pr_args(&cs->mems_sysram));
+		break;
	case FILE_EXCLUSIVE_CPULIST:
		seq_printf(sf, "%*pbl\n", cpumask_pr_args(cs->exclusive_cpus));
		break;
@@ -3482,6 +3487,12 @@ static struct cftype dfl_files[] = {
		.private = FILE_EFFECTIVE_MEMLIST,
	},

+	{
+		.name = "mems.sysram",
+		.seq_show = cpuset_common_seq_show,
+		.private = FILE_MEMS_SYSRAM,
+	},
+
	{
		.name = "cpus.partition",
		.seq_show = cpuset_partition_show,
@@ -3585,6 +3596,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
	if (is_in_v2_mode()) {
		cpumask_copy(cs->effective_cpus, parent->effective_cpus);
		cs->effective_mems = parent->effective_mems;
+		mt_nodemask_sysram_mask(&cs->mems_sysram, &cs->effective_mems);
	}
	spin_unlock_irq(&callback_lock);

@@ -3616,6 +3628,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
	spin_lock_irq(&callback_lock);
	cs->mems_allowed = parent->mems_allowed;
	cs->effective_mems = parent->mems_allowed;
+	mt_nodemask_sysram_mask(&cs->mems_sysram, &cs->effective_mems);
	cpumask_copy(cs->cpus_allowed, parent->cpus_allowed);
	cpumask_copy(cs->effective_cpus, parent->cpus_allowed);
	spin_unlock_irq(&callback_lock);
@@ -3769,7 +3782,7 @@ static void cpuset_fork(struct task_struct *task)

	/* CLONE_INTO_CGROUP */
	mutex_lock(&cpuset_mutex);
-	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
+	guarantee_online_sysram_nodes(cs, &cpuset_attach_nodemask_to);
	cpuset_attach_task(cs, task);

	dec_attach_in_progress_locked(cs);
@@ -3818,7 +3831,8 @@ int __init cpuset_init(void)
	cpumask_setall(top_cpuset.effective_xcpus);
	cpumask_setall(top_cpuset.exclusive_cpus);
	nodes_setall(top_cpuset.effective_mems);
-
+	mt_nodemask_sysram_mask(&top_cpuset.mems_sysram,
+				&top_cpuset.effective_mems);
	fmeter_init(&top_cpuset.fmeter);
	INIT_LIST_HEAD(&remote_children);

@@ -3848,6 +3862,7 @@ hotplug_update_tasks(struct cpuset *cs,
	spin_lock_irq(&callback_lock);
	cpumask_copy(cs->effective_cpus, new_cpus);
	cs->effective_mems = *new_mems;
+	mt_nodemask_sysram_mask(&cs->mems_sysram, &cs->effective_mems);
	spin_unlock_irq(&callback_lock);

	if (cpus_updated)
@@ -4039,6 +4054,8 @@ static void cpuset_handle_hotplug(void)
		if (!on_dfl)
			top_cpuset.mems_allowed = new_mems;
		top_cpuset.effective_mems = new_mems;
+		mt_nodemask_sysram_mask(&top_cpuset.mems_sysram,
+					&top_cpuset.effective_mems);
		spin_unlock_irq(&callback_lock);
		cpuset_update_tasks_nodemask(&top_cpuset);
	}
@@ -4109,6 +4126,8 @@ void __init cpuset_init_smp(void)

	cpumask_copy(top_cpuset.effective_cpus, cpu_active_mask);
	top_cpuset.effective_mems = node_states[N_MEMORY];
+	mt_nodemask_sysram_mask(&top_cpuset.mems_sysram,
+				&top_cpuset.effective_mems);

	hotplug_node_notifier(cpuset_track_online_nodes, CPUSET_CALLBACK_PRI);

@@ -4205,14 +4224,18 @@ bool cpuset_cpus_allowed_fallback(struct task_struct *tsk)
	return changed;
 }

+/*
+ * At this point in time, no hotplug nodes can have been added, so just set
+ * the sysram_nodes of the init task to the set of N_MEMORY nodes.
+ */
 void __init cpuset_init_current_sysram_nodes(void)
 {
-	nodes_setall(current->sysram_nodes);
+	current->sysram_nodes = node_states[N_MEMORY];
 }

 /**
- * cpuset_mems_allowed - return mems_allowed mask from a tasks cpuset.
- * @tsk: pointer to task_struct from which to obtain cpuset->mems_allowed.
+ * cpuset_sysram_nodes_allowed - return mems_sysram mask from a tasks cpuset.
+ * @tsk: pointer to task_struct from which to obtain cpuset->mems_sysram.
 *
 * Description: Returns the nodemask_t mems_allowed of the cpuset
 * attached to the specified @tsk.  Guaranteed to return some non-empty
@@ -4220,13 +4243,13 @@ void __init cpuset_init_current_sysram_nodes(void)
 * tasks cpuset.
 **/

-nodemask_t cpuset_mems_allowed(struct task_struct *tsk)
+nodemask_t cpuset_sysram_nodes_allowed(struct task_struct *tsk)
 {
	nodemask_t mask;
	unsigned long flags;

	spin_lock_irqsave(&callback_lock, flags);
-	guarantee_online_mems(task_cs(tsk), &mask);
+	guarantee_online_sysram_nodes(task_cs(tsk), &mask);
	spin_unlock_irqrestore(&callback_lock, flags);

	return mask;
@@ -4295,17 +4318,30 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs)
 *  tsk_is_oom_victim - any node ok
 *  GFP_KERNEL        - any node in enclosing hardwalled cpuset ok
 *  GFP_USER          - only nodes in current tasks mems allowed ok.
+ *  GFP_SPM_NODE      - allow specific purpose memory nodes in mems_allowed
 */
 bool cpuset_current_node_allowed(int node, gfp_t gfp_mask)
 {
	struct cpuset *cs;	/* current cpuset ancestors */
	bool allowed;		/* is allocation in zone z allowed? */
	unsigned long flags;
+	bool sp_node = gfp_mask & __GFP_SPM_NODE;

+	/* Only SysRAM nodes are valid in interrupt context */
	if (in_interrupt())
-		return true;
-	if (node_isset(node, current->sysram_nodes))
-		return true;
+		return (!sp_node || node_isset(node, mt_sysram_nodelist));
+
+	if (sp_node) {
+		rcu_read_lock();
+		cs = task_cs(current);
+		allowed = node_isset(node, cs->mems_allowed);
+		rcu_read_unlock();
+	} else
+		allowed = node_isset(node, current->sysram_nodes);
+
+	if (allowed)
+		return allowed;
+
	/*
	 * Allow tasks that have access to memory reserves because they have
	 * been OOM killed to get memory anywhere.
@@ -4324,11 +4360,15 @@ bool cpuset_current_node_allowed(int node, gfp_t gfp_mask)
	cs = nearest_hardwall_ancestor(task_cs(current));
	allowed = node_isset(node, cs->mems_allowed);

+	/* If not a SP Node allocation, restrict to sysram nodes */
+	if (!sp_node && !nodes_empty(mt_sysram_nodelist))
+		allowed &= node_isset(node, mt_sysram_nodelist);
+
	spin_unlock_irqrestore(&callback_lock, flags);
	return allowed;
 }

-bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
+bool cpuset_sysram_node_allowed(struct cgroup *cgroup, int nid)
 {
	struct cgroup_subsys_state *css;
	struct cpuset *cs;
@@ -4347,7 +4387,7 @@ bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
		return true;

	/*
-	 * Normally, accessing effective_mems would require the cpuset_mutex
+	 * Normally, accessing mems_sysram would require the cpuset_mutex
	 * or callback_lock - but node_isset is atomic and the reference
	 * taken via cgroup_get_e_css is sufficient to protect css.
	 *
@@ -4359,7 +4399,7 @@ bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
	 * cannot make strong isolation guarantees, so this is acceptable.
	 */
	cs = container_of(css, struct cpuset, css);
-	allowed = node_isset(nid, cs->effective_mems);
+	allowed = node_isset(nid, cs->mems_sysram);
	css_put(css);
	return allowed;
 }
@@ -4380,7 +4420,7 @@ bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
 * We don't have to worry about the returned node being offline
 * because "it can't happen", and even if it did, it would be ok.
 *
- * The routines calling guarantee_online_mems() are careful to
+ * The routines calling guarantee_online_sysram_nodes() are careful to
 * only set nodes in task->sysram_nodes that are online.  So it
 * should not be possible for the following code to return an
 * offline node.  But if it did, that would be ok, as this routine
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4deda33625f4..7cac7ff013a7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5599,5 +5599,6 @@ subsys_initcall(mem_cgroup_swap_init);

 bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid)
 {
-	return memcg ? cpuset_node_allowed(memcg->css.cgroup, nid) : true;
+	return memcg ? cpuset_sysram_node_allowed(memcg->css.cgroup, nid) :
+		       true;
 }
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 735dabb9c50c..e1e8a1f3e1a2 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1831,14 +1831,14 @@ static int kernel_migrate_pages(pid_t pid, unsigned long maxnode,
	}
	rcu_read_unlock();

-	task_nodes = cpuset_mems_allowed(task);
+	task_nodes = cpuset_sysram_nodes_allowed(task);
	/* Is the user allowed to access the target nodes? */
	if (!nodes_subset(*new, task_nodes) && !capable(CAP_SYS_NICE)) {
		err = -EPERM;
		goto out_put;
	}

-	task_nodes = cpuset_mems_allowed(current);
+	task_nodes = cpuset_sysram_nodes_allowed(current);
	nodes_and(*new, *new, task_nodes);
	if (nodes_empty(*new))
		goto out_put;
@@ -2763,7 +2763,7 @@ struct mempolicy *__mpol_dup(struct mempolicy *old)
	*new = *old;

	if (current_cpuset_is_being_rebound()) {
-		nodemask_t mems = cpuset_mems_allowed(current);
+		nodemask_t mems = cpuset_sysram_nodes_allowed(current);
		mpol_rebind_policy(new, &mems);
	}
	atomic_set(&new->refcnt, 1);
diff --git a/mm/migrate.c b/mm/migrate.c
index c0e9f15be2a2..c612f05d23db 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2526,7 +2526,7 @@ static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
	 */
	if (!pid) {
		mmget(current->mm);
-		*mem_nodes = cpuset_mems_allowed(current);
+		*mem_nodes = cpuset_sysram_nodes_allowed(current);
		return current->mm;
	}

@@ -2547,7 +2547,7 @@ static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
	mm = ERR_PTR(security_task_movememory(task));
	if (IS_ERR(mm))
		goto out;
-	*mem_nodes = cpuset_mems_allowed(task);
+	*mem_nodes = cpuset_sysram_nodes_allowed(task);
	mm = get_task_mm(task);
 out:
	put_task_struct(task);
-- 
2.51.1
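[Editor's note: the mems_sysram derivation in this patch - mems_sysram =
intersect(effective_mems, mt_sysram_nodelist) - and the GFP_SPM_NODE gating
can be sketched as a small userspace toy model. All names below are
illustrative stand-ins, not the kernel API:]

```c
#include <stdbool.h>

/* Toy nodemask: bit n set => node n is in the mask (illustrative only). */
typedef unsigned long toy_nodemask_t;

/* Hypothetical stand-ins for the cpuset / memory-tier state in this series. */
static toy_nodemask_t effective_mems  = 0xF; /* cpuset allows nodes 0-3 */
static toy_nodemask_t sysram_nodelist = 0x3; /* nodes 0-1 are SystemRAM */

/* mems_sysram is the intersection of effective_mems and the sysram list. */
static toy_nodemask_t mems_sysram(void)
{
	return effective_mems & sysram_nodelist;
}

/*
 * An allocation may land on an SPM node (here, nodes 2-3) only when the
 * caller opts in (modeling __GFP_SPM_NODE); otherwise the node is
 * checked against mems_sysram.
 */
static bool node_allowed(int node, bool spm_alloc)
{
	toy_nodemask_t mask = spm_alloc ? effective_mems : mems_sysram();

	return (mask >> node) & 1UL;
}
```

[Under this model a default (non-SPM) allocation on node 2 fails the check
while an opted-in allocation succeeds, mirroring how the patch keeps SPM
nodes out of normal allocation paths.]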
From nobody Sun Feb 8 22:43:21 2026

From: Gregory Price <gourry@gourry.net>
To: linux-mm@kvack.org
Subject: [RFC PATCH v2 08/11] mm/memory_hotplug: add MHP_SPM_NODE flag
Date: Wed, 12 Nov 2025 14:29:24 -0500
Message-ID: <20251112192936.2574429-9-gourry@gourry.net>
X-Mailer: git-send-email 2.51.1
In-Reply-To: <20251112192936.2574429-1-gourry@gourry.net>
References: <20251112192936.2574429-1-gourry@gourry.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add support for Specific Purpose Memory (SPM) NUMA nodes. An SPM node
is managed by the page allocator, but can only be allocated from by
using the __GFP_SPM_NODE flag with an appropriate nodemask.

Check/set the node type (SysRAM vs SPM) at hotplug time. Disallow SPM
from being added to SysRAM nodes and vice versa.

This prevents normal allocation paths (page faults, kmalloc, etc.) from
being directly exposed to these memories, and provides a clear
integration point for buddy-allocation of SPM memory.

Signed-off-by: Gregory Price <gourry@gourry.net>
---
 include/linux/memory_hotplug.h | 10 ++++++++++
 mm/memory_hotplug.c            |  7 +++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 23f038a16231..a50c467951ba 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -74,6 +74,16 @@ typedef int __bitwise mhp_t;
 * helpful in low-memory situations.
 */
 #define MHP_OFFLINE_INACCESSIBLE	((__force mhp_t)BIT(3))
+/*
+ * The hotplugged memory can only be added to a "Specific Purpose Memory"
+ * NUMA node.  SPM nodes are not generally accessible by the page allocator
+ * by way of userland configuration - as most nodemask interfaces
+ * (mempolicy, cpusets) restrict nodes to SysRAM nodes.
+ *
+ * Hotplugging SPM into a SysRAM node results in -EINVAL.
+ * Hotplugging SysRAM into an SPM node results in -EINVAL.
+ */
+#define MHP_SPM_NODE	((__force mhp_t)BIT(4))

 /*
 * Extended parameters for memory hotplug:
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0be83039c3b5..488cdd8e5f6f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1529,6 +1530,12 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)

	mem_hotplug_begin();

+	/* Set the NUMA node type and bail out if the type is wrong */
+	ret = mt_set_node_type(nid, (mhp_flags & MHP_SPM_NODE) ?
+			       MT_NODE_TYPE_SPM : MT_NODE_TYPE_SYSRAM);
+	if (ret)
+		goto error_mem_hotplug_end;
+
	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
		if (res->flags & IORESOURCE_SYSRAM_DRIVER_MANAGED)
			memblock_flags = MEMBLOCK_DRIVER_MANAGED;
-- 
2.51.1
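[Editor's note: the node-type check performed via mt_set_node_type() at
hotplug time behaves as first-plug-wins with -EINVAL on a mismatch. The
following is a sketch of those intended semantics only; the enum and
function names are assumptions based on the call site above, not kernel
code:]

```c
#include <errno.h>

enum toy_node_type { TOY_NODE_NONE, TOY_NODE_SYSRAM, TOY_NODE_SPM };

#define TOY_MAX_NODES 8

/* Per-node type, TOY_NODE_NONE until memory is first hotplugged. */
static enum toy_node_type toy_types[TOY_MAX_NODES];

/*
 * The first hotplug of memory into a node fixes the node's type; any
 * later attempt to add memory of the other type fails with -EINVAL,
 * so SysRAM and SPM memory can never share a NUMA node.
 */
static int toy_set_node_type(int nid, enum toy_node_type type)
{
	if (toy_types[nid] == TOY_NODE_NONE) {
		toy_types[nid] = type;
		return 0;
	}
	return toy_types[nid] == type ? 0 : -EINVAL;
}
```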
b=uKo9Ut7JJy1dKjkn+WZ4a8X3/ygS/muGYId5iY6n2EeG28TVRrGflPnrdEPClrah6q gLd2+4aBQFYVEtUMHHPIZiwSCaEMp3zlYPMrbRgPn5xwlzAHUsND0Rw20lZQMk6p5t6K tLS9s8IqzJLoJHCmSA9gcWPol8KkNbtPAT+cG7JPAhFOaYS9g8IJz1EW6hBe7Hw96/Mc O6eLa9i58m4DYGX18tY2yIVKHicxaV8lvVFHJtjxotknQe5PzocE9Y9xLCufiBE5NJ5M ssn7SHnRr4K4rXYfqORuhtV9MH/p7AaPP+0Le2mSl0cQOe/4cte3i1IxdCf+fmMFBj5l Xo/A== X-Forwarded-Encrypted: i=1; AJvYcCV4f4XQoF8JL8u/yBXSEbsjmAuu13QKmUX3oayU5uugiKM0dzfRqPeWggRAIlHPcKz9Xm6Ixjp3VnfOgK4=@vger.kernel.org X-Gm-Message-State: AOJu0YyvG3qLg61685bJ+CiT59iN/VSwx2kNv/cp4c9TOpEpikedX928 8yei9vSJoYfuecPOs0r8JSrIj7gDmb3uiSR3z4lN/syFvrX0+tVVUDwz0RELvTp8aUU= X-Gm-Gg: ASbGncs1UytqoMf/gNTussmU6tR3yUM0UJBECbtcrWsU2uhguMSv5ZcSC+QxAWfOO/x Uh9xCDvZqrdN/JdOxEHXzhCCID/kbcCWh9J5snMmbftEFXjuaJ1IdHaWafYnHtRk2HNdCPr1Jdq k3R0AipM76KbBv6/Lm4jYJAJeQNJUbkhWzYVEu1UrGe4QwTcBbmAFt+vuPzJrI0O9B0/6RlV2Q/ mQTn7cVWMkA1XWNnWLdFBSa3Bo59f+b87DNiMHq3RPeTkJcQkyET8LTPV5oi4oYoYD3nzMftTYs HowSSSfFjvCJ7/w9j7qoZ2DhZKUhbE/a7Yd6D4xPfIqIZq2K9BZph+IXSsR3JV0oHLQYyd/BBR7 l3fjrOA4bE1Zv/RyFVrO8ytGRMSo1S9SiL67NXJXAuyHyTauTU4qxcNtSaCDUBp8F3sVZS+CaeJ BZakhizZ3aV+V/4Vphdo7eaJkOatonkxjndo6wALYrCd2vn8u+hOWJm0ShxVgFNnLC X-Google-Smtp-Source: AGHT+IHll+IKPvcEc2EdGPBw2QQZcQZ6T6jlawQUNb6L1NrYv5cDZU0Tf0FeuJID2gzeZN1UxCbY+g== X-Received: by 2002:a05:620a:2a0f:b0:84e:2544:6be7 with SMTP id af79cd13be357-8b29b815e49mr640273285a.65.1762975816129; Wed, 12 Nov 2025 11:30:16 -0800 (PST) Received: from gourry-fedora-PF4VCD3F.lan (pool-96-255-20-138.washdc.ftas.verizon.net. 
From: Gregory Price <gourry@gourry.net>
To: linux-mm@kvack.org
Cc: kernel-team@meta.com, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org,
	dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com,
	alison.schofield@intel.com, vishal.l.verma@intel.com,
	ira.weiny@intel.com, dan.j.williams@intel.com, longman@redhat.com,
	akpm@linux-foundation.org, david@redhat.com,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
	rppt@kernel.org, surenb@google.com, mhocko@suse.com,
	osalvador@suse.de, ziy@nvidia.com, matthew.brost@intel.com,
	joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
	gourry@gourry.net, ying.huang@linux.alibaba.com, apopple@nvidia.com,
	mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, tj@kernel.org, hannes@cmpxchg.org,
	mkoutny@suse.com, kees@kernel.org, muchun.song@linux.dev,
	roman.gushchin@linux.dev, shakeel.butt@linux.dev,
	rientjes@google.com, jackmanb@google.com, cl@gentwo.org,
	harry.yoo@oracle.com, axelrasmussen@google.com, yuanchu@google.com,
	weixugc@google.com, zhengqi.arch@bytedance.com,
	yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev,
	fabio.m.de.francesco@linux.intel.com, rrichter@amd.com,
	ming.li@zohomail.com, usamaarif642@gmail.com, brauner@kernel.org,
	oleg@redhat.com, namcao@linutronix.de, escape@linux.alibaba.com,
	dongjoo.seo1@samsung.com
Subject: [RFC PATCH v2 09/11] drivers/dax: add spm_node bit to dev_dax
Date: Wed, 12 Nov 2025 14:29:25 -0500
Message-ID: <20251112192936.2574429-10-gourry@gourry.net>
In-Reply-To: <20251112192936.2574429-1-gourry@gourry.net>
References: <20251112192936.2574429-1-gourry@gourry.net>

This bit is used by dax/kmem to decide whether to set the MHP_SPM_NODE
flag, which determines whether the hotplugged memory is SysRAM or
Specific Purpose Memory.

Signed-off-by: Gregory Price <gourry@gourry.net>
---
 drivers/dax/bus.c         | 39 +++++++++++++++++++++++++++++++++++++++
 drivers/dax/bus.h         |  1 +
 drivers/dax/dax-private.h |  1 +
 drivers/dax/kmem.c        |  2 ++
 4 files changed, 43 insertions(+)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index fde29e0ad68b..b0de43854112 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1361,6 +1361,43 @@ static ssize_t memmap_on_memory_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(memmap_on_memory);
 
+static ssize_t spm_node_show(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct dev_dax *dev_dax = to_dev_dax(dev);
+
+	return sysfs_emit(buf, "%d\n", dev_dax->spm_node);
+}
+
+static ssize_t spm_node_store(struct device *dev,
+			      struct device_attribute *attr,
+			      const char *buf, size_t len)
+{
+	struct dev_dax *dev_dax = to_dev_dax(dev);
+	bool val;
+	int rc;
+
+	rc = kstrtobool(buf, &val);
+	if (rc)
+		return rc;
+
+	rc = down_write_killable(&dax_dev_rwsem);
+	if (rc)
+		return rc;
+
+	if (dev_dax->spm_node != val && dev->driver &&
+	    to_dax_drv(dev->driver)->type == DAXDRV_KMEM_TYPE) {
+		up_write(&dax_dev_rwsem);
+		return -EBUSY;
+	}
+
+	dev_dax->spm_node = val;
+	up_write(&dax_dev_rwsem);
+
+	return len;
+}
+static DEVICE_ATTR_RW(spm_node);
+
 static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n)
 {
 	struct device *dev = container_of(kobj, struct device, kobj);
@@ -1388,6 +1425,7 @@ static struct attribute *dev_dax_attributes[] = {
 	&dev_attr_resource.attr,
 	&dev_attr_numa_node.attr,
 	&dev_attr_memmap_on_memory.attr,
+	&dev_attr_spm_node.attr,
 	NULL,
 };
 
@@ -1494,6 +1532,7 @@ static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data)
 	ida_init(&dev_dax->ida);
 
 	dev_dax->memmap_on_memory = data->memmap_on_memory;
+	dev_dax->spm_node = data->spm_node;
 
 	inode = dax_inode(dax_dev);
 	dev->devt = inode->i_rdev;
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index cbbf64443098..51ed961b6a3c 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -24,6 +24,7 @@ struct dev_dax_data {
 	resource_size_t size;
 	int id;
 	bool memmap_on_memory;
+	bool spm_node;
 };
 
 struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data);
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 0867115aeef2..3d1b1f996383 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -89,6 +89,7 @@ struct dev_dax {
 	struct device dev;
 	struct dev_pagemap *pgmap;
 	bool memmap_on_memory;
+	bool spm_node;
 	int nr_range;
 	struct dev_dax_range *ranges;
 };
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index c036e4d0b610..3c3dd1cd052c 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -169,6 +169,8 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 	mhp_flags = MHP_NID_IS_MGID;
 	if (dev_dax->memmap_on_memory)
 		mhp_flags |= MHP_MEMMAP_ON_MEMORY;
+	if (dev_dax->spm_node)
+		mhp_flags |= MHP_SPM_NODE;
 
 	/*
	 * Ensure that future kexec'd kernels will not treat
-- 
2.51.1
From: Gregory Price <gourry@gourry.net>
To: linux-mm@kvack.org
Subject: [RFC PATCH v2 10/11] drivers/cxl: add spm_node bit to cxl region
Date: Wed, 12 Nov 2025 14:29:26 -0500
Message-ID: <20251112192936.2574429-11-gourry@gourry.net>
In-Reply-To: <20251112192936.2574429-1-gourry@gourry.net>
References: <20251112192936.2574429-1-gourry@gourry.net>

Add an spm_node bit to the cxl region and forward it to the dax device.
This allows auto-hotplug to occur without an intermediate udev step to
poke the dax device's spm_node bit.

Signed-off-by: Gregory Price <gourry@gourry.net>
---
 drivers/cxl/core/region.c | 30 ++++++++++++++++++++++++++++++
 drivers/cxl/cxl.h         |  2 ++
 drivers/dax/cxl.c         |  1 +
 3 files changed, 33 insertions(+)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index b06fee1978ba..3348b09dfe9a 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -754,6 +754,35 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RW(size);
 
+static ssize_t spm_node_show(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+
+	return sysfs_emit(buf, "%d\n", cxlr->spm_node);
+}
+
+static ssize_t spm_node_store(struct device *dev,
+			      struct device_attribute *attr,
+			      const char *buf, size_t len)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+	bool val;
+	int rc;
+
+	rc = kstrtobool(buf, &val);
+	if (rc)
+		return rc;
+
+	ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
+	if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
+		return rc;
+
+	cxlr->spm_node = val;
+	return len;
+}
+static DEVICE_ATTR_RW(spm_node);
+
 static struct attribute *cxl_region_attrs[] = {
 	&dev_attr_uuid.attr,
 	&dev_attr_commit.attr,
@@ -762,6 +791,7 @@ static struct attribute *cxl_region_attrs[] = {
 	&dev_attr_resource.attr,
 	&dev_attr_size.attr,
 	&dev_attr_mode.attr,
+	&dev_attr_spm_node.attr,
 	NULL,
 };
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 231ddccf8977..ba7cde06dfd3 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -530,6 +530,7 @@ enum cxl_partition_mode {
  * @coord: QoS access coordinates for the region
  * @node_notifier: notifier for setting the access coordinates to node
  * @adist_notifier: notifier for calculating the abstract distance of node
+ * @spm_node: memory can only be added to specific purpose NUMA nodes
  */
 struct cxl_region {
 	struct device dev;
@@ -543,6 +544,7 @@ struct cxl_region {
 	struct access_coordinate coord[ACCESS_COORDINATE_MAX];
 	struct notifier_block node_notifier;
 	struct notifier_block adist_notifier;
+	bool spm_node;
 };
 
 struct cxl_nvdimm_bridge {
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index 13cd94d32ff7..968d23fc19ed 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -27,6 +27,7 @@ static int cxl_dax_region_probe(struct device *dev)
 		.id = -1,
 		.size = range_len(&cxlr_dax->hpa_range),
 		.memmap_on_memory = true,
+		.spm_node = cxlr->spm_node,
 	};
 
 	return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data));
-- 
2.51.1
From: Gregory Price <gourry@gourry.net>
To: linux-mm@kvack.org
Subject: [RFC PATCH v2 11/11] [HACK] mm/zswap: compressed ram integration example
Date: Wed, 12 Nov 2025 14:29:27 -0500
Message-ID: <20251112192936.2574429-12-gourry@gourry.net>
In-Reply-To: <20251112192936.2574429-1-gourry@gourry.net>
References: <20251112192936.2574429-1-gourry@gourry.net>

Here is an example of how you might use an SPM memory node.

If compressed ram is available (in this case, indicated by a bit present
in mt_spm_nodelist), we skip the entire software compression process and
memcpy directly to a compressed memory folio, storing the newly
allocated compressed memory page as the zswap entry->handle.

On decompress we do the opposite: copy directly from the stored page to
the destination, and free the compressed memory page.

Note: we do not integrate any compressed memory device checks at this
point, because this is a stand-in to demonstrate how the SPM node
allocation mechanism works. See the "TODO" comment in
zswap_compress_direct() for more details.

In reality, we would want to move this mechanism out of zswap into its
own component (cram.c?) and enable a more direct migrate_page() call
that actually re-maps the page read-only into any mappings, then
provides a write-fault handler which promotes the page on write
(similar to a NUMA hint fault, but only on write access).

This prevents runaway compression-ratio failures, since the compression
ratio would be checked on allocation rather than allowed to silently
decrease on writes until the device becomes unstable.
Signed-off-by: Gregory Price <gourry@gourry.net>
---
 mm/zswap.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 65 insertions(+), 1 deletion(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index c1af782e54ec..e6f48a4e90f1 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -191,6 +192,7 @@ struct zswap_entry {
 	swp_entry_t swpentry;
 	unsigned int length;
 	bool referenced;
+	bool direct;
 	struct zswap_pool *pool;
 	unsigned long handle;
 	struct obj_cgroup *objcg;
@@ -717,7 +719,8 @@ static void zswap_entry_cache_free(struct zswap_entry *entry)
 static void zswap_entry_free(struct zswap_entry *entry)
 {
 	zswap_lru_del(&zswap_list_lru, entry);
-	zs_free(entry->pool->zs_pool, entry->handle);
+	if (!entry->direct)
+		zs_free(entry->pool->zs_pool, entry->handle);
 	zswap_pool_put(entry->pool);
 	if (entry->objcg) {
 		obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
@@ -851,6 +854,43 @@ static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
 	mutex_unlock(&acomp_ctx->mutex);
 }
 
+static struct page *zswap_compress_direct(struct page *src,
+					  struct zswap_entry *entry)
+{
+	int nid = first_node(mt_spm_nodelist);
+	struct page *dst;
+	gfp_t gfp;
+
+	if (nid == NUMA_NO_NODE)
+		return NULL;
+
+	gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE |
+	      __GFP_SPM_NODE;
+	dst = __alloc_pages(gfp, 0, nid, &mt_spm_nodelist);
+	if (!dst)
+		return NULL;
+
+	/*
+	 * TODO: check that the page is safe to use
+	 *
+	 * In a real implementation, we would not be using ZSWAP to demonstrate
+	 * this and instead would implement a new component (compressed_ram,
+	 * cram.c?)
+	 *
+	 * At this point we would check via some callback that the device's
+	 * memory is actually safe to use - and if not, free the page (without
+	 * writing to it), and kick off kswapd for that node to make room.
+	 *
+	 * Alternatively, if the compressed memory device(s) report a watermark
+	 * crossing via interrupt, a flag can be set that is checked here
+	 * rather than calling back into a device driver.
+	 *
+	 * In this case, we're testing with normal memory, so the memory is
+	 * always safe to use (i.e. no compression ratio to worry about).
+	 */
+	copy_mc_highpage(dst, src);
+	return dst;
+}
+
 static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 			   struct zswap_pool *pool)
 {
@@ -862,6 +902,19 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	gfp_t gfp;
 	u8 *dst;
 	bool mapped = false;
+	struct page *zpage;
+
+	/* Try to shunt directly to compressed ram */
+	if (!nodes_empty(mt_spm_nodelist)) {
+		zpage = zswap_compress_direct(page, entry);
+		if (zpage) {
+			entry->handle = (unsigned long)zpage;
+			entry->length = PAGE_SIZE;
+			entry->direct = true;
+			return true;
+		}
+		/* otherwise fallback to normal zswap */
+	}
 
 	acomp_ctx = acomp_ctx_get_cpu_lock(pool);
 	dst = acomp_ctx->buffer;
@@ -939,6 +992,16 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	int decomp_ret = 0, dlen = PAGE_SIZE;
 	u8 *src, *obj;
 
+	/* compressed ram page */
+	if (entry->direct) {
+		struct page *src = (struct page *)entry->handle;
+		struct folio *zfolio = page_folio(src);
+
+		memcpy_folio(folio, 0, zfolio, 0, PAGE_SIZE);
+		__free_page(src);
+		goto direct_done;
+	}
+
 	acomp_ctx = acomp_ctx_get_cpu_lock(pool);
 	obj = zs_obj_read_begin(pool->zs_pool, entry->handle, acomp_ctx->buffer);
 
@@ -972,6 +1035,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	zs_obj_read_end(pool->zs_pool, entry->handle, obj);
 	acomp_ctx_put_unlock(acomp_ctx);
 
+direct_done:
 	if (!decomp_ret && dlen == PAGE_SIZE)
 		return true;
 
-- 
2.51.1