From nobody Mon May 12 16:36:40 2025
From: Oscar Salvador <osalvador@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Vlastimil Babka <vbabka@suse.cz>, Hyeonggon Yoo <42.hyeyoo@gmail.com>, mkoutny@suse.com, Dan Williams <dan.j.williams@intel.com>, Jonathan Cameron <Jonathan.Cameron@huawei.com>, Oscar Salvador <osalvador@suse.de>
Subject: [PATCH 1/2] mm,memory_hotplug: Implement numa node notifier
Date: Tue, 1 Apr 2025 11:27:15 +0200
Message-ID: <20250401092716.537512-2-osalvador@suse.de>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250401092716.537512-1-osalvador@suse.de>
References: <20250401092716.537512-1-osalvador@suse.de>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

At least four consumers of hotplug_memory_notifier are really only
interested in whether a NUMA node changed its state, e.g. going from
being memory aware to becoming memoryless.

Implement a specific notifier for NUMA nodes that fires when their state
changes, and have the consumers that only care about NUMA node state
changes use it.
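
As a rough illustration of what a consumer of the new notifier looks
like, here is a minimal userspace sketch. Only the NODE_* values and
struct node_notify are taken from this patch; the chain implementation
is a simplified stand-in for the kernel's blocking notifier chain (no
locking, no NOTIFY_STOP handling), and example_node_callback,
example_nb and nodes_with_memory are hypothetical names that exist only
for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE		(-1)
#define NOTIFY_OK		0x0001

/* Event values as added to include/linux/memory.h by this patch. */
#define NODE_BECOMING_MEM_AWARE		(1 << 0)
#define NODE_BECAME_MEM_AWARE		(1 << 1)
#define NODE_BECOMING_MEMORYLESS	(1 << 2)
#define NODE_BECAME_MEMORYLESS		(1 << 3)
#define NODE_CANCEL_MEM_AWARE		(1 << 4)
#define NODE_CANCEL_MEMORYLESS		(1 << 5)

struct node_notify {
	int status_change_nid_normal;
	int status_change_nid;
};

/* Simplified stand-in for the kernel's struct notifier_block. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *arg);
	struct notifier_block *next;
	int priority;
};

static struct notifier_block *node_chain;

/* Stand-in for register_node_notifier(): higher priority runs first. */
static int register_node_notifier(struct notifier_block *nb)
{
	struct notifier_block **p = &node_chain;

	assert(nb && nb->notifier_call);
	while (*p && (*p)->priority >= nb->priority)
		p = &(*p)->next;
	nb->next = *p;
	*p = nb;
	return 0;
}

/* Stand-in for node_notify(): call every registered callback in order. */
static int node_notify(unsigned long action, void *arg)
{
	struct notifier_block *nb;
	int ret = NOTIFY_OK;

	for (nb = node_chain; nb; nb = nb->next)
		ret = nb->notifier_call(nb, action, arg);
	return ret;
}

/* Hypothetical consumer: one bit per node that currently has memory. */
static unsigned long nodes_with_memory;

static int example_node_callback(struct notifier_block *nb,
				 unsigned long action, void *arg)
{
	struct node_notify *nn = arg;
	int nid = nn->status_change_nid;

	(void)nb;
	if (nid == NUMA_NO_NODE)
		return NOTIFY_OK;

	switch (action) {
	case NODE_BECAME_MEM_AWARE:
		nodes_with_memory |= 1UL << nid;
		break;
	case NODE_BECAME_MEMORYLESS:
		nodes_with_memory &= ~(1UL << nid);
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block example_nb = {
	.notifier_call = example_node_callback,
	.priority = 0,
};
```

In the kernel itself, the converted callers in the diff below register
through hotplug_node_notifier(fn, pri) or register_node_notifier()
rather than a hand-rolled list like this one.
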
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/acpi/numa/hmat.c  |  6 +--
 drivers/base/node.c       | 19 +++++++++
 drivers/cxl/core/region.c | 14 +++----
 drivers/cxl/cxl.h         |  4 +-
 include/linux/memory.h    | 38 ++++++++++++++++++
 kernel/cgroup/cpuset.c    |  2 +-
 mm/memory-tiers.c         |  8 ++--
 mm/memory_hotplug.c       | 84 +++++++++++++++++++++++++++++----------
 mm/slub.c                 | 22 +++++-----
 9 files changed, 148 insertions(+), 49 deletions(-)

diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index bfbb08b1e6af..d18f3efa2149 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -918,10 +918,10 @@ static int hmat_callback(struct notifier_block *self, unsigned long action, void *arg) { struct memory_target *target; - struct memory_notify *mnb =3D arg; + struct node_notify *mnb =3D arg; int pxm, nid =3D mnb->status_change_nid; =20 - if (nid =3D=3D NUMA_NO_NODE || action !=3D MEM_ONLINE) + if (nid =3D=3D NUMA_NO_NODE || action !=3D NODE_BECAME_MEM_AWARE) return NOTIFY_OK; =20 pxm =3D node_to_pxm(nid);
@@ -1074,7 +1074,7 @@ static __init int hmat_init(void) hmat_register_targets(); =20 /* Keep the table and structures if the notifier may use them */ - if (hotplug_memory_notifier(hmat_callback, HMAT_CALLBACK_PRI)) + if (hotplug_node_notifier(hmat_callback, HMAT_CALLBACK_PRI)) goto out_put; =20 if (!hmat_set_default_dram_perf())
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 0ea653fa3433..182c71dfb5b8 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -110,6 +110,25 @@ static const struct attribute_group *node_access_node_groups[] =3D { NULL, }; =20 +static BLOCKING_NOTIFIER_HEAD(node_chain); + +int register_node_notifier(struct notifier_block *nb) +{ + return blocking_notifier_chain_register(&node_chain, nb); +} +EXPORT_SYMBOL(register_node_notifier); + +void unregister_node_notifier(struct notifier_block *nb) +{ +
blocking_notifier_chain_unregister(&node_chain, nb); +} +EXPORT_SYMBOL(unregister_node_notifier); + +int node_notify(unsigned long val, void *v) +{ + return blocking_notifier_call_chain(&node_chain, val, v); +} + static void node_remove_accesses(struct node *node) { struct node_access_nodes *c, *cnext;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e8d11a988fd9..7d187088f557 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2409,12 +2409,12 @@ static int cxl_region_perf_attrs_callback(struct notifier_block *nb, unsigned long action, void *arg) { struct cxl_region *cxlr =3D container_of(nb, struct cxl_region, - memory_notifier); - struct memory_notify *mnb =3D arg; + node_notifier); + struct node_notify *mnb =3D arg; int nid =3D mnb->status_change_nid; int region_nid; =20 - if (nid =3D=3D NUMA_NO_NODE || action !=3D MEM_ONLINE) + if (nid =3D=3D NUMA_NO_NODE || action !=3D NODE_BECAME_MEM_AWARE) return NOTIFY_DONE; =20 /*
@@ -3388,7 +3388,7 @@ static void shutdown_notifiers(void *_cxlr) { struct cxl_region *cxlr =3D _cxlr; =20 - unregister_memory_notifier(&cxlr->memory_notifier); + unregister_node_notifier(&cxlr->node_notifier); unregister_mt_adistance_algorithm(&cxlr->adist_notifier); } =20
@@ -3427,9 +3427,9 @@ static int cxl_region_probe(struct device *dev) if (rc) return rc; =20 - cxlr->memory_notifier.notifier_call =3D cxl_region_perf_attrs_callback; - cxlr->memory_notifier.priority =3D CXL_CALLBACK_PRI; - register_memory_notifier(&cxlr->memory_notifier); + cxlr->node_notifier.notifier_call =3D cxl_region_perf_attrs_callback; + cxlr->node_notifier.priority =3D CXL_CALLBACK_PRI; + register_node_notifier(&cxlr->node_notifier); =20 cxlr->adist_notifier.notifier_call =3D cxl_region_calculate_adistance; cxlr->adist_notifier.priority =3D 100;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index bbbaa0d0a670..d4c9a499de7a 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -532,7 +532,7 @@ struct
cxl_region_params { * @flags: Region state flags * @params: active + config params for the region * @coord: QoS access coordinates for the region - * @memory_notifier: notifier for setting the access coordinates to node + * @node_notifier: notifier for setting the access coordinates to node * @adist_notifier: notifier for calculating the abstract distance of node */ struct cxl_region {
@@ -545,7 +545,7 @@ struct cxl_region { unsigned long flags; struct cxl_region_params params; struct access_coordinate coord[ACCESS_COORDINATE_MAX]; - struct notifier_block memory_notifier; + struct notifier_block node_notifier; struct notifier_block adist_notifier; }; =20
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 12daa6ec7d09..1d814dfbb8a8 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -99,6 +99,14 @@ int set_memory_block_size_order(unsigned int order); #define MEM_PREPARE_ONLINE (1<<6) #define MEM_FINISH_OFFLINE (1<<7) =20 +/* These states are used for numa node notifiers */ +#define NODE_BECOMING_MEM_AWARE (1<<0) +#define NODE_BECAME_MEM_AWARE (1<<1) +#define NODE_BECOMING_MEMORYLESS (1<<2) +#define NODE_BECAME_MEMORYLESS (1<<3) +#define NODE_CANCEL_MEM_AWARE (1<<4) +#define NODE_CANCEL_MEMORYLESS (1<<5) + struct memory_notify { /* * The altmap_start_pfn and altmap_nr_pages fields are designated for
@@ -113,6 +121,11 @@ struct memory_notify { int status_change_nid; }; =20 +struct node_notify { + int status_change_nid_normal; + int status_change_nid; +}; + struct notifier_block; struct mem_section; =20
@@ -149,15 +162,34 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri) { return 0; } + +static inline int register_node_notifier(struct notifier_block *nb) +{ + return 0; +} +static inline void unregister_node_notifier(struct notifier_block *nb) +{ +} +static inline int node_notify(unsigned long val, void *v) +{ + return 0; +} +static inline int hotplug_node_notifier(notifier_fn_t fn, int pri) +{ + return 0; +} #else
/* CONFIG_MEMORY_HOTPLUG */ extern int register_memory_notifier(struct notifier_block *nb); +extern int register_node_notifier(struct notifier_block *nb); extern void unregister_memory_notifier(struct notifier_block *nb); +extern void unregister_node_notifier(struct notifier_block *nb); int create_memory_block_devices(unsigned long start, unsigned long size, struct vmem_altmap *altmap, struct memory_group *group); void remove_memory_block_devices(unsigned long start, unsigned long size); extern void memory_dev_init(void); extern int memory_notify(unsigned long val, void *v); +extern int node_notify(unsigned long val, void *v); extern struct memory_block *find_memory_block(unsigned long section_nr); typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *); extern int walk_memory_blocks(unsigned long start, unsigned long size,
@@ -177,6 +209,12 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func, register_memory_notifier(&fn##_mem_nb); \ }) =20 +#define hotplug_node_notifier(fn, pri) ({ \ + static __meminitdata struct notifier_block fn##_node_nb =3D\ + { .notifier_call =3D fn, .priority =3D pri };\ + register_node_notifier(&fn##_node_nb); \ +}) + #ifdef CONFIG_NUMA void memory_block_add_nid(struct memory_block *mem, int nid, enum meminit_context context);
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 0f910c828973..62a5d34c4331 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3939,7 +3939,7 @@ void __init cpuset_init_smp(void) cpumask_copy(top_cpuset.effective_cpus, cpu_active_mask); top_cpuset.effective_mems =3D node_states[N_MEMORY]; =20 - hotplug_memory_notifier(cpuset_track_online_nodes, CPUSET_CALLBACK_PRI); + hotplug_node_notifier(cpuset_track_online_nodes, CPUSET_CALLBACK_PRI); =20 cpuset_migrate_mm_wq =3D alloc_ordered_workqueue("cpuset_migrate_mm", 0); BUG_ON(!cpuset_migrate_mm_wq);
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index fc14fe53e9b7..dfe6c28c8352 100644
---
a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -872,7 +872,7 @@ static int __meminit memtier_hotplug_callback(struct notifier_block *self, unsigned long action, void *_arg) { struct memory_tier *memtier; - struct memory_notify *arg =3D _arg; + struct node_notify *arg =3D _arg; =20 /* * Only update the node migration order when a node is
@@ -882,13 +882,13 @@ static int __meminit memtier_hotplug_callback(struct notifier_block *self, return notifier_from_errno(0); =20 switch (action) { - case MEM_OFFLINE: + case NODE_BECAME_MEMORYLESS: mutex_lock(&memory_tier_lock); if (clear_node_memory_tier(arg->status_change_nid)) establish_demotion_targets(); mutex_unlock(&memory_tier_lock); break; - case MEM_ONLINE: + case NODE_BECAME_MEM_AWARE: mutex_lock(&memory_tier_lock); memtier =3D set_node_memory_tier(arg->status_change_nid); if (!IS_ERR(memtier))
@@ -929,7 +929,7 @@ static int __init memory_tier_init(void) nodes_and(default_dram_nodes, node_states[N_MEMORY], node_states[N_CPU]); =20 - hotplug_memory_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI); + hotplug_node_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI); return 0; } subsys_initcall(memory_tier_init);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 75401866fb76..4bb9ff282ec9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -701,7 +701,7 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages) =20 /* check which state of node_states will be changed when online memory */ static void node_states_check_changes_online(unsigned long nr_pages, - struct zone *zone, struct memory_notify *arg) + struct zone *zone, struct node_notify *arg) { int nid =3D zone_to_nid(zone); =20
@@ -714,7 +714,7 @@ static void node_states_check_changes_online(unsigned long nr_pages, arg->status_change_nid_normal =3D nid; } =20 -static void node_states_set_node(int node, struct memory_notify *arg) +static void node_states_set_node(int node, struct node_notify *arg) { if
(arg->status_change_nid_normal >=3D 0) node_set_state(node, N_NORMAL_MEMORY);
@@ -1177,7 +1177,9 @@ int online_pages(unsigned long pfn, unsigned long nr_pages, int need_zonelists_rebuild =3D 0; const int nid =3D zone_to_nid(zone); int ret; - struct memory_notify arg; + struct memory_notify mem_arg; + struct node_notify node_arg; + bool cancel_mem_notifier_on_err =3D false, cancel_node_notifier_on_err =3D false; =20 /* * {on,off}lining is constrained to full memory sections (or more
@@ -1194,11 +1196,23 @@ int online_pages(unsigned long pfn, unsigned long nr_pages, /* associate pfn range with the zone */ move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE); =20 - arg.start_pfn =3D pfn; - arg.nr_pages =3D nr_pages; - node_states_check_changes_online(nr_pages, zone, &arg); + mem_arg.start_pfn =3D pfn; + mem_arg.nr_pages =3D nr_pages; + node_states_check_changes_online(nr_pages, zone, &node_arg); =20 - ret =3D memory_notify(MEM_GOING_ONLINE, &arg); + if (node_arg.status_change_nid >=3D 0) { + /* Node is becoming memory aware.
Notify consumers */ + cancel_node_notifier_on_err =3D true; + ret =3D node_notify(NODE_BECOMING_MEM_AWARE, &node_arg); + ret =3D notifier_to_errno(ret); + if (ret) + goto failed_addition; + } + + cancel_mem_notifier_on_err =3D true; + mem_arg.status_change_nid =3D node_arg.status_change_nid; + mem_arg.status_change_nid_normal =3D node_arg.status_change_nid_normal; + ret =3D memory_notify(MEM_GOING_ONLINE, &mem_arg); ret =3D notifier_to_errno(ret); if (ret) goto failed_addition;
@@ -1224,7 +1238,7 @@ int online_pages(unsigned long pfn, unsigned long nr_pages, online_pages_range(pfn, nr_pages); adjust_present_page_count(pfn_to_page(pfn), group, nr_pages); =20 - node_states_set_node(nid, &arg); + node_states_set_node(nid, &node_arg); if (need_zonelists_rebuild) build_all_zonelists(NULL); =20
@@ -1245,16 +1259,26 @@ int online_pages(unsigned long pfn, unsigned long nr_pages, kswapd_run(nid); kcompactd_run(nid); =20 + if (node_arg.status_change_nid >=3D 0) + /* + * Node went from memoryless to having memory.
Notify interested + * consumers + */ + node_notify(NODE_BECAME_MEM_AWARE, &node_arg); + writeback_set_ratelimit(); =20 - memory_notify(MEM_ONLINE, &arg); + memory_notify(MEM_ONLINE, &mem_arg); return 0; =20 failed_addition: pr_debug("online_pages [mem %#010llx-%#010llx] failed\n", (unsigned long long) pfn << PAGE_SHIFT, (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1); - memory_notify(MEM_CANCEL_ONLINE, &arg); + if (cancel_node_notifier_on_err) + node_notify(NODE_CANCEL_MEM_AWARE, &node_arg); + if (cancel_mem_notifier_on_err) + memory_notify(MEM_CANCEL_ONLINE, &mem_arg); remove_pfn_range_from_zone(zone, pfn, nr_pages); return ret; }
@@ -1898,7 +1922,7 @@ early_param("movable_node", cmdline_parse_movable_node); =20 /* check which state of node_states will be changed when offline memory */ static void node_states_check_changes_offline(unsigned long nr_pages, - struct zone *zone, struct memory_notify *arg) + struct zone *zone, struct node_notify *arg) { struct pglist_data *pgdat =3D zone->zone_pgdat; unsigned long present_pages =3D 0;
@@ -1935,7 +1959,7 @@ static void node_states_check_changes_offline(unsigned long nr_pages, arg->status_change_nid =3D zone_to_nid(zone); } =20 -static void node_states_clear_node(int node, struct memory_notify *arg) +static void node_states_clear_node(int node, struct node_notify *arg) { if (arg->status_change_nid_normal >=3D 0) node_clear_state(node, N_NORMAL_MEMORY);
@@ -1963,7 +1987,9 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages, unsigned long pfn, managed_pages, system_ram_pages =3D 0; const int node =3D zone_to_nid(zone); unsigned long flags; - struct memory_notify arg; + struct memory_notify mem_arg; + struct node_notify node_arg; + bool cancel_mem_notifier_on_err =3D false, cancel_node_notifier_on_err =3D false; char *reason; int ret; =20
@@ -2022,11 +2048,22 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages, goto failed_removal_pcplists_disabled; } =20 -
arg.start_pfn =3D start_pfn; - arg.nr_pages =3D nr_pages; - node_states_check_changes_offline(nr_pages, zone, &arg); + mem_arg.start_pfn =3D start_pfn; + mem_arg.nr_pages =3D nr_pages; + node_states_check_changes_offline(nr_pages, zone, &node_arg); + + if (node_arg.status_change_nid >=3D 0) { + cancel_node_notifier_on_err =3D true; + ret =3D node_notify(NODE_BECOMING_MEMORYLESS, &node_arg); + ret =3D notifier_to_errno(ret); + if (ret) + goto failed_removal_isolated; + } =20 - ret =3D memory_notify(MEM_GOING_OFFLINE, &arg); + cancel_mem_notifier_on_err =3D true; + mem_arg.status_change_nid =3D node_arg.status_change_nid; + mem_arg.status_change_nid_normal =3D node_arg.status_change_nid_normal; + ret =3D memory_notify(MEM_GOING_OFFLINE, &mem_arg); ret =3D notifier_to_errno(ret); if (ret) { reason =3D "notifier failure";
@@ -2106,27 +2143,32 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages, * Make sure to mark the node as memory-less before rebuilding the zone * list. Otherwise this node would still appear in the fallback lists. */ - node_states_clear_node(node, &arg); + node_states_clear_node(node, &node_arg); if (!populated_zone(zone)) { zone_pcp_reset(zone); build_all_zonelists(NULL); } =20 - if (arg.status_change_nid >=3D 0) { + if (node_arg.status_change_nid >=3D 0) { kcompactd_stop(node); kswapd_stop(node); + /* Node went memoryless.
Notify interested consumers */ + node_notify(NODE_BECAME_MEMORYLESS, &node_arg); } =20 writeback_set_ratelimit(); =20 - memory_notify(MEM_OFFLINE, &arg); + memory_notify(MEM_OFFLINE, &mem_arg); remove_pfn_range_from_zone(zone, start_pfn, nr_pages); return 0; =20 failed_removal_isolated: /* pushback to free area */ undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); - memory_notify(MEM_CANCEL_OFFLINE, &arg); + if (cancel_node_notifier_on_err) + node_notify(NODE_CANCEL_MEMORYLESS, &node_arg); + if (cancel_mem_notifier_on_err) + memory_notify(MEM_CANCEL_OFFLINE, &mem_arg); failed_removal_pcplists_disabled: lru_cache_enable(); zone_pcp_enable(zone);
diff --git a/mm/slub.c b/mm/slub.c
index 184fd2b14758..74350f6c8ddd 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5928,10 +5928,10 @@ static int slab_mem_going_offline_callback(void *arg) =20 static void slab_mem_offline_callback(void *arg) { - struct memory_notify *marg =3D arg; + struct node_notify *narg =3D arg; int offline_node; =20 - offline_node =3D marg->status_change_nid_normal; + offline_node =3D narg->status_change_nid_normal; =20 /* * If the node still has available memory.
we need kmem_cache_node
@@ -5954,8 +5954,8 @@ static int slab_mem_going_online_callback(void *arg) { struct kmem_cache_node *n; struct kmem_cache *s; - struct memory_notify *marg =3D arg; - int nid =3D marg->status_change_nid_normal; + struct node_notify *narg =3D arg; + int nid =3D narg->status_change_nid_normal; int ret =3D 0; =20 /*
@@ -6007,18 +6007,18 @@ static int slab_memory_callback(struct notifier_block *self, int ret =3D 0; =20 switch (action) { - case MEM_GOING_ONLINE: + case NODE_BECOMING_MEM_AWARE: ret =3D slab_mem_going_online_callback(arg); break; - case MEM_GOING_OFFLINE: + case NODE_BECOMING_MEMORYLESS: ret =3D slab_mem_going_offline_callback(arg); break; - case MEM_OFFLINE: - case MEM_CANCEL_ONLINE: + case NODE_BECAME_MEMORYLESS: + case NODE_CANCEL_MEM_AWARE: slab_mem_offline_callback(arg); break; - case MEM_ONLINE: - case MEM_CANCEL_OFFLINE: + case NODE_BECAME_MEM_AWARE: + case NODE_CANCEL_MEMORYLESS: break; } if (ret)
@@ -6094,7 +6094,7 @@ void __init kmem_cache_init(void) sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0); =20 - hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI); + hotplug_node_notifier(slab_memory_callback, SLAB_CALLBACK_PRI); =20 /* Able to allocate the per node structures */ slab_state =3D PARTIAL;
--=20
2.49.0

From nobody Mon May 12 16:36:40 2025
From: Oscar Salvador <osalvador@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Vlastimil Babka <vbabka@suse.cz>, Hyeonggon Yoo <42.hyeyoo@gmail.com>, mkoutny@suse.com, Dan Williams <dan.j.williams@intel.com>, Jonathan Cameron <Jonathan.Cameron@huawei.com>, Oscar Salvador <osalvador@suse.de>
Subject: [PATCH 2/2] mm,memory_hotplug: Replace status_change_nid parameter in memory_notify
Date: Tue, 1 Apr 2025 11:27:16 +0200
Message-ID: <20250401092716.537512-3-osalvador@suse.de>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250401092716.537512-1-osalvador@suse.de>
References: <20250401092716.537512-1-osalvador@suse.de>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Memory notifier consumers are only interested in which node the memory
being onlined or offlined belongs to, so replace the current
status_change_nid{_normal} fields with a single field, nid, that
specifies the node.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/memory.h |  3 +--
 mm/memory_hotplug.c    |  6 ++----
 mm/page_ext.c          | 12 +-----------
 3 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/include/linux/memory.h b/include/linux/memory.h
index 1d814dfbb8a8..4d8884578a1a 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -117,8 +117,7 @@ struct memory_notify { unsigned long altmap_nr_pages; unsigned long start_pfn; unsigned long nr_pages; - int status_change_nid_normal; - int status_change_nid; + int nid; }; =20 struct node_notify {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4bb9ff282ec9..185d799c79e2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1198,6 +1198,7 @@ int online_pages(unsigned long pfn, unsigned long nr_pages, =20 mem_arg.start_pfn =3D pfn; mem_arg.nr_pages =3D nr_pages; + mem_arg.nid =3D nid; node_states_check_changes_online(nr_pages, zone, &node_arg); =20 if (node_arg.status_change_nid >=3D 0) {
@@ -1210,8 +1211,6 @@ int online_pages(unsigned long pfn, unsigned long nr_pages, } =20 cancel_mem_notifier_on_err =3D true; - mem_arg.status_change_nid =3D node_arg.status_change_nid; - mem_arg.status_change_nid_normal =3D node_arg.status_change_nid_normal; ret =3D memory_notify(MEM_GOING_ONLINE,
&mem_arg); ret =3D notifier_to_errno(ret); if (ret)
@@ -2050,6 +2049,7 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages, =20 mem_arg.start_pfn =3D start_pfn; mem_arg.nr_pages =3D nr_pages; + mem_arg.nid =3D node; node_states_check_changes_offline(nr_pages, zone, &node_arg); =20 if (node_arg.status_change_nid >=3D 0) {
@@ -2061,8 +2061,6 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages, } =20 cancel_mem_notifier_on_err =3D true; - mem_arg.status_change_nid =3D node_arg.status_change_nid; - mem_arg.status_change_nid_normal =3D node_arg.status_change_nid_normal; ret =3D memory_notify(MEM_GOING_OFFLINE, &mem_arg); ret =3D notifier_to_errno(ret); if (ret) {
diff --git a/mm/page_ext.c b/mm/page_ext.c
index c351fdfe9e9a..477e6f24b7ab 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -378,16 +378,6 @@ static int __meminit online_page_ext(unsigned long start_pfn, start =3D SECTION_ALIGN_DOWN(start_pfn); end =3D SECTION_ALIGN_UP(start_pfn + nr_pages); =20 - if (nid =3D=3D NUMA_NO_NODE) { - /* - * In this case, "nid" already exists and contains valid memory. - * "start_pfn" passed to us is a pfn which is an arg for - * online__pages(), and start_pfn should exist. - */ - nid =3D pfn_to_nid(start_pfn); - VM_BUG_ON(!node_online(nid)); - } - for (pfn =3D start; !fail && pfn < end; pfn +=3D PAGES_PER_SECTION) fail =3D init_section_page_ext(pfn, nid); if (!fail)
@@ -436,7 +426,7 @@ static int __meminit page_ext_callback(struct notifier_block *self, switch (action) { case MEM_GOING_ONLINE: ret =3D online_page_ext(mn->start_pfn, - mn->nr_pages, mn->status_change_nid); + mn->nr_pages, mn->nid); break; case MEM_OFFLINE: offline_page_ext(mn->start_pfn,
--=20
2.49.0
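
For patch 2/2, the effect on a memory notifier consumer can be sketched
in userspace as follows. The struct memory_notify layout mirrors the
fields visible in the diff above; the MEM_GOING_ONLINE value, MAX_NODES,
pages_going_online and example_mem_callback are assumptions made up for
this illustration and are not part of the series.

```c
#include <assert.h>

#define MAX_NODES		4		/* hypothetical limit for the sketch */
#define MEM_GOING_ONLINE	(1 << 3)	/* assumed value, illustration only */

/* Layout after patch 2/2: a single nid replaces status_change_nid{_normal}. */
struct memory_notify {
	unsigned long altmap_start_pfn;
	unsigned long altmap_nr_pages;
	unsigned long start_pfn;
	unsigned long nr_pages;
	int nid;
};

static unsigned long pages_going_online[MAX_NODES];

/* Hypothetical consumer: account pages per node as they go online. */
static int example_mem_callback(unsigned long action, void *arg)
{
	struct memory_notify *mn = arg;

	if (action != MEM_GOING_ONLINE)
		return 0;

	/*
	 * online_pages() and offline_pages() set nid for every event, so
	 * no NUMA_NO_NODE fallback (such as pfn_to_nid()) is needed here.
	 */
	assert(mn->nid >= 0 && mn->nid < MAX_NODES);
	pages_going_online[mn->nid] += mn->nr_pages;
	return 0;
}
```

This is the same simplification the series applies to
online_page_ext(), which drops its NUMA_NO_NODE special case and uses
mn->nid directly.
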