Add new memory notifiers to mimic the dynamic ACPI event triggered logic
for memory hotplug on platforms that do not generate such events. This
will be used to implement "memmap on memory" feature for s390 in a later
patch.

Platforms such as x86 can support physical memory hotplug via ACPI. When
there is physical memory hotplug, ACPI event leads to the memory
addition with the following callchain:
    acpi_memory_device_add()
      -> acpi_memory_enable_device()
         -> __add_memory()

After this, the hotplugged memory is physically accessible, and altmap
support prepared, before the "memmap on memory" initialization in
memory_block_online() is called.

On s390, memory hotplug works in a different way. The available hotplug
memory has to be defined upfront in the hypervisor, but it is made
physically accessible only when the user sets it online via sysfs,
currently in the MEM_GOING_ONLINE notifier. This requires calling
add_memory() during early memory detection, in order to get the sysfs
representation, but we cannot use "memmap on memory" altmap support at
this stage, w/o having it physically accessible.

Since no ACPI or similar events are generated, there is no way to set up
altmap support, or even make the memory physically accessible at all,
before the "memmap on memory" initialization in memory_block_online().

The new MEM_PHYS_ONLINE notifier allows to work around this, by
providing a hook to make the memory physically accessible, and also call
__add_pages() with altmap support, early in memory_block_online().
Similarly, the MEM_PHYS_OFFLINE notifier allows to make the memory
inaccessible and call __remove_pages(), at the end of
memory_block_offline().

Calling __add/remove_pages() requires mem_hotplug_lock, so move
mem_hotplug_begin/done() to include the new notifiers.

All architectures ignore unknown memory notifiers, so this patch should
not introduce any functional changes.

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
---
 drivers/base/memory.c  | 18 +++++++++++++++++-
 include/linux/memory.h |  2 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 1e9f6a1749b9..604940f62246 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -185,6 +185,7 @@ static int memory_block_online(struct memory_block *mem)
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
 	unsigned long nr_vmemmap_pages = 0;
+	struct memory_notify arg;
 	struct zone *zone;
 	int ret;

@@ -194,6 +195,14 @@ static int memory_block_online(struct memory_block *mem)
 	zone = zone_for_pfn_range(mem->online_type, mem->nid, mem->group,
 				  start_pfn, nr_pages);

+	arg.start_pfn = start_pfn;
+	arg.nr_pages = nr_pages;
+	mem_hotplug_begin();
+	ret = memory_notify(MEM_PHYS_ONLINE, &arg);
+	ret = notifier_to_errno(ret);
+	if (ret)
+		goto out_notifier;
+
 	/*
 	 * Although vmemmap pages have a different lifecycle than the pages
 	 * they describe (they remain until the memory is unplugged), doing
@@ -204,7 +213,6 @@ static int memory_block_online(struct memory_block *mem)
 	if (mem->altmap)
 		nr_vmemmap_pages = mem->altmap->free;

-	mem_hotplug_begin();
 	if (nr_vmemmap_pages) {
 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
 		if (ret)
@@ -228,7 +236,11 @@ static int memory_block_online(struct memory_block *mem)
 					  nr_vmemmap_pages);

 	mem->zone = zone;
+	mem_hotplug_done();
+	return ret;
 out:
+	memory_notify(MEM_PHYS_OFFLINE, &arg);
+out_notifier:
 	mem_hotplug_done();
 	return ret;
 }
@@ -238,6 +250,7 @@ static int memory_block_offline(struct memory_block *mem)
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
 	unsigned long nr_vmemmap_pages = 0;
+	struct memory_notify arg;
 	int ret;

 	if (!mem->zone)
@@ -269,6 +282,9 @@ static int memory_block_offline(struct memory_block *mem)
 		mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);

 	mem->zone = NULL;
+	arg.start_pfn = start_pfn;
+	arg.nr_pages = nr_pages;
+	memory_notify(MEM_PHYS_OFFLINE, &arg);
 out:
 	mem_hotplug_done();
 	return ret;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index f53cfdaaaa41..5d8b962b8fa1 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -96,6 +96,8 @@ int set_memory_block_size_order(unsigned int order);
 #define MEM_GOING_ONLINE (1<<3)
 #define MEM_CANCEL_ONLINE (1<<4)
 #define MEM_CANCEL_OFFLINE (1<<5)
+#define MEM_PHYS_ONLINE (1<<6)
+#define MEM_PHYS_OFFLINE (1<<7)

 struct memory_notify {
 	unsigned long start_pfn;
--
2.41.0

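To make the control flow of the patched memory_block_online() easier to follow, here is a small userspace model of it. It is only a sketch, not kernel code: `model_state`, `model_notify()` and `model_block_online()` are hypothetical names, the lock is reduced to a counter, and the handler returns an errno directly instead of going through notifier_to_errno(). It shows the two exit labels: a failing MEM_PHYS_ONLINE handler skips the MEM_PHYS_OFFLINE call, while a later failure undoes the physical onlining before the lock is dropped.

```c
/* Userspace model only; none of this is kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Event values copied from the include/linux/memory.h hunk above. */
#define MEM_PHYS_ONLINE  (1 << 6)
#define MEM_PHYS_OFFLINE (1 << 7)

struct memory_notify {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Hypothetical stand-in for the state an architecture would track. */
struct model_state {
	bool accessible;       /* set by the MEM_PHYS_ONLINE handler */
	bool fail_phys_online; /* inject a handler failure */
	bool fail_memmap_init; /* inject a failure after the handler ran */
	int lock_depth;        /* models mem_hotplug_begin()/done() */
};

/* Hypothetical arch handler: returns 0 or a negative errno. */
int model_notify(struct model_state *st, unsigned long action,
		 const struct memory_notify *arg)
{
	(void)arg;
	switch (action) {
	case MEM_PHYS_ONLINE:
		if (st->fail_phys_online)
			return -EIO;
		st->accessible = true;
		return 0;
	case MEM_PHYS_OFFLINE:
		st->accessible = false;
		return 0;
	default:
		return 0; /* unknown events are ignored */
	}
}

/* Mirrors the control flow of the patched memory_block_online(). */
int model_block_online(struct model_state *st, unsigned long start_pfn,
		       unsigned long nr_pages)
{
	struct memory_notify arg = { .start_pfn = start_pfn, .nr_pages = nr_pages };
	int ret;

	st->lock_depth++; /* mem_hotplug_begin() */
	ret = model_notify(st, MEM_PHYS_ONLINE, &arg);
	if (ret)
		goto out_notifier;

	/* Stand-in for mhp_init_memmap_on_memory() and online_pages(). */
	ret = st->fail_memmap_init ? -ENOMEM : 0;
	if (ret)
		goto out;

	st->lock_depth--; /* mem_hotplug_done() */
	return 0;
out:
	model_notify(st, MEM_PHYS_OFFLINE, &arg);
out_notifier:
	st->lock_depth--; /* mem_hotplug_done() */
	return ret;
}
```

In this model every path leaves `lock_depth` at zero, and the memory is only left accessible when onlining succeeded.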
On 14.11.23 19:02, Sumanth Korikkar wrote:
> Add new memory notifiers to mimic the dynamic ACPI event triggered logic
> for memory hotplug on platforms that do not generate such events. This
> will be used to implement "memmap on memory" feature for s390 in a later
> patch.

[...]

> All architectures ignore unknown memory notifiers, so this patch should
> not introduce any functional changes.

Sorry to say, no. No hacks please, and this is a hack for memory that
has already been added to the system.

If you want memory without an altmap to suddenly not have an altmap
anymore, then look into removing and readding that memory, or some way
to convert offline memory.

But certainly not on the online/offline path triggered by sysfs.

--
Cheers,

David / dhildenb

On Tue, 14 Nov 2023 19:27:35 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 14.11.23 19:02, Sumanth Korikkar wrote:

[...]

> > All architectures ignore unknown memory notifiers, so this patch should
> > not introduce any functional changes.
>
> Sorry to say, no. No hacks please, and this is a hack for memory that
> has already been added to the system.

IIUC, when we enter memory_block_online(), memory has always already
been added to the system, on all architectures. E.g. via ACPI events
on x86, or with the existing s390 hack, where we add it at boot time,
including memmap allocated from system memory. Without a preceding
add_memory() you cannot reach memory_block_online() via sysfs online.

The difference is that for s390, the memory is not yet physically
accessible, and therefore we cannot use the existing altmap support
in memory_block_online(), which requires that the memory is accessible
before it calls mhp_init_memmap_on_memory().

Currently, on s390 we make the memory accessible in the GOING_ONLINE
notifier, by sclp call to the hypervisor. That is too late for altmap
setup code in memory_block_online(), therefore we'd like to introduce
the new notifier, to have a hook where we can make it accessible
earlier, and after that there is no difference to how it works for
other architectures, and we can make use of the existing altmap support.

> If you want memory without an altmap to suddenly not have an altmap
> anymore, then look into removing and readding that memory, or some way
> to convert offline memory.

We do not want to have memory suddenly not have an altmap support
any more, but simply get a hook so that we can prepare the memory
to have altmap support. This means making it physically accessible,
and calling __add_pages() for altmap support, which for other
architecture has already happened before.

Of course, it is a hack for s390, that we must skip __add_pages()
in the initial (arch_)add_memory() during boot time, when we want
altmap support, because the memory simply is not accessible at that
time. But s390 memory hotplug support has always been a hack, and
had to be, because of how it is implemented by the architecture.

So we replace one hack with another one, that has the huge advantage
that we do not need to allocate struct pages upfront from system
memory any more, for the whole possible online memory range.

And the current approach comes without any change to existing
interfaces, and minimal change to common code, i.e. these new
notifiers, that should not have any impact on other architectures.

What exactly is your concern regarding the new notifiers? Is it
useless no-op notifier calls on other archs (not sure if they
would get optimized out by compiler)?

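The cost of allocating struct pages upfront from system memory can be estimated with simple arithmetic. The sketch below assumes 4 KiB pages and a 64-byte struct page; both values are assumptions for illustration (the thread does not state them), and `memmap_pages_for()` / `memmap_bytes_for()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; neither is stated in the thread. */
#define ASSUMED_PAGE_SIZE        4096u /* bytes per page */
#define ASSUMED_STRUCT_PAGE_SIZE 64u   /* bytes per struct page */

/* Pages needed to hold the memmap (array of struct page) for a range. */
uint64_t memmap_pages_for(uint64_t range_bytes)
{
	uint64_t nr_pages = range_bytes / ASSUMED_PAGE_SIZE;
	uint64_t memmap_bytes = nr_pages * ASSUMED_STRUCT_PAGE_SIZE;

	/* Round up to whole pages. */
	return (memmap_bytes + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE;
}

/* Same quantity expressed in bytes. */
uint64_t memmap_bytes_for(uint64_t range_bytes)
{
	return memmap_pages_for(range_bytes) * ASSUMED_PAGE_SIZE;
}
```

Under these assumptions, a 128 MiB range needs 512 pages (2 MiB) of memmap, and 1 TiB of possible standby memory needs 16 GiB. Without an altmap that memmap comes out of already-online system memory; with "memmap on memory" it is carved out of the range being onlined.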
On 15.11.23 16:03, Gerald Schaefer wrote:
> On Tue, 14 Nov 2023 19:27:35 +0100
> David Hildenbrand <david@redhat.com> wrote:

[...]

>> Sorry to say, no. No hacks please, and this is a hack for memory that
>> has already been added to the system.
>
> IIUC, when we enter memory_block_online(), memory has always already
> been added to the system, on all architectures. E.g. via ACPI events
> on x86, or with the existing s390 hack, where we add it at boot time,
> including memmap allocated from system memory. Without a preceding
> add_memory() you cannot reach memory_block_online() via sysfs online.

Adding that memory block at boot time is the legacy leftover s390x is
carrying along; and now we want to "workaround" that by adding s390x
special handling for online/offlining code and having memory blocks
without any memmap, or configuring an altmap in the very last minute
using a s390x specific memory notifier.

Instead, if you want to support the altmap, the kernel should not add
standby memory to the system (if configured for this new feature), but
instead only remember the standby memory ranges so it knows what can
later be added and what can't.

From there, users should have an interface where they can actually add
memory to the system, and either online it manually or just let the
kernel online it automatically.

s390x code will call add_memory() and properly prepare an altmap if
requested and make that standby memory available. You can then even
have an interface to remove that memory again once offline. That will
work with an altmap or without an altmap.

This approach is aligned with any other code that hot(un)plugs memory
and is compatible with things like variable-sized memory blocks people
have been talking about quite a while already, and altmaps that span
multiple memory blocks to make gigantic pages in such ranges usable.

Sure, you'll have a new interface and have to enable the new handling
for the new kernel, but you're asking for supporting a new feature that
cannot be supported cleanly just like any other architecture does. But
it's a clean approach and probably should have been done that way right
from the start (decades ago).

Note: We do have the same for other architectures without ACPI that add
memory via the probe interface. But IIRC we cannot really do any checks
there, because these architectures have no way of identifying what

[...]

> Of course, it is a hack for s390, that we must skip __add_pages()
> in the initial (arch_)add_memory() during boot time, when we want
> altmap support, because the memory simply is not accessible at that
> time. But s390 memory hotplug support has always been a hack, and
> had to be, because of how it is implemented by the architecture.

I wrote the above paragraph before reading this; and it's fully aligned
with what I said above.

> So we replace one hack with another one, that has the huge advantage
> that we do not need to allocate struct pages upfront from system
> memory any more, for the whole possible online memory range.
>
> And the current approach comes without any change to existing
> interfaces, and minimal change to common code, i.e. these new
> notifiers, that should not have any impact on other architectures.
>
> What exactly is your concern regarding the new notifiers? Is it
> useless no-op notifier calls on other archs (not sure if they
> would get optimized out by compiler)?

That it makes hotplug code more special because of s390x, instead of
cleaning up that legacy code.

--
Cheers,

David / dhildenb

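The alternative outlined in this reply (remember standby ranges at boot, add them only on user request, optionally with an altmap, and allow removal once offline) can be modeled in userspace as a small registry. This is only a sketch of the idea as described in the mail: every name in it (`standby_registry`, `standby_remember()`, `standby_add()`, `standby_remove()`) is hypothetical, and the spots where real code would make memory accessible and call add_memory() are marked by comments only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STANDBY_RANGES 8

enum range_state { RANGE_STANDBY, RANGE_ADDED };

/* Hypothetical bookkeeping for one standby memory range. */
struct standby_range {
	uint64_t start;
	uint64_t size;
	enum range_state state;
	bool altmap; /* was the range added with an altmap? */
};

struct standby_registry {
	struct standby_range ranges[MAX_STANDBY_RANGES];
	size_t count;
};

/* Boot time: only remember the range; nothing is added to the system. */
int standby_remember(struct standby_registry *reg, uint64_t start, uint64_t size)
{
	if (reg->count == MAX_STANDBY_RANGES)
		return -ENOSPC;
	reg->ranges[reg->count++] = (struct standby_range){
		.start = start, .size = size, .state = RANGE_STANDBY,
	};
	return 0;
}

static struct standby_range *standby_find(struct standby_registry *reg,
					  uint64_t start)
{
	for (size_t i = 0; i < reg->count; i++)
		if (reg->ranges[i].start == start)
			return &reg->ranges[i];
	return NULL;
}

/* User request: add a remembered range, with or without an altmap. */
int standby_add(struct standby_registry *reg, uint64_t start, bool want_altmap)
{
	struct standby_range *r = standby_find(reg, start);

	if (!r)
		return -EINVAL; /* not a known standby range */
	if (r->state == RANGE_ADDED)
		return -EEXIST;
	/* Real code would make the memory accessible, then add_memory(). */
	r->state = RANGE_ADDED;
	r->altmap = want_altmap;
	return 0;
}

/* User request: remove a previously added (and offline) range again. */
int standby_remove(struct standby_registry *reg, uint64_t start)
{
	struct standby_range *r = standby_find(reg, start);

	if (!r || r->state != RANGE_ADDED)
		return -EINVAL;
	/* Real code would remove the memory and make it inaccessible. */
	r->state = RANGE_STANDBY;
	r->altmap = false;
	return 0;
}
```

In this model the altmap decision is taken at add time rather than on the online path, so a range can be removed and added again with a different choice.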
On Tue, Nov 14, 2023 at 07:27:35PM +0100, David Hildenbrand wrote:
> On 14.11.23 19:02, Sumanth Korikkar wrote:

[...]

> Sorry to say, no. No hacks please, and this is a hack for memory that has
> already been added to the system.
>
> If you want memory without an altmap to suddenly not have an altmap anymore,
> then look into removing and readding that memory, or some way to convert
> offline memory.

Sorry, I couldn't get the context. Could you please give me more details?

Thanks

On 15.11.23 15:23, Sumanth Korikkar wrote:
> On Tue, Nov 14, 2023 at 07:27:35PM +0100, David Hildenbrand wrote:

[...]

>> Sorry to say, no. No hacks please, and this is a hack for memory that has
>> already been added to the system.
>>
>> If you want memory without an altmap to suddenly not have an altmap anymore,
>> then look into removing and readding that memory, or some way to convert
>> offline memory.
>
> Sorry, I couldn't get the context. Could you please give me more details?

See my reply to Gerald.

In an ideal world, there would not be any new callbacks, we would get
rid of them, and just let the architecture properly hotplug memory to
the system when requested by the user.

--
Cheers,

David / dhildenb