[PATCH 4/8] mm/memory_hotplug: introduce MEM_PHYS_ONLINE/OFFLINE memory notifiers

Sumanth Korikkar posted 8 patches 2 years, 1 month ago
There is a newer version of this series
Posted by Sumanth Korikkar 2 years, 1 month ago
Add new memory notifiers to mimic the logic triggered by dynamic ACPI
events for memory hotplug, on platforms that do not generate such
events. This will be used to implement the "memmap on memory" feature
for s390 in a later patch.

Platforms such as x86 can support physical memory hotplug via ACPI. When
memory is physically hotplugged, an ACPI event triggers the memory
addition via the following call chain:
acpi_memory_device_add()
  -> acpi_memory_enable_device()
     -> __add_memory()

After this, the hotplugged memory is physically accessible and altmap
support is prepared, before the "memmap on memory" initialization in
memory_block_online() is called.

On s390, memory hotplug works differently. The available hotplug
memory has to be defined upfront in the hypervisor, but it is made
physically accessible only when the user sets it online via sysfs,
currently in the MEM_GOING_ONLINE notifier. This requires calling
add_memory() during early memory detection in order to get the sysfs
representation, but we cannot use "memmap on memory" altmap support at
this stage, because the memory is not yet physically accessible.

Since no ACPI or similar events are generated, there is no way to set up
altmap support, or even make the memory physically accessible at all,
before the "memmap on memory" initialization in memory_block_online().

The new MEM_PHYS_ONLINE notifier allows working around this by
providing a hook to make the memory physically accessible, and also to
call __add_pages() with altmap support, early in memory_block_online().
Similarly, the MEM_PHYS_OFFLINE notifier allows making the memory
inaccessible and calling __remove_pages() at the end of
memory_block_offline().

Calling __add_pages()/__remove_pages() requires mem_hotplug_lock, so move
mem_hotplug_begin()/mem_hotplug_done() to also cover the new notifiers.

All architectures ignore unknown memory notifier actions, so this patch
should not introduce any functional changes.

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
---
 drivers/base/memory.c  | 18 +++++++++++++++++-
 include/linux/memory.h |  2 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 1e9f6a1749b9..604940f62246 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -185,6 +185,7 @@ static int memory_block_online(struct memory_block *mem)
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
 	unsigned long nr_vmemmap_pages = 0;
+	struct memory_notify arg;
 	struct zone *zone;
 	int ret;
 
@@ -194,6 +195,14 @@ static int memory_block_online(struct memory_block *mem)
 	zone = zone_for_pfn_range(mem->online_type, mem->nid, mem->group,
 				  start_pfn, nr_pages);
 
+	arg.start_pfn = start_pfn;
+	arg.nr_pages = nr_pages;
+	mem_hotplug_begin();
+	ret = memory_notify(MEM_PHYS_ONLINE, &arg);
+	ret = notifier_to_errno(ret);
+	if (ret)
+		goto out_notifier;
+
 	/*
 	 * Although vmemmap pages have a different lifecycle than the pages
 	 * they describe (they remain until the memory is unplugged), doing
@@ -204,7 +213,6 @@ static int memory_block_online(struct memory_block *mem)
 	if (mem->altmap)
 		nr_vmemmap_pages = mem->altmap->free;
 
-	mem_hotplug_begin();
 	if (nr_vmemmap_pages) {
 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
 		if (ret)
@@ -228,7 +236,11 @@ static int memory_block_online(struct memory_block *mem)
 					  nr_vmemmap_pages);
 
 	mem->zone = zone;
+	mem_hotplug_done();
+	return ret;
 out:
+	memory_notify(MEM_PHYS_OFFLINE, &arg);
+out_notifier:
 	mem_hotplug_done();
 	return ret;
 }
@@ -238,6 +250,7 @@ static int memory_block_offline(struct memory_block *mem)
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
 	unsigned long nr_vmemmap_pages = 0;
+	struct memory_notify arg;
 	int ret;
 
 	if (!mem->zone)
@@ -269,6 +282,9 @@ static int memory_block_offline(struct memory_block *mem)
 		mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
 
 	mem->zone = NULL;
+	arg.start_pfn = start_pfn;
+	arg.nr_pages = nr_pages;
+	memory_notify(MEM_PHYS_OFFLINE, &arg);
 out:
 	mem_hotplug_done();
 	return ret;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index f53cfdaaaa41..5d8b962b8fa1 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -96,6 +96,8 @@ int set_memory_block_size_order(unsigned int order);
 #define	MEM_GOING_ONLINE	(1<<3)
 #define	MEM_CANCEL_ONLINE	(1<<4)
 #define	MEM_CANCEL_OFFLINE	(1<<5)
+#define	MEM_PHYS_ONLINE		(1<<6)
+#define	MEM_PHYS_OFFLINE	(1<<7)
 
 struct memory_notify {
 	unsigned long start_pfn;
-- 
2.41.0
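The lock and notifier ordering that the memory.c hunks above give memory_block_online() can be checked with a small userspace model. This is a minimal sketch, not kernel code: memory_notify(), mhp_init_memmap_on_memory() and online_pages() are mocks that only borrow the kernel names, the mock notifier returns an errno directly instead of a NOTIFY_* value passed through notifier_to_errno(), and the fail_* knobs, event log and MOCK_ENOMEM are hypothetical. It models the property the patch relies on: mem_hotplug_lock is released on every path, and MEM_PHYS_OFFLINE is sent only if MEM_PHYS_ONLINE succeeded.

```c
#include <assert.h>

/* Action values as defined in include/linux/memory.h with this patch. */
#define MEM_ONLINE		(1 << 0)
#define MEM_GOING_ONLINE	(1 << 3)
#define MEM_PHYS_ONLINE		(1 << 6)
#define MEM_PHYS_OFFLINE	(1 << 7)

#define MOCK_ENOMEM		12	/* stand-in for ENOMEM */

/* Userspace stand-ins: lock nesting depth and a log of sent actions. */
static int lock_depth;
static int events[8];
static int nr_events;
/* Hypothetical failure-injection knobs. */
static int fail_phys_online, fail_init_memmap;

static void mem_hotplug_begin(void) { lock_depth++; }

static void mem_hotplug_done(void)
{
	assert(lock_depth > 0);		/* must pair with mem_hotplug_begin() */
	lock_depth--;
}

/* Mock: logs the action and returns an errno directly (0 on success). */
static int memory_notify(int action)
{
	assert(nr_events < 8);
	events[nr_events++] = action;
	if (action == MEM_PHYS_ONLINE && fail_phys_online)
		return -MOCK_ENOMEM;
	return 0;
}

static int mhp_init_memmap_on_memory(void)
{
	return fail_init_memmap ? -MOCK_ENOMEM : 0;
}

/* Mock: the real online_pages() sends MEM_GOING_ONLINE and MEM_ONLINE. */
static int online_pages(void)
{
	memory_notify(MEM_GOING_ONLINE);
	memory_notify(MEM_ONLINE);
	return 0;
}

/* Same lock/notifier ordering as the patched memory_block_online(). */
static int memory_block_online(int nr_vmemmap_pages)
{
	int ret;

	mem_hotplug_begin();
	ret = memory_notify(MEM_PHYS_ONLINE);
	if (ret)
		goto out_notifier;	/* nothing to undo yet */

	if (nr_vmemmap_pages) {
		ret = mhp_init_memmap_on_memory();
		if (ret)
			goto out;	/* undo MEM_PHYS_ONLINE */
	}
	ret = online_pages();
	if (ret)
		goto out;

	mem_hotplug_done();
	return 0;
out:
	memory_notify(MEM_PHYS_OFFLINE);
out_notifier:
	mem_hotplug_done();
	return ret;
}

static void reset(void)
{
	lock_depth = 0;
	nr_events = 0;
	fail_phys_online = 0;
	fail_init_memmap = 0;
}
```

For example, after reset() and setting fail_init_memmap, memory_block_online(1) returns -MOCK_ENOMEM, leaves lock_depth at 0, and logs MEM_PHYS_ONLINE followed by MEM_PHYS_OFFLINE.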
Re: [PATCH 4/8] mm/memory_hotplug: introduce MEM_PHYS_ONLINE/OFFLINE memory notifiers
Posted by David Hildenbrand 2 years, 1 month ago
On 14.11.23 19:02, Sumanth Korikkar wrote:
> [...]

Sorry to say, no. No hacks please, and this is a hack for memory that 
has already been added to the system.

If you want memory without an altmap to suddenly not have an altmap 
anymore, then look into removing and readding that memory, or some way 
to convert offline memory.

But certainly not on the online/offline path triggered by sysfs.

-- 
Cheers,

David / dhildenb
Re: [PATCH 4/8] mm/memory_hotplug: introduce MEM_PHYS_ONLINE/OFFLINE memory notifiers
Posted by Gerald Schaefer 2 years, 1 month ago
On Tue, 14 Nov 2023 19:27:35 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 14.11.23 19:02, Sumanth Korikkar wrote:
> > [...]
> 
> Sorry to say, no. No hacks please, and this is a hack for memory that 
> has already been added to the system.

IIUC, when we enter memory_block_online(), memory has always already
been added to the system, on all architectures. E.g. via ACPI events
on x86, or with the existing s390 hack, where we add it at boot time,
including memmap allocated from system memory. Without a preceding
add_memory() you cannot reach memory_block_online() via sysfs online.

The difference is that for s390, the memory is not yet physically
accessible, and therefore we cannot use the existing altmap support
in memory_block_online(), which requires that the memory is accessible
before it calls mhp_init_memmap_on_memory().

Currently, on s390 we make the memory accessible in the MEM_GOING_ONLINE
notifier, via an SCLP call to the hypervisor. That is too late for the
altmap setup code in memory_block_online(), so we'd like to introduce
the new notifier as a hook where we can make it accessible earlier.
After that, there is no difference to how it works on other
architectures, and we can make use of the existing altmap support.

> 
> If you want memory without an altmap to suddenly not have an altmap 
> anymore, then look into removing and readding that memory, or some way 
> to convert offline memory.

We do not want memory to suddenly lose altmap support; we simply want
a hook so that we can prepare the memory for altmap support. This
means making it physically accessible and calling __add_pages() with
altmap support, which on other architectures has already happened at
that point.

Of course, it is a hack for s390 that we must skip __add_pages()
in the initial (arch_)add_memory() at boot time when we want altmap
support, because the memory simply is not accessible at that time.
But s390 memory hotplug support has always been a hack, and had to be,
because of how the architecture implements it.

So we replace one hack with another one, that has the huge advantage
that we do not need to allocate struct pages upfront from system
memory any more, for the whole possible online memory range.

And the current approach comes without any change to existing
interfaces, and minimal change to common code, i.e. these new
notifiers, that should not have any impact on other architectures.

What exactly is your concern regarding the new notifiers? Is it the
useless no-op notifier calls on other archs (not sure whether the
compiler would optimize them out)?
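To make the commit message's claim concrete (unknown actions are ignored, so the new actions cost other architectures only a walk over the notifier chain), here is a userspace sketch. The NOTIFY_* values, notifier_to_errno() and notifier_from_errno() follow include/linux/notifier.h. The callback signature, call_chain(), legacy_cb(), phys_cb() and phys_available are simplified or hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes as defined in include/linux/notifier.h. */
#define NOTIFY_DONE		0x0000
#define NOTIFY_OK		0x0001
#define NOTIFY_STOP_MASK	0x8000
#define NOTIFY_BAD		(NOTIFY_STOP_MASK | 0x0002)

/* Memory notifier actions (include/linux/memory.h, with this patch). */
#define MEM_OFFLINE		(1 << 2)
#define MEM_GOING_ONLINE	(1 << 3)
#define MEM_PHYS_ONLINE		(1 << 6)
#define MEM_PHYS_OFFLINE	(1 << 7)

/* Simplified callback type: the kernel one also gets the notifier_block
 * and a data pointer (struct memory_notify for memory notifiers). */
typedef int (*notifier_fn)(unsigned long action);

/* Same conversions as the kernel's notifier_to/from_errno(). */
static int notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

static int notifier_from_errno(int err)
{
	if (err)
		return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
	return NOTIFY_OK;
}

/* Hypothetical callback of an architecture unaware of the new actions. */
static int legacy_cb(unsigned long action)
{
	switch (action) {
	case MEM_GOING_ONLINE:
	case MEM_OFFLINE:
		return NOTIFY_OK;	/* arch-specific work would go here */
	default:
		return NOTIFY_DONE;	/* unknown action: ignored */
	}
}

static int phys_available = 1;	/* hypothetical: access can be granted */

/* Hypothetical s390-style callback that handles the new actions. */
static int phys_cb(unsigned long action)
{
	switch (action) {
	case MEM_PHYS_ONLINE:
		/* would make memory accessible and call __add_pages() */
		return phys_available ? NOTIFY_OK : notifier_from_errno(-12);
	case MEM_PHYS_OFFLINE:
		/* would call __remove_pages() and revoke access */
		return NOTIFY_OK;
	default:
		return NOTIFY_DONE;
	}
}

/* Simplified chain walk: stop at the first result with NOTIFY_STOP_MASK. */
static int call_chain(const notifier_fn *chain, size_t n, unsigned long action)
{
	int ret = NOTIFY_DONE;

	assert(chain != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		ret = chain[i](action);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}
```

With only legacy_cb registered, MEM_PHYS_ONLINE yields NOTIFY_DONE, which notifier_to_errno() maps to 0, so onlining proceeds unchanged; an error from phys_cb (here -12) comes back out of notifier_to_errno() unchanged.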
Re: [PATCH 4/8] mm/memory_hotplug: introduce MEM_PHYS_ONLINE/OFFLINE memory notifiers
Posted by David Hildenbrand 2 years, 1 month ago
On 15.11.23 16:03, Gerald Schaefer wrote:
> On Tue, 14 Nov 2023 19:27:35 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 14.11.23 19:02, Sumanth Korikkar wrote:
>>> [...]
>>
>> Sorry to say, no. No hacks please, and this is a hack for memory that
>> has already been added to the system.
> 
> IIUC, when we enter memory_block_online(), memory has always already
> been added to the system, on all architectures. E.g. via ACPI events
> on x86, or with the existing s390 hack, where we add it at boot time,
> including memmap allocated from system memory. Without a preceding
> add_memory() you cannot reach memory_block_online() via sysfs online.

Adding that memory block at boot time is the legacy leftover s390x is 
carrying along; and now we want to "work around" that by adding 
s390x-specific handling to the online/offline code and having memory 
blocks without any memmap, or by configuring an altmap at the very last 
minute using an s390x-specific memory notifier.

Instead, if you want to support the altmap, the kernel should not add 
standby memory to the system (if configured for this new feature), but 
instead only remember the standby memory ranges so it knows what can 
later be added and what can't.
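The "only remember the standby memory ranges" idea could look roughly like the bookkeeping below. This is a hypothetical sketch: struct standby_range, standby_remember(), standby_can_add() and MAX_STANDBY_RANGES are invented for illustration and do not exist in the kernel. A real implementation would obtain the ranges from the hypervisor and would also have to track which parts have already been added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STANDBY_RANGES 16

/* Hypothetical record of one standby range reported at boot. */
struct standby_range {
	uint64_t start;
	uint64_t size;
};

static struct standby_range standby[MAX_STANDBY_RANGES];
static size_t nr_standby;

/* Boot time: remember the range instead of calling add_memory() on it. */
static bool standby_remember(uint64_t start, uint64_t size)
{
	assert(nr_standby <= MAX_STANDBY_RANGES);
	if (nr_standby == MAX_STANDBY_RANGES || size == 0 ||
	    start + size < start)		/* reject empty or wrapping ranges */
		return false;
	standby[nr_standby].start = start;
	standby[nr_standby].size = size;
	nr_standby++;
	return true;
}

/* Later: may [start, start + size) be added? It must fit in one range. */
static bool standby_can_add(uint64_t start, uint64_t size)
{
	if (size == 0 || start + size < start)
		return false;
	for (size_t i = 0; i < nr_standby; i++) {
		const struct standby_range *r = &standby[i];

		if (start >= r->start && start + size <= r->start + r->size)
			return true;
	}
	return false;
}
```

Requiring containment in a single remembered range keeps the check simple; merging adjacent ranges would be a separate design choice.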

From there, users should have an interface where they can actually add 
memory to the system, and either online it manually or just let the 
kernel online it automatically.

s390x code will call add_memory() and properly prepare an altmap if 
requested and make that standby memory available. You can then even have 
an interface to remove that memory again once offline. That will work 
with an altmap or without an altmap.
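The lifecycle proposed here, where the altmap decision is made when standby memory is added rather than when it is onlined, can be sketched as a small state machine. All names (enum block_state, struct block, block_add() and friends) are hypothetical; the comments only mark where a real implementation would call add_memory() (for example with MHP_MEMMAP_ON_MEMORY) and remove_memory().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical states of a standby range in the proposed flow. */
enum block_state {
	BLOCK_STANDBY,	/* known to the kernel, not added, no memmap */
	BLOCK_OFFLINE,	/* added via add_memory(), memmap exists */
	BLOCK_ONLINE,	/* onlined manually or automatically */
};

struct block {
	enum block_state state;
	bool altmap;	/* fixed when the memory is added */
};

/* Invariant: standby memory has no memmap and therefore no altmap. */
static void block_check(const struct block *b)
{
	assert(b->state != BLOCK_STANDBY || !b->altmap);
}

/* User asks to add standby memory; the altmap choice is made here. */
static int block_add(struct block *b, bool use_altmap)
{
	if (b->state != BLOCK_STANDBY)
		return -EINVAL;
	/* real code: make the range accessible, then add_memory(),
	 * e.g. with MHP_MEMMAP_ON_MEMORY when use_altmap is set */
	b->altmap = use_altmap;
	b->state = BLOCK_OFFLINE;
	block_check(b);
	return 0;
}

static int block_online(struct block *b)
{
	if (b->state != BLOCK_OFFLINE)
		return -EINVAL;
	b->state = BLOCK_ONLINE;	/* no altmap setup on this path */
	return 0;
}

static int block_offline(struct block *b)
{
	if (b->state != BLOCK_ONLINE)
		return -EINVAL;
	b->state = BLOCK_OFFLINE;
	return 0;
}

/* Removal is only allowed once the memory is offline again. */
static int block_remove(struct block *b)
{
	if (b->state != BLOCK_OFFLINE)
		return -EINVAL;
	/* real code: remove_memory(), then make the range inaccessible */
	b->altmap = false;
	b->state = BLOCK_STANDBY;
	block_check(b);
	return 0;
}
```

The online/offline transitions stay trivial in this model; all memmap and altmap handling lives in the add/remove steps.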

This approach is aligned with any other code that hot(un)plugs memory 
and is compatible with things like variable-sized memory blocks people 
have been talking about quite a while already, and altmaps that span 
multiple memory blocks to make gigantic pages in such ranges usable.

Sure, you'll have a new interface and have to enable the new handling 
for the new kernel, but you're asking for supporting a new feature that 
cannot be supported cleanly just like any other architecture does. But 
it's a clean approach and probably should have been done that way right 
from the start (decades ago).

Note: We do have the same for other architectures without ACPI that add 
memory via the probe interface. But IIRC we cannot really do any checks 
there, because these architectures have no way of identifying what

> 
> The difference is that for s390, the memory is not yet physically
> accessible, and therefore we cannot use the existing altmap support
> in memory_block_online(), which requires that the memory is accessible
> before it calls mhp_init_memmap_on_memory().
> 
> Currently, on s390 we make the memory accessible in the GOING_ONLINE
> notifier, by sclp call to the hypervisor. That is too late for altmap
> setup code in memory_block_online(), therefore we'd like to introduce
> the new notifier, to have a hook where we can make it accessible
> earlier, and after that there is no difference to how it works for
> other architectures, and we can make use of the existing altmap support.
> 
>>
>> If you want memory without an altmap to suddenly not have an altmap
>> anymore, then look into removing and readding that memory, or some way
>> to convert offline memory.
> 
> We do not want to have memory suddenly not have an altmap support
> any more, but simply get a hook so that we can prepare the memory
> to have altmap support. This means making it physically accessible,
> and calling __add_pages() for altmap support, which for other
> architecture has already happened before.
> 
> Of course, it is a hack for s390, that we must skip __add_pages()
> in the initial (arch_)add_memory() during boot time, when we want
> altmap support, because the memory simply is not accessible at that
> time. But s390 memory hotplug support has always been a hack, and
> had to be, because of how it is implemented by the architecture.

I wrote the above paragraph before reading this, and it's fully aligned 
with what I said above.

> 
> So we replace one hack with another one, that has the huge advantage
> that we do not need to allocate struct pages upfront from system
> memory any more, for the whole possible online memory range.
> 
> And the current approach comes without any change to existing
> interfaces, and minimal change to common code, i.e. these new
> notifiers, that should not have any impact on other architectures.
> 
> What exactly is your concern regarding the new notifiers? Is it
> useless no-op notifier calls on other archs (not sure if they
> would get optimized out by compiler)?

That it makes hotplug code more special because of s390x, instead of 
cleaning up that legacy code.

-- 
Cheers,

David / dhildenb
Re: [PATCH 4/8] mm/memory_hotplug: introduce MEM_PHYS_ONLINE/OFFLINE memory notifiers
Posted by Sumanth Korikkar 2 years, 1 month ago
On Tue, Nov 14, 2023 at 07:27:35PM +0100, David Hildenbrand wrote:
> On 14.11.23 19:02, Sumanth Korikkar wrote:
> > [...]
> 
> Sorry to say, no. No hacks please, and this is a hack for memory that has
> already been added to the system.
> 
> If you want memory without an altmap to suddenly not have an altmap anymore,
> then look into removing and readding that memory, or some way to convert
> offline memory.

Sorry, I couldn't get the context. Could you please give me more details?

Thanks
Re: [PATCH 4/8] mm/memory_hotplug: introduce MEM_PHYS_ONLINE/OFFLINE memory notifiers
Posted by David Hildenbrand 2 years, 1 month ago
On 15.11.23 15:23, Sumanth Korikkar wrote:
> On Tue, Nov 14, 2023 at 07:27:35PM +0100, David Hildenbrand wrote:
>> On 14.11.23 19:02, Sumanth Korikkar wrote:
>>> [...]
>>
>> Sorry to say, no. No hacks please, and this is a hack for memory that has
>> already been added to the system.
>>
>> If you want memory without an altmap to suddenly not have an altmap anymore,
>> then look into removing and readding that memory, or some way to convert
>> offline memory.
> 
> Sorry, I couldnt get the context. Could you please give me more details?

See my reply to Gerald.

In an ideal world, there would not be any new callbacks, we would get 
rid of them, and just let the architecture properly hotplug memory to 
the system when requested by the user.

-- 
Cheers,

David / dhildenb