[PATCH 6/8] s390/mm: implement MEM_PHYS_ONLINE MEM_PHYS_OFFLINE memory notifiers

Sumanth Korikkar posted 8 patches 2 years, 1 month ago
There is a newer version of this series
[PATCH 6/8] s390/mm: implement MEM_PHYS_ONLINE MEM_PHYS_OFFLINE memory notifiers
Posted by Sumanth Korikkar 2 years, 1 month ago
Implement MEM_PHYS_ONLINE and MEM_PHYS_OFFLINE memory notifiers on s390

Implementation of MEM_PHYS_ONLINE Memory Notifier:
* Transition the memory block to an accessible/online state using the
  sclp assign command.
* Execute __add_pages() for the memory block, enabling a self-contained
  memory map range. For boot-time memory, vmemmap mapping is carried out
  through sparse_init().

Implementation of MEM_PHYS_OFFLINE Memory Notifier:
* Execute __remove_pages() exclusively for the memory block (applicable
  where a self-contained memory map was possible before).
* Shift the memory block to an inaccessible/offline state using the sclp
  unassign command.

Additional Implementation Considerations:
* When MHP_MEMMAP_ON_MEMORY is disabled, the system retains the old
  behavior. This means the memory map is allocated from default memory,
  and struct vmemmap pages are populated during the standby memory
  detection phase.
* With MHP_MEMMAP_ON_MEMORY enabled (allowing self-contained memory
  map), the memory map is allocated using the self-contained memory map
  range. Struct vmemmap pages are populated during the memory hotplug
  phase.
* If MACHINE_HAS_EDAT1 is unavailable, MHP_MEMMAP_ON_MEMORY is
  automatically disabled. This ensures that vmemmap pagetables do not
  consume additional memory from the default memory allocator.
* The MEM_GOING_ONLINE notifier has been modified to perform no
  operation, as MEM_PHYS_ONLINE already executes the sclp assign
  command.
* The MEM_CANCEL_ONLINE notifier now performs no operation, as
  MEM_PHYS_OFFLINE already executes the sclp unassign command.
* The call to __add_pages() in arch_add_memory() with altmap support is
  skipped. This operation is deferred and will be performed later in the
  MEM_PHYS_ONLINE notifier.

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
---
 arch/s390/mm/init.c          | 16 +++++++++++++++-
 drivers/s390/char/sclp_cmd.c | 33 ++++++++++++++++++++++++++++++---
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 8d9a60ccb777..db505ed590b2 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -288,6 +288,12 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	rc = vmem_add_mapping(start, size);
 	if (rc)
 		return rc;
+	/*
+	 * If MHP_MEMMAP_ON_MEMORY is enabled, perform __add_pages() during memory
+	 * onlining phase
+	 */
+	if (params->altmap)
+		return 0;
 
 	rc = __add_pages(nid, start_pfn, size_pages, params);
 	if (rc)
@@ -300,7 +306,15 @@ void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	/*
+	 * On s390, currently arch_remove_memory() will be called during error
+	 * handling of add_memory_resource(). When MHP_MEMMAP_ON_MEMORY is
+	 * enabled, __add_pages() is performed later during the memory onlining
+	 * phase.  Hence, __remove_pages() should not be called here in that
+	 * case, but only later during memory offline phase
+	 */
+	if (!altmap)
+		__remove_pages(start_pfn, nr_pages, NULL);
 	vmem_remove_mapping(start, size);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index 11c428f4c7cf..12f3d4af7e4e 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -18,6 +18,7 @@
 #include <linux/mm.h>
 #include <linux/mmzone.h>
 #include <linux/memory.h>
+#include <linux/memory_hotplug.h>
 #include <linux/module.h>
 #include <asm/ctlreg.h>
 #include <asm/chpid.h>
@@ -26,6 +27,7 @@
 #include <asm/sclp.h>
 #include <asm/numa.h>
 #include <asm/facility.h>
+#include <asm/page-states.h>
 
 #include "sclp.h"
 
@@ -319,6 +321,8 @@ static bool contains_standby_increment(unsigned long start, unsigned long end)
 static int sclp_mem_notifier(struct notifier_block *nb,
 			     unsigned long action, void *data)
 {
+	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
+	struct memory_block *memory_block;
 	unsigned long start, size;
 	struct memory_notify *arg;
 	unsigned char id;
@@ -330,6 +334,11 @@ static int sclp_mem_notifier(struct notifier_block *nb,
 	mutex_lock(&sclp_mem_mutex);
 	for_each_clear_bit(id, sclp_storage_ids, sclp_max_storage_id + 1)
 		sclp_attach_storage(id);
+	memory_block = find_memory_block(pfn_to_section_nr(arg->start_pfn));
+	if (!memory_block) {
+		rc = -EINVAL;
+		goto out;
+	}
 	switch (action) {
 	case MEM_GOING_OFFLINE:
 		/*
@@ -344,17 +353,34 @@ static int sclp_mem_notifier(struct notifier_block *nb,
 	case MEM_CANCEL_OFFLINE:
 		break;
 	case MEM_GOING_ONLINE:
+		break;
+	case MEM_PHYS_ONLINE:
 		rc = sclp_mem_change_state(start, size, 1);
+		if (rc || !memory_block->altmap)
+			goto out;
+		params.altmap = memory_block->altmap;
+		rc = __add_pages(0, arg->start_pfn, arg->nr_pages, &params);
+		if (rc)
+			sclp_mem_change_state(start, size, 0);
+		/*
+		 * Set CMMA state to nodat here, since the struct page memory
+		 * at the beginning of the memory block will not go through the
+		 * buddy allocator later.
+		 */
+		__arch_set_page_nodat((void *)start, memory_block->altmap->free);
 		break;
 	case MEM_CANCEL_ONLINE:
-		sclp_mem_change_state(start, size, 0);
-		break;
 	case MEM_OFFLINE:
+		break;
+	case MEM_PHYS_OFFLINE:
+		if (memory_block->altmap)
+			__remove_pages(arg->start_pfn, arg->nr_pages, memory_block->altmap);
 		sclp_mem_change_state(start, size, 0);
 		break;
 	default:
 		break;
 	}
+out:
 	mutex_unlock(&sclp_mem_mutex);
 	return rc ? NOTIFY_BAD : NOTIFY_OK;
 }
@@ -400,7 +426,8 @@ static void __init add_memory_merged(u16 rn)
 	if (!size)
 		goto skip_add;
 	for (addr = start; addr < start + size; addr += block_size)
-		add_memory(0, addr, block_size, MHP_NONE);
+		add_memory(0, addr, block_size,
+			   MACHINE_HAS_EDAT1 ? MHP_MEMMAP_ON_MEMORY : MHP_NONE);
 skip_add:
 	first_rn = rn;
 	num = 1;
-- 
2.41.0
Re: [PATCH 6/8] s390/mm: implement MEM_PHYS_ONLINE MEM_PHYS_OFFLINE memory notifiers
Posted by David Hildenbrand 2 years, 1 month ago
On 14.11.23 19:02, Sumanth Korikkar wrote:
> Implement MEM_PHYS_ONLINE and MEM_PHYS_OFFLINE memory notifiers on s390
> 
> Implementation of MEM_PHYS_ONLINE Memory Notifier:
> * Transition the memory block to an accessible/online state using the
>    sclp assign command.
> * Execute __add_pages() for the memory block, enabling a self-contained
>    memory map range. For boot-time memory, vmemmap mapping is carried out
>    through sparse_init().
> 
> Implementation of MEM_PHYS_OFFLINE Memory Notifier:
> * Execute __remove_pages() exclusively for the memory block (applicable
>    where a self-contained memory map was possible before).
> * Shift the memory block to an inaccessible/offline state using the sclp
>    unassign command.
> 
> Additional Implementation Considerations:
> * When MHP_MEMMAP_ON_MEMORY is disabled, the system retains the old
>    behavior. This means the memory map is allocated from default memory,
>    and struct vmemmap pages are populated during the standby memory
>    detection phase.
> * With MHP_MEMMAP_ON_MEMORY enabled (allowing self-contained memory
>    map), the memory map is allocated using the self-contained memory map
>    range. Struct vmemmap pages are populated during the memory hotplug
>    phase.
> * If MACHINE_HAS_EDAT1 is unavailable, MHP_MEMMAP_ON_MEMORY is
>    automatically disabled. This ensures that vmemmap pagetables do not
>    consume additional memory from the default memory allocator.
> * The MEM_GOING_ONLINE notifier has been modified to perform no
>    operation, as MEM_PHYS_ONLINE already executes the sclp assign
>    command.
> * The MEM_CANCEL_ONLINE notifier now performs no operation, as
>    MEM_PHYS_OFFLINE already executes the sclp unassign command.
> * The call to __add_pages() in arch_add_memory() with altmap support is
>    skipped. This operation is deferred and will be performed later in the
>    MEM_PHYS_ONLINE notifier.
> 
> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
> ---
>   arch/s390/mm/init.c          | 16 +++++++++++++++-
>   drivers/s390/char/sclp_cmd.c | 33 ++++++++++++++++++++++++++++++---
>   2 files changed, 45 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 8d9a60ccb777..db505ed590b2 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -288,6 +288,12 @@ int arch_add_memory(int nid, u64 start, u64 size,
>   	rc = vmem_add_mapping(start, size);
>   	if (rc)
>   		return rc;
> +	/*
> +	 * If MHP_MEMMAP_ON_MEMORY is enabled, perform __add_pages() during memory
> +	 * onlining phase
> +	 */
> +	if (params->altmap)
> +		return 0;


So we'd have added memory blocks without a memmap? Sorry, but this seems 
to further hack into the s390x direction.

Maybe s390x should just provide a dedicate interface to add these memory 
blocks instead of adding them during boot and then relying on the old 
way of using online/offline set them online/offline.

-- 
Cheers,

David / dhildenb
Re: [PATCH 6/8] s390/mm: implement MEM_PHYS_ONLINE MEM_PHYS_OFFLINE memory notifiers
Posted by Sumanth Korikkar 2 years, 1 month ago
On Tue, Nov 14, 2023 at 07:39:40PM +0100, David Hildenbrand wrote:
> On 14.11.23 19:02, Sumanth Korikkar wrote:
> > Implement MEM_PHYS_ONLINE and MEM_PHYS_OFFLINE memory notifiers on s390
> > 
...
> >   arch/s390/mm/init.c          | 16 +++++++++++++++-
> >   drivers/s390/char/sclp_cmd.c | 33 ++++++++++++++++++++++++++++++---
> >   2 files changed, 45 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> > index 8d9a60ccb777..db505ed590b2 100644
> > --- a/arch/s390/mm/init.c
> > +++ b/arch/s390/mm/init.c
> > @@ -288,6 +288,12 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >   	rc = vmem_add_mapping(start, size);
> >   	if (rc)
> >   		return rc;
> > +	/*
> > +	 * If MHP_MEMMAP_ON_MEMORY is enabled, perform __add_pages() during memory
> > +	 * onlining phase
> > +	 */
> > +	if (params->altmap)
> > +		return 0;
> 
> 
> So we'd have added memory blocks without a memmap? Sorry, but this seems to
> further hack into the s390x direction.

This new approach has the advantage that we do not need to allocate any
additional memory during online phase, neither for direct mapping page
tables nor struct pages, so that memory hotplug can never fail.

The old approach (without altmap) is already a hack, because we add
the memmap / struct pages, but for memory that is not really accessible.
And with all the disadvantage of pre-allocating struct pages from system
memory.

The new approach allows to better integrate s390 to the existing
interface, and also make use of altmap support, which would eliminate
the major disadvantage of the old behaviour. So from s390 perspective,
this new mechanism would be preferred, provided that there is no
functional issue with the "added memory blocks without a memmap"
approach.

Do you see any functional issues, e.g. conflict with common
code?
> 
> Maybe s390x should just provide a dedicate interface to add these memory
> blocks instead of adding them during boot and then relying on the old way of
> using online/offline set them online/offline.

Existing behavior:
The current 'lsmem -a' command displays both online and standby memory.

interface changes:
If a new interface is introduced and standby memory is no longer listed,
the following consequences might occur:

1. Running 'lsmem -a' would only show online memory, potentially leading
   to user complaints.
2. standby memory addition would need:
   * echo "standby memory addr" > /sys/devices/system/memory/probe
   As far as I understand, this interface is already deprecated.

3. To remove standby memory, a new interface probe_remove is needed
   * echo "standby memory addr" > /sys/devices/system/memory/probe_remove

4. Users may express a need to identify standby memory addresses,
resulting in the creation of another interface to list these standby
memory ranges.

Hence, introducing new physical memory notifiers to platforms lacking
dynamic ACPI events would be highly advantageous while maintaining
existing user-friendly interface.

Thanks
Re: [PATCH 6/8] s390/mm: implement MEM_PHYS_ONLINE MEM_PHYS_OFFLINE memory notifiers
Posted by David Hildenbrand 2 years, 1 month ago
On 15.11.23 15:20, Sumanth Korikkar wrote:
> On Tue, Nov 14, 2023 at 07:39:40PM +0100, David Hildenbrand wrote:
>> On 14.11.23 19:02, Sumanth Korikkar wrote:
>>> Implement MEM_PHYS_ONLINE and MEM_PHYS_OFFLINE memory notifiers on s390
>>>
> ...
>>>    arch/s390/mm/init.c          | 16 +++++++++++++++-
>>>    drivers/s390/char/sclp_cmd.c | 33 ++++++++++++++++++++++++++++++---
>>>    2 files changed, 45 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
>>> index 8d9a60ccb777..db505ed590b2 100644
>>> --- a/arch/s390/mm/init.c
>>> +++ b/arch/s390/mm/init.c
>>> @@ -288,6 +288,12 @@ int arch_add_memory(int nid, u64 start, u64 size,
>>>    	rc = vmem_add_mapping(start, size);
>>>    	if (rc)
>>>    		return rc;
>>> +	/*
>>> +	 * If MHP_MEMMAP_ON_MEMORY is enabled, perform __add_pages() during memory
>>> +	 * onlining phase
>>> +	 */
>>> +	if (params->altmap)
>>> +		return 0;
>>
>>
>> So we'd have added memory blocks without a memmap? Sorry, but this seems to
>> further hack into the s390x direction.
> 
> This new approach has the advantage that we do not need to allocate any
> additional memory during online phase, neither for direct mapping page
> tables nor struct pages, so that memory hotplug can never fail.

Right, just like any other architecture that (triggered by whatever 
mechanism) ends up calling add_memory() and friends.

> 
> The old approach (without altmap) is already a hack, because we add
> the memmap / struct pages, but for memory that is not really accessible.

Yes, it's disgusting. And you still allocate other things like memory 
block devices or the identify map.

> And with all the disadvantage of pre-allocating struct pages from system
> memory.

Jep. It never should have been done like that.

> 
> The new approach allows to better integrate s390 to the existing
> interface, and also make use of altmap support, which would eliminate
> the major disadvantage of the old behaviour. So from s390 perspective,
> this new mechanism would be preferred, provided that there is no
> functional issue with the "added memory blocks without a memmap"
> approach.

It achieves that by s390x specific hacks in common code :) Instead of 
everybody else that simply uses add_memory() and friends.

> 
> Do you see any functional issues, e.g. conflict with common
> code?

I don't see functional issues right now, just the way it is done to 
implement support for a new feature is a hack IMHO. Replacing hack #1 by 
hack #2 is not really something reasonable. Let's try to remove hacks.

>>
>> Maybe s390x should just provide a dedicate interface to add these memory
>> blocks instead of adding them during boot and then relying on the old way of
>> using online/offline set them online/offline.
> 
> Existing behavior:
> The current 'lsmem -a' command displays both online and standby memory.
> 
> interface changes:
> If a new interface is introduced and standby memory is no longer listed,
> the following consequences might occur:
> 
> 1. Running 'lsmem -a' would only show online memory, potentially leading
>     to user complaints.

That's why the new, clean way of doing it will require a world switch. 
If the admin wants the benefits of altmap/memmap allocation, it can be 
enabled.

> 2. standby memory addition would need:
>     * echo "standby memory addr" > /sys/devices/system/memory/probe
>     As far as I understand, this interface is already deprecated.

It should actually be an s390x specific interface where people are able 
to query the standby ranges, and request to add/remove them. There, 
s390x can perform checks and setup everything accordingly before calling 
add_memory() and have the memory onlined.

We do have something comparable with the dax/kmem infrastructure: users 
configure the available memory to hotplug, and then hotplug it. Tooling 
onlines that memory automatically.

Ideally they will add ranges, not memory blocks.

> 
> 3. To remove standby memory, a new interface probe_remove is needed
>     * echo "standby memory addr" > /sys/devices/system/memory/probe_remove
> 

Similarly, an s390x specific interface that performs checks and properly 
tears everything s390x-specifc down -- for example, turning system RAM 
into standby RAM again.


> 4. Users may express a need to identify standby memory addresses,
> resulting in the creation of another interface to list these standby
> memory ranges.

Exactly. Memory that is not added to the system that does not consume 
any resources, but can be added on demand using an interface that is not 
the second stage (onlining/offlining) of memory hot(un)plug.

-- 
Cheers,

David / dhildenb
Re: [PATCH 6/8] s390/mm: implement MEM_PHYS_ONLINE MEM_PHYS_OFFLINE memory notifiers
Posted by Gerald Schaefer 2 years, 1 month ago
On Thu, 16 Nov 2023 20:16:02 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 15.11.23 15:20, Sumanth Korikkar wrote:
> > On Tue, Nov 14, 2023 at 07:39:40PM +0100, David Hildenbrand wrote:  
> >> On 14.11.23 19:02, Sumanth Korikkar wrote:  
> >>> Implement MEM_PHYS_ONLINE and MEM_PHYS_OFFLINE memory notifiers on s390
> >>>  
> > ...  
> >>>    arch/s390/mm/init.c          | 16 +++++++++++++++-
> >>>    drivers/s390/char/sclp_cmd.c | 33 ++++++++++++++++++++++++++++++---
> >>>    2 files changed, 45 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> >>> index 8d9a60ccb777..db505ed590b2 100644
> >>> --- a/arch/s390/mm/init.c
> >>> +++ b/arch/s390/mm/init.c
> >>> @@ -288,6 +288,12 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>>    	rc = vmem_add_mapping(start, size);
> >>>    	if (rc)
> >>>    		return rc;
> >>> +	/*
> >>> +	 * If MHP_MEMMAP_ON_MEMORY is enabled, perform __add_pages() during memory
> >>> +	 * onlining phase
> >>> +	 */
> >>> +	if (params->altmap)
> >>> +		return 0;  
> >>
> >>
> >> So we'd have added memory blocks without a memmap? Sorry, but this seems to
> >> further hack into the s390x direction.  
> > 
> > This new approach has the advantage that we do not need to allocate any
> > additional memory during online phase, neither for direct mapping page
> > tables nor struct pages, so that memory hotplug can never fail.  
> 
> Right, just like any other architecture that (triggered by whatever 
> mechanism) ends up calling add_memory() and friends.

Just for better understanding, are page tables for identity and also
vmemmap mapping not allocated from system memory on other archs? I.e.
no altmap support for that, only for struct pages (so far)?

> 
> > 
> > The old approach (without altmap) is already a hack, because we add
> > the memmap / struct pages, but for memory that is not really accessible.  
> 
> Yes, it's disgusting. And you still allocate other things like memory 
> block devices or the identify map.

I would say it is special :-). And again, for understanding, all other
things apart from struct pages, still would need to be allocated from
system memory on other archs?

Of course, struct pages would be by far the biggest part, so having
altmap support for that helps a lot. Doing the other allocations also
via altmap would feel natural, but it is not possible yet, or did we
miss something?

> 
> > And with all the disadvantage of pre-allocating struct pages from system
> > memory.  
> 
> Jep. It never should have been done like that.

At that time, it gave the benefit over all others, that we do not need
to allocate struct pages from system memory, at the time of memory online,
when memory pressure might be high and such allocations might fail.

I guess you can say that it should have been done "right" at that time,
e.g. by already adding something like altmap support, instead of our own
hack.

> 
> > 
> > The new approach allows to better integrate s390 to the existing
> > interface, and also make use of altmap support, which would eliminate
> > the major disadvantage of the old behaviour. So from s390 perspective,
> > this new mechanism would be preferred, provided that there is no
> > functional issue with the "added memory blocks without a memmap"
> > approach.  
> 
> It achieves that by s390x specific hacks in common code :) Instead of 
> everybody else that simply uses add_memory() and friends.
> 
> > 
> > Do you see any functional issues, e.g. conflict with common
> > code?  
> 
> I don't see functional issues right now, just the way it is done to 
> implement support for a new feature is a hack IMHO. Replacing hack #1 by 
> hack #2 is not really something reasonable. Let's try to remove hacks.

Ok, sounds reasonable, let's try that. Introducing some new s390-specific
interface also feels a bit hacky, or ugly, but we'll see if we find a
way so that it is only "special" :-)
Reminds me a bit of that "probe" attribute, that also was an arch-specific
hack initially, IIRC, and is now to be deprecated...
Re: [PATCH 6/8] s390/mm: implement MEM_PHYS_ONLINE MEM_PHYS_OFFLINE memory notifiers
Posted by David Hildenbrand 2 years, 1 month ago
[catching up on mails]

>>> This new approach has the advantage that we do not need to allocate any
>>> additional memory during online phase, neither for direct mapping page
>>> tables nor struct pages, so that memory hotplug can never fail.
>>
>> Right, just like any other architecture that (triggered by whatever
>> mechanism) ends up calling add_memory() and friends.
> 
> Just for better understanding, are page tables for identity and also
> vmemmap mapping not allocated from system memory on other archs? I.e.
> no altmap support for that, only for struct pages (so far)?

Yes, only the actual "memmap ("struct page")" comes from altmap space, 
everything else comes from the buddy during memory hotplug.

> 
>>
>>>
>>> The old approach (without altmap) is already a hack, because we add
>>> the memmap / struct pages, but for memory that is not really accessible.
>>
>> Yes, it's disgusting. And you still allocate other things like memory
>> block devices or the identify map.
> 
> I would say it is special :-). And again, for understanding, all other

:)

> things apart from struct pages, still would need to be allocated from
> system memory on other archs?

Yes!

> 
> Of course, struct pages would be by far the biggest part, so having
> altmap support for that helps a lot. Doing the other allocations also
> via altmap would feel natural, but it is not possible yet, or did we
> miss something?

The tricky part is making sure ahead of time that that we set aside the 
required number of pageblocks, to properly control during memory 
onlining what to set aside and what to expose to the buddy.

See mhp_supports_memmap_on_memory() / 
memory_block_memmap_on_memory_pages() for the dirty details :)

> 
>>
>>> And with all the disadvantage of pre-allocating struct pages from system
>>> memory.
>>
>> Jep. It never should have been done like that.
> 
> At that time, it gave the benefit over all others, that we do not need
> to allocate struct pages from system memory, at the time of memory online,
> when memory pressure might be high and such allocations might fail.

Agreed. Having the memmap already around can be helpful. But not for all 
standby memory, that's just pure waste.

... but as memory onlining is triggered by user space, it's likely that 
that user space cannot even make progress (e.g., start a process to set 
memory online) to actually trigger memory onlining in serious low-memory 
situations.

> 
> I guess you can say that it should have been done "right" at that time,
> e.g. by already adding something like altmap support, instead of our own
> hack.

Probably yes. IMHO, relying on the existing memory block interface was 
the low hanging fruit. Now, s390x is just special.

> 
>>
>>>
>>> The new approach allows to better integrate s390 to the existing
>>> interface, and also make use of altmap support, which would eliminate
>>> the major disadvantage of the old behaviour. So from s390 perspective,
>>> this new mechanism would be preferred, provided that there is no
>>> functional issue with the "added memory blocks without a memmap"
>>> approach.
>>
>> It achieves that by s390x specific hacks in common code :) Instead of
>> everybody else that simply uses add_memory() and friends.
>>
>>>
>>> Do you see any functional issues, e.g. conflict with common
>>> code?
>>
>> I don't see functional issues right now, just the way it is done to
>> implement support for a new feature is a hack IMHO. Replacing hack #1 by
>> hack #2 is not really something reasonable. Let's try to remove hacks.
> 
> Ok, sounds reasonable, let's try that. Introducing some new s390-specific
> interface also feels a bit hacky, or ugly, but we'll see if we find a
> way so that it is only "special" :-)

As proposed in my other mail, I think there are ways to make s390x happy 
first and look into a cleaner approach long-term.

> Reminds me a bit of that "probe" attribute, that also was an arch-specific
> hack initially, IIRC, and is now to be deprecated...

Yeah, that was for interfaces where the kernel has absolutely no clue 
where/what/how memory gets hotplugged. ARM64 without ACPI.

s390x is completely different though: you know exactly which standby 
memory exists, where it resides, in which granularity in can be 
enabled/disabled, ... you don't have to play dangerous "I'm pretty sure 
there is memory out there although nobody can check so I crash the 
kernel" games.

-- 
Cheers,

David / dhildenb
Re: [PATCH 6/8] s390/mm: implement MEM_PHYS_ONLINE MEM_PHYS_OFFLINE memory notifiers
Posted by David Hildenbrand 2 years, 1 month ago
>>>
>>> Maybe s390x should just provide a dedicate interface to add these memory
>>> blocks instead of adding them during boot and then relying on the old way of
>>> using online/offline set them online/offline.
>>
>> Existing behavior:
>> The current 'lsmem -a' command displays both online and standby memory.
>>
>> interface changes:
>> If a new interface is introduced and standby memory is no longer listed,
>> the following consequences might occur:
>>
>> 1. Running 'lsmem -a' would only show online memory, potentially leading
>>      to user complaints.
> 
> That's why the new, clean way of doing it will require a world switch.
> If the admin wants the benefits of altmap/memmap allocation, it can be
> enabled.

BTW, thinking about it, I guess one could teach lsmem (and maybe chmem) 
to consult additional interfaces on s390x to show standby memory that's 
not added to the system yet.

-- 
Cheers,

David / dhildenb