[PATCH v3 3/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices

David Hildenbrand posted 7 patches 1 month ago
Posted by David Hildenbrand 1 month ago
To support memory devices under QEMU/KVM, such as virtio-mem,
we have to prepare our kernel virtual address space accordingly and
have to know the highest possible physical memory address we might see
later: the storage limit. The good old SCLP interface is not suitable for
this use case.

In particular, memory owned by memory devices has no relationship to
storage increments: it is always detected using the device driver, and
unaware OSes (no driver) must never try to make use of that memory.
Consequently this memory is located outside of the "maximum storage
increment"-indicated memory range.

Let's use our new diag500 STORAGE_LIMIT subcode to query this storage
limit that can exceed the "maximum storage increment", and use the
existing interfaces (i.e., SCLP) to obtain information about the initial
memory that is not owned+managed by memory devices.

If a hypervisor does not support such memory devices, the address exposed
through diag500 STORAGE_LIMIT will correspond to the maximum storage
increment exposed through SCLP.
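
As a rough illustration of the lookup order described above, the following
user-space sketch models the decision with plain data instead of real
hypervisor calls. All names here (sketch_platform, sketch_detect_max_end, and
so on) are made up for illustration and are not kernel interfaces; the only
behaviour taken from this patch is the order (diag500 first, then SCLP, then
searching) and the conversion of the inclusive storage limit to an exclusive
end.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of what a hypervisor reports; not a kernel structure. */
struct sketch_platform {
	bool has_storage_limit;	/* diag500 STORAGE LIMIT subcode implemented */
	uint64_t storage_limit;	/* highest possible address (inclusive) */
	bool has_sclp;		/* SCLP read info available */
	uint64_t sclp_end;	/* end of the maximum storage increment (exclusive) */
};

enum sketch_source {
	SKETCH_SRC_DIAG500,
	SKETCH_SRC_SCLP,
	SKETCH_SRC_SEARCH,
};

struct sketch_result {
	enum sketch_source source;
	uint64_t max_physmem_end;	/* exclusive */
};

/* Pick the first source that can answer, in the order used by this patch. */
static struct sketch_result sketch_detect_max_end(const struct sketch_platform *p,
						   uint64_t searched_end)
{
	struct sketch_result r;

	if (p->has_storage_limit && p->storage_limit) {
		r.source = SKETCH_SRC_DIAG500;
		/* Convert inclusive end to exclusive end. */
		r.max_physmem_end = p->storage_limit + 1;
	} else if (p->has_sclp) {
		r.source = SKETCH_SRC_SCLP;
		r.max_physmem_end = p->sclp_end;
	} else {
		/* Last resort: whatever a search of memory found. */
		r.source = SKETCH_SRC_SEARCH;
		r.max_physmem_end = searched_end;
	}
	return r;
}
```

With a hypervisor that has no memory devices, storage_limit + 1 would be
expected to equal sclp_end, so both branches yield the same end address.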

To teach kdump on s390 to include memory owned by memory devices, there
will be ways to query the relevant memory ranges from the device via a
driver running in special kdump mode (like virtio-mem already implements
to filter /proc/vmcore access so we don't end up reading from unplugged
device blocks).

Update the setup_ident_map_size() comment to clarify that there can be
more than just online and standby memory.

Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/boot/physmem_info.c        | 47 +++++++++++++++++++++++++++-
 arch/s390/boot/startup.c             |  7 +++--
 arch/s390/include/asm/physmem_info.h |  3 ++
 3 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/arch/s390/boot/physmem_info.c b/arch/s390/boot/physmem_info.c
index 1d131a81cb8b..f3ea5dbff10b 100644
--- a/arch/s390/boot/physmem_info.c
+++ b/arch/s390/boot/physmem_info.c
@@ -109,6 +109,42 @@ static int diag260(void)
 	return 0;
 }
 
+#define DIAG500_SC_STOR_LIMIT 4
+
+static int diag500_storage_limit(unsigned long *max_physmem_end)
+{
+	unsigned long storage_limit;
+	unsigned long reg1, reg2;
+	psw_t old;
+
+	asm volatile(
+		"	mvc	0(16,%[psw_old]),0(%[psw_pgm])\n"
+		"	epsw	%[reg1],%[reg2]\n"
+		"	st	%[reg1],0(%[psw_pgm])\n"
+		"	st	%[reg2],4(%[psw_pgm])\n"
+		"	larl	%[reg1],1f\n"
+		"	stg	%[reg1],8(%[psw_pgm])\n"
+		"	lghi	1,%[subcode]\n"
+		"	lghi	2,0\n"
+		"	diag	2,4,0x500\n"
+		"1:	mvc	0(16,%[psw_pgm]),0(%[psw_old])\n"
+		"	lgr	%[slimit],2\n"
+		: [reg1] "=&d" (reg1),
+		  [reg2] "=&a" (reg2),
+		  [slimit] "=d" (storage_limit),
+		  "=Q" (get_lowcore()->program_new_psw),
+		  "=Q" (old)
+		: [psw_old] "a" (&old),
+		  [psw_pgm] "a" (&get_lowcore()->program_new_psw),
+		  [subcode] "i" (DIAG500_SC_STOR_LIMIT)
+		: "memory", "1", "2");
+	if (!storage_limit)
+		return -EINVAL;
+	/* Convert inclusive end to exclusive end */
+	*max_physmem_end = storage_limit + 1;
+	return 0;
+}
+
 static int tprot(unsigned long addr)
 {
 	unsigned long reg1, reg2;
@@ -157,7 +193,9 @@ unsigned long detect_max_physmem_end(void)
 {
 	unsigned long max_physmem_end = 0;
 
-	if (!sclp_early_get_memsize(&max_physmem_end)) {
+	if (!diag500_storage_limit(&max_physmem_end)) {
+		physmem_info.info_source = MEM_DETECT_DIAG500_STOR_LIMIT;
+	} else if (!sclp_early_get_memsize(&max_physmem_end)) {
 		physmem_info.info_source = MEM_DETECT_SCLP_READ_INFO;
 	} else {
 		max_physmem_end = search_mem_end();
@@ -170,6 +208,13 @@ void detect_physmem_online_ranges(unsigned long max_physmem_end)
 {
 	if (!sclp_early_read_storage_info()) {
 		physmem_info.info_source = MEM_DETECT_SCLP_STOR_INFO;
+	} else if (physmem_info.info_source == MEM_DETECT_DIAG500_STOR_LIMIT) {
+		unsigned long online_end;
+
+		if (!sclp_early_get_memsize(&online_end)) {
+			physmem_info.info_source = MEM_DETECT_SCLP_READ_INFO;
+			add_physmem_online_range(0, online_end);
+		}
 	} else if (!diag260()) {
 		physmem_info.info_source = MEM_DETECT_DIAG260;
 	} else if (max_physmem_end) {
diff --git a/arch/s390/boot/startup.c b/arch/s390/boot/startup.c
index c8f149ad77e5..76c33c7442df 100644
--- a/arch/s390/boot/startup.c
+++ b/arch/s390/boot/startup.c
@@ -182,12 +182,15 @@ static void kaslr_adjust_got(unsigned long offset)
  * Merge information from several sources into a single ident_map_size value.
  * "ident_map_size" represents the upper limit of physical memory we may ever
  * reach. It might not be all online memory, but also include standby (offline)
- * memory. "ident_map_size" could be lower then actual standby or even online
+ * memory or memory areas reserved for other means (e.g., memory devices such as
+ * virtio-mem).
+ *
+ * "ident_map_size" could be lower than actual standby/reserved or even online
  * memory present, due to limiting factors. We should never go above this limit.
  * It is the size of our identity mapping.
  *
  * Consider the following factors:
- * 1. max_physmem_end - end of physical memory online or standby.
+ * 1. max_physmem_end - end of physical memory online, standby or reserved.
  *    Always >= end of the last online memory range (get_physmem_online_end()).
  * 2. CONFIG_MAX_PHYSMEM_BITS - the maximum size of physical memory the
  *    kernel is able to support.
diff --git a/arch/s390/include/asm/physmem_info.h b/arch/s390/include/asm/physmem_info.h
index f45cfc8bc233..51b68a43e195 100644
--- a/arch/s390/include/asm/physmem_info.h
+++ b/arch/s390/include/asm/physmem_info.h
@@ -9,6 +9,7 @@ enum physmem_info_source {
 	MEM_DETECT_NONE = 0,
 	MEM_DETECT_SCLP_STOR_INFO,
 	MEM_DETECT_DIAG260,
+	MEM_DETECT_DIAG500_STOR_LIMIT,
 	MEM_DETECT_SCLP_READ_INFO,
 	MEM_DETECT_BIN_SEARCH
 };
@@ -107,6 +108,8 @@ static inline const char *get_physmem_info_source(void)
 		return "sclp storage info";
 	case MEM_DETECT_DIAG260:
 		return "diag260";
+	case MEM_DETECT_DIAG500_STOR_LIMIT:
+		return "diag500 storage limit";
 	case MEM_DETECT_SCLP_READ_INFO:
 		return "sclp read info";
 	case MEM_DETECT_BIN_SEARCH:
-- 
2.46.1
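
The startup.c comment touched by the patch above lists max_physmem_end and
CONFIG_MAX_PHYSMEM_BITS among the factors that bound ident_map_size. A minimal
user-space sketch of just those two factors might look as follows. It is a
simplification, not the kernel's setup_ident_map_size(): the helper names and
the value 46 for the physical address bits are assumptions made for
illustration, and the further factors the real function considers are left
out.

```c
#include <stdint.h>

/* Assumed value for illustration only; the kernel uses CONFIG_MAX_PHYSMEM_BITS. */
#define SKETCH_MAX_PHYSMEM_BITS 46

static uint64_t sketch_min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Upper bound of the identity mapping derived from the two factors only:
 * the (exclusive) end of online, standby or reserved memory, capped by
 * what the kernel can address.
 */
static uint64_t sketch_ident_map_size(uint64_t max_physmem_end)
{
	return sketch_min_u64(max_physmem_end, 1ULL << SKETCH_MAX_PHYSMEM_BITS);
}

/*
 * Size of the range above the initial (online) memory that a memory
 * device such as virtio-mem might populate later. Returns 0 when the
 * limit does not exceed the online end.
 */
static uint64_t sketch_reserved_above_online(uint64_t online_end,
					      uint64_t max_physmem_end)
{
	return max_physmem_end > online_end ? max_physmem_end - online_end : 0;
}
```

For example, with 4 GiB of initial memory and a storage limit at 8 GiB, the
identity mapping in this sketch would cover 8 GiB, of which the upper 4 GiB
is only usable once a device driver plugs memory there.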
Re: [PATCH v3 3/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
Posted by Alexander Gordeev 3 weeks, 5 days ago
On Fri, Oct 25, 2024 at 04:14:48PM +0200, David Hildenbrand wrote:
> To support memory devices under QEMU/KVM, such as virtio-mem,
> we have to prepare our kernel virtual address space accordingly and
> have to know the highest possible physical memory address we might see
> later: the storage limit. The good old SCLP interface is not suitable for
> this use case.
> 
> In particular, memory owned by memory devices has no relationship to
> storage increments, it is always detected using the device driver, and
> unaware OSes (no driver) must never try making use of that memory.
> Consequently this memory is located outside of the "maximum storage
> increment"-indicated memory range.
> 
> Let's use our new diag500 STORAGE_LIMIT subcode to query this storage
> limit that can exceed the "maximum storage increment", and use the
> existing interfaces (i.e., SCLP) to obtain information about the initial
> memory that is not owned+managed by memory devices.
> 
> If a hypervisor does not support such memory devices, the address exposed
> through diag500 STORAGE_LIMIT will correspond to the maximum storage
> increment exposed through SCLP.
> 
> To teach kdump on s390 to include memory owned by memory devices, there
> will be ways to query the relevant memory ranges from the device via a
> driver running in special kdump mode (like virtio-mem already implements
> to filter /proc/vmcore access so we don't end up reading from unplugged
> device blocks).
> 
> Update setup_ident_map_size(), to clarify that there can be more than
> just online and standby memory.
> 
> Tested-by: Mario Casquero <mcasquer@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/s390/boot/physmem_info.c        | 47 +++++++++++++++++++++++++++-
>  arch/s390/boot/startup.c             |  7 +++--
>  arch/s390/include/asm/physmem_info.h |  3 ++
>  3 files changed, 54 insertions(+), 3 deletions(-)

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Re: [PATCH v3 3/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
Posted by David Hildenbrand 3 weeks, 5 days ago
On 30.10.24 15:32, Alexander Gordeev wrote:
> On Fri, Oct 25, 2024 at 04:14:48PM +0200, David Hildenbrand wrote:
>> To support memory devices under QEMU/KVM, such as virtio-mem,
>> we have to prepare our kernel virtual address space accordingly and
>> have to know the highest possible physical memory address we might see
>> later: the storage limit. The good old SCLP interface is not suitable for
>> this use case.
>>
>> In particular, memory owned by memory devices has no relationship to
>> storage increments, it is always detected using the device driver, and
>> unaware OSes (no driver) must never try making use of that memory.
>> Consequently this memory is located outside of the "maximum storage
>> increment"-indicated memory range.
>>
>> Let's use our new diag500 STORAGE_LIMIT subcode to query this storage
>> limit that can exceed the "maximum storage increment", and use the
>> existing interfaces (i.e., SCLP) to obtain information about the initial
>> memory that is not owned+managed by memory devices.
>>
>> If a hypervisor does not support such memory devices, the address exposed
>> through diag500 STORAGE_LIMIT will correspond to the maximum storage
>> increment exposed through SCLP.
>>
>> To teach kdump on s390 to include memory owned by memory devices, there
>> will be ways to query the relevant memory ranges from the device via a
>> driver running in special kdump mode (like virtio-mem already implements
>> to filter /proc/vmcore access so we don't end up reading from unplugged
>> device blocks).
>>
>> Update setup_ident_map_size(), to clarify that there can be more than
>> just online and standby memory.
>>
>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   arch/s390/boot/physmem_info.c        | 47 +++++++++++++++++++++++++++-
>>   arch/s390/boot/startup.c             |  7 +++--
>>   arch/s390/include/asm/physmem_info.h |  3 ++
>>   3 files changed, 54 insertions(+), 3 deletions(-)
> 
> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
> 

Thanks Alexander!

-- 
Cheers,

David / dhildenb
Re: [PATCH v3 3/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
Posted by Heiko Carstens 3 weeks, 5 days ago
On Fri, Oct 25, 2024 at 04:14:48PM +0200, David Hildenbrand wrote:
> To support memory devices under QEMU/KVM, such as virtio-mem,
> we have to prepare our kernel virtual address space accordingly and
> have to know the highest possible physical memory address we might see
> later: the storage limit. The good old SCLP interface is not suitable for
> this use case.
> 
> In particular, memory owned by memory devices has no relationship to
> storage increments, it is always detected using the device driver, and
> unaware OSes (no driver) must never try making use of that memory.
> Consequently this memory is located outside of the "maximum storage
> increment"-indicated memory range.
> 
> Let's use our new diag500 STORAGE_LIMIT subcode to query this storage
> limit that can exceed the "maximum storage increment", and use the
> existing interfaces (i.e., SCLP) to obtain information about the initial
> memory that is not owned+managed by memory devices.
> 
> If a hypervisor does not support such memory devices, the address exposed
> through diag500 STORAGE_LIMIT will correspond to the maximum storage
> increment exposed through SCLP.
> 
> To teach kdump on s390 to include memory owned by memory devices, there
> will be ways to query the relevant memory ranges from the device via a
> driver running in special kdump mode (like virtio-mem already implements
> to filter /proc/vmcore access so we don't end up reading from unplugged
> device blocks).
> 
> Update setup_ident_map_size(), to clarify that there can be more than
> just online and standby memory.
> 
> Tested-by: Mario Casquero <mcasquer@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/s390/boot/physmem_info.c        | 47 +++++++++++++++++++++++++++-
>  arch/s390/boot/startup.c             |  7 +++--
>  arch/s390/include/asm/physmem_info.h |  3 ++
>  3 files changed, 54 insertions(+), 3 deletions(-)

Looks like I couldn't convince you to implement a query subcode.
But anyway, let's move on.

Acked-by: Heiko Carstens <hca@linux.ibm.com>

However, I would like to see an Ack or review from Alexander Gordeev
or Vasily Gorbik for this patch.
Re: [PATCH v3 3/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
Posted by David Hildenbrand 3 weeks, 5 days ago
On 30.10.24 10:23, Heiko Carstens wrote:
> On Fri, Oct 25, 2024 at 04:14:48PM +0200, David Hildenbrand wrote:
>> To support memory devices under QEMU/KVM, such as virtio-mem,
>> we have to prepare our kernel virtual address space accordingly and
>> have to know the highest possible physical memory address we might see
>> later: the storage limit. The good old SCLP interface is not suitable for
>> this use case.
>>
>> In particular, memory owned by memory devices has no relationship to
>> storage increments, it is always detected using the device driver, and
>> unaware OSes (no driver) must never try making use of that memory.
>> Consequently this memory is located outside of the "maximum storage
>> increment"-indicated memory range.
>>
>> Let's use our new diag500 STORAGE_LIMIT subcode to query this storage
>> limit that can exceed the "maximum storage increment", and use the
>> existing interfaces (i.e., SCLP) to obtain information about the initial
>> memory that is not owned+managed by memory devices.
>>
>> If a hypervisor does not support such memory devices, the address exposed
>> through diag500 STORAGE_LIMIT will correspond to the maximum storage
>> increment exposed through SCLP.
>>
>> To teach kdump on s390 to include memory owned by memory devices, there
>> will be ways to query the relevant memory ranges from the device via a
>> driver running in special kdump mode (like virtio-mem already implements
>> to filter /proc/vmcore access so we don't end up reading from unplugged
>> device blocks).
>>
>> Update setup_ident_map_size(), to clarify that there can be more than
>> just online and standby memory.
>>
>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   arch/s390/boot/physmem_info.c        | 47 +++++++++++++++++++++++++++-
>>   arch/s390/boot/startup.c             |  7 +++--
>>   arch/s390/include/asm/physmem_info.h |  3 ++
>>   3 files changed, 54 insertions(+), 3 deletions(-)
> 
> Looks like I couldn't convince you to implement a query subcode.

Well, you convinced me that it might be useful, but after waiting on 
feedback from the KVM folks ... which didn't happen I moved on. In the 
cover letter I have "No query function for diag500 for now."

My thinking was that if we go for a query subcode, maybe we'd start 
"anew" with a new diag and use "0=query" like all similar instructions I 
am aware of. And that is then a bigger rework ...

... and I am not particularly interested in extra work without a clear
statement from KVM people on (a) whether that work is required; and (b)
what it should look like.
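
To make the "0=query" idea concrete: instructions with a query function
typically hand back a bitmask in which bit n says whether function n is
available. The sketch below shows how a guest could test such a mask. It is
purely hypothetical, since no query subcode exists for diag500 in this
series; the 64-bit mask width and the helper name are assumptions, and the
bit numbering follows the s390 convention where bit 0 is the most
significant bit.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical: check whether "subcode" is flagged in a 64-bit query mask.
 * Bit 0 is the most significant bit, as is usual on s390.
 */
static bool sketch_subcode_supported(uint64_t query_mask, unsigned int subcode)
{
	if (subcode >= 64)
		return false;
	return (query_mask >> (63 - subcode)) & 1;
}
```

A guest would then only issue, say, subcode 4 if
sketch_subcode_supported(mask, 4) is true, instead of relying on a program
check to detect a missing subcode.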

Thanks for the review Heiko!

-- 
Cheers,

David / dhildenb
Re: [PATCH v3 3/7] s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices
Posted by Heiko Carstens 3 weeks, 5 days ago
On Wed, Oct 30, 2024 at 10:42:05AM +0100, David Hildenbrand wrote:
> On 30.10.24 10:23, Heiko Carstens wrote:
> > Looks like I couldn't convince you to implement a query subcode.
> 
> Well, you convinced me that it might be useful, but after waiting on
> feedback from the KVM folks ... which didn't happen I moved on. In the cover
> letter I have "No query function for diag500 for now."
> 
> My thinking was that if we go for a query subcode, maybe we'd start "anew"
> with a new diag and use "0=query" like all similar instructions I am aware
> of. And that is then a bigger rework ...
> 
> ... and I am not particularly interested in extra work without a clear
> statement from KVM people on (a) whether that work is required; and (b) what it
> should look like.

Yes, it is all good. Let's just move on.