[Qemu-devel] [PATCH/RFC] s390x: split memory into 4TB chunks

Christian Borntraeger posted 1 patch 6 years, 4 months ago
hw/s390x/s390-virtio-ccw.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
[Qemu-devel] [PATCH/RFC] s390x: split memory into 4TB chunks
Posted by Christian Borntraeger 6 years, 4 months ago
KVM does not allow memory regions > KVM_MEM_MAX_NR_PAGES, basically
limiting the memory per slot to 7.999TB. Lets split the base memory
into 4TB chunks to allow go beyond 8TB for a guest. With that (and
optimistic overcommitment in the kernel) I was able to start  a 59TB
guest on a 1TB system.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 hw/s390x/s390-virtio-ccw.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 8425534..6735bbb 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -157,11 +157,25 @@ static void virtio_ccw_register_hcalls(void)
 static void s390_memory_init(ram_addr_t mem_size)
 {
     MemoryRegion *sysmem = get_system_memory();
-    MemoryRegion *ram = g_new(MemoryRegion, 1);
+    ram_addr_t chunk, offset;
+    gchar *name;
 
     /* allocate RAM for core */
-    memory_region_allocate_system_memory(ram, NULL, "s390.ram", mem_size);
-    memory_region_add_subregion(sysmem, 0, ram);
+    offset = 0;
+    while (mem_size) {
+        MemoryRegion *ram = g_new(MemoryRegion, 1);
+        chunk = mem_size;
+        /* KVM does not allow memslots >= 8 TB. Lets split into 4TB chunks */
+        if (chunk > 4UL * 1024 * 1024 * 1024 * 1024) {
+            chunk = 4UL * 1024 * 1024 * 1024 * 1024;
+        }
+        name = g_strdup_printf("s390.ram[0x%lx]", offset);
+        memory_region_allocate_system_memory(ram, NULL, name, chunk);
+        memory_region_add_subregion(sysmem, offset, ram);
+        mem_size -= chunk;
+        offset += chunk;
+        g_free(name);
+    }
 
     /* Initialize storage key device */
     s390_skeys_init();
-- 
2.9.4


Re: [Qemu-devel] [PATCH/RFC] s390x: split memory into 4TB chunks
Posted by David Hildenbrand 6 years, 4 months ago
On 06.12.2017 13:00, Christian Borntraeger wrote:
> KVM does not allow memory regions > KVM_MEM_MAX_NR_PAGES, basically
> limiting the memory per slot to 7.999TB. Lets split the base memory
> into 4TB chunks to allow go beyond 8TB for a guest. With that (and
> optimistic overcommitment in the kernel) I was able to start  a 59TB
> guest on a 1TB system.
> 
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  hw/s390x/s390-virtio-ccw.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 8425534..6735bbb 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -157,11 +157,25 @@ static void virtio_ccw_register_hcalls(void)
>  static void s390_memory_init(ram_addr_t mem_size)
>  {
>      MemoryRegion *sysmem = get_system_memory();
> -    MemoryRegion *ram = g_new(MemoryRegion, 1);
> +    ram_addr_t chunk, offset;
> +    gchar *name;
>  
>      /* allocate RAM for core */
> -    memory_region_allocate_system_memory(ram, NULL, "s390.ram", mem_size);
> -    memory_region_add_subregion(sysmem, 0, ram);
> +    offset = 0;
> +    while (mem_size) {
> +        MemoryRegion *ram = g_new(MemoryRegion, 1);
> +        chunk = mem_size;
> +        /* KVM does not allow memslots >= 8 TB. Lets split into 4TB chunks */> +        if (chunk > 4UL * 1024 * 1024 * 1024 * 1024) {
> +            chunk = 4UL * 1024 * 1024 * 1024 * 1024;
> +        }
> +        name = g_strdup_printf("s390.ram[0x%lx]", offset);
> +        memory_region_allocate_system_memory(ram, NULL, name, chunk);
> +        memory_region_add_subregion(sysmem, offset, ram);
> +        mem_size -= chunk;
> +        offset += chunk;
> +        g_free(name);
> +    }
>  
>      /* Initialize storage key device */
>      s390_skeys_init();
> 

This will most certainly break migration, no?

1. I remember the name being used for migration, I might be wrong.
2. Migration of guests > 4TB is certainly broken ;)

I wonder if this should rather be handled in the kvm_slot code.
(silently create an manage two slots)

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH/RFC] s390x: split memory into 4TB chunks
Posted by Christian Borntraeger 6 years, 4 months ago

On 12/06/2017 01:04 PM, David Hildenbrand wrote:
> On 06.12.2017 13:00, Christian Borntraeger wrote:
>> KVM does not allow memory regions > KVM_MEM_MAX_NR_PAGES, basically
>> limiting the memory per slot to 7.999TB. Lets split the base memory
>> into 4TB chunks to allow go beyond 8TB for a guest. With that (and
>> optimistic overcommitment in the kernel) I was able to start  a 59TB
>> guest on a 1TB system.
>>
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  hw/s390x/s390-virtio-ccw.c | 20 +++++++++++++++++---
>>  1 file changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
>> index 8425534..6735bbb 100644
>> --- a/hw/s390x/s390-virtio-ccw.c
>> +++ b/hw/s390x/s390-virtio-ccw.c
>> @@ -157,11 +157,25 @@ static void virtio_ccw_register_hcalls(void)
>>  static void s390_memory_init(ram_addr_t mem_size)
>>  {
>>      MemoryRegion *sysmem = get_system_memory();
>> -    MemoryRegion *ram = g_new(MemoryRegion, 1);
>> +    ram_addr_t chunk, offset;
>> +    gchar *name;
>>  
>>      /* allocate RAM for core */
>> -    memory_region_allocate_system_memory(ram, NULL, "s390.ram", mem_size);
>> -    memory_region_add_subregion(sysmem, 0, ram);
>> +    offset = 0;
>> +    while (mem_size) {
>> +        MemoryRegion *ram = g_new(MemoryRegion, 1);
>> +        chunk = mem_size;
>> +        /* KVM does not allow memslots >= 8 TB. Lets split into 4TB chunks */> +        if (chunk > 4UL * 1024 * 1024 * 1024 * 1024) {
>> +            chunk = 4UL * 1024 * 1024 * 1024 * 1024;
>> +        }
>> +        name = g_strdup_printf("s390.ram[0x%lx]", offset);
>> +        memory_region_allocate_system_memory(ram, NULL, name, chunk);
>> +        memory_region_add_subregion(sysmem, offset, ram);
>> +        mem_size -= chunk;
>> +        offset += chunk;
>> +        g_free(name);
>> +    }
>>  
>>      /* Initialize storage key device */
>>      s390_skeys_init();
>>
> 
> This will most certainly break migration, no?

Probably yes. Thats why the patch has RFC. I was looking for ideas.
> 
> 1. I remember the name being used for migration, I might be wrong.
> 2. Migration of guests > 4TB is certainly broken ;)
> 
> I wonder if this should rather be handled in the kvm_slot code.
> (silently create an manage two slots)


Re: [Qemu-devel] [PATCH/RFC] s390x: split memory into 4TB chunks
Posted by David Hildenbrand 6 years, 4 months ago
On 06.12.2017 13:06, Christian Borntraeger wrote:
> 
> 
> On 12/06/2017 01:04 PM, David Hildenbrand wrote:
>> On 06.12.2017 13:00, Christian Borntraeger wrote:
>>> KVM does not allow memory regions > KVM_MEM_MAX_NR_PAGES, basically
>>> limiting the memory per slot to 7.999TB. Lets split the base memory
>>> into 4TB chunks to allow go beyond 8TB for a guest. With that (and
>>> optimistic overcommitment in the kernel) I was able to start  a 59TB
>>> guest on a 1TB system.
>>>
>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>> ---
>>>  hw/s390x/s390-virtio-ccw.c | 20 +++++++++++++++++---
>>>  1 file changed, 17 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
>>> index 8425534..6735bbb 100644
>>> --- a/hw/s390x/s390-virtio-ccw.c
>>> +++ b/hw/s390x/s390-virtio-ccw.c
>>> @@ -157,11 +157,25 @@ static void virtio_ccw_register_hcalls(void)
>>>  static void s390_memory_init(ram_addr_t mem_size)
>>>  {
>>>      MemoryRegion *sysmem = get_system_memory();
>>> -    MemoryRegion *ram = g_new(MemoryRegion, 1);
>>> +    ram_addr_t chunk, offset;
>>> +    gchar *name;
>>>  
>>>      /* allocate RAM for core */
>>> -    memory_region_allocate_system_memory(ram, NULL, "s390.ram", mem_size);
>>> -    memory_region_add_subregion(sysmem, 0, ram);
>>> +    offset = 0;
>>> +    while (mem_size) {
>>> +        MemoryRegion *ram = g_new(MemoryRegion, 1);
>>> +        chunk = mem_size;
>>> +        /* KVM does not allow memslots >= 8 TB. Lets split into 4TB chunks */> +        if (chunk > 4UL * 1024 * 1024 * 1024 * 1024) {
>>> +            chunk = 4UL * 1024 * 1024 * 1024 * 1024;
>>> +        }
>>> +        name = g_strdup_printf("s390.ram[0x%lx]", offset);
>>> +        memory_region_allocate_system_memory(ram, NULL, name, chunk);
>>> +        memory_region_add_subregion(sysmem, offset, ram);
>>> +        mem_size -= chunk;
>>> +        offset += chunk;
>>> +        g_free(name);
>>> +    }
>>>  
>>>      /* Initialize storage key device */
>>>      s390_skeys_init();
>>>
>>
>> This will most certainly break migration, no?
> 
> Probably yes. Thats why the patch has RFC. I was looking for ideas.

As other architectures might also run into this problem, wonder if

a) bumping up KVM_MEM_MAX_NR_PAGES makes sense.
b) as I said, transparently handle it in kvm slot handling code.

>>
>> 1. I remember the name being used for migration, I might be wrong.
>> 2. Migration of guests > 4TB is certainly broken ;)
>>
>> I wonder if this should rather be handled in the kvm_slot code.
>> (silently create an manage two slots)
> 


-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH/RFC] s390x: split memory into 4TB chunks
Posted by Christian Borntraeger 6 years, 4 months ago

On 12/06/2017 01:12 PM, David Hildenbrand wrote:
> On 06.12.2017 13:06, Christian Borntraeger wrote:
>>
>>
>> On 12/06/2017 01:04 PM, David Hildenbrand wrote:
>>> On 06.12.2017 13:00, Christian Borntraeger wrote:
>>>> KVM does not allow memory regions > KVM_MEM_MAX_NR_PAGES, basically
>>>> limiting the memory per slot to 7.999TB. Lets split the base memory
>>>> into 4TB chunks to allow go beyond 8TB for a guest. With that (and
>>>> optimistic overcommitment in the kernel) I was able to start  a 59TB
>>>> guest on a 1TB system.
>>>>
>>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>> ---
>>>>  hw/s390x/s390-virtio-ccw.c | 20 +++++++++++++++++---
>>>>  1 file changed, 17 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
>>>> index 8425534..6735bbb 100644
>>>> --- a/hw/s390x/s390-virtio-ccw.c
>>>> +++ b/hw/s390x/s390-virtio-ccw.c
>>>> @@ -157,11 +157,25 @@ static void virtio_ccw_register_hcalls(void)
>>>>  static void s390_memory_init(ram_addr_t mem_size)
>>>>  {
>>>>      MemoryRegion *sysmem = get_system_memory();
>>>> -    MemoryRegion *ram = g_new(MemoryRegion, 1);
>>>> +    ram_addr_t chunk, offset;
>>>> +    gchar *name;
>>>>  
>>>>      /* allocate RAM for core */
>>>> -    memory_region_allocate_system_memory(ram, NULL, "s390.ram", mem_size);
>>>> -    memory_region_add_subregion(sysmem, 0, ram);
>>>> +    offset = 0;
>>>> +    while (mem_size) {
>>>> +        MemoryRegion *ram = g_new(MemoryRegion, 1);
>>>> +        chunk = mem_size;
>>>> +        /* KVM does not allow memslots >= 8 TB. Lets split into 4TB chunks */> +        if (chunk > 4UL * 1024 * 1024 * 1024 * 1024) {
>>>> +            chunk = 4UL * 1024 * 1024 * 1024 * 1024;
>>>> +        }
>>>> +        name = g_strdup_printf("s390.ram[0x%lx]", offset);
>>>> +        memory_region_allocate_system_memory(ram, NULL, name, chunk);
>>>> +        memory_region_add_subregion(sysmem, offset, ram);
>>>> +        mem_size -= chunk;
>>>> +        offset += chunk;
>>>> +        g_free(name);
>>>> +    }
>>>>  
>>>>      /* Initialize storage key device */
>>>>      s390_skeys_init();
>>>>
>>>
>>> This will most certainly break migration, no?
>>
>> Probably yes. Thats why the patch has RFC. I was looking for ideas.
> 
> As other architectures might also run into this problem, wonder if
> 
> a) bumping up KVM_MEM_MAX_NR_PAGES makes sense.

The original limitation comes from the fact that this define is used to limit
the number of bits in the dirty bitmap as some architectures do not provide
bitops beyond 2^32.

> b) as I said, transparently handle it in kvm slot handling code.

adding Paolo to check what he thinks.
> 
>>>
>>> 1. I remember the name being used for migration, I might be wrong.
>>> 2. Migration of guests > 4TB is certainly broken ;)
>>>
>>> I wonder if this should rather be handled in the kvm_slot code.
>>> (silently create an manage two slots)
>>
> 
> 


Re: [Qemu-devel] [PATCH/RFC] s390x: split memory into 4TB chunks
Posted by David Hildenbrand 6 years, 4 months ago
>>>> This will most certainly break migration, no?
>>>
>>> Probably yes. Thats why the patch has RFC. I was looking for ideas.
>>
>> As other architectures might also run into this problem, wonder if
>>
>> a) bumping up KVM_MEM_MAX_NR_PAGES makes sense.
> 
> The original limitation comes from the fact that this define is used to limit
> the number of bits in the dirty bitmap as some architectures do not provide
> bitops beyond 2^32.

Then my question would be if we can bump it up for all architectures
that support it.

> 
>> b) as I said, transparently handle it in kvm slot handling code.
> 
> adding Paolo to check what he thinks.
>>
>>>>
>>>> 1. I remember the name being used for migration, I might be wrong.
>>>> 2. Migration of guests > 4TB is certainly broken ;)
>>>>
>>>> I wonder if this should rather be handled in the kvm_slot code.
>>>> (silently create an manage two slots)
>>>
>>
>>
> 


-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH/RFC] s390x: split memory into 4TB chunks
Posted by Christian Borntraeger 6 years, 4 months ago

On 12/06/2017 01:19 PM, David Hildenbrand wrote:
> 
>>>>> This will most certainly break migration, no?
>>>>
>>>> Probably yes. Thats why the patch has RFC. I was looking for ideas.
>>>
>>> As other architectures might also run into this problem, wonder if
>>>
>>> a) bumping up KVM_MEM_MAX_NR_PAGES makes sense.
>>
>> The original limitation comes from the fact that this define is used to limit
>> the number of bits in the dirty bitmap as some architectures do not provide
>> bitops beyond 2^32.
> 
> Then my question would be if we can bump it up for all architectures
> that support it.
> 
>>
>>> b) as I said, transparently handle it in kvm slot handling code.
>>
>> adding Paolo to check what he thinks.

Another quick variant would be to keep the name of the first memslot as 
is (to be compatible) and start a new one just after the maximum size of.
After all migration will fail anyway for larger guests.

>>>
>>>>>
>>>>> 1. I remember the name being used for migration, I might be wrong.
>>>>> 2. Migration of guests > 4TB is certainly broken ;)
>>>>>
>>>>> I wonder if this should rather be handled in the kvm_slot code.
>>>>> (silently create an manage two slots)
>>>>
>>>
>>>
>>
> 
> 


Re: [Qemu-devel] [PATCH/RFC] s390x: split memory into 4TB chunks
Posted by Paolo Bonzini 6 years, 4 months ago
On 06/12/2017 13:15, Christian Borntraeger wrote:
>> a) bumping up KVM_MEM_MAX_NR_PAGES makes sense.
> The original limitation comes from the fact that this define is used to limit
> the number of bits in the dirty bitmap as some architectures do not provide
> bitops beyond 2^32.
> 
>> b) as I said, transparently handle it in kvm slot handling code.
> adding Paolo to check what he thinks.

It's a bit of a first-world problem, :) and working around it in QEMU
makes is easy enough that it's okay in my opinion.

Bumping up KVM_MEM_MAX_NR_PAGES if the bitops allows it should be a good
idea too, but it requires auditing the code for overflows and truncations.

Paolo

Re: [Qemu-devel] [PATCH/RFC] s390x: split memory into 4TB chunks
Posted by Christian Borntraeger 6 years, 4 months ago

On 12/06/2017 02:35 PM, Paolo Bonzini wrote:
> On 06/12/2017 13:15, Christian Borntraeger wrote:
>>> a) bumping up KVM_MEM_MAX_NR_PAGES makes sense.
>> The original limitation comes from the fact that this define is used to limit
>> the number of bits in the dirty bitmap as some architectures do not provide
>> bitops beyond 2^32.
>>
>>> b) as I said, transparently handle it in kvm slot handling code.
>> adding Paolo to check what he thinks.
> 
> It's a bit of a first-world problem, :) 

If a genie comes along and offers you to fufill 3 wishes, as an IT guy
you say: "I want a memory cardridge that is never full".
The genie say "ok, what is your 2nd wish". You say "A 2nd memory cartridge"....


> and working around it in QEMU
> makes is easy enough that it's okay in my opinion.

Since the dirty logging is synched between the qemu memory regions and the 
KVM slots I will look into something like keep the original memslots and create a
new one as soon as we cross the current KVM_MEM_MAX_NR_PAGES size.

> 
> Bumping up KVM_MEM_MAX_NR_PAGES if the bitops allows it should be a good
> idea too, but it requires auditing the code for overflows and truncations.
> 
> Paolo
>