[PATCH v1 02/10] qemu_domain.c: align memory modules before calculating 'initialmem'

Daniel Henrique Barboza posted 10 patches 5 years, 2 months ago
[PATCH v1 02/10] qemu_domain.c: align memory modules before calculating 'initialmem'
Posted by Daniel Henrique Barboza 5 years, 2 months ago
qemuDomainAlignMemorySizes() has an operation order problem. We are
calculating 'initialmem' without aligning the memory modules first.
Since we're aligning the dimms afterwards this can create inconsistencies
in the end result. x86 has alignment of 1-2MiB and it's not severely
impacted by it, but pSeries works with 256MiB alignment and the difference
is noticeable.

This is the case of the existing 'memory-hotplug-ppc64-nonuma' test.
The test consists of a 2GiB (aligned value) guest with 2 ~520MiB dimms,
both unaligned. 'initialmem' is calculated by taking total_mem and
subtracting the dimms size (via virDomainDefGetMemoryInitial()), which
wil give us 2GiB - 520MiB - 520MiB, ending up with a little more than
an 1GiB of 'initialmem'. Note that this value is now unaligned, and
will be aligned up via VIR_ROUND_UP(), and we'll end up with 'initialmem'
of 1GiB + 256MiB. Given that the dimms are aligned later on, the end
result for QEMU is that the guest will have a 'mem' size of 1310720k,
plus the two 512 MiB dimms, exceeding in 256MiB the desired 2GiB
memory and currentMemory specified in the XML.

This behavior was found when trying to push ppc64 memory alignment
to postparse, as done in the next patch. In that case, the memory
modules are aligned before entering qemuDomainAlignMemorySizes() and
the 'memory-hotplug-ppc64-nonuma' test started to fail.

Fortunately the fix is simple: align the memory modules before
calculating 'initialmem' in qemuDomainAlignMemorySizes().

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 src/qemu/qemu_domain.c                        | 42 ++++++++++---------
 .../memory-hotplug-ppc64-nonuma.args          |  2 +-
 2 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 375a9e5075..5a17887fa1 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -8083,25 +8083,9 @@ qemuDomainAlignMemorySizes(virDomainDefPtr def)
         virDomainNumaSetNodeMemorySize(def->numa, i, mem);
     }
 
-    /* align initial memory size, if NUMA is present calculate it as total of
-     * individual aligned NUMA node sizes */
-    if (initialmem == 0)
-        initialmem = VIR_ROUND_UP(virDomainDefGetMemoryInitial(def), align);
-
-    if (initialmem > maxmemcapped) {
-        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-                       _("initial memory size overflowed after alignment"));
-        return -1;
-    }
-
-    def->mem.max_memory = VIR_ROUND_UP(def->mem.max_memory, align);
-    if (def->mem.max_memory > maxmemkb) {
-        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-                       _("maximum memory size overflowed after alignment"));
-        return -1;
-    }
-
-    /* Align memory module sizes */
+    /* Align memory module sizes. This needs to occur before 'initialmem'
+     * calculation because virDomainDefGetMemoryInitial() uses the size
+     * of the modules in the math. */
     for (i = 0; i < def->nmems; i++) {
         if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM &&
             ARCH_IS_PPC64(def->os.arch)) {
@@ -8122,6 +8106,26 @@ qemuDomainAlignMemorySizes(virDomainDefPtr def)
         }
     }
 
+    /* Align initial memory size, if NUMA is present calculate it as total of
+     * individual aligned NUMA node sizes. */
+    if (initialmem == 0) {
+        align = qemuDomainGetMemorySizeAlignment(def);
+        initialmem = VIR_ROUND_UP(virDomainDefGetMemoryInitial(def), align);
+    }
+
+    if (initialmem > maxmemcapped) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("initial memory size overflowed after alignment"));
+        return -1;
+    }
+
+    def->mem.max_memory = VIR_ROUND_UP(def->mem.max_memory, align);
+    if (def->mem.max_memory > maxmemkb) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("maximum memory size overflowed after alignment"));
+        return -1;
+    }
+
     virDomainDefSetMemoryTotal(def, initialmem + hotplugmem);
 
     return 0;
diff --git a/tests/qemuxml2argvdata/memory-hotplug-ppc64-nonuma.args b/tests/qemuxml2argvdata/memory-hotplug-ppc64-nonuma.args
index 91cea9d8bf..78406f7f04 100644
--- a/tests/qemuxml2argvdata/memory-hotplug-ppc64-nonuma.args
+++ b/tests/qemuxml2argvdata/memory-hotplug-ppc64-nonuma.args
@@ -11,7 +11,7 @@ QEMU_AUDIO_DRV=none \
 -name QEMUGuest1 \
 -S \
 -machine pseries,accel=kvm,usb=off,dump-guest-core=off \
--m size=1310720k,slots=16,maxmem=4194304k \
+-m size=1048576k,slots=16,maxmem=4194304k \
 -realtime mlock=off \
 -smp 1,sockets=1,cores=1,threads=1 \
 -object memory-backend-ram,id=memdimm0,size=536870912 \
-- 
2.26.2

Re: [PATCH v1 02/10] qemu_domain.c: align memory modules before calculating 'initialmem'
Posted by Andrea Bolognani 5 years, 2 months ago
On Wed, 2020-11-11 at 19:07 -0300, Daniel Henrique Barboza wrote:
> +++ b/tests/qemuxml2argvdata/memory-hotplug-ppc64-nonuma.args
> @@ -11,7 +11,7 @@ QEMU_AUDIO_DRV=none \
>  -name QEMUGuest1 \
>  -S \
>  -machine pseries,accel=kvm,usb=off,dump-guest-core=off \
> --m size=1310720k,slots=16,maxmem=4194304k \
> +-m size=1048576k,slots=16,maxmem=4194304k \
>  -realtime mlock=off \
>  -smp 1,sockets=1,cores=1,threads=1 \
>  -object memory-backend-ram,id=memdimm0,size=536870912 \

This doesn't look right: AFAIK the initial memory size is
guest-visible, so by changing how the alignment is performed you
might both change the guest ABI across guest boots (if libvirt is
upgraded in between them) and break migration (if either the source
or destination host is running the newer libvirt but the other side
isn't).

Did I miss something that makes this okay?

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [PATCH v1 02/10] qemu_domain.c: align memory modules before calculating 'initialmem'
Posted by Daniel Henrique Barboza 5 years, 2 months ago

On 11/13/20 7:30 AM, Andrea Bolognani wrote:
> On Wed, 2020-11-11 at 19:07 -0300, Daniel Henrique Barboza wrote:
>> +++ b/tests/qemuxml2argvdata/memory-hotplug-ppc64-nonuma.args
>> @@ -11,7 +11,7 @@ QEMU_AUDIO_DRV=none \
>>   -name QEMUGuest1 \
>>   -S \
>>   -machine pseries,accel=kvm,usb=off,dump-guest-core=off \
>> --m size=1310720k,slots=16,maxmem=4194304k \
>> +-m size=1048576k,slots=16,maxmem=4194304k \
>>   -realtime mlock=off \
>>   -smp 1,sockets=1,cores=1,threads=1 \
>>   -object memory-backend-ram,id=memdimm0,size=536870912 \
> 
> This doesn't look right: AFAIK the initial memory size is
> guest-visible, so by changing how the alignment is performed you
> might both change the guest ABI across guest boots (if libvirt is
> upgraded in between them) and break migration (if either the source
> or destination host is running the newer libvirt but the other side
> isn't).

Good point. I failed to consider ABI stability for ppc64 guest migration.
Yes, this will break older guests that happen to have extra memory. In
fact, this can be specially harmful for migration.

This means that I can't proceed with any other ppc64 changes made here.
Aligning ppc64 DIMMs in PostParse will achieve the same results even
without this patch - the DIMMs will be aligned before qemuDomainAlignMemorySizes()
and initialmem will be rounded to a more precise value of 'currentMemory'.
I can think of ways to align ppc64 DIMMs while not touching initialmem,
but all of them will require extra state in the domain definition. The
benefits are there (the DIMMs will be aligned in the live XML) but I'm
not sure it's worth the extra code.

Thanks for pointing this out. I'll evaluate if the x86 bits are still
valuable and re-send them, since they're not messing with initialmem
calculation of x86 guests.


DHB



> 
> Did I miss something that makes this okay?
> 

Re: [PATCH v1 02/10] qemu_domain.c: align memory modules before calculating 'initialmem'
Posted by Andrea Bolognani 5 years, 2 months ago
On Fri, 2020-11-13 at 09:58 -0300, Daniel Henrique Barboza wrote:
> On 11/13/20 7:30 AM, Andrea Bolognani wrote:
> > On Wed, 2020-11-11 at 19:07 -0300, Daniel Henrique Barboza wrote:
> > > +++ b/tests/qemuxml2argvdata/memory-hotplug-ppc64-nonuma.args
> > > @@ -11,7 +11,7 @@ QEMU_AUDIO_DRV=none \
> > >   -name QEMUGuest1 \
> > >   -S \
> > >   -machine pseries,accel=kvm,usb=off,dump-guest-core=off \
> > > --m size=1310720k,slots=16,maxmem=4194304k \
> > > +-m size=1048576k,slots=16,maxmem=4194304k \
> > >   -realtime mlock=off \
> > >   -smp 1,sockets=1,cores=1,threads=1 \
> > >   -object memory-backend-ram,id=memdimm0,size=536870912 \
> > 
> > This doesn't look right: AFAIK the initial memory size is
> > guest-visible, so by changing how the alignment is performed you
> > might both change the guest ABI across guest boots (if libvirt is
> > upgraded in between them) and break migration (if either the source
> > or destination host is running the newer libvirt but the other side
> > isn't).
> 
> Good point. I failed to consider ABI stability for ppc64 guest migration.
> Yes, this will break older guests that happen to have extra memory. In
> fact, this can be specially harmful for migration.
> 
> This means that I can't proceed with any other ppc64 changes made here.
> Aligning ppc64 DIMMs in PostParse will achieve the same results even
> without this patch - the DIMMs will be aligned before qemuDomainAlignMemorySizes()
> and initialmem will be rounded to a more precise value of 'currentMemory'.
> I can think of ways to align ppc64 DIMMs while not touching initialmem,
> but all of them will require extra state in the domain definition. The
> benefits are there (the DIMMs will be aligned in the live XML) but I'm
> not sure it's worth the extra code.

I only skimmed the remaining patches and didn't dig into the code as
much, or as recently, as you have, but from a high-level perspective
I don't see why you wouldn't be able to simply move the existing
rounding logic from the command line generator to PostParse? It's not
like the former has access to additional information that the latter
can't get to, right?

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [PATCH v1 02/10] qemu_domain.c: align memory modules before calculating 'initialmem'
Posted by Daniel Henrique Barboza 5 years, 2 months ago
(Peter, I cc'ed you because I cited one of your commits and you might want
to weight in)

On 11/13/20 10:51 AM, Andrea Bolognani wrote:
> On Fri, 2020-11-13 at 09:58 -0300, Daniel Henrique Barboza wrote:
>> On 11/13/20 7:30 AM, Andrea Bolognani wrote:
>>> On Wed, 2020-11-11 at 19:07 -0300, Daniel Henrique Barboza wrote:
>>>> +++ b/tests/qemuxml2argvdata/memory-hotplug-ppc64-nonuma.args

[...]

> I only skimmed the remaining patches and didn't dig into the code as
> much, or as recently, as you have, but from a high-level perspective
> I don't see why you wouldn't be able to simply move the existing
> rounding logic from the command line generator to PostParse? It's not
> like the former has access to additional information that the latter
> can't get to, right?
> 

I was looking into the code and I think that might have the wrong idea here.
Apparently we're not aligning memory during migration or snapshot restore.
This specific line in qemu_command.c got my attention:

-- qemuBuildCommandLine() --

     if (!migrateURI && !snapshot && qemuDomainAlignMemorySizes(def) < 0)
         return NULL;

------

I investigated the history behind this logic and found the following commit:

commit c7d7ba85a6242d789ba3f4dae313e950fbb638c5
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Thu Sep 17 08:14:05 2015 +0200

     qemu: command: Align memory sizes only on fresh starts
     
     When we are starting a qemu process for an incomming migration or
     snapshot reloading we should not modify the memory sizes in the domain
     since we could potentially change the guest ABI that was tediously
     checked before. Additionally the function now updates the initial memory
     size according to the NUMA node size, which should not happen if we are
     restoring state.
     
     Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1252685

---------


This means that the changes made in this series will not break migration, since
the alignment of 'initialmem' is not being triggered in those cases. Which
is good.

However, you also brought up in an earlier reply that these changes might break
"the guest ABI across guest boots (if libvirt is upgraded in between them)". This
can't be helped I think - an older ppc64 guest with 4GiB of 'currentMemory' in the
XML, that is ending up having 4.256GiB (extra 256MiB) due to the way alignment was
being done, will have exactly 4GiB of 'initialmem' after these changes. My point is
that we're giving the exact memory the guest is demanding, as intended by the domain
XML, in a fresh guest start. This might be considered an ABI break probably, but
why would one complain that Libvirt is now giving precisely what was asked for?
Avoiding granting extra 256MBs of mem for domains seems worth it, given that we're
not impacting live domains or migration.


Thanks,


DHB










Re: [PATCH v1 02/10] qemu_domain.c: align memory modules before calculating 'initialmem'
Posted by Andrea Bolognani 5 years, 2 months ago
On Fri, 2020-11-13 at 16:23 -0300, Daniel Henrique Barboza wrote:
> On 11/13/20 10:51 AM, Andrea Bolognani wrote:
> > I only skimmed the remaining patches and didn't dig into the code as
> > much, or as recently, as you have, but from a high-level perspective
> > I don't see why you wouldn't be able to simply move the existing
> > rounding logic from the command line generator to PostParse? It's not
> > like the former has access to additional information that the latter
> > can't get to, right?
> 
> I was looking into the code and I think that might have the wrong idea here.
> Apparently we're not aligning memory during migration or snapshot restore.
> This specific line in qemu_command.c got my attention:
> 
> -- qemuBuildCommandLine() --
> 
>      if (!migrateURI && !snapshot && qemuDomainAlignMemorySizes(def) < 0)
>          return NULL;
> 
> ------
> 
> I investigated the history behind this logic and found the following commit:
> 
> commit c7d7ba85a6242d789ba3f4dae313e950fbb638c5
> Author: Peter Krempa <pkrempa@redhat.com>
> Date:   Thu Sep 17 08:14:05 2015 +0200
> 
>      qemu: command: Align memory sizes only on fresh starts
>      
>      When we are starting a qemu process for an incomming migration or
>      snapshot reloading we should not modify the memory sizes in the domain
>      since we could potentially change the guest ABI that was tediously
>      checked before. Additionally the function now updates the initial memory
>      size according to the NUMA node size, which should not happen if we are
>      restoring state.
>      
>      Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1252685
> 
> ---------
> 
> This means that the changes made in this series will not break migration, since
> the alignment of 'initialmem' is not being triggered in those cases. Which
> is good.
> 
> However, you also brought up in an earlier reply that these changes might break
> "the guest ABI across guest boots (if libvirt is upgraded in between them)". This
> can't be helped I think - an older ppc64 guest with 4GiB of 'currentMemory' in the
> XML, that is ending up having 4.256GiB (extra 256MiB) due to the way alignment was
> being done, will have exactly 4GiB of 'initialmem' after these changes. My point is
> that we're giving the exact memory the guest is demanding, as intended by the domain
> XML, in a fresh guest start. This might be considered an ABI break probably, but
> why would one complain that Libvirt is now giving precisely what was asked for?
> Avoiding granting extra 256MBs of mem for domains seems worth it, given that we're
> not impacting live domains or migration.

In general, changing guest ABI between boots is not something that we
want to happen.

I have trouble keeping all the details of memory alignment inside my
head and I can't quite spend the time necessary to swap them back in
right now, so please apologies if I'm being vague and of course
correct me if I'm wrong... Having Peter in the thread will also
surely help with that :)

The aim of this series should *not* be to change the calculations
that happen when aligning memory, but only to reflect them back to
the domain XML where they can be queried: so for example if the input

  <memory unit='GiB'>4</memory>
  <devices>
    <memory model='nvdimm'>
      <target>
        <size unit='MiB'>500</size>

results in the command line

  -m 4352m
  -object memory-backend-file,size=512m

(the exact numbers are not relevant), then what we want is for the
XML to be updated at define time so that it reads

  <memory unit='MiB'>4864</memory>
  <devices>
    <memory model='nvdimm'>
      <target>
        <size unit='MiB'>512</size>

(again, the numbers are almost certainly wrong) and we want *that*
XML to generate the same QEMU command line as before.

If this can't be achieved, or there are other side effects to it that
I haven't considered, then we're better off leaving the current
behavior alone (documenting the heck out of it if necessary) instead
of changing it in ways that would alter guest ABI between boots.

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [PATCH v1 02/10] qemu_domain.c: align memory modules before calculating 'initialmem'
Posted by Peter Krempa 5 years, 2 months ago
On Mon, Nov 16, 2020 at 20:43:09 +0100, Andrea Bolognani wrote:
> On Fri, 2020-11-13 at 16:23 -0300, Daniel Henrique Barboza wrote:
> > On 11/13/20 10:51 AM, Andrea Bolognani wrote:
> > > I only skimmed the remaining patches and didn't dig into the code as
> > > much, or as recently, as you have, but from a high-level perspective
> > > I don't see why you wouldn't be able to simply move the existing
> > > rounding logic from the command line generator to PostParse? It's not
> > > like the former has access to additional information that the latter
> > > can't get to, right?
> > 
> > I was looking into the code and I think that might have the wrong idea here.
> > Apparently we're not aligning memory during migration or snapshot restore.
> > This specific line in qemu_command.c got my attention:
> > 
> > -- qemuBuildCommandLine() --
> > 
> >      if (!migrateURI && !snapshot && qemuDomainAlignMemorySizes(def) < 0)
> >          return NULL;
> > 
> > ------
> > 
> > I investigated the history behind this logic and found the following commit:
> > 
> > commit c7d7ba85a6242d789ba3f4dae313e950fbb638c5
> > Author: Peter Krempa <pkrempa@redhat.com>
> > Date:   Thu Sep 17 08:14:05 2015 +0200
> > 
> >      qemu: command: Align memory sizes only on fresh starts
> >      
> >      When we are starting a qemu process for an incomming migration or
> >      snapshot reloading we should not modify the memory sizes in the domain
> >      since we could potentially change the guest ABI that was tediously
> >      checked before. Additionally the function now updates the initial memory
> >      size according to the NUMA node size, which should not happen if we are
> >      restoring state.
> >      
> >      Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1252685
> > 
> > ---------
> > 
> > This means that the changes made in this series will not break migration, since
> > the alignment of 'initialmem' is not being triggered in those cases. Which
> > is good.
> > 
> > However, you also brought up in an earlier reply that these changes might break
> > "the guest ABI across guest boots (if libvirt is upgraded in between them)". This
> > can't be helped I think - an older ppc64 guest with 4GiB of 'currentMemory' in the
> > XML, that is ending up having 4.256GiB (extra 256MiB) due to the way alignment was
> > being done, will have exactly 4GiB of 'initialmem' after these changes. My point is
> > that we're giving the exact memory the guest is demanding, as intended by the domain
> > XML, in a fresh guest start. This might be considered an ABI break probably, but
> > why would one complain that Libvirt is now giving precisely what was asked for?
> > Avoiding granting extra 256MBs of mem for domains seems worth it, given that we're
> > not impacting live domains or migration.
> 
> In general, changing guest ABI between boots is not something that we
> want to happen.
> 
> I have trouble keeping all the details of memory alignment inside my
> head and I can't quite spend the time necessary to swap them back in
> right now, so please apologies if I'm being vague and of course
> correct me if I'm wrong... Having Peter in the thread will also
> surely help with that :)

Either you have high hopes or my sarcasm detector is failing.

Just as one data point: the 'PostParse' callback happen _always_. Even
when migrating the VM. My patch according to the commit message is
specifically avoiding the alignment on migration. Thus the code can't be
moved to the post parse callback.

The outcome documented in the NEWS just mentions that it's to update the
live XML.

Neither of the commit messages of this series mentions that there is an
issue with migration so the series needs to be re-evaluated in that
light.

Re: [PATCH v1 02/10] qemu_domain.c: align memory modules before calculating 'initialmem'
Posted by Daniel Henrique Barboza 5 years, 2 months ago

On 11/18/20 4:44 AM, Peter Krempa wrote:
> On Mon, Nov 16, 2020 at 20:43:09 +0100, Andrea Bolognani wrote:
>> On Fri, 2020-11-13 at 16:23 -0300, Daniel Henrique Barboza wrote:
>>> On 11/13/20 10:51 AM, Andrea Bolognani wrote:
>>>> I only skimmed the remaining patches and didn't dig into the code as

[...]


>>
>> In general, changing guest ABI between boots is not something that we
>> want to happen.
>>
>> I have trouble keeping all the details of memory alignment inside my
>> head and I can't quite spend the time necessary to swap them back in
>> right now, so please apologies if I'm being vague and of course
>> correct me if I'm wrong... Having Peter in the thread will also
>> surely help with that :)
> 
> Either you have high hopes or my sarcasm detector is failing.
> 
> Just as one data point: the 'PostParse' callback happen _always_. Even
> when migrating the VM. My patch according to the commit message is
> specifically avoiding the alignment on migration. Thus the code can't be
> moved to the post parse callback.

Not without considering the PARSE_ABI_UPDATE flag, and even then I'm not
sure if it's worth the trouble like I mentioned earlier.

> 
> The outcome documented in the NEWS just mentions that it's to update the
> live XML.
> 
> Neither of the commit messages of this series mentions that there is an
> issue with migration so the series needs to be re-evaluated in that
> light.
> 

This is correct. I got a lot of stuff wrong in the v1. v2 will be way
shorter in scope but it will not mess up with existing migration/snapshot
mechanics. My aim will be to fix the existing alignment calculation for
pseries DIMMs.


Thanks,


DHB

Re: [PATCH v1 02/10] qemu_domain.c: align memory modules before calculating 'initialmem'
Posted by Daniel Henrique Barboza 5 years, 2 months ago

On 11/16/20 4:43 PM, Andrea Bolognani wrote:
> On Fri, 2020-11-13 at 16:23 -0300, Daniel Henrique Barboza wrote:
>> On 11/13/20 10:51 AM, Andrea Bolognani wrote:
>>> I only skimmed the remaining patches and didn't dig into the code as
>>> much, or as recently, as you have, but from a high-level perspective
>>> I don't see why you wouldn't be able to simply move the existing
>>> rounding logic from the command line generator to PostParse? It's not
>>> like the former has access to additional information that the latter
>>> can't get to, right?
>>
>> I was looking into the code and I think that might have the wrong idea here.
>> Apparently we're not aligning memory during migration or snapshot restore.
>> This specific line in qemu_command.c got my attention:
>>
>> -- qemuBuildCommandLine() --
>>
>>       if (!migrateURI && !snapshot && qemuDomainAlignMemorySizes(def) < 0)
>>           return NULL;
>>
>> ------
>>
>> I investigated the history behind this logic and found the following commit:
>>
>> commit c7d7ba85a6242d789ba3f4dae313e950fbb638c5
>> Author: Peter Krempa <pkrempa@redhat.com>
>> Date:   Thu Sep 17 08:14:05 2015 +0200
>>
>>       qemu: command: Align memory sizes only on fresh starts
>>       
>>       When we are starting a qemu process for an incomming migration or
>>       snapshot reloading we should not modify the memory sizes in the domain
>>       since we could potentially change the guest ABI that was tediously
>>       checked before. Additionally the function now updates the initial memory
>>       size according to the NUMA node size, which should not happen if we are
>>       restoring state.
>>       
>>       Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1252685
>>
>> ---------
>>
>> This means that the changes made in this series will not break migration, since
>> the alignment of 'initialmem' is not being triggered in those cases. Which
>> is good.
>>
>> However, you also brought up in an earlier reply that these changes might break
>> "the guest ABI across guest boots (if libvirt is upgraded in between them)". This
>> can't be helped I think - an older ppc64 guest with 4GiB of 'currentMemory' in the
>> XML, that is ending up having 4.256GiB (extra 256MiB) due to the way alignment was
>> being done, will have exactly 4GiB of 'initialmem' after these changes. My point is
>> that we're giving the exact memory the guest is demanding, as intended by the domain
>> XML, in a fresh guest start. This might be considered an ABI break probably, but
>> why would one complain that Libvirt is now giving precisely what was asked for?
>> Avoiding granting extra 256MBs of mem for domains seems worth it, given that we're
>> not impacting live domains or migration.
> 
> In general, changing guest ABI between boots is not something that we
> want to happen.
> 
> I have trouble keeping all the details of memory alignment inside my
> head and I can't quite spend the time necessary to swap them back in
> right now, so please apologies if I'm being vague and of course
> correct me if I'm wrong... Having Peter in the thread will also
> surely help with that :)
> 
> The aim of this series should *not* be to change the calculations
> that happen when aligning memory, but only to reflect them back to
> the domain XML where they can be queried: so for example if the input
> 
>    <memory unit='GiB'>4</memory>
>    <devices>
>      <memory model='nvdimm'>
>        <target>
>          <size unit='MiB'>500</size>
> 
> results in the command line
> 
>    -m 4352m
>    -object memory-backend-file,size=512m
> 
> (the exact numbers are not relevant), then what we want is for the
> XML to be updated at define time so that it reads


I investigate it more and I think I got a few premises wrong here:


- I pointed out that the alignment changes aren't being reflected. This
is wrong. I figured it out when launching real guests and seeing that
the live XML was being updated properly. This diminishes the urgency
of this patch series considerably, and I apologize for the confusion.

- The reason I claimed that is because I got misled by my own
misunderstanding of what qemuxml2xmltest.c does: it checks the
result XML only up to PostParse changes, and that's it. Any further
change in the XML aren't being reflected by the tests. qemuxml2argvtest.c
goes as far as building the command line, and I got the wrong idea
that both tests reflected the same stages of a VM launch. For what
I'm doing here, qemuxml2xml isn't helpful.

All this out of the way:
  


> 
>    <memory unit='MiB'>4864</memory>
>    <devices>
>      <memory model='nvdimm'>
>        <target>
>          <size unit='MiB'>512</size>
> 
> (again, the numbers are almost certainly wrong) and we want *that*
> XML to generate the same QEMU command line as before.
> 
> If this can't be achieved, or there are other side effects to it that
> I haven't considered, then we're better off leaving the current
> behavior alone (documenting the heck out of it if necessary) instead
> of changing it in ways that would alter guest ABI between boots.



This can be achieve, sort of. I can't just move the alignment to PostParse,
even without changing how it is calculated, because I would break the premises
it's based on today (not be executed on migration or snapshot). To do
that I would need to gate the code into a VIR_DOMAIN_DEF_PARSE_ABI_UPDATE
parse flag. But I'm not sure if this flag is activated on guest reboot
without changing the domain (I'm looking into it ATM, but I don't think
that's the case). So even if we do that, we would still need the
alignment call after PostParse time, as is today, to accommodate existing
guests. Regardless of whether this is worth doing or not (TBH, I'm not
sure) I'll end up sending a couple of patches to reposition the alignment
code out of qemu_command.c (but not up to ParseTime).


What I still want to do is to use VIR_DOMAIN_DEF_PARSE_ABI_UPDATE to
force the alignment of pseries DIMMs in PostParse time. This will not
break ABI of existing guests in any capacity and will fix the alignment
calculation for newer ones. We have precedent for such scenario in
qemuDomainControllerDefPostParse(), when we use this same flag to
define VIR_DOMAIN_CONTROLLER_MODEL_USB_QEMU_XHCI for new pSeries
guests.


I'll post patches today. Let's see how it goes.





DHB

> 

Re: [PATCH v1 02/10] qemu_domain.c: align memory modules before calculating 'initialmem'
Posted by Daniel Henrique Barboza 5 years, 2 months ago

On 11/13/20 10:51 AM, Andrea Bolognani wrote:
> On Fri, 2020-11-13 at 09:58 -0300, Daniel Henrique Barboza wrote:
>> On 11/13/20 7:30 AM, Andrea Bolognani wrote:
>>> On Wed, 2020-11-11 at 19:07 -0300, Daniel Henrique Barboza wrote:
>>>> +++ b/tests/qemuxml2argvdata/memory-hotplug-ppc64-nonuma.args
>>>> @@ -11,7 +11,7 @@ QEMU_AUDIO_DRV=none \
>>>>    -name QEMUGuest1 \
>>>>    -S \
>>>>    -machine pseries,accel=kvm,usb=off,dump-guest-core=off \
>>>> --m size=1310720k,slots=16,maxmem=4194304k \
>>>> +-m size=1048576k,slots=16,maxmem=4194304k \
>>>>    -realtime mlock=off \
>>>>    -smp 1,sockets=1,cores=1,threads=1 \
>>>>    -object memory-backend-ram,id=memdimm0,size=536870912 \
>>>
>>> This doesn't look right: AFAIK the initial memory size is
>>> guest-visible, so by changing how the alignment is performed you
>>> might both change the guest ABI across guest boots (if libvirt is
>>> upgraded in between them) and break migration (if either the source
>>> or destination host is running the newer libvirt but the other side
>>> isn't).
>>
>> Good point. I failed to consider ABI stability for ppc64 guest migration.
>> Yes, this will break older guests that happen to have extra memory. In
>> fact, this can be specially harmful for migration.
>>
>> This means that I can't proceed with any other ppc64 changes made here.
>> Aligning ppc64 DIMMs in PostParse will achieve the same results even
>> without this patch - the DIMMs will be aligned before qemuDomainAlignMemorySizes()
>> and initialmem will be rounded to a more precise value of 'currentMemory'.
>> I can think of ways to align ppc64 DIMMs while not touching initialmem,
>> but all of them will require extra state in the domain definition. The
>> benefits are there (the DIMMs will be aligned in the live XML) but I'm
>> not sure it's worth the extra code.
> 
> I only skimmed the remaining patches and didn't dig into the code as
> much, or as recently, as you have, but from a high-level perspective
> I don't see why you wouldn't be able to simply move the existing
> rounding logic from the command line generator to PostParse? It's not
> like the former has access to additional information that the latter
> can't get to, right?

Interesting. I suppose that moving the logic of "qemuDomainAlignMemorySizes"
to PostParse might allows us to align DIMMs while being able to not touch
'initialmem'. I'm not entirely sure if that code is dependent on runtime stuff
though (the NUMA related code for instance). I'll investigate.


DHB


>