qemu: Switch to new -numa memdev=

[PATCH 0/2] qemu: Switch to new -numa memdev=

Posted by Michal Privoznik 5 years, 9 months ago

In a way, libvirt already uses -numa memdev= in a few cases. In fact, in
as few cases as possible - only configurations which can not be
configured with -numa mem=, because these two ways are incompatible when
it comes to migration.

My approach to solve this is to have a privateData flag which tells
which directs libvirt to generate old or new cmd line. And then this
flag is saved into migration cookie so that the destination is directed
the same way.

Problem with this approach is that, while migration 6.3.0 -> 6.4.0 ->
6.3.0 works, migration where the machine is started on newer libvirt
6.4.0 and then migrated back to 6.3.0 won't work (in fact is explicitly
denied by 2/2) even though there is nothing visible that should prevent
the migration.

I am not sure whether we have a good move here, because even if we
waited until QEMU removes the old way, it won't help us really. We
would be just leaving the problem for future us.

Michal Prívozník (2):
  qemu: Switch to new -numa memdev=
  qemu: Disallow migration to older -numa if newer is used

 src/qemu/qemu_command.c                       | 20 ++---
 src/qemu/qemu_domain.c                        | 38 +++++++++-
 src/qemu/qemu_domain.h                        |  3 +
 src/qemu/qemu_migration.c                     | 18 ++++-
 src/qemu/qemu_migration_cookie.c              | 73 +++++++++++++++++++
 src/qemu/qemu_migration_cookie.h              | 12 +++
 src/qemu/qemu_process.c                       | 25 ++++++-
 .../qemustatusxml2xmldata/backup-pull-in.xml  |  1 +
 .../blockjob-blockdev-in.xml                  |  1 +
 .../blockjob-mirror-in.xml                    |  1 +
 .../disk-secinfo-upgrade-in.xml               |  1 +
 .../disk-secinfo-upgrade-out.xml              |  1 +
 .../migration-in-params-in.xml                |  1 +
 .../migration-out-nbd-in.xml                  |  1 +
 .../migration-out-nbd-out.xml                 |  1 +
 .../migration-out-nbd-tls-in.xml              |  1 +
 .../migration-out-nbd-tls-out.xml             |  1 +
 .../migration-out-params-in.xml               |  1 +
 tests/qemustatusxml2xmldata/modern-in.xml     |  1 +
 .../qemustatusxml2xmldata/vcpus-multi-in.xml  |  1 +
 .../hugepages-numa-default-2M.args            | 10 ++-
 .../hugepages-numa-default-dimm.args          |  8 +-
 .../hugepages-numa-default.args               |  6 +-
 .../memory-hotplug-dimm-addr.args             |  3 +-
 .../qemuxml2argvdata/memory-hotplug-dimm.args |  3 +-
 ...y-hotplug-nvdimm-access.x86_64-latest.args |  3 +-
 ...ry-hotplug-nvdimm-align.x86_64-latest.args |  3 +-
 ...ry-hotplug-nvdimm-label.x86_64-latest.args |  3 +-
 ...ory-hotplug-nvdimm-pmem.x86_64-latest.args |  3 +-
 ...ory-hotplug-nvdimm-ppc64.ppc64-latest.args |  3 +-
 ...hotplug-nvdimm-readonly.x86_64-latest.args |  3 +-
 .../memory-hotplug-nvdimm.x86_64-latest.args  |  3 +-
 .../numatune-auto-prefer.args                 |  4 +-
 .../qemuxml2argvdata/pages-dimm-discard.args  |  3 +-
 .../pages-discard-hugepages.args              | 11 ++-
 tests/qemuxml2argvdata/pages-discard.args     | 12 ++-
 36 files changed, 236 insertions(+), 47 deletions(-)

-- 
2.26.2

Re: [PATCH 0/2] qemu: Switch to new -numa memdev=

Posted by Daniel Henrique Barboza 5 years, 9 months ago


On 5/12/20 11:52 AM, Michal Privoznik wrote:
> In a way, libvirt already uses -numa memdev= in a few cases. In fact, in
> as few cases as possible - only configurations which can not be
> configured with -numa mem=, because these two ways are incompatible when
> it comes to migration.
> 
> My approach to solve this is to have a privateData flag which tells
> which directs libvirt to generate old or new cmd line. And then this
> flag is saved into migration cookie so that the destination is directed
> the same way.
> 
> Problem with this approach is that, while migration 6.3.0 -> 6.4.0 ->
> 6.3.0 works, migration where the machine is started on newer libvirt
> 6.4.0 and then migrated back to 6.3.0 won't work (in fact is explicitly
> denied by 2/2) even though there is nothing visible that should prevent
> the migration.

Is there any advantage in using the new "memdev" format instead of the old
one? An actual benefit inside QEMU? Or is it just an API change?

If there is a benefit, I'd say we make the user choose to use the new
memdev format, explicitly, in the NUMA XML. Or, if we want to default to
use the new format, the user needs to set a "legacy" flag in the XML.
This makes the migration compatibility an user problem,with clear benefits and
drawbacks, and then the user can decide whether it is worth breaking migration
for it. Eventually, when QEMU moves on, the 'legacy' option can be deprecated.

If there is no benefit aside from "-numa mem=" stop working one day, I'd say
to keep the legacy format as long as possible. When the QEMU binary stops
supporting it, then we have no choice other than to stop using legacy format
with the newer binaries. But the user will move on to the newer QEMU binaries
across the env, so it's ok to not use the legacy format any longer. This approach
also avoids dealing with migration issues in Libvirt side.


> 
> I am not sure whether we have a good move here, because even if we
> waited until QEMU removes the old way, it won't help us really. We
> would be just leaving the problem for future us.


I think the problem is simpler to handle in Libvirt when QEMU deprecates the old
format entirely. Otherwise the user will have to also keep track of Libvirt versions
to understand why the migration is failing here and there.



Thanks,


DHB


> 
> Michal Prívozník (2):
>    qemu: Switch to new -numa memdev=
>    qemu: Disallow migration to older -numa if newer is used
> 
>   src/qemu/qemu_command.c                       | 20 ++---
>   src/qemu/qemu_domain.c                        | 38 +++++++++-
>   src/qemu/qemu_domain.h                        |  3 +
>   src/qemu/qemu_migration.c                     | 18 ++++-
>   src/qemu/qemu_migration_cookie.c              | 73 +++++++++++++++++++
>   src/qemu/qemu_migration_cookie.h              | 12 +++
>   src/qemu/qemu_process.c                       | 25 ++++++-
>   .../qemustatusxml2xmldata/backup-pull-in.xml  |  1 +
>   .../blockjob-blockdev-in.xml                  |  1 +
>   .../blockjob-mirror-in.xml                    |  1 +
>   .../disk-secinfo-upgrade-in.xml               |  1 +
>   .../disk-secinfo-upgrade-out.xml              |  1 +
>   .../migration-in-params-in.xml                |  1 +
>   .../migration-out-nbd-in.xml                  |  1 +
>   .../migration-out-nbd-out.xml                 |  1 +
>   .../migration-out-nbd-tls-in.xml              |  1 +
>   .../migration-out-nbd-tls-out.xml             |  1 +
>   .../migration-out-params-in.xml               |  1 +
>   tests/qemustatusxml2xmldata/modern-in.xml     |  1 +
>   .../qemustatusxml2xmldata/vcpus-multi-in.xml  |  1 +
>   .../hugepages-numa-default-2M.args            | 10 ++-
>   .../hugepages-numa-default-dimm.args          |  8 +-
>   .../hugepages-numa-default.args               |  6 +-
>   .../memory-hotplug-dimm-addr.args             |  3 +-
>   .../qemuxml2argvdata/memory-hotplug-dimm.args |  3 +-
>   ...y-hotplug-nvdimm-access.x86_64-latest.args |  3 +-
>   ...ry-hotplug-nvdimm-align.x86_64-latest.args |  3 +-
>   ...ry-hotplug-nvdimm-label.x86_64-latest.args |  3 +-
>   ...ory-hotplug-nvdimm-pmem.x86_64-latest.args |  3 +-
>   ...ory-hotplug-nvdimm-ppc64.ppc64-latest.args |  3 +-
>   ...hotplug-nvdimm-readonly.x86_64-latest.args |  3 +-
>   .../memory-hotplug-nvdimm.x86_64-latest.args  |  3 +-
>   .../numatune-auto-prefer.args                 |  4 +-
>   .../qemuxml2argvdata/pages-dimm-discard.args  |  3 +-
>   .../pages-discard-hugepages.args              | 11 ++-
>   tests/qemuxml2argvdata/pages-discard.args     | 12 ++-
>   36 files changed, 236 insertions(+), 47 deletions(-)
>

Re: [PATCH 0/2] qemu: Switch to new -numa memdev=

Posted by Michal Privoznik 5 years, 9 months ago

On 5/13/20 4:48 PM, Daniel Henrique Barboza wrote:
> 
> 
> On 5/12/20 11:52 AM, Michal Privoznik wrote:
>> In a way, libvirt already uses -numa memdev= in a few cases. In fact, in
>> as few cases as possible - only configurations which can not be
>> configured with -numa mem=, because these two ways are incompatible when
>> it comes to migration.
>>
>> My approach to solve this is to have a privateData flag which tells
>> which directs libvirt to generate old or new cmd line. And then this
>> flag is saved into migration cookie so that the destination is directed
>> the same way.
>>
>> Problem with this approach is that, while migration 6.3.0 -> 6.4.0 ->
>> 6.3.0 works, migration where the machine is started on newer libvirt
>> 6.4.0 and then migrated back to 6.3.0 won't work (in fact is explicitly
>> denied by 2/2) even though there is nothing visible that should prevent
>> the migration.
> 
> Is there any advantage in using the new "memdev" format instead of the old
> one? An actual benefit inside QEMU? Or is it just an API change?

I'm not aware of any real benefit. IIUC, qemu is trying to clean up its 
own code so they changed the API. The qemu commit I'm referencing in 1/2 
mentions possible performance benefits of using the new API. Also, the 
commit mentions better control over memory allocation because we won't 
be using global -mem-prealloc and -mem-path but in libvirt, we would use 
corresponding attributes of memory-backend-* anyway. Therefore, from my 
POV it's only API change.

> 
> If there is a benefit, I'd say we make the user choose to use the new
> memdev format, explicitly, in the NUMA XML. Or, if we want to default to
> use the new format, the user needs to set a "legacy" flag in the XML.
> This makes the migration compatibility an user problem,with clear 
> benefits and
> drawbacks, and then the user can decide whether it is worth breaking 
> migration
> for it. Eventually, when QEMU moves on, the 'legacy' option can be 
> deprecated.
> 

Interesting idea. This is basically the same as making the private 
variable I'm inventing controllable by user. Let me check if there is a 
way to actually detect whether qemu supports only the old way, or only 
the new way or both.

And it looks like it sort of can. I mean, there is numa-mem-supported 
attribute to a machine type in output of 'query-machines'. But there 
doesn't seem to be anything for -numa memdev=.

> If there is no benefit aside from "-numa mem=" stop working one day, I'd 
> say
> to keep the legacy format as long as possible. When the QEMU binary stops
> supporting it, then we have no choice other than to stop using legacy 
> format
> with the newer binaries. But the user will move on to the newer QEMU 
> binaries
> across the env, so it's ok to not use the legacy format any longer. This 
> approach
> also avoids dealing with migration issues in Libvirt side.

Is that so? Qemu deprecated the old API and just for the sake of the 
argument, imagine they already removed it. Now, on libvirt level, how do 
we handle it? I mean, you are not able to migrate from older qemu to 
this hypothetical qemu. Sure, we can blame qemu, or we can start using 
the new API and some machines started with older qemu would be able to 
migrate to the hypothetical qemu.

Yeah, we don't have a good option.

> 
> 
>>
>> I am not sure whether we have a good move here, because even if we
>> waited until QEMU removes the old way, it won't help us really. We
>> would be just leaving the problem for future us.
> 
> 
> I think the problem is simpler to handle in Libvirt when QEMU deprecates 
> the old
> format entirely. Otherwise the user will have to also keep track of 
> Libvirt versions
> to understand why the migration is failing here and there.

I think they will have to track both actually. They will have to track 
QEMU version, no argument there. But also Libvirt version - they minimal 
required version would have to be the one where Libvirt switches from 
old to the new API.

Michal

Re: [PATCH 0/2] qemu: Switch to new -numa memdev=

Posted by Daniel Henrique Barboza 5 years, 9 months ago

On 5/13/20 12:32 PM, Michal Privoznik wrote:
> On 5/13/20 4:48 PM, Daniel Henrique Barboza wrote:
>>
>>
>> On 5/12/20 11:52 AM, Michal Privoznik wrote:

[...]

>> I think the problem is simpler to handle in Libvirt when QEMU deprecates the old
>> format entirely. Otherwise the user will have to also keep track of Libvirt versions
>> to understand why the migration is failing here and there.
> 
> I think they will have to track both actually. They will have to track QEMU version, no argument there. But also Libvirt version - they minimal required version would have to be the one where Libvirt switches from old to the new API.

In that case, I believe that offering a XML knob to choose which NUMA mode
the user wants (like I mentioned in the previous message) is a good way to
mitigate the overall annoyance of the situation. It's simpler to add "mode='legacy'"
in the domain XML than to update Libvirt.

We can keep this mechanism option around until we're certain that (most) users
moved on to newer QEMU/Libvirt versions. Granted, some guests will end up
breaking when the option is deprecated anyway. At least this is a bit more of a
"controlled explosion" scenario.

Thanks,

DHB

> 
> Michal
>

Re: [PATCH 0/2] qemu: Switch to new -numa memdev=

Posted by Daniel P. Berrangé 5 years, 9 months ago

On Wed, May 13, 2020 at 01:36:05PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 5/13/20 12:32 PM, Michal Privoznik wrote:
> > On 5/13/20 4:48 PM, Daniel Henrique Barboza wrote:
> > > 
> > > 
> > > On 5/12/20 11:52 AM, Michal Privoznik wrote:
> 
> [...]
> 
> > > I think the problem is simpler to handle in Libvirt when QEMU deprecates the old
> > > format entirely. Otherwise the user will have to also keep track of Libvirt versions
> > > to understand why the migration is failing here and there.
> > 
> > I think they will have to track both actually. They will have to track QEMU version, no argument there. But also Libvirt version - they minimal required version would have to be the one where Libvirt switches from old to the new API.
> 
> In that case, I believe that offering a XML knob to choose which NUMA mode
> the user wants (like I mentioned in the previous message) is a good way to
> mitigate the overall annoyance of the situation. It's simpler to add "mode='legacy'"
> in the domain XML than to update Libvirt.

This is something we really don't want todo, as this is supposed to be an
internal impl detail. The fact that it had migration impact was accident.
If we expose it as a knob, then you need to teach mgmt apps to set the
knob which has a huge ripple effect through the consumers of libvirt.
This is what the QEMU feature advertizement against machine type is
supposed to avoid

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|