[PATCH v2 2/4] Support live migration between file-backed memory and anonymous memory.

Michael Galaxy via Devel posted 4 patches 4 months, 2 weeks ago
There is a newer version of this series
[PATCH v2 2/4] Support live migration between file-backed memory and anonymous memory.
Posted by Michael Galaxy via Devel 4 months, 2 weeks ago
From: Michael Galaxy <mgalaxy@akamai.com>

In our environment, we need to convert VMs into a live-update-compatible
configuration "on-the-fly" (via live migration). More specifically: We
need to convert between anonymous memory-backed VMs and file-backed
memory VMs. So, for this very specific case, this needs to work when
host-level PMEM is being enabled. QEMU does not have a problem with this
at all, but we need to relax the rules here a bit so that libvirt allows
it to go through normally.

Signed-off-by: Michael Galaxy <mgalaxy@akamai.com>
---
 src/qemu/qemu_domain.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index bda62f2e5c..93e98f8dae 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -8641,11 +8641,25 @@ qemuDomainABIStabilityCheck(const virDomainDef *src,
     size_t i;
 
     if (src->mem.source != dst->mem.source) {
-        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
-                       _("Target memoryBacking source '%1$s' doesn't match source memoryBacking source'%2$s'"),
-                       virDomainMemorySourceTypeToString(dst->mem.source),
-                       virDomainMemorySourceTypeToString(src->mem.source));
-        return false;
+        /*
+         * The current use case for this is the live migration of live-update
+         * capable CPR guests mounted on PMEM devices at the host
+         * level (not in-guest PMEM). QEMU has no problem doing these kinds of
+         * live migrations between these two memory backends, so let them go through.
+         * This allows us to "upgrade" guests from regular memory to file-backed
+         * memory seamlessly without taking them down.
+         */
+        if (!((src->mem.source == VIR_DOMAIN_MEMORY_SOURCE_NONE
+                && dst->mem.source == VIR_DOMAIN_MEMORY_SOURCE_FILE) ||
+            (src->mem.source == VIR_DOMAIN_MEMORY_SOURCE_FILE
+                && dst->mem.source == VIR_DOMAIN_MEMORY_SOURCE_NONE))) {
+
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("Target memoryBacking source '%1$s' doesn't match source memoryBacking source'%2$s'"),
+                           virDomainMemorySourceTypeToString(dst->mem.source),
+                           virDomainMemorySourceTypeToString(src->mem.source));
+            return false;
+        }
     }
 
     for (i = 0; i < src->nmems; i++) {
-- 
2.34.1
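
For readers following the condition in the hunk above, here is a minimal standalone sketch of the relaxed rule. It is not libvirt code: the enum and the helper name memSourceMigratable() are hypothetical stand-ins for src->mem.source / dst->mem.source and the VIR_DOMAIN_MEMORY_SOURCE_* values, and it only models the check described in this patch (equal sources pass, and none <-> file passes in either direction).

```c
#include <stdbool.h>

/* Hypothetical stand-in for libvirt's memory source enum; the names
 * are assumptions made for this sketch only. */
typedef enum {
    MEM_SOURCE_NONE,       /* no explicit <source/>, regular memory */
    MEM_SOURCE_FILE,       /* file-backed memory */
    MEM_SOURCE_ANONYMOUS,
    MEM_SOURCE_MEMFD,
} MemSource;

/* Returns true when the pair of memory sources would pass the relaxed
 * check: identical sources, or a none <-> file conversion. Every other
 * mismatch is still rejected, as before the patch. */
static bool
memSourceMigratable(MemSource src, MemSource dst)
{
    if (src == dst)
        return true;

    return (src == MEM_SOURCE_NONE && dst == MEM_SOURCE_FILE) ||
           (src == MEM_SOURCE_FILE && dst == MEM_SOURCE_NONE);
}
```

Written this way, the allowed pairs read as a positive condition rather than a negated compound expression, which may also make it easier to mention the permitted case in the error message.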
Re: [PATCH v2 2/4] Support live migration between file-backed memory and anonymous memory.
Posted by Martin Kletzander 2 months, 1 week ago
On Wed, Jun 05, 2024 at 04:37:36PM -0400, mgalaxy@akamai.com wrote:
>From: Michael Galaxy <mgalaxy@akamai.com>
>
>In our environment, we need to convert VMs into a live-update-compatible
>configuration "on-the-fly" (via live migration). More specifically: We
>need to convert between anonymous memory-backed VMs and file-backed
>memory VMs. So, for this very specific case, this needs to work when
>host-level PMEM is being enabled. QEMU does not have a problem with this
>at all, but we need to relax the rules here a bit so that libvirt allows
>it to go through normally.
>
>Signed-off-by: Michael Galaxy <mgalaxy@akamai.com>
>---
> src/qemu/qemu_domain.c | 24 +++++++++++++++++++-----
> 1 file changed, 19 insertions(+), 5 deletions(-)
>
>diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
>index bda62f2e5c..93e98f8dae 100644
>--- a/src/qemu/qemu_domain.c
>+++ b/src/qemu/qemu_domain.c
>@@ -8641,11 +8641,25 @@ qemuDomainABIStabilityCheck(const virDomainDef *src,
>     size_t i;
>
>     if (src->mem.source != dst->mem.source) {
>-        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>-                       _("Target memoryBacking source '%1$s' doesn't match source memoryBacking source'%2$s'"),
>-                       virDomainMemorySourceTypeToString(dst->mem.source),
>-                       virDomainMemorySourceTypeToString(src->mem.source));
>-        return false;
>+        /*
>+         * The current use case for this is the live migration of live-update
>+         * capable CPR guests mounted on PMEM devices at the host

Does libvirt need more adjustments to support cpr-reboots?  I don't
think we have any support for them yet.

>+         * level (not in-guest PMEM). QEMU has no problem doing these kinds of
>+         * live migrations between these two memory backends, so let them go through.
>+         * This allows us to "upgrade" guests from regular memory to file-backed
>+         * memory seamlessly without taking them down.
>+         */
>+        if (!((src->mem.source == VIR_DOMAIN_MEMORY_SOURCE_NONE
>+                && dst->mem.source == VIR_DOMAIN_MEMORY_SOURCE_FILE) ||
>+            (src->mem.source == VIR_DOMAIN_MEMORY_SOURCE_FILE
>+                && dst->mem.source == VIR_DOMAIN_MEMORY_SOURCE_NONE))) {
>+
>+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>+                           _("Target memoryBacking source '%1$s' doesn't match source memoryBacking source'%2$s'"),

Maybe the message could be tweaked a bit to also mention the added
supported case, but that's just a nitpick.

>+                           virDomainMemorySourceTypeToString(dst->mem.source),
>+                           virDomainMemorySourceTypeToString(src->mem.source));
>+            return false;
>+        }
>     }
>
>     for (i = 0; i < src->nmems; i++) {
>-- 
>2.34.1
>
Re: [PATCH v2 2/4] Support live migration between file-backed memory and anonymous memory.
Posted by Michael Galaxy via Devel 2 months, 1 week ago
Hi,

On 8/6/24 07:38, Martin Kletzander wrote:
> On Wed, Jun 05, 2024 at 04:37:36PM -0400, mgalaxy@akamai.com wrote:
>> From: Michael Galaxy <mgalaxy@akamai.com>
>>
>>
>>     if (src->mem.source != dst->mem.source) {
>> -        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>> -                       _("Target memoryBacking source '%1$s' doesn't 
>> match source memoryBacking source'%2$s'"),
>> - virDomainMemorySourceTypeToString(dst->mem.source),
>> - virDomainMemorySourceTypeToString(src->mem.source));
>> -        return false;
>> +        /*
>> +         * The current use case for this is the live migration of 
>> live-update
>> +         * capable CPR guests mounted on PMEM devices at the host
>
> Does libvirt need more adjustments to support cpr-reboots?  I don't
> think we have any support for them yet.
>
Ummmm, no, not really, no. Which is a good question.

CPR has two different modes: "reboots" and "execs". The former is for
when you want to do a full kexec (which blows away libvirt because
you're rebooting), and the latter does not do a kexec at all.

We are currently only using the reboot mode, and it works just fine.

There are a number of QMP commands that CPR uses, but we are currently
feeding those commands through libvirt using the normal QMP command
support that it already provides, rather than making any "built-in"
changes to libvirt to support those features.

So, no, we don't (currently) have a need to further modify libvirt to
support CPR.

- Michael
Re: [PATCH v2 2/4] Support live migration between file-backed memory and anonymous memory.
Posted by Martin Kletzander 2 months, 1 week ago
On Tue, Aug 06, 2024 at 03:50:45PM -0500, Michael Galaxy wrote:
>Hi,
>
>On 8/6/24 07:38, Martin Kletzander wrote:
>> On Wed, Jun 05, 2024 at 04:37:36PM -0400, mgalaxy@akamai.com wrote:
>>> From: Michael Galaxy <mgalaxy@akamai.com>
>>>
>>>
>>>     if (src->mem.source != dst->mem.source) {
>>> -        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>>> -                       _("Target memoryBacking source '%1$s' doesn't
>>> match source memoryBacking source'%2$s'"),
>>> - virDomainMemorySourceTypeToString(dst->mem.source),
>>> - virDomainMemorySourceTypeToString(src->mem.source));
>>> -        return false;
>>> +        /*
>>> +         * The current use case for this is the live migration of
>>> live-update
>>> +         * capable CPR guests mounted on PMEM devices at the host
>>
>> Does libvirt need more adjustments to support cpr-reboots?  I don't
>> think we have any support for them yet.
>>
>Ummmm, no, not really, no. Which is a good question.
>
>CPR has two different modes. "reboots" and "execs". The former is when
>you want to do a full kexec
>(which blows away libvirt because you're  rebooting), and the latter
>does not do a kexec at all.
>
>We are only  currently using the reboot mode. And it works just fine.
>
>There are a number of QMP commands that CPR uses, but we are feeding
>those commands
>through libvirt with just the normal qmp command support that it already
>provides rather than
>doing any "built-in" changes to libvirt to support those features,
>currently.
>
>So, no, we don't have a need (currently) to further modify libvirt to
>support CPR.
>

The QMP command is fine, but since it messes with the VM behind
libvirt's back we will "taint" the domain.  In order for this to be
fully supported (together with any future changes, which makes it easier
for consumers of libvirt) it should be added to libvirt as a
possibility.

So you are solely relying on the fact that when we start QEMU again it
will use the same paths and just resume working?  That could be another
reason to make changes to libvirt, if only to make sure the paths are
the same.

>- Michael
>
Re: [PATCH v2 2/4] Support live migration between file-backed memory and anonymous memory.
Posted by Michael Galaxy via Devel 2 months, 1 week ago
Hi,

Answers below.....

On 8/7/24 08:26, Martin Kletzander wrote:
> On Tue, Aug 06, 2024 at 03:50:45PM -0500, Michael Galaxy wrote:
>> Hi,
>>
>> On 8/6/24 07:38, Martin Kletzander wrote:
>>> On Wed, Jun 05, 2024 at 04:37:36PM -0400, mgalaxy@akamai.com wrote:
>>>> From: Michael Galaxy <mgalaxy@akamai.com>
>>>>
>>>>
>>>>     if (src->mem.source != dst->mem.source) {
>>>> -        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>>>> -                       _("Target memoryBacking source '%1$s' doesn't
>>>> match source memoryBacking source'%2$s'"),
>>>> - virDomainMemorySourceTypeToString(dst->mem.source),
>>>> - virDomainMemorySourceTypeToString(src->mem.source));
>>>> -        return false;
>>>> +        /*
>>>> +         * The current use case for this is the live migration of
>>>> live-update
>>>> +         * capable CPR guests mounted on PMEM devices at the host
>>>
>>> Does libvirt need more adjustments to support cpr-reboots?  I don't
>>> think we have any support for them yet.
>>>
>> Ummmm, no, not really, no. Which is a good question.
>>
>> CPR has two different modes. "reboots" and "execs". The former is when
>> you want to do a full kexec
>> (which blows away libvirt because you're  rebooting), and the latter
>> does not do a kexec at all.
>>
>> We are only  currently using the reboot mode. And it works just fine.
>>
>> There are a number of QMP commands that CPR uses, but we are feeding
>> those commands
>> through libvirt with just the normal qmp command support that it already
>> provides rather than
>> doing any "built-in" changes to libvirt to support those features,
>> currently.
>>
>> So, no, we don't have a need (currently) to further modify libvirt to
>> support CPR.
>>
>
> The QMP command is fine, but since it messes with the VM behind
> libvirt's back we will "taint" the domain.  In order for this to be
> fully supported (together with any future changes, which makes it easier
> for consumers of libvirt) it should be added to libvirt as a
> possibility.
>

That's a valid point, but I think it's an exercise for a future RFC.

What we have here so far is the minimal set of changes needed to make it
work. I'd like to avoid making this set too complicated. How we handle
QMP abstractions can be improved later if we want to engage the original
CPR author (steven.sistare@oracle.com) at some point.

> So you are solely relying on the fact that when we start QEMU again it
> will use the same paths and just resume working?  That could be another
> reason to make changes to libvirt, if only to make sure the paths are
> the same.
>
Yes, we are relying on that fact, correct. We have not had any serious
issues on this front, as when we're doing a live update, we're expecting
all the paths to be the same.

One interesting use case for potentially changing paths, as you say, is
maybe if there was a storage change with a new QCOW2 path for some
reason, or, as you mentioned before, the number of NUMA nodes changed.
But again, that would be highly irregular and intrusive for a local-only
live update.

If such situations are really happening, then the cloud manager should
do a live *migration* instead of a live update and get the original
libvirt-managed system into its new configuration before returning the
host back to service. A live "update" (in place) would be pretty strange,
I think, if paths have the potential to change underneath us.

All good questions though,
- Michael
Re: [PATCH v2 2/4] Support live migration between file-backed memory and anonymous memory.
Posted by Martin Kletzander 2 months, 1 week ago
On Wed, Aug 07, 2024 at 10:20:49AM -0500, Michael Galaxy wrote:
>Hi,
>
>Answers below.....
>
>On 8/7/24 08:26, Martin Kletzander wrote:
>> On Tue, Aug 06, 2024 at 03:50:45PM -0500, Michael Galaxy wrote:
>>> Hi,
>>>
>>> On 8/6/24 07:38, Martin Kletzander wrote:
>>>> On Wed, Jun 05, 2024 at 04:37:36PM -0400, mgalaxy@akamai.com wrote:
>>>>> From: Michael Galaxy <mgalaxy@akamai.com>
>>>>>
>>>>>
>>>>>     if (src->mem.source != dst->mem.source) {
>>>>> -        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>>>>> -                       _("Target memoryBacking source '%1$s' doesn't
>>>>> match source memoryBacking source'%2$s'"),
>>>>> - virDomainMemorySourceTypeToString(dst->mem.source),
>>>>> - virDomainMemorySourceTypeToString(src->mem.source));
>>>>> -        return false;
>>>>> +        /*
>>>>> +         * The current use case for this is the live migration of
>>>>> live-update
>>>>> +         * capable CPR guests mounted on PMEM devices at the host
>>>>
>>>> Does libvirt need more adjustments to support cpr-reboots?  I don't
>>>> think we have any support for them yet.
>>>>
>>> Ummmm, no, not really, no. Which is a good question.
>>>
>>> CPR has two different modes. "reboots" and "execs". The former is when
>>> you want to do a full kexec
>>> (which blows away libvirt because you're  rebooting), and the latter
>>> does not do a kexec at all.
>>>
>>> We are only  currently using the reboot mode. And it works just fine.
>>>
>>> There are a number of QMP commands that CPR uses, but we are feeding
>>> those commands
>>> through libvirt with just the normal qmp command support that it already
>>> provides rather than
>>> doing any "built-in" changes to libvirt to support those features,
>>> currently.
>>>
>>> So, no, we don't have a need (currently) to further modify libvirt to
>>> support CPR.
>>>
>>
>> The QMP command is fine, but since it messes with the VM behind
>> libvirt's back we will "taint" the domain.  In order for this to be
>> fully supported (together with any future changes, which makes it easier
>> for consumers of libvirt) it should be added to libvirt as a
>> possibility.
>>
>
>That's a valid point, but I think it's an exercise for a future RFC, I
>think.
>

Oh, sure, it's not needed for this particular patchset, I was wondering
if there are more things we need to implement in order for CPR to be
supported in libvirt.

>What we have here so far is the minimal set of changes needed to make it
>work.
>I'd like to avoid making this set too complicated. How we handle QMP
>abstractions
>can be improved later if we want to engage the original CPR author
>(steven.sistare@oracle.com)
>at some point.
>
>> So you are solely relying on the fact that when we start QEMU again it
>> will use the same paths and just resume working?  That could be another
>> reason to make changes to libvirt, if only to make sure the paths are
>> the same.
>>
>Yes, we are relying on that fact, correct. We have not had any serious
>issues on this front,
>as when we're doing a live updates, we're expecting all the paths to be
>the same.
>

Well, if you are starting the domains without libvirt's prior knowledge
(i.e. not from a saved snapshot or anything like that) libvirt will
generate the paths based on the domain ID.  But if there is nothing
running during the start libvirt will start the IDs from number 1.
Unless the previous VM was also the first one started it will have
a different ID and the whole path will be different.
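
To illustrate the concern, here is a rough sketch of how a path derived from the runtime domain ID changes when the ID changes. This is not libvirt's actual code: the helper buildDomainDir(), the base directory, and the "domain-<id>-<name>" layout are assumptions made purely for illustration.

```c
#include <stdio.h>

/* Illustrative only: build a per-domain directory name from a base
 * directory, the runtime domain ID, and the domain name. The layout
 * "<base>/domain-<id>-<name>" is an assumption for this sketch.
 * Returns the snprintf() result (length that would have been written). */
static int
buildDomainDir(char *buf, size_t len,
               const char *base, int id, const char *name)
{
    return snprintf(buf, len, "%s/domain-%d-%s", base, id, name);
}
```

With this layout, a guest that ran with ID 3 before the kexec and is assigned ID 1 afterwards ends up with two different directories, so anything that was opened under the old path would not be found under the new one.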

>One interesting use case for potentially changing paths, as you say is
>maybe if there was
>a storage change with a new QCOW2 path for some reason, or as you
>mentioned before
>the number of NUMA nodes changed, but again, that would be highly
>irregular and intrusive
>for a local-only live update.
>
>If such situations are really happening, then the cloud manager should
>do a live *migration* instead of
>a live update and get the original libvirt-managed system into its new
>configuration before
>returning the host back to service. I live "update" (in place) would be
>pretty strange, I think,
>if paths have the potential to change underneath us.
>
>All good questions though,
>- Michael
>
Re: [PATCH v2 2/4] Support live migration between file-backed memory and anonymous memory.
Posted by Michael Galaxy via Devel 2 months, 1 week ago
Hi Martin,

Very good question below.....

On 8/9/24 06:10, Martin Kletzander wrote:
> On Wed, Aug 07, 2024 at 10:20:49AM -0500, Michael Galaxy wrote:
>
>> What we have here so far is the minimal set of changes needed to make it
>> work.
>> I'd like to avoid making this set too complicated. How we handle QMP
>> abstractions
>> can be improved later if we want to engage the original CPR author
>> (steven.sistare@oracle.com)
>> at some point.
>>
>>> So you are solely relying on the fact that when we start QEMU again it
>>> will use the same paths and just resume working?  That could be another
>>> reason to make changes to libvirt, if only to make sure the paths are
>>> the same.
>>>
>> Yes, we are relying on that fact, correct. We have not had any serious
>> issues on this front,
>> as when we're doing a live updates, we're expecting all the paths to be
>> the same.
>>
>
> Well, if you are starting the domains without libvirt's prior knowledge
> (i.e. not from as saved snapshot or anything like that) libvirt will
> generate the paths based on the domain ID.  But if there is nothing
> running during the start libvirt will start the IDs from number 1.
> Unless the previous VM was also the first one started it will have
> different ID and the whole path will be different.
>
That is in fact exactly what we are doing. =) I'll be demonstrating this
at KVM Forum next month, but basically, to save time, we ended up
starting QEMU *manually* inside the initramfs phase of the boot process.
This shaved many valuable seconds off the kexec resume process. We are
using libvirt's "domxml-to-native" command to provide us with most of
the QEMU command line and then we *manually* bring QEMU back to life
before libvirt is back. Later, when systemd comes online, libvirt starts
back up again and resumes ownership of the VM that it started before the
kexec occurred. When we do it this way, there are no path naming issues.

We did this for many reasons: pulling libvirt into the initramfs was
extremely heavyweight and we did not want to package the whole daemon
with all the systemd dependencies inside of the initramfs... it's just
too much baggage.

On the other hand, if libvirt had an offline "helper" binary capable of
resuming a VM inside the initramfs that did not require us to bypass the
daemon, that would be really great, but those are thoughts for future
work.

There are a lot of caveats that we had to work around in the above
approach: the "domxml-to-native" output was not very well sanitized...
it has bugs in it. We can try to post fixes for those bugs when we get
around to them. Also, we had to deal with cgroup reconstruction. The
cgroups are of course blown away when the new kernel takes over, so we
had to preserve them across the kexec so that libvirt is able to
re-attach to a fully "valid" QEMU when libvirt comes back online.

So, there's a lot to unpack there => much too complex of a conversation
for this more minimal patchset.

- Michael
Re: [PATCH v2 2/4] Support live migration between file-backed memory and anonymous memory.
Posted by Martin Kletzander 1 month, 1 week ago
On Fri, Aug 09, 2024 at 03:40:41PM -0500, Michael Galaxy wrote:
>Hi Martin,
>
>Very good question below.....
>

Sorry, this thread slipped my mind for a while.

>On 8/9/24 06:10, Martin Kletzander wrote:
>> On Wed, Aug 07, 2024 at 10:20:49AM -0500, Michael Galaxy wrote:
>>
>>> What we have here so far is the minimal set of changes needed to make it
>>> work.
>>> I'd like to avoid making this set too complicated. How we handle QMP
>>> abstractions
>>> can be improved later if we want to engage the original CPR author
>>> (steven.sistare@oracle.com)
>>> at some point.
>>>
>>>> So you are solely relying on the fact that when we start QEMU again it
>>>> will use the same paths and just resume working?  That could be another
>>>> reason to make changes to libvirt, if only to make sure the paths are
>>>> the same.
>>>>
>>> Yes, we are relying on that fact, correct. We have not had any serious
>>> issues on this front,
>>> as when we're doing a live updates, we're expecting all the paths to be
>>> the same.
>>>
>>
>> Well, if you are starting the domains without libvirt's prior knowledge
>> (i.e. not from as saved snapshot or anything like that) libvirt will
>> generate the paths based on the domain ID.  But if there is nothing
>> running during the start libvirt will start the IDs from number 1.
>> Unless the previous VM was also the first one started it will have
>> different ID and the whole path will be different.
>>
>That is in fact exactly what we are doing. =) I'll be demonstrating this
>at KVM Forum next month,

That's great, I'm looking forward to it.  It might be the best venue to
talk about the intricacies of introducing support for such a feature.

>but basically, to save time, we ended up
>starting QEMU *manually* inside the initramfs phase of the boot process.
>This shaved many valuable seconds off the kexec resume process. We are
>using libvirt's "domxml-to-native" command to provide us with most of
>the QEMU command line and then we *manually* bring QEMU back to life
>before libvirt is back. Later when systemd comes online, libvirt then
>starts back
>up again and continues taking ownership of the VM that it started before
>the kexec occurred. When we do it this way, there are no path naming issues.
>
>We did this for many reasons: Pulling libvirt into the initramfs was
>extremely
>heavy weight and we did not want to package the whole daemon with all
>the systemd dependencies inside of the initramfs.... it's just too much
>baggage.
>
>On the other hand, if libvirt had an offline "helper" binary capable of
>resming a
>VM inside the initramfs that did not require us bypassing the daemon,
>that would be really
>great, but those are thoughts for future work.
>
>There are a lot of caveats that we had to workaround in the above approach:
>the "domxml-to-native" output was not very well sanitized..... it has
>bugs in it.
>We can try to post fixes to those fixes when we get around to them.
>Also, we had
>to deal with cgroup reconstruction. The cgroups are of course blown away
>when
>the new kernel takes over so we had to preserve them across the kexec so
>that
>libvirt is able to re-attach to a fully "valid" QEMU when libvirt comes
>back online.
>
>So, there's a lot to unpack there => Much too complex of a conversation
>for this
>more minimal patchset.
>

I see there is much to solve, which needs to be worked around at the
moment.

I understand that this small feature you posted is just the beginning
and does not promise to bring the whole CPR feature set to libvirt.  I
am just trying to be cautious about the future-proof aspect of the
design.  That's because, even though this is a small first step, it
might, at least from my experience, come back to bite us later on if
there are some details like the need for changing the paths due to
something almost irrelevant.

>- Michael
>