[libvirt] [PATCH] qemu: Warn of restore with managed save being risky

Michael Weiser posted 1 patch 4 years, 3 months ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20191230180736.3203497-1-michael.weiser@gmx.de
src/qemu/qemu_driver.c | 9 +++++++++
1 file changed, 9 insertions(+)
[libvirt] [PATCH] qemu: Warn of restore with managed save being risky
Posted by Michael Weiser 4 years, 3 months ago
Internal snapshots of a non-running domain do not carry any memory state
and restoring such a snapshot will not replace existing saved memory
state. This allows a scenario, where a user first suspends a domain into
managedsave, restores a non-running snapshot and then resumes the domain
from managedsave. After that, the guest system will run with its
previous memory state atop a different disk state. The most obvious
possible fallout from this is extensive file system corruption. Swap
content and RAID bitmaps might also be off.

This has been discussed[1] and fixed[2] from the end-user perspective for
virt-manager.

This patch marks the restore operation as risky at the libvirt level,
requiring the user to remove the saved memory state first or force the
operation.

[1] https://www.redhat.com/archives/virt-tools-list/2019-November/msg00011.html
[2] https://www.redhat.com/archives/virt-tools-list/2019-December/msg00049.html

Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
Cc: Cole Robinson <crobinso@redhat.com>
---
 src/qemu/qemu_driver.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index ec8faf384c..dcd103d3bb 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -16652,6 +16652,15 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
                            _("must respawn qemu to start inactive snapshot"));
             goto endjob;
         }
+        if (vm->hasManagedSave &&
+            !(snapdef->state == VIR_DOMAIN_SNAPSHOT_RUNNING ||
+              snapdef->state == VIR_DOMAIN_SNAPSHOT_PAUSED)) {
+            virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY, "%s",
+                           _("revert to snapshot while there is a managed "
+                             "saved state will cause corruption when run, "
+                             "remove saved state first"));
+            goto endjob;
+        }
     }
 
     if (snap->def->dom) {
-- 
2.24.1


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] qemu: Warn of restore with managed save being risky
Posted by Daniel Henrique Barboza 4 years, 3 months ago

On 12/30/19 3:07 PM, Michael Weiser wrote:
> Internal snapshots of a non-running domain do not carry any memory state
> and restoring such a snapshot will not replace existing saved memory
> state. This allows a scenario, where a user first suspends a domain into
> managedsave, restores a non-running snapshot and then resumes the domain
> from managedsave. After that, the guest system will run with its
> previous memory state atop a different disk state. The most obvious
> possible fallout from this is extensive file system corruption. Swap
> content and RAID bitmaps might also be off.
> 
> This has been discussed[1] and fixed[2] from the end-user perspective for
> virt-manager.
> 
> This patch marks the restore operation as risky at the libvirt level,
> requiring the user to remove the saved memory state first or force the
> operation.
> 
> [1] https://www.redhat.com/archives/virt-tools-list/2019-November/msg00011.html
> [2] https://www.redhat.com/archives/virt-tools-list/2019-December/msg00049.html
> 
> Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
> Cc: Cole Robinson <crobinso@redhat.com>
> ---

As said in [1], I agree that the API needs a flag override to allow the user
to roll with it despite the risks. Given that this is somewhat a corner case,
I also believe that such override can go in a separated patch/series later
on.



Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] qemu: Warn of restore with managed save being risky
Posted by Michael Weiser 4 years, 3 months ago
Hi Daniel,

On Tue, Dec 31, 2019 at 05:19:54PM -0300, Daniel Henrique Barboza wrote:

> > This patch marks the restore operation as risky at the libvirt level,
> > requiring the user to remove the saved memory state first or force the
> > operation.
> > 
> > [1] https://www.redhat.com/archives/virt-tools-list/2019-November/msg00011.html
> > [2] https://www.redhat.com/archives/virt-tools-list/2019-December/msg00049.html
> > 
> > Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
> > Cc: Cole Robinson <crobinso@redhat.com>
> > ---
> As said in [1], I agree that the API needs a flag override to allow the user
> to roll with it despite the risks. Given that this is somewhat a corner case,
> I also believe that such override can go in a separated patch/series later
> on.

Do you mean a separate override flag beyond
VIR_DOMAIN_SNAPSHOT_REVERT_FORCE? Or should I just update the commit
message to make clear how the user can force the operation?

> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

Thanks!
-- 
Michael


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] qemu: Warn of restore with managed save being risky
Posted by Daniel Henrique Barboza 4 years, 3 months ago

On 12/31/19 5:42 PM, Michael Weiser wrote:
> Hi Daniel,
> 
> On Tue, Dec 31, 2019 at 05:19:54PM -0300, Daniel Henrique Barboza wrote:
> 
>>> This patch marks the restore operation as risky at the libvirt level,
>>> requiring the user to remove the saved memory state first or force the
>>> operation.
>>>
>>> [1] https://www.redhat.com/archives/virt-tools-list/2019-November/msg00011.html
>>> [2] https://www.redhat.com/archives/virt-tools-list/2019-December/msg00049.html
>>>
>>> Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
>>> Cc: Cole Robinson <crobinso@redhat.com>
>>> ---
>> As said in [1], I agree that the API needs a flag override to allow the user
>> to roll with it despite the risks. Given that this is somewhat a corner case,
>> I also believe that such override can go in a separated patch/series later
>> on.
> 
> Do you mean a separate override flag beyond
> VIR_DOMAIN_SNAPSHOT_REVERT_FORCE? Or should I just update the commit
> message to make clear how the user can force the operation?


Ah, I overlooked the "or force the operation" part of the commit message after
reading the discussions in [1] and [2].

Instead of updating the commit message, I think you can update the error message
to mention the '--force' option, e.g. :


+            virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY, "%s",
+                           _("revert to snapshot while there is a managed "
+                             "saved state will cause corruption when run, "
+                             "remove saved state first or use the "
+                             "--force option"));



I also noticed that the man page for 'snapshot-revert' mentions two cases where
the use of --force is required. One of them talks about snapshots created using old
Libvirt versions. The other goes as follows:

----
(docs/manpages/virsh.rst)


The other is the case of reverting from a running domain to an active state
where a new hypervisor has to be created rather than reusing the existing
hypervisor, because it implies drawbacks such as breaking any existing
VNC or Spice connections; this condition happens with an active
snapshot that uses a provably incompatible configuration, as well as
with an inactive snapshot that is combined with the --start or
--pause flag.
----

I am almost certain that scenario you're handling here is not covered by this
excerpt. In this case, since you're adding a new '--force' condition, I believe a
second patch adding this new '--force' condition in the documentation is a
nice touch.



Thanks,


DHB


> 
>> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> 
> Thanks!
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] qemu: Warn of restore with managed save being risky
Posted by Michael Weiser 4 years, 3 months ago
Hello Daniel,

On Thu, Jan 02, 2020 at 09:58:19AM -0300, Daniel Henrique Barboza wrote:

> > > > This patch marks the restore operation as risky at the libvirt level,
> > > > requiring the user to remove the saved memory state first or force the
> > > > operation.
> > > > 
> > > > [1] https://www.redhat.com/archives/virt-tools-list/2019-November/msg00011.html
> > > > [2] https://www.redhat.com/archives/virt-tools-list/2019-December/msg00049.html
> > > > 
> > > > Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
> > > > Cc: Cole Robinson <crobinso@redhat.com>
> > > > ---
> > > As said in [1], I agree that the API needs a flag override to allow the user
> > > to roll with it despite the risks. Given that this is somewhat a corner case,
> > > I also believe that such override can go in a separated patch/series later
> > > on.
> > 
> > Do you mean a separate override flag beyond
> > VIR_DOMAIN_SNAPSHOT_REVERT_FORCE? Or should I just update the commit
> > message to make clear how the user can force the operation?
> Ah, I overlooked the "or force the operation" part of the commit message after
> reading the discussions in [1] and [2].

> Instead of updating the commit message, I think you can update the error message
> to mention the '--force' option, e.g. :


> +            virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY, "%s",
> +                           _("revert to snapshot while there is a managed "
> +                             "saved state will cause corruption when run, "
> +                             "remove saved state first or use the "
> +                             "--force option"));

I'd rather not reference virsh command options in the error message as
it would be highly confusing in any other context. For example, python
clients get the error message wrapped in an exception, augmented already
by a prefix telling them they need to force the operation:

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 66, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/object/domain.py", line 1055, in revert_to_snapshot
    self._backend.revertToSnapshot(snap.get_backend())
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2088, in revertToSnapshot
    if ret == -1: raise libvirtError ('virDomainRevertToSnapshot() failed', dom=self)
libvirt.libvirtError: revert requires force: revert to snapshot while
there is a managed saved state will cause corruption when run, remove
saved state first

The same is actually the case for virsh already:

virsh # snapshot-revert debian --snapshotname snapshot1
error: revert requires force: revert to snapshot while there is a
managed saved state will cause corruption when run, remove saved state
first

virsh # 

We could of course reword to better take context and prefix into
account, e.g.:

error: revert requires force: Removal of existing managed saved state
strongly recommended to avoid corruption

Any ideas?

> I also noticed that the man page for 'snapshot-revert' mentions two cases where
> the use of --force is required. One of them talks about snapshots created using old
> Libvirt versions. The other goes as follows:

> ----
> (docs/manpages/virsh.rst)


> The other is the case of reverting from a running domain to an active state
> where a new hypervisor has to be created rather than reusing the existing
> hypervisor, because it implies drawbacks such as breaking any existing
> VNC or Spice connections; this condition happens with an active
> snapshot that uses a provably incompatible configuration, as well as
> with an inactive snapshot that is combined with the --start or
> --pause flag.
> ----

> I am almost certain that scenario you're handling here is not covered by this
> excerpt. In this case, since you're adding a new '--force' condition, I believe a
> second patch adding this new '--force' condition in the documentation is a
> nice touch.

Will do. v2 coming as soon as we've agreed on where to go with the error message.
-- 
Thanks,
Michael


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] qemu: Warn of restore with managed save being risky
Posted by Daniel Henrique Barboza 4 years, 3 months ago

On 1/2/20 11:36 AM, Michael Weiser wrote:
> Hello Daniel,
> 
> On Thu, Jan 02, 2020 at 09:58:19AM -0300, Daniel Henrique Barboza wrote:
> 

[...]

  
> I'd rather not reference virsh command options in the error message as
> it would be highly confusing in any other context. For example, python
> clients get the error message wrapped in an exception, augmented already
> by a prefix telling them they need to force the operation:

Good point.
  
> 
> Traceback (most recent call last):
>    File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
>      callback(asyncjob, *args, **kwargs)
>    File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
>      callback(*args, **kwargs)
>    File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 66, in newfn
>      ret = fn(self, *args, **kwargs)
>    File "/usr/share/virt-manager/virtManager/object/domain.py", line 1055, in revert_to_snapshot
>      self._backend.revertToSnapshot(snap.get_backend())
>    File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2088, in revertToSnapshot
>      if ret == -1: raise libvirtError ('virDomainRevertToSnapshot() failed', dom=self)
> libvirt.libvirtError: revert requires force: revert to snapshot while
> there is a managed saved state will cause corruption when run, remove
> saved state first
> 
> The same is actually the case for virsh already:
> 
> virsh # snapshot-revert debian --snapshotname snapshot1
> error: revert requires force: revert to snapshot while there is a
> managed saved state will cause corruption when run, remove saved state
> first
> 
> virsh #
> 
> We could of course reword to better take context and prefix into
> account, e.g.:

Since there is already a "revert requires force" prefix in both python and
virsh error messages, changing the error message of this v1 becomes more of
a wording/flavor issue.

> 
> error: revert requires force: Removal of existing managed saved state
> strongly recommended to avoid corruption


I prefer this wording though :)


Thanks,


DHB

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] qemu: Warn of restore with managed save being risky
Posted by Michael Weiser 4 years, 3 months ago
Hello Daniel,

On Thu, Jan 02, 2020 at 11:54:23AM -0300, Daniel Henrique Barboza wrote:

> On 1/2/20 11:36 AM, Michael Weiser wrote:

> > I'd rather not reference virsh command options in the error message as
> > it would be highly confusing in any other context. For example, python
> > clients get the error message wrapped in an exception, augmented already
> > by a prefix telling them they need to force the operation:
> Good point.
> > 
> > Traceback (most recent call last):
> >    File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
> >      callback(asyncjob, *args, **kwargs)
> >    File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
> >      callback(*args, **kwargs)
> >    File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 66, in newfn
> >      ret = fn(self, *args, **kwargs)
> >    File "/usr/share/virt-manager/virtManager/object/domain.py", line 1055, in revert_to_snapshot
> >      self._backend.revertToSnapshot(snap.get_backend())
> >    File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2088, in revertToSnapshot
> >      if ret == -1: raise libvirtError ('virDomainRevertToSnapshot() failed', dom=self)
> > libvirt.libvirtError: revert requires force: revert to snapshot while
> > there is a managed saved state will cause corruption when run, remove
> > saved state first
> > 
> > The same is actually the case for virsh already:
> > 
> > virsh # snapshot-revert debian --snapshotname snapshot1
> > error: revert requires force: revert to snapshot while there is a
> > managed saved state will cause corruption when run, remove saved state
> > first
> > 
> > virsh #
> > 
> > We could of course reword to better take context and prefix into
> > account, e.g.:
> Since there is already a "revert requires force" prefix in both python and
> virsh error messages, changing the error message of this v1 becomes more of
> a wording/flavor issue.

> > 
> > error: revert requires force: Removal of existing managed saved state
> > strongly recommended to avoid corruption

> I prefer this wording though :)

I went with:

error: revert requires force: snapshot without memory state, removal of
existing managed saved state strongly recommended to avoid corruption

because it gives the reason and recommended action in compact form.

v2 is on its way.
-- 
Michael


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list