[v5] Incremental Backup API additions

[libvirt] [PATCH v5 03/20] backup: Document nuances between different state capture APIs

Posted by Eric Blake 6 years, 11 months ago

Upcoming patches will add support for incremental backups via
a new API; but first, we need a landing page that gives an
overview of capturing various pieces of guest state, and which
APIs are best suited to which tasks.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: wording improvements based on review
---
 docs/docs.html.in               |   5 +
 docs/domainstatecapture.html.in | 314 ++++++++++++++++++++++++++++++++
 docs/formatsnapshot.html.in     |   2 +
 3 files changed, 321 insertions(+)
 create mode 100644 docs/domainstatecapture.html.in

diff --git a/docs/docs.html.in b/docs/docs.html.in
index d0ff844d0c..3afd13080a 100644
--- a/docs/docs.html.in
+++ b/docs/docs.html.in
@@ -121,6 +121,11 @@

         <dt><a href="secureusage.html">Secure usage</a></dt>
         <dd>Secure usage of the libvirt APIs</dd>
+
+        <dt><a href="domainstatecapture.html">Domain state
+            capture</a></dt>
+        <dd>Comparison between different methods of capturing domain
+          state</dd>
       </dl>
     </div>

diff --git a/docs/domainstatecapture.html.in b/docs/domainstatecapture.html.in
new file mode 100644
index 0000000000..f7f2fe0b98
--- /dev/null
+++ b/docs/domainstatecapture.html.in
@@ -0,0 +1,314 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <body>
+
+    <h1>Domain state capture using libvirt</h1>
+
+    <ul id="toc"></ul>
+
+    <p>
+      To help application developers choose which
+      operations best suit their needs, this page compares the
+      different means for capturing state related to a domain managed
+      by libvirt.
+    </p>
+
+    <p>
+      The information here is primarily geared towards capturing the
+      state of an active domain. Capturing the state of an inactive
+      domain essentially amounts to copying the contents of guest
+      disks, followed by a fresh boot with disks restored to that
+      state. Some of the topics presented below may relate to inactive
+      state collection, but it is not the primary focus of this page.
+    </p>
+
+    <h2><a id="definitions">State capture trade-offs</a></h2>
+
+    <p>One of the features made possible with virtual machines is live
+      migration -- transferring all state related to the guest from
+      one host to another with minimal interruption to the guest's
+      activity. In this case, state includes domain memory (including
+      register and device contents), and domain storage (whether the
+      guest's view of the disks is backed by local storage on the
+      host, or by the hypervisor accessing shared storage over a
+      network).  A clever observer will then note that if all state is
+      available for live migration, then there is nothing stopping a
+      user from saving some or all of that state at a given point of
+      time in order to be able to later rewind guest execution back to
+      the state it previously had. The astute reader will also realize
+      that state capture at any level requires that the data be
+      stored and managed by some mechanism. This data might fit
+      in a single file, or more likely require a chain of related
+      files, and may require synchronization with third-party tools
+      built around managing the amount of data resulting from
+      capturing the state of multiple guests that each use multiple
+      disks.
+    </p>
+
+    <p>
+      There are several libvirt APIs associated with capturing the
+      state of a guest, which can later be used to rewind that guest
+      to the conditions it was in earlier.  The following is a list of
+      trade-offs and differences between the various facets that
+      affect capturing domain state for active domains:
+    </p>
+
+    <dl>
+      <dt>Duration</dt>
+      <dd>Capturing state can be a lengthy process, so while the
+        captured state ideally represents an atomic point in time
+        corresponding to something the guest was actually executing,
+        capturing state tends to focus on minimizing guest downtime
+        while performing the rest of the state capture in parallel
+        with guest execution.  Some interfaces require up-front
+        preparation (the state captured is not complete until the API
+        ends, which may be some time after the command was first
+        started), while other interfaces track the state when the
+        command was first issued, regardless of the time spent in
+        capturing the rest of the state.  Also, time spent in state
+        capture may be longer than the time required for live
+        migration, when state must be duplicated rather than shared.
+      </dd>
+
+      <dt>Amount of state</dt>
+      <dd>For an online guest, there is a choice between capturing the
+        guest's memory (all that is needed during live migration when
+        the storage is already shared between source and destination),
+        the guest's disk state (all that is needed if there are no
+        pending guest I/O transactions that would be lost without the
+        corresponding memory state), or both together.  Reverting to
+        partial state may still be viable, but typically, booting from
+        captured disk state without corresponding memory is comparable
+        to rebooting a machine that had power cut before I/O could be
+        flushed. Guests may need to use proper journaling methods to
+        avoid problems when booting from partial state.
+      </dd>
+
+      <dt>Quiescing of data</dt>
+      <dd>Even if a guest has no pending I/O, capturing disk state may
+        catch the guest at a time when the contents of the disk are
+        inconsistent. Cooperating with the guest to perform data
+        quiescing is an optional step to ensure that captured disk
+        state is fully consistent without requiring additional memory
+        state, rather than just crash-consistent.  But guest
+        cooperation may also have time constraints, where the guest
+        can rightfully panic if there is too much downtime while I/O
+        is frozen.
+      </dd>
+
+      <dt>Quantity of files</dt>
+      <dd>When capturing state, some approaches store all state within
+        the same file (internal), while others expand a chain of
+        related files that must be used together (external), for more
+        files that a management application must track.
+      </dd>
+
+      <dt>Impact to guest definition</dt>
+      <dd>Capturing state may require temporary changes to the guest
+        definition, such as associating new files into the domain
+        definition. While state capture should never impact the
+        running guest, a change to the domain's active XML may have
+        impact on other host operations being performed on the domain.
+      </dd>
+
+      <dt>Third-party integration</dt>
+      <dd>When capturing state, there are trade-offs in how much of the
+        process must be done directly by the hypervisor, and how much
+        can be off-loaded to third-party software.  Since capturing
+        state is not instantaneous, it is essential that any
+        third-party integration see consistent data even if the
+        running guest continues to modify that data after the point in
+        time of the capture.</dd>
+
+      <dt>Full vs. incremental</dt>
+      <dd>When periodically repeating the action of state capture, it
+        is useful to minimize the amount of state that must be
+        captured by exploiting the relation to a previous capture,
+        such as focusing only on the portions of the disk that the
+        guest has modified in the meantime.  Some approaches are able
+        to take advantage of checkpoints to provide an incremental
+        backup, while others are only capable of a full backup even if
+        that means re-capturing unchanged portions of the disk.</dd>
+
+      <dt>Local vs. remote</dt>
+      <dd>Domains that completely use remote storage may only need
+        some mechanism to keep track of guest memory state while using
+        external means to manage storage. Still, hypervisor and guest
+        cooperation to ensure points in time when no I/O is in flight
+        across the network can be important for properly capturing
+        disk state.</dd>
+
+      <dt>Network latency</dt>
+      <dd>Whether it's domain storage or saving domain state into
+        remote storage, network latency has an impact on snapshot
+        data. Having dedicated network capacity, bandwidth, or quality
+        of service levels may play a role, as well as planning for how
+        much of the backup process needs to be local.</dd>
+    </dl>
+
+    <p>
+      An example of the various facets in action is migration of a
+      running guest. In order for the guest to be able to resume on
+      the destination at the same place it left off at the source, the
+      hypervisor has to get to a point where execution on the source
+      is stopped, the last remaining changes occurring since the
+      migration started are then transferred, and the guest is started
+      on the target. The management software thus must keep track of
+      the starting point and any changes since the starting
+      point. These last changes are often referred to as dirty page
+      tracking or dirty disk block bitmaps. At some point in time
+      during the migration, the management software must freeze the
+      source guest, transfer the dirty data, and then start the guest
+      on the target. This period of time must be minimal. To minimize
+      overall migration time, one is advised to use a dedicated
+      network connection with a high quality of service. Alternatively,
+      saving the current state of the running guest can be a simple
+      point-in-time operation that does not require transferring the
+      "last vestiges" of state before writing out the saved state
+      file. The state file then reflects whatever was current at that
+      point in time, and may contain incomplete data; restarting the
+      guest from it could cause confusion or problems if some
+      operation had started but not yet completed when the state was
+      captured.
+    </p>
+
+    <h2><a id="apis">State capture APIs</a></h2>
+    <p>With those definitions, the following libvirt APIs related to
+      state capture have these properties:</p>
+    <dl>
+      <dt>virDomainManagedSave()</dt>
+      <dd>This API saves guest memory, with libvirt managing all of
+        the saved state, then stops the guest. While stopped, the
+        disks can be copied by a third party.  However, since any
+        subsequent restart of the guest by libvirt API will restore
+        the memory state (which typically only works if the disk state
+        is unchanged in the meantime), and since it is not possible to
+        get at the memory state that libvirt is managing, this is not
+        viable as a means for rolling back to earlier saved states,
+        but is rather more suited to situations such as suspending a
+        guest prior to rebooting the host in order to resume the guest
+        when the host is back up. This API also has a drawback of
+        potentially long guest downtime, and therefore does not lend
+        itself well to live backups.</dd>
+
+      <dt>virDomainSave()</dt>
+      <dd>This API is similar to virDomainManagedSave(), but moves the
+        burden of managing the stored memory state to the user. As
+        such, the user can now couple saved state with copies of the
+        disks to perform a revert to an arbitrary earlier saved state.
+        However, changing who manages the memory state does not change
+        the drawback of potentially long guest downtime when capturing
+        state.</dd>
+
+      <dt>virDomainSnapshotCreateXML()</dt>
+      <dd>This API wraps several approaches for capturing guest state,
+        with a general premise of creating a snapshot (where the
+        current guest resources are frozen in time and a new wrapper
+        layer is opened for tracking subsequent guest changes).  It
+        can operate on both offline and running guests, can choose
+        whether to capture the state of memory, disk, or both when
+        used on a running guest, and can choose between internal and
+        external storage for captured state.  However, it is geared
+        towards post-event captures (when capturing both memory and
+        disk state, the disk state is not captured until all memory
+        state has been collected first).  Using QEMU as the
+        hypervisor, internal snapshots currently have lengthy downtime
+        that is incompatible with freezing guest I/O, but external
+        snapshots are quick.  Since creating an external snapshot
+        changes which disk image resource is in use by the guest, this
+        API can be coupled with <code>virDomainBlockCommit()</code> to
+        restore things back to the guest using its original disk
+        image, where a third-party tool can read the backing file
+        prior to the live commit.  See also
+        the <a href="formatsnapshot.html">XML details</a> used with
+        this command.</dd>
+
+      <dt>virDomainFSFreeze(), virDomainFSThaw()</dt>
+      <dd>This pair of APIs does not directly capture guest state, but
+        can be used to coordinate with a trusted live guest that state
+        capture is about to happen, and therefore guest I/O should be
+        quiesced so that the state capture is fully consistent, rather
+        than merely crash consistent.  Some APIs are able to
+        automatically perform a freeze and thaw via a flags parameter,
+        rather than having to make separate calls to these
+        functions. Also, note that freezing guest I/O is only possible
+        with trusted guests running a guest agent, and that some
+        guests place maximum time limits on how long I/O can be
+        frozen.</dd>
+
+      <dt>virDomainBlockCopy()</dt>
+      <dd>This API wraps approaches for capturing the disk state (but
+        not the memory state) of a running guest, and can only
+        operate on one block device per job.  To get a consistent
+        copy of multiple disks, multiple jobs must be run in
+        parallel, then the domain must be paused before ending all of
+        the jobs.  The capture is consistent only at the end of the
+        operation, at which point there is a choice for future guest
+        changes to either pivot to the new file or to resume using
+        just the original file.  The resulting backup file is thus
+        the other file no longer in use by the
+        guest.</dd>
+
+      <dt>virDomainCheckpointCreateXML()</dt>
+      <dd>This API does not actually capture guest state, rather it
+        makes it possible to track which portions of guest disks have
+        changed between a checkpoint and the current live execution of
+        the guest.  However, while it is possible to use this API to
+        create checkpoints in isolation, it is more typical to create
+        a checkpoint as a side-effect of starting a new incremental
+        backup with <code>virDomainBackupBegin()</code>, since a
+        second incremental backup is most useful when using the
+        checkpoint created during the first.  <!--See also
+        the <a href="formatcheckpoint.html">XML details</a> used with
+        this command.--></dd>
+
+      <dt>virDomainBackupBegin(), virDomainBackupEnd()</dt>
+      <dd>This API wraps approaches for capturing the state of disks
+        of a running guest, but does not track accompanying guest
+        memory state.  The capture is consistent to the start of the
+        operation, where the captured state is stored independently
+        from the disk image in use with the guest and where it can be
+        easily integrated with a third-party tool for capturing the disk
+        state.  Since the backup operation is stored externally from
+        the guest resources, there is no need to commit data back in
+        at the completion of the operation.  When coupled with
+        checkpoints, this can be used to capture incremental backups
+        instead of full.</dd>
+    </dl>
+
+    <h2><a id="examples">Examples</a></h2>
+    <p>The following two sequences both accomplish the task of
+      capturing the disk state of a running guest, then wrapping
+      things up so that the guest is still running with the same file
+      as its disk image as before the sequence of operations began.
+      The difference between the two sequences boils down to the
+      impact of an unexpected interruption made at any point in the
+      middle of the sequence: with such an interruption, the first
+      example leaves the guest tied to a temporary wrapper file rather
+      than the original disk, and requires manual clean up of the
+      domain definition; while the second example has no impact to the
+      domain definition.</p>
+
+    <p>1. Backup via temporary snapshot
+      <pre>
+virDomainFSFreeze()
+virDomainSnapshotCreateXML(VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY)
+virDomainFSThaw()
+third-party copy the backing file to backup storage # most time spent here
+virDomainBlockCommit(VIR_DOMAIN_BLOCK_COMMIT_ACTIVE) per disk
+wait for commit ready event per disk
+virDomainBlockJobAbort() per disk
+      </pre></p>
+
+    <p>2. Direct backup
+      <pre>
+virDomainFSFreeze()
+virDomainBackupBegin()
+virDomainFSThaw()
+wait for push mode event, or pull data over NBD # most time spent here
+virDomainBackupEnd()
+    </pre></p>
+
+  </body>
+</html>
diff --git a/docs/formatsnapshot.html.in b/docs/formatsnapshot.html.in
index c60b4fb7c9..9ee355198f 100644
--- a/docs/formatsnapshot.html.in
+++ b/docs/formatsnapshot.html.in
@@ -9,6 +9,8 @@
     <h2><a id="SnapshotAttributes">Snapshot XML</a></h2>

     <p>
+      Snapshots are one form
+      of <a href="domainstatecapture.html">domain state capture</a>.
       There are several types of snapshots:
     </p>
     <dl>
-- 
2.20.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
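
The Examples section of the patch contrasts two backup sequences by what
an interruption in the middle leaves behind. A minimal toy model of that
difference is sketched here, assuming a domain definition can be reduced
to the name of the image it currently uses. Every name in it (ToyDomain,
backup_via_snapshot, direct_backup, the image file names) is hypothetical
and only illustrates the reasoning; none of it is a libvirt call.

```python
from dataclasses import dataclass


@dataclass
class ToyDomain:
    """Hypothetical stand-in for a domain definition; not a libvirt object."""
    active_image: str = "base.qcow2"


def _noop(dom):
    """Step that does not touch the domain definition."""


def _switch_to_overlay(dom):
    # An external snapshot makes the guest write to a new wrapper file.
    dom.active_image = "overlay.qcow2"


def _pivot_back(dom):
    # Ending the active commit job with a pivot returns to the original image.
    dom.active_image = "base.qcow2"


# Sequence 1 of the patch: backup via temporary snapshot.
SNAPSHOT_STEPS = [
    ("snapshot", _switch_to_overlay),
    ("copy", _noop),      # third-party copy of the backing file
    ("commit", _noop),    # commit job runs; definition unchanged until pivot
    ("pivot", _pivot_back),
]

# Sequence 2 of the patch: direct backup, stored outside the guest resources.
DIRECT_STEPS = [("begin", _noop), ("copy", _noop), ("end", _noop)]


def _run(steps, dom, stop_after=None):
    """Apply steps in order; simulate an interruption after `stop_after`.

    Returns the name of the step after which the run was interrupted,
    or None if every step completed.
    """
    for name, action in steps:
        action(dom)
        if name == stop_after:
            return name
    return None


def backup_via_snapshot(dom, stop_after=None):
    return _run(SNAPSHOT_STEPS, dom, stop_after)


def direct_backup(dom, stop_after=None):
    return _run(DIRECT_STEPS, dom, stop_after)


if __name__ == "__main__":
    for sequence in (backup_via_snapshot, direct_backup):
        dom = ToyDomain()
        sequence(dom, stop_after="copy")  # interrupt during the long copy
        print(sequence.__name__, "->", dom.active_image)
```

Interrupting both sequences during the copy step leaves the toy domain
of the first sequence on overlay.qcow2, which is the manual clean up the
patch text warns about, while the second still uses base.qcow2.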

Re: [libvirt] [PATCH v5 03/20] backup: Document nuances between different state capture APIs

Posted by John Ferlan 6 years, 11 months ago


On 3/7/19 12:47 AM, Eric Blake wrote:
> Upcoming patches will add support for incremental backups via
> a new API; but first, we need a landing page that gives an
> overview of capturing various pieces of guest state, and which
> APIs are best suited to which tasks.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v2: wording improvements based on review
> ---
>  docs/docs.html.in               |   5 +
>  docs/domainstatecapture.html.in | 314 ++++++++++++++++++++++++++++++++
>  docs/formatsnapshot.html.in     |   2 +
>  3 files changed, 321 insertions(+)
>  create mode 100644 docs/domainstatecapture.html.in
> 
> diff --git a/docs/docs.html.in b/docs/docs.html.in
> index d0ff844d0c..3afd13080a 100644
> --- a/docs/docs.html.in
> +++ b/docs/docs.html.in
> @@ -121,6 +121,11 @@
> 
>          <dt><a href="secureusage.html">Secure usage</a></dt>
>          <dd>Secure usage of the libvirt APIs</dd>
> +
> +        <dt><a href="domainstatecapture.html">Domain state
> +            capture</a></dt>
> +        <dd>Comparison between different methods of capturing domain
> +          state</dd>
>        </dl>
>      </div>
> 
> diff --git a/docs/domainstatecapture.html.in b/docs/domainstatecapture.html.in
> new file mode 100644
> index 0000000000..f7f2fe0b98
> --- /dev/null
> +++ b/docs/domainstatecapture.html.in
> @@ -0,0 +1,314 @@
> +<?xml version="1.0" encoding="UTF-8"?>
> +<!DOCTYPE html>
> +<html xmlns="http://www.w3.org/1999/xhtml">
> +  <body>
> +
> +    <h1>Domain state capture using Libvirt</h1>
> +
> +    <ul id="toc"></ul>
> +
> +    <p>
> +      In order to aid application developers to choose which
> +      operations best suit their needs, this page compares the
> +      different means for capturing state related to a domain managed
> +      by libvirt.
> +    </p>
> +
> +    <p>
> +      The information here is primarily geared towards capturing the
> +      state of an active domain. Capturing the state of an inactive
> +      domain essentially amounts to copying the contents of guest
> +      disks, followed by a fresh boot with disks restored to that
> +      state. Some of the topics presented below may relate to inactive
> +      state collection, but it is not the primary focus of this page.

Perhaps the last sentence is redundant or unnecessary, IDC.

> +    </p>
> +
> +    <h2><a id="definitions">State capture trade-offs</a></h2>
> +
> +    <p>One of the features made possible with virtual machines is live
> +      migration -- transferring all state related to the guest from
> +      one host to another with minimal interruption to the guest's
> +      activity. In this case, state includes domain memory (including
> +      register and device contents), and domain storage (whether the
> +      guest's view of the disks are backed by local storage on the
> +      host, or by the hypervisor accessing shared storage over a
> +      network).  A clever observer will then note that if all state is
> +      available for live migration, then there is nothing stopping a
> +      user from saving some or all of that state at a given point of
> +      time in order to be able to later rewind guest execution back to
> +      the state it previously had. The astute reader will also realize
> +      that state capture at any level requires that the data must be
> +      stored and managed by some mechanism. This processing might fit
> +      in a single file, or more likely require a chain of related
> +      files, and may require synchronization with third-party tools
> +      built around managing the amount of data resulting from
> +      capturing the state of multiple guests that each use multiple
> +      disks.
> +    </p>
> +
> +    <p>
> +      There are several libvirt APIs associated with capturing the
> +      state of a guest, which can later be used to rewind that guest
> +      to the conditions it was in earlier.  The following is a list of
> +      trade-offs and differences between the various facets that
> +      affect capturing domain state for active domains:
> +    </p>
> +
> +    <dl>
> +      <dt>Duration</dt>
> +      <dd>Capturing state can be a lengthy process, so while the
> +        captured state ideally represents an atomic point in time
> +        correpsonding to something the guest was actually executing,

corresponding

> +        capturing state tends to focus on minimizing guest downtime
> +        while performing the rest of the state capture in parallel
> +        with guest execution.  Some interfaces require up-front
> +        preparation (the state captured is not complete until the API
> +        ends, which may be some time after the command was first
> +        started), while other interfaces track the state when the
> +        command was first issued, regardless of the time spent in
> +        capturing the rest of the state.  Also, time spent in state
> +        capture may be longer than the time required for live
> +        migration, when state must be duplicated rather than shared.
> +      </dd>
> +
> +      <dt>Amount of state</dt>
> +      <dd>For an online guest, there is a choice between capturing the
> +        guest's memory (all that is needed during live migration when
> +        the storage is already shared between source and destination),
> +        the guest's disk state (all that is needed if there are no
> +        pending guest I/O transactions that would be lost without the
> +        corresponding memory state), or both together.  Reverting to
> +        partial state may still be viable, but typically, booting from
> +        captured disk state without corresponding memory is comparable
> +        to rebooting a machine that had power cut before I/O could be
> +        flushed. Guests may need to use proper journaling methods to
> +        avoid problems when booting from partial state.
> +      </dd>
> +
> +      <dt>Quiescing of data</dt>
> +      <dd>Even if a guest has no pending I/O, capturing disk state may
> +        catch the guest at a time when the contents of the disk are
> +        inconsistent. Cooperating with the guest to perform data
> +        quiescing is an optional step to ensure that captured disk
> +        state is fully consistent without requiring additional memory
> +        state, rather than just crash-consistent.  But guest
> +        cooperation may also have time constraints, where the guest
> +        can rightfully panic if there is too much downtime while I/O
> +        is frozen.
> +      </dd>
> +
> +      <dt>Quantity of files</dt>
> +      <dd>When capturing state, some approaches store all state within
> +        the same file (internal), while others expand a chain of
> +        related files that must be used together (external), for more
> +        files that a management application must track.
> +      </dd>
> +
> +      <dt>Impact to guest definition</dt>
> +      <dd>Capturing state may require temporary changes to the guest
> +        definition, such as associating new files into the domain
> +        definition. While state capture should never impact the
> +        running guest, a change to the domain's active XML may have
> +        impact on other host operations being performed on the domain.
> +      </dd>
> +
> +      <dt>Third-party integration</dt>
> +      <dd>When capturing state, there are tradeoffs to how much of the
> +        process must be done directly by the hypervisor, and how much
> +        can be off-loaded to third-party software.  Since capturing
> +        state is not instantaneous, it is essential that any
> +        third-party integration see consistent data even if the
> +        running guest continues to modify that data after the point in
> +        time of the capture.</dd>
> +
> +      <dt>Full vs. incremental</dt>
> +      <dd>When periodically repeating the action of state capture, it
> +        is useful to minimize the amount of state that must be
> +        captured by exploiting the relation to a previous capture,
> +        such as focusing only on the portions of the disk that the
> +        guest has modified in the meantime.  Some approaches are able
> +        to take advantage of checkpoints to provide an incremental
> +        backup, while others are only capable of a full backup even if
> +        that means re-capturing unchanged portions of the disk.</dd>
> +
> +      <dt>Local vs. remote</dt>
> +      <dd>Domains that completely use remote storage may only need
> +        some mechanism to keep track of guest memory state while using
> +        external means to manage storage. Still, hypervisor and guest
> +        cooperation to ensure points in time when no I/O is in flight
> +        across the network can be important for properly capturing
> +        disk state.</dd>
> +
> +      <dt>Network latency</dt>
> +      <dd>Whether it's domain storage or saving domain state into
> +        remote storage, network latency has an impact on snapshot
> +        data. Having dedicated network capacity, bandwidth, or quality
> +        of service levels may play a role, as well as planning for how
> +        much of the backup process needs to be local.</dd>
> +    </dl>
> +
> +    <p>
> +      An example of the various facets in action is migration of a
> +      running guest. In order for the guest to be able to resume on
> +      the destination at the same place it left off at the source, the
> +      hypervisor has to get to a point where execution on the source
> +      is stopped, the last remaining changes occurring since the
> +      migration started are then transferred, and the guest is started
> +      on the target. The management software thus must keep track of
> +      the starting point and any changes since the starting
> +      point. These last changes are often referred to as dirty page
> +      tracking or dirty disk block bitmaps. At some point in time
> +      during the migration, the management software must freeze the
> +      source guest, transfer the dirty data, and then start the guest
> +      on the target. This period of time must be minimal. To minimize
> +      overall migration time, one is advised to use a dedicated
> +      network connection with a high quality of service. Alternatively
> +      saving the current state of the running guest can just be a
> +      point in time type operation which doesn't require updating the
> +      "last vestiges" of state prior to writing out the saved state
> +      file. The state file is the point in time of whatever is current
> +      and may contain incomplete data which if used to restart the
> +      guest could cause confusion or problems because some operation
> +      wasn't completed depending upon where in time the operation was
> +      commenced.
> +    </p>
> +
> +    <h2><a id="apis">State capture APIs</a></h2>
> +    <p>With those definitions, the following libvirt APIs related to
> +      state capture have these properties:</p>
> +    <dl>
> +      <dt>virDomainManagedSave</dt>

Do you think it'd be worthwhile to modify to:

<code><a
href="html/libvirt-libvirt-domain.html#virDomainManagedSave">virDomainManagedSave</a></code>

> +      <dd>This API saves guest memory, with libvirt managing all of
> +        the saved state, then stops the guest. While stopped, the
> +        disks can be copied by a third party.  However, since any
> +        subsequent restart of the guest by libvirt API will restore
> +        the memory state (which typically only works if the disk state
> +        is unchanged in the meantime), and since it is not possible to
> +        get at the memory state that libvirt is managing, this is not
> +        viable as a means for rolling back to earlier saved states,
> +        but is rather more suited to situations such as suspending a
> +        guest prior to rebooting the host in order to resume the guest
> +        when the host is back up. This API also has a drawback of
> +        potentially long guest downtime, and therefore does not lend
> +        itself well to live backups.</dd>
> +
> +      <dt>virDomainSave</dt>

<code><a
href="html/libvirt-libvirt-domain.html#virDomainSave">virDomainSave</a></code>

> +      <dd>This API is similar to virDomainManagedSave(), but moves the

s/()//  or add them above. I like without (), but don't really care.
Just make them all consistent - above and below.

> +        burden on managing the stored memory state to the user. As
> +        such, the user can now couple saved state with copies of the
> +        disks to perform a revert to an arbitrary earlier saved state.
> +        However, changing who manages the memory state does not change
> +        the drawback of potentially long guest downtime when capturing
> +        state.</dd>
> +
> +      <dt>virDomainSnapshotCreateXML()</dt>


<code><a
href="html/libvirt-libvirt-domain-snapshot.html#virDomainSnapshotCreateXML">virDomainSnapshotCreateXML</a></code>

> +      <dd>This API wraps several approaches for capturing guest state,
> +        with a general premise of creating a snapshot (where the
> +        current guest resources are frozen in time and a new wrapper
> +        layer is opened for tracking subsequent guest changes).  It
> +        can operate on both offline and running guests, can choose
> +        whether to capture the state of memory, disk, or both when
> +        used on a running guest, and can choose between internal and
> +        external storage for captured state.  However, it is geared
> +        towards post-event captures (when capturing both memory and
> +        disk state, the disk state is not captured until all memory
> +        state has been collected first).  Using QEMU as the
> +        hypervisor, internal snapshots currently have lengthy downtime
> +        that is incompatible with freezing guest I/O, but external
> +        snapshots are quick.  Since creating an external snapshot
> +        changes which disk image resource is in use by the guest, this
> +        API can be coupled with <code>virDomainBlockCommit()</code> to
> +        restore things back to the guest using its original disk
> +        image, where a third-party tool can read the backing file
> +        prior to the live commit.  See also
> +        the <a href="formatsnapshot.html">XML details</a> used with
> +        this command.</dd>
> +
> +      <dt>virDomainFSFreeze(), virDomainFSThaw()</dt>

<code><a
href="html/libvirt-libvirt-domain.html#virDomainFSFreeze">virDomainFSFreeze</a></code>,
<code><a
href="html/libvirt-libvirt-domain.html#virDomainFSThaw">virDomainFSThaw</a></code>

> +      <dd>This pair of APIs does not directly capture guest state, but
> +        can be used to coordinate with a trusted live guest that state
> +        capture is about to happen, and therefore guest I/O should be
> +        quiesced so that the state capture is fully consistent, rather
> +        than merely crash consistent.  Some APIs are able to
> +        automatically perform a freeze and thaw via a flags parameter,
> +        rather than having to make separate calls to these
> +        functions. Also, note that freezing guest I/O is only possible
> +        with trusted guests running a guest agent, and that some
> +        guests place maximum time limits on how long I/O can be
> +        frozen.</dd>
> +
> +      <dt>virDomainBlockCopy()</dt>

<code><a
href="html/libvirt-libvirt-domain.html#virDomainBlockCopy">virDomainBlockCopy</a></code>

> +      <dd>This API wraps approaches for capturing the disk state (but
> +        not memory) of a running guest, but can only operate on one
> +        block device per job.  To get a consistent copy of multiple
> +        disks, multiple jobs must be run in parallel, then the domain
> +        must be paused before ending all of the jobs.  The capture is
> +        consistent only at the end of the operation with a choice for
> +        future guest changes to either pivot to the new file or to
> +        resume to just using the original file.  The resulting backup
> +        file is thus the other file no longer in use by the
> +        guest.</dd>
> +
> +      <dt>virDomainCheckpointCreateXML()</dt>

<code><a
href="html/libvirt-libvirt-domain-checkpoint.html#virDomainCheckpointCreateXML">virDomainCheckpointCreateXML</a></code>


Since this and the next two following don't have links yet, I think
rather than do any sort of split, can we move this to after the
virDomainBackup* API's are introduced? It's been great to help lay the
groundwork though.

> +      <dd>This API does not actually capture guest state, rather it
> +        makes it possible to track which portions of guest disks have
> +        changed between a checkpoint and the current live execution of
> +        the guest.  However, while it is possible to use this API to
> +        create checkpoints in isolation, it is more typical to create
> +        a checkpoint as a side-effect of starting a new incremental
> +        backup with <code>virDomainBackupBegin()</code>, since a
> +        second incremental backup is most useful when using the
> +        checkpoint created during the first.  <!--See also
> +        the <a href="formatcheckpoint.html">XML details</a> used with
> +        this command.--></dd>

Making this patch later in the series removes the need for this too.

> +
> +      <dt>virDomainBackupBegin(), virDomainBackupEnd()</dt>

<code><a
href="html/libvirt-libvirt-domain.html#virDomainBackupBegin">virDomainBackupBegin</a></code>,
<code><a
href="html/libvirt-libvirt-domain.html#virDomainBackupEnd">virDomainBackupEnd</a></code>

> +      <dd>This API wraps approaches for capturing the state of disks
> +        of a running guest, but does not track accompanying guest
> +        memory state.  The capture is consistent to the start of the
> +        operation, where the captured state is stored independently
> +        from the disk image in use with the guest and where it can be
> +        easily integrated with a third-party tool for capturing the disk
> +        state.  Since the backup operation is stored externally from
> +        the guest resources, there is no need to commit data back in
> +        at the completion of the operation.  When coupled with
> +        checkpoints, this can be used to capture incremental backups
> +        instead of full.</dd>
> +    </dl>
> +
> +    <h2><a id="examples">Examples</a></h2>
> +    <p>The following two sequences both accomplish the task of
> +      capturing the disk state of a running guest, then wrapping
> +      things up so that the guest is still running with the same file
> +      as its disk image as before the sequence of operations began.
> +      The difference between the two sequences boils down to the
> +      impact of an unexpected interruption made at any point in the
> +      middle of the sequence: with such an interruption, the first
> +      example leaves the guest tied to a temporary wrapper file rather
> +      than the original disk, and requires manual clean up of the
> +      domain definition; while the second example has no impact to the
> +      domain definition.</p>
> +
> +    <p>1. Backup via temporary snapshot
> +      <pre>
> +virDomainFSFreeze()
> +virDomainSnapshotCreateXML(VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY)
> +virDomainFSThaw()
> +third-party copy the backing file to backup storage # most time spent here
> +virDomainBlockCommit(VIR_DOMAIN_BLOCK_COMMIT_ACTIVE) per disk
> +wait for commit ready event per disk
> +virDomainBlockJobAbort() per disk
> +      </pre></p>
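
As an illustration only, the first three steps of sequence 1 might be sketched in Python as below. The sketch assumes a domain object that exposes the libvirt-python methods fsFreeze(), fsThaw() and snapshotCreateXML(). The StubDomain class, the snapshot XML and the local flag constant are placeholders so that it runs without a hypervisor. What it is meant to show is the try/finally around the snapshot, so the guest is thawed even when snapshot creation fails. The copy and block-commit steps are left out.

```python
# Value mirrors VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY; with libvirt-python
# installed, use the constant from the libvirt module instead.
SNAPSHOT_DISK_ONLY = 1 << 4

# Placeholder snapshot description, not taken from the patch.
SNAPSHOT_XML = "<domainsnapshot><name>backup-tmp</name></domainsnapshot>"


def quiesced_disk_snapshot(dom, xml=SNAPSHOT_XML):
    """Freeze guest I/O, take a disk-only snapshot, and always thaw."""
    dom.fsFreeze()
    try:
        return dom.snapshotCreateXML(xml, SNAPSHOT_DISK_ONLY)
    finally:
        # Runs on success and on failure, so the guest is never left frozen.
        dom.fsThaw()


class StubDomain:
    """Hypothetical stand-in for a libvirt domain; it only records calls."""

    def __init__(self, fail_snapshot=False):
        self.calls = []
        self.fail_snapshot = fail_snapshot

    def fsFreeze(self):
        self.calls.append("freeze")

    def fsThaw(self):
        self.calls.append("thaw")

    def snapshotCreateXML(self, xml, flags):
        self.calls.append("snapshot")
        if self.fail_snapshot:
            raise RuntimeError("snapshot failed")
        return "backup-tmp"
```

With a real connection, the object returned by a domain lookup would take the place of StubDomain. Keeping the frozen window this short matters because, as the text above notes, some guests limit how long I/O may stay frozen.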
> +
> +    <p>2. Direct backup
> +      <pre>
> +virDomainFSFreeze()
> +virDomainBackupBegin()
> +virDomainFSThaw()
> +wait for push mode event, or pull data over NBD # most time spent here
> +virDomainBackeupEnd()

virDomainBackupEnd

Reviewed-by: John Ferlan <jferlan@redhat.com>

John

> +    </pre></p>
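
The difference between the two sequences under an interruption can be illustrated with a toy model. This is not libvirt code: ToyGuest, Interrupted, the file names and both functions are invented for the sketch, and the only thing modelled is which image the guest ends up using.

```python
class ToyGuest:
    """Toy model (not libvirt): tracks which image the guest writes to."""

    def __init__(self, image):
        self.active_image = image


class Interrupted(Exception):
    """Simulated crash of the management application."""


def backup_via_snapshot(guest, interrupt=False):
    original = guest.active_image
    # Creating the external snapshot switches the guest to a wrapper file.
    guest.active_image = original + ".tmp-overlay"
    if interrupt:
        raise Interrupted()  # e.g. during the third-party copy
    # Active commit and job completion put the guest back on the original.
    guest.active_image = original


def direct_backup(guest, interrupt=False):
    # The backup is stored externally; the active image never changes.
    if interrupt:
        raise Interrupted()


def image_after(sequence, interrupt):
    """Run one sequence on a fresh guest and report its active image."""
    guest = ToyGuest("disk.qcow2")
    try:
        sequence(guest, interrupt)
    except Interrupted:
        pass
    return guest.active_image
```

In this model an interrupted backup_via_snapshot leaves the guest on the temporary overlay, which is the manual clean-up case described in the Examples paragraph, while an interrupted direct_backup leaves the guest on disk.qcow2.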
> +
> +  </body>
> +</html>
> diff --git a/docs/formatsnapshot.html.in b/docs/formatsnapshot.html.in
> index c60b4fb7c9..9ee355198f 100644
> --- a/docs/formatsnapshot.html.in
> +++ b/docs/formatsnapshot.html.in
> @@ -9,6 +9,8 @@
>      <h2><a id="SnapshotAttributes">Snapshot XML</a></h2>
> 
>      <p>
> +      Snapshots are one form
> +      of <a href="domainstatecapture.html">domain state capture</a>.
>        There are several types of snapshots:
>      </p>
>      <dl>
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v5 03/20] backup: Document nuances between different state capture APIs

Posted by Eric Blake 6 years, 11 months ago

On 3/12/19 12:59 PM, John Ferlan wrote:
> 
> 
> On 3/7/19 12:47 AM, Eric Blake wrote:
>> Upcoming patches will add support for incremental backups via
>> a new API; but first, we need a landing page that gives an
>> overview of capturing various pieces of guest state, and which
>> APIs are best suited to which tasks.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>
>> ---

>> +    <p>
>> +      The information here is primarily geared towards capturing the
>> +      state of an active domain. Capturing the state of an inactive
>> +      domain essentially amounts to copying the contents of guest
>> +      disks, followed by a fresh boot with disks restored to that
>> +      state. Some of the topics presented below may relate to inactive
>> +      state collection, but it is not the primary focus of this page.
> 
> Perhaps the last sentence is redundant or unnecessary, IDC.

I don't think it hurts to drop it.


>> +    <dl>
>> +      <dt>Duration</dt>
>> +      <dd>Capturing state can be a lengthy process, so while the
>> +        captured state ideally represents an atomic point in time
>> +        correpsonding to something the guest was actually executing,
> 
> corresponding

Fixed.


>> +
>> +    <h2><a id="apis">State capture APIs</a></h2>
>> +    <p>With those definitions, the following libvirt APIs related to
>> +      state capture have these properties:</p>
>> +    <dl>
>> +      <dt>virDomainManagedSave</dt>
> 
> Do you think it'd be worthwhile to modify to:
> 
> <code><a
> href="html/libvirt-libvirt-domain.html#virDomainManagedSave">virDomainManagedSave</a></code>

Yes, making links is worthwhile. I'll do that...

> 
> Since this and the next two following don't have links yet, I think
> rather than do any sort of split, can we move this to after the
> virDomainBackup* API's are introduced? It's been great to help lay the
> groundwork though.
> 
>> +      <dd>This API does not actually capture guest state, rather it
>> +        makes it possible to track which portions of guest disks have
>> +        changed between a checkpoint and the current live execution of
>> +        the guest.  However, while it is possible to use this API to
>> +        create checkpoints in isolation, it is more typical to create
>> +        a checkpoint as a side-effect of starting a new incremental
>> +        backup with <code>virDomainBackupBegin()</code>, since a
>> +        second incremental backup is most useful when using the
>> +        checkpoint created during the first.  <!--See also
>> +        the <a href="formatcheckpoint.html">XML details</a> used with
>> +        this command.--></dd>
> 
> Making this patch later in the series removes the need for this too.
> 

and yes, I can make this shuffle in the series.


>> +    <p>2. Direct backup
>> +      <pre>
>> +virDomainFSFreeze()
>> +virDomainBackupBegin()
>> +virDomainFSThaw()
>> +wait for push mode event, or pull data over NBD # most time spent here
>> +virDomainBackeupEnd()
> 
> virDomainBackupEnd

Indeed.

> 
> Reviewed-by: John Ferlan <jferlan@redhat.com>

Thanks.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
