While looking at my work on incremental backups, Nir raised a good point: if we want to recreate a set of known checkpoints on one machine that will be taking over a domain from another machine, my initial proposal required making multiple API calls to list the XML for each checkpoint on the source, and then more API calls to redefine each checkpoint on the destination; it also had the drawback that the list has to be presented in topological order (the API won't let you define a child checkpoint if the parent is not defined first). He asked if I could instead add bulk APIs, for getting the XML for all checkpoints at once, and then for redefining all checkpoints at once.

Since my checkpoint code borrows heavily from concepts in the snapshot code, I chose to tackle the problem by starting with this series, which does the same thing for snapshots as what I plan to do for checkpoints. That is, since this patch series adds 2 new APIs, the checkpoint series will do likewise with very similar code.

I'm also toying with the idea of followup patches that store all snapshots alongside the persistent <domain> XML in a single file, rather than in one snapshot per <domainsnapshot> file (we'd still support reading from the old method at libvirtd startup, but would not need to output in that manner any more).
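The two proposed APIs would collapse the per-checkpoint loop into one round trip in each direction. As a rough illustration of the shape of the data involved — none of these function names are real libvirt API, and the XML is heavily simplified (a real <domainsnapshot> also embeds the full <domain> definition) — a <snapshots> document is just the per-snapshot documents concatenated under one root:

```python
# Hypothetical illustration of the proposed bulk pair: one call emitting a
# single <snapshots> document covering every snapshot, and one call
# recreating them all from that document. The XML handling is a stand-in
# for what the series wires through the RPC layer; nothing here is real
# libvirt API.

import xml.etree.ElementTree as ET

def bulk_dumpxml(snapshots):
    """Concatenate per-snapshot XML under a single <snapshots> root."""
    root = ET.Element('snapshots')
    for name, parent in snapshots:
        snap = ET.SubElement(root, 'domainsnapshot')
        ET.SubElement(snap, 'name').text = name
        if parent:  # roots of the snapshot tree have no <parent>
            ET.SubElement(ET.SubElement(snap, 'parent'), 'name').text = parent
    return ET.tostring(root, encoding='unicode')

def bulk_redefine(xml):
    """Recover (name, parent) pairs from one <snapshots> document."""
    out = []
    for snap in ET.fromstring(xml).findall('domainsnapshot'):
        parent = snap.find('parent/name')
        out.append((snap.findtext('name'),
                    parent.text if parent is not None else None))
    return out
```

The round trip preserves the tree structure, which is exactly what a destination host needs in order to redefine everything in one shot.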
Based-on: <20190308060512.17733-1-eblake@redhat.com>
[0/5 snapshots: topological sorting]

Changes from v3:
- introduce 2 new APIs instead of adding flags to 2 existing APIs [Jan]
- rebase on top of minor changes that already landed in the
  non-controversial first half of the v3 series [John]

git backport-diff gets confused by patch renames, but this is quite
obviously more or less a rewrite:

001/8:[down] 'snapshot: Add new API for bulk dumpxml/redefine'
002/8:[down] 'snapshot: Support topological virDomainSnapshotForEach()'
003/8:[down] 'snapshot: Tweaks to support new bulk dumpxml/import API'
004/8:[down] 'remote: Wire up snapshot bulk dumpxml/import'
005/8:[down] 'virsh: Expose bulk snapshot dumpxml/import'
006/8:[0087] [FC] 'test: Implement bulk snapshot operations'
007/8:[0040] [FC] 'qemu: Factor out qemuDomainSnapshotValidate() helper'
008/8:[0152] [FC] 'qemu: Implement bulk snapshot operations'

Eric Blake (8):
  snapshot: Add new API for bulk dumpxml/redefine
  snapshot: Support topological virDomainSnapshotForEach()
  snapshot: Tweaks to support new bulk dumpxml/import API
  remote: Wire up snapshot bulk dumpxml/import
  virsh: Expose bulk snapshot dumpxml/import
  test: Implement bulk snapshot operations
  qemu: Factor out qemuDomainSnapshotValidate() helper
  qemu: Implement bulk snapshot operations

 include/libvirt/libvirt-domain-snapshot.h |  16 +-
 src/conf/snapshot_conf.h                  |   1 +
 src/driver-hypervisor.h                   |  13 +-
 src/conf/snapshot_conf.c                  |  34 ++--
 src/libvirt-domain-snapshot.c             | 122 ++++++++++-
 src/libvirt_public.syms                   |   2 +
 src/qemu/qemu_domain.c                    |   2 +-
 src/qemu/qemu_driver.c                    | 236 +++++++++++++++++-----
 src/remote/remote_driver.c                |   6 +-
 src/remote/remote_protocol.x              |  38 +++-
 src/remote_protocol-structs               |  17 ++
 src/test/test_driver.c                    |  62 +++++-
 src/vz/vz_driver.c                        |   3 +-
 tools/virsh-snapshot.c                    | 111 ++++++++--
 tools/virsh.pod                           |  15 +-
 15 files changed, 584 insertions(+), 94 deletions(-)

-- 
2.20.1

-- 
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
On Mon, Mar 11, 2019 at 09:38:31PM -0500, Eric Blake wrote:
> While looking at my work on incremental backups, Nir raised a good
> point: if we want to recreate a set of known checkpoints on one
> machine that will be taking over a domain from another machine,
> my initial proposal required making multiple API calls to list the
> XML for each checkpoint on the source, and then more API calls to
> redefine each checkpoint on the destination; it also had the drawback
> that the list has to be presented in topological order (the API won't
> let you define a child checkpoint if the parent is not defined first).
> He asked if I could instead add bulk APIs, for getting the XML for
> all checkpoints at once, and then for redefining all checkpoints at
> once.

To me the key blocking problem here is the topological sorting one, as it makes it incredibly painful to run individual APIs. Your other series on listing addresses this nicely by adding the topological flag.

Using that flag, in pseudo-Python-like code, the process of exporting and importing snapshots from one host to another should look approximately like this:

  snapshots = srcdom.ListAllSnapshots(libvirt.VIR_DOMAIN_SNAPSHOT_LIST_TOPOLOGICAL)
  for snapshot in snapshots:
      xml = snapshot.XMLDesc()
      dstdom.SnapshotCreateXML(xml)

I don't really see a reason why this is so hard that it justifies adding these bulk snapshot list/define APIs.

At the same time, there is a very real downside to providing these bulk snapshot APIs in terms of scalability. The <domain> XML for a guest can get very large. The <domainsnapshot> XML contains a copy of the <domain> XML from the point in time of the snapshot. Therefore a <snapshots> XML document is going to contain *many* <domain> XML documents concatenated. For a domain with many devices and many snapshots, the bulk list APIs have a non-negligible risk of hitting the RPC protocol limit on string size. We've seen this before with the bulk domain stats APIs we added.
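Why the TOPOLOGICAL flag makes the one-at-a-time loop workable can be shown without a running libvirtd. The sketch below simulates the redefine rule (a child is rejected until its parent exists) over a plain parent-map; the names and data shapes are illustrative, not libvirt structures:

```python
# Plain-Python sketch of why topological ordering matters: redefinition
# rejects a child whose parent is not yet defined, mirroring the rule
# enforced by virDomainSnapshotCreateXML(..., REDEFINE). No libvirt
# bindings are used; the dict-of-parents layout is illustrative only.

from collections import deque

def topological_order(parents):
    """Order snapshot names so every parent precedes its children.
    `parents` maps name -> parent name (None for tree roots)."""
    children = {}
    indegree = {name: 0 for name in parents}
    for name, parent in parents.items():
        if parent is not None:
            children.setdefault(parent, []).append(name)
            indegree[name] += 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for child in children.get(n, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(parents):
        raise ValueError("cycle or dangling parent in snapshot tree")
    return order

def redefine_all(parents):
    """One-at-a-time redefine; fails unless each parent comes first."""
    defined = set()
    for name in topological_order(parents):
        parent = parents[name]
        if parent is not None and parent not in defined:
            raise RuntimeError(f"parent {parent!r} of {name!r} not defined")
        defined.add(name)
    return defined
```

Feeding the loop a randomly ordered list would trip the parent check; feeding it the topologically sorted list always succeeds, which is the whole argument for the listing flag.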
We nonetheless still added the bulk domain stats APIs because they were a compelling performance benefit for something that is run very frequently. For apps to be scalable, though, they have to limit how many guests they request bulk stats for at any one time. This in fact makes the code using the bulk stats APIs more complex than if it just called the individual APIs. The performance benefit is still a win in that case, though.

In the bulk snapshot import/export case, however, I'm not seeing a compelling performance reason for their usage. Making the APIs scalable would require limiting how many snapshots are returned in any single call. There would then need to be repeated calls to fetch the rest. This makes it more complicated than just fetching/defining each individual snapshot.

IOW, overall I'm not seeing a big enough justification to add these new APIs that would outweigh the downsides they bring in terms of scalability. The topological sorting flag solves the really big pain point well enough.

> I'm also toying with the idea of followup patches that store all
> snapshots alongside the persistent <domain> XML in a single file,
> rather than in one snapshot per <domainsnapshot> file (we'd still
> support reading from the old method at libvirtd startup, but would
> not need to output in that manner any more).

What would be the benefit to having it all in one file? During initial libvirtd startup we could read the info from one file, but we still need code to read the info from many files, so we're not saving code in that respect. Most of the snapshot APIs that save XML to disk only care about a single <domainsnapshot> document, so they wouldn't benefit from having all snapshots in one place.

Re-writing the entire <domain> XML means that a host failure during a snapshot save can put the entire domain definition at risk of loss. Though we do try to mitigate that by writing to a temp file and doing an atomic rename, that doesn't eliminate the risk entirely.
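The pagination pattern described above — cap what a single call can return, and make the client loop — is the standard way to keep a bulk API under an RPC size limit. A minimal sketch, with `fetch_page` standing in for a hypothetical size-limited server entry point (not a real libvirt call):

```python
# Sketch of the "scalable bulk API" shape: the server caps each reply,
# the client makes repeated calls until a short page signals the end.
# This is the extra client-side complexity the bulk stats consumers
# already carry; `fetch_page` is a hypothetical stand-in.

def fetch_page(items, offset, limit):
    """Server side: return at most `limit` items starting at `offset`."""
    return items[offset:offset + limit]

def fetch_all(items, limit):
    """Client side: loop until a page comes back shorter than `limit`."""
    out, offset = [], 0
    while True:
        page = fetch_page(items, offset, limit)
        out.extend(page)
        if len(page) < limit:   # short (or empty) page: nothing left
            return out
        offset += limit
```

The point of the sketch is the argument in the text: once the client must loop anyway, the bulk call saves little over fetching each snapshot individually.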
Regards,
Daniel
-- 
|: https://berrange.com      -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org       -o-         https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 3/12/19 1:10 PM, Daniel P. Berrangé wrote:
> On Mon, Mar 11, 2019 at 09:38:31PM -0500, Eric Blake wrote:
>> While looking at my work on incremental backups, Nir raised a good
>> point: if we want to recreate a set of known checkpoints on one
>> machine that will be taking over a domain from another machine,
>> my initial proposal required making multiple API calls to list the
>> XML for each checkpoint on the source, and then more API calls to
>> redefine each checkpoint on the destination; it also had the drawback
>> that the list has to be presented in topological order (the API won't
>> let you define a child checkpoint if the parent is not defined first).
>> He asked if I could instead add bulk APIs, for getting the XML for
>> all checkpoints at once, and then for redefining all checkpoints at
>> once.
>
> To me the key blocking problem here is the topological sorting one
> as it makes it incredibly painful to run individual APIs. Your other
> series on list addresses this nicely by adding the topological flag.
>
> Using that flag, in pseudo-Python-like code the process of exporting
> and importing snapshots from one host to another should look approx
> like this
>
>   snapshots = srcdom.ListAllSnapshots(libvirt.VIR_DOMAIN_SNAPSHOT_LIST_TOPOLOGICAL)
>   for snapshot in snapshots:
>       xml = snapshot.XMLDesc()
>       dstdom.SnapshotCreateXML(xml)
>
> I don't really see a reason why this is so hard that it justifies
> adding these bulk snapshot list/define APIs.

Nir, do you want to chime in here?

> At the same time there is a very real downside to providing these bulk
> snapshot APIs in terms of scalability. The <domain> XML for a guest
> can get very large. The <domainsnapshot> XML contains a copy of the
> <domain> XML from the point in time of the snapshot. Therefore a
> <snapshots> XML document is going to contain *many* <domain> XML
> documents concatenated.
> > For a domain with many devices and many snapshots, the bulk list > APIs have non-negligible risk of hitting the RPC protocol limit > on string size. I did not consider that, but you are definitely correct about it being a real problem (I didn't trip over it in my small domain testing, but combined XML is definitely a repetition of a <domain> per <domainsnapshot>). One idea I had on the checkpoint series was adding a NO_DOMAIN flag which outputs much more compact XML by omitting the <domain> element. With checkpoints, the <domain> at the time of the checkpoint is less important (useful to know if we need it, but we can't revert to that state); but with snapshots, the <domain> at the time of the snapshot is very important (to the point that you have to request a flag for an unsafe revert in virDomainSnapshotRevert() if the <domain> information was incomplete). So using a NO_DOMAIN flag to minimize XML so that the bulk output doesn't exceed RPC limits is a possibility, but not a very nice one (as you'd STILL have to do a one-API-per-snapshot call to reinstate the <domain> of any snapshot that you plan to revert to). > > We've seen this before with the bulk domain stats APIs we added. > We none the less still added the bulk domain stats APIs because > it was a compelling performance benefit for something that is run > very frequently. For apps to be scalable though, they have to > limit how many guests they request bulk stats for at any time. > This in fact makes the code using the bulk stats APIs more complex > than if it just called the individual APIs. The performance benefit > is still a win though in this case. > > In the bulk snapshot import/export case though, I'm not seeing a > compelling performance reason for their usage. Making the APIs > scalable would require to limit how many snapshots are returned > in any single call. There would then need to be repeated calls > to fetch the rest. 
This makes it more complicated than just > fetching/defining each individual snapshot And the frequency of migrating snapshots is less than the frequency of querying bulk stats. It makes the most sense to speed up APIs called frequently. > > IOW, overall I'm not seeing a big enough justification to add these > new APIs, that would outweigh the downsides they bring in terms of > scalability. You raise some good points. > > The topological sorting flag solves the really big pain point well > enough. The topological sorting thing kind of fell on my lap as I was working on the bulk code, but now that I've written it, it definitely solves a problem that we previously had, even if I had not properly identified the problem (randomized list ordering is painful to work with). Either redefine has to accept randomized ordering (which it currently doesn't), or listing should output in sane ordering (which topological adds) - and once we have topological output, subsequent input gets a LOT easier, even if split across multiple API calls. > >> I'm also toying with the idea of followup patches that store all >> snapshots alongside the persistent <domain> XML in a single file, >> rather than in one snapshot per <domainsnapshot> file (we'd still >> support reading from the old method at libvirtd startup, but would >> not need to output in that manner any more). > > What would be the benefit to having it all in one file ? During > initial libvirtd startup we could read the info from one files, > but we still need code to read the info from many files so we're > not saving code in that respect. Most of the snapshot APIs that > save XML to disk only care about a single <domainsnapshot> > document so wouldn't benefit from having all snapshots in one > place. 
One possible benefit: in the current virDomainSnapshotCreateXML(REDEFINE) code, we potentially end up changing three separate snapshots in one call: if we are redefining a replacement snapshot (maybe supplying a <domain> subelement that was previously missing), AND marking the redefinition as the new current element, it requires rewriting the XML file of the previous current snapshot (to drop an internal-only <active>1</active> marker), deleting the XML file of the previous state of the snapshot being modified, and writing the new XML file of the redefined snapshot. If any one of those steps dies, it's very hard to roll back to the earlier state. But if we store all snapshots in a single file using a <snapshots current='name'> attribute (rather than a per-snapshot <active>1</active> marker), we only have to modify a single file when making any changes to the snapshot hierarchy. If we fail at any step of the way, we have not partially corrupted the snapshot hierarchy in a manner that is hard to roll back.

Another (weak) benefit - right now, qemu snapshots cannot be named "../blah", because that would create a filename that escapes the proper subdirectory. With a single <snapshots> file, you can allow more flexible snapshot names, as those names no longer influence host file names.

> Re-writing the entire <domain> XML means that a host failure during
> a snapshot save can put the entire domain definition at risk of loss.
> Though we do try to mitigate that by writing to a temp file and doing
> an atomic rename, that doesn't eliminate the risk entirely.
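The single-file scheme argued for above boils down to "serialize everything, then one atomic rename". A minimal illustration of that update pattern — the <snapshots current=...> layout follows the proposal in this thread, but the code is only a sketch, not libvirt's implementation:

```python
# Sketch of "one file, one atomic update": all snapshot metadata is
# serialized together and swapped in via a single os.replace(), so a
# crash leaves either the complete old state or the complete new state,
# never a half-updated hierarchy. The XML layout is illustrative only.

import os
import tempfile

def save_snapshots(path, snapshots, current):
    """Atomically rewrite the whole <snapshots> document.
    `snapshots` is a list of (name, parent-or-None) pairs."""
    body = ["<snapshots current='%s'>" % current]
    for name, parent in snapshots:
        body.append("  <snapshot name='%s' parent='%s'/>" % (name, parent))
    body.append("</snapshots>\n")
    data = "\n".join(body)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # data on disk before the rename
        os.replace(tmp, path)      # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise
```

Contrast with the three-file dance described above: here the "current" marker, the old snapshot state, and the new one all change in the same rename, so there is no partially-applied intermediate state to roll back.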
Okay, so it's probably better to store the <snapshots> XML in a file separate from the <domain> itself (failure to write the snapshots file may lose the snapshots, but does not lose the domain), but still, I can't help but think that two files is potentially nicer than a subdirectory full of one file per snapshot.

Unless Nir speaks up with strong reasons in favor of new bulk APIs (rather than living with the new topological sort flags), I'm okay dropping this series. The patches that are already in (the head of the v3 posting of this series, for generating/parsing <snapshots>) may still prove to be useful if you agree with me that a single file for ALL snapshots is more robust than one file per snapshot, particularly when we have existing APIs that already have a hard time rolling back to a sane state on partial failure when manipulating more than one file. But as I have not yet posted any series showing how that would work, that doesn't change that these v4 patches are probably not worth pursuing further.

And, since it is more important to get new API in by 5.2 (since new APIs are difficult to backport) than it is to do internal-only changes (a subdirectory of multiple files vs. one <snapshots> file is not user-visible, and can be backported without as much hassle), I'll focus more on the checkpoint APIs rather than any refactoring of internals. So that may mean that 5.2 is released with my <snapshots> format/parse code with no consumers of that code, but I don't think it hurts enough to revert them (as we may end up with a consumer for that code, even if not in 5.2).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
On Tue, Mar 12, 2019 at 9:49 PM Eric Blake <eblake@redhat.com> wrote:
> On 3/12/19 1:10 PM, Daniel P. Berrangé wrote:
>> On Mon, Mar 11, 2019 at 09:38:31PM -0500, Eric Blake wrote:
>>> While looking at my work on incremental backups, Nir raised a good
>>> point: if we want to recreate a set of known checkpoints on one
>>> machine that will be taking over a domain from another machine,
>>> my initial proposal required making multiple API calls to list the
>>> XML for each checkpoint on the source, and then more API calls to
>>> redefine each checkpoint on the destination; it also had the drawback
>>> that the list has to be presented in topological order (the API won't
>>> let you define a child checkpoint if the parent is not defined first).
>>> He asked if I could instead add bulk APIs, for getting the XML for
>>> all checkpoints at once, and then for redefining all checkpoints at
>>> once.
>>
>> To me the key blocking problem here is the topological sorting one
>> as it makes it incredibly painful to run individual APIs. Your other
>> series on list addresses this nicely by adding the topological flag.
>>
>> Using that flag, in pseudo-Python-like code the process of exporting
>> and importing snapshots from one host to another should look approx
>> like this
>>
>>   snapshots = srcdom.ListAllSnapshots(libvirt.VIR_DOMAIN_SNAPSHOT_LIST_TOPOLOGICAL)
>>   for snapshot in snapshots:
>>       xml = snapshot.XMLDesc()
>>       dstdom.SnapshotCreateXML(xml)
>>
>> I don't really see a reason why this is so hard that it justifies
>> adding these bulk snapshot list/define APIs.
>
> Nir, do you want to chime in here?

We don't have a need to list or define snapshots, since we manage snapshots on the oVirt side. We want an API to list and redefine checkpoints.

Our use case is:

1. Start the VM on some host
2. When oVirt starts a backup, it adds a checkpoint in the oVirt database
3. When libvirt starts a backup, libvirt adds the checkpoint oVirt created
4. Steps 2 and 3 are repeated while the VM is running...
5. Stop the VM - at this point we undefine the VM, so libvirt loses all info about checkpoints
6. Start the VM on some host
7. oVirt uses the bulk checkpoint API to redefine all checkpoints in the oVirt database
8. If there is an unknown bitmap on a disk, or a bitmap is missing, it means that someone modified the disk behind oVirt's back, and oVirt will delete the affected checkpoints and use the libvirt checkpoint APIs to delete the checkpoints and bitmaps. This forces a full backup for the next backup of the affected disks.

We would like libvirt to fail to redefine the checkpoints if the info in the checkpoints does not match the bitmaps on the disks mentioned in the checkpoints. To do this, libvirt must have the complete checkpoint state. It cannot if we define checkpoints one by one.

If libvirt cannot fail, we need a way to list all checkpoints and bitmaps, so we can sync the oVirt database with reality.

If we don't validate the bitmaps against the expected bitmaps, the backup may fail when starting a backup (good), or succeed while including wrong data in the backup.

It also does not make sense to redefine checkpoint state in an incremental way; this is like building a VM XML with many devices with one call per device.

The same checkpoint management will have to be duplicated in any libvirt client that manages more than one host (e.g. KubeVirt, OpenStack), since libvirt does not handle more than one host.

For more info see
- https://ovirt.org/develop/release-management/features/storage/incremental-backup.html
- https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml#L11525
- https://github.com/oVirt/vdsm/blob/0874b89a19af2d7726ed2f0b9d674cf0bb48a67e/lib/vdsm/api/vdsm-api.yml#L8129

>> At the same time there is a very real downside to providing these bulk
>> snapshot APIs in terms of scalability. The <domain> XML for a guest
>> can get very large.
The <domainsnapshot> XML contains a copy of the > > <domain> XML from the point in time of the snapsht. Therefore a > > <snapshots> XML document is going to contain *many* <domain> XML > > documents concatenated. > > > > For a domain with many devices and many snapshots, the bulk list > > APIs have non-negligible risk of hitting the RPC protocol limit > > on string size. > > I did not consider that, but you are definitely correct about it being a > real problem (I didn't trip over it in my small domain testing, but > combined XML is definitely a repetition of a <domain> per > <domainsnapshot>). > > One idea I had on the checkpoint series was adding a NO_DOMAIN flag > which outputs much more compact XML by omitting the <domain> element. > With checkpoints, the <domain> at the time of the checkpoint is less > important (useful to know if we need it, but we can't revert to that > state); but with snapshots, the <domain> at the time of the snapshot is > very important (to the point that you have to request a flag for an > unsafe revert in virDomainSnapshotRevert() if the <domain> information > was incomplete). So using a NO_DOMAIN flag to minimize XML so that the > bulk output doesn't exceed RPC limits is a possibility, but not a very > nice one (as you'd STILL have to do a one-API-per-snapshot call to > reinstate the <domain> of any snapshot that you plan to revert to). > > > > > We've seen this before with the bulk domain stats APIs we added. > > We none the less still added the bulk domain stats APIs because > > it was a compelling performance benefit for something that is run > > very frequently. For apps to be scalable though, they have to > > limit how many guests they request bulk stats for at any time. > > This in fact makes the code using the bulk stats APIs more complex > > than if it just called the individual APIs. The performance benefit > > is still a win though in this case. 
> > > > In the bulk snapshot import/export case though, I'm not seeing a > > compelling performance reason for their usage. Making the APIs > > scalable would require to limit how many snapshots are returned > > in any single call. There would then need to be repeated calls > > to fetch the rest. This makes it more complicated than just > > fetching/defining each individual snapshot > > And the frequency of migrating snapshots is less than the frequency of > querying bulk stats. It makes the most sense to speed up APIs called > frequently. > > > > > IOW, overall I'm not seeing a big enough justification to add these > > new APIs, that would outweigh the downsides they bring in terms of > > scalability. > > You raise some good points. > > > > > The topological sorting flag solves the really big pain point well > > enough. > > The topological sorting thing kind of fell on my lap as I was working on > the bulk code, but now that I've written it, it definitely solves a > problem that we previously had, even if I had not properly identified > the problem (randomized list ordering is painful to work with). Either > redefine has to accept randomized ordering (which it currently doesn't), > or listing should output in sane ordering (which topological adds) - and > once we have topological output, subsequent input gets a LOT easier, > even if split across multiple API calls. > > > > >> I'm also toying with the idea of followup patches that store all > >> snapshots alongside the persistent <domain> XML in a single file, > >> rather than in one snapshot per <domainsnapshot> file (we'd still > >> support reading from the old method at libvirtd startup, but would > >> not need to output in that manner any more). > > > > What would be the benefit to having it all in one file ? During > > initial libvirtd startup we could read the info from one files, > > but we still need code to read the info from many files so we're > > not saving code in that respect. 
Most of the snapshot APIs that > > save XML to disk only care about a single <domainsnapshot> > > document so wouldn't benefit from having all snapshots in one > > place. > > One possible benefit: in the current > virDomainSnapshotCreateXML(REDEFINE) code, we potentially end up > changing three separate snapshots in one call: if we are redefining a > replacement snapshot (maybe supplying a <domain> subelement that was > previously missing), AND marking the redefinition as the new current > element, it requires rewriting the XML file of the previous current > snapshot (to drop an internal-only <active>1</active> marker), deleting > the XML file of the previous state of the snapshot being modified, and > writing the new XML file of the redefined snapshot. If any one of those > steps die, it's very hard to roll back to the earlier state. But if we > store all snapshots in a single file using the <snapshots > current='name'> (rather than a per-snapshot <active>1</active> marker), > we only have to modify a single file when making any changes to the > snapshot hierarchy. If we fail at any earlier step of the way, we have > not partially corrupted the snapshot hierarchy in a manner that is hard > to roll back. > > Another (weak) benefit - right now, qemu snapshots cannot be named > "../blah", because that would create a filename that escapes the proper > subdirectory. With a single <snapshots>, you can allow more flexible > snapshot names, as those names no longer influence host file names. > > > > > Re-writing the entire <domain> XML means that a host failure during > > a snapshot save can put the entire domain definition at risk of loss. > > Though we do try to mitigate that by writing to a temp file and doing > > an atomic rename, that doesn't eliminate the risk entirely. 
> Okay, so it's probably better to store the <snapshots> XML in a file
> separate from the <domain> itself (failure to write the snapshots file
> may lose the snapshots, but does not lose the domain), but still, I
> can't help but think that two files is potentially nicer than a
> subdirectory full of one file per snapshot.
>
> Unless Nir speaks up with strong reasons in favor of new bulk APIs
> (rather than living with the new topological sort flags), I'm okay
> dropping this series.

I'm ok with dropping bulk snapshots, but I think libvirt should provide bulk APIs for checkpoints.

Nir

> The patches that are already in (the head of the
> v3 posting of this series, for generating/parsing <snapshots>) may still
> prove to be useful if you agree with me that a single file for ALL
> snapshots is more robust than one file per snapshot, particularly when
> we have existing APIs that already have a hard time rolling back to a
> sane state on partial failure when manipulating more than one file. But
> as I have not yet posted any series showing how that would work, that
> doesn't change that these v4 patches are probably not worth pursuing
> further.
>
> And, since it is more important to get new API in by 5.2 (since new
> APIs are difficult to backport) than it is to do internal-only changes (a
> subdirectory of multiple files vs. one <snapshots> file is not
> user-visible, and can be backported without as much hassle), I'll focus
> more on the checkpoint APIs rather than any refactoring of internals. So
> that may mean that 5.2 is released with my <snapshots> format/parse code
> with no consumers of that code, but I don't think it hurts enough to
> revert them (as we may end up with a consumer for that code, even if not
> in 5.2).
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
On 3/12/19 3:29 PM, Nir Soffer wrote:
>>> snapshots = srcdom.ListAllSnapshots(libvirt.VIR_DOMAIN_SNAPSHOT_LIST_TOPOLOGICAL)
>>> for snapshot in snapshots:
>>>     xml = snapshot.XMLDesc()
>>>     dstdom.SnapshotCreateXML(xml)
>>>
>>> I don't really see a reason why this is so hard that it justifies
>>> adding these bulk snapshot list/define APIs.
>>
>> Nir, do you want to chime in here?
>
> We don't have a need to list or define snapshots since we manage snapshots
> on the oVirt side.
> We want an API to list and redefine checkpoints.

But the proposed <domaincheckpoint> also has a <domain> subelement, so it has the same problem (the XML for a bulk operation can become prohibitively large).

> Our use case is:
>
> 1. Start VM on some host
> 2. When oVirt starts a backup, it adds a checkpoint in the oVirt database

And does this database preserve topological ordering? If not, why not?

> 3. When libvirt starts a backup, libvirt adds the checkpoint oVirt created
> 4. Steps 2 and 3 are repeated while the VM is running...
> 5. Stop VM - at this point we undefine the VM, so libvirt loses all info
> about checkpoints
> 6. Start VM on some host
> 7. oVirt uses the bulk checkpoint API to redefine all checkpoints in the
> oVirt database
> 8. If there is an unknown bitmap on a disk, or a bitmap is missing, it means
> that someone modified the disk behind oVirt's back, and oVirt will delete
> the affected checkpoints and use the libvirt checkpoint APIs to delete the
> checkpoints and bitmaps. This forces full backup for the next backup of the
> affected disks.
>
> We would like libvirt to fail to redefine the checkpoints if the info in
> the checkpoints does not match the bitmaps on the disks mentioned in the
> checkpoints.

That's a different feature request - you're asking for virDomainCheckpointCreateXML(REDEFINE) to have some additional flag where the metadata re-creation fails if the underlying bitmap is corrupt/missing.
Which may still be a good flag to add, but is something you can do with one-at-a-time checkpoint redefinitions, and doesn't, in itself, require bulk redefinition. > To > do this, > libvirt must have the complete checkpoint state. It cannot if we define > checkpoints one > by one. As long as you redefine checkpoints in topological order, I don't see how defining one at a time or bulk changes whether libvirt can validate that the checkpoints are correct. And even then, redefining in topological order is only necessary so that libvirt can ensure that any <parent/> checkpoint exists before attempting to define the metadata for a child checkpoint. > > If libivirt cannot fail, we need to a way to list all checkpoints and > bitmaps, so we can > sync oVirt database with reality. > > If we don't validate bitmaps with the expected bitmaps, the backup may fail > when starting > a backup (good), or succeed, including wrong data in the backup. > > It also does not make sense to redefine checkpoint state in incremental > way, this like > building a VM XML with many devices with one call per device. The argument here is that it will be the same amount of XML whether it is one call per checkpoint or one call for all checkpoints (note - there are multiple devices in one checkpoint, so it is NOT one call per 'device * checkpoint') > > The same checkpoint management will have to be duplicated in any libvirt > client that > manage more then one host (e.g. KubeVirt, OpenStack), since libvirt does > not handle > more than one host. But again, if you are moving checkpoint metadata from one host to another, doing so one-at-a-time in topological order shouldn't be much harder than doing so in one single call, but without the drawbacks of exceeding XML size over RPC call limits. 
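The validation Nir asks for can indeed be done one checkpoint at a time, provided redefinition happens in topological order. A sketch of that per-checkpoint check, using plain dicts and sets in place of libvirt's real structures (all names here are illustrative):

```python
# Sketch of one-at-a-time checkpoint redefinition with validation:
# each redefine verifies that (a) the parent checkpoint is already
# defined (hence the need for topological order) and (b) every disk
# named by the checkpoint actually carries the matching bitmap.
# `disk_bitmaps` maps disk name -> set of bitmap names found on it.

def redefine_checkpoint(defined, disk_bitmaps, name, parent, disks):
    """Redefine one checkpoint; raise if state disagrees with disks."""
    if parent is not None and parent not in defined:
        raise RuntimeError(f"parent checkpoint {parent!r} not defined yet")
    for disk in disks:
        if name not in disk_bitmaps.get(disk, set()):
            raise RuntimeError(f"bitmap {name!r} missing on disk {disk!r}")
    defined.add(name)
```

The point matching the surrounding argument: nothing in this check needs the complete chain in a single call — the per-call state (`defined` plus the on-disk bitmaps) is enough to reject a corrupt or missing bitmap at exactly the checkpoint it affects.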
> > For more info see > - > https://ovirt.org/develop/release-management/features/storage/incremental-backup.html > - https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml#L11525 > - > https://github.com/oVirt/vdsm/blob/0874b89a19af2d7726ed2f0b9d674cf0bb48a67e/lib/vdsm/api/vdsm-api.yml#L8129 > >> Unless Nir speaks up with strong reasons in favor of new bulk APIs >> (rather than living with the new topological sort flags), I'm okay >> dropping this series. > > > I'm ok with dropping bulk snapshots, but I think libvirt should provide > bulk APIs for checkpoints. > > Nir > Given that the same amount of XML is involved across RPC, I'm hard-pressed to include bulk operations on one object without providing it on both. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
On Tue, Mar 12, 2019 at 11:02 PM Eric Blake <eblake@redhat.com> wrote: > On 3/12/19 3:29 PM, Nir Soffer wrote: > > >>> snapshots = > >> srcdom.ListAllSnapshots(libvirt.VIR_DOMAIN_SNAPSHOT_LIST_TOPOLOGICAL) > >>> for snapshot in snapshots: > >>> xml = snapshot.XMLDesc() > >>> dstdom.SnapshotCreateXML(xml) > >>> > >>> I don't really see a reason why this is so hard that it justifies > >>> adding these bulk snapshot list/define APIs. > >> > >> Nir, do you want to chime in here? > >> > > > > We don't have a need to list or define snapshots since we manage > snapshots > > on the oVirt side. > > We want an API to list and redefine checkpoints. > > But the proposed <domaincheckpoint> also has a <domain> subelement, so > it has the same problem (the XML for a bulk operation can become > prohibitively large). > Why do we need <domain> in a checkpoint? You can see here what we store in a checkpoint (based on your patches v2 or v3): https://ovirt.org/develop/release-management/features/storage/incremental-backup.html#engine-database-changes Which is: vm_checkpoints table - checkpoint_id: UUID - parent: UUID - vm_id: UUID - creation_date: TIMESTAMP vm_checkpoint_disk_map table - disk_id: UUID - checkpoint_id: UUID > > > Our use case is: > > > > 1. Start VM on some host > > 2. When oVirt starts a backup, it adds a checkpoint in the oVirt database > > And does this database preserve topological ordering? If not, why not? > Using the parent UUID we can create the correct chain in the right order, and this is what we send to vdsm. Based on this, vdsm will generate the XML for libvirt. There is no domain info related to these checkpoints, and I hope we are not adding such requirements at this stage. > > 3. When libvirt starts a backup, libvirt adds the checkpoint oVirt created > > 4. Steps 2 and 3 are repeated while the VM is running... > > 5. Stop VM - at this point we undefine the VM, so libvirt loses all info > > about checkpoints > > 6. Start VM on some host > > 7. 
oVirt uses the bulk checkpoint API to redefine all checkpoints in the oVirt > > database > > 8. If there is an unknown bitmap on a disk, or a bitmap is missing, it means > > that someone > > modified the disk behind oVirt's back, and oVirt will delete the > affected > > checkpoints > > and use the libvirt checkpoints API to delete the checkpoints and bitmaps. > > This forces > > a full backup for the next backup of the affected disks. > > > > We would like libvirt to fail to redefine the checkpoints if the > > info in the checkpoints > > does not match the bitmaps on the disks mentioned in the checkpoints. > > That's a different feature request - you're asking for > virDomainCheckpointCreateXML(REDEFINE) to have some additional flag > where the metadata re-creation fails if the underlying bitmap is > corrupt/missing. Which may still be a good flag to add, but is something > you can do with one-at-a-time checkpoint redefinitions, and doesn't, in > itself, require bulk redefinition. > Checking for an unknown bitmap on disk is not possible unless you have the complete list of checkpoints. Unless you add something like: virDomainCheckpointValidate() Which will fail if the checkpoints known to libvirt do not match the bitmaps in the underlying qcow2 images. But even if we add this, the user will have to do: try: for checkpoint in ordered_checkpoints: generate xml... redefine checkpoint... except ...: remove all checkpoints... validate checkpoints Unless the xml for a single checkpoint is big, I don't see why all libvirt users need to redefine checkpoints one by one and handle all the possible errors (e.g. some checkpoints redefined, some not). Note that vdsm may be killed in the middle of the redefine loop, and in this case we leave libvirt with partial info about checkpoints, and we need to redefine the checkpoints again handling this partial state. "We" means all libvirt clients that need to manage more than one host. 
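The try/except boilerplate Nir sketches can be written once per client. Here is a hedged Python sketch against stand-in objects (the method names `checkpoint_create_xml` and `delete` are invented for illustration, not the real libvirt binding), showing redefine-in-order with child-first cleanup on failure:

```python
# Redefine checkpoints one at a time; on any failure, remove everything
# already redefined (children before parents) so a retry starts clean.
def redefine_all(conn, ordered_xml):
    done = []
    try:
        for xml in ordered_xml:
            done.append(conn.checkpoint_create_xml(xml, redefine=True))
    except Exception:
        for cp in reversed(done):          # delete children first
            cp.delete(metadata_only=True)
        raise

# Minimal fakes to exercise the control flow.
class FakeCheckpoint:
    def __init__(self, xml, log):
        self.xml, self.log = xml, log
    def delete(self, metadata_only=True):
        self.log.append(("delete", self.xml))

class FakeConn:
    def __init__(self, fail_on):
        self.fail_on, self.log = fail_on, []
    def checkpoint_create_xml(self, xml, redefine=True):
        if xml == self.fail_on:
            raise RuntimeError("bitmap missing for " + xml)
        self.log.append(("define", xml))
        return FakeCheckpoint(xml, self.log)

conn = FakeConn(fail_on="c3")
try:
    redefine_all(conn, ["c1", "c2", "c3"])
except RuntimeError:
    pass
print(conn.log)  # c1 and c2 defined, then deleted child-first
```

Whether this loop lives in each client or behind a bulk API is exactly the trade-off being debated in this thread.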
> To > > do this, > > libvirt must have the complete checkpoint state. It cannot if we define > > checkpoints one > > by one. > > As long as you redefine checkpoints in topological order, I don't see > how defining one at a time or bulk changes whether libvirt can validate > that the checkpoints are correct. And even then, redefining in > topological order is only necessary so that libvirt can ensure that any > <parent/> checkpoint exists before attempting to define the metadata for > a child checkpoint. > > > > > If libivirt cannot fail, we need to a way to list all checkpoints and > > bitmaps, so we can > > sync oVirt database with reality. > > > > If we don't validate bitmaps with the expected bitmaps, the backup may > fail > > when starting > > a backup (good), or succeed, including wrong data in the backup. > > > > It also does not make sense to redefine checkpoint state in incremental > > way, this like > > building a VM XML with many devices with one call per device. > > The argument here is that it will be the same amount of XML whether it > is one call per checkpoint or one call for all checkpoints (note - there > are multiple devices in one checkpoint, so it is NOT one call per > 'device * checkpoint') > > > > > The same checkpoint management will have to be duplicated in any libvirt > > client that > > manage more then one host (e.g. KubeVirt, OpenStack), since libvirt does > > not handle > > more than one host. > > But again, if you are moving checkpoint metadata from one host to > another, doing so one-at-a-time in topological order shouldn't be much > harder than doing so in one single call, but without the drawbacks of > exceeding XML size over RPC call limits. 
> > > > For more info see > > - > > > https://ovirt.org/develop/release-management/features/storage/incremental-backup.html > > - > https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml#L11525 > > - > > > https://github.com/oVirt/vdsm/blob/0874b89a19af2d7726ed2f0b9d674cf0bb48a67e/lib/vdsm/api/vdsm-api.yml#L8129 > > > > >> Unless Nir speaks up with strong reasons in favor of new bulk APIs > >> (rather than living with the new topological sort flags), I'm okay > >> dropping this series. > > > > > > I'm ok with dropping bulk snapshots, but I think libvirt should provide > > bulk APIs for checkpoints. > > > > Nir > > > > Given that the same amount of XML is involved across RPC, I'm > hard-pressed to include bulk operations on one object without providing > it on both. > > -- > Eric Blake, Principal Software Engineer > Red Hat, Inc. +1-919-301-3226 > Virtualization: qemu.org | libvirt.org
On 3/12/19 4:35 PM, Nir Soffer wrote: >>> We don't have a need to list or define snapshots since we manage >> snapshots >>> on the oVirt side. >>> We want an API to list and redefine checkpoints. >> >> But the proposed <domaincheckpoint> also has a <domain> subelement, so >> it has the same problem (the XML for a bulk operation can become >> prohibitively large). >> > > Why do we need <domain> in a checkpoint? The initial design for snapshots did not have <domain>, and it bit us hard; you cannot safely revert to a snapshot if you don't know the state of the domain at that time. The initial design for checkpoints has thus mandated that <domain> be present (my v5 series fails to redefine a snapshot if you omit <domain>, even though it has a NO_DOMAIN flag during dumpxml for reduced output). If we are convinced that defining a snapshot without <domain> is safe enough, then I can relax the checkpoint code to allow redefined metadata without <domain> the way snapshots already have to do it, even though I was hoping that checkpoints could start life with fewer back-compats than snapshot has had to carry along. But I'd rather start strict and relax later (require <domain> and then remove it when proven safe), and not start loose (leave <domain> optional, and then wish we had made it mandatory). > > You can see here what we store in a checkpoint (based on your patches v2 or > v3): > https://ovirt.org/develop/release-management/features/storage/incremental-backup.html#engine-database-changes > > Which is: > > vm_checkpoints table > > - checkpoint_id: UUID > - parent: UUID > - vm_id: UUID > - creation_date: TIMESTAMP > > vm_checkpoint_disk_map table > - disk_id: UUID > - checkpoint_id: UUID > And no state of the <domain> at the time the checkpoint was created? >> >>> Our use case is: >>> >>> 1. Start VM on some host >>> 2. When oVirt starts a backup, it adds a checkpoint in the oVirt database >> >> And does this database preserve topological ordering? If not, why not? 
>> > > Using the parent UUID we can create the correct chain in the right order, > and this is what we send to vdsm. Based on this, vdsm will generate the XML > for libvirt. > > There is no domain info related to these checkpoints, and I hope we are not > adding such requirements at this stage. Rather, that requirement has been there in all of my drafts, and it sounds like you are trying to get me to relax that requirement to not need a <domain> during checkpoint metadata redefine. >> That's a different feature request - you're asking for >> virDomainCheckpointCreateXML(REDEFINE) to have some additional flag >> where the metadata re-creation fails if the underlying bitmap is >> corrupt/missing. Which may still be a good flag to add, but is something >> you can do with one-at-a-time checkpoint redefinitions, and doesn't, in >> itself, require bulk redefinition. >> > > Checking for unknown bitmap on disk is not possible unless you have the > complete > list of checkpoints. Unless you add something like: > > virDomainCheckpointValidate() > > Which will fail if the checkpoints known to libvirt do not match the > bitmaps in the > underlying qcow2 images. Yes, an API like that may be better than trying to validate on a per-redefine basis whether libvirt metadata matches qcow2 metadata. > > But even if we add this, the user will have to do: > > try: > for checkpoint in orderdered_checkpoints: > generate xml... > redefine checkpoint... > except ...: > remove all checkpoints... > > validate checkpoints Yes, but that's not hard. And it's no different whether the 'try/except' code is a single libvirt API (with bulk redefine) or multiple libvirt APIs (no bulk redefine, but easy enough to one-at-a-time redefine in topological order). Why you have to generate the XML instead of reusing the XML that libvirt produces is something I'm not sure of (that is, what's wrong with using libvirt's dumpxml output as your input, rather than trying to reconstruct it on the fly?). 
> > Unless the xml for a single checkpoint is big, I don't see why all libvirt > users > need to redefine checkpoints one by one and handle all the possible errors > (e.g. some checkpoints redefined, some not). As currently proposed, <domaincheckpoint> includes a mandatory <domain> sub-element on redefinition, which means that the XML _is_ big for a series of checkpoints. It also means that you're better off storing the XML produced by libvirt than trying to regenerate it by hand, when it comes to metadata redefinition. > > Note that vdsm may be killed in the middle of the redefine loop, and in > this case > we leave libvirt with partial info about checkpoints, and we need to > redefine > the checkpoints again handling this partial state. But that's relatively easy - if you don't know whether libvirt might have partial data, then wipe the data and start the redefine loop from scratch. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
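Eric's "wipe the data and start from scratch" recovery amounts to an idempotent restore. A hedged sketch of that control flow, where a plain dict stands in for libvirt's checkpoint-metadata store and the `{'id', 'parent'}` record layout is invented for illustration:

```python
# Wipe any possibly-partial metadata from an interrupted earlier run, then
# redefine the full list in parent-first order. Safe to re-run at any time.
def restore_checkpoints(store, ordered_records):
    store.clear()                          # drop partial state, if any
    for rec in ordered_records:            # parents first (topological order)
        if rec["parent"] is not None and rec["parent"] not in store:
            raise ValueError("parent %s not defined yet" % rec["parent"])
        store[rec["id"]] = dict(rec)

store = {"stale": {"id": "stale", "parent": None}}   # leftover partial state
records = [{"id": "c1", "parent": None}, {"id": "c2", "parent": "c1"}]
restore_checkpoints(store, records)
print(sorted(store))  # ['c1', 'c2']
```

Because the first step discards whatever is there, a client killed mid-loop can simply run the same function again on the next attempt.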
On Tue, Mar 12, 2019 at 11:52 PM Eric Blake <eblake@redhat.com> wrote: > On 3/12/19 4:35 PM, Nir Soffer wrote: > > >>> We don't have a need to list or define snapshots since we manage > >> snapshots > >>> on the oVirt side. > >>> We want an API to list and redefine checkpoints. > >> > >> But the proposed <domaincheckpoint> also has a <domain> subelement, so > >> it has the same problem (the XML for a bulk operation can become > >> prohibitively large). > >> > > > > Why do we need <domain> in a checkpoint? > > The initial design for snapshots did not have <domain>, and it bit us > hard; you cannot safely revert to a snapshot if you don't know the state > of the domain at that time. The initial design for checkpoints has thus > mandated that <domain> be present (my v5 series fails to redefine a > snapshot if you omit <domain>, even though it has a NO_DOMAIN flag > during dumpxml for reduced output). If we are convinced that defining a > snapshot without <domain> is safe enough, then I can relax the > checkpoint code to allow redefined metadata without <domain> the way > snapshots already have to do it, even though I was hoping that > checkpoints could start life with fewer back-compats than snapshot has > had to carry along. But I'd rather start strict and relax later > (require <domain> and then remove it when proven safe), and not start > loose (leave <domain> optional, and then wish we had made it mandatory). > You keep mentioning snapshots, but we are not going to redefine snapshots, only checkpoints. We need to do this only because qcow2 does not store a chain of bitmaps, but un-ordered bitmaps without any metadata, so we need to remind libvirt which bitmaps we have in every disk, and what is the order of the bitmaps. We will be happy if we could just specify the bitmaps and disks in the backup API, and avoid rebuilding the checkpoints history, just like we avoid rebuilding the snapshot history. 
> You can see here what we store in a checkpoint (based on your patches v2 > or > > v3): > > > https://ovirt.org/develop/release-management/features/storage/incremental-backup.html#engine-database-changes > > > > Which is: > > > > vm_checkpoints table > > > > - checkpoint_id: UUID > > - parent: UUID > > - vm_id: UUID > > - creation_date: TIMESTAMP > > > > vm_checkpoint_disk_map table > > - disk_id: UUID > > - checkpoint_id: UUID > > > > And no state of the <domain> at the time the checkpoint was created? > No, but the configuration of the VM is stored in oVirt, so the backup application can get this configuration from oVirt at the time of the backup. >> > >>> Our use case is: > >>> > >>> 1. Start VM on some host > >>> 2. When oVirt starts a backup, it adds a checkpoint in the oVirt database > >> > >> And does this database preserve topological ordering? If not, why not? > >> > > > > Using the parent UUID we can create the correct chain in the right order, > > and this is what we send to vdsm. Based on this, vdsm will generate the > XML > > for libvirt. > > > > There is no domain info related to these checkpoints, and I hope we are > not > > adding such requirements at this stage. > > Rather, that requirement has been there in all of my drafts, and it > sounds like you are trying to get me to relax that requirement to not > need a <domain> during checkpoint metadata redefine. > We never understood this requirement from the html documentation built from your patches. Looking again in v3 - I see: domain The inactive domain configuration at the time the checkpoint was created. Readonly. And we have an example: <domaincheckpoint> <description>Completion of updates after OS install</description> <disks> <disk name='vda' checkpoint='bitmap'/> <disk name='vdb' checkpoint='no'/> </disks> </domaincheckpoint> Which does not have a domain, since it is read-only. 
There is an example of the xml we get from libvirt: <domaincheckpoint> <name>1525889631</name> <description>Completion of updates after OS install</description> <creationTime>1525889631</creationTime> <parent> <name>1525111885</name> </parent> <disks> <disk name='vda' checkpoint='bitmap' bitmap='1525889631'/> <disk name='vdb' checkpoint='no'/> </disks> <domain type='qemu'> <name>fedora</name> <uuid>93a5c045-6457-2c09-e56c-927cdf34e178</uuid> <memory>1048576</memory> ... <devices> <disk type='file' device='disk'> <driver name='qemu' type='qcow2'/> <source file='/path/to/file1'/> <target dev='vda' bus='virtio'/> </disk> <disk type='file' device='disk' snapshot='external'> <driver name='qemu' type='raw'/> <source file='/path/to/file2'/> <target dev='vdb' bus='virtio'/> </disk> ... </devices> </domain> </domaincheckpoint> But as the domain is read-only, we assumed that we don't need to pass it in the domaincheckpoint xml. We also don't need it since we keep the disks' uuids included in every checkpoint, and from this you can find the information needed to generate the tiny checkpoint xml above. Can you explain why historic domain xml is needed for performing a backup? The caller of the API is responsible for giving you the list of bitmaps that should be included for every disk. Libvirt needs to create a new active bitmap and start the NBD server exposing the bitmaps specified by the caller. How is having the xml for every checkpoint/bitmap needed to perform these steps? Note also that libvirt does not have snapshot history on an oVirt host, since we manage snapshots externally. When libvirt starts a VM, the only thing it knows about snapshots is the chain of disks. Even if we restore the domain xml for every checkpoint, libvirt cannot validate it against the non-existing snapshots. While we use persistent VMs since 4.2, our VMs are actually transient. 
We started to use persistent VMs because they are more tested, and we can use the VM <metadata> to persist stuff, instead of maintaining our own temporary metadata persistence solution. >> That's a different feature request - you're asking for > >> virDomainCheckpointCreateXML(REDEFINE) to have some additional flag > >> where the metadata re-creation fails if the underlying bitmap is > >> corrupt/missing. Which may still be a good flag to add, but is something > >> you can do with one-at-a-time checkpoint redefinitions, and doesn't, in > >> itself, require bulk redefinition. > >> > > > > Checking for an unknown bitmap on disk is not possible unless you have the > > complete > > list of checkpoints. Unless you add something like: > > > > virDomainCheckpointValidate() > > > > Which will fail if the checkpoints known to libvirt do not match the > > bitmaps in the > > underlying qcow2 images. > > Yes, an API like that may be better than trying to validate on a > per-redefine basis whether libvirt metadata matches qcow2 metadata. > > > > > But even if we add this, the user will have to do: > > > > try: > > for checkpoint in ordered_checkpoints: > > generate xml... > > redefine checkpoint... > > except ...: > > remove all checkpoints... > > > > validate checkpoints > > Yes, but that's not hard. And it's no different whether the 'try/except' > code is a single libvirt API (with bulk redefine) or multiple libvirt > APIs (no bulk redefine, but easy enough to one-at-a-time redefine in > topological order). Why you have to generate the XML instead of reusing > the XML that libvirt produces is something I'm not sure of (that is, > what's wrong with using libvirt's dumpxml output as your input, rather > than trying to reconstruct it on the fly?). > The xml libvirt produces may not be stable. 
For example the disk uuid "22e5a7d6-5409-482d-bda4-f5939b27aad4" may now be called "sda", but when the vm was running on another host 2 weeks ago maybe it was called "sdb", since the user added or removed some devices. When we create the xml using the disk uuid, we will always refer to the current name of the disk, regardless of the history of the vm. > > Unless the xml for a single checkpoint is big, I don't see why all libvirt > > users > > need to redefine checkpoints one by one and handle all the possible > errors > > (e.g. some checkpoints redefined, some not). > > As currently proposed, <domaincheckpoint> includes a mandatory <domain> > sub-element on redefinition, which means that the XML _is_ big for a > series of checkpoints. It also means that you're better off storing the > XML produced by libvirt than trying to regenerate it by hand, when it > comes to metadata redefinition. > > > > > Note that vdsm may be killed in the middle of the redefine loop, and in > > this case > > we leave libvirt with partial info about checkpoints, and we need to > > redefine > > the checkpoints again handling this partial state. > > But that's relatively easy - if you don't know whether libvirt might > have partial data, then wipe the data and start the redefine loop from > scratch. > But this easy boilerplate will be duplicated in all libvirt clients, instead of having a single API to restore the state. Of course if we must keep historic domain configuration for every checkpoint, a bulk API does not make sense, but I don't understand this requirement. Nir
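The tiny <domaincheckpoint> XML Nir describes can be generated mechanically from database rows. A hedged sketch with Python's xml.etree.ElementTree, using the element names shown earlier in the thread (the `checkpoint_xml` function and its row format are invented for illustration):

```python
import xml.etree.ElementTree as ET

# Build the abbreviated <domaincheckpoint> redefine XML from oVirt-style
# rows: a checkpoint name, its parent (or None), and (disk, bitmap) pairs.
def checkpoint_xml(name, parent, disks):
    root = ET.Element("domaincheckpoint")
    ET.SubElement(root, "name").text = name
    if parent:
        p = ET.SubElement(root, "parent")
        ET.SubElement(p, "name").text = parent
    disks_el = ET.SubElement(root, "disks")
    for dev, bitmap in disks:
        attrs = {"name": dev}
        if bitmap:
            attrs.update(checkpoint="bitmap", bitmap=bitmap)
        else:
            attrs["checkpoint"] = "no"      # disk not covered by this checkpoint
        ET.SubElement(disks_el, "disk", attrs)
    return ET.tostring(root, encoding="unicode")

xml = checkpoint_xml("1525889631", "1525111885",
                     [("vda", "1525889631"), ("vdb", None)])
print(xml)
```

Whether libvirt will accept such an abbreviated document on REDEFINE, or insist on the full dump including <domain>, is exactly the open question in this thread.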
On 3/12/19 6:29 PM, Nir Soffer wrote: > On Tue, Mar 12, 2019 at 11:52 PM Eric Blake <eblake@redhat.com> wrote: > >> On 3/12/19 4:35 PM, Nir Soffer wrote: >> >>>>> We don't have a need to list or define snapshots since we managed >>>> snapshots >>>>> on oVirt side. >>>>> We want an API to list and redefine checkpoints. >>>> >>>> But the proposed <domaincheckpoint> also has a <domain> subelement, so >>>> it has the same problem (the XML for a bulk operation can become >>>> prohibitively large). >>>> >>> >>> Why do we need <domain> in a checkpoint? >> >> The initial design for snapshots did not have <domain>, and it bit us >> hard; you cannot safely revert to a snapshot if you don't know the state >> of the domain at that time. The initial design for checkpoints has thus >> mandated that <domain> be present (my v5 series fails to redefine a >> snapshot if you omit <domain>, even though it has a NO_DOMAIN flag >> during dumpxml for reduced output). If we are convinced that defining a >> snapshot without <domain> is safe enough, then I can relax the s/snapshot/checkpoint/ in that line >> checkpoint code to allow redefined metadata without <domain> the way >> snapshots already have to do it, even though I was hoping that >> checkpoints could start life with fewer back-compats that snapshot has >> had to carry along. But I'd rather start strict and relax later >> (require <domain> and then remove it when proven safe), and not start >> loose (leave <domain> optional, and then wish we had made it mandatory). >> > > You keep mentioning snapshots, but we are not going to redefine snapshots, > only checkpoints. > > We need to do this only because qcow2 does not store a chain of bitmaps, but just as qcow2 does not store a chain of snapshots either. > un-ordered bitmaps without any metadata, so we need to remind libvirt which > bitmaps we have in every disk, and what is the order of the bitmaps. 
We > will be > happy if we could just specify the bitmaps and disks in the backup API, and > avoid > rebuilding the checkpoints history, just like we avoid rebuilding the > snapshot > history. For persistent domains, it would be better if libvirt migrated snapshots (and therefore checkpoints) automatically, instead of being a coward and refusing the migration because a snapshot exists. The workaround is that you dumpxml then undefine snapshots on the source, then migrate, then createXML(REDEFINE) on the destination using the dumped XML. Since checkpoints share a lot of code with snapshots, it was easier to get the code working by having the same rules (and if we fix it for one, we should fix it for both, again because they share a lot of code). For transient domains, libvirt has nowhere to store the information about the snapshots or checkpoints, so it is relying on you to REDEFINE the SAME state as what it had last time libvirt was running. >>> There is no domain info related to these checkpoints, and I hope we are >> not >>> adding such requirements at this stage. >> >> Rather, that requirement has been there in all of my drafts, and it >> sounds like you are trying to get me to relax that requirement to not >> need a <domain> during checkpoint metadata redefine. >> > > We never understood this requirement from the html documentation built > from your patches. > > Looking again in v3 - I see: > > domain > The inactive domain configuration at the time the checkpoint was > created. Readonly. Copied heavily from snapshots, which have the same wording, so I'm going to improve that wording in the next round of patches. But the key point is that: CreateXML(, 0) takes abbreviated (or full) XML, ignores readonly fields, _and_ populates remaining fields with information learned AT THE TIME the object was created. 
Thereafter, GetXMLDesc() outputs the full XML, and: CreateXML(, _REDEFINE) says to take the FULL XML from a previous dump (and NOT the abbreviated XML at the initial creation) in order to restore ALL state as libvirt had (if you omit fields, libvirt has to guess at what state to recreate for those fields - for some fields, guessing might not be too bad, such as what timestamp to use. But for other fields, guessing what to use for <domain> by using the current state of the domain, which may have had hotplug events in the meantime, is WRONG compared to being told the exact <domain> state that libvirt previously had in memory and which was included in the previous dumpxml). In other words, _REDEFINE has never been about inventing state from scratch, but about restoring state that libvirt previously had, as dumped by libvirt. > > And we have an example: > > <domaincheckpoint> > <description>Completion of updates after OS install</description> > <disks> > <disk name='vda' checkpoint='bitmap'/> > <disk name='vdb' checkpoint='no'/> > </disks> > </domaincheckpoint> > > Which does not have a domain, since it is read-only. Yes, that's an example of CreateXML(, 0). But NOT an example of CreateXML(, _REDEFINE). > > There is an example of the xml we get from libvirt: > > <domaincheckpoint> > <name>1525889631</name> > <description>Completion of updates after OS install</description> > <creationTime>1525889631</creationTime> > <parent> > <name>1525111885</name> > </parent> > <disks> > <disk name='vda' checkpoint='bitmap' bitmap='1525889631'/> > <disk name='vdb' checkpoint='no'/> > </disks> > <domain type='qemu'> > <name>fedora</name> > <uuid>93a5c045-6457-2c09-e56c-927cdf34e178</uuid> > <memory>1048576</memory> > ... 
> <devices> > <disk type='file' device='disk'> > <driver name='qemu' type='qcow2'/> > <source file='/path/to/file1'/> > <target dev='vda' bus='virtio'/> > </disk> > <disk type='file' device='disk' snapshot='external'> > <driver name='qemu' type='raw'/> > <source file='/path/to/file2'/> > <target dev='vdb' bus='virtio'/> > </disk> > ... > </devices> > </domain> > </domaincheckpoint> And _that's_ the XML that the _REDEFINE flag expects. > > But as the domain is read-only, we assumed that we don't need to pass it in > the > domaincheckpoint xml. We also don't need it since we keep the disks' > uuids > included in every checkpoint, and from this you can find the information > needed > to generate the tiny checkpoint xml above. But the tiny original input XML is no longer sufficient during _REDEFINE, because you no longer have the point in time where libvirt can populate missing/readonly fields with sane defaults. > > Can you explain why historic domain xml is needed for performing a backup? In the current implementation for qemu, the full <domain> is needed only for validation that the disks requested in the virBackupBegin() operation have a clean chain of bitmaps from the checkpoint to the present; and you may be right that the bulk of the information about the <domain> is spurious for qemu's needs. But it is not obvious whether throwing away information about the <domain> would be valid for other hypervisor implementations, nor whether some future tweak to qemu may actually require more of the <domain> state as it was at the time the checkpoint was created. > > The caller of the API is responsible for giving you the list of bitmaps that > should be > included for every disk. Libvirt needs to create a new active bitmap and > start the > NBD server exposing the bitmaps specified by the caller. How is having the xml > for > every checkpoint/bitmap needed to perform these steps? 
Because having the <domain> lets libvirt correlate that disk 'vda' at the time of the checkpoint maps to whatever filename with UUID in the <domain> definition, to ensure that it is operating on the same device (and that 'vda' was not hot-unplugged and then hot-plugged to some other disk in the meantime). >> > > The xml libvirt produces may not be stable. For example the disk uuid > "22e5a7d6-5409-482d-bda4-f5939b27aad4" may now be called "sda", but > when the vm was running on another host 2 weeks ago maybe it was called > "sdb", since the user added or removed some devices. But at the time a <domainsnapshot> or <domaincheckpoint> was created, the <domain> subelement of that XML _is_ stable - that particular disk name/UUID WAS disk 'sda' at the time the object was created, no matter what name it has later. >>> Note that vdsm may be killed in the middle of the redefine loop, and in >>> this case >>> we leave libvirt with partial info about checkpoints, and we need to >>> redefine >>> the checkpoints again handling this partial state. >> >> But that's relatively easy - if you don't know whether libvirt might >> have partial data, then wipe the data and start the redefine loop from >> scratch. >> > > But this easy boilerplate will be duplicated in all libvirt clients, > instead of > having a single API to restore the state. If the single API didn't have to include a <domain> element for every checkpoint, and thus risk exceeding RPC bounds, then we'd be okay. But at the present moment, bulk operations would require a LOT of XML, regardless of snapshot vs. checkpoint. > > Of course if we must keep historic domain configuration for every checkpoint > bulk API does not make sense, but I don't understand this requirement. 
The requirement is stronger for snapshot (because you can't revert without either knowing the <domain> setup to revert to or using the UNSAFE flag) than for checkpoint; but my plan was to make <domain> mandatory for checkpoint because experience with snapshot has shown that making <domain> optional on REDEFINE causes more grief than desired, and because I wanted to share as much good code between the two (while leaving the back-compats to just snapshots) as possible. If it turns out that we want <domain> to be optional on REDEFINE for checkpoints, we can do that as a followup patch, but first I want to focus on getting the API in (we can relax API restrictions later, but only if the API even exists to make it into backports). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
On Tue, Mar 12, 2019 at 04:52:24PM -0500, Eric Blake wrote: > On 3/12/19 4:35 PM, Nir Soffer wrote: > > >>> We don't have a need to list or define snapshots since we manage > >> snapshots > >>> on the oVirt side. > >>> We want an API to list and redefine checkpoints. > >> > >> But the proposed <domaincheckpoint> also has a <domain> subelement, so > >> it has the same problem (the XML for a bulk operation can become > >> prohibitively large). > >> > > > > Why do we need <domain> in a checkpoint? > > The initial design for snapshots did not have <domain>, and it bit us > hard; you cannot safely revert to a snapshot if you don't know the state > of the domain at that time. The initial design for checkpoints has thus > mandated that <domain> be present (my v5 series fails to redefine a > snapshot if you omit <domain>, even though it has a NO_DOMAIN flag > during dumpxml for reduced output). If we are convinced that defining a > snapshot without <domain> is safe enough, then I can relax the > checkpoint code to allow redefined metadata without <domain> the way > snapshots already have to do it, even though I was hoping that > checkpoints could start life with fewer back-compats than snapshot has > had to carry along. But I'd rather start strict and relax later > (require <domain> and then remove it when proven safe), and not start > loose (leave <domain> optional, and then wish we had made it mandatory). Given that a guest domain XML can be changed at runtime at any point, I don't see how omitting <domain> from the checkpoint XML is safe in general. Even if apps think it is safe now and omit it, a future version of the app might change in a way that makes omitting the <domain> unsafe. If we didn't historically record the <domain> in the checkpoint in the first place, then the new version of the app is potentially in trouble. So I think it is good that we are strict and mandate the <domain> XML even if it is not technically required in some use cases. 
> > Note that vdsm may be killed in the middle of the redefine loop, and
> > in this case we leave libvirt with partial info about checkpoints,
> > and we need to redefine the checkpoints again, handling this partial
> > state.
>
> But that's relatively easy - if you don't know whether libvirt might
> have partial data, then wipe the data and start the redefine loop from
> scratch.

Of course the same failure scenario applies if libvirt is doing it via
a bulk operation. The redefine loop still exists, just inside libvirt
instead, which might be killed or die part way through. So you're not
really fixing a failure scenario, just moving the failure to a
different piece. That's no net win.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org       -o-          https://fstop138.berrange.com :|
|: https://entangle-photo.org -o-   https://www.instagram.com/dberrange :|
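[The wipe-and-retry recovery Eric suggests can be sketched in a few
lines. This is a hypothetical model, not vdsm or libvirt code: `store`
stands in for the destination host's checkpoint metadata as a
name-to-parent dict, and the loop replays the definitions in
parent-before-child (topological) order.]

```python
def redefine_checkpoints(store, desired):
    """Idempotent redefine loop: if a previous run was killed part way
    through and left partial metadata behind, wipe it and replay the
    whole sequence from scratch.

    store   -- hypothetical destination metadata, name -> parent name
    desired -- the complete set of checkpoints to (re)define
    """
    if store and set(store) != set(desired):
        store.clear()  # partial state detected: restart from scratch
    remaining = {n: p for n, p in desired.items() if n not in store}
    while remaining:
        # a checkpoint is ready once its parent (if any) is defined
        ready = [n for n, p in remaining.items() if p is None or p in store]
        if not ready:
            raise ValueError("broken or cyclic parent chain")
        for n in ready:
            store[n] = remaining.pop(n)
```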
On 3/12/19 4:52 PM, Eric Blake wrote:
>> vm_checkpoints table
>>
>> - checkpoint_id: UUID
>> - parent: UUID
>> - vm_id: UUID
>> - creation_date: TIMESTAMP
>>
>> vm_checkpoint_disk_map table
>> - disk_id: UUID
>> - checkpoint_id: UUID
>
> And no state of the <domain> at the time the checkpoint was created?

I meant to add:

once we start playing with checkpoints vs. external snapshots vs.
hotplug in anger, then knowing WHICH disks existed at the time of a
given checkpoint may prove invaluable to even figure out WHICH disks
need to participate in a given backup operation. The initial
implementation uses XML (so that adding features can hopefully reuse
the same API, rather than needing yet more one-off API additions), even
if it is currently limited to not playing nicely with snapshots.
On Wed, Mar 13, 2019 at 12:01 AM Eric Blake <eblake@redhat.com> wrote:
> On 3/12/19 4:52 PM, Eric Blake wrote:
>
> >> vm_checkpoints table
> >>
> >> - checkpoint_id: UUID
> >> - parent: UUID
> >> - vm_id: UUID
> >> - creation_date: TIMESTAMP
> >>
> >> vm_checkpoint_disk_map table
> >> - disk_id: UUID
> >> - checkpoint_id: UUID
> >
> > And no state of the <domain> at the time the checkpoint was created?
>
> I meant to add:
>
> once we start playing with checkpoints vs. external snapshots vs.
> hotplug in anger, then knowing WHICH disks existed at the time of a
> given checkpoint may prove invaluable to even figure out WHICH disks
> need to participate in a given backup operation. The initial
> implementation uses XML (so that adding features can hopefully reuse
> the same API, rather than needing yet more one-off API additions),
> even if it is currently limited to not playing nicely with snapshots.

When you plug a disk, you cannot use any of the bitmaps on the disk
because you don't have any guarantee that the user did not attach the
disk to another machine that modified the data in some way without
updating the bitmaps. So we are going to delete a disk from checkpoints
once you unplug it, and start from scratch when you plug a disk.

I think what we keep is enough to tell libvirt which bitmaps are needed
for a backup.

We probably need to keep the vm configuration per checkpoint, or at
least ensure consistency, so that once you have started a backup the
configuration cannot change until the backup ends.
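[The unplug policy Nir describes - drop an unplugged disk from every
recorded checkpoint, because its bitmaps can no longer be trusted once
the disk leaves the VM - can be sketched as below. The record layout (a
list of dicts carrying a "disks" set) is illustrative, not vdsm's real
schema.]

```python
def unplug_disk(checkpoints, disk_id):
    """Forget an unplugged disk in every recorded checkpoint.

    Once the disk has left the VM, another machine may have written to
    it without updating its bitmaps, so its incremental history is no
    longer trustworthy; a later re-plug starts from scratch (full
    backup). `checkpoints` is a hypothetical oldest-first list of
    {"name": ..., "disks": set_of_disk_ids} records.
    """
    for cp in checkpoints:
        cp["disks"].discard(disk_id)
```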
On 3/12/19 6:45 PM, Nir Soffer wrote:
> When you plug a disk, you cannot use any of the bitmaps on the disk
> because you don't have any guarantee that the user did not attach the
> disk to another machine that modified the data in some way without
> updating the bitmaps. So we are going to delete a disk from
> checkpoints once you unplug it, and start from scratch when you plug
> a disk.

Correct - from libvirt's point of view, when a disk is hotplugged, the
next backup needs to perform a full backup rather than an incremental
backup on that disk, because that disk was not present in the machine
at the last checkpoint and therefore does not have bitmap coverage from
the last checkpoint to the present moment. But you don't necessarily
have to delete bitmaps from the qcow2 file to get this effect. The fact
that libvirt stored a full <domain> at the time of the checkpoint is
enough for libvirt to tell that you are requesting a backup of a disk
that did not exist at the starting checkpoint, and therefore that disk
needs a full rather than an incremental backup.

> I think what we keep is enough to tell libvirt which bitmaps are
> needed for a backup.

The current design for virDomainBackup() is that you state which
checkpoint to start from, and then libvirt computes all bitmaps from
that checkpoint to the present in order to perform the incremental
backup, or complains that there is no sequence of bitmaps covering that
length of time, in which case you have to do a full backup instead. You
do not tell libvirt 'perform a backup of the clusters represented in
bitmaps B, C, and D', but rather 'perform a backup of the clusters
modified since time T2'. Libvirt then crawls the chain of checkpoint
objects to find T2 and work out which bitmaps between T2 and the
present (say, bitmaps B, C, and D) it needs to use to perform that
differential backup.
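[The chain crawl Eric describes - name a starting checkpoint, walk to
the present collecting bitmaps, and fall back to a full backup for any
disk without coverage over the whole span - can be modelled like this.
The data layout is hypothetical, not libvirt's internal
representation: `checkpoints` is oldest-first, and each entry carries a
bitmap name plus the set of disks it covered.]

```python
def plan_backup(checkpoints, since, disks):
    """Decide full vs. incremental per disk for a backup starting at
    checkpoint `since`.

    A disk present in every checkpoint from `since` to the present has
    an unbroken bitmap chain and can be backed up incrementally; a disk
    missing anywhere in that span (e.g. hotplugged mid-chain) needs a
    full backup.
    """
    names = [c["name"] for c in checkpoints]
    if since not in names:
        raise ValueError("unknown checkpoint: %s" % since)
    chain = checkpoints[names.index(since):]  # `since` to present
    plan = {}
    for disk in disks:
        if all(disk in c["disks"] for c in chain):
            plan[disk] = ("incremental", [c["bitmap"] for c in chain])
        else:
            plan[disk] = ("full", [])
    return plan
```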
> We probably need to keep the vm configuration per checkpoint or at
> least ensure consistency, so once you started a backup the
> configuration cannot change until the backup ends.

And that configuration per checkpoint IS the <domain> subelement.