Hi all.

There are three QMP commands needed to implement an external backup API.

Using these three commands, a client can do all of the needed bitmap
management by hand:

on backup start we need to do a transaction:
  {disable old bitmap, create new bitmap}

on backup success:
  drop old bitmap

on backup fail:
  enable old bitmap
  merge new bitmap to old bitmap
  drop new bitmap

Question: it may be better to make one command instead of two:
  block-dirty-bitmap-set-enabled(bool enabled)

Vladimir Sementsov-Ogievskiy (4):
  block/dirty-bitmap: add lock to bdrv_enable/disable_dirty_bitmap
  qapi: add block-dirty-bitmap-enable/disable
  qmp: transaction support for block-dirty-bitmap-enable/disable
  qapi: add block-dirty-bitmap-merge

 qapi/block-core.json         |  80 +++++++++++++++++++++++
 qapi/transaction.json        |   4 ++
 include/block/dirty-bitmap.h |   2 +
 block/dirty-bitmap.c         |  21 ++++++
 blockdev.c                   | 151 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 258 insertions(+)

--
2.11.1
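[Editor's note: the enable/disable/merge rollback logic described above can be modeled with plain Python sets standing in for dirty bitmaps. This is a toy model of the proposed semantics, not QEMU code; the comments name the proposed QMP commands each step corresponds to.]

```python
# Toy model of the proposed bitmap lifecycle: sets of dirty cluster
# indices stand in for dirty bitmaps. An "enabled" bitmap records
# guest writes; a disabled one is frozen.

class Bitmap:
    def __init__(self, enabled=True):
        self.enabled = enabled
        self.dirty = set()

    def mark(self, cluster):
        if self.enabled:
            self.dirty.add(cluster)

    def merge_from(self, other):
        # proposed block-dirty-bitmap-merge: OR the other bitmap in
        self.dirty |= other.dirty

# on backup start (one transaction): disable old bitmap, create new one
bitmap0 = Bitmap()
bitmap0.dirty = {1, 5, 9}         # changes accumulated before backup
bitmap0.enabled = False           # proposed block-dirty-bitmap-disable
bitmap1 = Bitmap()                # existing block-dirty-bitmap-add

# the guest keeps writing during the backup
for c in (2, 5):
    bitmap0.mark(c)               # no-op: bitmap0 is frozen
    bitmap1.mark(c)               # recorded in the new bitmap

# on backup failure: enable old, merge new into old, drop new
bitmap0.enabled = True            # proposed block-dirty-bitmap-enable
bitmap0.merge_from(bitmap1)       # proposed block-dirty-bitmap-merge
del bitmap1                       # existing block-dirty-bitmap-remove

assert bitmap0.dirty == {1, 2, 5, 9}   # no writes are lost on rollback
```

The key invariant is the last assertion: after a failed backup, the merged bitmap covers both the pre-backup dirty clusters and everything written while the backup ran.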
On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all.
>
> There are three qmp commands, needed to implement external backup API.
>
> Using these three commands, client may do all needed bitmap management by
> hand:
>
> on backup start we need to do a transaction:
> {disable old bitmap, create new bitmap}
>
> on backup success:
> drop old bitmap
>
> on backup fail:
> enable old bitmap
> merge new bitmap to old bitmap
> drop new bitmap
>

Can you give me an example of how you expect these commands to be used,
and why they're required?

I'm a little wary about how error-prone these commands might be, and the
potential for incorrect usage seems... high. Why do we require them?
16.11.2017 00:20, John Snow wrote:
>
> On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all.
>>
>> There are three qmp commands, needed to implement external backup API.
>>
>> Using these three commands, client may do all needed bitmap management by
>> hand:
>>
>> on backup start we need to do a transaction:
>> {disable old bitmap, create new bitmap}
>>
>> on backup success:
>> drop old bitmap
>>
>> on backup fail:
>> enable old bitmap
>> merge new bitmap to old bitmap
>> drop new bitmap
>>
> Can you give me an example of how you expect these commands to be used,
> and why they're required?
>
> I'm a little weary about how error-prone these commands might be and the
> potential for incorrect usage seems... high. Why do we require them?

They are needed for incremental backup. Exporting abdicate/reclaim
functionality looks like a bad idea; it is simpler and clearer to let
the user merge/enable/disable bitmaps by hand.

Usage is like this:

1. we have dirty bitmap bitmap0 for incremental backup

2. prepare image fleecing (create a temporary image with backing=our_disk)

3. in a qmp transaction:
   - disable bitmap0
   - create bitmap1
   - start image fleecing (backup sync=none our_disk -> temp_disk)

4. export temp_disk (it looks like a snapshot) and bitmap0 through NBD

=== the external tool can now read bitmap0 and copy the corresponding
clusters through NBD ===

On successful backup the external tool can drop bitmap0, or save it and
reuse it somehow.

On failed backup the external tool can do the following:
  - enable bitmap0
  - merge bitmap1 into bitmap0
  - drop bitmap1

--
Best regards,
Vladimir
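[Editor's note: the transaction in step 3 above might look like the following QMP request, built here as a Python dict. block-dirty-bitmap-disable is the command proposed in this series; block-dirty-bitmap-add and blockdev-backup already exist. The node and target names are made up for the example, and the exact action argument names are an assumption.]

```python
import json

# Step 3 of the fleecing sequence as one QMP "transaction" request.
# Illustrative sketch: "drive0"/"temp_disk" and the data argument
# names are assumptions, not copied from the series.
transaction = {
    "execute": "transaction",
    "arguments": {
        "actions": [
            # freeze the old bitmap (proposed in this series)
            {"type": "block-dirty-bitmap-disable",
             "data": {"node": "drive0", "name": "bitmap0"}},
            # start recording new writes in a fresh bitmap
            {"type": "block-dirty-bitmap-add",
             "data": {"node": "drive0", "name": "bitmap1"}},
            # start fleecing: copy-before-write into the temp image
            {"type": "blockdev-backup",
             "data": {"device": "drive0", "target": "temp_disk",
                      "sync": "none"}},
        ]
    },
}
print(json.dumps(transaction, indent=2))
```

Because all three actions are in one transaction, there is no window where writes land in neither bitmap0 nor bitmap1.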
On 11/16/2017 03:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> 16.11.2017 00:20, John Snow wrote:
>>
>> On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi all.
>>>
>>> There are three qmp commands, needed to implement external backup API.
>>>
>>> Using these three commands, client may do all needed bitmap
>>> management by hand:
>>>
>>> on backup start we need to do a transaction:
>>> {disable old bitmap, create new bitmap}
>>>
>>> on backup success:
>>> drop old bitmap
>>>
>>> on backup fail:
>>> enable old bitmap
>>> merge new bitmap to old bitmap
>>> drop new bitmap
>>>
>> Can you give me an example of how you expect these commands to be used,
>> and why they're required?
>>
>> I'm a little weary about how error-prone these commands might be and the
>> potential for incorrect usage seems... high. Why do we require them?
>
> It is needed for incremental backup. It looks like bad idea to export
> abdicate/reclaim functionality, it is simpler
> and clearer to allow user to merge/enable/disable bitmaps by hand.
>
> usage is like this:
>
> 1. we have dirty bitmap bitmap0 for incremental backup.
>
> 2. prepare image fleecing (create temporary image with backing=our_disk)
> 3. in qmp transaction:
>     - disable bitmap0
>     - create bitmap1
>     - start image fleecing (backup sync=none our_disk -> temp_disk)

This could probably just be its own command, though:

block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera

Could handle forking the bitmap. I'm not sure what the arguments would
look like, but we could name the NBD export here, too. (Assuming the
server is already started and we just need to create the share.)

Then, we can basically do what mirror does:

(1) Cancel
(2) Complete

Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back),
and Complete would instruct QEMU to discard the changes.

This way we don't need to expose commands like split or merge that will
almost always be dangerous to use over QMP.
In fact, a fleecing job would be really convenient even without a
bitmap, because it'd still be nice to have a convenience command for it.
Using an existing infrastructure and understood paradigm is just a bonus.

> 4. export temp_disk (it looks like a snapshot) and bitmap0 through NBD
>
> === external tool can get bitmap0 and then copy some clusters through
> NBD ===
>
> on successful backup external tool can drop bitmap0. or can save it and
> reuse somehow
>
> on failed backup external tool can do the following:
> - enable bitmap0
> - merge bitmap1 to bitmap0
> - drop bitmap1
>
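[Editor's note: the Cancel/Complete semantics John sketches above can be modeled as follows. block-job-fleece is hypothetical — it does not exist in QEMU — and this is only a toy model of the proposed job lifecycle.]

```python
# Toy model of the hypothetical block-job-fleece lifecycle: the job
# forks the bitmap internally; "cancel" keeps the recorded changes by
# merging them back (backup failed), "complete" discards them
# (backup succeeded). Sets stand in for dirty bitmaps.

class FleeceJob:
    def __init__(self, bitmap):
        self.old = bitmap        # frozen bitmap handed to the backup tool
        self.fork = set()        # records guest writes during the job

    def write(self, cluster):
        self.fork.add(cluster)

    def cancel(self):
        # roll back: merge the fork into the old bitmap so the next
        # incremental backup still covers everything
        self.old |= self.fork
        return self.old

    def complete(self):
        # success: the backup consumed self.old; nothing remains dirty
        return set()

job = FleeceJob({1, 5})
job.write(2)
assert job.cancel() == {1, 2, 5}   # failed backup loses no writes
```

The point of the model: the fork/merge bookkeeping lives inside the job, so management software never has to issue split or merge commands itself.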
17.11.2017 06:10, John Snow wrote:
>
> On 11/16/2017 03:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 16.11.2017 00:20, John Snow wrote:
>>> On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Hi all.
>>>>
>>>> There are three qmp commands, needed to implement external backup API.
>>>>
>>>> Using these three commands, client may do all needed bitmap
>>>> management by hand:
>>>>
>>>> on backup start we need to do a transaction:
>>>> {disable old bitmap, create new bitmap}
>>>>
>>>> on backup success:
>>>> drop old bitmap
>>>>
>>>> on backup fail:
>>>> enable old bitmap
>>>> merge new bitmap to old bitmap
>>>> drop new bitmap
>>>>
>>> Can you give me an example of how you expect these commands to be used,
>>> and why they're required?
>>>
>>> I'm a little weary about how error-prone these commands might be and the
>>> potential for incorrect usage seems... high. Why do we require them?
>> It is needed for incremental backup. It looks like bad idea to export
>> abdicate/reclaim functionality, it is simpler
>> and clearer to allow user to merge/enable/disable bitmaps by hand.
>>
>> usage is like this:
>>
>> 1. we have dirty bitmap bitmap0 for incremental backup.
>>
>> 2. prepare image fleecing (create temporary image with backing=our_disk)
>> 3. in qmp transaction:
>>     - disable bitmap0
>>     - create bitmap1
>>     - start image fleecing (backup sync=none our_disk -> temp_disk)
> This could probably just be its own command, though:
>
> block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera
>
> Could handle forking the bitmap. I'm not sure what the arguments would
> look like, but we could name the NBD export here, too. (Assuming the
> server is already started and we just need to create the share.)
>
> Then, we can basically do what mirror does:
>
> (1) Cancel
> (2) Complete
>
> Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back),
> and Complete would instruct QEMU to discard the changes.
>
> This way we don't need to expose commands like split or merge that will
> almost always be dangerous to use over QMP.
>
> In fact, a fleecing job would be really convenient even without a
> bitmap, because it'd still be nice to have a convenience command for it.
> Using an existing infrastructure and understood paradigm is just a bonus.

1. If I understand correctly, Kevin and Max said in their report in
Prague about a new block-job approach using filter nodes, so I'm not
sure it is a good idea to introduce a new old-style block job now where
we can do without it.

2. There is the following scenario: a customer needs the ability to
create a backup of the data changed since some point in time. So,
maintaining several static and one (or several) active bitmaps, with the
possibility of merging some of them and creating a backup using the
merged bitmap, may be convenient.

3. And I don't see any danger here: we can already create, delete and
clear dirty bitmaps through qmp. Some additional operations should not
be bad.

4. Interesting: the "disabled" field looks like it is never used..

>
>> 4. export temp_disk (it looks like a snapshot) and bitmap0 through NBD
>>
>> === external tool can get bitmap0 and then copy some clusters through
>> NBD ===
>>
>> on successful backup external tool can drop bitmap0. or can save it and
>> reuse somehow
>>
>> on failed backup external tool can do the following:
>> - enable bitmap0
>> - merge bitmap1 to bitmap0
>> - drop bitmap1
>>

--
Best regards,
Vladimir
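[Editor's note: point 2 above — backing up everything changed since a chosen point in time — reduces to merging the frozen per-checkpoint bitmaps with the active one. A minimal sketch, with sets standing in for bitmaps and made-up cluster numbers:]

```python
# One frozen bitmap per past checkpoint, plus one active bitmap that
# is currently recording writes. Toy model with sets of cluster
# indices; the checkpoint layout is invented for the example.
frozen = [{1, 2}, {4}, {7, 8}]   # oldest checkpoint first
active = {3}                     # still recording new writes

def dirty_since(checkpoint_index):
    """Union of all bitmaps from the chosen checkpoint to now --
    exactly what a differential backup starting there must copy."""
    result = set()
    for b in frozen[checkpoint_index:]:
        result |= b              # the proposed merge operation
    return result | active

assert dirty_since(2) == {3, 7, 8}
assert dirty_since(0) == {1, 2, 3, 4, 7, 8}
```

This is why a merge primitive is attractive here: the client picks the starting checkpoint and the union over the newer bitmaps is all it needs.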
On 11/17/2017 03:22 AM, Vladimir Sementsov-Ogievskiy wrote:
> 17.11.2017 06:10, John Snow wrote:
>>
>> On 11/16/2017 03:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> 16.11.2017 00:20, John Snow wrote:
>>>> On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Hi all.
>>>>>
>>>>> There are three qmp commands, needed to implement external backup API.
>>>>>
>>>>> Using these three commands, client may do all needed bitmap
>>>>> management by hand:
>>>>>
>>>>> on backup start we need to do a transaction:
>>>>> {disable old bitmap, create new bitmap}
>>>>>
>>>>> on backup success:
>>>>> drop old bitmap
>>>>>
>>>>> on backup fail:
>>>>> enable old bitmap
>>>>> merge new bitmap to old bitmap
>>>>> drop new bitmap
>>>>>
>>>> Can you give me an example of how you expect these commands to be used,
>>>> and why they're required?
>>>>
>>>> I'm a little weary about how error-prone these commands might be and the
>>>> potential for incorrect usage seems... high. Why do we require them?
>>> It is needed for incremental backup. It looks like bad idea to export
>>> abdicate/reclaim functionality, it is simpler
>>> and clearer to allow user to merge/enable/disable bitmaps by hand.
>>>
>>> usage is like this:
>>>
>>> 1. we have dirty bitmap bitmap0 for incremental backup.
>>>
>>> 2. prepare image fleecing (create temporary image with backing=our_disk)
>>> 3. in qmp transaction:
>>>     - disable bitmap0
>>>     - create bitmap1
>>>     - start image fleecing (backup sync=none our_disk -> temp_disk)
>> This could probably just be its own command, though:
>>
>> block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera
>>
>> Could handle forking the bitmap. I'm not sure what the arguments would
>> look like, but we could name the NBD export here, too. (Assuming the
>> server is already started and we just need to create the share.)
>>
>> Then, we can basically do what mirror does:
>>
>> (1) Cancel
>> (2) Complete
>>
>> Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back),
>> and Complete would instruct QEMU to discard the changes.
>>
>> This way we don't need to expose commands like split or merge that will
>> almost always be dangerous to use over QMP.
>>
>> In fact, a fleecing job would be really convenient even without a
>> bitmap, because it'd still be nice to have a convenience command for it.
>> Using an existing infrastructure and understood paradigm is just a bonus.
>
> 1. If I understand correctly, Kevin and Max said in their report in
> Prague about new block-job approach,
> using filter nodes, so I'm not sure that this is a good Idea to
> introduce now new old-style block-job, where we can
> do without it.
>

We could do without it, but it might be a lot better to have everything
wrapped up in a command that's easy to digest instead of releasing 10
smaller commands that have to be executed in a very specific way in
order to work correctly.

I'm thinking about the complexity of error checking here with all the
smaller commands, versus error checking on a larger workflow we
understand and can quality test better.

I'm not sure that filter nodes becoming the new normal for block jobs
precludes our ability to use the job-management API as a handle for
managing the lifetime of a long-running task like fleecing, but I'll
check with Max and Kevin about this.

> 2. there is the following scenario: customers needs a possibility to
> create a backup of data changed since some
> point in time. So, maintaining several static and one (or several) activ
> bitmaps with a possiblity of merge some of them
> and create a backup using this merged bitmap may be convenient.
>

I think the ability to copy bitmaps and issue differential backups would
be sufficient in all cases I could think of...

> 3. And I don't see any danger here: we already can create, delete and
> clear dirty bitmaps through qmp. Some additional
> operations should not be bad.
>

Not bad, just risk of misuse for little benefit, IMO. I'm wary of
raising the external complexity too much. I'd like to keep the API as
simple as I can.

> 4. Interesting: "disabled" field looks like never used..

Ah! That's true. It was in an early draft and Stefan noticed that there
was no reason to let users disable or enable bitmaps in the first
version of the feature set, so those QMP commands (along with copy) got
removed to keep the feature set smaller. The mechanisms that paid
attention to a bitmap being enabled or not remained behind, though.

>
>>
>>> 4. export temp_disk (it looks like a snapshot) and bitmap0 through NBD
>>>
>>> === external tool can get bitmap0 and then copy some clusters through
>>> NBD ===
>>>
>>> on successful backup external tool can drop bitmap0. or can save it and
>>> reuse somehow
>>>
>>> on failed backup external tool can do the following:
>>> - enable bitmap0
>>> - merge bitmap1 to bitmap0
>>> - drop bitmap1
>>>
>
>
> --
> Best regards,
> Vladimir
>
Am 17.11.2017 um 22:35 hat John Snow geschrieben:
> >>> usage is like this:
> >>>
> >>> 1. we have dirty bitmap bitmap0 for incremental backup.
> >>>
> >>> 2. prepare image fleecing (create temporary image with backing=our_disk)
> >>> 3. in qmp transaction:
> >>>     - disable bitmap0
> >>>     - create bitmap1
> >>>     - start image fleecing (backup sync=none our_disk -> temp_disk)
> >> This could probably just be its own command, though:
> >>
> >> block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera
> >>
> >> Could handle forking the bitmap. I'm not sure what the arguments would
> >> look like, but we could name the NBD export here, too. (Assuming the
> >> server is already started and we just need to create the share.)
> >>
> >> Then, we can basically do what mirror does:
> >>
> >> (1) Cancel
> >> (2) Complete
> >>
> >> Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back),
> >> and Complete would instruct QEMU to discard the changes.
> >>
> >> This way we don't need to expose commands like split or merge that will
> >> almost always be dangerous to use over QMP.
> >>
> >> In fact, a fleecing job would be really convenient even without a
> >> bitmap, because it'd still be nice to have a convenience command for it.
> >> Using an existing infrastructure and understood paradigm is just a bonus.
> >
> > 1. If I understand correctly, Kevin and Max said in their report in
> > Prague about new block-job approach,
> > using filter nodes, so I'm not sure that this is a good Idea to
> > introduce now new old-style block-job, where we can
> > do without it.
>
> We could do without it, but it might be a lot better to have everything
> wrapped up in a command that's easy to digest instead of releasing 10
> smaller commands that have to be executed in a very specific way in
> order to work correctly.
>
> I'm thinking about the complexity of error checking here with all the
> smaller commands, versus error checking on a larger workflow we
> understand and can quality test better.
>
> I'm not sure that filter nodes becoming the new normal for block jobs
> precludes our ability to use the job-management API as a handle for
> managing the lifetime of a long-running task like fleecing, but I'll
> check with Max and Kevin about this.

We have a general tendency, at least in the block layer but in fact I
think in qemu in general, to switch from exposing high-level
functionality to exposing lower-level operations via QAPI.

If we expose high-level commands, then every new use case will require a
change in both qemu and libvirt. With low-level commands, often libvirt
already has all of the required tools to implement what it needs.

So I do see good reasons for exposing low-level commands.

On another note, I would double-check before adding a new block job type
that this is the right way to go. We have recently noticed that many, if
not all, of the existing block jobs (at least mirror, commit and backup)
are so similar that they implement the same things multiple times, each
just lacking different options and having different bugs. I'm seriously
considering merging all of them into a single job type internally that
just provides options that effectively turn it into one of the existing
job types.

So even if we want to tie the bitmap management to a block job, we
should consider adding it as an option to the existing backup job rather
than adding a completely new fourth job type that again does almost the
same except for some bitmap management stuff on completion.

Kevin
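[Editor's note: Kevin's "single internal job type" idea can be caricatured as one option struct whose settings select today's job behaviors. Purely illustrative: none of these names or options exist in QEMU.]

```python
# Illustrative sketch of one copy job whose options select the
# behavior that today is spread over mirror, commit and backup.
# All names here are invented for the illustration.
from dataclasses import dataclass

@dataclass
class CopyJobOptions:
    source: str
    target: str
    sync: str = "full"                 # full / top / none / incremental
    on_source_write: str = "ignore"    # "copy-before-write" => backup-like
    complete_by: str = "drain"         # "switch" => mirror-like pivot

def job_kind(opts):
    """Classify which of today's jobs a given option set resembles."""
    if opts.on_source_write == "copy-before-write":
        return "backup"
    if opts.complete_by == "switch":
        return "mirror"
    return "stream/commit-like"

assert job_kind(CopyJobOptions("disk0", "tmp",
                               on_source_write="copy-before-write")) == "backup"
```

The classification function is the point: the differences between the existing jobs largely reduce to a handful of such options, which is Kevin's argument for merging them.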
On 11/21/2017 12:23 PM, Kevin Wolf wrote:
> Am 17.11.2017 um 22:35 hat John Snow geschrieben:
>>>>> usage is like this:
>>>>>
>>>>> 1. we have dirty bitmap bitmap0 for incremental backup.
>>>>>
>>>>> 2. prepare image fleecing (create temporary image with backing=our_disk)
>>>>> 3. in qmp transaction:
>>>>>     - disable bitmap0
>>>>>     - create bitmap1
>>>>>     - start image fleecing (backup sync=none our_disk -> temp_disk)
>>>> This could probably just be its own command, though:
>>>>
>>>> block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera
>>>>
>>>> Could handle forking the bitmap. I'm not sure what the arguments would
>>>> look like, but we could name the NBD export here, too. (Assuming the
>>>> server is already started and we just need to create the share.)
>>>>
>>>> Then, we can basically do what mirror does:
>>>>
>>>> (1) Cancel
>>>> (2) Complete
>>>>
>>>> Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back),
>>>> and Complete would instruct QEMU to discard the changes.
>>>>
>>>> This way we don't need to expose commands like split or merge that will
>>>> almost always be dangerous to use over QMP.
>>>>
>>>> In fact, a fleecing job would be really convenient even without a
>>>> bitmap, because it'd still be nice to have a convenience command for it.
>>>> Using an existing infrastructure and understood paradigm is just a bonus.
>>>
>>> 1. If I understand correctly, Kevin and Max said in their report in
>>> Prague about new block-job approach,
>>> using filter nodes, so I'm not sure that this is a good Idea to
>>> introduce now new old-style block-job, where we can
>>> do without it.
>>
>> We could do without it, but it might be a lot better to have everything
>> wrapped up in a command that's easy to digest instead of releasing 10
>> smaller commands that have to be executed in a very specific way in
>> order to work correctly.
>>
>> I'm thinking about the complexity of error checking here with all the
>> smaller commands, versus error checking on a larger workflow we
>> understand and can quality test better.
>>
>> I'm not sure that filter nodes becoming the new normal for block jobs
>> precludes our ability to use the job-management API as a handle for
>> managing the lifetime of a long-running task like fleecing, but I'll
>> check with Max and Kevin about this.
>
> We have a general tendency at least in the block layer, but in fact I
> think in qemu in general, to switch from exposing high-level
> functionality to exposing lower-level operations via QAPI.
>

I am aware of that, yeah. I worry about going too far to the other
extreme in some cases. Even at the API level, where we don't care about
the feelings of, or the ease of use by, a robot, if a certain action
requires several API commands to be issued in a very specific order,
that increases our test matrix and the complexity of the management API.

There's a middle ground, I think.

"Fleecing" is one of those cases where we can already fleece today with
component commands, but a composite command that encapsulates that
functionality would be helpful.

In this case, I worry about adding low-level commands for bitmaps that
will almost always be incorrect to use except in conjunction with other
commands -- and even then generally only useful when issued via
transaction specifically.

(You might be able to make the case to me that we should add these
commands but ONLY as transaction primitives, foregoing their traditional
standalone QMP command counterparts.)

If I capitulate and let the more targeted primitives into QEMU instead
of an encompassing job, it means a higher number of QMP commands
overall, more tests, and more interfaces to test and maintain.

Maybe I am being wrong-headed, but I actually think a new job would give
us less to maintain, test and verify than several new primitives would.
Considering that these primitives will in general only be used in
transactions with other commands anyway, that only strengthens the case
that the right paradigm here is a new job, not more transaction
primitives.

...maybe. I won't draw a line in the sand, but it's an approach I would
like you to consider.

> If we expose high-level commands, then every new use case will require a
> change in both qemu and libvirt. With low-level commands, often libvirt
> already has all of the required tools to implement what it needs.
>

I am receptive to how "big" commands often need to change frequently,
though. Primitives certainly have a purity of design about them that
larger job commands do not possess.

> So I do see good reasons for exposing low-level commands.
>
>
> On another note, I would double check before adding a new block job type
> that this is the right way to go. We have recently noticed that many, if
> not all, of the existing block jobs (at least mirror, commit and backup)
> are so similar that they implement the same things multiple times and
> are just lacking different options and have different bugs. I'm
> seriously considering merging all of them into a single job type
> internally that just provides options that effectively turn it into one
> of the existing job types.
>

I'm not particularly opposed. At the very, very least "backup" and
"mirror" are pretty much the same thing, and "stream" and "commit" are
basically the same.

Forcing the backuppy-job and the consolidatey-job together seems like an
ever-so-slightly harder case to make, but I suppose the truth of the
matter in all cases is that we're copying data from one node to another...

> So even if we want to tie the bitmap management to a block job, we
> should consider adding it as an option to the existing backup job rather
> than adding a completely new fourth job type that again does almost the
> same except for some bitmap mangement stuff on completion.
>

...except here, where fleecing does not necessarily copy data in the
same way.

(It probably could re-use the copy-on-write notifiers that will be
replaced by filter nodes, but I don't see it reusing much else.)

We could try it before I naysay it, but where fleecing is concerned
we're not using QEMU to move any bits around. It does feel pretty
categorically different from the first four jobs.

I wouldn't want to see the franken-job be drowned in conditional
branches for 5,000 options, either. Eliminating some redundancy is good,
but asserting that all existing jobs (and this possible new one too)
should all be the same makes me worry that the resulting code would be
too complex to work with.

...Maybe? Try it!

> Kevin
>
22.11.2017 03:10, John Snow wrote:
>
> On 11/21/2017 12:23 PM, Kevin Wolf wrote:
>> Am 17.11.2017 um 22:35 hat John Snow geschrieben:
>>>>>> usage is like this:
>>>>>>
>>>>>> 1. we have dirty bitmap bitmap0 for incremental backup.
>>>>>>
>>>>>> 2. prepare image fleecing (create temporary image with backing=our_disk)
>>>>>> 3. in qmp transaction:
>>>>>>     - disable bitmap0
>>>>>>     - create bitmap1
>>>>>>     - start image fleecing (backup sync=none our_disk -> temp_disk)
>>>>> This could probably just be its own command, though:
>>>>>
>>>>> block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera
>>>>>
>>>>> Could handle forking the bitmap. I'm not sure what the arguments would
>>>>> look like, but we could name the NBD export here, too. (Assuming the
>>>>> server is already started and we just need to create the share.)
>>>>>
>>>>> Then, we can basically do what mirror does:
>>>>>
>>>>> (1) Cancel
>>>>> (2) Complete
>>>>>
>>>>> Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back),
>>>>> and Complete would instruct QEMU to discard the changes.
>>>>>
>>>>> This way we don't need to expose commands like split or merge that will
>>>>> almost always be dangerous to use over QMP.
>>>>>
>>>>> In fact, a fleecing job would be really convenient even without a
>>>>> bitmap, because it'd still be nice to have a convenience command for it.
>>>>> Using an existing infrastructure and understood paradigm is just a bonus.
>>>> 1. If I understand correctly, Kevin and Max said in their report in
>>>> Prague about new block-job approach,
>>>> using filter nodes, so I'm not sure that this is a good Idea to
>>>> introduce now new old-style block-job, where we can
>>>> do without it.
>>> We could do without it, but it might be a lot better to have everything
>>> wrapped up in a command that's easy to digest instead of releasing 10
>>> smaller commands that have to be executed in a very specific way in
>>> order to work correctly.
>>>
>>> I'm thinking about the complexity of error checking here with all the
>>> smaller commands, versus error checking on a larger workflow we
>>> understand and can quality test better.
>>>
>>> I'm not sure that filter nodes becoming the new normal for block jobs
>>> precludes our ability to use the job-management API as a handle for
>>> managing the lifetime of a long-running task like fleecing, but I'll
>>> check with Max and Kevin about this.
>> We have a general tendency at least in the block layer, but in fact I
>> think in qemu in general, to switch from exposing high-level
>> functionality to exposing lower-level operations via QAPI.
>>
> I am aware of that, yeah. I worry about going too far to the other
> extreme in some cases. Even at the API level where we don't care about
> the feelings of, or the ease-of-use by a robot, if a certain action
> requires several API commands to be issued in a very specific order,
> that increases our test matrix and it increases the complexity in the
> management API.
>
> There's a middle ground, I think.
>
> "Fleecing" is one of those cases where we can already fleece today with
> component commands, but a composite command that encapsulates that
> functionality would be helpful.

Additionally, a special filter node is needed to synchronize reads from
the fleecing node with COW requests in backup, like in block/replication.

>
> In this case, I worry about adding low-level commands for bitmaps that
> will almost always be incorrect to use except in conjunction with other
> commands -- and even then generally only useful when issued via
> transaction specifically.
>
> (You might be able to make the case to me that we should add these
> commands but ONLY as transaction primitives, foregoing their traditional
> standalone QMP command counterparts.)

Merge is OK without a transaction. A transaction is needed only to start
a new bitmap and initiate fleecing at the same time, and I do not see
anything difficult in that. Backup-related stuff uses transactions
anyway, as we need to back up several disks at once. However, if it is
OK to implement these commands only as transaction actions, we can go
that way.

>
> If I capitulate and let the more targeted primitives into QEMU instead
> of an encompassing job, it means a higher number of QMP commands
> overall, more tests, and more interfaces to test and maintain.

I'm not sure that implementing, testing and maintaining a fleecing job
would be simpler than doing the same for these three simple commands.
The number of QMP commands is not equal to the overall interface
complexity.

>
> Maybe I am being wrong-headed, but I actually think a new job would
> actually give us less to maintain, test and verify than several new
> primitives would, especially when considering that these primitives will
> in general only be used by transactions with other commands anyway, it
> increases the evidence that the right paradigm here is a new job, not
> more transaction primitives.
>
> ...maybe. I won't draw a line in the sand, but it's an approach I would
> like you to consider.
>
>> If we expose high-level commands, then every new use case will require a
>> change in both qemu and libvirt. With low-level commands, often libvirt
>> already has all of the required tools to implement what it needs.
>>
> I am receptive to how "big" commands often need to change frequently,
> though. Primitives certainly have a purity of design about them that
> larger job commands do not possess.
>
>> So I do see good reasons for exposing low-level commands.
>>
>>
>> On another note, I would double check before adding a new block job type
>> that this is the right way to go. We have recently noticed that many, if
>> not all, of the existing block jobs (at least mirror, commit and backup)
>> are so similar that they implement the same things multiple times and
>> are just lacking different options and have different bugs. I'm
>> seriously considering merging all of them into a single job type
>> internally that just provides options that effectively turn it into one
>> of the existing job types.
>>
> I'm not particularly opposed. At the very, very least "backup" and
> "mirror" are pretty much the same thing and "stream" and "commit" are
> basically the same.
>
> Forcing the backuppy-job and the consolidatey-job together seems like an
> ever-so-slightly harder case to make, but I suppose the truth of the
> matter in all cases is that we're copying data from one node to another...
>
>> So even if we want to tie the bitmap management to a block job, we
>> should consider adding it as an option to the existing backup job rather
>> than adding a completely new fourth job type that again does almost the
>> same except for some bitmap mangement stuff on completion.
>>
> ...except here, where fleecing does not necessarily copy data in the
> same way.
>
> (It probably could re-use the copy-on-write notifiers that will be
> replaced by filter nodes, but I don't see it reusing much else.)
>
> We could try it before I naysay it, but where fleecing is concerned
> we're not using QEMU to move any bits around. It does feel pretty
> categorically different from the first four jobs.
>
> I wouldn't want to see the franken-job be drowned in conditional
> branches for 5,000 options, either. Eliminating some redundancy is good,
> but asserting that all existing jobs (and this possible new one too)
> should all be the same makes me worry that the resulting code would be
> too complex to work with.
>
> ...Maybe? Try it!
>
>> Kevin
>>

--
Best regards,
Vladimir
On Tue, Nov 21, 2017 at 07:10:19PM -0500, John Snow wrote: > On 11/21/2017 12:23 PM, Kevin Wolf wrote: [...] # Snip lot of good discussion. > > On another note, I would double check before adding a new block job type > > that this is the right way to go. We have recently noticed that many, if > > not all, of the existing block jobs (at least mirror, commit and backup) > > are so similar that they implement the same things multiple times and > > are just lacking different options and have different bugs. I'm > > seriously considering merging all of them into a single job type > > internally that just provides options that effectively turn it into one > > of the existing job types. > > > > I'm not particularly opposed. At the very, very least "backup" and > "mirror" are pretty much the same thing and "stream" and "commit" are > basically the same. > > Forcing the backuppy-job and the consolidatey-job together seems like an > ever-so-slightly harder case to make, but I suppose the truth of the > matter in all cases is that we're copying data from one node to another... So from the above interesting discussion, it seems like Kevin is leaning towards a single job type that offers 'stream', 'commit', 'backup', and 'mirror' functionality as part of a single command / job type. Based on an instinct, this sounds a bit too stuffy and complex to me. And John seems to be leaning towards two block device job types: - 'blockdev-foo' that offers both current 'stream' and 'commit' functionality as two different options to the same QMP command; and - 'blockdev-bar' will offer both 'mirror' and 'backup' functionality as part of the same QMP command FWIW, this seems a bit more palatable, as it is unifying similar-functionality-that-differ-slightly into two distinct commands. (/me is still wrapping his head around the bitmaps and incremental backups infrastructure.) 
> > So even if we want to tie the bitmap management to a block job, we > > should consider adding it as an option to the existing backup job rather > > than adding a completely new fourth job type that again does almost the > > same except for some bitmap mangement stuff on completion. > > > > ...except here, where fleecing does not necessarily copy data in the > same way. > > (It probably could re-use the copy-on-write notifiers that will be > replaced by filter nodes, but I don't see it reusing much else.) > > We could try it before I naysay it, but where fleecing is concerned > we're not using QEMU to move any bits around. It does feel pretty > categorically different from the first four jobs. > > I wouldn't want to see the franken-job be drowned in conditional > branches for 5,000 options, either. Eliminating some redundancy is good, > but asserting that all existing jobs (and this possible new one too) > should all be the same makes me worry that the resulting code would be > too complex to work with. [...] -- /kashyap
Am 07.12.2017 um 12:56 hat Kashyap Chamarthy geschrieben: > On Tue, Nov 21, 2017 at 07:10:19PM -0500, John Snow wrote: > > On 11/21/2017 12:23 PM, Kevin Wolf wrote: > > [...] # Snip lot of good discussion. > > > > On another note, I would double check before adding a new block job type > > > that this is the right way to go. We have recently noticed that many, if > > > not all, of the existing block jobs (at least mirror, commit and backup) > > > are so similar that they implement the same things multiple times and > > > are just lacking different options and have different bugs. I'm > > > seriously considering merging all of them into a single job type > > > internally that just provides options that effectively turn it into one > > > of the existing job types. > > > > > > > I'm not particularly opposed. At the very, very least "backup" and > > "mirror" are pretty much the same thing and "stream" and "commit" are > > basically the same. > > > > Forcing the backuppy-job and the consolidatey-job together seems like an > > ever-so-slightly harder case to make, but I suppose the truth of the > > matter in all cases is that we're copying data from one node to another... > > So from the above interesting discussion, it seems like Kevin is leaning > towards a single job type that offers 'stream', 'commit', 'backup', and > 'mirror' functionality as part of a single command / job type. Based on > an instinct, this sounds a bit too stuffy and complex to me. > > And John seems to be leaning towards two block device job types: > > - 'blockdev-foo' that offers both current 'stream' and 'commit' > functionality as two different options to the same QMP command; and > > - 'blockdev-bar' will offer both 'mirror' and 'backup' functionality > as part of the same QMP command > > FWIW, this seems a bit more palatable, as it is unifying > similar-functionality-that-differ-slightly into two distinct commands. 
> > (/me is still wrapping his head around the bitmaps and incremental > backups infrastructure.) Commit of the active layer is _already_ a mirror job internally (and not a stream job). It's pretty clear to me that commit and mirror are almost the same, backup is pretty similar. Stream is somewhat different and might make more sense as a separate job type. Kevin
On Thu, Dec 07, 2017 at 06:33:04PM +0100, Kevin Wolf wrote:
> Am 07.12.2017 um 12:56 hat Kashyap Chamarthy geschrieben:

[...]

> > So from the above interesting discussion, it seems like Kevin is leaning towards a single job type that offers 'stream', 'commit', 'backup', and 'mirror' functionality as part of a single command / job type. Based on an instinct, this sounds a bit too stuffy and complex to me.
> >
> > And John seems to be leaning towards two block device job types:
> >
> > - 'blockdev-foo' that offers both current 'stream' and 'commit' functionality as two different options to the same QMP command; and
> >
> > - 'blockdev-bar' that will offer both 'mirror' and 'backup' functionality as part of the same QMP command
> >
> > FWIW, this seems a bit more palatable, as it is unifying similar-functionality-that-differs-slightly into two distinct commands.

[...]

> Commit of the active layer is _already_ a mirror job internally (and not a stream job).

I see; didn't realize the above. I learned that this has been the case for a long time; it came in via commit 20a63d2 ("commit: Support commit active layer"). For my own education, I went looking into qemu/block/mirror.c and noticed the variable 'commit_active_job_driver' (of type BlockJobDriver), which is used in the function commit_active_start(). Thanks for the pointer.

> It's pretty clear to me that commit and mirror are almost the same, backup is pretty similar. Stream is somewhat different and might make more sense as a separate job type.

Yep, understood.

-- 
/kashyap
On 12/07/2017 06:56 AM, Kashyap Chamarthy wrote: > On Tue, Nov 21, 2017 at 07:10:19PM -0500, John Snow wrote: >> On 11/21/2017 12:23 PM, Kevin Wolf wrote: > > [...] # Snip lot of good discussion. > >>> On another note, I would double check before adding a new block job type >>> that this is the right way to go. We have recently noticed that many, if >>> not all, of the existing block jobs (at least mirror, commit and backup) >>> are so similar that they implement the same things multiple times and >>> are just lacking different options and have different bugs. I'm >>> seriously considering merging all of them into a single job type >>> internally that just provides options that effectively turn it into one >>> of the existing job types. >>> >> >> I'm not particularly opposed. At the very, very least "backup" and >> "mirror" are pretty much the same thing and "stream" and "commit" are >> basically the same. >> >> Forcing the backuppy-job and the consolidatey-job together seems like an >> ever-so-slightly harder case to make, but I suppose the truth of the >> matter in all cases is that we're copying data from one node to another... > > So from the above interesting discussion, it seems like Kevin is leaning > towards a single job type that offers 'stream', 'commit', 'backup', and > 'mirror' functionality as part of a single command / job type. Based on > an instinct, this sounds a bit too stuffy and complex to me. > > And John seems to be leaning towards two block device job types: > > - 'blockdev-foo' that offers both current 'stream' and 'commit' > functionality as two different options to the same QMP command; and > > - 'blockdev-bar' will offer both 'mirror' and 'backup' functionality > as part of the same QMP command > > FWIW, this seems a bit more palatable, as it is unifying > similar-functionality-that-differ-slightly into two distinct commands. > > (/me is still wrapping his head around the bitmaps and incremental > backups infrastructure.) 
> The discussion you're honing in on is tangential to the one at the heart of this topic, which is: "What's the right API for pull model incremental backups?" which digs up the question "What is the right API for our existing data copying commands?" Kevin is probably right in the end, though; he usually is. We do need to unify our data moving logic to avoid fixing the same bugs in four different but nearly identical jobs. My only concern is that the special casing will grow too complex for that one unified job and the job will become so complicated nobody knows how to work on it anymore without breaking other cases. Maybe this is a reasonable concern, maybe it isn't. I'm also not in a hurry to personally unify the loops myself, but if someone did, they could CC me and I'd review it thoroughly. We'd find out in a hurry from the implementation if unification was a win for SLOC. --js
On 2017-12-07 23:47, John Snow wrote: > > > On 12/07/2017 06:56 AM, Kashyap Chamarthy wrote: >> On Tue, Nov 21, 2017 at 07:10:19PM -0500, John Snow wrote: >>> On 11/21/2017 12:23 PM, Kevin Wolf wrote: >> >> [...] # Snip lot of good discussion. >> >>>> On another note, I would double check before adding a new block job type >>>> that this is the right way to go. We have recently noticed that many, if >>>> not all, of the existing block jobs (at least mirror, commit and backup) >>>> are so similar that they implement the same things multiple times and >>>> are just lacking different options and have different bugs. I'm >>>> seriously considering merging all of them into a single job type >>>> internally that just provides options that effectively turn it into one >>>> of the existing job types. >>>> >>> >>> I'm not particularly opposed. At the very, very least "backup" and >>> "mirror" are pretty much the same thing and "stream" and "commit" are >>> basically the same. >>> >>> Forcing the backuppy-job and the consolidatey-job together seems like an >>> ever-so-slightly harder case to make, but I suppose the truth of the >>> matter in all cases is that we're copying data from one node to another... >> >> So from the above interesting discussion, it seems like Kevin is leaning >> towards a single job type that offers 'stream', 'commit', 'backup', and >> 'mirror' functionality as part of a single command / job type. Based on >> an instinct, this sounds a bit too stuffy and complex to me. >> >> And John seems to be leaning towards two block device job types: >> >> - 'blockdev-foo' that offers both current 'stream' and 'commit' >> functionality as two different options to the same QMP command; and >> >> - 'blockdev-bar' will offer both 'mirror' and 'backup' functionality >> as part of the same QMP command >> >> FWIW, this seems a bit more palatable, as it is unifying >> similar-functionality-that-differ-slightly into two distinct commands. 
>> >> (/me is still wrapping his head around the bitmaps and incremental >> backups infrastructure.) >> > > The discussion you're honing in on is tangential to the one at the heart > of this topic, which is: > > "What's the right API for pull model incremental backups?" which digs up > the question "What is the right API for our existing data copying commands?" > > Kevin is probably right in the end, though; he usually is. We do need to > unify our data moving logic to avoid fixing the same bugs in four > different but nearly identical jobs. > > My only concern is that the special casing will grow too complex for > that one unified job and the job will become so complicated nobody knows > how to work on it anymore without breaking other cases. Maybe this is a > reasonable concern, maybe it isn't. The thing is that mirror is becoming more complicated anyway. For one thing, we already do commits through it and I don't see why intermediate commits shouldn't just work the same way (the only thing is that we would have to automatically complete the job instead of emitting a ready event). The other lies in the active mirror thingy. If we have active mirror, then we should be able to implement backup pretty easily, too. We just need to copy the old data to the target instead of the new one (I think). So the thing is that commit and backup are going to be pretty much included in mirror for free by accident. There is no real reason to have own code for them, then. Max
18.11.2017 00:35, John Snow wrote: > > On 11/17/2017 03:22 AM, Vladimir Sementsov-Ogievskiy wrote: >> 17.11.2017 06:10, John Snow wrote: >>> On 11/16/2017 03:17 AM, Vladimir Sementsov-Ogievskiy wrote: >>>> 16.11.2017 00:20, John Snow wrote: >>>>> On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote: >>>>>> Hi all. >>>>>> >>>>>> There are three qmp commands, needed to implement external backup API. >>>>>> >>>>>> Using these three commands, client may do all needed bitmap >>>>>> management by >>>>>> hand: >>>>>> >>>>>> on backup start we need to do a transaction: >>>>>> {disable old bitmap, create new bitmap} >>>>>> >>>>>> on backup success: >>>>>> drop old bitmap >>>>>> >>>>>> on backup fail: >>>>>> enable old bitmap >>>>>> merge new bitmap to old bitmap >>>>>> drop new bitmap >>>>>> >>>>> Can you give me an example of how you expect these commands to be used, >>>>> and why they're required? >>>>> >>>>> I'm a little weary about how error-prone these commands might be and the >>>>> potential for incorrect usage seems... high. Why do we require them? >>>> It is needed for incremental backup. It looks like bad idea to export >>>> abdicate/reclaim functionality, it is simpler >>>> and clearer to allow user to merge/enable/disable bitmaps by hand. >>>> >>>> usage is like this: >>>> >>>> 1. we have dirty bitmap bitmap0 for incremental backup. >>>> >>>> 2. prepare image fleecing (create temporary image with backing=our_disk) >>>> 3. in qmp transaction: >>>> - disable bitmap0 >>>> - create bitmap1 >>>> - start image fleecing (backup sync=none our_disk -> temp_disk) >>> This could probably just be its own command, though: >>> >>> block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera >>> >>> Could handle forking the bitmap. I'm not sure what the arguments would >>> look like, but we could name the NBD export here, too. (Assuming the >>> server is already started and we just need to create the share.) 
>>> >>> Then, we can basically do what mirror does: >>> >>> (1) Cancel >>> (2) Complete >>> >>> Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back), >>> and Complete would instruct QEMU to discard the changes. >>> >>> This way we don't need to expose commands like split or merge that will >>> almost always be dangerous to use over QMP. >>> >>> In fact, a fleecing job would be really convenient even without a >>> bitmap, because it'd still be nice to have a convenience command for it. >>> Using an existing infrastructure and understood paradigm is just a bonus. >> 1. If I understand correctly, Kevin and Max said in their report in >> Prague about new block-job approach, >> using filter nodes, so I'm not sure that this is a good Idea to >> introduce now new old-style block-job, where we can >> do without it. >> > We could do without it, but it might be a lot better to have everything > wrapped up in a command that's easy to digest instead of releasing 10 > smaller commands that have to be executed in a very specific way in > order to work correctly. > > I'm thinking about the complexity of error checking here with all the > smaller commands, versus error checking on a larger workflow we > understand and can quality test better. > > I'm not sure that filter nodes becoming the new normal for block jobs > precludes our ability to use the job-management API as a handle for > managing the lifetime of a long-running task like fleecing, but I'll > check with Max and Kevin about this. > >> 2. there is the following scenario: customers needs a possibility to >> create a backup of data changed since some >> point in time. So, maintaining several static and one (or several) activ >> bitmaps with a possiblity of merge some of them >> and create a backup using this merged bitmap may be convenient. >> > I think the ability to copy bitmaps and issue differential backups would > be sufficient in all cases I could think of... 
so, instead of keeping several static bitmaps with the ability to merge them, you propose to keep several active bitmaps and copy them to make a backup?

so, instead of a new qmp command for merge, add a new qmp command for copy?

in the case of static bitmaps, we can implement saving/loading them to the image to free RAM space, so static bitmaps are better.

or what do you propose for [2]?

-- 
Best regards,
Vladimir
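[Editor's note] To make the merge-vs-copy question concrete: a dirty bitmap is conceptually a set of dirty cluster indices, and merge is a bitwise OR into the destination bitmap. A toy model in plain Python sets (illustration only, not QEMU code):

```python
# Toy model: a dirty bitmap as the set of dirty cluster indices.
# Merge ORs the source into the destination in place; copy duplicates
# the whole bitmap. The static-bitmaps scheme keeps disabled slices and
# merges on demand; the copy scheme would duplicate active bitmaps.

def merge(dst: set, src: set) -> None:
    dst |= src  # OR src into dst in place

bitmap0 = {1, 5, 9}   # changes up to the last backup (disabled/static)
bitmap1 = {2, 5}      # changes since then (active)

# "Everything changed since bitmap0 was created" = merge of both slices:
combined = set(bitmap0)   # copy first so bitmap0 itself stays intact
merge(combined, bitmap1)
assert combined == {1, 2, 5, 9}
```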
On 11/30/2017 07:10 AM, Vladimir Sementsov-Ogievskiy wrote: > 18.11.2017 00:35, John Snow wrote: >> >> On 11/17/2017 03:22 AM, Vladimir Sementsov-Ogievskiy wrote: >>> 17.11.2017 06:10, John Snow wrote: >>>> On 11/16/2017 03:17 AM, Vladimir Sementsov-Ogievskiy wrote: >>>>> 16.11.2017 00:20, John Snow wrote: >>>>>> On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote: >>>>>>> Hi all. >>>>>>> >>>>>>> There are three qmp commands, needed to implement external backup >>>>>>> API. >>>>>>> >>>>>>> Using these three commands, client may do all needed bitmap >>>>>>> management by >>>>>>> hand: >>>>>>> >>>>>>> on backup start we need to do a transaction: >>>>>>> {disable old bitmap, create new bitmap} >>>>>>> >>>>>>> on backup success: >>>>>>> drop old bitmap >>>>>>> >>>>>>> on backup fail: >>>>>>> enable old bitmap >>>>>>> merge new bitmap to old bitmap >>>>>>> drop new bitmap >>>>>>> >>>>>> Can you give me an example of how you expect these commands to be >>>>>> used, >>>>>> and why they're required? >>>>>> >>>>>> I'm a little weary about how error-prone these commands might be >>>>>> and the >>>>>> potential for incorrect usage seems... high. Why do we require them? >>>>> It is needed for incremental backup. It looks like bad idea to export >>>>> abdicate/reclaim functionality, it is simpler >>>>> and clearer to allow user to merge/enable/disable bitmaps by hand. >>>>> >>>>> usage is like this: >>>>> >>>>> 1. we have dirty bitmap bitmap0 for incremental backup. >>>>> >>>>> 2. prepare image fleecing (create temporary image with >>>>> backing=our_disk) >>>>> 3. in qmp transaction: >>>>> - disable bitmap0 >>>>> - create bitmap1 >>>>> - start image fleecing (backup sync=none our_disk -> temp_disk) >>>> This could probably just be its own command, though: >>>> >>>> block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera >>>> >>>> Could handle forking the bitmap. I'm not sure what the arguments would >>>> look like, but we could name the NBD export here, too. 
(Assuming the >>>> server is already started and we just need to create the share.) >>>> >>>> Then, we can basically do what mirror does: >>>> >>>> (1) Cancel >>>> (2) Complete >>>> >>>> Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back), >>>> and Complete would instruct QEMU to discard the changes. >>>> >>>> This way we don't need to expose commands like split or merge that will >>>> almost always be dangerous to use over QMP. >>>> >>>> In fact, a fleecing job would be really convenient even without a >>>> bitmap, because it'd still be nice to have a convenience command for >>>> it. >>>> Using an existing infrastructure and understood paradigm is just a >>>> bonus. >>> 1. If I understand correctly, Kevin and Max said in their report in >>> Prague about new block-job approach, >>> using filter nodes, so I'm not sure that this is a good Idea to >>> introduce now new old-style block-job, where we can >>> do without it. >>> >> We could do without it, but it might be a lot better to have everything >> wrapped up in a command that's easy to digest instead of releasing 10 >> smaller commands that have to be executed in a very specific way in >> order to work correctly. >> >> I'm thinking about the complexity of error checking here with all the >> smaller commands, versus error checking on a larger workflow we >> understand and can quality test better. >> >> I'm not sure that filter nodes becoming the new normal for block jobs >> precludes our ability to use the job-management API as a handle for >> managing the lifetime of a long-running task like fleecing, but I'll >> check with Max and Kevin about this. >> >>> 2. there is the following scenario: customers needs a possibility to >>> create a backup of data changed since some >>> point in time. So, maintaining several static and one (or several) activ >>> bitmaps with a possiblity of merge some of them >>> and create a backup using this merged bitmap may be convenient. 
>>> >> I think the ability to copy bitmaps and issue differential backups would >> be sufficient in all cases I could think of... > > so, instead of keeping several static bitmaps with ability to merge them, > you propose to keep several active bitmaps and copy them to make a backup? > > so, instead of new qmp command for merge, add new qmp command for copy? > > in case of static bitmaps, we can implement saving/loading them to the > image to free RAM space, > so it is better. > > or what do you propose for [2] ? > > > > I'm sorry, I don't think I understand. "customers needs a possibility to create a backup of data changed since some point in time." Is that not the existing case for a simple incremental backup? Granted, the point in time was decided when we created the bitmap or when we made the last backup, but it is "since some point in time." If you mean to say an arbitrary point in time after-the-fact, I don't see how the API presented here helps enable that functionality. (by "arbitrary point in time after-the-fact I mean for example: Say a user installs a malicious application in a VM on Thursday, but the bitmap was created on Monday. The user wants to go back to Wednesday evening, but we have no record of that point in time, so we cannot go back to it.) Can you elaborate on what you're trying to accomplish so I make sure I'm considering you carefully?
07.12.2017 03:38, John Snow wrote: > > On 11/30/2017 07:10 AM, Vladimir Sementsov-Ogievskiy wrote: >> 18.11.2017 00:35, John Snow wrote: >>> On 11/17/2017 03:22 AM, Vladimir Sementsov-Ogievskiy wrote: >>>> 17.11.2017 06:10, John Snow wrote: >>>>> On 11/16/2017 03:17 AM, Vladimir Sementsov-Ogievskiy wrote: >>>>>> 16.11.2017 00:20, John Snow wrote: >>>>>>> On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote: >>>>>>>> Hi all. >>>>>>>> >>>>>>>> There are three qmp commands, needed to implement external backup >>>>>>>> API. >>>>>>>> >>>>>>>> Using these three commands, client may do all needed bitmap >>>>>>>> management by >>>>>>>> hand: >>>>>>>> >>>>>>>> on backup start we need to do a transaction: >>>>>>>> {disable old bitmap, create new bitmap} >>>>>>>> >>>>>>>> on backup success: >>>>>>>> drop old bitmap >>>>>>>> >>>>>>>> on backup fail: >>>>>>>> enable old bitmap >>>>>>>> merge new bitmap to old bitmap >>>>>>>> drop new bitmap >>>>>>>> >>>>>>> Can you give me an example of how you expect these commands to be >>>>>>> used, >>>>>>> and why they're required? >>>>>>> >>>>>>> I'm a little weary about how error-prone these commands might be >>>>>>> and the >>>>>>> potential for incorrect usage seems... high. Why do we require them? >>>>>> It is needed for incremental backup. It looks like bad idea to export >>>>>> abdicate/reclaim functionality, it is simpler >>>>>> and clearer to allow user to merge/enable/disable bitmaps by hand. >>>>>> >>>>>> usage is like this: >>>>>> >>>>>> 1. we have dirty bitmap bitmap0 for incremental backup. >>>>>> >>>>>> 2. prepare image fleecing (create temporary image with >>>>>> backing=our_disk) >>>>>> 3. in qmp transaction: >>>>>> - disable bitmap0 >>>>>> - create bitmap1 >>>>>> - start image fleecing (backup sync=none our_disk -> temp_disk) >>>>> This could probably just be its own command, though: >>>>> >>>>> block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera >>>>> >>>>> Could handle forking the bitmap. 
I'm not sure what the arguments would >>>>> look like, but we could name the NBD export here, too. (Assuming the >>>>> server is already started and we just need to create the share.) >>>>> >>>>> Then, we can basically do what mirror does: >>>>> >>>>> (1) Cancel >>>>> (2) Complete >>>>> >>>>> Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back), >>>>> and Complete would instruct QEMU to discard the changes. >>>>> >>>>> This way we don't need to expose commands like split or merge that will >>>>> almost always be dangerous to use over QMP. >>>>> >>>>> In fact, a fleecing job would be really convenient even without a >>>>> bitmap, because it'd still be nice to have a convenience command for >>>>> it. >>>>> Using an existing infrastructure and understood paradigm is just a >>>>> bonus. >>>> 1. If I understand correctly, Kevin and Max said in their report in >>>> Prague about new block-job approach, >>>> using filter nodes, so I'm not sure that this is a good Idea to >>>> introduce now new old-style block-job, where we can >>>> do without it. >>>> >>> We could do without it, but it might be a lot better to have everything >>> wrapped up in a command that's easy to digest instead of releasing 10 >>> smaller commands that have to be executed in a very specific way in >>> order to work correctly. >>> >>> I'm thinking about the complexity of error checking here with all the >>> smaller commands, versus error checking on a larger workflow we >>> understand and can quality test better. >>> >>> I'm not sure that filter nodes becoming the new normal for block jobs >>> precludes our ability to use the job-management API as a handle for >>> managing the lifetime of a long-running task like fleecing, but I'll >>> check with Max and Kevin about this. >>> >>>> 2. there is the following scenario: customers needs a possibility to >>>> create a backup of data changed since some >>>> point in time. 
So, maintaining several static and one (or several) active bitmaps with the possibility of merging some of them and creating a backup using this merged bitmap may be convenient.

>>> I think the ability to copy bitmaps and issue differential backups would be sufficient in all cases I could think of...

>> so, instead of keeping several static bitmaps with ability to merge them, you propose to keep several active bitmaps and copy them to make a backup?
>>
>> so, instead of new qmp command for merge, add new qmp command for copy?
>>
>> in case of static bitmaps, we can implement saving/loading them to the image to free RAM space, so it is better.
>>
>> or what do you propose for [2] ?

> I'm sorry, I don't think I understand.
>
> "customers needs a possibility to create a backup of data changed since some point in time."
>
> Is that not the existing case for a simple incremental backup? Granted, the point in time was decided when we created the bitmap or when we made the last backup, but it is "since some point in time."
>
> If you mean to say an arbitrary point in time after-the-fact, I don't see how the API presented here helps enable that functionality.
>
> (by "arbitrary point in time after-the-fact" I mean for example: Say a user installs a malicious application in a VM on Thursday, but the bitmap was created on Monday. The user wants to go back to Wednesday evening, but we have no record of that point in time, so we cannot go back to it.)
>
> Can you elaborate on what you're trying to accomplish so I make sure I'm considering you carefully?

Yes, the point in time is when we create a dirty bitmap. But we want to maintain several points in time, for example the last 10 incremental backups. The user wants the ability to create an incremental backup containing the changes from a selected point in time to the current moment. It is needed, for example, if a backup was deleted (to save disk space) and the user now wants to recreate it.
In the current scheme for incremental backup, after a successful backup we actually lose the previous point in time and keep only the last one.

With the ability to merge bitmaps we can do the following:

1. create bitmap1
2. disable bitmap1, do external backup by bitmap1, create bitmap2
2.1 on backup fail: merge bitmap2 to bitmap1, enable bitmap1, delete bitmap2
2.2 on backup success: do nothing
3. disable bitmap2, do external backup by bitmap2, create bitmap3
3.1 on backup fail: merge bitmap3 to bitmap2, enable bitmap2, delete bitmap3
3.2 on backup success: do nothing
...

So, we have disabled bitmaps bitmap_i, each corresponding to the difference between points in time i and i+1, and if the user wants to get the data changed since point in time number j, he just merges the bitmaps from bitmap_j through the last (active) one into a new bitmap bitmap_temp and creates the corresponding backup.

Instead of storing several disabled bitmaps we could store enabled bitmaps; in that case we would not need a merge operation, but we would need a copy operation instead. However, maintaining disabled bitmaps is better, as we can then implement storing them to the qcow2 image, to save RAM space.

I also have another idea: implement a new object, a point-in-time or checkpoint. They should have names and a simple add/remove API, and they would be backed by dirty bitmaps. So checkpoint deletion is a bitmap merge (deleting one of the merged bitmaps), and checkpoint creation is disabling the active checkpoint bitmap and starting a new active one. Then we can implement merging of several bitmaps (from one of the checkpoints to the current moment) in the NBD meta-context-query handling.

-- 
Best regards,
Vladimir
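[Editor's note] The disabled-bitmap/checkpoint scheme above can be modeled in a few lines. This is a hypothetical sketch (the Checkpoints class and its methods are invented for illustration, with each time slice modeled as a set of dirty clusters), not QEMU code:

```python
# Sketch of the "checkpoint" idea: each checkpoint boundary freezes the
# current bitmap as a disabled slice; only the newest slice is active.

class Checkpoints:
    def __init__(self):
        self.slices = [set()]        # slices[-1] is the active bitmap

    def record_write(self, cluster):
        self.slices[-1].add(cluster)

    def create_checkpoint(self):
        # disable the current bitmap and start a new active one
        self.slices.append(set())

    def delete_checkpoint(self, i):
        # deleting checkpoint i merges its slice into the successor
        # (the active slice itself cannot be deleted in this sketch)
        merged = self.slices.pop(i)
        self.slices[i] |= merged

    def changes_since(self, i):
        # backup bitmap for "changes since checkpoint i": merge the
        # slices from i through the active one into a temporary bitmap
        out = set()
        for s in self.slices[i:]:
            out |= s
        return out
```

Deleting a checkpoint is just a merge that forgets one boundary, which matches the observation that merge is the only primitive needed once slices are kept disabled.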
This is going to be a long one. Maybe go get a cup of coffee. On 12/07/2017 04:39 AM, Vladimir Sementsov-Ogievskiy wrote: > 07.12.2017 03:38, John Snow wrote: >> I'm sorry, I don't think I understand. >> >> "customers needs a possibility to create a backup of data changed since >> some point in time." >> >> Is that not the existing case for a simple incremental backup? Granted, >> the point in time was decided when we created the bitmap or when we made >> the last backup, but it is "since some point in time." >> >> If you mean to say an arbitrary point in time after-the-fact, I don't >> see how the API presented here helps enable that functionality. >> >> (by "arbitrary point in time after-the-fact I mean for example: Say a >> user installs a malicious application in a VM on Thursday, but the >> bitmap was created on Monday. The user wants to go back to Wednesday >> evening, but we have no record of that point in time, so we cannot go >> back to it.) >> >> Can you elaborate on what you're trying to accomplish so I make sure I'm >> considering you carefully? > > Yes, point in time is when we create a dirty bitmap. But we want to > maintain several points in time. > For example it may be last 10 incremental backups. > User wants an ability to create incremental backup which will contain > changes from selected point in > time to current moment. It is needed for example if backup was deleted > (to save disk space) and now user > wants to recreate it. > > In current scheme for incremental backup, after successful backup we > actually lose previous point in time, > and have only the last one. Differential backup mode may help a little in this flexibility and costs us basically nothing to implement: We simply always re-merge the bitmaps after creating the backup. So you have two options: (1) Incremental: Replace the existing point-in-time with a new one (2) Differential: Keep the existing point-in-time. 
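[Editor's note] The incremental-vs-differential distinction above can be stated as a one-function sketch, again modeling the bitmap as a set of dirty clusters (illustration only, not the QEMU implementation):

```python
# Incremental: copy the dirty clusters, then reset the bitmap (a new
# point in time replaces the old one).
# Differential: copy the dirty clusters and keep the bitmap as-is (the
# original point in time is retained).

def backup(bitmap: set, mode: str) -> set:
    copied = set(bitmap)          # clusters the backup must copy
    if mode == "incremental":
        bitmap.clear()            # new point in time: reset the bitmap
    elif mode == "differential":
        pass                      # keep the existing point in time
    else:
        raise ValueError("unknown mode")
    return copied
```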
I suspect you are wanting something a lot more powerful than this, though. > > With ability to merge bitmap we can do the following: > > 1. create bitmap1 > 2. disable bitmap1, do external backup by bitmap1, create bitmap2 > 2.1 on backup fail: merge bitmap2 to bitmap1, enable bitmap1, delete > bitmap2 > 2.2 on backup success: do nothing Okay, so bitmap1 and bitmap2 cover periods of time that are disjoint; where you have ----time---------------> [bitmap1][bitmap2....--> so you intend to accrue a number of bitmaps representing the last N slices of time, with only the most recent bitmap being enabled. Functionally you intend to permanently fork a bitmap every time a backup operation succeeds; so on incremental backup: (1) We succeed and the forked bitmap we already made gets saved as a new disabled bitmap instead of being deleted. (2) We fail, and we roll back exactly as we always have. Here's an idea of what this API might look like without revealing explicit merge/split primitives. A new bitmap property that lets us set retention: :: block-dirty-bitmap-set-retention bitmap=foo slices=10 Or something similar, where the default property for all bitmaps is zero -- the current behavior: no copies retained. By setting it to a non-zero positive integer, the incremental backup mode will automatically save a disabled copy when possible. "What happens if we exceed our retention?" (A) We push the last one out automatically, or (B) We fail the operation immediately. A is more convenient, but potentially unsafe if the management tool or user wasn't aware that was going to happen. B is more annoying, but definitely more safe as it means we cannot lose a bitmap accidentally. I would argue for B with perhaps a force-cycle=true|false that defaults to false to let management tools say "Yes, go ahead, remove the old one" with additionally some return to let us know it happened: {"return": { "dropped-slices": [ {"bitmap0": 0}, ...] 
}} This would introduce some concept of bitmap slices into the mix as ID'd children of a bitmap. I would propose that these slices are numbered and monotonically increasing. "bitmap0" as an object starts with no slices, but every incremental backup creates slice 0, slice 1, slice 2, and so on. Even after we start deleting some, they stay ordered. These numbers then stand in for points in time. The counter can (must?) be reset and all slices forgotten when performing a full backup while providing a bitmap argument. "How can a user make use of the slices once they're made?" Let's consider something like mode=partial in contrast to mode=incremental, and an example where we have 6 prior slices: 0,1,2,3,4,5, (and, unnamed, the 'active' slice.) mode=partial bitmap=foo slice=4 This would create a backup from slice 4 to the current time α. This includes all clusters from 4, 5, and the active bitmap. I don't think it is meaningful to define any end point that isn't the current time, so I've omitted that as a possibility. "Does a partial backup create a new point in time?" If yes: This means that the next incremental backup must necessarily be based off of the last partial backup that was made. This seems a little inconvenient. This would mean that point in time α becomes "slice 6." If no: This means that we lose the point in time when we made the partial and we cannot chain off of the partial backup. It does mean that the next incremental backup will work as normally expected, however. This means that point in time α cannot again be referenced by the management client. This mirrors the dynamic between "incremental" and "differential" backups. ..hmmm.. You know, incremental backups are just a special case of "partial" here where slice is the last recorded slice... Let's look at an API like this: mode=<incremental|differential> bitmap=<name> [slice=N] Incremental: We create a new slice if the bitmap has room for one. Differential: We don't create a new slice. 
The data in the active bitmap α does not get cleared after the bitmap operation. Slice: If not specified, assume we want only the active slice. This is the current behavior in QEMU 2.11. If specified, we create a temporary merge between bitmaps [N..α] and use that for the backup operation. "Can we delete slices?" Sure. :: block-dirty-bitmap-slice-delete bitmap=foo slice=4 "Can we create a slice without making a bitmap?" It would be easy to do, but I'm not sure I see the utility. In using it, it means if you don't specify the slice manually for the next backup that you will necessarily be getting something not usable. but we COULD do it, it would just be banking the changes in the active bitmap into a new slice. > 3. disable bitmap2, do external backup by bitmap2, create bitmap3 > 3.1 on backup fail: merge bitmap3 to bitmap2, enable bitmap2, delete > bitmap3 > 3.2 on backup success: do nothing > ... > > so, we have disabled bitmaps bitmap_i, corresponding to difference > between points in time i and i+1 > and if user wants to get data, changed from point in time number j, he > just merges bitmaps from > bitmap_j to the last(active) one to the new bitmap bitmap_temp and > create corresponding backup. > Roger. > > instead of storing several disabled bitmaps we can store enabled > bitmaps, in this case we will not need > merge operation but instead we can implement copy operation. However > maintaining disabled bitmaps > is better, as we can then implement storing them to the qcow2 image, to > save RAM space. > OK, I understand what you meant by this now. It's definitely preferable to have inert bitmaps we don't have to worry about updating identically. I don't know how many checkpoints you intend to accrue, but if it's more than a few then the update cost on every single write may become measurable. > I also have another idea: > implement new object: point-in-time or checkpoint. The should have > names, and the simple add/remove API. 
> And they will be backed by dirty bitmaps. so checkpoint deletion is > bitmap merge (and delete one of them), > checkpoint creation is disabling of active-checkpoint-bitmap and > starting new active-checkpoint-bitmap. > Yes, exactly! I think that's pretty similar to what I am thinking of with slices. This sounds a little safer to me in that we can examine an operation to see if it's sane or not. > Then we can implement merging of several bitmaps (from one of > checkpoints to current moment) in > NBD meta-context-query handling. > Note: I should say that I've had discussions with Stefan in the past over things like differential mode and the feeling I got from him was that he felt that data should be copied from QEMU precisely *once*, viewing any subsequent copying of the same data as redundant and wasteful. Once data is copied out of QEMU and it becomes inert, it should be up to the management utility to handle re-configuring that data in different forms, perhaps using bitmaps copied out alongside the initial task. (e.g. combining multiple incrementals into a 'differential' instead of allowing a differential mode.) I think his viewpoint was that copying data from QEMU is expensive and necessarily harder and less flexible than manipulating inert data; so he would prefer to see offline operations provide flexibility when possible. (Stefan, do I misrepresent your sentiment?) Maybe there's some sympathy to "What if we lose the backup, though?" however. I'm not against increased flexibility, but you might need to explain why incremental backups previously made have a reason for being deleted or lost. Did you finish your cup of coffee? --John
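The retention/slices proposal above can be sketched as a toy model showing the bookkeeping it implies. Everything here (SlicedBitmap, incremental_backup, partial_dirty) is hypothetical; it is not a QEMU interface, just an illustration of the numbered-slice and force-cycle semantics described in the proposal.

```python
# Toy model of the hypothetical "slices with retention" idea;
# none of these names exist in QEMU.

class SlicedBitmap:
    def __init__(self, name, retention=0):
        self.name = name
        self.retention = retention  # 0 = current behavior: keep no slices
        self.next_id = 0            # slice IDs increase monotonically
        self.slices = {}            # id -> frozen (disabled) dirty set
        self.active = set()         # the enabled, still-filling bitmap

    def incremental_backup(self, force_cycle=False):
        dropped = []
        if len(self.slices) >= self.retention:
            if not force_cycle:
                # option B: fail rather than silently lose a bitmap
                raise RuntimeError("retention exceeded")
            # with force-cycle: push out the oldest slice and report it,
            # like the proposed "dropped-slices" return value
            oldest = min(self.slices)
            del self.slices[oldest]
            dropped.append({self.name: oldest})
        # bank the active bitmap as a new frozen slice
        self.slices[self.next_id] = frozenset(self.active)
        self.next_id += 1
        self.active = set()
        return {"dropped-slices": dropped}

    def partial_dirty(self, since_slice):
        # temporary merge of slices [N..active] for a mode=partial backup
        merged = set(self.active)
        for sid, data in self.slices.items():
            if sid >= since_slice:
                merged |= data
        return merged

bm = SlicedBitmap("bitmap0", retention=2)
bm.active = {1}
bm.incremental_backup()                        # creates slice 0
bm.active = {2}
bm.incremental_backup()                        # creates slice 1
bm.active = {3}
ret = bm.incremental_backup(force_cycle=True)  # slice 0 is pushed out
assert ret == {"dropped-slices": [{"bitmap0": 0}]}
assert bm.partial_dirty(1) == {2, 3}           # backup since slice 1
```

Note how even in this toy form, the management layer must track `next_id` and parse `dropped-slices` to mirror qemu's state, which is the complexity Kevin objects to further down the thread.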
On 12/09/2017 03:57 AM, John Snow wrote:
> This is going to be a long one. Maybe go get a cup of coffee.
> [...]
> Did you finish your cup of coffee?
>
> --John

Guys, the situation on our side is quite simple. The backup vendor who is trying to integrate with us is setting fancy requirements. They have a workflow which is integrated with several other hypervisors, and they certainly do not want a different workflow if it can be avoided. Actually, this would be a good point for us: the closer an API to the others' we can provide without losing our quality, flexibility, etc., the more people will use QEMU, as it will power more solutions.

QEMU has always provided a low-level API for management and does not make any decisions itself. This is fine, and it would be nice to follow that way again here. So I do not understand why we cannot provide direct access to the management layer. It always knows better what to do in the current approach, and the simpler the interface, the better its life will be. Can we avoid doing complex things when simple alternatives are available?

Den
Am 09.12.2017 um 01:57 hat John Snow geschrieben:
> Here's an idea of what this API might look like without revealing explicit merge/split primitives.
>
> A new bitmap property that lets us set retention:
>
> :: block-dirty-bitmap-set-retention bitmap=foo slices=10
>
> Or something similar, where the default property for all bitmaps is zero -- the current behavior: no copies retained.
>
> By setting it to a non-zero positive integer, the incremental backup mode will automatically save a disabled copy when possible.

-EMAGIC

Operations that create or delete user-visible objects should be explicit, not automatic. You're trying to implement management layer functionality in qemu here, but incomplete enough that the artifacts of it are still visible externally. (A complete solution within qemu wouldn't expose low-level concepts such as bitmaps on an external interface, but would expose something like checkpoints.)

Usually it's not a good idea to have a design where qemu implements enough to restrict management tools to whatever use case we had in mind, but not enough to make the management tool's life substantially easier (by not having to care about some low-level concepts).

> "What happens if we exceed our retention?"
>
> (A) We push the last one out automatically, or
> (B) We fail the operation immediately.
>
> A is more convenient, but potentially unsafe if the management tool or user wasn't aware that was going to happen.
> B is more annoying, but definitely more safe as it means we cannot lose a bitmap accidentally.

Both mean that the management layer not only has to deal with the deletion of bitmaps as it wants to have them, but also has to keep the retention counter somewhere and predict what qemu is going to do to the bitmaps and whether any corrective action needs to be taken. This is making things more complex rather than simpler.

> I would argue for B with perhaps a force-cycle=true|false that defaults to false to let management tools say "Yes, go ahead, remove the old one" with additionally some return to let us know it happened:
> [...]
> I don't think it is meaningful to define any end point that isn't the current time, so I've omitted that as a possibility.

John, what are you doing here? This adds option after option, and even an additional slice object, only complicating an easy thing more and more. I'm not sure if that was your intention, but I feel I'm starting to understand better how Linus's rants come about.

Let me summarise what this means for the management layer:

* The management layer has to manage bitmaps. It has direct control over creation and deletion of bitmaps. So far so good.

* It also has to manage slices in those bitmap objects; and these slices are what contain the actual bitmaps. In order to identify a bitmap in qemu, you need:
  a) the node name,
  b) the bitmap ID, and
  c) the slice number.
  The slice number is assigned by qemu, and libvirt has to wait until qemu tells it the slice number of a newly created slice. If libvirt doesn't receive the reply to the command that started the block job, it needs to be able to query this information from qemu, e.g. in query-block-jobs.

* Slices are automatically created when you start a backup job with a bitmap. It doesn't matter whether you even intend to do an incremental backup against this point in time. qemu knows better.

* In order to delete a slice that you don't need any more, you have to create more slices (by doing more backups), but you don't get to decide which one is dropped. qemu helpfully just drops the oldest one. It doesn't matter if you want to keep an older one so you can do an incremental backup for a longer timespan. Don't worry about your backup strategy, qemu knows better.

* Of course, just creating a new backup job doesn't mean that removing the old slice works, even if you give the respective option. That's what the 'dropped-slices' return is for. So once again, wait for whatever qemu did and reproduce it in the data structures of the management tool. It's also more information that needs to be exposed in query-block-jobs because libvirt might miss the return value.

* Hmm... What happens if you start n backup block jobs, with n > slices? Sounds like a great way to introduce subtle bugs in both qemu and the management layer.

Do you really think working with this API would be fun for libvirt?

> "Does a partial backup create a new point in time?"
>
> If yes: This means that the next incremental backup must necessarily be based off of the last partial backup that was made. This seems a little inconvenient. This would mean that point in time α becomes "slice 6."

Or based off any of the previous points in time, provided that qemu didn't helpfully decide to delete it. Can't I still create a backup starting from slice 4 then?

Also, a more general question about incremental backup: How does it play with snapshots? Shouldn't we expect that people sometimes use both snapshots and backups? Can we restrict the backup job to considering bitmaps only from a single node, or should we be able to reference bitmaps of a backing file as well?

> If no: This means that we lose the point in time when we made the partial and we cannot chain off of the partial backup.
> [...]
> "Can we create a slice without making a bitmap?"
>
> It would be easy to do, but I'm not sure I see the utility. In using it, it means if you don't specify the slice manually for the next backup that you will necessarily be getting something not usable.
>
> but we COULD do it, it would just be banking the changes in the active bitmap into a new slice.

Okay, with explicit management this is getting a little more reasonable now. However, I don't understand what slices buy us then compared to just separate bitmaps.

Essentially, bitmaps form a second kind of backing chain. Backup always wants to use the combined bitmaps of some subchain. I see two easy ways to do this: either pass an array of bitmaps to consider to the job, or store the "backing link" in the bitmap so that we can just specify a "base bitmap" like we usually do with normal backing files.

The backup block job can optionally append a new bitmap to the chain like external snapshots do for backing chains. Deleting a bitmap in the chain is the merge operation, similar to a commit block job for backing chains. We know these mechanisms very well because the block layer has been using them for ages.

> > I also have another idea: implement a new object: point-in-time or checkpoint. [...]
>
> Yes, exactly! I think that's pretty similar to what I am thinking of with slices.
>
> This sounds a little safer to me in that we can examine an operation to see if it's sane or not.

Exposing checkpoints is a reasonable high-level API. The important part then is that you don't expose bitmaps + slices, but only checkpoints without bitmaps. The bitmaps are an implementation detail.

> I should say that I've had discussions with Stefan in the past over things like differential mode and the feeling I got from him was that he felt that data should be copied from QEMU precisely *once*, viewing any subsequent copying of the same data as redundant and wasteful.
> [...]

That's a management layer decision. Apparently there are users who want to copy from qemu multiple times, otherwise we wouldn't be talking about slices and retention.

Kevin
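Kevin's "bitmaps as a backing chain" idea can be sketched the same way as the earlier examples. ChainedBitmap and both helper functions are invented for this illustration; they only model the semantics he describes: a backup merges the dirty sets of a subchain, and deleting a bitmap merges it into its successor, analogous to a commit block job.

```python
# Toy model of bitmaps forming a backing chain (illustrative only;
# these names are not QEMU APIs).

class ChainedBitmap:
    def __init__(self, name, backing=None):
        self.name = name
        self.backing = backing  # older bitmap, like a backing file
        self.dirty = set()

def dirty_since(base, top):
    # Combined dirty set of the subchain [base..top]: what a backup
    # job given a "base bitmap" would copy.
    merged = set()
    b = top
    while b is not None:
        merged |= b.dirty
        if b is base:
            break
        b = b.backing
    return merged

def delete_bitmap(victim, successor):
    # Deleting a bitmap in the chain merges it into its successor,
    # similar to a commit block job for backing chains.
    assert successor.backing is victim
    successor.dirty |= victim.dirty
    successor.backing = victim.backing

b1 = ChainedBitmap("b1"); b1.dirty = {1}
b2 = ChainedBitmap("b2", backing=b1); b2.dirty = {2}
b3 = ChainedBitmap("b3", backing=b2)  # the active bitmap
b3.dirty = {3}

assert dirty_since(b2, b3) == {2, 3}  # backup from point in time 2
delete_bitmap(b2, b3)                 # drop the middle point in time
assert b3.backing is b1
assert dirty_since(b1, b3) == {1, 2, 3}
```

The appeal of this shape is that both operations already have well-understood analogues in the block layer (backing files and commit), so no new "slice" concept or qemu-assigned IDs are needed.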
11.12.2017 14:15, Kevin Wolf wrote:
> Am 09.12.2017 um 01:57 hat John Snow geschrieben:
> [...]
>> "Can we create a slice without making a bitmap?"
>>
>> It would be easy to do, but I'm not sure I see the utility. In using it, it means if you don't specify the slice manually for the next backup that you will necessarily be getting something not usable.
>> >> but we COULD do it, it would just be banking the changes in the active >> bitmap into a new slice. > Okay, with explicit management this is getting a little more reasonable > now. However, I don't understand what slices buy us then compared to > just separate bitmaps. > > Essentially, bitmaps form a second kind of backing chain. Backup always > wants to use the combined bitmaps of some subchain. I see two easy ways > to do this: Either pass an array of bitmaps to consider to the job, or > store the "backing link" in the bitmap so that we can just specify a > "base bitmap" like we usually do with normal backing files. > > The backup block job can optionally append a new bitmap to the chain > like external snapshots do for backing chains. Deleting a bitmap in the > chain is the merge operation, similar to a commit block job for backing > chains. > > We know these mechanism very well because the block layer has been using > them for ages. > >>> I also have another idea: >>> implement new object: point-in-time or checkpoint. The should have >>> names, and the simple add/remove API. >>> And they will be backed by dirty bitmaps. so checkpoint deletion is >>> bitmap merge (and delete one of them), >>> checkpoint creation is disabling of active-checkpoint-bitmap and >>> starting new active-checkpoint-bitmap. >> Yes, exactly! I think that's pretty similar to what I am thinking of >> with slices. >> >> This sounds a little safer to me in that we can examine an operation to >> see if it's sane or not. > Exposing checkpoints is a reasonable high-level API. The important part > then is that you don't expose bitmaps + slices, but only checkpoints > without bitmaps. The bitmaps are an implementation detail. > >>> Then we can implement merging of several bitmaps (from one of >>> checkpoints to current moment) in >>> NBD meta-context-query handling. 
>>>
>> Note:
>>
>> I should say that I've had discussions with Stefan in the past over things like differential mode and the feeling I got from him was that he felt that data should be copied from QEMU precisely *once*, viewing any subsequent copying of the same data as redundant and wasteful.
>
> That's a management layer decision. Apparently there are users who want to copy from qemu multiple times, otherwise we wouldn't be talking about slices and retention.
>
> Kevin

You didn't touch storing-to-qcow2 and migration, which we would have to implement for these new bitmap-like objects.

By exposing the low-level merge/disable/enable commands we solve the whole problem with less code and without having to negotiate the architecture first. We avoid:
- implementing a new job
- implementing new objects (checkpoints) and a new interface for them
- saving these new objects to qcow2
- migrating them

Also, even if we expose the merge/disable/enable interface now, we can still implement checkpoints or slices in the future, if we become _sure_ that we need them.

Finally, I don't see what is unsafe about implementing this simple low-level API. We already have bitmap deletion. Is that safe?

-- 
Best regards,
Vladimir
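[Editor's note: the merge/enable/disable workflow from the cover letter can be sketched as the QMP payloads a client would send. The command and argument names below are the ones proposed in this series (and `src_name`/`dst_name` for merge is a guess at this point, since the final QAPI schema is still under review); the `dict` structure is only illustrative.]

```python
# Sketch of the client-side external-backup flow described in the cover
# letter. A real client would send these dicts over the QMP socket; here
# we only construct them.

def start_backup_cmds(node, old_bitmap, new_bitmap):
    """On backup start: atomically freeze the old bitmap and begin
    tracking new writes in a fresh one."""
    return {
        "execute": "transaction",
        "arguments": {"actions": [
            {"type": "block-dirty-bitmap-disable",
             "data": {"node": node, "name": old_bitmap}},
            {"type": "block-dirty-bitmap-add",
             "data": {"node": node, "name": new_bitmap}},
        ]},
    }

def on_success_cmds(node, old_bitmap):
    """On backup success: the frozen bitmap is no longer needed."""
    return [{"execute": "block-dirty-bitmap-remove",
             "arguments": {"node": node, "name": old_bitmap}}]

def on_failure_cmds(node, old_bitmap, new_bitmap):
    """On backup failure: resume the old bitmap, fold the writes that
    happened during the attempt back into it, drop the helper bitmap."""
    return [
        {"execute": "block-dirty-bitmap-enable",
         "arguments": {"node": node, "name": old_bitmap}},
        {"execute": "block-dirty-bitmap-merge",   # argument names assumed
         "arguments": {"node": node, "src_name": new_bitmap,
                       "dst_name": old_bitmap}},
        {"execute": "block-dirty-bitmap-remove",
         "arguments": {"node": node, "name": new_bitmap}},
    ]
```

The transaction in step one is what makes the scheme race-free: no guest write can land between disabling `bitmap0` and creating `bitmap1`.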
On 12/11/2017 07:18 AM, Vladimir Sementsov-Ogievskiy wrote:
> 11.12.2017 14:15, Kevin Wolf wrote:
>> Am 09.12.2017 um 01:57 hat John Snow geschrieben:
> [...]
> Also, I don't see what is unsafe about implementing this simple low-level API. We already have bitmap deletion. Is that safe?

Just scrap the whole thought.
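[Editor's note: the point-in-time/checkpoint idea discussed earlier in the thread (checkpoint creation = disable the active bitmap and start a new one; checkpoint deletion = merge a bitmap away) can be modelled in a few lines. Bitmaps are modelled as sets of dirty cluster indices; this is a toy illustration of the semantics, not QEMU code.]

```python
# Toy model of checkpoints backed by a list of dirty bitmaps.
# The last bitmap in the list is the "active" one; earlier ones are
# frozen (disabled) checkpoints, oldest first.

class CheckpointChain:
    def __init__(self):
        self.bitmaps = [set()]          # only the active bitmap at start

    def mark_dirty(self, cluster):
        self.bitmaps[-1].add(cluster)   # writes land in the active bitmap

    def create_checkpoint(self):
        # Freeze the active bitmap and open a new active one.
        self.bitmaps.append(set())
        return len(self.bitmaps) - 2    # index of the frozen checkpoint

    def delete_checkpoint(self, i):
        # Merge checkpoint i into its successor (i must not be the active
        # bitmap). "Dirty since checkpoint j" stays correct for all
        # surviving checkpoints j.
        victim = self.bitmaps.pop(i)
        self.bitmaps[i] |= victim       # successor slid down to index i

    def dirty_since(self, i):
        # Union of all bitmaps from checkpoint i to the current moment:
        # what an incremental backup from that point would have to copy.
        return set().union(*self.bitmaps[i:])
```

This is also why checkpoint deletion maps onto a `block-dirty-bitmap-merge` primitive: merging into the neighbouring bitmap loses the deleted point in time but preserves every other one.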
On 12/11/2017 06:15 AM, Kevin Wolf wrote:
> Am 09.12.2017 um 01:57 hat John Snow geschrieben:
> [...]
> That's a management layer decision. Apparently there are users who want to copy from qemu multiple times, otherwise we wouldn't be talking about slices and retention.
>
> Kevin

Sorry.

John
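[Editor's note: a quick sanity check on the rollback sequence (enable, merge, drop) from the cover letter. The toy model below uses sets in place of bitmaps and shows the invariant that makes the rollback safe: once the old bitmap is disabled and a new one takes over, "old | new" at any moment equals what a single, never-disabled bitmap would have recorded.]

```python
# Model: 'reference' is a bitmap that is never disabled -- the tracking
# state we must be able to restore after a failed backup attempt.

def rollback_restores_tracking(writes_before, writes_during):
    reference = set()
    old, new = set(), set()

    for c in writes_before:          # tracking before the backup starts
        reference.add(c)
        old.add(c)

    # transaction: disable 'old', create 'new' -- from here on, guest
    # writes land in 'new' while 'old' is frozen
    for c in writes_during:          # writes racing with the backup
        reference.add(c)
        new.add(c)

    # backup failed: enable 'old', merge 'new' into 'old', drop 'new'
    old |= new
    del new

    return old == reference
```

Nothing is lost and nothing spurious is added, which is the whole argument for exposing merge/enable/disable as primitives.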
Am 11.12.2017 um 19:40 hat John Snow geschrieben:
> [...]
> >> Slice:
> >> If not specified, assume we want only the active slice.
This is the > >> current behavior in QEMU 2.11. > >> If specified, we create a temporary merge between bitmaps [N..α] and use > >> that for the backup operation. > >> > >> "Can we delete slices?" > >> > >> Sure. > >> > >> :: block-dirty-bitmap-slice-delete bitmap=foo slice=4 > >> > >> "Can we create a slice without making a bitmap?" > >> > >> It would be easy to do, but I'm not sure I see the utility. In using it, > >> it means if you don't specify the slice manually for the next backup > >> that you will necessarily be getting something not usable. > >> > >> but we COULD do it, it would just be banking the changes in the active > >> bitmap into a new slice. > > > > Okay, with explicit management this is getting a little more reasonable > > now. However, I don't understand what slices buy us then compared to > > just separate bitmaps. > > > > Essentially, bitmaps form a second kind of backing chain. Backup always > > wants to use the combined bitmaps of some subchain. I see two easy ways > > to do this: Either pass an array of bitmaps to consider to the job, or > > store the "backing link" in the bitmap so that we can just specify a > > "base bitmap" like we usually do with normal backing files. > > > > The backup block job can optionally append a new bitmap to the chain > > like external snapshots do for backing chains. Deleting a bitmap in the > > chain is the merge operation, similar to a commit block job for backing > > chains. > > > > We know these mechanism very well because the block layer has been using > > them for ages. > > > >>> I also have another idea: > >>> implement new object: point-in-time or checkpoint. The should have > >>> names, and the simple add/remove API. > >>> And they will be backed by dirty bitmaps. so checkpoint deletion is > >>> bitmap merge (and delete one of them), > >>> checkpoint creation is disabling of active-checkpoint-bitmap and > >>> starting new active-checkpoint-bitmap. > >> > >> Yes, exactly! 
I think that's pretty similar to what I am thinking of > >> with slices. > >> > >> This sounds a little safer to me in that we can examine an operation to > >> see if it's sane or not. > > > > Exposing checkpoints is a reasonable high-level API. The important part > > then is that you don't expose bitmaps + slices, but only checkpoints > > without bitmaps. The bitmaps are an implementation detail. > > > >>> Then we can implement merging of several bitmaps (from one of > >>> checkpoints to current moment) in > >>> NBD meta-context-query handling. > >>> > >> Note: > >> > >> I should say that I've had discussions with Stefan in the past over > >> things like differential mode and the feeling I got from him was that he > >> felt that data should be copied from QEMU precisely *once*, viewing any > >> subsequent copying of the same data as redundant and wasteful. > > > > That's a management layer decision. Apparently there are users who want > > to copy from qemu multiple times, otherwise we wouldn't be talking about > > slices and retention. > > > > Kevin > > Sorry. *lol* Though I do hope that my rant was at least somewhat constructive, if only by making differently broken suggestions. ;-) Kevin
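The "checkpoints backed by dirty bitmaps" model that Kevin and Vladimir converge on above can be sketched in a few lines. This is a toy model only (names invented here, Python sets standing in for real dirty bitmaps, no relation to QEMU internals): a chain of bitmaps where only the newest records writes, creating a checkpoint freezes the active bitmap, and deleting a checkpoint merges its bitmap into its neighbor, much like a commit job collapses a backing-chain link.

```python
# Toy model of checkpoints as a chain of dirty bitmaps (sets of dirty
# cluster indices). Illustrative only; not a QEMU API.

class CheckpointChain:
    """A chain of bitmaps; only the newest (last) one is enabled."""

    def __init__(self):
        self.bitmaps = [set()]  # bitmaps[-1] is the active, recording bitmap

    def mark_dirty(self, cluster):
        # Guest writes land only in the active bitmap.
        self.bitmaps[-1].add(cluster)

    def create_checkpoint(self):
        # The QMP transaction "disable old bitmap + create new bitmap":
        # freeze the current bitmap by appending a fresh active one.
        self.bitmaps.append(set())
        return len(self.bitmaps) - 2  # index of the now-frozen checkpoint

    def delete_checkpoint(self, i):
        # Deleting a checkpoint = merging its bitmap into the next one,
        # analogous to committing one link of a backing chain.
        self.bitmaps[i + 1] |= self.bitmaps[i]
        del self.bitmaps[i]

    def dirty_since(self, i):
        # An incremental backup from checkpoint i needs the union of all
        # bitmaps from i up to and including the active one.
        return set().union(*self.bitmaps[i:])
```

In this model the "slice" question disappears: there is no counter managed behind the management layer's back, just explicitly created and deleted bitmaps whose union answers "what changed since checkpoint i".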
On 11/17/2017 06:10 AM, John Snow wrote: > > On 11/16/2017 03:17 AM, Vladimir Sementsov-Ogievskiy wrote: >> 16.11.2017 00:20, John Snow wrote: >>> On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote: >>>> Hi all. >>>> >>>> There are three qmp commands, needed to implement external backup API. >>>> >>>> Using these three commands, client may do all needed bitmap >>>> management by >>>> hand: >>>> >>>> on backup start we need to do a transaction: >>>> {disable old bitmap, create new bitmap} >>>> >>>> on backup success: >>>> drop old bitmap >>>> >>>> on backup fail: >>>> enable old bitmap >>>> merge new bitmap to old bitmap >>>> drop new bitmap >>>> >>> Can you give me an example of how you expect these commands to be used, >>> and why they're required? >>> >>> I'm a little weary about how error-prone these commands might be and the >>> potential for incorrect usage seems... high. Why do we require them? >> It is needed for incremental backup. It looks like bad idea to export >> abdicate/reclaim functionality, it is simpler >> and clearer to allow user to merge/enable/disable bitmaps by hand. >> >> usage is like this: >> >> 1. we have dirty bitmap bitmap0 for incremental backup. >> >> 2. prepare image fleecing (create temporary image with backing=our_disk) >> 3. in qmp transaction: >> - disable bitmap0 >> - create bitmap1 >> - start image fleecing (backup sync=none our_disk -> temp_disk) > This could probably just be its own command, though: > > block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera > > Could handle forking the bitmap. I'm not sure what the arguments would > look like, but we could name the NBD export here, too. (Assuming the > server is already started and we just need to create the share.) > > Then, we can basically do what mirror does: > > (1) Cancel > (2) Complete > > Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back), > and Complete would instruct QEMU to discard the changes. 
> > This way we don't need to expose commands like split or merge that will > almost always be dangerous to use over QMP. > > In fact, a fleecing job would be really convenient even without a > bitmap, because it'd still be nice to have a convenience command for it. > Using an existing infrastructure and understood paradigm is just a bonus. >

Actually, this is a very good question about safety/simplicity... We have spent quite a bit of time on the design and have not come to a good solution. Yes, technically, for now we can put everything into a fleecing command and rely on its semantics, though I am not convinced by that approach. We can imagine this being reused for snapshot operations (maybe, maybe not), and technically there are other cases as well.

This is really a matter of QEMU providing a low-level API while the management software makes the decisions; providing merge and similar operations follows exactly that approach. We can also consider this in a similar way to how we recently fixed the "revert to snapshot" operation: management can, and should, make the decision.

Den
John, Kevin, do we reach a consensus? Can we go on with this? 20.11.2017 19:00, Denis V. Lunev wrote: > On 11/17/2017 06:10 AM, John Snow wrote: >> On 11/16/2017 03:17 AM, Vladimir Sementsov-Ogievskiy wrote: >>> 16.11.2017 00:20, John Snow wrote: >>>> On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote: >>>>> Hi all. >>>>> >>>>> There are three qmp commands, needed to implement external backup API. >>>>> >>>>> Using these three commands, client may do all needed bitmap >>>>> management by >>>>> hand: >>>>> >>>>> on backup start we need to do a transaction: >>>>> {disable old bitmap, create new bitmap} >>>>> >>>>> on backup success: >>>>> drop old bitmap >>>>> >>>>> on backup fail: >>>>> enable old bitmap >>>>> merge new bitmap to old bitmap >>>>> drop new bitmap >>>>> >>>> Can you give me an example of how you expect these commands to be used, >>>> and why they're required? >>>> >>>> I'm a little weary about how error-prone these commands might be and the >>>> potential for incorrect usage seems... high. Why do we require them? >>> It is needed for incremental backup. It looks like bad idea to export >>> abdicate/reclaim functionality, it is simpler >>> and clearer to allow user to merge/enable/disable bitmaps by hand. >>> >>> usage is like this: >>> >>> 1. we have dirty bitmap bitmap0 for incremental backup. >>> >>> 2. prepare image fleecing (create temporary image with backing=our_disk) >>> 3. in qmp transaction: >>> - disable bitmap0 >>> - create bitmap1 >>> - start image fleecing (backup sync=none our_disk -> temp_disk) >> This could probably just be its own command, though: >> >> block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera >> >> Could handle forking the bitmap. I'm not sure what the arguments would >> look like, but we could name the NBD export here, too. (Assuming the >> server is already started and we just need to create the share.) 
>> >> Then, we can basically do what mirror does: >> >> (1) Cancel >> (2) Complete >> >> Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back), >> and Complete would instruct QEMU to discard the changes. >> >> This way we don't need to expose commands like split or merge that will >> almost always be dangerous to use over QMP. >> >> In fact, a fleecing job would be really convenient even without a >> bitmap, because it'd still be nice to have a convenience command for it. >> Using an existing infrastructure and understood paradigm is just a bonus. >> > actually this is a very good question about safety/simplicity... > > We actually have spent a bit of time on design and have not > come to a good solution. Yes, technically for now we can put > all into fleecing command and rely on its semantics. Though > I am not convinced with that approach. We can think that this > can be reused on snapshot operations (may be, may be not). > Also technically there are other cases. > > This is actually a matter that QEMU should provide low level > API while management software should make decisions. > Providing merge etc operations is done using exactly that > approach. We can also consider this in a similar way we have > recently fixed "revert to snapshot" operation. Management > can make and should make a decision. > > Den > -- Best regards, Vladimir
Am 24.11.2017 um 16:01 hat Vladimir Sementsov-Ogievskiy geschrieben: > John, Kevin, do we reach a consensus? Can we go on with this? I don't know the details of this, so I can't really offer a strong opinion. I gave a high-level perspective of what we're doing in other places and that's all I was planning to contribute at the moment. So I'm deferring this to John. If you guys can't find a decision or are uncertain about the approach, please let me know and I can try to find the time to actually get into the details and provide some more in-depth feedback. Kevin > 20.11.2017 19:00, Denis V. Lunev wrote: > > On 11/17/2017 06:10 AM, John Snow wrote: > > > On 11/16/2017 03:17 AM, Vladimir Sementsov-Ogievskiy wrote: > > > > 16.11.2017 00:20, John Snow wrote: > > > > > On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote: > > > > > > Hi all. > > > > > > > > > > > > There are three qmp commands, needed to implement external backup API. > > > > > > > > > > > > Using these three commands, client may do all needed bitmap > > > > > > management by > > > > > > hand: > > > > > > > > > > > > on backup start we need to do a transaction: > > > > > > {disable old bitmap, create new bitmap} > > > > > > > > > > > > on backup success: > > > > > > drop old bitmap > > > > > > > > > > > > on backup fail: > > > > > > enable old bitmap > > > > > > merge new bitmap to old bitmap > > > > > > drop new bitmap > > > > > > > > > > > Can you give me an example of how you expect these commands to be used, > > > > > and why they're required? > > > > > > > > > > I'm a little weary about how error-prone these commands might be and the > > > > > potential for incorrect usage seems... high. Why do we require them? > > > > It is needed for incremental backup. It looks like bad idea to export > > > > abdicate/reclaim functionality, it is simpler > > > > and clearer to allow user to merge/enable/disable bitmaps by hand. > > > > > > > > usage is like this: > > > > > > > > 1. 
we have dirty bitmap bitmap0 for incremental backup. > > > > > > > > 2. prepare image fleecing (create temporary image with backing=our_disk) > > > > 3. in qmp transaction: > > > > - disable bitmap0 > > > > - create bitmap1 > > > > - start image fleecing (backup sync=none our_disk -> temp_disk) > > > This could probably just be its own command, though: > > > > > > block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera > > > > > > Could handle forking the bitmap. I'm not sure what the arguments would > > > look like, but we could name the NBD export here, too. (Assuming the > > > server is already started and we just need to create the share.) > > > > > > Then, we can basically do what mirror does: > > > > > > (1) Cancel > > > (2) Complete > > > > > > Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back), > > > and Complete would instruct QEMU to discard the changes. > > > > > > This way we don't need to expose commands like split or merge that will > > > almost always be dangerous to use over QMP. > > > > > > In fact, a fleecing job would be really convenient even without a > > > bitmap, because it'd still be nice to have a convenience command for it. > > > Using an existing infrastructure and understood paradigm is just a bonus. > > > > > actually this is a very good question about safety/simplicity... > > > > We actually have spent a bit of time on design and have not > > come to a good solution. Yes, technically for now we can put > > all into fleecing command and rely on its semantics. Though > > I am not convinced with that approach. We can think that this > > can be reused on snapshot operations (may be, may be not). > > Also technically there are other cases. > > > > This is actually a matter that QEMU should provide low level > > API while management software should make decisions. > > Providing merge etc operations is done using exactly that > > approach. 
We can also consider this in a similar way we have > > recently fixed "revert to snapshot" operation. Management > > can make and should make a decision. > > > > Den > > > > > -- > Best regards, > Vladimir >
On Mon, 11/13 19:20, Vladimir Sementsov-Ogievskiy wrote: > Hi all. > > There are three qmp commands, needed to implement external backup API. > > Using these three commands, client may do all needed bitmap management by > hand: > > on backup start we need to do a transaction: > {disable old bitmap, create new bitmap} > > on backup success: > drop old bitmap > > on backup fail: > enable old bitmap > merge new bitmap to old bitmap > drop new bitmap > > Question: it may be better to make one command instead of two: > block-dirty-bitmap-set-enabled(bool enabled) > > Vladimir Sementsov-Ogievskiy (4): > block/dirty-bitmap: add lock to bdrv_enable/disable_dirty_bitmap > qapi: add block-dirty-bitmap-enable/disable > qmp: transaction support for block-dirty-bitmap-enable/disable > qapi: add block-dirty-bitmap-merge > > qapi/block-core.json | 80 +++++++++++++++++++++++ > qapi/transaction.json | 4 ++ > include/block/dirty-bitmap.h | 2 + > block/dirty-bitmap.c | 21 ++++++ > blockdev.c | 151 +++++++++++++++++++++++++++++++++++++++++++ > 5 files changed, 258 insertions(+) > I think tests are required to merge new features/commands. Can we include tests on these new code please? We should cover error handling, and also write tests that demonstrate the intended real world use cases. Also should we add new sections to docs/interop/bitmaps.rst? Meta: John started a long discussion about the API design but I think after all it turns out exposing dirty bitmap objects and the primitives is a reasonable approach to implement incremental backup functionalities. The comment I have is that we should ensure we have also reviewed it from a higher level (e.g. all the potential user requirements) to make sure this low level API is both sound and flexible. We shouldn't introduce a minimal set of low level commands just to support one particular use case, but look a bit further and broader and come up with a more complete design? 
Writing docs and tests might force us to think in this direction, which I think is a good thing to have for this series, too. Fam
13.12.2017 07:12, Fam Zheng wrote: > On Mon, 11/13 19:20, Vladimir Sementsov-Ogievskiy wrote: >> Hi all. >> >> There are three qmp commands, needed to implement external backup API. >> >> Using these three commands, client may do all needed bitmap management by >> hand: >> >> on backup start we need to do a transaction: >> {disable old bitmap, create new bitmap} >> >> on backup success: >> drop old bitmap >> >> on backup fail: >> enable old bitmap >> merge new bitmap to old bitmap >> drop new bitmap >> >> Question: it may be better to make one command instead of two: >> block-dirty-bitmap-set-enabled(bool enabled) >> >> Vladimir Sementsov-Ogievskiy (4): >> block/dirty-bitmap: add lock to bdrv_enable/disable_dirty_bitmap >> qapi: add block-dirty-bitmap-enable/disable >> qmp: transaction support for block-dirty-bitmap-enable/disable >> qapi: add block-dirty-bitmap-merge >> >> qapi/block-core.json | 80 +++++++++++++++++++++++ >> qapi/transaction.json | 4 ++ >> include/block/dirty-bitmap.h | 2 + >> block/dirty-bitmap.c | 21 ++++++ >> blockdev.c | 151 +++++++++++++++++++++++++++++++++++++++++++ >> 5 files changed, 258 insertions(+) >> > I think tests are required to merge new features/commands. Can we include tests > on these new code please? We should cover error handling, and also write tests > that demonstrate the intended real world use cases. > > Also should we add new sections to docs/interop/bitmaps.rst? > > Meta: John started a long discussion about the API design but I think after all > it turns out exposing dirty bitmap objects and the primitives is a reasonable > approach to implement incremental backup functionalities. The comment I have is > that we should ensure we have also reviewed it from a higher level (e.g. all the > potential user requirements) to make sure this low level API is both sound and > flexible. 
We shouldn't introduce a minimal set of low level commands just to > support one particular use case, but look a bit further and broader and come up > with a more complete design? Writing docs and tests might force us to think in > this direction, which I think is a good thing to have for this series, too. > > Fam

Nikolay, please describe what you plan to do in libvirt on top of the QMP bitmap API.

Kirill, what do you think about all this?

(Brief history: we are considering 3 new qmp commands for bitmap management, needed for external incremental backup support:
- enable (bitmap will track disk changes)
- disable (bitmap will stop tracking changes)
- merge (merge bitmap A to bitmap B)
)

-- Best regards, Vladimir
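As a sketch, the three proposed commands could look like this on the QMP wire. The argument names (`node`, `name`, `name-source`, `name-target`) follow the examples used later in this thread; they belong to an in-flight series, so the final API may differ.

```python
# Hypothetical wire format for the three proposed bitmap commands,
# built as Python dicts ready for json.dumps(). Argument names are
# taken from this thread's examples, not from a finalized API.
import json

enable = {"execute": "block-dirty-bitmap-enable",
          "arguments": {"node": "drive0", "name": "bitmap0"}}

disable = {"execute": "block-dirty-bitmap-disable",
           "arguments": {"node": "drive0", "name": "bitmap0"}}

merge = {"execute": "block-dirty-bitmap-merge",
         "arguments": {"node": "drive0",
                       "name-source": "bitmap1",
                       "name-target": "bitmap0"}}

for cmd in (enable, disable, merge):
    print(json.dumps(cmd))
```

Each dict serializes to one line of the usual `{"execute": ..., "arguments": ...}` QMP shape.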
On 12/19/2017 11:07 AM, Vladimir Sementsov-Ogievskiy wrote: > 13.12.2017 07:12, Fam Zheng wrote: >> On Mon, 11/13 19:20, Vladimir Sementsov-Ogievskiy wrote: >>> Hi all. >>> >>> There are three qmp commands, needed to implement external backup API. >>> >>> Using these three commands, client may do all needed bitmap >>> management by >>> hand: >>> >>> on backup start we need to do a transaction: >>> {disable old bitmap, create new bitmap} >>> >>> on backup success: >>> drop old bitmap >>> >>> on backup fail: >>> enable old bitmap >>> merge new bitmap to old bitmap >>> drop new bitmap >>> >>> Question: it may be better to make one command instead of two: >>> block-dirty-bitmap-set-enabled(bool enabled) >>> >>> Vladimir Sementsov-Ogievskiy (4): >>> block/dirty-bitmap: add lock to bdrv_enable/disable_dirty_bitmap >>> qapi: add block-dirty-bitmap-enable/disable >>> qmp: transaction support for block-dirty-bitmap-enable/disable >>> qapi: add block-dirty-bitmap-merge >>> >>> qapi/block-core.json | 80 +++++++++++++++++++++++ >>> qapi/transaction.json | 4 ++ >>> include/block/dirty-bitmap.h | 2 + >>> block/dirty-bitmap.c | 21 ++++++ >>> blockdev.c | 151 >>> +++++++++++++++++++++++++++++++++++++++++++ >>> 5 files changed, 258 insertions(+) >>> >> I think tests are required to merge new features/commands. Can we >> include tests >> on these new code please? We should cover error handling, and also >> write tests >> that demonstrate the intended real world use cases. >> >> Also should we add new sections to docs/interop/bitmaps.rst? >> >> Meta: John started a long discussion about the API design but I think >> after all >> it turns out exposing dirty bitmap objects and the primitives is a >> reasonable >> approach to implement incremental backup functionalities. The comment >> I have is >> that we should ensure we have also reviewed it from a higher level >> (e.g. all the >> potential user requirements) to make sure this low level API is both >> sound and >> flexible. 
We shouldn't introduce a minimal set of low level commands >> just to >> support one particular use case, but look a bit further and broader >> and come up >> with a more complete design? Writing docs and tests might force us to >> think in >> this direction, which I think is a good thing to have for this series, >> too. >> >> Fam > > Nikolay, please describe what do you plan in libvirt over qmp bitmap API. > > Kirill, what do you think about this all? > > (brief history: > we are considering 3 new qmp commands for bitmap management, needed for > external incremental backup support > - enable (bitmap will track disk changes) > - disable (bitmap will stop tracking changes) > - merge (merge bitmap A to bitmap B) > ) > Yeah, it would be helpful to know what the full workflow for the API will be ... before I get ahead of myself again (sorry) ... but I'd like to see a quick writeup of your vision for the pull-mode backups (which I assume this is for, right?) from start-to-finish, like a mockup of annotated QMP output or something. Nothing fancy, just something that lets me orient where we're headed, since you're doing most of the work and I just want to get out of your way, but having a roadmap helps me do that. --js
On 20.12.2017 04:06, John Snow wrote: > > > On 12/19/2017 11:07 AM, Vladimir Sementsov-Ogievskiy wrote: >> 13.12.2017 07:12, Fam Zheng wrote: >>> On Mon, 11/13 19:20, Vladimir Sementsov-Ogievskiy wrote: >>>> Hi all. >>>> >>>> There are three qmp commands, needed to implement external backup API. >>>> >>>> Using these three commands, client may do all needed bitmap >>>> management by >>>> hand: >>>> >>>> on backup start we need to do a transaction: >>>> {disable old bitmap, create new bitmap} >>>> >>>> on backup success: >>>> drop old bitmap >>>> >>>> on backup fail: >>>> enable old bitmap >>>> merge new bitmap to old bitmap >>>> drop new bitmap >>>> >>>> Question: it may be better to make one command instead of two: >>>> block-dirty-bitmap-set-enabled(bool enabled) >>>> >>>> Vladimir Sementsov-Ogievskiy (4): >>>> block/dirty-bitmap: add lock to bdrv_enable/disable_dirty_bitmap >>>> qapi: add block-dirty-bitmap-enable/disable >>>> qmp: transaction support for block-dirty-bitmap-enable/disable >>>> qapi: add block-dirty-bitmap-merge >>>> >>>> qapi/block-core.json | 80 +++++++++++++++++++++++ >>>> qapi/transaction.json | 4 ++ >>>> include/block/dirty-bitmap.h | 2 + >>>> block/dirty-bitmap.c | 21 ++++++ >>>> blockdev.c | 151 >>>> +++++++++++++++++++++++++++++++++++++++++++ >>>> 5 files changed, 258 insertions(+) >>>> >>> I think tests are required to merge new features/commands. Can we >>> include tests >>> on these new code please? We should cover error handling, and also >>> write tests >>> that demonstrate the intended real world use cases. >>> >>> Also should we add new sections to docs/interop/bitmaps.rst? >>> >>> Meta: John started a long discussion about the API design but I think >>> after all >>> it turns out exposing dirty bitmap objects and the primitives is a >>> reasonable >>> approach to implement incremental backup functionalities. The comment >>> I have is >>> that we should ensure we have also reviewed it from a higher level >>> (e.g. 
all the >>> potential user requirements) to make sure this low level API is both >>> sound and >>> flexible. We shouldn't introduce a minimal set of low level commands >>> just to >>> support one particular use case, but look a bit further and broader >>> and come up >>> with a more complete design? Writing docs and tests might force us to >>> think in >>> this direction, which I think is a good thing to have for this series, >>> too. >>> >>> Fam >> >> Nikolay, please describe what do you plan in libvirt over qmp bitmap API. >> >> Kirill, what do you think about this all? >> >> (brief history: >> we are considering 3 new qmp commands for bitmap management, needed for >> external incremental backup support >> - enable (bitmap will track disk changes) >> - disable (bitmap will stop tracking changes) >> - merge (merge bitmap A to bitmap B) >> ) >> > > Yeah, it would be helpful to know what the full workflow for the API > will be ... before I get ahead of myself again (sorry) ... > > but I'd like to see a quick writeup of your vision for the pull-mode > backups (which I assume this is for, right?) from start-to-finish, like > a mockup of annotated QMP output or something. > > Nothing fancy, just something that lets me orient where we're headed, > since you're doing most of the work and I just want to get out of your > way, but having a roadmap helps me do that. > > --js >

Hi, all. In terms of the API, we want to be able to create an incremental backup from any previous backup. But since in qemu each point in time we may want to back up from costs us a bitmap, we plan to keep only a limited number of such checkpoints - probably the last N. So there will be an API to delete a checkpoint as well.

In terms of implementation, Vladimir already described the usage of the disable/enable/merge commands in [1] for both the creation and deletion APIs. There is also an RFC [2] on the libvirt list which describes the libvirt API in more detail.
[1] http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg01099.html [2] https://www.redhat.com/archives/libvir-list/2017-November/msg00514.html
20.12.2017 04:06, John Snow wrote: > > On 12/19/2017 11:07 AM, Vladimir Sementsov-Ogievskiy wrote: >> 13.12.2017 07:12, Fam Zheng wrote: >>> On Mon, 11/13 19:20, Vladimir Sementsov-Ogievskiy wrote: >>>> Hi all. >>>> >>>> There are three qmp commands, needed to implement external backup API. >>>> >>>> Using these three commands, client may do all needed bitmap >>>> management by >>>> hand: >>>> >>>> on backup start we need to do a transaction: >>>> {disable old bitmap, create new bitmap} >>>> >>>> on backup success: >>>> drop old bitmap >>>> >>>> on backup fail: >>>> enable old bitmap >>>> merge new bitmap to old bitmap >>>> drop new bitmap >>>> >>>> Question: it may be better to make one command instead of two: >>>> block-dirty-bitmap-set-enabled(bool enabled) >>>> >>>> Vladimir Sementsov-Ogievskiy (4): >>>> block/dirty-bitmap: add lock to bdrv_enable/disable_dirty_bitmap >>>> qapi: add block-dirty-bitmap-enable/disable >>>> qmp: transaction support for block-dirty-bitmap-enable/disable >>>> qapi: add block-dirty-bitmap-merge >>>> >>>> qapi/block-core.json | 80 +++++++++++++++++++++++ >>>> qapi/transaction.json | 4 ++ >>>> include/block/dirty-bitmap.h | 2 + >>>> block/dirty-bitmap.c | 21 ++++++ >>>> blockdev.c | 151 >>>> +++++++++++++++++++++++++++++++++++++++++++ >>>> 5 files changed, 258 insertions(+) >>>> >>> I think tests are required to merge new features/commands. Can we >>> include tests >>> on these new code please? We should cover error handling, and also >>> write tests >>> that demonstrate the intended real world use cases. >>> >>> Also should we add new sections to docs/interop/bitmaps.rst? >>> >>> Meta: John started a long discussion about the API design but I think >>> after all >>> it turns out exposing dirty bitmap objects and the primitives is a >>> reasonable >>> approach to implement incremental backup functionalities. The comment >>> I have is >>> that we should ensure we have also reviewed it from a higher level >>> (e.g. 
all the >>> potential user requirements) to make sure this low level API is both >>> sound and >>> flexible. We shouldn't introduce a minimal set of low level commands >>> just to >>> support one particular use case, but look a bit further and broader >>> and come up >>> with a more complete design? Writing docs and tests might force us to >>> think in >>> this direction, which I think is a good thing to have for this series, >>> too. >>> >>> Fam >> Nikolay, please describe what do you plan in libvirt over qmp bitmap API. >> >> Kirill, what do you think about this all? >> >> (brief history: >> we are considering 3 new qmp commands for bitmap management, needed for >> external incremental backup support >> - enable (bitmap will track disk changes) >> - disable (bitmap will stop tracking changes) >> - merge (merge bitmap A to bitmap B) >> ) >> > Yeah, it would be helpful to know what the full workflow for the API > will be ... before I get ahead of myself again (sorry) ... > > but I'd like to see a quick writeup of your vision for the pull-mode > backups (which I assume this is for, right?) from start-to-finish, like > a mockup of annotated QMP output or something. > > Nothing fancy, just something that lets me orient where we're headed, > since you're doing most of the work and I just want to get out of your > way, but having a roadmap helps me do that. > > --js

external backup:

0. we have active_disk with dirty bitmap bitmap0 attached to it
1. qmp blockdev-add tmp_disk (backing=active_disk)
2. guest fsfreeze
3. qmp transaction:
   - block-dirty-bitmap-add node=active_disk name=bitmap1
   - block-dirty-bitmap-disable node=active_disk name=bitmap0
   - blockdev-backup drive=active_disk target=tmp_disk sync=none
4. guest fsthaw
5. (? not designed yet) qmp blockdev-add filter_node - a special filter node over tmp_disk for synchronization of NBD reads and backup(sync=none) COW requests (like it is done in block/replication)
6. qmp nbd-server-start
7.
qmp nbd-server-add filter_node (there should be the possibility of exporting a bitmap of a child node: filter_node->tmp_disk->active_disk->bitmap0)

then, the external tool can connect to the nbd server, get the exported bitmap, and read data (including bitmap0) according to the NBD specification.
(also, the external tool may get a merge of several bitmaps, if we already have a sequence of them)

then, after the backup finishes, what can be done:

1. qmp block-job-cancel device=active_disk (stop our backup(sync=none))
2. qmp nbd-server-stop (or qmp nbd-server-remove filter_node)
3. qmp blockdev-remove filter_node
4. qmp blockdev-remove tmp_disk

on successful backup, you can drop the old bitmap if you want (or keep it, if you need to keep a sequence of disabled bitmaps):
1. block-dirty-bitmap-remove node=active_disk name=bitmap0

on failed backup, you can merge the bitmaps, to make it look like nothing happened:
1. qmp transaction:
   - block-dirty-bitmap-merge node=active_disk name-source=bitmap1 name-target=bitmap0
   - block-dirty-bitmap-remove node=active_disk name=bitmap1
   - block-dirty-bitmap-enable node=active_disk name=bitmap0

--
Best regards,
Vladimir
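To make the wire format of the backup-start step concrete, the transaction in the workflow above can be issued as one atomic QMP `transaction` command. This is only a sketch: the node and bitmap names (`active_disk`, `tmp_disk`, `bitmap0`, `bitmap1`) are the hypothetical ones from the example, and the exact argument names of `blockdev-backup` can differ between QEMU versions:

```python
import json

# Actions from step 3, applied atomically by the QMP 'transaction' command.
# Node/bitmap names are the hypothetical ones used in the workflow above.
actions = [
    {"type": "block-dirty-bitmap-add",
     "data": {"node": "active_disk", "name": "bitmap1"}},
    {"type": "block-dirty-bitmap-disable",
     "data": {"node": "active_disk", "name": "bitmap0"}},
    {"type": "blockdev-backup",
     "data": {"device": "active_disk", "target": "tmp_disk", "sync": "none"}},
]
cmd = {"execute": "transaction", "arguments": {"actions": actions}}
wire = json.dumps(cmd)  # what actually goes over the QMP socket
print(wire)
```

The point of the transaction is atomicity: if any action fails, the whole group rolls back, so bitmap0 is never left disabled without bitmap1 already tracking new writes.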
Let me add a couple of words from the backup vendor side:

1. Pull backup is very important to have; it enables additional important functionality:
   1. The user can select separate volumes (rather than disks) to back up
   2. The user can select specific files/folders to back up (!)
   3. Backup software can automatically skip copying useless data like a swap file/partition.
   All 3 require disk/filesystem analysis, i.e. random disk access on demand (rather than a "push all" approach).

2. It is important to have multiple CBT bitmaps and to be able to detect modifications between different points in time, because one might have at least 2 applications interested in CBT:
   a. backup
   b. replication

Thanks,
Kirill

> On 20 Dec 2017, at 11:20, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>
> [...]
On Wed, 12/20 11:20, Vladimir Sementsov-Ogievskiy wrote:
> [...]
> on failed backup, you can merge bitmaps, to make it look like nothing happened:
> 1. qmp transaction:
> - block-dirty-bitmap-merge node=active_disk name-source=bitmap1 name-target=bitmap0

Being done in a transaction, will merging a large-ish bitmap synchronously hurt the responsiveness? Because we have the BQL held here, which pauses all device emulation.

Have you measured how long it takes to merge two typical bitmaps? Say, for a 1TB disk?

Fam
26.12.2017 10:07, Fam Zheng wrote:
> On Wed, 12/20 11:20, Vladimir Sementsov-Ogievskiy wrote:
>> [...]
> Being done in a transaction, will merging a large-ish bitmap synchronously hurt the responsiveness? Because we have the BQL lock held here which pauses all device emulation.
> Have you measured how long it takes to merge two typical bitmaps? Say, for a 1TB disk?
>
> Fam

We don't need merge in a transaction.

Anyway, good question. Two bitmaps full of ones, 64k granularity, 1TB disk:

# time virsh qemu-monitor-command tmp '{"execute": "block-dirty-bitmap-merge", "arguments": {"node": "disk", "src_name": "a", "dst_name": "b"}}'
{"return":{},"id":"libvirt-1181"}
real 0m0.009s
user 0m0.006s
sys 0m0.002s

and this is fine: for the last level of the hbitmap we will have
disk_size / granularity / nb_bits_in_long = (1024 ^ 4) / (64 * 1024) / 64 = 262144
operations, which is not many.

bitmaps in gdb:

(gdb) p bdrv_lookup_bs ("disk", "disk", 0)
$1 = (BlockDriverState *) 0x7fd3f6274940
(gdb) p *$1->dirty_bitmaps.lh_first
$2 = {mutex = 0x7fd3f6277b28, bitmap = 0x7fd3f5a5adc0, meta = 0x0, successor = 0x0, name = 0x7fd3f637b410 "b", size = 1099511627776, disabled = false, active_iterators = 0, readonly = false, autoload = false, persistent = false, list = {le_next = 0x7fd3f567c650, le_prev = 0x7fd3f6277b58}}
(gdb) p *$1->dirty_bitmaps.lh_first ->bitmap
$3 = {size = 16777216, count = 16777216, granularity = 16, meta = 0x0, levels = {0x7fd3f6279a90, 0x7fd3f5506350, 0x7fd3f5affcb0, 0x7fd3f547a860, 0x7fd3f637b200, 0x7fd3f67ff5c0, 0x7fd3d8dfe010}, sizes = {1, 1, 1, 1, 64, 4096, 262144}}
(gdb) p *$1->dirty_bitmaps.lh_first ->list .le_next
$4 = {mutex = 0x7fd3f6277b28, bitmap = 0x7fd3f567cb30, meta = 0x0, successor = 0x0, name = 0x7fd3f5482fb0 "a", size = 1099511627776, disabled = false, active_iterators = 0, readonly = false, autoload = false, persistent = false, list = {le_next = 0x0, le_prev = 0x7fd3f6c779e0}}
(gdb) p *$1->dirty_bitmaps.lh_first ->list .le_next ->bitmap
$5 = {size = 16777216, count = 16777216, granularity = 16, meta = 0x0, levels = {0x7fd3f5ef8880, 0x7fd3f5facea0, 0x7fd3f5f1cec0, 0x7fd3f5f40a00, 0x7fd3f6c80a00, 0x7fd3f66e5f60, 0x7fd3d8fff010}, sizes = {1, 1, 1, 1, 64, 4096, 262144}}

--
Best regards,
Vladimir
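The back-of-the-envelope numbers in the message above can be checked directly; this is just the arithmetic for a 1 TB disk at 64k granularity, not QEMU code. The bit count matches `size = 16777216` and the word count matches the last entry of `sizes` in the gdb dump:

```python
disk_size = 1024 ** 4     # 1 TB
granularity = 64 * 1024   # one dirty bit per 64k cluster
bits_per_long = 64

bits = disk_size // granularity   # bits in the last hbitmap level
words = bits // bits_per_long     # longs to OR during a merge
print(bits, words)                # 16777216 262144
```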
On Tue, 12/26 11:57, Vladimir Sementsov-Ogievskiy wrote:
> [...]
> We don't need merge in a transaction.

Yes. Either way, the command is synchronous and the whole merge process is done with the BQL held, so my question still stands. But your numbers have answered it, and the time is negligible.

Bitmap merging doesn't even have to be synchronous, if it really matters, but we can live with a synchronous implementation for now. Thanks!

Fam
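For intuition on why the merge is so cheap (and why it could even be chunked if it ever needed to run without the BQL), the core of a one-level merge is just a word-wise OR over two arrays. This is a toy sketch, not QEMU's actual hbitmap implementation:

```python
import array

def merge_level(dst, src):
    # dst |= src, one 64-bit word at a time; a chunked/asynchronous version
    # would simply yield between slices of this loop.
    for i, word in enumerate(src):
        dst[i] |= word

a = array.array('Q', [0b1010, 0, 1 << 63])
b = array.array('Q', [0b0101, 7, 1])
merge_level(b, a)
print(list(b))  # [15, 7, 9223372036854775809]
```

At 262144 words for a 1 TB disk, this loop is far below the millisecond range even in an interpreter, which is consistent with the 9 ms end-to-end QMP measurement above.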