This is needed to implement the image-fleecing scheme, where we create
a temporary node, mark our active node as the backing node for the temp,
and start backup(sync=none) from the active node to the temp node.
The temp node then represents a kind of snapshot and may be used
for external backup through NBD.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
What was the reason to abandon non-root nodes?
blockdev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/blockdev.c b/blockdev.c
index 56a6b24a0b..d0a5a107f0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3345,7 +3345,7 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
         backup->compress = false;
     }
 
-    bs = qmp_get_root_bs(backup->device, errp);
+    bs = bdrv_lookup_bs(backup->device, backup->device, errp);
     if (!bs) {
         return NULL;
     }
--
2.11.1
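For readers unfamiliar with the fleecing flow the commit message describes, the sequence looks roughly like this at the QMP level. The commands are written out as Python dicts purely for illustration; the node names, filename, and exact option set are assumptions, and setup steps such as nbd-server-start are omitted:

```python
# Hypothetical node names ("drive0", "fleece-tmp") and options; not
# taken from the patch itself.
fleecing_sequence = [
    # 1. Create the temporary node, backed by the active node, so that
    #    reads of unallocated areas fall through to the live image.
    {"execute": "blockdev-add",
     "arguments": {"driver": "qcow2", "node-name": "fleece-tmp",
                   "backing": "drive0",
                   "file": {"driver": "file",
                            "filename": "fleece.qcow2"}}},
    # 2. sync=none backup: old data is copied into the temp node only
    #    when the guest is about to overwrite it (copy-before-write),
    #    so the temp node presents a point-in-time view.
    {"execute": "blockdev-backup",
     "arguments": {"device": "drive0", "target": "fleece-tmp",
                   "sync": "none"}},
    # 3. Export that point-in-time view over NBD for the external
    #    backup client.
    {"execute": "nbd-server-add",
     "arguments": {"device": "fleece-tmp"}},
]

for cmd in fleecing_sequence:
    print(cmd["execute"])
```

Once the external client has read what it needs, the job would typically be cancelled and the temporary node deleted again.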
On 11/09/2017 08:16 AM, Vladimir Sementsov-Ogievskiy wrote:
> This is needed to implement image-fleecing scheme, when we create
> a temporary node, mark our active node to be backing for the temp,
> and start backup(sync=none) from active node to the temp node.
> Temp node then represents a kind of snapshot and may be used
> for external backup through NBD.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>
> What was the reason to abandon non-root nodes?

I think the original restriction was that we didn't know all the
implications of having multiple readers on an intermediate node, so it
was easier to prevent it, with plans to add it later, than to allow it
up front and deal with the fallout. But I think that now we are
sufficiently versed in handling BDS trees with multiple readers, with
proper op-blocking in place; so you are right that we can probably
support it now.

However, I'm a bit worried that there is no documentation change about
this in a .json QAPI file, nor any easy way to introspect via QMP
whether a particular qemu implementation supports this (short of trying
it and seeing whether it works). I'm also thinking that this is 2.12
material, unless we can prove it is fixing a bug for 2.11 that was not
previously present.

> [...]

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Am 09.11.2017 um 17:33 hat Eric Blake geschrieben:
> On 11/09/2017 08:16 AM, Vladimir Sementsov-Ogievskiy wrote:
>> [...]
>> What was the reason to abandon non-root nodes?
>
> I think the original restriction was that we didn't know all the
> implications of having multiple readers on an intermediate node, so it
> was easier to prevent it, with plans to add it later, than to allow it
> up front and deal with the fallout. But I think that now we are
> sufficiently versed in handling BDS trees with multiple readers, with
> proper op-blocking in place; so you are right that we can probably
> support it now.

Op blockers are actually the reason why I'm not so sure. The old
blockers often still only work on the top level, and we haven't fully
replaced them yet. In particular, graph modifications might not be
protected well enough by the new system yet.

But then, backup works only on a single node in the source tree, not on
a whole subchain, so I don't see what other operation could conflict
with a backup job. The only thing is resizes, but that is covered by
the new system. So in the end, I think this specific change should be
okay.

> However, I'm a bit worried that there is no documentation change about
> this in a .json QAPI file, nor any easy way to introspect via QMP
> whether a particular qemu implementation supports this (short of
> trying it and seeing whether it works). I'm also thinking that this is
> 2.12 material, unless we can prove it is fixing a bug for 2.11 that
> was not previously present.

Sounds like another use case for the capability annotations that Markus
wants to introduce for commands in the QAPI schema.

Kevin
On 11/09/2017 09:16 AM, Vladimir Sementsov-Ogievskiy wrote:
> What was the reason to abandon non-root nodes?

Eric had it correct: we were never convinced it would work properly, so
we went with a smaller set.
On 2017-11-09 15:16, Vladimir Sementsov-Ogievskiy wrote:
> This is needed to implement image-fleecing scheme, when we create
> a temporary node, mark our active node to be backing for the temp,
> and start backup(sync=none) from active node to the temp node.
> Temp node then represents a kind of snapshot and may be used
> for external backup through NBD.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>
> What was the reason to abandon non-root nodes?
>
> [...]
>
> -    bs = qmp_get_root_bs(backup->device, errp);
> +    bs = bdrv_lookup_bs(backup->device, backup->device, errp);

I can't find anything wrong with it, but it will need some more
changes.

One is the comment in bdrv_backing_attach() that unsets the
BLOCK_OP_TYPE_BACKUP_SOURCE blocker, which claims that it's safe to do
so because a backing file can never be a backup source anyway.
Question here: Why did that seem to be important?

Other places to look into are all callers of bdrv_op_block_all() that
do not unblock BLOCK_OP_TYPE_BACKUP_SOURCE afterwards (i.e. all but
bdrv_backing_attach()). I suspect this is because they were not sure
whether they could guarantee consistent data... But does that extend to
their children? (Because the exact node that is blocked will still be
seen as blocked by the backup job, so that is safe.) This wouldn't give
us any data corruption, though, because the user just gets garbage data
in their backup -- their fault for using a garbage node, I'd argue.
Maybe it doesn't matter at all.

If you really want to, you can probably already attach some raw node to
any node you want to back up and hey, that's a root node, so
congratulations, you can run blockdev-backup on it. Well, I mean, you
can run it, but it won't do you any good because the before_write
notifiers don't work then. (Same for mirror, by the way -- the dirty
bitmap of that raw node wouldn't get set.)

((By the way, I don't suppose that's how it should work... But I don't
suppose that we want propagation of dirtying towards the BDS roots, do
we? :-/))

Finally, there is the BlockdevBackup and DriveBackup QAPI documentation
that would each need one "root" removed.

Max
On 12/01/2017 02:41 PM, Max Reitz wrote:
> ((By the way, I don't suppose that's how it should work... But I don't
> suppose that we want propagation of dirtying towards the BDS roots, do
> we? :-/))

I have never really satisfactorily explained to myself what bitmaps on
intermediate nodes truly represent or mean.

The simple case is "This layer itself serviced a write request."

If that information is not necessarily meaningful, I'm not sure that's
a problem except in configuration.

...Now, if you wanted to talk about bitmaps that associate with a
Backend instead of a Node...
On 2017-12-04 23:15, John Snow wrote:
> On 12/01/2017 02:41 PM, Max Reitz wrote:
>> [...]
>
> I have never really satisfactorily explained to myself what bitmaps on
> intermediate nodes truly represent or mean.
>
> The simple case is "This layer itself serviced a write request."
>
> If that information is not necessarily meaningful, I'm not sure that's
> a problem except in configuration.
>
> ...Now, if you wanted to talk about bitmaps that associate with a
> Backend instead of a Node...

But it's not about bitmaps on intermediate nodes, quite the opposite.
It's about bitmaps on roots but write requests happening on
intermediate nodes.

Say you have a node I and two filter nodes A and B using it (and they
are OK with shared writers). There is a dirty bitmap on A.

Now when a write request goes through B, I will obviously have changed,
and because A and B are filters, so will A. But the dirty bitmap on A
will still be clean.

My example was that when you run a mirror over A, you won't see
dirtying from B. So you can't e.g. add a throttle driver between a
mirror job and the node you want to mirror, because the dirty bitmap on
the throttle driver will not be affected by accesses to the actual
node.

Max
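Max's scenario can be condensed into a toy model (plain Python, not QEMU code; the classes and names are invented for illustration): with node-local dirtying, a write entering through filter B reaches I, so as a filter A's effective contents change too, yet A's bitmap stays clean:

```python
class Node:
    """Minimal stand-in for a BDS with a node-local dirty bitmap."""
    def __init__(self, name, child=None):
        self.name = name
        self.child = child
        self.dirty = set()          # node-local "bitmap": set of offsets

    def write(self, offset):
        # Only nodes the request actually passes through get dirtied...
        self.dirty.add(offset)
        # ...and the request is forwarded down to the child,
        # never sideways to sibling parents of that child.
        if self.child:
            self.child.write(offset)

i = Node("I")
a = Node("A", child=i)              # filter A carries the dirty bitmap
b = Node("B", child=i)              # filter B shares the same child

b.write(0x1000)                     # a write request goes through B

print(sorted(b.dirty))              # B saw the write
print(sorted(i.dirty))              # so did I
print(sorted(a.dirty))              # but A's bitmap is still clean
```

This is exactly why a mirror (or throttle filter) sitting on A misses writes that arrive via B.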
On 12/04/2017 05:21 PM, Max Reitz wrote:
> On 2017-12-04 23:15, John Snow wrote:
>> [...]
>
> But it's not about bitmaps on intermediate nodes, quite the opposite.
> It's about bitmaps on roots but write requests happening on
> intermediate nodes.

Oh, I see what you're saying. It magically doesn't really change my
opinion, by coincidence!

> Say you have a node I and two filter nodes A and B using it (and they
> are OK with shared writers). There is a dirty bitmap on A.
>
> Now when a write request goes through B, I will obviously have
> changed, and because A and B are filters, so will A. But the dirty
> bitmap on A will still be clean.
>
> My example was that when you run a mirror over A, you won't see
> dirtying from B. So you can't e.g. add a throttle driver between a
> mirror job and the node you want to mirror, because the dirty bitmap
> on the throttle driver will not be affected by accesses to the actual
> node.

Well, in this case I would say that a root BDS is not really any
different from an intermediate one and can't really know what's going
on in the world outside.

At least, I think that's how we model it right now -- we pretend that
we can record the activity of an entire drive graph by putting the
bitmap on the root-most node we can get a hold of and assuming that all
writes are going to go through us.

Clearly this is increasingly false the more we modularise the block
graph.

*uhm*

I would say that a bitmap attached to a BlockBackend should behave in
the way you say: writes to any children should change the bitmap here.

Bitmaps attached to nodes shouldn't worry about such things.
On 2017-12-05 01:48, John Snow wrote:
> On 12/04/2017 05:21 PM, Max Reitz wrote:
>> [...]
>
> Well, in this case I would say that a root BDS is not really any
> different from an intermediate one and can't really know what's going
> on in the world outside.
>
> At least, I think that's how we model it right now -- we pretend that
> we can record the activity of an entire drive graph by putting the
> bitmap on the root-most node we can get a hold of and assuming that
> all writes are going to go through us.

Well, yeah, I know we do. But I consider this counter-intuitive, and if
something is counter-intuitive it's often a bug.

> Clearly this is increasingly false the more we modularise the block
> graph.
>
> *uhm*
>
> I would say that a bitmap attached to a BlockBackend should behave in
> the way you say: writes to any children should change the bitmap here.
>
> bitmaps attached to nodes shouldn't worry about such things.

Do we have bitmaps attached to BlockBackends? I sure hope not.

We should not have any interface that requires the use of BlockBackends
by now. If we do, that's something that has to be fixed.

Max
On 12/08/2017 09:30 AM, Max Reitz wrote:
> On 2017-12-05 01:48, John Snow wrote:
>> [...]
>>
>> I would say that a bitmap attached to a BlockBackend should behave in
>> the way you say: writes to any children should change the bitmap
>> here.
>>
>> bitmaps attached to nodes shouldn't worry about such things.
>
> Do we have bitmaps attached to BlockBackends? I sure hope not.
>
> We should not have any interface that requires the use of
> BlockBackends by now. If we do, that's something that has to be fixed.
>
> Max

I'm not sure what the right paradigm is anymore, then.

A node is just a node, but something has to represent the "drive" as
far as the device model sees it. I thought that *was* the BlockBackend,
but is it not?

--js
On 2017-12-08 18:09, John Snow wrote:
> On 12/08/2017 09:30 AM, Max Reitz wrote:
>> [...]
>
> I'm not sure what the right paradigm is anymore, then.
>
> A node is just a node, but something has to represent the "drive" as
> far as the device model sees it. I thought that *was* the
> BlockBackend, but is it not?

Yes, and on the other side the BB represents the device model for the
block layer. But the thing is that the user should be blissfully
unaware... Or do you want to make bitmaps attachable to guest devices
(through the QOM path or ID) instead?

(The block layer would then internally translate that to a BB. But
that's a bad internal interface because the bitmap is still attached to
a BDS, and it's a bad external interface because currently you can
attach bitmaps to nodes and only to nodes...)

Max
On 12/11/2017 11:31 AM, Max Reitz wrote:
> On 2017-12-08 18:09, John Snow wrote:
>> [...]
>
> Yes, and on the other side the BB represents the device model for the
> block layer. But the thing is that the user should be blissfully
> unaware... Or do you want to make bitmaps attachable to guest devices
> (through the QOM path or ID) instead?

OK, sure -- the user can specify a device model to attach it to instead
of a node. They don't have to be aware of the BB itself. The
implementation, though, I imagine would associate it with that BB.

> (The block layer would then internally translate that to a BB. But
> that's a bad internal interface because the bitmap is still attached
> to a BDS, and it's a bad external interface because currently you can
> attach bitmaps to nodes and only to nodes...)

What if the type of bitmap we want to use to track trans-node changes
was not attached to a BDS? That'd be one way to obviously discriminate
between "This tracks tree-wide changes" and "This tracks node-local
changes."

Implementation-wise I don't have any actual thought as to how this
could possibly be efficient. Maybe a bitmap reference at each BDS that
is a child of that particular BB?

On attach, the BDS gets a set-only reference to that bitmap.
On detach, we remove the reference.

Then, any writes anywhere in the tree will coagulate in one place. It
may or may not be particularly true or correct, because a write down
the tree doesn't necessarily mean the visible data has changed at the
top layer, but I don't think we have anything institutionally set in
place to determine if we are changing visible data up-graph with a
write down-graph.
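The set-only-reference idea above might be sketched like this (a toy model with invented names, not actual QEMU structures): on attach, every node below the attachment point gets a set-only reference to the bitmap, so writes anywhere in the subtree coagulate in one place:

```python
class Node:
    """Toy model: nodes propagate a set-only reference to a tree-wide
    bitmap down to their children on attach."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.refs = []              # set-only references to tree-wide bitmaps

    def attach(self, bitmap):
        self.refs.append(bitmap)
        for c in self.children:
            c.attach(bitmap)        # on detach, these would be removed again

    def write(self, offset):
        for bm in self.refs:
            bm.add(offset)          # writes anywhere coagulate here
        for c in self.children:
            c.write(offset)

intermediate = Node("I")
a = Node("A", children=[intermediate])
b = Node("B", children=[intermediate])

tree_bitmap = set()
a.attach(tree_bitmap)               # attach at A; I gets a reference too

b.write(0x2000)                     # a write through the *other* filter
print(sorted(tree_bitmap))          # still dirties the bitmap attached at A
```

Note the caveat from the paragraph above still applies: this can over-report, because a write far down the graph does not necessarily change the data visible at the top layer.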
On 2017-12-11 17:47, John Snow wrote:
> On 12/11/2017 11:31 AM, Max Reitz wrote:
>> [...]
>
> OK, sure -- the user can specify a device model to attach it to
> instead of a node. They don't have to be aware of the BB itself.
>
> The implementation though, I imagine it associates with that BB.

But that would be a whole new implementation...

> What if the type of bitmap we want to track trans-node changes was not
> attached to a BDS? That'd be one way to obviously discriminate between
> "This tracks tree-wide changes" and "This tracks node-local changes."

A new type of bitmap? :-/

> Implementation wise I don't have any actual thought as to how this
> could possibly be efficient. Maybe a bitmap reference at each BDS that
> is a child of that particular BB?
>
> On attach, the BDS gets a set-only reference to that bitmap.
> On detach, we remove the reference.
>
> Then, any writes anywhere in the tree will coagulate in one place. It
> may or may not be particularly true or correct, because a write down
> the tree doesn't necessarily mean the visible data has changed at the
> top layer, but I don't think we have anything institutionally set in
> place to determine if we are changing visible data up-graph with a
> write down-graph.

Hmmm... The first thing to clarify is whether we want two types of
bitmaps. I don't think there is much use to node-local bitmaps; all
bitmaps should track every dirtying of their associated node (wherever
it comes from).

However, if that is too much of a performance penalty... Then we
probably do have to distinguish between the two so that users only add
tree-wide bitmaps when they need them.

OTOH, I guess that in the common case it's not a performance penalty at
all, if done right. Usually, a node you attach a bitmap to will not
have child nodes that are written to by other nodes. So in the common
case your tree-wide bitmap is just a plain local bitmap and thus just
as efficient.

And if some child node is indeed written to by some other node... I
think you always want a tree-wide bitmap anyway.

So I think all bitmaps should be tree-wide, and the fact that they
currently are not is a bug.

Max
On 12/11/2017 12:05 PM, Max Reitz wrote: > On 2017-12-11 17:47, John Snow wrote: >> On 12/11/2017 11:31 AM, Max Reitz wrote: >>> On 2017-12-08 18:09, John Snow wrote: >>>> On 12/08/2017 09:30 AM, Max Reitz wrote: >>>>> On 2017-12-05 01:48, John Snow wrote: >>>>>> >>>>>> I would say that a bitmap attached to a BlockBackend should behave in >>>>>> the way you say: writes to any children should change the bitmap here. >>>>>> >>>>>> bitmaps attached to nodes shouldn't worry about such things. >>>>> >>>>> Do we have bitmaps attached to BlockBackends? I sure hope not. >>>>> >>>>> We should not have any interface that requires the use of BlockBackends >>>>> by now. If we do, that's something that has to be fixed. >>>>> >>>>> >>>> >>>> I'm not sure what the right paradigm is anymore, then. >>>> >>>> A node is just a node, but something has to represent the "drive" as far >>>> as the device model sees it. I thought that *was* the BlockBackend, but >>>> is it not? >>> >>> Yes, and on the other side the BB represents the device model for the >>> block layer. But the thing is that the user should be blissfully >>> unaware... Or do you want to make bitmaps attachable to guest devices >>> (through the QOM path or ID) instead? >>> >> >> OK, sure -- the user can specify a device model to attach it to instead >> of a node. They don't have to be aware of the BB itself. >> >> The implementation though, I imagine it associates with that BB. > > But that would be a whole new implementation... > Yeah. >>> (The block layer would then internally translate that to a BB. But >>> that's a bad internal interface because the bitmap is still attached to >>> a BDS, and it's a bad external interface because currently you can >>> attach bitmaps to nodes and only to nodes...) >> >> What if the type of bitmap we want to track trans-node changes was not >> attached to a BDS? That'd be one way to obviously discriminate between >> "This tracks tree-wide changes" and "This tracks node-local changes." 
>
> A new type of bitmap? :-/

"type" may be too strong of a word, but... all the ones we use currently
are node-local.

>> Implementation wise I don't have any actual thought as to how this
>> could possibly be efficient. Maybe a bitmap reference at each BDS that
>> is a child of that particular BB?
>>
>> On attach, the BDS gets a set-only reference to that bitmap.
>> On detach, we remove the reference.
>>
>> Then, any writes anywhere in the tree will coagulate in one place. It
>> may or may not be particularly true or correct, because a write down
>> the tree doesn't necessarily mean the visible data has changed at the
>> top layer, but I don't think we have anything institutionally set in
>> place to determine if we are changing visible data up-graph with a
>> write down-graph.
>
> Hmmm... The first thing to clarify is whether we want two types of
> bitmaps. I don't think there is much use to node-local bitmaps, all
> bitmaps should track every dirtying of their associated node (wherever
> it comes from).

I don't disagree, but if that's the case then attaching bitmaps to
non-root nodes should be prohibited and we ought to shore up the
semantic idea that bitmaps must collect writes from their children. I'm
not sure that's any better inherently than expanding the idea to include
obvious differences between node-local and tree-wide bitmaps. Currently,
we just confuse the two concepts and do a poor job of either.

> However, if that is too much of a performance penalty... Then we
> probably do have to distinguish between the two so that users only add
> tree-wide bitmaps when they need them.
>
> OTOH, I guess that in the common case it's not a performance penalty
> at all, if done right. Usually, a node you attach a bitmap to will not
> have child nodes that are written to by other nodes. So in the common
> case your tree-wide bitmap is just a plain local bitmap and thus just
> as efficient.
>
> And if some child node is indeed written to by some other node... I
> think you always want a tree-wide bitmap anyway.
>
> So I think all bitmaps should be tree-wide and the fact that they
> currently are not is a bug.

I could agree with this viewpoint, but that means we need to disallow
bitmaps to be attached to any non-root BDS, and then implement an
ability to give set-only references to children. That's a bit of an
extreme tactic that might prevent us from ever using node-local bitmaps
with anything resembling a sane API in the future. Maybe that's OK, but
it is a commitment.

I'm not sure there are any good reasons to have node-local bitmaps, but
the idea was one I more or less inherited when I started working in this
area: specifying specific BDS nodes with "device=" was going out of
vogue, but what exactly a BB was wasn't really defined yet, so I got
stuck with a QMP interface named "node=" and an implementation that has
an affinity for nodes.