[Qemu-devel] [PATCH] blockdev-backup: enable non-root nodes for backup

Vladimir Sementsov-Ogievskiy posted 1 patch 6 years, 5 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20171109141631.25688-1-vsementsov@virtuozzo.com
Test checkpatch passed
Test docker passed
Test ppc passed
Test s390x passed
blockdev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[Qemu-devel] [PATCH] blockdev-backup: enable non-root nodes for backup
Posted by Vladimir Sementsov-Ogievskiy 6 years, 5 months ago
This is needed to implement image-fleecing scheme, when we create
a temporary node, mark our active node to be backing for the temp,
and start backup(sync=none) from active node to the temp node.
Temp node then represents a kind of snapshot and may be used
for external backup through NBD.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

What was the reason to abandon non-root nodes?

 blockdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 56a6b24a0b..d0a5a107f0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3345,7 +3345,7 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
         backup->compress = false;
     }
 
-    bs = qmp_get_root_bs(backup->device, errp);
+    bs = bdrv_lookup_bs(backup->device, backup->device, errp);
     if (!bs) {
         return NULL;
     }
-- 
2.11.1


Re: [Qemu-devel] [PATCH] blockdev-backup: enable non-root nodes for backup
Posted by Eric Blake 6 years, 5 months ago
On 11/09/2017 08:16 AM, Vladimir Sementsov-Ogievskiy wrote:
> This is needed to implement image-fleecing scheme, when we create
> a temporary node, mark our active node to be backing for the temp,
> and start backup(sync=none) from active node to the temp node.
> Temp node then represents a kind of snapshot and may be used
> for external backup through NBD.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> What was the reason to abandon non-root nodes?

I think the original restriction was that we didn't know all the
implications to having multiple readers to an intermediate node, so it
was easier to prevent it with plans to add it later than to add it up
front and deal with the fallout.  But I think that now we are
sufficiently versed in handling BDS trees with multiple readers, with
proper op-blocking in place; so you are right that we can probably
support it now.

However, I'm a bit worried that there is no documentation change about
this in a .json QAPI file, nor any easy way to introspect via QMP
whether a particular qemu implementation supports this (short of trying
it and seeing whether it works).  I'm also thinking that this is 2.12
material, unless we can prove it is fixing a bug for 2.11 that was not
previously present.

> 
>  blockdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 56a6b24a0b..d0a5a107f0 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3345,7 +3345,7 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
>          backup->compress = false;
>      }
>  
> -    bs = qmp_get_root_bs(backup->device, errp);
> +    bs = bdrv_lookup_bs(backup->device, backup->device, errp);
>      if (!bs) {
>          return NULL;
>      }
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH] blockdev-backup: enable non-root nodes for backup
Posted by Kevin Wolf 6 years, 5 months ago
Am 09.11.2017 um 17:33 hat Eric Blake geschrieben:
> On 11/09/2017 08:16 AM, Vladimir Sementsov-Ogievskiy wrote:
> > This is needed to implement image-fleecing scheme, when we create
> > a temporary node, mark our active node to be backing for the temp,
> > and start backup(sync=none) from active node to the temp node.
> > Temp node then represents a kind of snapshot and may be used
> > for external backup through NBD.
> > 
> > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > ---
> > 
> > What was the reason to abandon non-root nodes?
> 
> I think the original restriction was that we didn't know all the
> implications to having multiple readers to an intermediate node, so it
> was easier to prevent it with plans to add it later than to add it up
> front and deal with the fallout.  But I think that now we are
> sufficiently versed in handling BDS trees with multiple readers, with
> proper op-blocking in place; so you are right that we can probably
> support it now.

Op blockers are actually the reason why I'm not so sure. The old
blockers often still only work on the top level, and we haven't fully
replaced them yet. In particular, graph modifications might not be
protected well enough by the new system yet.

But then, backup works only on a single node in the source tree, not on
a whole subchain, so I don't see what other operation could conflict
with a backup job. The only thing is resizes, but that is covered by the
new system.

So in the end, I think this specific change should be okay.

> However, I'm a bit worried that there is no documentation change about
> this in a .json QAPI file, nor any easy way to introspect via QMP
> whether a particular qemu implementation supports this (short of trying
> it and seeing whether it works).  I'm also thinking that this is 2.12
> material, unless we can prove it is fixing a bug for 2.11 that was not
> previously present.

Sounds like another use case for the capability annotations that Markus
wants to introduce for commands in the QAPI schema.

Kevin
Re: [Qemu-devel] [Qemu-block] [PATCH] blockdev-backup: enable non-root nodes for backup
Posted by John Snow 6 years, 5 months ago

On 11/09/2017 09:16 AM, Vladimir Sementsov-Ogievskiy wrote:
> What was the reason to abandon non-root nodes?

Eric had it correct: we were never convinced it work would properly, so
we went with a smaller set.

Re: [Qemu-devel] [PATCH] blockdev-backup: enable non-root nodes for backup
Posted by Max Reitz 6 years, 4 months ago
On 2017-11-09 15:16, Vladimir Sementsov-Ogievskiy wrote:
> This is needed to implement image-fleecing scheme, when we create
> a temporary node, mark our active node to be backing for the temp,
> and start backup(sync=none) from active node to the temp node.
> Temp node then represents a kind of snapshot and may be used
> for external backup through NBD.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> What was the reason to abandon non-root nodes?
> 
>  blockdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 56a6b24a0b..d0a5a107f0 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3345,7 +3345,7 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
>          backup->compress = false;
>      }
>  
> -    bs = qmp_get_root_bs(backup->device, errp);
> +    bs = bdrv_lookup_bs(backup->device, backup->device, errp);
>      if (!bs) {
>          return NULL;
>      }

I can't find anything wrong with it, but it will need some more changes.

One is the comment in bdrv_backing_attach() that unsets the
BLOCK_OP_TYPE_BACKUP_SOURCE blocker which claims that it's safe to do so
because a backing file can never be a backup source anyway.  Question
here: Why did that seem to be important?

Other places to look into are all callers of bdrv_op_block_all() that do
not unblock BLOCK_OP_TYPE_BACKUP_SOURCE afterwards (i.e. all but
bdrv_backing_attach()).  I suspect this is because they were not sure
whether they could guarantee consistent data...  But does that extend to
their children?  (Because the exact node that is blocked will still be
seen as blocked by the backup job, so that is safe.)  This wouldn't give
us any data corruption, though, because the user just gets garbage data
in their backup -- their fault for using a garbage node, I'd argue.

Maybe it doesn't matter at all.  If you really want to you can probably
already attach some raw node to any node you want to backup and hey,
that's a root node, so congratulations, you can run blockdev-backup on
it.  Well, I mean, you can run it but it won't do you any good because
the before_write notifiers don't work then.

(Same for mirror, by the way -- the dirty bitmap of that raw node
wouldn't get set.)

((By the way, I don't suppose that's how it should work...  But I don't
suppose that we want propagation of dirtying towards the BDS roots, do
we? :-/))


Finally, there is the BlockdevBackup and DriveBackup QAPI documentation
that would each need one "root" removed.

Max

Re: [Qemu-devel] [Qemu-block] [PATCH] blockdev-backup: enable non-root nodes for backup
Posted by John Snow 6 years, 4 months ago

On 12/01/2017 02:41 PM, Max Reitz wrote:
> ((By the way, I don't suppose that's how it should work...  But I don't
> suppose that we want propagation of dirtying towards the BDS roots, do
> we? :-/))

I have never really satisfactorily explained to myself what bitmaps on
intermediate notes truly represent or mean.

The simple case is "This layer itself serviced a write request."

If that information is not necessarily meaningful, I'm not sure that's a
problem except in configuration.


...Now, if you wanted to talk about bitmaps that associate with a
Backend instead of a Node...

Re: [Qemu-devel] [Qemu-block] [PATCH] blockdev-backup: enable non-root nodes for backup
Posted by Max Reitz 6 years, 4 months ago
On 2017-12-04 23:15, John Snow wrote:
> 
> 
> On 12/01/2017 02:41 PM, Max Reitz wrote:
>> ((By the way, I don't suppose that's how it should work...  But I don't
>> suppose that we want propagation of dirtying towards the BDS roots, do
>> we? :-/))
> 
> I have never really satisfactorily explained to myself what bitmaps on
> intermediate notes truly represent or mean.
> 
> The simple case is "This layer itself serviced a write request."
> 
> If that information is not necessarily meaningful, I'm not sure that's a
> problem except in configuration.
> 
> 
> ...Now, if you wanted to talk about bitmaps that associate with a
> Backend instead of a Node...

But it's not about bitmaps on intermediate nodes, quite the opposite.
It's about bitmaps on roots but write requests happening on intermediate
nodes.

Say you have a node I and two filter nodes A and B using it (and they
are OK with shared writers).  There is a dirty bitmap on A.

Now when a write request goes through B, I will obviously have changed,
and because A and B are filters, so will A.  But the dirty bitmap on A
will still be clean.

My example was that when you run a mirror over A, you won't see dirtying
from B.  So you can't e.g. add a throttle driver between a mirror job
and the node you want to mirror, because the dirty bitmap on the
throttle driver will not be affected by accesses to the actual node.

Max

Re: [Qemu-devel] [Qemu-block] [PATCH] blockdev-backup: enable non-root nodes for backup
Posted by John Snow 6 years, 4 months ago

On 12/04/2017 05:21 PM, Max Reitz wrote:
> On 2017-12-04 23:15, John Snow wrote:
>>
>>
>> On 12/01/2017 02:41 PM, Max Reitz wrote:
>>> ((By the way, I don't suppose that's how it should work...  But I don't
>>> suppose that we want propagation of dirtying towards the BDS roots, do
>>> we? :-/))
>>
>> I have never really satisfactorily explained to myself what bitmaps on
>> intermediate notes truly represent or mean.
>>
>> The simple case is "This layer itself serviced a write request."
>>
>> If that information is not necessarily meaningful, I'm not sure that's a
>> problem except in configuration.
>>
>>
>> ...Now, if you wanted to talk about bitmaps that associate with a
>> Backend instead of a Node...
> 
> But it's not about bitmaps on intermediate nodes, quite the opposite.
> It's about bitmaps on roots but write requests happening on intermediate
> nodes.
> 

Oh, I see what you're saying. It magically doesn't really change my
opinion, by coincidence!

> Say you have a node I and two filter nodes A and B using it (and they
> are OK with shared writers).  There is a dirty bitmap on A.
> 
> Now when a write request goes through B, I will obviously have changed,
> and because A and B are filters, so will A.  But the dirty bitmap on A
> will still be clean.
> 
> My example was that when you run a mirror over A, you won't see dirtying
> from B.  So you can't e.g. add a throttle driver between a mirror job
> and the node you want to mirror, because the dirty bitmap on the
> throttle driver will not be affected by accesses to the actual node.
> 
> Max
> 

Well, in this case I would say that a root BDS is not really any
different from an intermediate one and can't really know what's going on
in the world outside.

At least, I think that's how we model it right now -- we pretend that we
can record the activity of an entire drive graph by putting the bitmap
on the root-most node we can get a hold of and assuming that all writes
are going to go through us.

Clearly this is increasingly false the more we modularise the block graph.


*uhm*


I would say that a bitmap attached to a BlockBackend should behave in
the way you say: writes to any children should change the bitmap here.

bitmaps attached to nodes shouldn't worry about such things.

Re: [Qemu-devel] [Qemu-block] [PATCH] blockdev-backup: enable non-root nodes for backup
Posted by Max Reitz 6 years, 4 months ago
On 2017-12-05 01:48, John Snow wrote:
> 
> 
> On 12/04/2017 05:21 PM, Max Reitz wrote:
>> On 2017-12-04 23:15, John Snow wrote:
>>>
>>>
>>> On 12/01/2017 02:41 PM, Max Reitz wrote:
>>>> ((By the way, I don't suppose that's how it should work...  But I don't
>>>> suppose that we want propagation of dirtying towards the BDS roots, do
>>>> we? :-/))
>>>
>>> I have never really satisfactorily explained to myself what bitmaps on
>>> intermediate notes truly represent or mean.
>>>
>>> The simple case is "This layer itself serviced a write request."
>>>
>>> If that information is not necessarily meaningful, I'm not sure that's a
>>> problem except in configuration.
>>>
>>>
>>> ...Now, if you wanted to talk about bitmaps that associate with a
>>> Backend instead of a Node...
>>
>> But it's not about bitmaps on intermediate nodes, quite the opposite.
>> It's about bitmaps on roots but write requests happening on intermediate
>> nodes.
>>
> 
> Oh, I see what you're saying. It magically doesn't really change my
> opinion, by coincidence!
> 
>> Say you have a node I and two filter nodes A and B using it (and they
>> are OK with shared writers).  There is a dirty bitmap on A.
>>
>> Now when a write request goes through B, I will obviously have changed,
>> and because A and B are filters, so will A.  But the dirty bitmap on A
>> will still be clean.
>>
>> My example was that when you run a mirror over A, you won't see dirtying
>> from B.  So you can't e.g. add a throttle driver between a mirror job
>> and the node you want to mirror, because the dirty bitmap on the
>> throttle driver will not be affected by accesses to the actual node.
>>
>> Max
>>
> 
> Well, in this case I would say that a root BDS is not really any
> different from an intermediate one and can't really know what's going on
> in the world outside.
> 
> At least, I think that's how we model it right now -- we pretend that we
> can record the activity of an entire drive graph by putting the bitmap
> on the root-most node we can get a hold of and assuming that all writes
> are going to go through us.

Well, yeah, I know we do.  But I consider this counter-intuitive and if
something is counter-intuitive it's often a bug.

> Clearly this is increasingly false the more we modularise the block graph.
> 
> 
> *uhm*
> 
> 
> I would say that a bitmap attached to a BlockBackend should behave in
> the way you say: writes to any children should change the bitmap here.
> 
> bitmaps attached to nodes shouldn't worry about such things.

Do we have bitmaps attached to BlockBackends?  I sure hope not.

We should not have any interface that requires the use of BlockBackends
by now.  If we do, that's something that has to be fixed.

Max

Re: [Qemu-devel] [Qemu-block] [PATCH] blockdev-backup: enable non-root nodes for backup
Posted by John Snow 6 years, 4 months ago

On 12/08/2017 09:30 AM, Max Reitz wrote:
> On 2017-12-05 01:48, John Snow wrote:
>>
>>
>> On 12/04/2017 05:21 PM, Max Reitz wrote:
>>> On 2017-12-04 23:15, John Snow wrote:
>>>>
>>>>
>>>> On 12/01/2017 02:41 PM, Max Reitz wrote:
>>>>> ((By the way, I don't suppose that's how it should work...  But I don't
>>>>> suppose that we want propagation of dirtying towards the BDS roots, do
>>>>> we? :-/))
>>>>
>>>> I have never really satisfactorily explained to myself what bitmaps on
>>>> intermediate notes truly represent or mean.
>>>>
>>>> The simple case is "This layer itself serviced a write request."
>>>>
>>>> If that information is not necessarily meaningful, I'm not sure that's a
>>>> problem except in configuration.
>>>>
>>>>
>>>> ...Now, if you wanted to talk about bitmaps that associate with a
>>>> Backend instead of a Node...
>>>
>>> But it's not about bitmaps on intermediate nodes, quite the opposite.
>>> It's about bitmaps on roots but write requests happening on intermediate
>>> nodes.
>>>
>>
>> Oh, I see what you're saying. It magically doesn't really change my
>> opinion, by coincidence!
>>
>>> Say you have a node I and two filter nodes A and B using it (and they
>>> are OK with shared writers).  There is a dirty bitmap on A.
>>>
>>> Now when a write request goes through B, I will obviously have changed,
>>> and because A and B are filters, so will A.  But the dirty bitmap on A
>>> will still be clean.
>>>
>>> My example was that when you run a mirror over A, you won't see dirtying
>>> from B.  So you can't e.g. add a throttle driver between a mirror job
>>> and the node you want to mirror, because the dirty bitmap on the
>>> throttle driver will not be affected by accesses to the actual node.
>>>
>>> Max
>>>
>>
>> Well, in this case I would say that a root BDS is not really any
>> different from an intermediate one and can't really know what's going on
>> in the world outside.
>>
>> At least, I think that's how we model it right now -- we pretend that we
>> can record the activity of an entire drive graph by putting the bitmap
>> on the root-most node we can get a hold of and assuming that all writes
>> are going to go through us.
> 
> Well, yeah, I know we do.  But I consider this counter-intuitive and if
> something is counter-intuitive it's often a bug.
> 
>> Clearly this is increasingly false the more we modularise the block graph.
>>
>>
>> *uhm*
>>
>>
>> I would say that a bitmap attached to a BlockBackend should behave in
>> the way you say: writes to any children should change the bitmap here.
>>
>> bitmaps attached to nodes shouldn't worry about such things.
> 
> Do we have bitmaps attached to BlockBackends?  I sure hope not.
> 
> We should not have any interface that requires the use of BlockBackends
> by now.  If we do, that's something that has to be fixed.
> 
> Max
> 

I'm not sure what the right paradigm is anymore, then.

A node is just a node, but something has to represent the "drive" as far
as the device model sees it. I thought that *was* the BlockBackend, but
is it not?

--js

Re: [Qemu-devel] [Qemu-block] [PATCH] blockdev-backup: enable non-root nodes for backup
Posted by Max Reitz 6 years, 4 months ago
On 2017-12-08 18:09, John Snow wrote:
> 
> 
> On 12/08/2017 09:30 AM, Max Reitz wrote:
>> On 2017-12-05 01:48, John Snow wrote:
>>>
>>>
>>> On 12/04/2017 05:21 PM, Max Reitz wrote:
>>>> On 2017-12-04 23:15, John Snow wrote:
>>>>>
>>>>>
>>>>> On 12/01/2017 02:41 PM, Max Reitz wrote:
>>>>>> ((By the way, I don't suppose that's how it should work...  But I don't
>>>>>> suppose that we want propagation of dirtying towards the BDS roots, do
>>>>>> we? :-/))
>>>>>
>>>>> I have never really satisfactorily explained to myself what bitmaps on
>>>>> intermediate notes truly represent or mean.
>>>>>
>>>>> The simple case is "This layer itself serviced a write request."
>>>>>
>>>>> If that information is not necessarily meaningful, I'm not sure that's a
>>>>> problem except in configuration.
>>>>>
>>>>>
>>>>> ...Now, if you wanted to talk about bitmaps that associate with a
>>>>> Backend instead of a Node...
>>>>
>>>> But it's not about bitmaps on intermediate nodes, quite the opposite.
>>>> It's about bitmaps on roots but write requests happening on intermediate
>>>> nodes.
>>>>
>>>
>>> Oh, I see what you're saying. It magically doesn't really change my
>>> opinion, by coincidence!
>>>
>>>> Say you have a node I and two filter nodes A and B using it (and they
>>>> are OK with shared writers).  There is a dirty bitmap on A.
>>>>
>>>> Now when a write request goes through B, I will obviously have changed,
>>>> and because A and B are filters, so will A.  But the dirty bitmap on A
>>>> will still be clean.
>>>>
>>>> My example was that when you run a mirror over A, you won't see dirtying
>>>> from B.  So you can't e.g. add a throttle driver between a mirror job
>>>> and the node you want to mirror, because the dirty bitmap on the
>>>> throttle driver will not be affected by accesses to the actual node.
>>>>
>>>> Max
>>>>
>>>
>>> Well, in this case I would say that a root BDS is not really any
>>> different from an intermediate one and can't really know what's going on
>>> in the world outside.
>>>
>>> At least, I think that's how we model it right now -- we pretend that we
>>> can record the activity of an entire drive graph by putting the bitmap
>>> on the root-most node we can get a hold of and assuming that all writes
>>> are going to go through us.
>>
>> Well, yeah, I know we do.  But I consider this counter-intuitive and if
>> something is counter-intuitive it's often a bug.
>>
>>> Clearly this is increasingly false the more we modularise the block graph.
>>>
>>>
>>> *uhm*
>>>
>>>
>>> I would say that a bitmap attached to a BlockBackend should behave in
>>> the way you say: writes to any children should change the bitmap here.
>>>
>>> bitmaps attached to nodes shouldn't worry about such things.
>>
>> Do we have bitmaps attached to BlockBackends?  I sure hope not.
>>
>> We should not have any interface that requires the use of BlockBackends
>> by now.  If we do, that's something that has to be fixed.
>>
>> Max
>>
> 
> I'm not sure what the right paradigm is anymore, then.
> 
> A node is just a node, but something has to represent the "drive" as far
> as the device model sees it. I thought that *was* the BlockBackend, but
> is it not?

Yes, and on the other side the BB represents the device model for the
block layer.  But the thing is that the user should be blissfully
unaware...  Or do you want to make bitmaps attachable to guest devices
(through the QOM path or ID) instead?

(The block layer would then internally translate that to a BB.  But
that's a bad internal interface because the bitmap is still attached to
a BDS, and it's a bad external interface because currently you can
attach bitmaps to nodes and only to nodes...)

Max

Re: [Qemu-devel] [Qemu-block] [PATCH] blockdev-backup: enable non-root nodes for backup
Posted by John Snow 6 years, 4 months ago

On 12/11/2017 11:31 AM, Max Reitz wrote:
> On 2017-12-08 18:09, John Snow wrote:
>>
>>
>> On 12/08/2017 09:30 AM, Max Reitz wrote:
>>> On 2017-12-05 01:48, John Snow wrote:
>>>>
>>>>
>>>> On 12/04/2017 05:21 PM, Max Reitz wrote:
>>>>> On 2017-12-04 23:15, John Snow wrote:
>>>>>>
>>>>>>
>>>>>> On 12/01/2017 02:41 PM, Max Reitz wrote:
>>>>>>> ((By the way, I don't suppose that's how it should work...  But I don't
>>>>>>> suppose that we want propagation of dirtying towards the BDS roots, do
>>>>>>> we? :-/))
>>>>>>
>>>>>> I have never really satisfactorily explained to myself what bitmaps on
>>>>>> intermediate notes truly represent or mean.
>>>>>>
>>>>>> The simple case is "This layer itself serviced a write request."
>>>>>>
>>>>>> If that information is not necessarily meaningful, I'm not sure that's a
>>>>>> problem except in configuration.
>>>>>>
>>>>>>
>>>>>> ...Now, if you wanted to talk about bitmaps that associate with a
>>>>>> Backend instead of a Node...
>>>>>
>>>>> But it's not about bitmaps on intermediate nodes, quite the opposite.
>>>>> It's about bitmaps on roots but write requests happening on intermediate
>>>>> nodes.
>>>>>
>>>>
>>>> Oh, I see what you're saying. It magically doesn't really change my
>>>> opinion, by coincidence!
>>>>
>>>>> Say you have a node I and two filter nodes A and B using it (and they
>>>>> are OK with shared writers).  There is a dirty bitmap on A.
>>>>>
>>>>> Now when a write request goes through B, I will obviously have changed,
>>>>> and because A and B are filters, so will A.  But the dirty bitmap on A
>>>>> will still be clean.
>>>>>
>>>>> My example was that when you run a mirror over A, you won't see dirtying
>>>>> from B.  So you can't e.g. add a throttle driver between a mirror job
>>>>> and the node you want to mirror, because the dirty bitmap on the
>>>>> throttle driver will not be affected by accesses to the actual node.
>>>>>
>>>>> Max
>>>>>
>>>>
>>>> Well, in this case I would say that a root BDS is not really any
>>>> different from an intermediate one and can't really know what's going on
>>>> in the world outside.
>>>>
>>>> At least, I think that's how we model it right now -- we pretend that we
>>>> can record the activity of an entire drive graph by putting the bitmap
>>>> on the root-most node we can get a hold of and assuming that all writes
>>>> are going to go through us.
>>>
>>> Well, yeah, I know we do.  But I consider this counter-intuitive and if
>>> something is counter-intuitive it's often a bug.
>>>
>>>> Clearly this is increasingly false the more we modularise the block graph.
>>>>
>>>>
>>>> *uhm*
>>>>
>>>>
>>>> I would say that a bitmap attached to a BlockBackend should behave in
>>>> the way you say: writes to any children should change the bitmap here.
>>>>
>>>> bitmaps attached to nodes shouldn't worry about such things.
>>>
>>> Do we have bitmaps attached to BlockBackends?  I sure hope not.
>>>
>>> We should not have any interface that requires the use of BlockBackends
>>> by now.  If we do, that's something that has to be fixed.
>>>
>>> Max
>>>
>>
>> I'm not sure what the right paradigm is anymore, then.
>>
>> A node is just a node, but something has to represent the "drive" as far
>> as the device model sees it. I thought that *was* the BlockBackend, but
>> is it not?
> 
> Yes, and on the other side the BB represents the device model for the
> block layer.  But the thing is that the user should be blissfully
> unaware...  Or do you want to make bitmaps attachable to guest devices
> (through the QOM path or ID) instead?
> 

OK, sure -- the user can specify a device model to attach it to instead
of a node. They don't have to be aware of the BB itself.

The implementation though, I imagine it associates with that BB.

> (The block layer would then internally translate that to a BB.  But
> that's a bad internal interface because the bitmap is still attached to
> a BDS, and it's a bad external interface because currently you can
> attach bitmaps to nodes and only to nodes...)

What if the type of bitmap we want to track trans-node changes was not
attached to a BDS? That'd be one way to obviously discriminate between
"This tracks tree-wide changes" and "This tracks node-local changes."

Implementation wise I don't have any actual thought as to how this could
possibly be efficient. Maybe a bitmap reference at each BDS that is a
child of that particular BB?

On attach, the BDS gets a set-only reference to that bitmap.
On detach, we remove the reference.

Then, any writes anywhere in the tree will coagulate in one place. It
may or may not be particularly true or correct, because a write down the
tree doesn't necessarily mean the visible data has changed at the top
layer, but I don't think we have anything institutionally set in place
to determine if we are changing visible data up-graph with a write
down-graph.

Re: [Qemu-devel] [Qemu-block] [PATCH] blockdev-backup: enable non-root nodes for backup
Posted by Max Reitz 6 years, 4 months ago
On 2017-12-11 17:47, John Snow wrote:
> 
> 
> On 12/11/2017 11:31 AM, Max Reitz wrote:
>> On 2017-12-08 18:09, John Snow wrote:
>>>
>>>
>>> On 12/08/2017 09:30 AM, Max Reitz wrote:
>>>> On 2017-12-05 01:48, John Snow wrote:
>>>>>
>>>>>
>>>>> On 12/04/2017 05:21 PM, Max Reitz wrote:
>>>>>> On 2017-12-04 23:15, John Snow wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 12/01/2017 02:41 PM, Max Reitz wrote:
>>>>>>>> ((By the way, I don't suppose that's how it should work...  But I don't
>>>>>>>> suppose that we want propagation of dirtying towards the BDS roots, do
>>>>>>>> we? :-/))
>>>>>>>
>>>>>>> I have never really satisfactorily explained to myself what bitmaps on
>>>>>>> intermediate notes truly represent or mean.
>>>>>>>
>>>>>>> The simple case is "This layer itself serviced a write request."
>>>>>>>
>>>>>>> If that information is not necessarily meaningful, I'm not sure that's a
>>>>>>> problem except in configuration.
>>>>>>>
>>>>>>>
>>>>>>> ...Now, if you wanted to talk about bitmaps that associate with a
>>>>>>> Backend instead of a Node...
>>>>>>
>>>>>> But it's not about bitmaps on intermediate nodes, quite the opposite.
>>>>>> It's about bitmaps on roots but write requests happening on intermediate
>>>>>> nodes.
>>>>>>
>>>>>
>>>>> Oh, I see what you're saying. It magically doesn't really change my
>>>>> opinion, by coincidence!
>>>>>
>>>>>> Say you have a node I and two filter nodes A and B using it (and they
>>>>>> are OK with shared writers).  There is a dirty bitmap on A.
>>>>>>
>>>>>> Now when a write request goes through B, I will obviously have changed,
>>>>>> and because A and B are filters, so will A.  But the dirty bitmap on A
>>>>>> will still be clean.
>>>>>>
>>>>>> My example was that when you run a mirror over A, you won't see dirtying
>>>>>> from B.  So you can't e.g. add a throttle driver between a mirror job
>>>>>> and the node you want to mirror, because the dirty bitmap on the
>>>>>> throttle driver will not be affected by accesses to the actual node.
>>>>>>
>>>>>> Max
>>>>>>
>>>>>
>>>>> Well, in this case I would say that a root BDS is not really any
>>>>> different from an intermediate one and can't really know what's going on
>>>>> in the world outside.
>>>>>
>>>>> At least, I think that's how we model it right now -- we pretend that we
>>>>> can record the activity of an entire drive graph by putting the bitmap
>>>>> on the root-most node we can get a hold of and assuming that all writes
>>>>> are going to go through us.
>>>>
>>>> Well, yeah, I know we do.  But I consider this counter-intuitive and if
>>>> something is counter-intuitive it's often a bug.
>>>>
>>>>> Clearly this is increasingly false the more we modularise the block graph.
>>>>>
>>>>>
>>>>> *uhm*
>>>>>
>>>>>
>>>>> I would say that a bitmap attached to a BlockBackend should behave in
>>>>> the way you say: writes to any children should change the bitmap here.
>>>>>
>>>>> bitmaps attached to nodes shouldn't worry about such things.
>>>>
>>>> Do we have bitmaps attached to BlockBackends?  I sure hope not.
>>>>
>>>> We should not have any interface that requires the use of BlockBackends
>>>> by now.  If we do, that's something that has to be fixed.
>>>>
>>>> Max
>>>>
>>>
>>> I'm not sure what the right paradigm is anymore, then.
>>>
>>> A node is just a node, but something has to represent the "drive" as far
>>> as the device model sees it. I thought that *was* the BlockBackend, but
>>> is it not?
>>
>> Yes, and on the other side the BB represents the device model for the
>> block layer.  But the thing is that the user should be blissfully
>> unaware...  Or do you want to make bitmaps attachable to guest devices
>> (through the QOM path or ID) instead?
>>
> 
> OK, sure -- the user can specify a device model to attach it to instead
> of a node. They don't have to be aware of the BB itself.
> 
> The implementation though, I imagine it associates with that BB.

But that would be a whole new implementation...

>> (The block layer would then internally translate that to a BB.  But
>> that's a bad internal interface because the bitmap is still attached to
>> a BDS, and it's a bad external interface because currently you can
>> attach bitmaps to nodes and only to nodes...)
> 
> What if the type of bitmap we want to track trans-node changes was not
> attached to a BDS? That'd be one way to obviously discriminate between
> "This tracks tree-wide changes" and "This tracks node-local changes."

A new type of bitmap? :-/

> Implementation wise I don't have any actual thought as to how this could
> possibly be efficient. Maybe a bitmap reference at each BDS that is a
> child of that particular BB?
> 
> On attach, the BDS gets a set-only reference to that bitmap.
> On detach, we remove the reference.
> 
> Then, any writes anywhere in the tree will coagulate in one place. It
> may or may not be particularly true or correct, because a write down the
> tree doesn't necessarily mean the visible data has changed at the top
> layer, but I don't think we have anything institutionally set in place
> to determine if we are changing visible data up-graph with a write
> down-graph.

Hmmm...  The first thing to clarify is whether we want two types of
bitmaps.  I don't think there is much use to node-local bitmaps, all
bitmaps should track every dirtying of their associated node (wherever
it comes from).

However, if that is too much of a performance penalty...  Then we
probably do have to distinguish between the two so that users only add
tree-wide bitmaps when they need them.

OTOH, I guess that in the common case it's not a performance penalty at
all, if done right.  Usually, a node you attach a bitmap to will not
have child nodes that are written to by other nodes.  So in the common
case your tree-wide bitmap is just a plain local bitmap and thus just as
efficient.

And if some child node is indeed written to by some other node...  I
think you always want a tree-wide bitmap anyway.

So I think all bitmaps should be tree-wide and the fact that they
currently are not is a bug.

Max

Re: [Qemu-devel] [Qemu-block] [PATCH] blockdev-backup: enable non-root nodes for backup
Posted by John Snow 6 years, 4 months ago

On 12/11/2017 12:05 PM, Max Reitz wrote:
> On 2017-12-11 17:47, John Snow wrote:
>> On 12/11/2017 11:31 AM, Max Reitz wrote:
>>> On 2017-12-08 18:09, John Snow wrote:
>>>> On 12/08/2017 09:30 AM, Max Reitz wrote:
>>>>> On 2017-12-05 01:48, John Snow wrote:
>>>>>>
>>>>>> I would say that a bitmap attached to a BlockBackend should behave in
>>>>>> the way you say: writes to any children should change the bitmap here.
>>>>>>
>>>>>> bitmaps attached to nodes shouldn't worry about such things.
>>>>>
>>>>> Do we have bitmaps attached to BlockBackends?  I sure hope not.
>>>>>
>>>>> We should not have any interface that requires the use of BlockBackends
>>>>> by now.  If we do, that's something that has to be fixed.
>>>>>
>>>>>
>>>>
>>>> I'm not sure what the right paradigm is anymore, then.
>>>>
>>>> A node is just a node, but something has to represent the "drive" as far
>>>> as the device model sees it. I thought that *was* the BlockBackend, but
>>>> is it not?
>>>
>>> Yes, and on the other side the BB represents the device model for the
>>> block layer.  But the thing is that the user should be blissfully
>>> unaware...  Or do you want to make bitmaps attachable to guest devices
>>> (through the QOM path or ID) instead?
>>>
>>
>> OK, sure -- the user can specify a device model to attach it to instead
>> of a node. They don't have to be aware of the BB itself.
>>
>> The implementation though, I imagine it associates with that BB.
> 
> But that would be a whole new implementation...
> 

Yeah.

>>> (The block layer would then internally translate that to a BB.  But
>>> that's a bad internal interface because the bitmap is still attached to
>>> a BDS, and it's a bad external interface because currently you can
>>> attach bitmaps to nodes and only to nodes...)
>>
>> What if the type of bitmap we want to track trans-node changes was not
>> attached to a BDS? That'd be one way to obviously discriminate between
>> "This tracks tree-wide changes" and "This tracks node-local changes."
> 
> A new type of bitmap? :-/
> 

"type" may be too strong of a word, but... all the ones we use currently
are node-local.

>> Implementation wise I don't have any actual thought as to how this could
>> possibly be efficient. Maybe a bitmap reference at each BDS that is a
>> child of that particular BB?
>>
>> On attach, the BDS gets a set-only reference to that bitmap.
>> On detach, we remove the reference.
>>
>> Then, any writes anywhere in the tree will coagulate in one place. It
>> may or may not be particularly true or correct, because a write down the
>> tree doesn't necessarily mean the visible data has changed at the top
>> layer, but I don't think we have anything institutionally set in place
>> to determine if we are changing visible data up-graph with a write
>> down-graph.
> 
> Hmmm...  The first thing to clarify is whether we want two types of
> bitmaps.  I don't think there is much use to node-local bitmaps, all
> bitmaps should track every dirtying of their associated node (wherever
> it comes from).
> 

I don't disagree, but if that's the case then attaching bitmaps to
non-root nodes should be prohibited and we ought to shore up the
semantic idea that bitmaps must collect writes from their children.

I'm not sure that's any better inherently than expanding the idea to
include obvious differences between node-local and tree-wide bitmaps.
Currently, we just confuse the two concepts and do a poor job of either.

> However, if that is too much of a performance penalty...  Then we
> probably do have to distinguish between the two so that users only add
> tree-wide bitmaps when they need them.
> 
> OTOH, I guess that in the common case it's not a performance penalty at
> all, if done right.  Usually, a node you attach a bitmap to will not
> have child nodes that are written to by other nodes.  So in the common
> case your tree-wide bitmap is just a plain local bitmap and thus just as
> efficient.
> 
> And if some child node is indeed written to by some other node...  I
> think you always want a tree-wide bitmap anyway.
> 
> So I think all bitmaps should be tree-wide and the fact that they
> currently are not is a bug.
I could agree with this viewpoint, but that means we need to disallow
bitmaps to be attached to any non-root BDS, and then implement an
ability to give set-only references to children.

That's a bit of an extreme tactic that might prevent us from ever using
node-local bitmaps with anything resembling a sane API in the future.
Maybe that's OK, but it is a commitment.

I'm not sure there are any good reasons to have node-local bitmaps, but
the idea was one I more or less inherited when I started working in this
area because specifying specific BDS nodes with "device=" was going out
of vogue, but what exactly a BB was wasn't really defined yet, so I got
stuck with a QMP interface named "node=" and an implementation that has
an affinity for nodes.