[v2] introduce pinned blk

[Qemu-devel] [PATCH v2 2/2] blockjob: use blk_new_pinned in block_job_create

Posted by Vladimir Sementsov-Ogievskiy 6 years, 8 months ago

child_role job already has .stay_at_node=true, so on bdrv_replace_node
operation these child are unchanged. Make block job blk behave in same
manner, to avoid inconsistent intermediate graph states and workarounds
like in mirror.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/mirror.c | 6 +-----
 blockjob.c     | 2 +-
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index f8bdb5b21b..23443116e4 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -713,12 +713,8 @@ static int mirror_exit_common(Job *job)
                             &error_abort);
     bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
 
-    /* We just changed the BDS the job BB refers to (with either or both of the
-     * bdrv_replace_node() calls), so switch the BB back so the cleanup does
-     * the right thing. We don't need any permissions any more now. */
-    blk_remove_bs(bjob->blk);
+    /* We don't need any permissions any more now. */
     blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
-    blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort);
 
     bs_opaque->job = NULL;
 
diff --git a/blockjob.c b/blockjob.c
index 931d675c0c..f5c8d31491 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -398,7 +398,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
         job_id = bdrv_get_device_name(bs);
     }
 
-    blk = blk_new(bdrv_get_aio_context(bs), perm, shared_perm);
+    blk = blk_new_pinned(bdrv_get_aio_context(bs), perm, shared_perm);
     ret = blk_insert_bs(blk, bs, errp);
     if (ret < 0) {
         blk_unref(blk);
-- 
2.18.0

Re: [Qemu-devel] [PATCH v2 2/2] blockjob: use blk_new_pinned in block_job_create

Posted by Kevin Wolf 6 years, 8 months ago

Am 05.06.2019 um 14:32 hat Vladimir Sementsov-Ogievskiy geschrieben:
> child_role job already has .stay_at_node=true, so on bdrv_replace_node
> operation these child are unchanged. Make block job blk behave in same
> manner, to avoid inconsistent intermediate graph states and workarounds
> like in mirror.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

This feels dangerous. It does what you want it to do if the only graph
change below the BlockBackend is the one in mirror_exit_common. But the
user could also take a snapshot, or in the future hopefully insert a
filter node, and you would then want the BlockBackend to move.

To be honest, even BdrvChildRole.stay_at_node is a bit of a hack. But at
least it's only used for permissions and not for the actual data flow.

Kevin

Re: [Qemu-devel] [PATCH v2 2/2] blockjob: use blk_new_pinned in block_job_create

Posted by Vladimir Sementsov-Ogievskiy 6 years, 8 months ago

05.06.2019 20:11, Kevin Wolf wrote:
> Am 05.06.2019 um 14:32 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> child_role job already has .stay_at_node=true, so on bdrv_replace_node
>> operation these child are unchanged. Make block job blk behave in same
>> manner, to avoid inconsistent intermediate graph states and workarounds
>> like in mirror.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> This feels dangerous. It does what you want it to do if the only graph
> change below the BlockBackend is the one in mirror_exit_common. But the
> user could also take a snapshot, or in the future hopefully insert a
> filter node, and you would then want the BlockBackend to move.
> 
> To be honest, even BdrvChildRole.stay_at_node is a bit of a hack. But at
> least it's only used for permissions and not for the actual data flow.
> 

Hmm. Than, may be just add a parameter to bdrv_replace_node, which parents
to ignore? Would it work?


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v2 2/2] blockjob: use blk_new_pinned in block_job_create

Posted by Kevin Wolf 6 years, 8 months ago

Am 05.06.2019 um 19:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 05.06.2019 20:11, Kevin Wolf wrote:
> > Am 05.06.2019 um 14:32 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> child_role job already has .stay_at_node=true, so on bdrv_replace_node
> >> operation these child are unchanged. Make block job blk behave in same
> >> manner, to avoid inconsistent intermediate graph states and workarounds
> >> like in mirror.
> >>
> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > 
> > This feels dangerous. It does what you want it to do if the only graph
> > change below the BlockBackend is the one in mirror_exit_common. But the
> > user could also take a snapshot, or in the future hopefully insert a
> > filter node, and you would then want the BlockBackend to move.
> > 
> > To be honest, even BdrvChildRole.stay_at_node is a bit of a hack. But at
> > least it's only used for permissions and not for the actual data flow.
> 
> Hmm. Than, may be just add a parameter to bdrv_replace_node, which parents
> to ignore? Would it work?

I would have to think a bit more about it, but it does sound safer.

Or we take a step back and ask why it's even a problem for the mirror
block job if the BlockBackend is moved to a different node. The main
reason I see is because of bs->job that is set for the root node of the
BlockBackend and needs to be unset for the same node.

Maybe we can just finally get rid of bs->job? It doesn't have many users
any more.

Kevin

Re: [Qemu-devel] [PATCH v2 2/2] blockjob: use blk_new_pinned in block_job_create

Posted by Vladimir Sementsov-Ogievskiy 6 years, 8 months ago

06.06.2019 13:05, Kevin Wolf wrote:
> Am 05.06.2019 um 19:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 05.06.2019 20:11, Kevin Wolf wrote:
>>> Am 05.06.2019 um 14:32 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> child_role job already has .stay_at_node=true, so on bdrv_replace_node
>>>> operation these child are unchanged. Make block job blk behave in same
>>>> manner, to avoid inconsistent intermediate graph states and workarounds
>>>> like in mirror.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>
>>> This feels dangerous. It does what you want it to do if the only graph
>>> change below the BlockBackend is the one in mirror_exit_common. But the
>>> user could also take a snapshot, or in the future hopefully insert a
>>> filter node, and you would then want the BlockBackend to move.
>>>
>>> To be honest, even BdrvChildRole.stay_at_node is a bit of a hack. But at
>>> least it's only used for permissions and not for the actual data flow.
>>
>> Hmm. Than, may be just add a parameter to bdrv_replace_node, which parents
>> to ignore? Would it work?
> 
> I would have to think a bit more about it, but it does sound safer.
> 
> Or we take a step back and ask why it's even a problem for the mirror
> block job if the BlockBackend is moved to a different node. The main
> reason I see is because of bs->job that is set for the root node of the
> BlockBackend and needs to be unset for the same node.
> 
> Maybe we can just finally get rid of bs->job? It doesn't have many users
> any more.
> 

Hmm, looked at it. Not sure what should be refactored around job to get rid
of "main node" concept.. Which seems to be in a bad relation with starting
job on implicit filters as a main node..

But about just removing bs->job pointer, I don't know at least what to do with
blk_iostatus_reset and blockdev_mark_auto_del..


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v2 2/2] blockjob: use blk_new_pinned in block_job_create

Posted by Kevin Wolf 6 years, 8 months ago

Am 06.06.2019 um 14:29 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 06.06.2019 13:05, Kevin Wolf wrote:
> > Am 05.06.2019 um 19:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> 05.06.2019 20:11, Kevin Wolf wrote:
> >>> Am 05.06.2019 um 14:32 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>>> child_role job already has .stay_at_node=true, so on bdrv_replace_node
> >>>> operation these child are unchanged. Make block job blk behave in same
> >>>> manner, to avoid inconsistent intermediate graph states and workarounds
> >>>> like in mirror.
> >>>>
> >>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>>
> >>> This feels dangerous. It does what you want it to do if the only graph
> >>> change below the BlockBackend is the one in mirror_exit_common. But the
> >>> user could also take a snapshot, or in the future hopefully insert a
> >>> filter node, and you would then want the BlockBackend to move.
> >>>
> >>> To be honest, even BdrvChildRole.stay_at_node is a bit of a hack. But at
> >>> least it's only used for permissions and not for the actual data flow.
> >>
> >> Hmm. Than, may be just add a parameter to bdrv_replace_node, which parents
> >> to ignore? Would it work?
> > 
> > I would have to think a bit more about it, but it does sound safer.
> > 
> > Or we take a step back and ask why it's even a problem for the mirror
> > block job if the BlockBackend is moved to a different node. The main
> > reason I see is because of bs->job that is set for the root node of the
> > BlockBackend and needs to be unset for the same node.
> > 
> > Maybe we can just finally get rid of bs->job? It doesn't have many users
> > any more.
> > 
> 
> Hmm, looked at it. Not sure what should be refactored around job to get rid
> of "main node" concept.. Which seems to be in a bad relation with starting
> job on implicit filters as a main node..
> 
> But about just removing bs->job pointer, I don't know at least what to do with
> blk_iostatus_reset and blockdev_mark_auto_del..

blk_iostatus_reset() looks easy. It has only two callers:

1. blk_attach_dev(). This doesn't have anything to do with jobs and
   attaching a new guest device won't solve any problem the job
   encountered, so no reason to reset the iostatus for the job.

2. qmp_cont(). This resets the iostatus for everything. We can just
   call block_job_iostatus_reset() for all block jobs instead of going
   through BlockBackend.

blockdev_mark_auto_del() might be a bit trickier. The whole idea of the
function is: When a guest device gets unplugged, automatically remove
its root block node, too. Commit 12bde0eed6b made it cancel a block job
because that should happen immediately when the device is actually
released by the guest and not only after the job finishes and gives up
its reference. I would like to just change the behaviour, but I'm afraid
we can't do this because of compatibility.

However, just checking bs->job is really only one special case of
another user of the node to be deleted. Maybe we can extend it a little
so that any block jobs that contain the node in job->nodes are
cancelled.

Kevin

Re: [Qemu-devel] [PATCH v2 2/2] blockjob: use blk_new_pinned in block_job_create

Posted by Vladimir Sementsov-Ogievskiy 6 years, 8 months ago

06.06.2019 16:06, Kevin Wolf wrote:
> Am 06.06.2019 um 14:29 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 06.06.2019 13:05, Kevin Wolf wrote:
>>> Am 05.06.2019 um 19:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 05.06.2019 20:11, Kevin Wolf wrote:
>>>>> Am 05.06.2019 um 14:32 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>> child_role job already has .stay_at_node=true, so on bdrv_replace_node
>>>>>> operation these child are unchanged. Make block job blk behave in same
>>>>>> manner, to avoid inconsistent intermediate graph states and workarounds
>>>>>> like in mirror.
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>
>>>>> This feels dangerous. It does what you want it to do if the only graph
>>>>> change below the BlockBackend is the one in mirror_exit_common. But the
>>>>> user could also take a snapshot, or in the future hopefully insert a
>>>>> filter node, and you would then want the BlockBackend to move.
>>>>>
>>>>> To be honest, even BdrvChildRole.stay_at_node is a bit of a hack. But at
>>>>> least it's only used for permissions and not for the actual data flow.
>>>>
>>>> Hmm. Than, may be just add a parameter to bdrv_replace_node, which parents
>>>> to ignore? Would it work?
>>>
>>> I would have to think a bit more about it, but it does sound safer.
>>>
>>> Or we take a step back and ask why it's even a problem for the mirror
>>> block job if the BlockBackend is moved to a different node. The main
>>> reason I see is because of bs->job that is set for the root node of the
>>> BlockBackend and needs to be unset for the same node.
>>>
>>> Maybe we can just finally get rid of bs->job? It doesn't have many users
>>> any more.
>>>
>>
>> Hmm, looked at it. Not sure what should be refactored around job to get rid
>> of "main node" concept.. Which seems to be in a bad relation with starting
>> job on implicit filters as a main node..
>>
>> But about just removing bs->job pointer, I don't know at least what to do with
>> blk_iostatus_reset and blockdev_mark_auto_del..
> 
> blk_iostatus_reset() looks easy. It has only two callers:
> 
> 1. blk_attach_dev(). This doesn't have anything to do with jobs and
>     attaching a new guest device won't solve any problem the job
>     encountered, so no reason to reset the iostatus for the job.
> 
> 2. qmp_cont(). This resets the iostatus for everything. We can just
>     call block_job_iostatus_reset() for all block jobs instead of going
>     through BlockBackend.
> 
> blockdev_mark_auto_del() might be a bit trickier. The whole idea of the
> function is: When a guest device gets unplugged, automatically remove
> its root block node, too. Commit 12bde0eed6b made it cancel a block job
> because that should happen immediately when the device is actually
> released by the guest and not only after the job finishes and gives up
> its reference. I would like to just change the behaviour, but I'm afraid
> we can't do this because of compatibility.
> 
> However, just checking bs->job is really only one special case of
> another user of the node to be deleted. Maybe we can extend it a little
> so that any block jobs that contain the node in job->nodes are
> cancelled.
> 

OK, thanks. I'll try this way


-- 
Best regards,
Vladimir