[PATCH] stream: Remove bdrv from job in .clean()

Wesley Hershberger posted 1 patch 3 weeks, 3 days ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20251020-fix-3149-v1-1-04b2d4db5179@canonical.com
Maintainers: John Snow <jsnow@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>, Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>
There is a newer version of this series
block/stream.c | 6 ++++++
1 file changed, 6 insertions(+)
[PATCH] stream: Remove bdrv from job in .clean()
Posted by Wesley Hershberger 3 weeks, 3 days ago
This is similar to bdc4c4c5e372756a5ba3fb3a61e585b02f0dd7f4
(qmp blockdev-backup). The cor_filter_bs was added to the blockjob as
the main BDS (by passing it to block_job_create), so
bdrv_cor_filter_drop doesn't actually remove it from global_bdrv_states.

This can cause races with qmp query-named-block nodes as described in
 #3149.

As in bdc4c4c, there is no function to remove just the cor_filter_bs
from the job, so drop all the job's nodes as they are no longer needed.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3149
Buglink: https://bugs.launchpad.net/bugs/2126951
Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
---
This patch fixes the issue described in Gitlab #3149. Please see the bug
for additional context & reproducer for the issue.

I'm happy to discuss alternative approaches or resubmit as needed.

`make check-block` passes locally.

A review would be greatly appreciated as a customer's production is
impacted.

First-time patch mailer so please pardon any mistakes.
---
 block/stream.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/block/stream.c b/block/stream.c
index c0616b69e259bf5a9b146dadd9dbac62bfaa9f23..1733abd8f96d7847701f54a7a55d3284387b8582 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -132,6 +132,12 @@ static void stream_clean(Job *job)
 {
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 
+    /*
+     * The job still holds a reference to cor_filter_bs; drop all bdrv to
+     * ensure that it is unref-ed
+     */
+    block_job_remove_all_bdrv(&s->common);
+
     if (s->cor_filter_bs) {
         bdrv_cor_filter_drop(s->cor_filter_bs);
         s->cor_filter_bs = NULL;

---
base-commit: 3a2d5612a7422732b648b46d4b934e2e54622fd6
change-id: 20251020-fix-3149-f01ae62fa53c

Best regards,
-- 
Wesley Hershberger <wesley.hershberger@canonical.com>
Re: [PATCH] stream: Remove bdrv from job in .clean()
Posted by Vladimir Sementsov-Ogievskiy 3 weeks, 3 days ago
On 21.10.25 01:10, Wesley Hershberger wrote:
> This is similar to bdc4c4c5e372756a5ba3fb3a61e585b02f0dd7f4
> (qmp blockdev-backup). The cor_filter_bs was added to the blockjob as
> the main BDS (by passing it to block_job_create), so
> bdrv_cor_filter_drop doesn't actually remove it from global_bdrv_states.
> 
> This can cause races with qmp query-named-block nodes as described in
>   #3149.

Hmm. But why it lead to segfault? Ok, children kept in global_bdrv_states.
and referenced by the job itself (as I can assume from the patch). What's
wrong with the state, so we went in segfault?

Also, would be good to write the segfault backtrace here, for someone
to search the fix through git.

> 
> As in bdc4c4c, there is no function to remove just the cor_filter_bs
> from the job, so drop all the job's nodes as they are no longer needed.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3149
> Buglink: https://bugs.launchpad.net/bugs/2126951
> Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
> ---
> This patch fixes the issue described in Gitlab #3149. Please see the bug
> for additional context & reproducer for the issue.
> 
> I'm happy to discuss alternative approaches or resubmit as needed.
> 
> `make check-block` passes locally.
> 
> A review would be greatly appreciated as a customer's production is
> impacted.
> 
> First-time patch mailer so please pardon any mistakes.
> ---
>   block/stream.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/block/stream.c b/block/stream.c
> index c0616b69e259bf5a9b146dadd9dbac62bfaa9f23..1733abd8f96d7847701f54a7a55d3284387b8582 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -132,6 +132,12 @@ static void stream_clean(Job *job)
>   {
>       StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>   
> +    /*
> +     * The job still holds a reference to cor_filter_bs; drop all bdrv to
> +     * ensure that it is unref-ed
> +     */
> +    block_job_remove_all_bdrv(&s->common);
> +
>       if (s->cor_filter_bs) {
>           bdrv_cor_filter_drop(s->cor_filter_bs);
>           s->cor_filter_bs = NULL;
> 
> ---
> base-commit: 3a2d5612a7422732b648b46d4b934e2e54622fd6
> change-id: 20251020-fix-3149-f01ae62fa53c
> 
> Best regards,


-- 
Best regards,
Vladimir
Re: [PATCH] stream: Remove bdrv from job in .clean()
Posted by Wesley Hershberger 3 weeks, 3 days ago
On Tue, Oct 21, 2025 at 1:05 AM Vladimir Sementsov-Ogievskiy
<vsementsov@yandex-team.ru> wrote:
>
> > This can cause races with qmp query-named-block nodes as described in
> >   #3149.
>
> Hmm. But why it lead to segfault? Ok, children kept in global_bdrv_states.
> and referenced by the job itself (as I can assume from the patch). What's
> wrong with the state, so we went in segfault?

My understanding is that when bdrv_replace_node_common is called with
`detach_subchain=true` then `children` is cleared. Here's a trace from
stream_clean (line numbers approximate on v10.1.0):

 - stream_clean (block/stream.c L131)
 - bdrv_cor_filter_drop (block/copy-on-read.c L274)
 - bdrv_drop_filter (block.c L5466)
 - bdrv_replace_node_common (block.c L5404)
    (called with `detach_subchain=true`)
 - bdrv_remove_child (block.c L5335)
 - bdrv_replace_child_tran (block.c L2489)
 - bdrv_replace_child_noperm (block.c L2944)
 - bdrv_child_cb_detach (BdrvChildClass.detach) (block.c L1492)

I can observe that children.lh_first == NULL in gdb.

> Also, would be good to write the segfault backtrace here, for someone
> to search the fix through git.

I'll include the backtrace from #3149 in the commit message of my next
submission.

Thanks so much for the review!
~Wesley