The recursive bdrv_drain_recurse may run a block job completion BH that
drops nodes. The coming changes will make that more likely, and a
use-after-free would happen without this patch.

Stash the bs pointer and use bdrv_ref/bdrv_unref in addition to
QLIST_FOREACH_SAFE to prevent such a case from happening.

Since bdrv_unref accesses global state that is not protected by the AioContext
lock, we cannot use bdrv_ref/bdrv_unref unconditionally. Fortunately the
protection is not needed in an IOThread because only the main loop can modify
the graph with the AioContext lock held.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/io.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/block/io.c b/block/io.c
index 8706bfa..a0df8c4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -158,7 +158,7 @@ bool bdrv_requests_pending(BlockDriverState *bs)
 
 static bool bdrv_drain_recurse(BlockDriverState *bs)
 {
-    BdrvChild *child;
+    BdrvChild *child, *tmp;
     bool waited;
 
     waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
@@ -167,8 +167,25 @@ static bool bdrv_drain_recurse(BlockDriverState *bs)
         bs->drv->bdrv_drain(bs);
     }
 
-    QLIST_FOREACH(child, &bs->children, next) {
-        waited |= bdrv_drain_recurse(child->bs);
+    QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
+        BlockDriverState *bs = child->bs;
+        bool in_main_loop =
+            qemu_get_current_aio_context() == qemu_get_aio_context();
+        assert(bs->refcnt > 0);
+        if (in_main_loop) {
+            /* In case the resursive bdrv_drain_recurse processes a
+             * block_job_defer_to_main_loop BH and modifies the graph,
+             * let's hold a reference to bs until we are done.
+             *
+             * IOThread doesn't have such a BH, and it is not safe to call
+             * bdrv_unref without BQL, so skip doing it there.
+             **/
+            bdrv_ref(bs);
+        }
+        waited |= bdrv_drain_recurse(bs);
+        if (in_main_loop) {
+            bdrv_unref(bs);
+        }
     }
 
     return waited;
-- 
2.9.3

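For readers who want to see the pattern in isolation, here is a minimal
standalone sketch of "save the next pointer, then hold a reference across the
recursive call". It is not QEMU code: Node, Child, detach_from, node_ref,
node_unref and run_demo are hypothetical stand-ins for BlockDriverState,
BdrvChild, the completion BH, bdrv_ref, bdrv_unref and a test driver. It also
leaves out the in_main_loop check, since it is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for BlockDriverState / BdrvChild (not QEMU code). */
typedef struct Node Node;
typedef struct Child Child;

struct Child {
    Node *node;
    Child *next;
};

struct Node {
    int refcnt;
    Child *children;
    Node *detach_from;  /* models a BH: unlink this node from that parent */
};

static int freed_nodes; /* counts node frees, for the demo only */

static void node_unref(Node *n);

static Node *node_new(void)
{
    Node *n = calloc(1, sizeof(*n));
    assert(n);
    n->refcnt = 1;
    return n;
}

static void node_ref(Node *n)
{
    n->refcnt++;
}

/* Link child under parent; the link takes over the caller's reference. */
static void node_add_child(Node *parent, Node *child)
{
    Child *c = calloc(1, sizeof(*c));
    assert(c);
    c->node = child;
    c->next = parent->children;
    parent->children = c;
}

/* Unlink child from parent and drop the reference owned by the link. */
static void node_remove_child(Node *parent, Node *child)
{
    for (Child **p = &parent->children; *p; p = &(*p)->next) {
        if ((*p)->node == child) {
            Child *c = *p;
            *p = c->next;
            free(c);
            node_unref(child);
            return;
        }
    }
}

static void node_unref(Node *n)
{
    assert(n->refcnt > 0);
    if (--n->refcnt == 0) {
        while (n->children) {
            node_remove_child(n, n->children->node);
        }
        free(n);
        freed_nodes++;
    }
}

/* Returns true if the graph was modified while draining. */
static bool drain_recurse(Node *n)
{
    bool changed = false;
    Child *c, *tmp;

    if (n->detach_from) {
        /* Graph change in the middle of the walk. Without the extra
         * reference taken by our caller, n would be freed right here. */
        Node *parent = n->detach_from;
        n->detach_from = NULL;
        node_remove_child(parent, n);
        changed = true;
    }

    for (c = n->children; c; c = tmp) {
        Node *child = c->node;  /* stash: c may be freed by the recursion */
        tmp = c->next;          /* saved first, like QLIST_FOREACH_SAFE */
        node_ref(child);
        changed |= drain_recurse(child);
        node_unref(child);
    }
    return changed;
}

/* Builds root -> {a, b}; a unlinks itself while being drained.
 * Returns how many nodes were freed during the drain itself. */
int run_demo(void)
{
    Node *root = node_new();
    Node *a = node_new();
    Node *b = node_new();
    int freed_during_drain;
    bool changed;

    node_add_child(root, b);
    node_add_child(root, a);
    a->detach_from = root;
    freed_nodes = 0;

    changed = drain_recurse(root);
    assert(changed);
    freed_during_drain = freed_nodes;

    node_unref(root);   /* releases b and root */
    return freed_during_drain;
}
```

Note that saving the next pointer only protects against removal of the
current element. If the nested call removed the next sibling instead, tmp
would dangle, so the sketch (like the safe-iteration macro it imitates) relies
on the callback touching only the node being drained.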
On 18.04.2017 at 16:30, Fam Zheng wrote:
> The recursive bdrv_drain_recurse may run a block job completion BH that
> drops nodes. The coming changes will make that more likely and use-after-free
> would happen without this patch
>
> Stash the bs pointer and use bdrv_ref/bdrv_unref in addition to
> QLIST_FOREACH_SAFE to prevent such a case from happening.
>
> Since bdrv_unref accesses global state that is not protected by the AioContext
> lock, we cannot use bdrv_ref/bdrv_unref unconditionally. Fortunately the
> protection is not needed in IOThread because only main loop can modify a graph
> with the AioContext lock held.
>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
> block/io.c | 23 ++++++++++++++++++++---
> 1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/block/io.c b/block/io.c
> index 8706bfa..a0df8c4 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -158,7 +158,7 @@ bool bdrv_requests_pending(BlockDriverState *bs)
>
> static bool bdrv_drain_recurse(BlockDriverState *bs)
> {
> - BdrvChild *child;
> + BdrvChild *child, *tmp;
> bool waited;
>
> waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
> @@ -167,8 +167,25 @@ static bool bdrv_drain_recurse(BlockDriverState *bs)
> bs->drv->bdrv_drain(bs);
> }
>
> - QLIST_FOREACH(child, &bs->children, next) {
> - waited |= bdrv_drain_recurse(child->bs);
> + QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
> + BlockDriverState *bs = child->bs;
> + bool in_main_loop =
> + qemu_get_current_aio_context() == qemu_get_aio_context();
> + assert(bs->refcnt > 0);
> + if (in_main_loop) {
> + /* In case the resursive bdrv_drain_recurse processes a

s/resursive/recursive/

> + * block_job_defer_to_main_loop BH and modifies the graph,
> + * let's hold a reference to bs until we are done.
> + *
> + * IOThread doesn't have such a BH, and it is not safe to call
> + * bdrv_unref without BQL, so skip doing it there.
> + **/

And **/ is unusual, too.

> + bdrv_ref(bs);
> + }
> + waited |= bdrv_drain_recurse(bs);
> + if (in_main_loop) {
> + bdrv_unref(bs);
> + }
> }

Other than this, the series looks good to me.

Kevin

On Tue, 04/18 16:46, Kevin Wolf wrote:
> On 18.04.2017 at 16:30, Fam Zheng wrote:
> > The recursive bdrv_drain_recurse may run a block job completion BH that
> > drops nodes. The coming changes will make that more likely and use-after-free
> > would happen without this patch
> >
> > Stash the bs pointer and use bdrv_ref/bdrv_unref in addition to
> > QLIST_FOREACH_SAFE to prevent such a case from happening.
> >
> > Since bdrv_unref accesses global state that is not protected by the AioContext
> > lock, we cannot use bdrv_ref/bdrv_unref unconditionally. Fortunately the
> > protection is not needed in IOThread because only main loop can modify a graph
> > with the AioContext lock held.
> >
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> > block/io.c | 23 ++++++++++++++++++++---
> > 1 file changed, 20 insertions(+), 3 deletions(-)
> >
> > diff --git a/block/io.c b/block/io.c
> > index 8706bfa..a0df8c4 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -158,7 +158,7 @@ bool bdrv_requests_pending(BlockDriverState *bs)
> >
> > static bool bdrv_drain_recurse(BlockDriverState *bs)
> > {
> > - BdrvChild *child;
> > + BdrvChild *child, *tmp;
> > bool waited;
> >
> > waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
> > @@ -167,8 +167,25 @@ static bool bdrv_drain_recurse(BlockDriverState *bs)
> > bs->drv->bdrv_drain(bs);
> > }
> >
> > - QLIST_FOREACH(child, &bs->children, next) {
> > - waited |= bdrv_drain_recurse(child->bs);
> > + QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
> > + BlockDriverState *bs = child->bs;
> > + bool in_main_loop =
> > + qemu_get_current_aio_context() == qemu_get_aio_context();
> > + assert(bs->refcnt > 0);
> > + if (in_main_loop) {
> > + /* In case the resursive bdrv_drain_recurse processes a
>
> s/resursive/recursive/
>
> > + * block_job_defer_to_main_loop BH and modifies the graph,
> > + * let's hold a reference to bs until we are done.
> > + *
> > + * IOThread doesn't have such a BH, and it is not safe to call
> > + * bdrv_unref without BQL, so skip doing it there.
> > + **/
>
> And **/ is unusual, too.
>
> > + bdrv_ref(bs);
> > + }
> > + waited |= bdrv_drain_recurse(bs);
> > + if (in_main_loop) {
> > + bdrv_unref(bs);
> > + }
> > }
>
> Other than this, the series looks good to me.

Thanks, I'll fix them and send a pull request to Peter.

Fam