When an NBD block driver state is moved from one aio_context to another
(e.g. when doing a drain in a migration thread),
nbd_client_attach_aio_context_bh is executed, which enters the
connection coroutine.

However, the assumption that ->connection_co is always present here
turns out to be incorrect: the connection may have encountered an error
other than -EIO in the underlying transport, decided to quit rather
than keep trying to reconnect, and therefore terminated the connection
coroutine. As a result, an attempt to reassign the client in this state
(NBD_CLIENT_QUIT) to a different aio_context leads to a null pointer
dereference:
at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-coroutine.c:109
opaque=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block/nbd.c:164
at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:55
at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:136
at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:164
blocking=blocking@entry=true)
at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-posix.c:650
cb=<optimized out>, opaque=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:71
bs=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6172
new_context=new_context@entry=0x5618056c7580,
ignore=ignore@entry=0x7f60e1e63780)
at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6237
bs=bs@entry=0x561805ed4c00, ctx=0x5618056c7580,
ignore_child=<optimized out>, errp=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6332
new_context=0x5618056c7580, update_root_node=update_root_node@entry=true,
errp=errp@entry=0x0)
at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:1989
new_context=<optimized out>, errp=errp@entry=0x0)
at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:2010
at /build/qemu-6MF7tq/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio-bus.c:245
running=0, state=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio.c:3220
state=state@entry=RUN_STATE_FINISH_MIGRATE)
at /build/qemu-6MF7tq/qemu-5.0.1/softmmu/vl.c:1275
send_stop=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/cpus.c:1032
at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:2914
at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3275
at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3439
at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-thread-posix.c:519
from /lib/x86_64-linux-gnu/libpthread.so.0
Fix it by checking that the connection coroutine is non-null before
trying to enter it. If it is null, there is nothing to enter, as the
connection is presumably going down anyway.
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
---
block/nbd.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/block/nbd.c b/block/nbd.c
index bcd6641e90..b3cbbeb4b0 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -250,13 +250,15 @@ static void nbd_client_attach_aio_context_bh(void *opaque)
BlockDriverState *bs = opaque;
BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
- /*
- * The node is still drained, so we know the coroutine has yielded in
- * nbd_read_eof(), the only place where bs->in_flight can reach 0, or it is
- * entered for the first time. Both places are safe for entering the
- * coroutine.
- */
- qemu_aio_coroutine_enter(bs->aio_context, s->connection_co);
+ if (s->connection_co) {
+ /*
+ * The node is still drained, so we know the coroutine has yielded in
+ * nbd_read_eof(), the only place where bs->in_flight can reach 0, or
+ * it is entered for the first time. Both places are safe for entering
+ * the coroutine.
+ */
+ qemu_aio_coroutine_enter(bs->aio_context, s->connection_co);
+ }
bdrv_dec_in_flight(bs);
}
--
2.29.2
28.01.2021 23:14, Roman Kagan wrote:
> When an NBD block driver state is moved from one aio_context to another
> (e.g. when doing a drain in a migration thread),
> nbd_client_attach_aio_context_bh is executed that enters the connection
> coroutine.
[...]
> Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
--
Best regards,
Vladimir