During shutdown, blockdev_close_all_bdrv_states() drops any block node
references that are still owned by the monitor (i.e. the user). However,
in doing so, it forgot to also remove the node from monitor_bdrv_states
(which qmp_blockdev_del() correctly does), which means that later calls
of bdrv_first()/bdrv_next() will still return the (now stale) pointer to
the node.
Usually there is no such call after this point, but in some cases it can
happen. In the reported case, there was an ongoing migration, and the
migration thread wasn't shut down yet: migration_shutdown() called by
qemu_cleanup() doesn't actually wait for the migration to be shut down,
but may just move it to MIGRATION_STATUS_CANCELLING. The next time
migration_iteration_finish() runs, it sees the status and tries to
re-activate all block devices that migration may have previously
inactivated. This is where bdrv_first()/bdrv_next() get called and the
access to the already freed node happens.
It is debatable if migration_shutdown() should really return before
migration has settled, but leaving a dangling pointer in the list of
monitor-owned block nodes is clearly a bug either way and fixing it
solves the immediate problem, so fix it.
Cc: qemu-stable@nongnu.org
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
blockdev.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/blockdev.c b/blockdev.c
index dbd1d4d3e80..6e86c6262f9 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -686,6 +686,7 @@ void blockdev_close_all_bdrv_states(void)
GLOBAL_STATE_CODE();
QTAILQ_FOREACH_SAFE(bs, &monitor_bdrv_states, monitor_list, next_bs) {
+ QTAILQ_REMOVE(&monitor_bdrv_states, bs, monitor_list);
bdrv_unref(bs);
}
}
--
2.52.0
On 15/12/2025 16.07, Kevin Wolf wrote:
> During shutdown, blockdev_close_all_bdrv_states() drops any block node
> references that are still owned by the monitor (i.e. the user). However,
> in doing so, it forgot to also remove the node from monitor_bdrv_states
> (which qmp_blockdev_del() correctly does), which means that later calls
> of bdrv_first()/bdrv_next() will still return the (now stale) pointer to
> the node.
>
> Usually there is no such call after this point, but in some cases it can
> happen. In the reported case, there was an ongoing migration, and the
> migration thread wasn't shut down yet: migration_shutdown() called by
> qemu_cleanup() doesn't actually wait for the migration to be shut down,
> but may just move it to MIGRATION_STATUS_CANCELLING. The next time
> migration_iteration_finish() runs, it sees the status and tries to
> re-activate all block devices that migration may have previously
> inactivated. This is where bdrv_first()/bdrv_next() get called and the
> access to the already freed node happens.
>
> It is debatable if migration_shutdown() should really return before
> migration has settled, but leaving a dangling pointer in the list of
> monitor-owned block nodes is clearly a bug either way and fixing it
> solves the immediate problem, so fix it.
>
> Cc: qemu-stable@nongnu.org
> Reported-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> blockdev.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/blockdev.c b/blockdev.c
> index dbd1d4d3e80..6e86c6262f9 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -686,6 +686,7 @@ void blockdev_close_all_bdrv_states(void)
>
> GLOBAL_STATE_CODE();
> QTAILQ_FOREACH_SAFE(bs, &monitor_bdrv_states, monitor_list, next_bs) {
> + QTAILQ_REMOVE(&monitor_bdrv_states, bs, monitor_list);
> bdrv_unref(bs);
> }
> }
Thanks again!
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
On Mon, Dec 15, 2025 at 04:07:14PM +0100, Kevin Wolf wrote: > During shutdown, blockdev_close_all_bdrv_states() drops any block node > references that are still owned by the monitor (i.e. the user). However, > in doing so, it forgot to also remove the node from monitor_bdrv_states > (which qmp_blockdev_del() correctly does), which means that later calls > of bdrv_first()/bdrv_next() will still return the (now stale) pointer to > the node. > > Usually there is no such call after this point, but in some cases it can > happen. In the reported case, there was an ongoing migration, and the > migration thread wasn't shut down yet: migration_shutdown() called by > qemu_cleanup() doesn't actually wait for the migration to be shut down, > but may just move it to MIGRATION_STATUS_CANCELLING. The next time > migration_iteration_finish() runs, it sees the status and tries to > re-activate all block devices that migration may have previously > inactivated. This is where bdrv_first()/bdrv_next() get called and the > access to the already freed node happens. > > It is debatable if migration_shutdown() should really return before > migration has settled, but leaving a dangling pointer in the list of > monitor-owned block nodes is clearly a bug either way and fixing it > solves the immediate problem, so fix it. > > Cc: qemu-stable@nongnu.org > Reported-by: Thomas Huth <thuth@redhat.com> > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > --- > blockdev.c | 1 + > 1 file changed, 1 insertion(+) Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Kevin Wolf <kwolf@redhat.com> writes: > During shutdown, blockdev_close_all_bdrv_states() drops any block node > references that are still owned by the monitor (i.e. the user). However, > in doing so, it forgot to also remove the node from monitor_bdrv_states > (which qmp_blockdev_del() correctly does), which means that later calls > of bdrv_first()/bdrv_next() will still return the (now stale) pointer to > the node. > > Usually there is no such call after this point, but in some cases it can > happen. In the reported case, there was an ongoing migration, and the > migration thread wasn't shut down yet: migration_shutdown() called by > qemu_cleanup() doesn't actually wait for the migration to be shut down, > but may just move it to MIGRATION_STATUS_CANCELLING. The next time > migration_iteration_finish() runs, it sees the status and tries to > re-activate all block devices that migration may have previously > inactivated. This is where bdrv_first()/bdrv_next() get called and the > access to the already freed node happens. > > It is debatable if migration_shutdown() should really return before > migration has settled, but leaving a dangling pointer in the list of > monitor-owned block nodes is clearly a bug either way and fixing it > solves the immediate problem, so fix it. > > Cc: qemu-stable@nongnu.org > Reported-by: Thomas Huth <thuth@redhat.com> > Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de>
© 2016 - 2025 Red Hat, Inc.