On Wed, Feb 19, 2025 at 09:33:49PM +0100, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>
> All callers to migration_incoming_state_destroy() other than
> postcopy_ram_listen_thread() do this call with BQL held.
>
> Since migration_incoming_state_destroy() ultimately calls "load_cleanup"
> SaveVMHandlers and it will soon call BQL-sensitive code it makes sense
> to always call that function under BQL rather than to have it deal with
> both cases (with BQL and without BQL).
> Add the necessary bql_lock() and bql_unlock() to
> postcopy_ram_listen_thread().
We can do that, but let's be explicit on what needs BQL to be taken.
Could you add an assertion in migration_incoming_state_destroy() on
bql_locked(), then add a rich comment above it listing what needs the BQL?
We may consider dropping it some day when it's not needed.
Thanks,
>
> qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call
> "load_state" SaveVMHandlers that are expecting BQL to be held.
>
> In principle, the only devices that should be arriving on migration
> channel serviced by postcopy_ram_listen_thread() are those that are
> postcopiable and whose load handlers are safe to be called without BQL
> being held.
>
> But nothing currently prevents the source from sending data for "unsafe"
> devices which would cause trouble there.
> Add a TODO comment there so it's clear that it would be good to improve
> handling of such (erroneous) case in the future.
>
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> ---
> migration/savevm.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 7c1aa8ad7b9d..3e86b572cfa8 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1986,6 +1986,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
> * in qemu_file, and thus we must be blocking now.
> */
> qemu_file_set_blocking(f, true);
> +
> + /* TODO: sanity check that only postcopiable data will be loaded here */
> load_res = qemu_loadvm_state_main(f, mis);
>
> /*
> @@ -2046,7 +2048,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
> * (If something broke then qemu will have to exit anyway since it's
> * got a bad migration state).
> */
> + bql_lock();
> migration_incoming_state_destroy();
> + bql_unlock();
>
> rcu_unregister_thread();
> mis->have_listen_thread = false;
>
--
Peter Xu