[PATCH v5 07/36] migration: postcopy_ram_listen_thread() should take BQL for some calls

Maciej S. Szmigiero posted 36 patches 1 month, 2 weeks ago
There is a newer version of this series
[PATCH v5 07/36] migration: postcopy_ram_listen_thread() should take BQL for some calls
Posted by Maciej S. Szmigiero 1 month, 2 weeks ago
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>

All callers to migration_incoming_state_destroy() other than
postcopy_ram_listen_thread() do this call with BQL held.

Since migration_incoming_state_destroy() ultimately calls "load_cleanup"
SaveVMHandlers and it will soon call BQL-sensitive code it makes sense
to always call that function under BQL rather than to have it deal with
both cases (with BQL and without BQL).
Add the necessary bql_lock() and bql_unlock() to
postcopy_ram_listen_thread().

qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call
"load_state" SaveVMHandlers that are expecting BQL to be held.

In principle, the only devices that should be arriving on migration
channel serviced by postcopy_ram_listen_thread() are those that are
postcopiable and whose load handlers are safe to be called without BQL
being held.

But nothing currently prevents the source from sending data for "unsafe"
devices which would cause trouble there.
Add a TODO comment there so it's clear that it would be good to improve
handling of such (erroneous) case in the future.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
---
 migration/savevm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 7c1aa8ad7b9d..3e86b572cfa8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1986,6 +1986,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
      * in qemu_file, and thus we must be blocking now.
      */
     qemu_file_set_blocking(f, true);
+
+    /* TODO: sanity check that only postcopiable data will be loaded here */
     load_res = qemu_loadvm_state_main(f, mis);
 
     /*
@@ -2046,7 +2048,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
      * (If something broke then qemu will have to exit anyway since it's
      * got a bad migration state).
      */
+    bql_lock();
     migration_incoming_state_destroy();
+    bql_unlock();
 
     rcu_unregister_thread();
     mis->have_listen_thread = false;
Re: [PATCH v5 07/36] migration: postcopy_ram_listen_thread() should take BQL for some calls
Posted by Peter Xu 1 month, 1 week ago
On Wed, Feb 19, 2025 at 09:33:49PM +0100, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> 
> All callers to migration_incoming_state_destroy() other than
> postcopy_ram_listen_thread() do this call with BQL held.
> 
> Since migration_incoming_state_destroy() ultimately calls "load_cleanup"
> SaveVMHandlers and it will soon call BQL-sensitive code it makes sense
> to always call that function under BQL rather than to have it deal with
> both cases (with BQL and without BQL).
> Add the necessary bql_lock() and bql_unlock() to
> postcopy_ram_listen_thread().

We can do that, but let's be explicit on what needs BQL to be taken.

Could you add an assertion in migration_incoming_state_destroy() on
bql_locked(), then add a rich comment above it listing what needs the BQL?
We may consider dropping it some day when it's not needed.

Thanks,

> 
> qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call
> "load_state" SaveVMHandlers that are expecting BQL to be held.
> 
> In principle, the only devices that should be arriving on migration
> channel serviced by postcopy_ram_listen_thread() are those that are
> postcopiable and whose load handlers are safe to be called without BQL
> being held.
> 
> But nothing currently prevents the source from sending data for "unsafe"
> devices which would cause trouble there.
> Add a TODO comment there so it's clear that it would be good to improve
> handling of such (erroneous) case in the future.
> 
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> ---
>  migration/savevm.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 7c1aa8ad7b9d..3e86b572cfa8 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1986,6 +1986,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
>       * in qemu_file, and thus we must be blocking now.
>       */
>      qemu_file_set_blocking(f, true);
> +
> +    /* TODO: sanity check that only postcopiable data will be loaded here */
>      load_res = qemu_loadvm_state_main(f, mis);
>  
>      /*
> @@ -2046,7 +2048,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
>       * (If something broke then qemu will have to exit anyway since it's
>       * got a bad migration state).
>       */
> +    bql_lock();
>      migration_incoming_state_destroy();
> +    bql_unlock();
>  
>      rcu_unregister_thread();
>      mis->have_listen_thread = false;
> 

-- 
Peter Xu
Re: [PATCH v5 07/36] migration: postcopy_ram_listen_thread() should take BQL for some calls
Posted by Maciej S. Szmigiero 1 month, 1 week ago
On 25.02.2025 18:16, Peter Xu wrote:
> On Wed, Feb 19, 2025 at 09:33:49PM +0100, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> All callers to migration_incoming_state_destroy() other than
>> postcopy_ram_listen_thread() do this call with BQL held.
>>
>> Since migration_incoming_state_destroy() ultimately calls "load_cleanup"
>> SaveVMHandlers and it will soon call BQL-sensitive code it makes sense
>> to always call that function under BQL rather than to have it deal with
>> both cases (with BQL and without BQL).
>> Add the necessary bql_lock() and bql_unlock() to
>> postcopy_ram_listen_thread().
> 
> We can do that, but let's be explicit on what needs BQL to be taken.
> 
> Could you add an assertion in migration_incoming_state_destroy() on
> bql_locked(), then add a rich comment above it listing what needs the BQL?
> We may consider dropping it some day when it's not needed.

Sure, good idea.

Updated this commit now, and tests (make check) still pass.

> Thanks,

Thanks,
Maciej