Nicholas Piggin <npiggin@gmail.com> writes:
Hi Nick,
I'm ignorant about replay, but we try to know why were taking the BQL in
the migration code, we move it around sometimes, etc. Can we be a bit
more strict with documentation here so we don't get stuck with a lock
that can't be changed?
> Migration causes a number of events that need to go in the replay
> trace, such as vm state transitions. The replay_mutex lock needs to
> be held for these.
>
Is it practical to explicitly list which events are those?
Are there any tests that exercise this that we could use to validate
changes around this area?
> The simplest approach seems to be just take it up-front when taking
> the bql.
But also the thing asserts if taken inside the BQL, so is the actual
matter here that we _cannot_ take the lock around the proper places?
I also see the replay lock around the main loop, so is it basically bql2
from the perspective of most of QEMU?
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> migration/migration.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 2eb9e50a263..277fca954c1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -24,6 +24,7 @@
> #include "socket.h"
> #include "sysemu/runstate.h"
> #include "sysemu/sysemu.h"
> +#include "sysemu/replay.h"
> #include "sysemu/cpu-throttle.h"
> #include "rdma.h"
> #include "ram.h"
> @@ -2518,6 +2519,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
> }
>
> trace_postcopy_start();
> + replay_mutex_lock();
> bql_lock();
> trace_postcopy_start_set_run();
>
> @@ -2629,6 +2631,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
> migration_downtime_end(ms);
>
> bql_unlock();
> + replay_mutex_unlock();
>
> if (migrate_postcopy_ram()) {
> /*
> @@ -2670,6 +2673,7 @@ fail:
> }
> migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
> bql_unlock();
> + replay_mutex_unlock();
> return -1;
> }
>
> @@ -2721,6 +2725,7 @@ static int migration_completion_precopy(MigrationState *s,
> {
> int ret;
>
> + replay_mutex_lock();
> bql_lock();
>
> if (!migrate_mode_is_cpr(s)) {
> @@ -2746,6 +2751,7 @@ static int migration_completion_precopy(MigrationState *s,
> s->block_inactive);
> out_unlock:
> bql_unlock();
> + replay_mutex_unlock();
> return ret;
> }
>
> @@ -3633,6 +3639,7 @@ static void *bg_migration_thread(void *opaque)
>
> trace_migration_thread_setup_complete();
>
> + replay_mutex_lock();
> bql_lock();
>
> if (migration_stop_vm(s, RUN_STATE_PAUSED)) {
> @@ -3666,6 +3673,7 @@ static void *bg_migration_thread(void *opaque)
> */
> migration_bh_schedule(bg_migration_vm_start_bh, s);
> bql_unlock();
> + replay_mutex_unlock();
>
> while (migration_is_active()) {
> MigIterateState iter_state = bg_migration_iteration_run(s);
> @@ -3695,6 +3703,7 @@ fail:
> migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
> MIGRATION_STATUS_FAILED);
> bql_unlock();
> + replay_mutex_unlock();
> }
>
> fail_setup: