From: "Maciej S. Szmigiero"
To: Peter Xu, Fabiano Rosas
Cc: Alex Williamson, Cédric Le Goater, Eric Blake, Markus Armbruster, Daniel P. Berrangé, Avihai Horon, Joao Martins, qemu-devel@nongnu.org
Subject: [PATCH v1 11/13] vfio/migration: Multifd device state transfer support - receive side
Date: Tue, 18 Jun 2024 18:12:29 +0200

Data received via multifd needs to be reassembled, since device state
packets sent via different multifd channels can arrive out of order.
Therefore, each VFIO device state packet carries a header indicating its
position in the stream.

The last such VFIO device state packet should have the
VFIO_DEVICE_STATE_CONFIG_STATE flag set and carry the device config state.
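
To make the reassembly rule concrete, here is a minimal C sketch, assuming a fixed capacity of 16 packets and a plain uint32_t standing in for each packet's payload. Reasm, reasm_put() and reasm_drain() are hypothetical names, not QEMU APIs; the patch itself keeps LoadedBuffer entries in a GArray that grows to fit the highest idx seen, and writes drained payloads to the device's data_fd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed capacity; the patch grows a GArray to the highest idx seen. */
#define REASM_MAX_PACKETS 16

typedef struct ReasmSlot {
    bool is_present;
    uint32_t payload;   /* stand-in for the packet's data[] bytes */
} ReasmSlot;

typedef struct Reasm {
    ReasmSlot slots[REASM_MAX_PACKETS];
    uint32_t next_idx;  /* next index the consumer must see, in order */
} Reasm;

/*
 * Store one packet at the position given by its header idx.
 * Rejects out-of-range indexes, indexes already consumed and duplicates.
 */
bool reasm_put(Reasm *r, uint32_t idx, uint32_t payload)
{
    if (idx >= REASM_MAX_PACKETS || idx < r->next_idx || r->slots[idx].is_present) {
        return false;
    }
    r->slots[idx].payload = payload;
    r->slots[idx].is_present = true;
    return true;
}

/*
 * Copy out the run of consecutive packets starting at next_idx and stop
 * at the first gap. Returns how many packets were drained.
 */
size_t reasm_drain(Reasm *r, uint32_t *out, size_t out_cap)
{
    size_t n = 0;

    while (n < out_cap && r->next_idx < REASM_MAX_PACKETS &&
           r->slots[r->next_idx].is_present) {
        out[n++] = r->slots[r->next_idx].payload;
        r->next_idx++;
    }
    return n;
}
```

For example, if packets 2 and 0 arrive first, only packet 0 can be drained; packet 2 stays buffered until packet 1 fills the gap.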

Since it's important to finish loading the device state transferred via
the main migration channel (via the save_live_iterate handler) before
starting to load the data asynchronously transferred via multifd, a new
VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE flag is introduced to mark the end
of the main migration channel data.

The device state loading process waits until that flag is seen before
commencing loading of the multifd-transferred device state.

Signed-off-by: Maciej S. Szmigiero
---
 hw/vfio/migration.c           | 325 +++++++++++++++++++++++++++++++++-
 hw/vfio/trace-events          |   9 +-
 include/hw/vfio/vfio-common.h |  14 ++
 3 files changed, 344 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 93f767e3c2dd..719e36800ab5 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -15,6 +15,7 @@
 #include
 #include
 
+#include "io/channel-buffer.h"
 #include "sysemu/runstate.h"
 #include "hw/vfio/vfio-common.h"
 #include "migration/misc.h"
@@ -47,6 +48,7 @@
 #define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL)
 #define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL)
 #define VFIO_MIG_FLAG_DEV_INIT_DATA_SENT (0xffffffffef100005ULL)
+#define VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE (0xffffffffef100006ULL)
 
 /*
  * This is an arbitrary size based on migration of mlx5 devices, where typically
@@ -55,6 +57,15 @@
  */
 #define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB)
 
+#define VFIO_DEVICE_STATE_CONFIG_STATE (1)
+
+typedef struct VFIODeviceStatePacket {
+    uint32_t version;
+    uint32_t idx;
+    uint32_t flags;
+    uint8_t data[0];
+} QEMU_PACKED VFIODeviceStatePacket;
+
 static int64_t bytes_transferred;
 
 static const char *mig_state_to_str(enum vfio_device_mig_state state)
@@ -254,6 +265,176 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
     return ret;
 }
 
+typedef struct LoadedBuffer {
+    bool is_present;
+    char *data;
+    size_t len;
+} LoadedBuffer;
+
+static void loaded_buffer_clear(gpointer data)
+{
+    LoadedBuffer *lb = data;
+
+    if (!lb->is_present) {
+        return;
+    }
+
+    g_clear_pointer(&lb->data, g_free);
+    lb->is_present = false;
+}
+
+static int vfio_load_state_buffer(void *opaque, char *data, size_t data_size,
+                                  Error **errp)
+{
+    VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+    VFIODeviceStatePacket *packet = (VFIODeviceStatePacket *)data;
+    QEMU_LOCK_GUARD(&migration->load_bufs_mutex);
+    LoadedBuffer *lb;
+
+    if (data_size < sizeof(*packet)) {
+        error_setg(errp, "packet too short at %zu (min is %zu)",
+                   data_size, sizeof(*packet));
+        return -1;
+    }
+
+    if (packet->version != 0) {
+        error_setg(errp, "packet has unknown version %" PRIu32,
+                   packet->version);
+        return -1;
+    }
+
+    if (packet->idx == UINT32_MAX) {
+        error_setg(errp, "packet has too high idx %" PRIu32,
+                   packet->idx);
+        return -1;
+    }
+
+    trace_vfio_load_state_device_buffer_incoming(vbasedev->name, packet->idx);
+
+    /* config state packet should be the last one in the stream */
+    if (packet->flags & VFIO_DEVICE_STATE_CONFIG_STATE) {
+        migration->load_buf_idx_last = packet->idx;
+    }
+
+    assert(migration->load_bufs);
+    if (packet->idx >= migration->load_bufs->len) {
+        g_array_set_size(migration->load_bufs, packet->idx + 1);
+    }
+
+    lb = &g_array_index(migration->load_bufs, typeof(*lb), packet->idx);
+    if (lb->is_present) {
+        error_setg(errp, "state buffer %" PRIu32 " already filled", packet->idx);
+        return -1;
+    }
+
+    assert(packet->idx >= migration->load_buf_idx);
+
+    lb->data = g_memdup2(&packet->data, data_size - sizeof(*packet));
+    lb->len = data_size - sizeof(*packet);
+    lb->is_present = true;
+
+    qemu_cond_broadcast(&migration->load_bufs_buffer_ready_cond);
+
+    return 0;
+}
+
+static void *vfio_load_bufs_thread(void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+    Error **errp = &migration->load_bufs_thread_errp;
+    g_autoptr(QemuLockable) locker = qemu_lockable_auto_lock(
+        QEMU_MAKE_LOCKABLE(&migration->load_bufs_mutex));
+    LoadedBuffer *lb;
+
+    while (!migration->load_bufs_device_ready &&
+           !migration->load_bufs_thread_want_exit) {
+        qemu_cond_wait(&migration->load_bufs_device_ready_cond, &migration->load_bufs_mutex);
+    }
+
+    while (!migration->load_bufs_thread_want_exit) {
+        bool starved;
+        ssize_t ret;
+
+        assert(migration->load_buf_idx <= migration->load_buf_idx_last);
+
+        if (migration->load_buf_idx >= migration->load_bufs->len) {
+            assert(migration->load_buf_idx == migration->load_bufs->len);
+            starved = true;
+        } else {
+            lb = &g_array_index(migration->load_bufs, typeof(*lb), migration->load_buf_idx);
+            starved = !lb->is_present;
+        }
+
+        if (starved) {
+            trace_vfio_load_state_device_buffer_starved(vbasedev->name, migration->load_buf_idx);
+            qemu_cond_wait(&migration->load_bufs_buffer_ready_cond, &migration->load_bufs_mutex);
+            continue;
+        }
+
+        if (migration->load_buf_idx == migration->load_buf_idx_last) {
+            break;
+        }
+
+        if (migration->load_buf_idx == 0) {
+            trace_vfio_load_state_device_buffer_start(vbasedev->name);
+        }
+
+        if (lb->len) {
+            g_autofree char *buf = NULL;
+            size_t buf_len;
+            int errno_save;
+
+            trace_vfio_load_state_device_buffer_load_start(vbasedev->name,
+                                                           migration->load_buf_idx);
+
+            /* lb might become re-allocated when we drop the lock */
+            buf = g_steal_pointer(&lb->data);
+            buf_len = lb->len;
+
+            /* Loading data to the device takes a while, drop the lock during this process */
+            qemu_mutex_unlock(&migration->load_bufs_mutex);
+            ret = write(migration->data_fd, buf, buf_len);
+            errno_save = errno;
+            qemu_mutex_lock(&migration->load_bufs_mutex);
+
+            if (ret < 0) {
+                error_setg(errp, "write to state buffer %" PRIu32 " failed with %d",
+                           migration->load_buf_idx, errno_save);
+                break;
+            } else if (ret < buf_len) {
+                error_setg(errp, "write to state buffer %" PRIu32 " incomplete %zd / %zu",
+                           migration->load_buf_idx, ret, buf_len);
+                break;
+            }
+
+            trace_vfio_load_state_device_buffer_load_end(vbasedev->name,
+                                                         migration->load_buf_idx);
+        }
+
+        if (migration->load_buf_idx == migration->load_buf_idx_last - 1) {
+            trace_vfio_load_state_device_buffer_end(vbasedev->name);
+        }
+
+        migration->load_buf_idx++;
+    }
+
+    if (migration->load_bufs_thread_want_exit &&
+        !*errp) {
+        error_setg(errp, "load bufs thread asked to quit");
+    }
+
+    g_clear_pointer(&locker, qemu_lockable_auto_unlock);
+
+    qemu_loadvm_load_finish_ready_lock();
+    migration->load_bufs_thread_finished = true;
+    qemu_loadvm_load_finish_ready_broadcast();
+    qemu_loadvm_load_finish_ready_unlock();
+
+    return NULL;
+}
+
 static int vfio_save_device_config_state(QEMUFile *f, void *opaque,
                                          Error **errp)
 {
@@ -285,6 +466,8 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
     VFIODevice *vbasedev = opaque;
     uint64_t data;
 
+    trace_vfio_load_device_config_state_start(vbasedev->name);
+
     if (vbasedev->ops && vbasedev->ops->vfio_load_config) {
         int ret;
 
@@ -303,7 +486,7 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
         return -EINVAL;
     }
 
-    trace_vfio_load_device_config_state(vbasedev->name);
+    trace_vfio_load_device_config_state_end(vbasedev->name);
     return qemu_file_get_error(f);
 }
 
@@ -687,16 +870,69 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
 static int vfio_load_setup(QEMUFile *f, void *opaque, Error **errp)
 {
     VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+    int ret;
+
+    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
+                                   vbasedev->migration->device_state, errp);
+    if (ret) {
+        return ret;
+    }
+
+    assert(!migration->load_bufs);
+    migration->load_bufs = g_array_new(FALSE, TRUE, sizeof(LoadedBuffer));
+    g_array_set_clear_func(migration->load_bufs, loaded_buffer_clear);
+
+    qemu_mutex_init(&migration->load_bufs_mutex);
+
+    migration->load_bufs_device_ready = false;
+    qemu_cond_init(&migration->load_bufs_device_ready_cond);
+
+    migration->load_buf_idx = 0;
+    migration->load_buf_idx_last = UINT32_MAX;
+    qemu_cond_init(&migration->load_bufs_buffer_ready_cond);
+
+    migration->config_state_loaded_to_dev = false;
+
+    assert(!migration->load_bufs_thread_started);
+
+    migration->load_bufs_thread_finished = false;
+    migration->load_bufs_thread_want_exit = false;
+    qemu_thread_create(&migration->load_bufs_thread, "vfio-load-bufs",
+                       vfio_load_bufs_thread, opaque, QEMU_THREAD_JOINABLE);
 
-    return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
-                                    vbasedev->migration->device_state, errp);
+    migration->load_bufs_thread_started = true;
+
+    return 0;
 }
 
 static int vfio_load_cleanup(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+
+    if (migration->load_bufs_thread_started) {
+        qemu_mutex_lock(&migration->load_bufs_mutex);
+        migration->load_bufs_thread_want_exit = true;
+        qemu_mutex_unlock(&migration->load_bufs_mutex);
+
+        qemu_cond_broadcast(&migration->load_bufs_device_ready_cond);
+        qemu_cond_broadcast(&migration->load_bufs_buffer_ready_cond);
+
+        qemu_thread_join(&migration->load_bufs_thread);
+
+        assert(migration->load_bufs_thread_finished);
+
+        migration->load_bufs_thread_started = false;
+    }
 
     vfio_migration_cleanup(vbasedev);
+
+    g_clear_pointer(&migration->load_bufs, g_array_unref);
+    qemu_cond_destroy(&migration->load_bufs_buffer_ready_cond);
+    qemu_cond_destroy(&migration->load_bufs_device_ready_cond);
+    qemu_mutex_destroy(&migration->load_bufs_mutex);
+
     trace_vfio_load_cleanup(vbasedev->name);
 
     return 0;
@@ -705,6 +941,7 @@ static int vfio_load_cleanup(void *opaque)
 static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
 {
     VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
     int ret = 0;
     uint64_t data;
 
@@ -716,6 +953,7 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
         switch (data) {
         case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
         {
+            migration->config_state_loaded_to_dev = true;
             return vfio_load_device_config_state(f, opaque);
         }
         case VFIO_MIG_FLAG_DEV_SETUP_STATE:
@@ -742,6 +980,15 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
             }
             break;
         }
+        case VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE:
+        {
+            QEMU_LOCK_GUARD(&migration->load_bufs_mutex);
+
+            migration->load_bufs_device_ready = true;
+            qemu_cond_broadcast(&migration->load_bufs_device_ready_cond);
+
+            break;
+        }
         case VFIO_MIG_FLAG_DEV_INIT_DATA_SENT:
         {
             if (!vfio_precopy_supported(vbasedev) ||
@@ -774,6 +1021,76 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+static int vfio_load_finish(void *opaque, bool *is_finished, Error **errp)
+{
+    VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+    g_autoptr(QemuLockable) locker = NULL;
+    LoadedBuffer *lb;
+    g_autoptr(QIOChannelBuffer) bioc = NULL;
+    QEMUFile *f_out = NULL, *f_in = NULL;
+    uint64_t mig_header;
+    int ret;
+
+    if (migration->config_state_loaded_to_dev) {
+        *is_finished = true;
+        return 0;
+    }
+
+    if (!migration->load_bufs_thread_finished) {
+        assert(migration->load_bufs_thread_started);
+        *is_finished = false;
+        return 0;
+    }
+
+    if (migration->load_bufs_thread_errp) {
+        error_propagate(errp, g_steal_pointer(&migration->load_bufs_thread_errp));
+        return -1;
+    }
+
+    locker = qemu_lockable_auto_lock(QEMU_MAKE_LOCKABLE(&migration->load_bufs_mutex));
+
+    assert(migration->load_buf_idx == migration->load_buf_idx_last);
+    lb = &g_array_index(migration->load_bufs, typeof(*lb), migration->load_buf_idx);
+    assert(lb->is_present);
+
+    bioc = qio_channel_buffer_new(lb->len);
+    qio_channel_set_name(QIO_CHANNEL(bioc), "vfio-device-config-load");
+
+    f_out = qemu_file_new_output(QIO_CHANNEL(bioc));
+    qemu_put_buffer(f_out, (uint8_t *)lb->data, lb->len);
+
+    ret = qemu_fflush(f_out);
+    if (ret) {
+        error_setg(errp, "load device config state file flush failed with %d", ret);
+        g_clear_pointer(&f_out, qemu_fclose);
+        return -1;
+    }
+
+    qio_channel_io_seek(QIO_CHANNEL(bioc), 0, 0, NULL);
+    f_in = qemu_file_new_input(QIO_CHANNEL(bioc));
+
+    mig_header = qemu_get_be64(f_in);
+    if (mig_header != VFIO_MIG_FLAG_DEV_CONFIG_STATE) {
+        error_setg(errp, "load device config state invalid header %"PRIu64, mig_header);
+        g_clear_pointer(&f_out, qemu_fclose);
+        g_clear_pointer(&f_in, qemu_fclose);
+        return -1;
+    }
+
+    ret = vfio_load_device_config_state(f_in, opaque);
+    g_clear_pointer(&f_out, qemu_fclose);
+    g_clear_pointer(&f_in, qemu_fclose);
+    if (ret < 0) {
+        error_setg(errp, "load device config state failed with %d", ret);
+        return -1;
+    }
+
+    migration->config_state_loaded_to_dev = true;
+    *is_finished = true;
+    return 0;
+}
+
 static bool vfio_switchover_ack_needed(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -794,6 +1111,8 @@ static const SaveVMHandlers savevm_vfio_handlers = {
     .load_setup = vfio_load_setup,
     .load_cleanup = vfio_load_cleanup,
     .load_state = vfio_load_state,
+    .load_state_buffer = vfio_load_state_buffer,
+    .load_finish = vfio_load_finish,
     .switchover_ack_needed = vfio_switchover_ack_needed,
 };
 
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 814000796687..7f224e4d240f 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -148,9 +148,16 @@ vfio_display_edid_write_error(void) ""
 
 # migration.c
 vfio_load_cleanup(const char *name) " (%s)"
-vfio_load_device_config_state(const char *name) " (%s)"
+vfio_load_device_config_state_start(const char *name) " (%s)"
+vfio_load_device_config_state_end(const char *name) " (%s)"
 vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
 vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d"
+vfio_load_state_device_buffer_incoming(const char *name, uint32_t idx) " (%s) idx %"PRIu32
+vfio_load_state_device_buffer_start(const char *name) " (%s)"
+vfio_load_state_device_buffer_starved(const char *name, uint32_t idx) " (%s) idx %"PRIu32
+vfio_load_state_device_buffer_load_start(const char *name, uint32_t idx) " (%s) idx %"PRIu32
+vfio_load_state_device_buffer_load_end(const char *name, uint32_t idx) " (%s) idx %"PRIu32
+vfio_load_state_device_buffer_end(const char *name) " (%s)"
 vfio_migration_realize(const char *name) " (%s)"
 vfio_migration_set_device_state(const char *name, const char *state) " (%s) state %s"
 vfio_migration_set_state(const char *name, const char *new_state, const char *recover_state) " (%s) new state %s, recover state %s"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 510818f4dae3..aa8476a859a6 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -74,6 +74,20 @@ typedef struct VFIOMigration {
 
     bool save_iterate_run;
     bool save_iterate_empty_hit;
+    QemuThread load_bufs_thread;
+    Error *load_bufs_thread_errp;
+    bool load_bufs_thread_started;
+    bool load_bufs_thread_finished;
+    bool load_bufs_thread_want_exit;
+
+    GArray *load_bufs;
+    bool load_bufs_device_ready;
+    QemuCond load_bufs_device_ready_cond;
+    QemuCond load_bufs_buffer_ready_cond;
+    QemuMutex load_bufs_mutex;
+    uint32_t load_buf_idx;
+    uint32_t load_buf_idx_last;
+    bool config_state_loaded_to_dev;
 } VFIOMigration;
 
 struct VFIOGroup;
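
The receive-side handshake described in the commit message (the loader thread waits until VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE arrives on the main channel, then loads buffers strictly in index order and stops at the config-state buffer) can be modeled with plain POSIX threads. This is a simplified sketch, assuming a fixed bound of 16 buffers and no device I/O; Loader and the loader_*() functions are hypothetical stand-ins for the VFIOMigration fields and the vfio_load_bufs_thread()/vfio_load_cleanup() logic, not QEMU code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bound; the patch grows a GArray on demand instead. */
#define LOADER_MAX_BUFS 16

/* Hypothetical model of the loader fields the patch adds to VFIOMigration. */
typedef struct Loader {
    pthread_mutex_t mutex;
    pthread_cond_t device_ready_cond;  /* cf. load_bufs_device_ready_cond */
    pthread_cond_t buffer_ready_cond;  /* cf. load_bufs_buffer_ready_cond */
    bool device_ready;                 /* main channel sent DATA_STATE_COMPLETE */
    bool want_exit;                    /* cleanup asked the thread to stop */
    bool present[LOADER_MAX_BUFS];
    uint32_t idx;                      /* next buffer to load */
    uint32_t idx_last;                 /* index of the config-state buffer */
    uint32_t loaded[LOADER_MAX_BUFS];  /* records the order buffers were loaded in */
    uint32_t n_loaded;
} Loader;

void loader_init(Loader *l)
{
    memset(l, 0, sizeof(*l));
    pthread_mutex_init(&l->mutex, NULL);
    pthread_cond_init(&l->device_ready_cond, NULL);
    pthread_cond_init(&l->buffer_ready_cond, NULL);
    l->idx_last = UINT32_MAX;          /* unknown until the config packet arrives */
}

/* Receive side: called for every packet, from any multifd channel. */
void loader_put(Loader *l, uint32_t idx, bool is_config_state)
{
    assert(idx < LOADER_MAX_BUFS);
    pthread_mutex_lock(&l->mutex);
    if (is_config_state) {
        l->idx_last = idx;
    }
    l->present[idx] = true;
    pthread_cond_broadcast(&l->buffer_ready_cond);
    pthread_mutex_unlock(&l->mutex);
}

/* Main channel: VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE was read. */
void loader_set_device_ready(Loader *l)
{
    pthread_mutex_lock(&l->mutex);
    l->device_ready = true;
    pthread_cond_broadcast(&l->device_ready_cond);
    pthread_mutex_unlock(&l->mutex);
}

/* Loader thread: wait for the main channel, then load buffers in index order. */
void *loader_thread(void *opaque)
{
    Loader *l = opaque;

    pthread_mutex_lock(&l->mutex);
    while (!l->device_ready && !l->want_exit) {
        pthread_cond_wait(&l->device_ready_cond, &l->mutex);
    }
    while (!l->want_exit) {
        if (l->idx >= LOADER_MAX_BUFS || !l->present[l->idx]) {
            /* Starved: the next buffer in sequence has not arrived yet. */
            pthread_cond_wait(&l->buffer_ready_cond, &l->mutex);
            continue;
        }
        if (l->idx == l->idx_last) {
            break;  /* config state is left for a separate finish step */
        }
        /* The patch drops the lock here and write()s the buffer to data_fd. */
        l->loaded[l->n_loaded++] = l->idx;
        l->idx++;
    }
    pthread_mutex_unlock(&l->mutex);
    return NULL;
}

/* Cleanup: ask the thread to exit and wake it from either wait. */
void loader_request_exit(Loader *l)
{
    pthread_mutex_lock(&l->mutex);
    l->want_exit = true;
    pthread_cond_broadcast(&l->device_ready_cond);
    pthread_cond_broadcast(&l->buffer_ready_cond);
    pthread_mutex_unlock(&l->mutex);
}
```

Using two condition variables lets the thread sleep separately on "main channel not done yet" and "next buffer missing"; the exit request broadcasts both so cleanup can wake the thread from either wait, much as vfio_load_cleanup() does.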