From: Cédric Le Goater
To: qemu-devel@nongnu.org
Cc: Alex Williamson, "Maciej S. Szmigiero", Cédric Le Goater
Subject: [PULL 35/42] vfio/migration: Multifd device state transfer support - received buffers queuing
Date: Thu, 6 Mar 2025 15:14:11 +0100
Message-ID: <20250306141419.2015340-36-clg@redhat.com>
In-Reply-To: <20250306141419.2015340-1-clg@redhat.com>
References: <20250306141419.2015340-1-clg@redhat.com>

From: "Maciej S. Szmigiero"

The multifd received data needs to be reassembled since device state
packets sent via different multifd channels can arrive out-of-order.
Therefore, each VFIO device state packet carries a header indicating its
position in the stream.
The raw device state data is saved into a VFIOStateBuffer for later
in-order loading into the device.
The last such VFIO device state packet should have
VFIO_DEVICE_STATE_CONFIG_STATE flag set and carry the device config state.

Signed-off-by: Maciej S. Szmigiero
Reviewed-by: Cédric Le Goater
Link: https://lore.kernel.org/qemu-devel/e3bff515a8d61c582b94b409eb12a45b1a143a69.1741124640.git.maciej.szmigiero@oracle.com
[ clg: - Reordered savevm_vfio_handlers
       - Added load_state_buffer documentation ]
Signed-off-by: Cédric Le Goater
---
 docs/devel/migration/vfio.rst |   7 ++
 hw/vfio/migration-multifd.h   |   3 +
 hw/vfio/migration-multifd.c   | 163 ++++++++++++++++++++++++++++++++++
 hw/vfio/migration.c           |   4 +
 hw/vfio/trace-events          |   1 +
 5 files changed, 178 insertions(+)

diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst
index c49482eab66d8e831ea1c2c791fc895b51893e4d..8b1f28890a0ba708cc1e49d87b16e17a5d8f7c88 100644
--- a/docs/devel/migration/vfio.rst
+++ b/docs/devel/migration/vfio.rst
@@ -76,6 +76,10 @@ VFIO implements the device hooks for the iterative approach as follows:
 * A ``load_state`` function that loads the config section and the data
   sections that are generated by the save functions above.
 
+* A ``load_state_buffer`` function that loads the device state and the device
+  config that arrived via multifd channels.
+  It's used only in the multifd mode.
+
 * ``cleanup`` functions for both save and load that perform any migration
   related cleanup.
 
@@ -194,6 +198,9 @@ Live migration resume path
                          (RESTORE_VM, _ACTIVE, _STOP)
                                       |
      For each device, .load_state() is called for that device section data
+                  transmitted via the main migration channel.
+     For data transmitted via multifd channels .load_state_buffer() is called
+                                   instead.
                        (RESTORE_VM, _ACTIVE, _RESUMING)
                                       |
   At the end, .load_cleanup() is called for each device and vCPUs are started
diff --git a/hw/vfio/migration-multifd.h b/hw/vfio/migration-multifd.h
index 2a7a76164f291d182172775524a5b11c0a560c58..8c6320fcb484ca9f779e14d4f9d814081d2f760e 100644
--- a/hw/vfio/migration-multifd.h
+++ b/hw/vfio/migration-multifd.h
@@ -20,4 +20,7 @@ void vfio_multifd_cleanup(VFIODevice *vbasedev);
 bool vfio_multifd_transfer_supported(void);
 bool vfio_multifd_transfer_enabled(VFIODevice *vbasedev);
 
+bool vfio_multifd_load_state_buffer(void *opaque, char *data, size_t data_size,
+                                    Error **errp);
+
 #endif
diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index 091dc43210ad1459d5114da18336e73f6cb0baf9..79df11b7baa991763039b395c34931e796e711b0 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -32,18 +32,181 @@ typedef struct VFIODeviceStatePacket {
     uint8_t data[0];
 } QEMU_PACKED VFIODeviceStatePacket;
 
+/* type safety */
+typedef struct VFIOStateBuffers {
+    GArray *array;
+} VFIOStateBuffers;
+
+typedef struct VFIOStateBuffer {
+    bool is_present;
+    char *data;
+    size_t len;
+} VFIOStateBuffer;
+
 typedef struct VFIOMultifd {
+    VFIOStateBuffers load_bufs;
+    QemuCond load_bufs_buffer_ready_cond;
+    QemuMutex load_bufs_mutex; /* Lock order: this lock -> BQL */
+    uint32_t load_buf_idx;
+    uint32_t load_buf_idx_last;
 } VFIOMultifd;
 
+static void vfio_state_buffer_clear(gpointer data)
+{
+    VFIOStateBuffer *lb = data;
+
+    if (!lb->is_present) {
+        return;
+    }
+
+    g_clear_pointer(&lb->data, g_free);
+    lb->is_present = false;
+}
+
+static void vfio_state_buffers_init(VFIOStateBuffers *bufs)
+{
+    bufs->array = g_array_new(FALSE, TRUE, sizeof(VFIOStateBuffer));
+    g_array_set_clear_func(bufs->array, vfio_state_buffer_clear);
+}
+
+static void vfio_state_buffers_destroy(VFIOStateBuffers *bufs)
+{
+    g_clear_pointer(&bufs->array, g_array_unref);
+}
+
+static void vfio_state_buffers_assert_init(VFIOStateBuffers *bufs)
+{
+    assert(bufs->array);
+}
+
+static unsigned int vfio_state_buffers_size_get(VFIOStateBuffers *bufs)
+{
+    return bufs->array->len;
+}
+
+static void vfio_state_buffers_size_set(VFIOStateBuffers *bufs,
+                                        unsigned int size)
+{
+    g_array_set_size(bufs->array, size);
+}
+
+static VFIOStateBuffer *vfio_state_buffers_at(VFIOStateBuffers *bufs,
+                                              unsigned int idx)
+{
+    return &g_array_index(bufs->array, VFIOStateBuffer, idx);
+}
+
+/* called with load_bufs_mutex locked */
+static bool vfio_load_state_buffer_insert(VFIODevice *vbasedev,
+                                          VFIODeviceStatePacket *packet,
+                                          size_t packet_total_size,
+                                          Error **errp)
+{
+    VFIOMigration *migration = vbasedev->migration;
+    VFIOMultifd *multifd = migration->multifd;
+    VFIOStateBuffer *lb;
+
+    vfio_state_buffers_assert_init(&multifd->load_bufs);
+    if (packet->idx >= vfio_state_buffers_size_get(&multifd->load_bufs)) {
+        vfio_state_buffers_size_set(&multifd->load_bufs, packet->idx + 1);
+    }
+
+    lb = vfio_state_buffers_at(&multifd->load_bufs, packet->idx);
+    if (lb->is_present) {
+        error_setg(errp, "%s: state buffer %" PRIu32 " already filled",
+                   vbasedev->name, packet->idx);
+        return false;
+    }
+
+    assert(packet->idx >= multifd->load_buf_idx);
+
+    lb->data = g_memdup2(&packet->data, packet_total_size - sizeof(*packet));
+    lb->len = packet_total_size - sizeof(*packet);
+    lb->is_present = true;
+
+    return true;
+}
+
+bool vfio_multifd_load_state_buffer(void *opaque, char *data, size_t data_size,
+                                    Error **errp)
+{
+    VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+    VFIOMultifd *multifd = migration->multifd;
+    VFIODeviceStatePacket *packet = (VFIODeviceStatePacket *)data;
+
+    if (!vfio_multifd_transfer_enabled(vbasedev)) {
+        error_setg(errp,
+                   "%s: got device state packet but not doing multifd transfer",
+                   vbasedev->name);
+        return false;
+    }
+
+    assert(multifd);
+
+    if (data_size < sizeof(*packet)) {
+        error_setg(errp, "%s: packet too short at %zu (min is %zu)",
+                   vbasedev->name, data_size, sizeof(*packet));
+        return false;
+    }
+
+    if (packet->version != VFIO_DEVICE_STATE_PACKET_VER_CURRENT) {
+        error_setg(errp, "%s: packet has unknown version %" PRIu32,
+                   vbasedev->name, packet->version);
+        return false;
+    }
+
+    if (packet->idx == UINT32_MAX) {
+        error_setg(errp, "%s: packet index is invalid", vbasedev->name);
+        return false;
+    }
+
+    trace_vfio_load_state_device_buffer_incoming(vbasedev->name, packet->idx);
+
+    /*
+     * Holding BQL here would violate the lock order and can cause
+     * a deadlock once we attempt to lock load_bufs_mutex below.
+     */
+    assert(!bql_locked());
+
+    WITH_QEMU_LOCK_GUARD(&multifd->load_bufs_mutex) {
+        /* config state packet should be the last one in the stream */
+        if (packet->flags & VFIO_DEVICE_STATE_CONFIG_STATE) {
+            multifd->load_buf_idx_last = packet->idx;
+        }
+
+        if (!vfio_load_state_buffer_insert(vbasedev, packet, data_size,
+                                           errp)) {
+            return false;
+        }
+
+        qemu_cond_signal(&multifd->load_bufs_buffer_ready_cond);
+    }
+
+    return true;
+}
+
 static VFIOMultifd *vfio_multifd_new(void)
 {
     VFIOMultifd *multifd = g_new(VFIOMultifd, 1);
 
+    vfio_state_buffers_init(&multifd->load_bufs);
+
+    qemu_mutex_init(&multifd->load_bufs_mutex);
+
+    multifd->load_buf_idx = 0;
+    multifd->load_buf_idx_last = UINT32_MAX;
+    qemu_cond_init(&multifd->load_bufs_buffer_ready_cond);
+
     return multifd;
 }
 
 static void vfio_multifd_free(VFIOMultifd *multifd)
 {
+    vfio_state_buffers_destroy(&multifd->load_bufs);
+    qemu_cond_destroy(&multifd->load_bufs_buffer_ready_cond);
+    qemu_mutex_destroy(&multifd->load_bufs_mutex);
+
     g_free(multifd);
 }
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 3c8286ae62300122582ae5ced26f5cbf5742818a..2cdb92356e0a2afb64109c10536c857b19f7e7c5 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -802,6 +802,10 @@ static const SaveVMHandlers savevm_vfio_handlers = {
     .load_cleanup = vfio_load_cleanup,
     .load_state = vfio_load_state,
     .switchover_ack_needed = vfio_switchover_ack_needed,
+    /*
+     * Multifd support
+     */
+    .load_state_buffer = vfio_multifd_load_state_buffer,
 };
 
 /* ---------------------------------------------------------------------- */
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index a02c668f28a43f73ed2db9f15827f26fed0994c3..404ea079b25c49fe25f4c9b05f0cde4f0536fdd7 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -154,6 +154,7 @@ vfio_load_device_config_state_start(const char *name) " (%s)"
 vfio_load_device_config_state_end(const char *name) " (%s)"
 vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
 vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size %"PRIu64" ret %d"
+vfio_load_state_device_buffer_incoming(const char *name, uint32_t idx) " (%s) idx %"PRIu32
 vfio_migration_realize(const char *name) " (%s)"
 vfio_migration_set_device_state(const char *name, const char *state) " (%s) state %s"
 vfio_migration_set_state(const char *name, const char *new_state, const char *recover_state) " (%s) new state %s, recover state %s"
-- 
2.48.1
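
To illustrate the index-based reassembly described in the commit message, here is a minimal standalone sketch in plain C. It is not QEMU code and uses neither GLib nor any QEMU API: `ReorderBuf`, `Slot`, `reorder_insert` and `reorder_pop_next` are hypothetical names invented for this example. Locking, the condition variable and the config-state flag handling are deliberately left out, and unlike the patch, which asserts on a stale index, the sketch simply rejects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration only; not part of QEMU. */
typedef struct Slot {
    bool is_present;
    char *data;
    size_t len;
} Slot;

typedef struct ReorderBuf {
    Slot *slots;        /* grows on demand, indexed by packet index */
    uint32_t nslots;
    uint32_t next_idx;  /* next index to hand out in order */
} ReorderBuf;

/*
 * Store a copy of one payload at position idx. Returns false for a
 * duplicate index, an already consumed index, the reserved UINT32_MAX
 * index, or an allocation failure.
 */
static bool reorder_insert(ReorderBuf *rb, uint32_t idx,
                           const char *data, size_t len)
{
    Slot *s;

    assert(rb);
    if (idx == UINT32_MAX || idx < rb->next_idx) {
        return false;
    }

    if (idx >= rb->nslots) {
        size_t want = (size_t)idx + 1;
        Slot *grown = realloc(rb->slots, want * sizeof(*grown));

        if (!grown) {
            return false;
        }
        /* Zero the new tail so every new slot starts as "not present". */
        memset(grown + rb->nslots, 0, (want - rb->nslots) * sizeof(*grown));
        rb->slots = grown;
        rb->nslots = idx + 1;
    }

    s = &rb->slots[idx];
    if (s->is_present) {
        return false;
    }

    s->data = malloc(len ? len : 1);
    if (!s->data) {
        return false;
    }
    if (len) {
        memcpy(s->data, data, len);
    }
    s->len = len;
    s->is_present = true;
    return true;
}

/*
 * Hand out the next in-order payload, or return false if it has not
 * arrived yet. On success the caller owns *data_out and must free() it.
 */
static bool reorder_pop_next(ReorderBuf *rb, char **data_out, size_t *len_out)
{
    Slot *s;

    assert(rb && data_out && len_out);
    if (rb->next_idx >= rb->nslots || !rb->slots[rb->next_idx].is_present) {
        return false;
    }

    s = &rb->slots[rb->next_idx];
    *data_out = s->data;
    *len_out = s->len;
    s->data = NULL;
    s->is_present = false;
    rb->next_idx++;
    return true;
}
```

A caller would start from a zero-initialized `ReorderBuf`, call `reorder_insert()` as payloads arrive in any order, and drain with `reorder_pop_next()` until it returns false. Growing the array to `idx + 1` mirrors the approach in `vfio_load_state_buffer_insert()`, so a very large index implies a correspondingly large allocation.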