From: Cédric Le Goater
To: qemu-devel@nongnu.org
Cc: Alex Williamson, Cédric Le Goater, Avihai Horon, Peter Xu
Subject: [PATCH] vfio/migration: Detect and report overflow in migration size queries
Date: Wed, 13 May 2026 11:45:22 +0200
Message-ID: <20260513094522.346314-1-clg@redhat.com>

The VFIO migration size queries (the VFIO_DEVICE_FEATURE_MIG_DATA_SIZE
device feature and the VFIO_MIG_GET_PRECOPY_INFO ioctl) return
device-estimated migration sizes as uint64_t values. A misbehaving kernel
driver could return unreasonably large values, which would corrupt the
size accounting used to decide migration convergence.

This was observed several times while testing migration of a VM with an
assigned NVIDIA vGPU and an MLX5 VF.
In some of the save iterations, the reported precopy and stopcopy sizes
were unreasonably large (close to UINT64_MAX):

  vfio_state_pending (4fbce62c-8ce2-4cc9-b429-41635bc94f24) stopcopy size 0 precopy initial size 18446744073708667040 precopy dirty size 0
  vfio_save_iterate (4fbce62c-8ce2-4cc9-b429-41635bc94f24) precopy initial size 18446744073707618464 precopy dirty size 0
  vfio_state_pending (4fbce62c-8ce2-4cc9-b429-41635bc94f24) stopcopy size 18446744073708503040 precopy initial size 18446744073707618464 precopy dirty size 0
  vfio_state_pending (4fbce62c-8ce2-4cc9-b429-41635bc94f24) stopcopy size 0 precopy initial size 18446744073707618464 precopy dirty size 0
  vfio_state_pending (0000:b1:01.0) stopcopy size 18446744073709543408 precopy initial size 0 precopy dirty size 1008

This corrupted the migration convergence estimate, as shown by the HMP
info migrate command:

  (qemu) info migrate
  Status:             active
  Time (ms):          total=21140, setup=86, exp_down=152455434886355
  Remaining:          16 EiB
  RAM info:
  Throughput (Mbps):  967.98
  Sizes:              pagesize=4 KiB, total=4 GiB
  Transfers:          transferred=2.29 GiB, remain=4.7 MiB
  Channels:           precopy=1.91 GiB, multifd=0 B, postcopy=0 B, vfio=387 MiB
  Page Types:         normal=499427, zero=559708
  Page Rates (pps):   transfer=0, dirty=1892
  Others:             dirty_syncs=3

Add a helper that flags values exceeding INT64_MAX (far beyond any
realistic device state size) and reports them with an error message.
The query functions then return -ERANGE so that callers can abort the
migration instead of proceeding with corrupted estimates. Note, however,
that the callers do not check this return value yet, so the migration is
not actually stopped.
Cc: Avihai Horon
Cc: Peter Xu
Signed-off-by: Cédric Le Goater
Reviewed-by: Avihai Horon
Reviewed-by: Peter Xu
---
 hw/vfio/migration.c | 32 ++++++++++++++++++++++++++++----
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 150e28656e97c5e8198541e5b6dfc4ed4102d143..fb12b9717f773fdde657911517de9d74c1eb3931 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -320,6 +320,18 @@ static void vfio_migration_cleanup(VFIODevice *vbasedev)
     migration->data_fd = -1;
 }
 
+static bool vfio_migration_check_overflow(VFIODevice *vbasedev, uint64_t size,
+                                          const char *name)
+{
+    if (size > INT64_MAX) {
+        error_report("%s: Estimated %s size overflow: 0x%"PRIx64,
+                     vbasedev->name, name, size);
+        return true;
+    }
+
+    return false;
+}
+
 static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
 {
     uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
@@ -329,7 +341,7 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
     struct vfio_device_feature_mig_data_size *mig_data_size =
         (struct vfio_device_feature_mig_data_size *)feature->data;
     VFIOMigration *migration = vbasedev->migration;
-    int ret;
+    int ret = 0;
 
     feature->argsz = sizeof(buf);
     feature->flags =
@@ -347,7 +359,10 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev)
                          vbasedev->name, ret);
     } else {
         migration->stopcopy_size = mig_data_size->stop_copy_length;
-        ret = 0;
+        if (vfio_migration_check_overflow(vbasedev, migration->stopcopy_size,
+                                          "stop copy size")) {
+            ret = -ERANGE;
+        }
     }
 
     trace_vfio_query_stop_copy_size(vbasedev->name,
@@ -361,7 +376,7 @@ static int vfio_query_precopy_size(VFIOMigration *migration)
     struct vfio_precopy_info precopy = {
         .argsz = sizeof(precopy),
     };
-    int ret;
+    int ret = 0;
 
     if (ioctl(migration->data_fd, VFIO_MIG_GET_PRECOPY_INFO, &precopy)) {
         migration->precopy_init_size = 0;
@@ -370,9 +385,18 @@ static int vfio_query_precopy_size(VFIOMigration *migration)
         warn_report_once("VFIO device %s ioctl(VFIO_MIG_GET_PRECOPY_INFO) "
                          "failed (%d)", migration->vbasedev->name, ret);
     } else {
+        bool overflow;
+
         migration->precopy_init_size = precopy.initial_bytes;
         migration->precopy_dirty_size = precopy.dirty_bytes;
-        ret = 0;
+
+        overflow = vfio_migration_check_overflow(migration->vbasedev,
+                       migration->precopy_init_size, "precopy init size");
+        overflow |= vfio_migration_check_overflow(migration->vbasedev,
+                       migration->precopy_dirty_size, "precopy dirty size");
+        if (overflow) {
+            ret = -ERANGE;
+        }
     }
 
     trace_vfio_query_precopy_size(migration->vbasedev->name,
-- 
2.54.0