From: Cédric Le Goater
To: qemu-devel@nongnu.org
Cc: Alex Williamson, Steve Sistare, Cédric Le Goater
Subject: [PULL 17/27] vfio/container: recover from unmap-all-vaddr failure
Date: Wed, 11 Jun 2025 17:06:09 +0200
Message-ID: <20250611150620.701903-18-clg@redhat.com>
In-Reply-To: <20250611150620.701903-1-clg@redhat.com>
References: <20250611150620.701903-1-clg@redhat.com>

From: Steve Sistare

If there are multiple containers and unmap-all fails for some container,
we need to remap vaddr for the other containers for which unmap-all
succeeded.  Recover by walking all address ranges of all containers to
restore the vaddr for each.  Do so by invoking the vfio listener callback,
and passing a new "remap" flag that tells it to restore a mapping without
re-allocating new userland data structures.
Signed-off-by: Steve Sistare
Reviewed-by: Cédric Le Goater
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater
---
 include/hw/vfio/vfio-container-base.h |  3 +
 include/hw/vfio/vfio-cpr.h            | 10 +++
 hw/vfio/cpr-legacy.c                  | 91 +++++++++++++++++++++++++++
 hw/vfio/listener.c                    | 19 +++++-
 4 files changed, 122 insertions(+), 1 deletion(-)

diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 9d37f86115a873eb164ae90c5ebaf2acd2c6a5d8..f0232654eedf19c4d9c4f0ed55e79074442720c3 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -256,4 +256,7 @@ struct VFIOIOMMUClass {
 VFIORamDiscardListener *vfio_find_ram_discard_listener(
     VFIOContainerBase *bcontainer, MemoryRegionSection *section);
 
+void vfio_container_region_add(VFIOContainerBase *bcontainer,
+                               MemoryRegionSection *section, bool cpr_remap);
+
 #endif /* HW_VFIO_VFIO_CONTAINER_BASE_H */
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index b83dd4275183595aa31071d99099ad746931c66a..56ede049ad68759e31d855809c5bd8493dc09176 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -10,6 +10,7 @@
 #define HW_VFIO_VFIO_CPR_H
 
 #include "migration/misc.h"
+#include "system/memory.h"
 
 struct VFIOContainer;
 struct VFIOContainerBase;
@@ -17,6 +18,9 @@ struct VFIOGroup;
 
 typedef struct VFIOContainerCPR {
     Error *blocker;
+    bool vaddr_unmapped;
+    NotifierWithReturn transfer_notifier;
+    MemoryListener remap_listener;
     int (*saved_dma_map)(const struct VFIOContainerBase *bcontainer,
                          hwaddr iova, ram_addr_t size,
                          void *vaddr, bool readonly, MemoryRegion *mr);
@@ -42,4 +46,10 @@ int vfio_cpr_group_get_device_fd(int d, const char *name);
 bool vfio_cpr_container_match(struct VFIOContainer *container,
                               struct VFIOGroup *group, int fd);
 
+void vfio_cpr_giommu_remap(struct VFIOContainerBase *bcontainer,
+                           MemoryRegionSection *section);
+
+bool vfio_cpr_ram_discard_register_listener(
+    struct VFIOContainerBase *bcontainer, MemoryRegionSection *section);
+
 #endif /* HW_VFIO_VFIO_CPR_H */
diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
index 2fd8348c7cd37964af87ef04e32ce3dcd2bfde1a..a84c3247b7172a1f084659f2418d0c1e1394becf 100644
--- a/hw/vfio/cpr-legacy.c
+++ b/hw/vfio/cpr-legacy.c
@@ -29,6 +29,7 @@ static bool vfio_dma_unmap_vaddr_all(VFIOContainer *container, Error **errp)
         error_setg_errno(errp, errno, "vfio_dma_unmap_vaddr_all");
         return false;
     }
+    container->cpr.vaddr_unmapped = true;
     return true;
 }
 
@@ -59,6 +60,14 @@ static int vfio_legacy_cpr_dma_map(const VFIOContainerBase *bcontainer,
     return 0;
 }
 
+static void vfio_region_remap(MemoryListener *listener,
+                              MemoryRegionSection *section)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer,
+                                            cpr.remap_listener);
+    vfio_container_region_add(&container->bcontainer, section, true);
+}
+
 static bool vfio_cpr_supported(VFIOContainer *container, Error **errp)
 {
     if (!ioctl(container->fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR)) {
@@ -120,6 +129,40 @@ static const VMStateDescription vfio_container_vmstate = {
     }
 };
 
+static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
+                                  MigrationEvent *e, Error **errp)
+{
+    VFIOContainer *container =
+        container_of(notifier, VFIOContainer, cpr.transfer_notifier);
+    VFIOContainerBase *bcontainer = &container->bcontainer;
+
+    if (e->type != MIG_EVENT_PRECOPY_FAILED) {
+        return 0;
+    }
+
+    if (container->cpr.vaddr_unmapped) {
+        /*
+         * Force a call to vfio_region_remap for each mapped section by
+         * temporarily registering a listener, and temporarily diverting
+         * dma_map to vfio_legacy_cpr_dma_map.  The latter restores vaddr.
+         */
+
+        VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
+        vioc->dma_map = vfio_legacy_cpr_dma_map;
+
+        container->cpr.remap_listener = (MemoryListener) {
+            .name = "vfio cpr recover",
+            .region_add = vfio_region_remap
+        };
+        memory_listener_register(&container->cpr.remap_listener,
+                                 bcontainer->space->as);
+        memory_listener_unregister(&container->cpr.remap_listener);
+        container->cpr.vaddr_unmapped = false;
+        vioc->dma_map = container->cpr.saved_dma_map;
+    }
+    return 0;
+}
+
 bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
 {
     VFIOContainerBase *bcontainer = &container->bcontainer;
@@ -142,6 +185,10 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
         container->cpr.saved_dma_map = vioc->dma_map;
         vioc->dma_map = vfio_legacy_cpr_dma_map;
     }
+
+    migration_add_notifier_mode(&container->cpr.transfer_notifier,
+                                vfio_cpr_fail_notifier,
+                                MIG_MODE_CPR_TRANSFER);
     return true;
 }
 
@@ -152,6 +199,50 @@ void vfio_legacy_cpr_unregister_container(VFIOContainer *container)
     migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
     migrate_del_blocker(&container->cpr.blocker);
     vmstate_unregister(NULL, &vfio_container_vmstate, container);
+    migration_remove_notifier(&container->cpr.transfer_notifier);
+}
+
+/*
+ * In old QEMU, VFIO_DMA_UNMAP_FLAG_VADDR may fail on some mapping after
+ * succeeding for others, so the latter have lost their vaddr.  Call this
+ * to restore vaddr for a section with a giommu.
+ *
+ * The giommu already exists.  Find it and replay it, which calls
+ * vfio_legacy_cpr_dma_map further down the stack.
+ */
+void vfio_cpr_giommu_remap(VFIOContainerBase *bcontainer,
+                           MemoryRegionSection *section)
+{
+    VFIOGuestIOMMU *giommu = NULL;
+    hwaddr as_offset = section->offset_within_address_space;
+    hwaddr iommu_offset = as_offset - section->offset_within_region;
+
+    QLIST_FOREACH(giommu, &bcontainer->giommu_list, giommu_next) {
+        if (giommu->iommu_mr == IOMMU_MEMORY_REGION(section->mr) &&
+            giommu->iommu_offset == iommu_offset) {
+            break;
+        }
+    }
+    g_assert(giommu);
+    memory_region_iommu_replay(giommu->iommu_mr, &giommu->n);
+}
+
+/*
+ * In old QEMU, VFIO_DMA_UNMAP_FLAG_VADDR may fail on some mapping after
+ * succeeding for others, so the latter have lost their vaddr.  Call this
+ * to restore vaddr for a section with a RamDiscardManager.
+ *
+ * The ram discard listener already exists.  Call its populate function
+ * directly, which calls vfio_legacy_cpr_dma_map.
+ */
+bool vfio_cpr_ram_discard_register_listener(VFIOContainerBase *bcontainer,
+                                            MemoryRegionSection *section)
+{
+    VFIORamDiscardListener *vrdl =
+        vfio_find_ram_discard_listener(bcontainer, section);
+
+    g_assert(vrdl);
+    return vrdl->listener.notify_populate(&vrdl->listener, section) == 0;
 }
 
 int vfio_cpr_group_get_device_fd(int d, const char *name)
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index 735b5f21b7b87cff6b5e757f9696d9a7c1c44fbf..f498e23a93747cb1826726f7c4ca28f8128b4ced 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -481,6 +481,13 @@ static void vfio_listener_region_add(MemoryListener *listener,
 {
     VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
                                                  listener);
+    vfio_container_region_add(bcontainer, section, false);
+}
+
+void vfio_container_region_add(VFIOContainerBase *bcontainer,
+                               MemoryRegionSection *section,
+                               bool cpr_remap)
+{
     hwaddr iova, end;
     Int128 llend, llsize;
     void *vaddr;
@@ -516,6 +523,11 @@ static void vfio_listener_region_add(MemoryListener *listener,
         int iommu_idx;
 
         trace_vfio_listener_region_add_iommu(section->mr->name, iova, end);
+
+        if (cpr_remap) {
+            vfio_cpr_giommu_remap(bcontainer, section);
+        }
+
         /*
          * FIXME: For VFIO iommu types which have KVM acceleration to
          * avoid bouncing all map/unmaps through qemu this way, this
@@ -558,7 +570,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
      * about changes.
      */
     if (memory_region_has_ram_discard_manager(section->mr)) {
-        vfio_ram_discard_register_listener(bcontainer, section);
+        if (!cpr_remap) {
+            vfio_ram_discard_register_listener(bcontainer, section);
+        } else if (!vfio_cpr_ram_discard_register_listener(bcontainer,
+                                                           section)) {
+            goto fail;
+        }
         return;
     }
 
-- 
2.49.0