To: devel@lists.libvirt.org
Subject: [RFC PATCH 5/7] qemu: open iommufd FDs from libvirt backend
Date: Mon, 22 Sep 2025 18:15:58 -0700
Message-ID: <20250923011600.3229388-6-nathanc@nvidia.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250923011600.3229388-1-nathanc@nvidia.com>
References: <20250923011600.3229388-1-nathanc@nvidia.com>
From: Nathan Chen via Devel
Reply-To: Nathan Chen
CC: shameerali.kolothum.thodi@huawei.com, nicolinc@nvidia.com, nathanc@nvidia.com, mochs@nvidia.com

Open the iommufd file descriptors in the libvirt backend without
exposing them to XML users: one per domain for /dev/iommu and one per
iommufd hostdev for /dev/vfio/devices/vfioX. Pass these FDs to QEMU on
the command line.

Signed-off-by: Nathan Chen
---
 src/qemu/qemu_command.c |  43 +++++++-
 src/qemu/qemu_command.h |   3 +-
 src/qemu/qemu_domain.c  |   8 ++
 src/qemu/qemu_domain.h  |   7 ++
 src/qemu/qemu_hotplug.c |   2 +-
 src/qemu/qemu_process.c | 232 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 289 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index c9d165dfd9..418ea601d6 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -4800,7 +4800,8 @@ qemuBuildVideoCommandLine(virCommand *cmd,
 
 virJSONValue *
 qemuBuildPCIHostdevDevProps(const virDomainDef *def,
-                            virDomainHostdevDef *dev)
+                            virDomainHostdevDef *dev,
+                            virDomainObj *vm)
 {
     g_autoptr(virJSONValue) props = NULL;
     virDomainHostdevSubsysPCI *pcisrc = &dev->source.subsys.u.pci;
@@ -4811,6 +4812,13 @@ qemuBuildPCIHostdevDevProps(const virDomainDef *def,
     const char *iommufdId = NULL;
     /* 'ramfb' property must be omitted unless it's to be enabled */
     bool ramfb = pcisrc->ramfb == VIR_TRISTATE_SWITCH_ON;
+    bool useIommufd = false;
+    qemuDomainObjPrivate *priv = vm ? vm->privateData : NULL;
+
+    if (pcisrc->driver.name == VIR_DEVICE_HOSTDEV_PCI_DRIVER_NAME_VFIO &&
+        pcisrc->driver.iommufd) {
+        useIommufd = true;
+    }
 
     /* caller has to assign proper passthrough driver name */
     switch (pcisrc->driver.name) {
@@ -4857,6 +4865,18 @@ qemuBuildPCIHostdevDevProps(const virDomainDef *def,
                               NULL) < 0)
         return NULL;
 
+    if (useIommufd && priv) {
+        g_autofree char *vfioFdName = g_strdup_printf("vfio-%04x:%02x:%02x.%d",
+                                                      pcisrc->addr.domain, pcisrc->addr.bus,
+                                                      pcisrc->addr.slot, pcisrc->addr.function);
+
+        int vfiofd = GPOINTER_TO_INT(g_hash_table_lookup(priv->vfioDeviceFds, vfioFdName));
+        if (virJSONValueObjectAdd(&props,
+                                  "S:fd", g_strdup_printf("%d", vfiofd),
+                                  NULL) < 0)
+            return NULL;
+    }
+
     if (qemuBuildDeviceAddressProps(props, def, dev->info) < 0)
         return NULL;
 
@@ -5267,12 +5287,14 @@ qemuBuildAcpiNodesetProps(virCommand *cmd,
 static int
 qemuBuildHostdevCommandLine(virCommand *cmd,
                             const virDomainDef *def,
-                            virQEMUCaps *qemuCaps)
+                            virQEMUCaps *qemuCaps,
+                            virDomainObj *vm)
 {
     size_t i;
     g_autoptr(virJSONValue) props = NULL;
     int iommufd = 0;
     const char * iommufdId = "iommufd0";
+    qemuDomainObjPrivate *priv = vm->privateData;
 
     for (i = 0; i < def->nhostdevs; i++) {
         virDomainHostdevDef *hostdev = def->hostdevs[i];
@@ -5303,8 +5325,10 @@ qemuBuildHostdevCommandLine(virCommand *cmd,
 
             if (subsys->u.pci.driver.iommufd && iommufd == 0) {
                 iommufd = 1;
+                virCommandPassFD(cmd, priv->iommufd, VIR_COMMAND_PASS_FD_CLOSE_PARENT);
                 if (qemuMonitorCreateObjectProps(&props, "iommufd",
                                                  iommufdId,
+                                                 "S:fd", g_strdup_printf("%d", priv->iommufd),
                                                  NULL) < 0)
                     return -1;
 
@@ -5315,7 +5339,18 @@ qemuBuildHostdevCommandLine(virCommand *cmd,
             if (qemuCommandAddExtDevice(cmd, hostdev->info, def, qemuCaps) < 0)
                 return -1;
 
-            if (!(devprops = qemuBuildPCIHostdevDevProps(def, hostdev)))
+            if (subsys->u.pci.driver.iommufd) {
+                virDomainHostdevSubsysPCI *pcisrc = &hostdev->source.subsys.u.pci;
+                g_autofree char *vfioFdName = g_strdup_printf("vfio-%04x:%02x:%02x.%d",
+                                                              pcisrc->addr.domain, pcisrc->addr.bus,
+                                                              pcisrc->addr.slot, pcisrc->addr.function);
+
+                int vfiofd = GPOINTER_TO_INT(g_hash_table_lookup(priv->vfioDeviceFds, vfioFdName));
+
+                virCommandPassFD(cmd, vfiofd, VIR_COMMAND_PASS_FD_CLOSE_PARENT);
+            }
+
+            if (!(devprops = qemuBuildPCIHostdevDevProps(def, hostdev, vm)))
                 return -1;
 
             if (qemuBuildDeviceCommandlineFromJSON(cmd, devprops, def, qemuCaps) < 0)
@@ -11018,7 +11053,7 @@ qemuBuildCommandLine(virDomainObj *vm,
     if (qemuBuildRedirdevCommandLine(cmd, def, qemuCaps) < 0)
         return NULL;
 
-    if (qemuBuildHostdevCommandLine(cmd, def, qemuCaps) < 0)
+    if (qemuBuildHostdevCommandLine(cmd, def, qemuCaps, vm) < 0)
         return NULL;
 
     if (migrateURI)
diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h
index ad068f1f16..380aac261f 100644
--- a/src/qemu/qemu_command.h
+++ b/src/qemu/qemu_command.h
@@ -180,7 +180,8 @@ qemuBuildThreadContextProps(virJSONValue **tcProps,
 /* Current, best practice */
 virJSONValue *
 qemuBuildPCIHostdevDevProps(const virDomainDef *def,
-                            virDomainHostdevDef *dev);
+                            virDomainHostdevDef *dev,
+                            virDomainObj *vm);
 
 virJSONValue *
 qemuBuildRNGDevProps(const virDomainDef *def,
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index e45757ccd5..2f1c3de85d 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -1954,6 +1954,11 @@ qemuDomainObjPrivateFree(void *data)
 
     virChrdevFree(priv->devs);
 
+    if (priv->iommufd >= 0) {
+        virEventRemoveHandle(priv->iommufd);
+        priv->iommufd = -1;
+    }
+
     if (priv->pidMonitored >= 0) {
         virEventRemoveHandle(priv->pidMonitored);
         priv->pidMonitored = -1;
@@ -1975,6 +1980,7 @@ qemuDomainObjPrivateFree(void *data)
 
     g_clear_pointer(&priv->blockjobs, g_hash_table_unref);
     g_clear_pointer(&priv->fds, g_hash_table_unref);
+    g_clear_pointer(&priv->vfioDeviceFds, g_hash_table_unref);
 
     /* This should never be non-NULL if we get here, but just in case... */
     if (priv->eventThread) {
@@ -2003,7 +2009,9 @@ qemuDomainObjPrivateAlloc(void *opaque)
 
     priv->blockjobs = virHashNew(virObjectUnref);
     priv->fds = virHashNew(g_object_unref);
+    priv->vfioDeviceFds = g_hash_table_new(g_str_hash, g_str_equal);
 
+    priv->iommufd = -1;
     priv->pidMonitored = -1;
 
     /* agent commands block by default, user can choose different behavior */
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index ffe5bee1bf..bed5326bba 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -266,6 +266,10 @@ struct _qemuDomainObjPrivate {
     /* named file descriptor groups associated with the VM */
     GHashTable *fds;
 
+    int iommufd;
+
+    GHashTable *vfioDeviceFds;
+
     char *memoryBackingDir;
 };
 
@@ -1171,3 +1175,6 @@ qemuDomainCheckCPU(virArch arch,
 bool
 qemuDomainMachineSupportsFloppy(const char *machine,
                                 virQEMUCaps *qemuCaps);
+
+int qemuProcessOpenVfioFds(virDomainObj *vm);
+void qemuProcessCloseVfioFds(virDomainObj *vm);
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index fb426deb1a..661e9008f7 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -1630,7 +1630,7 @@ qemuDomainAttachHostPCIDevice(virQEMUDriver *driver,
         goto error;
     }
 
-    if (!(devprops = qemuBuildPCIHostdevDevProps(vm->def, hostdev)))
+    if (!(devprops = qemuBuildPCIHostdevDevProps(vm->def, hostdev, vm)))
         goto error;
 
     qemuDomainObjEnterMonitor(vm);
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index ead5bf3e48..5acaf12cfc 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 #if WITH_SYS_SYSCALL_H
 # include
 #endif
@@ -8019,6 +8020,9 @@ qemuProcessLaunch(virConnectPtr conn,
     if (qemuExtDevicesStart(driver, vm, incomingMigrationExtDevices) < 0)
         goto cleanup;
 
+    if (qemuProcessOpenVfioFds(vm) < 0)
+        goto cleanup;
+
     if (!(cmd = qemuBuildCommandLine(vm,
                                      incoming ? "defer" : NULL,
                                      vmop,
@@ -10200,3 +10204,231 @@ qemuProcessHandleNbdkitExit(qemuNbdkitProcess *nbdkit,
     qemuProcessEventSubmit(vm, QEMU_PROCESS_EVENT_NBDKIT_EXITED, 0, 0, nbdkit);
     virObjectUnlock(vm);
 }
+
+/**
+ * qemuProcessOpenIommuFd:
+ * @vm: domain object
+ * @iommuFd: returned file descriptor
+ *
+ * Opens /dev/iommu file descriptor for the VM.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+static int
+qemuProcessOpenIommuFd(virDomainObj *vm, int *iommuFd)
+{
+    int fd = -1;
+
+    VIR_DEBUG("Opening IOMMU FD for domain %s", vm->def->name);
+
+    if ((fd = open("/dev/iommu", O_RDWR | O_CLOEXEC)) < 0) {
+        if (errno == ENOENT) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("IOMMU FD support requires /dev/iommu device"));
+        } else {
+            virReportSystemError(errno, "%s",
+                                 _("cannot open /dev/iommu"));
+        }
+        return -1;
+    }
+
+    *iommuFd = fd;
+    VIR_DEBUG("Opened IOMMU FD %d for domain %s", fd, vm->def->name);
+    return 0;
+}
+
+/**
+ * qemuProcessGetVfioDevicePath:
+ * @hostdev: host device definition
+ * @vfioPath: returned VFIO device path
+ *
+ * Constructs the VFIO device path for a PCI hostdev.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+static int
+qemuProcessGetVfioDevicePath(virDomainHostdevDef *hostdev,
+                             char **vfioPath)
+{
+    virPCIDeviceAddress *addr;
+    g_autofree char *sysfsPath = NULL;
+    DIR *dir = NULL;
+    struct dirent *entry = NULL;
+    int ret = -1;
+
+    if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS ||
+        hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("VFIO FD only supported for PCI hostdevs"));
+        return -1;
+    }
+
+    addr = &hostdev->source.subsys.u.pci.addr;
+
+    /* Build sysfs path: /sys/bus/pci/devices/DDDD:BB:DD.F/vfio-dev/ */
+    sysfsPath = g_strdup_printf("/sys/bus/pci/devices/"
+                                "%04x:%02x:%02x.%d/vfio-dev/",
+                                addr->domain, addr->bus,
+                                addr->slot, addr->function);
+
+    if (virDirOpen(&dir, sysfsPath) < 0) {
+        virReportSystemError(errno,
+                             _("cannot open VFIO sysfs directory %1$s"),
+                             sysfsPath);
+        return -1;
+    }
+
+    /* Find the vfio device name in the directory */
+    while (virDirRead(dir, &entry, sysfsPath) > 0) {
+        if (STRPREFIX(entry->d_name, "vfio")) {
+            *vfioPath = g_strdup_printf("/dev/vfio/devices/%s", entry->d_name);
+            ret = 0;
+            break;
+        }
+    }
+
+    if (ret < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("cannot find VFIO device for PCI device %1$04x:%2$02x:%3$02x.%4$d"),
+                       addr->domain, addr->bus, addr->slot, addr->function);
+    }
+
+    virDirClose(dir);
+    return ret;
+}
+
+/**
+ * qemuProcessOpenVfioDeviceFd:
+ * @hostdev: host device definition
+ * @vfioFd: returned file descriptor
+ *
+ * Opens the VFIO device file descriptor for a hostdev.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+static int
+qemuProcessOpenVfioDeviceFd(virDomainHostdevDef *hostdev,
+                            int *vfioFd)
+{
+    g_autofree char *vfioPath = NULL;
+    int fd = -1;
+
+    if (qemuProcessGetVfioDevicePath(hostdev, &vfioPath) < 0)
+        return -1;
+
+    VIR_DEBUG("Opening VFIO device %s", vfioPath);
+
+    if ((fd = open(vfioPath, O_RDWR | O_CLOEXEC)) < 0) {
+        if (errno == ENOENT) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("VFIO device %1$s not found - ensure device is bound to vfio-pci driver"),
+                           vfioPath);
+        } else {
+            virReportSystemError(errno,
+                                 _("cannot open VFIO device %1$s"), vfioPath);
+        }
+        return -1;
+    }
+
+    *vfioFd = fd;
+    VIR_DEBUG("Opened VFIO device FD %d for %s", *vfioFd, vfioPath);
+    return 0;
+}
+
+/**
+ * qemuProcessOpenVfioFds:
+ * @vm: domain object
+ *
+ * Opens all necessary VFIO file descriptors for the domain.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+int
+qemuProcessOpenVfioFds(virDomainObj *vm)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    bool needsIommuFd = false;
+    size_t i;
+
+    /* Check if we have any hostdevs that need VFIO FDs */
+    for (i = 0; i < vm->def->nhostdevs; i++) {
+        virDomainHostdevDef *hostdev = vm->def->hostdevs[i];
+        int vfioFd = -1;
+        g_autofree char *fdname = NULL;
+
+        if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
+            hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
+
+            /* Check if this hostdev uses VFIO with IOMMU FD */
+            if (hostdev->source.subsys.u.pci.driver.name == VIR_DEVICE_HOSTDEV_PCI_DRIVER_NAME_VFIO &&
+                hostdev->source.subsys.u.pci.driver.iommufd) {
+
+                needsIommuFd = true;
+
+                /* Open VFIO device FD */
+                if (qemuProcessOpenVfioDeviceFd(hostdev, &vfioFd) < 0)
+                    goto error;
+
+                /* Store the FD */
+                fdname = g_strdup_printf("vfio-%04x:%02x:%02x.%d",
+                                         hostdev->source.subsys.u.pci.addr.domain,
+                                         hostdev->source.subsys.u.pci.addr.bus,
+                                         hostdev->source.subsys.u.pci.addr.slot,
+                                         hostdev->source.subsys.u.pci.addr.function);
+
+                g_hash_table_insert(priv->vfioDeviceFds, g_steal_pointer(&fdname), GINT_TO_POINTER(vfioFd));
+
+                VIR_DEBUG("Stored VFIO FD for device %s", fdname);
+            }
+        }
+    }
+
+    /* Open IOMMU FD if needed */
+    if (needsIommuFd) {
+        int iommuFd = -1;
+
+        if (qemuProcessOpenIommuFd(vm, &iommuFd) < 0)
+            goto error;
+
+        priv->iommufd = iommuFd;
+
+        VIR_DEBUG("Stored IOMMU FD");
+    }
+
+    return 0;
+
+ error:
+    qemuProcessCloseVfioFds(vm);
+    return -1;
+}
+
+/**
+ * qemuProcessCloseVfioFds:
+ * @vm: domain object
+ *
+ * Closes all VFIO file descriptors for the domain.
+ */
+void
+qemuProcessCloseVfioFds(virDomainObj *vm)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    GHashTableIter iter;
+    gpointer key, value;
+
+    /* Close all VFIO device FDs */
+    if (priv->vfioDeviceFds) {
+        g_hash_table_iter_init(&iter, priv->vfioDeviceFds);
+        while (g_hash_table_iter_next(&iter, &key, &value)) {
+            int fd = GPOINTER_TO_INT(value);
+            VIR_DEBUG("Closing VFIO device FD %d for %s", fd, (char*)key);
+            VIR_FORCE_CLOSE(fd);
+        }
+        g_hash_table_remove_all(priv->vfioDeviceFds);
+    }
+
+    /* Close IOMMU FD */
+    if (priv->iommufd >= 0) {
+        VIR_DEBUG("Closing IOMMU FD %d", priv->iommufd);
+        VIR_FORCE_CLOSE(priv->iommufd);
+    }
+}
-- 
2.43.0
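
For readers following the sysfs lookup in qemuProcessGetVfioDevicePath() and the "vfio-DDDD:BB:SS.F" hash-table key used throughout the patch, here is a minimal standalone sketch of the same idea in plain POSIX C, outside libvirt. The helper names vfio_fd_name() and vfio_cdev_path() and the sysfs_root parameter (added only so the lookup can be pointed at a test directory) are hypothetical and not part of the patch. Like the patch, the sketch assumes the VFIO character device name appears as a "vfio*" entry under <sysfs_root>/<PCI address>/vfio-dev/.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the per-device key "vfio-DDDD:BB:SS.F",
 * the same layout the patch uses for its vfioDeviceFds hash table.
 * Returns the snprintf() result. */
static int
vfio_fd_name(char *buf, size_t len, unsigned domain, unsigned bus,
             unsigned slot, unsigned function)
{
    return snprintf(buf, len, "vfio-%04x:%02x:%02x.%u",
                    domain, bus, slot, function);
}

/* Hypothetical helper: scan <sysfs_root>/<pci_addr>/vfio-dev/ for the first
 * entry whose name starts with "vfio" and write "/dev/vfio/devices/<entry>"
 * into out. Returns 0 on success, -1 if the directory cannot be opened,
 * no matching entry exists, or a buffer is too small. */
static int
vfio_cdev_path(const char *sysfs_root, const char *pci_addr,
               char *out, size_t len)
{
    char dirpath[512];
    DIR *dir;
    struct dirent *entry;
    int ret = -1;
    int n;

    assert(sysfs_root && pci_addr && out);

    n = snprintf(dirpath, sizeof(dirpath), "%s/%s/vfio-dev",
                 sysfs_root, pci_addr);
    if (n < 0 || (size_t)n >= sizeof(dirpath))
        return -1;

    if (!(dir = opendir(dirpath)))
        return -1;

    while ((entry = readdir(dir))) {
        if (strncmp(entry->d_name, "vfio", 4) == 0) {
            n = snprintf(out, len, "/dev/vfio/devices/%s", entry->d_name);
            if (n >= 0 && (size_t)n < len)
                ret = 0;
            break;
        }
    }

    closedir(dir);
    return ret;
}
```

On a host where the device is bound to vfio-pci and the kernel exposes VFIO device cdevs, calling vfio_cdev_path("/sys/bus/pci/devices", "0000:01:00.0", path, sizeof(path)) would be expected to yield a path of the form /dev/vfio/devices/vfioX; the PCI address here is only an example.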