To: devel@lists.libvirt.org
Subject: [PATCH 2/4] qemu: open iommufd FDs from libvirt backend
Date: Mon, 3 Nov 2025 10:47:09 -0800
Message-ID: <20251103184711.4022833-3-nathanc@nvidia.com>
In-Reply-To: <20251103184711.4022833-1-nathanc@nvidia.com>
References: <20251103184711.4022833-1-nathanc@nvidia.com>
CC: skolothumtho@nvidia.com, nicolinc@nvidia.com, nathanc@nvidia.com, mochs@nvidia.com
List-Id: Development discussions about the libvirt library & tools
From: Nathan Chen via Devel
Reply-To: Nathan Chen

Open iommufd FDs from the libvirt backend without exposing these FDs to
XML users, i.e. one per domain for /dev/iommu and one per iommufd
hostdev for /dev/vfio/devices/vfioX, and pass the FDs on the QEMU
command line.

Signed-off-by: Nathan Chen
---
 src/qemu/qemu_command.c |  43 +++++++-
 src/qemu/qemu_command.h |   3 +-
 src/qemu/qemu_domain.c  |   8 ++
 src/qemu/qemu_domain.h  |   7 ++
 src/qemu/qemu_hotplug.c |   2 +-
 src/qemu/qemu_process.c | 232 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 289 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 8fd7527645..740a6970f2 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -4730,7 +4730,8 @@ qemuBuildVideoCommandLine(virCommand *cmd,
 
 virJSONValue *
 qemuBuildPCIHostdevDevProps(const virDomainDef *def,
-                            virDomainHostdevDef *dev)
+                            virDomainHostdevDef *dev,
+                            virDomainObj *vm)
 {
     g_autoptr(virJSONValue) props = NULL;
     virDomainHostdevSubsysPCI *pcisrc = &dev->source.subsys.u.pci;
@@ -4741,6 +4742,13 @@ qemuBuildPCIHostdevDevProps(const virDomainDef *def,
     const char *iommufdId = NULL;
     /* 'ramfb' property must be omitted unless it's to be enabled */
     bool ramfb = pcisrc->ramfb == VIR_TRISTATE_SWITCH_ON;
+    bool useIommufd = false;
+    qemuDomainObjPrivate *priv = vm ? vm->privateData : NULL;
+
+    if (pcisrc->driver.name == VIR_DEVICE_HOSTDEV_PCI_DRIVER_NAME_VFIO &&
+        pcisrc->driver.iommufd) {
+        useIommufd = true;
+    }
 
     /* caller has to assign proper passthrough driver name */
     switch (pcisrc->driver.name) {
@@ -4787,6 +4795,18 @@ qemuBuildPCIHostdevDevProps(const virDomainDef *def,
                               NULL) < 0)
         return NULL;
 
+    if (useIommufd && priv) {
+        g_autofree char *vfioFdName = g_strdup_printf("vfio-%04x:%02x:%02x.%d",
+                                                      pcisrc->addr.domain, pcisrc->addr.bus,
+                                                      pcisrc->addr.slot, pcisrc->addr.function);
+
+        int vfiofd = GPOINTER_TO_INT(g_hash_table_lookup(priv->vfioDeviceFds, vfioFdName));
+        if (virJSONValueObjectAdd(&props,
+                                  "S:fd", g_strdup_printf("%d", vfiofd),
+                                  NULL) < 0)
+            return NULL;
+    }
+
     if (qemuBuildDeviceAddressProps(props, def, dev->info) < 0)
         return NULL;
 
@@ -5197,12 +5217,14 @@ qemuBuildAcpiNodesetProps(virCommand *cmd,
 static int
 qemuBuildHostdevCommandLine(virCommand *cmd,
                             const virDomainDef *def,
-                            virQEMUCaps *qemuCaps)
+                            virQEMUCaps *qemuCaps,
+                            virDomainObj *vm)
 {
     size_t i;
     g_autoptr(virJSONValue) props = NULL;
     int iommufd = 0;
     const char * iommufdId = "iommufd0";
+    qemuDomainObjPrivate *priv = vm->privateData;
 
     for (i = 0; i < def->nhostdevs; i++) {
         virDomainHostdevDef *hostdev = def->hostdevs[i];
@@ -5233,8 +5255,10 @@ qemuBuildHostdevCommandLine(virCommand *cmd,
 
             if (subsys->u.pci.driver.iommufd && iommufd == 0) {
                 iommufd = 1;
+                virCommandPassFD(cmd, priv->iommufd, VIR_COMMAND_PASS_FD_CLOSE_PARENT);
                 if (qemuMonitorCreateObjectProps(&props, "iommufd",
                                                  iommufdId,
+                                                 "S:fd", g_strdup_printf("%d", priv->iommufd),
                                                  NULL) < 0)
                     return -1;
 
@@ -5245,7 +5269,18 @@ qemuBuildHostdevCommandLine(virCommand *cmd,
             if (qemuCommandAddExtDevice(cmd, hostdev->info, def, qemuCaps) < 0)
                 return -1;
 
-            if (!(devprops = qemuBuildPCIHostdevDevProps(def, hostdev)))
+            if (subsys->u.pci.driver.iommufd) {
+                virDomainHostdevSubsysPCI *pcisrc = &hostdev->source.subsys.u.pci;
+                g_autofree char *vfioFdName = g_strdup_printf("vfio-%04x:%02x:%02x.%d",
+                                                              pcisrc->addr.domain, pcisrc->addr.bus,
+                                                              pcisrc->addr.slot, pcisrc->addr.function);
+
+                int vfiofd = GPOINTER_TO_INT(g_hash_table_lookup(priv->vfioDeviceFds, vfioFdName));
+
+                virCommandPassFD(cmd, vfiofd, VIR_COMMAND_PASS_FD_CLOSE_PARENT);
+            }
+
+            if (!(devprops = qemuBuildPCIHostdevDevProps(def, hostdev, vm)))
                 return -1;
 
             if (qemuBuildDeviceCommandlineFromJSON(cmd, devprops, def, qemuCaps) < 0)
@@ -10893,7 +10928,7 @@ qemuBuildCommandLine(virDomainObj *vm,
     if (qemuBuildRedirdevCommandLine(cmd, def, qemuCaps) < 0)
         return NULL;
 
-    if (qemuBuildHostdevCommandLine(cmd, def, qemuCaps) < 0)
+    if (qemuBuildHostdevCommandLine(cmd, def, qemuCaps, vm) < 0)
         return NULL;
 
     if (migrateURI)
diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h
index ad068f1f16..380aac261f 100644
--- a/src/qemu/qemu_command.h
+++ b/src/qemu/qemu_command.h
@@ -180,7 +180,8 @@ qemuBuildThreadContextProps(virJSONValue **tcProps,
 /* Current, best practice */
 virJSONValue *
 qemuBuildPCIHostdevDevProps(const virDomainDef *def,
-                            virDomainHostdevDef *dev);
+                            virDomainHostdevDef *dev,
+                            virDomainObj *vm);
 
 virJSONValue *
 qemuBuildRNGDevProps(const virDomainDef *def,
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index a42721efad..86640aa3e3 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -1953,6 +1953,11 @@ qemuDomainObjPrivateFree(void *data)
 
     virChrdevFree(priv->devs);
 
+    if (priv->iommufd >= 0) {
+        virEventRemoveHandle(priv->iommufd);
+        priv->iommufd = -1;
+    }
+
     if (priv->pidMonitored >= 0) {
         virEventRemoveHandle(priv->pidMonitored);
         priv->pidMonitored = -1;
@@ -1974,6 +1979,7 @@ qemuDomainObjPrivateFree(void *data)
 
     g_clear_pointer(&priv->blockjobs, g_hash_table_unref);
     g_clear_pointer(&priv->fds, g_hash_table_unref);
+    g_clear_pointer(&priv->vfioDeviceFds, g_hash_table_unref);
 
     /* This should never be non-NULL if we get here, but just in case... */
     if (priv->eventThread) {
@@ -2002,7 +2008,9 @@ qemuDomainObjPrivateAlloc(void *opaque)
 
     priv->blockjobs = virHashNew(virObjectUnref);
     priv->fds = virHashNew(g_object_unref);
+    priv->vfioDeviceFds = g_hash_table_new(g_str_hash, g_str_equal);
 
+    priv->iommufd = -1;
     priv->pidMonitored = -1;
 
     /* agent commands block by default, user can choose different behavior */
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 3396f929fd..d6214df783 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -264,6 +264,10 @@ struct _qemuDomainObjPrivate {
     /* named file descriptor groups associated with the VM */
     GHashTable *fds;
 
+    int iommufd;
+
+    GHashTable *vfioDeviceFds;
+
     char *memoryBackingDir;
 };
 
@@ -1174,3 +1178,6 @@ qemuDomainCheckCPU(virArch arch,
 bool
 qemuDomainMachineSupportsFloppy(const char *machine,
                                 virQEMUCaps *qemuCaps);
+
+int qemuProcessOpenVfioFds(virDomainObj *vm);
+void qemuProcessCloseVfioFds(virDomainObj *vm);
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index fb426deb1a..661e9008f7 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -1630,7 +1630,7 @@ qemuDomainAttachHostPCIDevice(virQEMUDriver *driver,
         goto error;
     }
 
-    if (!(devprops = qemuBuildPCIHostdevDevProps(vm->def, hostdev)))
+    if (!(devprops = qemuBuildPCIHostdevDevProps(vm->def, hostdev, vm)))
         goto error;
 
     qemuDomainObjEnterMonitor(vm);
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 45fc32a663..cecfed94a7 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 #if WITH_SYS_SYSCALL_H
 # include
 #endif
@@ -8091,6 +8092,9 @@ qemuProcessLaunch(virConnectPtr conn,
     if (qemuExtDevicesStart(driver, vm, incomingMigrationExtDevices) < 0)
         goto cleanup;
 
+    if (qemuProcessOpenVfioFds(vm) < 0)
+        goto cleanup;
+
     if (!(cmd = qemuBuildCommandLine(vm,
                                      incoming ? "defer" : NULL,
                                      vmop,
@@ -10267,3 +10271,231 @@ qemuProcessHandleNbdkitExit(qemuNbdkitProcess *nbdkit,
     qemuProcessEventSubmit(vm, QEMU_PROCESS_EVENT_NBDKIT_EXITED, 0, 0, nbdkit);
     virObjectUnlock(vm);
 }
+
+/**
+ * qemuProcessOpenIommuFd:
+ * @vm: domain object
+ * @iommuFd: returned file descriptor
+ *
+ * Opens /dev/iommu file descriptor for the VM.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+static int
+qemuProcessOpenIommuFd(virDomainObj *vm, int *iommuFd)
+{
+    int fd = -1;
+
+    VIR_DEBUG("Opening IOMMU FD for domain %s", vm->def->name);
+
+    if ((fd = open("/dev/iommu", O_RDWR | O_CLOEXEC)) < 0) {
+        if (errno == ENOENT) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("IOMMU FD support requires /dev/iommu device"));
+        } else {
+            virReportSystemError(errno, "%s",
+                                 _("cannot open /dev/iommu"));
+        }
+        return -1;
+    }
+
+    *iommuFd = fd;
+    VIR_DEBUG("Opened IOMMU FD %d for domain %s", fd, vm->def->name);
+    return 0;
+}
+
+/**
+ * qemuProcessGetVfioDevicePath:
+ * @hostdev: host device definition
+ * @vfioPath: returned VFIO device path
+ *
+ * Constructs the VFIO device path for a PCI hostdev.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+static int
+qemuProcessGetVfioDevicePath(virDomainHostdevDef *hostdev,
+                             char **vfioPath)
+{
+    virPCIDeviceAddress *addr;
+    g_autofree char *sysfsPath = NULL;
+    DIR *dir = NULL;
+    struct dirent *entry = NULL;
+    int ret = -1;
+
+    if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS ||
+        hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("VFIO FD only supported for PCI hostdevs"));
+        return -1;
+    }
+
+    addr = &hostdev->source.subsys.u.pci.addr;
+
+    /* Build sysfs path: /sys/bus/pci/devices/DDDD:BB:DD.F/vfio-dev/ */
+    sysfsPath = g_strdup_printf("/sys/bus/pci/devices/"
+                                "%04x:%02x:%02x.%d/vfio-dev/",
+                                addr->domain, addr->bus,
+                                addr->slot, addr->function);
+
+    if (virDirOpen(&dir, sysfsPath) < 0) {
+        virReportSystemError(errno,
+                             _("cannot open VFIO sysfs directory %1$s"),
+                             sysfsPath);
+        return -1;
+    }
+
+    /* Find the vfio device name in the directory */
+    while (virDirRead(dir, &entry, sysfsPath) > 0) {
+        if (STRPREFIX(entry->d_name, "vfio")) {
+            *vfioPath = g_strdup_printf("/dev/vfio/devices/%s", entry->d_name);
+            ret = 0;
+            break;
+        }
+    }
+
+    if (ret < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("cannot find VFIO device for PCI device %1$04x:%2$02x:%3$02x.%4$d"),
+                       addr->domain, addr->bus, addr->slot, addr->function);
+    }
+
+    virDirClose(dir);
+    return ret;
+}
+
+/**
+ * qemuProcessOpenVfioDeviceFd:
+ * @hostdev: host device definition
+ * @vfioFd: returned file descriptor
+ *
+ * Opens the VFIO device file descriptor for a hostdev.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+static int
+qemuProcessOpenVfioDeviceFd(virDomainHostdevDef *hostdev,
+                            int *vfioFd)
+{
+    g_autofree char *vfioPath = NULL;
+    int fd = -1;
+
+    if (qemuProcessGetVfioDevicePath(hostdev, &vfioPath) < 0)
+        return -1;
+
+    VIR_DEBUG("Opening VFIO device %s", vfioPath);
+
+    if ((fd = open(vfioPath, O_RDWR | O_CLOEXEC)) < 0) {
+        if (errno == ENOENT) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("VFIO device %1$s not found - ensure device is bound to vfio-pci driver"),
+                           vfioPath);
+        } else {
+            virReportSystemError(errno,
+                                 _("cannot open VFIO device %1$s"), vfioPath);
+        }
+        return -1;
+    }
+
+    *vfioFd = fd;
+    VIR_DEBUG("Opened VFIO device FD %d for %s", *vfioFd, vfioPath);
+    return 0;
+}
+
+/**
+ * qemuProcessOpenVfioFds:
+ * @vm: domain object
+ *
+ * Opens all necessary VFIO file descriptors for the domain.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+int
+qemuProcessOpenVfioFds(virDomainObj *vm)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    bool needsIommuFd = false;
+    size_t i;
+
+    /* Check if we have any hostdevs that need VFIO FDs */
+    for (i = 0; i < vm->def->nhostdevs; i++) {
+        virDomainHostdevDef *hostdev = vm->def->hostdevs[i];
+        int vfioFd = -1;
+        g_autofree char *fdname = NULL;
+
+        if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
+            hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
+
+            /* Check if this hostdev uses VFIO with IOMMU FD */
+            if (hostdev->source.subsys.u.pci.driver.name == VIR_DEVICE_HOSTDEV_PCI_DRIVER_NAME_VFIO &&
+                hostdev->source.subsys.u.pci.driver.iommufd) {
+
+                needsIommuFd = true;
+
+                /* Open VFIO device FD */
+                if (qemuProcessOpenVfioDeviceFd(hostdev, &vfioFd) < 0)
+                    goto error;
+
+                /* Store the FD */
+                fdname = g_strdup_printf("vfio-%04x:%02x:%02x.%d",
+                                         hostdev->source.subsys.u.pci.addr.domain,
+                                         hostdev->source.subsys.u.pci.addr.bus,
+                                         hostdev->source.subsys.u.pci.addr.slot,
+                                         hostdev->source.subsys.u.pci.addr.function);
+
+                g_hash_table_insert(priv->vfioDeviceFds, g_steal_pointer(&fdname), GINT_TO_POINTER(vfioFd));
+
+                VIR_DEBUG("Stored VFIO FD for device %s", fdname);
+            }
+        }
+    }
+
+    /* Open IOMMU FD if needed */
+    if (needsIommuFd) {
+        int iommuFd = -1;
+
+        if (qemuProcessOpenIommuFd(vm, &iommuFd) < 0)
+            goto error;
+
+        priv->iommufd = iommuFd;
+
+        VIR_DEBUG("Stored IOMMU FD");
+    }
+
+    return 0;
+
+ error:
+    qemuProcessCloseVfioFds(vm);
+    return -1;
+}
+
+/**
+ * qemuProcessCloseVfioFds:
+ * @vm: domain object
+ *
+ * Closes all VFIO file descriptors for the domain.
+ */
+void
+qemuProcessCloseVfioFds(virDomainObj *vm)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    GHashTableIter iter;
+    gpointer key, value;
+
+    /* Close all VFIO device FDs */
+    if (priv->vfioDeviceFds) {
+        g_hash_table_iter_init(&iter, priv->vfioDeviceFds);
+        while (g_hash_table_iter_next(&iter, &key, &value)) {
+            int fd = GPOINTER_TO_INT(value);
+            VIR_DEBUG("Closing VFIO device FD %d for %s", fd, (char*)key);
+            VIR_FORCE_CLOSE(fd);
+        }
+        g_hash_table_remove_all(priv->vfioDeviceFds);
+    }
+
+    /* Close IOMMU FD */
+    if (priv->iommufd >= 0) {
+        VIR_DEBUG("Closing IOMMU FD %d", priv->iommufd);
+        VIR_FORCE_CLOSE(priv->iommufd);
+    }
+}
-- 
2.43.0
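
For readers following the naming scheme, here is a minimal standalone sketch (not part of the patch) of the three strings the code derives from a PCI address: the vfioDeviceFds hash key, the sysfs vfio-dev directory, and the /dev/vfio/devices node. The struct pci_addr type and the fits() and vfio_*() helper names are hypothetical stand-ins for illustration only; the patch itself works on virPCIDeviceAddress and builds the strings with g_strdup_printf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the address fields of virPCIDeviceAddress. */
struct pci_addr {
    unsigned int domain;
    unsigned int bus;
    unsigned int slot;
    unsigned int function;
};

/* Return 0 if snprintf() wrote the whole string into len bytes, -1 otherwise. */
static int
fits(int n, size_t len)
{
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Key under which a device FD is stored, e.g. "vfio-0000:3b:00.1". */
static int
vfio_fd_key(const struct pci_addr *a, char *buf, size_t len)
{
    assert(a != NULL && buf != NULL);
    return fits(snprintf(buf, len, "vfio-%04x:%02x:%02x.%u",
                         a->domain, a->bus, a->slot, a->function), len);
}

/* sysfs directory that is scanned for an entry named "vfioN". */
static int
vfio_sysfs_dir(const struct pci_addr *a, char *buf, size_t len)
{
    assert(a != NULL && buf != NULL);
    return fits(snprintf(buf, len,
                         "/sys/bus/pci/devices/%04x:%02x:%02x.%u/vfio-dev/",
                         a->domain, a->bus, a->slot, a->function), len);
}

/* Map a directory entry such as "vfio3" to its character device path.
 * Entries without the "vfio" prefix are rejected with -1. */
static int
vfio_cdev_path(const char *entry, char *buf, size_t len)
{
    assert(entry != NULL && buf != NULL);
    if (strncmp(entry, "vfio", 4) != 0)
        return -1;
    return fits(snprintf(buf, len, "/dev/vfio/devices/%s", entry), len);
}
```

With domain 0, bus 0x3b, slot 0 and function 1, vfio_fd_key() produces "vfio-0000:3b:00.1", which matches the key format that qemuProcessOpenVfioFds() inserts and both command-line builders look up.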