From nobody Mon Sep 8 02:37:19 2025
To: devel@lists.libvirt.org
Subject: [RFC PATCH 4/5] qemu: open iommufd FDs from libvirt backend
Date: Thu, 14 Aug 2025 19:54:13 -0700
Message-ID: <20250815025415.2805374-5-nathanc@nvidia.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250815025415.2805374-1-nathanc@nvidia.com>
References: <20250815025415.2805374-1-nathanc@nvidia.com>
MIME-Version: 1.0
CC: shameerali.kolothum.thodi@huawei.com, nicolinc@nvidia.com, nathanc@nvidia.com
List-Id: Development discussions about the libvirt library & tools
From: Nathan Chen via Devel <devel@lists.libvirt.org>
Reply-To: Nathan Chen <nathanc@nvidia.com>
Content-Type: text/plain; charset="utf-8"

Open iommufd FDs from the libvirt backend without exposing them to XML
users: one FD per domain for /dev/iommu and one FD per iommufd hostdev
for /dev/vfio/devices/vfioX. Pass these FDs to QEMU on the command line.

Signed-off-by: Nathan Chen <nathanc@nvidia.com>
---
 src/qemu/qemu_command.c |  44 +++++++-
 src/qemu/qemu_command.h |   3 +-
 src/qemu/qemu_domain.c  |   8 ++
 src/qemu/qemu_domain.h  |   7 ++
 src/qemu/qemu_hotplug.c |   2 +-
 src/qemu/qemu_process.c | 232 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 290 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 6b3e2ffd0d..359dbb2621 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -4797,7 +4797,8 @@ qemuBuildVideoCommandLine(virCommand *cmd,
 
 virJSONValue *
 qemuBuildPCIHostdevDevProps(const virDomainDef *def,
-                            virDomainHostdevDef *dev)
+                            virDomainHostdevDef *dev,
+                            virDomainObj *vm)
 {
     g_autoptr(virJSONValue) props = NULL;
     virDomainHostdevSubsysPCI *pcisrc = &dev->source.subsys.u.pci;
@@ -4807,6 +4808,13 @@ qemuBuildPCIHostdevDevProps(const virDomainDef *def,
     const char *driver = NULL;
     /* 'ramfb' property must be omitted unless it's to be enabled */
     bool ramfb = pcisrc->ramfb == VIR_TRISTATE_SWITCH_ON;
+    bool useIommufd = false;
+    qemuDomainObjPrivate *priv = vm ? vm->privateData : NULL;
+
+    if (pcisrc->driver.name == VIR_DEVICE_HOSTDEV_PCI_DRIVER_NAME_VFIO &&
+        dev->iommufdId) {
+        useIommufd = true;
+    }
 
     /* caller has to assign proper passthrough driver name */
     switch (pcisrc->driver.name) {
@@ -4850,6 +4858,18 @@ qemuBuildPCIHostdevDevProps(const virDomainDef *def,
                               NULL) < 0)
         return NULL;
 
+    if (useIommufd && priv) {
+        g_autofree char *vfioFdName = g_strdup_printf("vfio-%04x:%02x:%02x.%d",
+                                                      pcisrc->addr.domain, pcisrc->addr.bus,
+                                                      pcisrc->addr.slot, pcisrc->addr.function);
+
+        int vfiofd = GPOINTER_TO_INT(g_hash_table_lookup(priv->vfioDeviceFds, vfioFdName));
+        if (virJSONValueObjectAdd(&props,
+                                  "S:fd", g_strdup_printf("%d", vfiofd),
+                                  NULL) < 0)
+            return NULL;
+    }
+
     if (qemuBuildDeviceAddressProps(props, def, dev->info) < 0)
         return NULL;
 
@@ -5223,11 +5243,13 @@ qemuBuildHostdevSCSICommandLine(virCommand *cmd,
 static int
 qemuBuildHostdevCommandLine(virCommand *cmd,
                             const virDomainDef *def,
-                            virQEMUCaps *qemuCaps)
+                            virQEMUCaps *qemuCaps,
+                            virDomainObj *vm)
 {
     size_t i;
     g_autoptr(virJSONValue) props = NULL;
     int iommufd = 0;
+    qemuDomainObjPrivate *priv = vm->privateData;
 
     for (i = 0; i < def->nhostdevs; i++) {
         virDomainHostdevDef *hostdev = def->hostdevs[i];
@@ -5239,8 +5261,11 @@ qemuBuildHostdevCommandLine(virCommand *cmd,
 
         if (hostdev->iommufdId && iommufd == 0) {
             iommufd = 1;
+            virCommandPassFD(cmd, priv->iommufd, VIR_COMMAND_PASS_FD_CLOSE_PARENT);
+
             if (qemuMonitorCreateObjectProps(&props, "iommufd",
                                              hostdev->iommufdId,
+                                             "S:fd", g_strdup_printf("%d", priv->iommufd),
                                              NULL) < 0)
                 return -1;
 
@@ -5270,7 +5295,18 @@ qemuBuildHostdevCommandLine(virCommand *cmd,
             if (qemuCommandAddExtDevice(cmd, hostdev->info, def, qemuCaps) < 0)
                 return -1;
 
-            if (!(devprops = qemuBuildPCIHostdevDevProps(def, hostdev)))
+            if (hostdev->iommufdId) {
+                virDomainHostdevSubsysPCI *pcisrc = &hostdev->source.subsys.u.pci;
+                g_autofree char *vfioFdName = g_strdup_printf("vfio-%04x:%02x:%02x.%d",
+                                                              pcisrc->addr.domain, pcisrc->addr.bus,
+                                                              pcisrc->addr.slot, pcisrc->addr.function);
+
+                int vfiofd = GPOINTER_TO_INT(g_hash_table_lookup(priv->vfioDeviceFds, vfioFdName));
+
+                virCommandPassFD(cmd, vfiofd, VIR_COMMAND_PASS_FD_CLOSE_PARENT);
+            }
+
+            if (!(devprops = qemuBuildPCIHostdevDevProps(def, hostdev, vm)))
                 return -1;
 
             if (qemuBuildDeviceCommandlineFromJSON(cmd, devprops, def, qemuCaps) < 0)
@@ -10960,7 +10996,7 @@ qemuBuildCommandLine(virDomainObj *vm,
     if (qemuBuildRedirdevCommandLine(cmd, def, qemuCaps) < 0)
         return NULL;
 
-    if (qemuBuildHostdevCommandLine(cmd, def, qemuCaps) < 0)
+    if (qemuBuildHostdevCommandLine(cmd, def, qemuCaps, vm) < 0)
         return NULL;
 
     if (migrateURI)
diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h
index ad068f1f16..380aac261f 100644
--- a/src/qemu/qemu_command.h
+++ b/src/qemu/qemu_command.h
@@ -180,7 +180,8 @@ qemuBuildThreadContextProps(virJSONValue **tcProps,
 /* Current, best practice */
 virJSONValue *
 qemuBuildPCIHostdevDevProps(const virDomainDef *def,
-                            virDomainHostdevDef *dev);
+                            virDomainHostdevDef *dev,
+                            virDomainObj *vm);
 
 virJSONValue *
 qemuBuildRNGDevProps(const virDomainDef *def,
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index a2c7c88a7e..2086dbb575 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -1954,6 +1954,11 @@ qemuDomainObjPrivateFree(void *data)
 
     virChrdevFree(priv->devs);
 
+    if (priv->iommufd >= 0) {
+        virEventRemoveHandle(priv->iommufd);
+        priv->iommufd = -1;
+    }
+
     if (priv->pidMonitored >= 0) {
         virEventRemoveHandle(priv->pidMonitored);
         priv->pidMonitored = -1;
@@ -1975,6 +1980,7 @@ qemuDomainObjPrivateFree(void *data)
 
     g_clear_pointer(&priv->blockjobs, g_hash_table_unref);
     g_clear_pointer(&priv->fds, g_hash_table_unref);
+    g_clear_pointer(&priv->vfioDeviceFds, g_hash_table_unref);
 
     /* This should never be non-NULL if we get here, but just in case... */
     if (priv->eventThread) {
@@ -2003,7 +2009,9 @@ qemuDomainObjPrivateAlloc(void *opaque)
 
     priv->blockjobs = virHashNew(virObjectUnref);
     priv->fds = virHashNew(g_object_unref);
+    priv->vfioDeviceFds = g_hash_table_new(g_str_hash, g_str_equal);
 
+    priv->iommufd = -1;
     priv->pidMonitored = -1;
 
     /* agent commands block by default, user can choose different behavior */
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 1afd932764..6460323554 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -266,6 +266,10 @@ struct _qemuDomainObjPrivate {
     /* named file descriptor groups associated with the VM */
     GHashTable *fds;
 
+    int iommufd;
+
+    GHashTable *vfioDeviceFds;
+
     char *memoryBackingDir;
 };
 
@@ -1172,3 +1176,6 @@ qemuDomainCheckCPU(virArch arch,
 bool
 qemuDomainMachineSupportsFloppy(const char *machine,
                                 virQEMUCaps *qemuCaps);
+
+int qemuProcessOpenVfioFds(virDomainObj *vm);
+void qemuProcessCloseVfioFds(virDomainObj *vm);
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index e9568af125..e0e693e251 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -1633,7 +1633,7 @@ qemuDomainAttachHostPCIDevice(virQEMUDriver *driver,
         goto error;
     }
 
-    if (!(devprops = qemuBuildPCIHostdevDevProps(vm->def, hostdev)))
+    if (!(devprops = qemuBuildPCIHostdevDevProps(vm->def, hostdev, vm)))
         goto error;
 
     qemuDomainObjEnterMonitor(vm);
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index a81c02c9d5..1bc779c6aa 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 #if WITH_SYS_SYSCALL_H
 # include
 #endif
@@ -8025,6 +8026,9 @@ qemuProcessLaunch(virConnectPtr conn,
     if (qemuExtDevicesStart(driver, vm, incomingMigrationExtDevices) < 0)
         goto cleanup;
 
+    if (qemuProcessOpenVfioFds(vm) < 0)
+        goto cleanup;
+
     if (!(cmd = qemuBuildCommandLine(vm,
                                      incoming ? "defer" : NULL,
                                      vmop,
@@ -10206,3 +10210,231 @@ qemuProcessHandleNbdkitExit(qemuNbdkitProcess *nbdkit,
     qemuProcessEventSubmit(vm, QEMU_PROCESS_EVENT_NBDKIT_EXITED, 0, 0, nbdkit);
     virObjectUnlock(vm);
 }
+
+/**
+ * qemuProcessOpenIommuFd:
+ * @vm: domain object
+ * @iommuFd: returned file descriptor
+ *
+ * Opens /dev/iommu file descriptor for the VM.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+static int
+qemuProcessOpenIommuFd(virDomainObj *vm, int *iommuFd)
+{
+    int fd = -1;
+
+    VIR_DEBUG("Opening IOMMU FD for domain %s", vm->def->name);
+
+    if ((fd = open("/dev/iommu", O_RDWR | O_CLOEXEC)) < 0) {
+        if (errno == ENOENT) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("IOMMU FD support requires /dev/iommu device"));
+        } else {
+            virReportSystemError(errno, "%s",
+                                 _("cannot open /dev/iommu"));
+        }
+        return -1;
+    }
+
+    *iommuFd = fd;
+    VIR_DEBUG("Opened IOMMU FD %d for domain %s", fd, vm->def->name);
+    return 0;
+}
+
+/**
+ * qemuProcessGetVfioDevicePath:
+ * @hostdev: host device definition
+ * @vfioPath: returned VFIO device path
+ *
+ * Constructs the VFIO device path for a PCI hostdev.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+static int
+qemuProcessGetVfioDevicePath(virDomainHostdevDef *hostdev,
+                             char **vfioPath)
+{
+    virPCIDeviceAddress *addr;
+    g_autofree char *sysfsPath = NULL;
+    DIR *dir = NULL;
+    struct dirent *entry = NULL;
+    int ret = -1;
+
+    if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS ||
+        hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("VFIO FD only supported for PCI hostdevs"));
+        return -1;
+    }
+
+    addr = &hostdev->source.subsys.u.pci.addr;
+
+    /* Build sysfs path: /sys/bus/pci/devices/DDDD:BB:DD.F/vfio-dev/ */
+    sysfsPath = g_strdup_printf("/sys/bus/pci/devices/"
+                                "%04x:%02x:%02x.%d/vfio-dev/",
+                                addr->domain, addr->bus,
+                                addr->slot, addr->function);
+
+    if (virDirOpen(&dir, sysfsPath) < 0) {
+        virReportSystemError(errno,
+                             _("cannot open VFIO sysfs directory %1$s"),
+                             sysfsPath);
+        return -1;
+    }
+
+    /* Find the vfio device name in the directory */
+    while (virDirRead(dir, &entry, sysfsPath) > 0) {
+        if (STRPREFIX(entry->d_name, "vfio")) {
+            *vfioPath = g_strdup_printf("/dev/vfio/devices/%s", entry->d_name);
+            ret = 0;
+            break;
+        }
+    }
+
+    if (ret < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("cannot find VFIO device for PCI device %1$04x:%2$02x:%3$02x.%4$d"),
+                       addr->domain, addr->bus, addr->slot, addr->function);
+    }
+
+    virDirClose(dir);
+    return ret;
+}
+
+/**
+ * qemuProcessOpenVfioDeviceFd:
+ * @hostdev: host device definition
+ * @vfioFd: returned file descriptor
+ *
+ * Opens the VFIO device file descriptor for a hostdev.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+static int
+qemuProcessOpenVfioDeviceFd(virDomainHostdevDef *hostdev,
+                            int *vfioFd)
+{
+    g_autofree char *vfioPath = NULL;
+    int fd = -1;
+
+    if (qemuProcessGetVfioDevicePath(hostdev, &vfioPath) < 0)
+        return -1;
+
+    VIR_DEBUG("Opening VFIO device %s", vfioPath);
+
+    if ((fd = open(vfioPath, O_RDWR | O_CLOEXEC)) < 0) {
+        if (errno == ENOENT) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("VFIO device %1$s not found - ensure device is bound to vfio-pci driver"),
+                           vfioPath);
+        } else {
+            virReportSystemError(errno,
+                                 _("cannot open VFIO device %1$s"), vfioPath);
+        }
+        return -1;
+    }
+
+    *vfioFd = fd;
+    VIR_DEBUG("Opened VFIO device FD %d for %s", *vfioFd, vfioPath);
+    return 0;
+}
+
+/**
+ * qemuProcessOpenVfioFds:
+ * @vm: domain object
+ *
+ * Opens all necessary VFIO file descriptors for the domain.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+int
+qemuProcessOpenVfioFds(virDomainObj *vm)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    bool needsIommuFd = false;
+    size_t i;
+
+    /* Check if we have any hostdevs that need VFIO FDs */
+    for (i = 0; i < vm->def->nhostdevs; i++) {
+        virDomainHostdevDef *hostdev = vm->def->hostdevs[i];
+        int vfioFd = -1;
+        g_autofree char *fdname = NULL;
+
+        if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
+            hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
+
+            /* Check if this hostdev uses VFIO with IOMMU FD */
+            if (hostdev->source.subsys.u.pci.driver.name == VIR_DEVICE_HOSTDEV_PCI_DRIVER_NAME_VFIO &&
+                hostdev->iommufdId) {
+
+                needsIommuFd = true;
+
+                /* Open VFIO device FD */
+                if (qemuProcessOpenVfioDeviceFd(hostdev, &vfioFd) < 0)
+                    goto error;
+
+                /* Store the FD */
+                fdname = g_strdup_printf("vfio-%04x:%02x:%02x.%d",
+                                         hostdev->source.subsys.u.pci.addr.domain,
+                                         hostdev->source.subsys.u.pci.addr.bus,
+                                         hostdev->source.subsys.u.pci.addr.slot,
+                                         hostdev->source.subsys.u.pci.addr.function);
+
+                g_hash_table_insert(priv->vfioDeviceFds, g_steal_pointer(&fdname), GINT_TO_POINTER(vfioFd));
+
+                VIR_DEBUG("Stored VFIO FD for device %s", fdname);
+            }
+        }
+    }
+
+    /* Open IOMMU FD if needed */
+    if (needsIommuFd) {
+        int iommuFd = -1;
+
+        if (qemuProcessOpenIommuFd(vm, &iommuFd) < 0)
+            goto error;
+
+        priv->iommufd = iommuFd;
+
+        VIR_DEBUG("Stored IOMMU FD");
+    }
+
+    return 0;
+
+ error:
+    qemuProcessCloseVfioFds(vm);
+    return -1;
+}
+
+/**
+ * qemuProcessCloseVfioFds:
+ * @vm: domain object
+ *
+ * Closes all VFIO file descriptors for the domain.
+ */
+void
+qemuProcessCloseVfioFds(virDomainObj *vm)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    GHashTableIter iter;
+    gpointer key, value;
+
+    /* Close all VFIO device FDs */
+    if (priv->vfioDeviceFds) {
+        g_hash_table_iter_init(&iter, priv->vfioDeviceFds);
+        while (g_hash_table_iter_next(&iter, &key, &value)) {
+            int fd = GPOINTER_TO_INT(value);
+            VIR_DEBUG("Closing VFIO device FD %d for %s", fd, (char*)key);
+            VIR_FORCE_CLOSE(fd);
+        }
+        g_hash_table_remove_all(priv->vfioDeviceFds);
+    }
+
+    /* Close IOMMU FD */
+    if (priv->iommufd >= 0) {
+        VIR_DEBUG("Closing IOMMU FD %d", priv->iommufd);
+        VIR_FORCE_CLOSE(priv->iommufd);
+    }
+}
-- 
2.43.0
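
For readers following the sysfs lookup in qemuProcessGetVfioDevicePath() and the "vfio-DDDD:BB:SS.F" key stored in priv->vfioDeviceFds, the two steps can be sketched in plain C (POSIX only, no libvirt or GLib). This is a minimal illustration under stated assumptions, not code from this series: the helper names vfio_fd_key() and vfio_cdev_path() are hypothetical, and the sketch uses opendir()/readdir() and snprintf() where the patch uses virDirOpen()/virDirRead() and g_strdup_printf().

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libvirt): formats the key under which
 * the patch stores a VFIO device FD, "vfio-DDDD:BB:SS.F".
 * Returns 0 on success, -1 if the buffer is too small. */
static int
vfio_fd_key(char *out, size_t outlen,
            unsigned int domain, unsigned int bus,
            unsigned int slot, unsigned int function)
{
    int n;

    assert(out != NULL);
    n = snprintf(out, outlen, "vfio-%04x:%02x:%02x.%u",
                 domain, bus, slot, function);
    return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}

/* Hypothetical helper (not part of libvirt): scans vfio_dev_dir, for
 * example /sys/bus/pci/devices/<addr>/vfio-dev/, for the first entry whose
 * name starts with "vfio" and writes "/dev/vfio/devices/<name>" into out.
 * Returns 0 on success, -1 if the directory cannot be opened, no entry
 * matches, or the buffer is too small. */
static int
vfio_cdev_path(const char *vfio_dev_dir, char *out, size_t outlen)
{
    DIR *dir;
    struct dirent *entry;
    int ret = -1;

    assert(out != NULL);
    dir = opendir(vfio_dev_dir);
    if (!dir)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        if (strncmp(entry->d_name, "vfio", 4) == 0) {
            int n = snprintf(out, outlen, "/dev/vfio/devices/%s",
                             entry->d_name);
            if (n > 0 && (size_t)n < outlen)
                ret = 0;
            break;
        }
    }

    closedir(dir);
    return ret;
}
```

Building the key in a single helper is one possible way to avoid repeating the format string at the three places where this patch constructs it; that is a suggestion for discussion, not something the series does.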