From: Nathan Chen via Devel
Reply-To: Nathan Chen
To: devel@lists.libvirt.org
CC: shameerali.kolothum.thodi@huawei.com, nicolinc@nvidia.com, nathanc@nvidia.com, mochs@nvidia.com
Subject: [RFC PATCH v5 5/7] qemu: open iommufd FDs from libvirt backend
Date: Wed, 1 Oct 2025 17:38:35 -0700
Message-ID: <20251002003837.1546646-6-nathanc@nvidia.com>
In-Reply-To: <20251002003837.1546646-1-nathanc@nvidia.com>
References: <20251002003837.1546646-1-nathanc@nvidia.com>

Open iommufd FDs from libvirt backend without exposing
these FDs to XML users, i.e. one per domain for /dev/iommu
and one per iommufd hostdev for /dev/vfio/devices/vfioX, and
pass the FD to qemu command line.

Signed-off-by: Nathan Chen
---
 src/qemu/qemu_command.c |  43 +++++++-
 src/qemu/qemu_command.h |   3 +-
 src/qemu/qemu_domain.c  |   8 ++
 src/qemu/qemu_domain.h  |   7 ++
 src/qemu/qemu_hotplug.c |   2 +-
 src/qemu/qemu_process.c | 232 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 289 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 38dab98dee..1901ecab36 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -4800,7 +4800,8 @@ qemuBuildVideoCommandLine(virCommand *cmd,
 
 virJSONValue *
 qemuBuildPCIHostdevDevProps(const virDomainDef *def,
-                            virDomainHostdevDef *dev)
+                            virDomainHostdevDef *dev,
+                            virDomainObj *vm)
 {
     g_autoptr(virJSONValue) props = NULL;
     virDomainHostdevSubsysPCI *pcisrc = &dev->source.subsys.u.pci;
@@ -4811,6 +4812,13 @@ qemuBuildPCIHostdevDevProps(const virDomainDef *def,
     const char *iommufdId = NULL;
     /* 'ramfb' property must be omitted unless it's to be enabled */
     bool ramfb = pcisrc->ramfb == VIR_TRISTATE_SWITCH_ON;
+    bool useIommufd = false;
+    qemuDomainObjPrivate *priv = vm ? vm->privateData : NULL;
+
+    if (pcisrc->driver.name == VIR_DEVICE_HOSTDEV_PCI_DRIVER_NAME_VFIO &&
+        pcisrc->driver.iommufd) {
+        useIommufd = true;
+    }
 
     /* caller has to assign proper passthrough driver name */
     switch (pcisrc->driver.name) {
@@ -4857,6 +4865,18 @@ qemuBuildPCIHostdevDevProps(const virDomainDef *def,
                               NULL) < 0)
         return NULL;
 
+    if (useIommufd && priv) {
+        g_autofree char *vfioFdName = g_strdup_printf("vfio-%04x:%02x:%02x.%d",
+                                                      pcisrc->addr.domain, pcisrc->addr.bus,
+                                                      pcisrc->addr.slot, pcisrc->addr.function);
+
+        int vfiofd = GPOINTER_TO_INT(g_hash_table_lookup(priv->vfioDeviceFds, vfioFdName));
+        if (virJSONValueObjectAdd(&props,
+                                  "S:fd", g_strdup_printf("%d", vfiofd),
+                                  NULL) < 0)
+            return NULL;
+    }
+
     if (qemuBuildDeviceAddressProps(props, def, dev->info) < 0)
         return NULL;
 
@@ -5267,12 +5287,14 @@ qemuBuildAcpiNodesetProps(virCommand *cmd,
 static int
 qemuBuildHostdevCommandLine(virCommand *cmd,
                             const virDomainDef *def,
-                            virQEMUCaps *qemuCaps)
+                            virQEMUCaps *qemuCaps,
+                            virDomainObj *vm)
 {
     size_t i;
     g_autoptr(virJSONValue) props = NULL;
     int iommufd = 0;
     const char * iommufdId = "iommufd0";
+    qemuDomainObjPrivate *priv = vm->privateData;
 
     for (i = 0; i < def->nhostdevs; i++) {
         virDomainHostdevDef *hostdev = def->hostdevs[i];
@@ -5303,8 +5325,10 @@ qemuBuildHostdevCommandLine(virCommand *cmd,
 
             if (subsys->u.pci.driver.iommufd && iommufd == 0) {
                 iommufd = 1;
+                virCommandPassFD(cmd, priv->iommufd, VIR_COMMAND_PASS_FD_CLOSE_PARENT);
                 if (qemuMonitorCreateObjectProps(&props, "iommufd",
                                                  iommufdId,
+                                                 "S:fd", g_strdup_printf("%d", priv->iommufd),
                                                  NULL) < 0)
                     return -1;
 
@@ -5315,7 +5339,18 @@ qemuBuildHostdevCommandLine(virCommand *cmd,
             if (qemuCommandAddExtDevice(cmd, hostdev->info, def, qemuCaps) < 0)
                 return -1;
 
-            if (!(devprops = qemuBuildPCIHostdevDevProps(def, hostdev)))
+            if (subsys->u.pci.driver.iommufd) {
+                virDomainHostdevSubsysPCI *pcisrc = &hostdev->source.subsys.u.pci;
+                g_autofree char *vfioFdName = g_strdup_printf("vfio-%04x:%02x:%02x.%d",
+                                                              pcisrc->addr.domain, pcisrc->addr.bus,
+                                                              pcisrc->addr.slot, pcisrc->addr.function);
+
+                int vfiofd = GPOINTER_TO_INT(g_hash_table_lookup(priv->vfioDeviceFds, vfioFdName));
+
+                virCommandPassFD(cmd, vfiofd, VIR_COMMAND_PASS_FD_CLOSE_PARENT);
+            }
+
+            if (!(devprops = qemuBuildPCIHostdevDevProps(def, hostdev, vm)))
                 return -1;
 
             if (qemuBuildDeviceCommandlineFromJSON(cmd, devprops, def, qemuCaps) < 0)
@@ -11028,7 +11063,7 @@ qemuBuildCommandLine(virDomainObj *vm,
     if (qemuBuildRedirdevCommandLine(cmd, def, qemuCaps) < 0)
         return NULL;
 
-    if (qemuBuildHostdevCommandLine(cmd, def, qemuCaps) < 0)
+    if (qemuBuildHostdevCommandLine(cmd, def, qemuCaps, vm) < 0)
         return NULL;
 
     if (migrateURI)
diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h
index ad068f1f16..380aac261f 100644
--- a/src/qemu/qemu_command.h
+++ b/src/qemu/qemu_command.h
@@ -180,7 +180,8 @@ qemuBuildThreadContextProps(virJSONValue **tcProps,
 /* Current, best practice */
 virJSONValue *
 qemuBuildPCIHostdevDevProps(const virDomainDef *def,
-                            virDomainHostdevDef *dev);
+                            virDomainHostdevDef *dev,
+                            virDomainObj *vm);
 
 virJSONValue *
 qemuBuildRNGDevProps(const virDomainDef *def,
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index e45757ccd5..2f1c3de85d 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -1954,6 +1954,11 @@ qemuDomainObjPrivateFree(void *data)
 
     virChrdevFree(priv->devs);
 
+    if (priv->iommufd >= 0) {
+        virEventRemoveHandle(priv->iommufd);
+        priv->iommufd = -1;
+    }
+
     if (priv->pidMonitored >= 0) {
         virEventRemoveHandle(priv->pidMonitored);
         priv->pidMonitored = -1;
@@ -1975,6 +1980,7 @@ qemuDomainObjPrivateFree(void *data)
 
     g_clear_pointer(&priv->blockjobs, g_hash_table_unref);
     g_clear_pointer(&priv->fds, g_hash_table_unref);
+    g_clear_pointer(&priv->vfioDeviceFds, g_hash_table_unref);
 
     /* This should never be non-NULL if we get here, but just in case... */
     if (priv->eventThread) {
@@ -2003,7 +2009,9 @@ qemuDomainObjPrivateAlloc(void *opaque)
 
     priv->blockjobs = virHashNew(virObjectUnref);
     priv->fds = virHashNew(g_object_unref);
+    priv->vfioDeviceFds = g_hash_table_new(g_str_hash, g_str_equal);
 
+    priv->iommufd = -1;
     priv->pidMonitored = -1;
 
     /* agent commands block by default, user can choose different behavior */
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index ffe5bee1bf..bed5326bba 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -266,6 +266,10 @@ struct _qemuDomainObjPrivate {
     /* named file descriptor groups associated with the VM */
     GHashTable *fds;
 
+    int iommufd;
+
+    GHashTable *vfioDeviceFds;
+
     char *memoryBackingDir;
 };
 
@@ -1171,3 +1175,6 @@ qemuDomainCheckCPU(virArch arch,
 bool
 qemuDomainMachineSupportsFloppy(const char *machine,
                                 virQEMUCaps *qemuCaps);
+
+int qemuProcessOpenVfioFds(virDomainObj *vm);
+void qemuProcessCloseVfioFds(virDomainObj *vm);
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index fb426deb1a..661e9008f7 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -1630,7 +1630,7 @@ qemuDomainAttachHostPCIDevice(virQEMUDriver *driver,
         goto error;
     }
 
-    if (!(devprops = qemuBuildPCIHostdevDevProps(vm->def, hostdev)))
+    if (!(devprops = qemuBuildPCIHostdevDevProps(vm->def, hostdev, vm)))
         goto error;
 
     qemuDomainObjEnterMonitor(vm);
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index ead5bf3e48..5acaf12cfc 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 #if WITH_SYS_SYSCALL_H
 # include
 #endif
@@ -8019,6 +8020,9 @@ qemuProcessLaunch(virConnectPtr conn,
     if (qemuExtDevicesStart(driver, vm, incomingMigrationExtDevices) < 0)
         goto cleanup;
 
+    if (qemuProcessOpenVfioFds(vm) < 0)
+        goto cleanup;
+
     if (!(cmd = qemuBuildCommandLine(vm,
                                      incoming ? "defer" : NULL,
                                      vmop,
@@ -10200,3 +10204,231 @@ qemuProcessHandleNbdkitExit(qemuNbdkitProcess *nbdkit,
     qemuProcessEventSubmit(vm, QEMU_PROCESS_EVENT_NBDKIT_EXITED, 0, 0, nbdkit);
     virObjectUnlock(vm);
 }
+
+/**
+ * qemuProcessOpenIommuFd:
+ * @vm: domain object
+ * @iommuFd: returned file descriptor
+ *
+ * Opens /dev/iommu file descriptor for the VM.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+static int
+qemuProcessOpenIommuFd(virDomainObj *vm, int *iommuFd)
+{
+    int fd = -1;
+
+    VIR_DEBUG("Opening IOMMU FD for domain %s", vm->def->name);
+
+    if ((fd = open("/dev/iommu", O_RDWR | O_CLOEXEC)) < 0) {
+        if (errno == ENOENT) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("IOMMU FD support requires /dev/iommu device"));
+        } else {
+            virReportSystemError(errno, "%s",
+                                 _("cannot open /dev/iommu"));
+        }
+        return -1;
+    }
+
+    *iommuFd = fd;
+    VIR_DEBUG("Opened IOMMU FD %d for domain %s", fd, vm->def->name);
+    return 0;
+}
+
+/**
+ * qemuProcessGetVfioDevicePath:
+ * @hostdev: host device definition
+ * @vfioPath: returned VFIO device path
+ *
+ * Constructs the VFIO device path for a PCI hostdev.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+static int
+qemuProcessGetVfioDevicePath(virDomainHostdevDef *hostdev,
+                             char **vfioPath)
+{
+    virPCIDeviceAddress *addr;
+    g_autofree char *sysfsPath = NULL;
+    DIR *dir = NULL;
+    struct dirent *entry = NULL;
+    int ret = -1;
+
+    if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS ||
+        hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("VFIO FD only supported for PCI hostdevs"));
+        return -1;
+    }
+
+    addr = &hostdev->source.subsys.u.pci.addr;
+
+    /* Build sysfs path: /sys/bus/pci/devices/DDDD:BB:DD.F/vfio-dev/ */
+    sysfsPath = g_strdup_printf("/sys/bus/pci/devices/"
+                                "%04x:%02x:%02x.%d/vfio-dev/",
+                                addr->domain, addr->bus,
+                                addr->slot, addr->function);
+
+    if (virDirOpen(&dir, sysfsPath) < 0) {
+        virReportSystemError(errno,
+                             _("cannot open VFIO sysfs directory %1$s"),
+                             sysfsPath);
+        return -1;
+    }
+
+    /* Find the vfio device name in the directory */
+    while (virDirRead(dir, &entry, sysfsPath) > 0) {
+        if (STRPREFIX(entry->d_name, "vfio")) {
+            *vfioPath = g_strdup_printf("/dev/vfio/devices/%s", entry->d_name);
+            ret = 0;
+            break;
+        }
+    }
+
+    if (ret < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("cannot find VFIO device for PCI device %1$04x:%2$02x:%3$02x.%4$d"),
+                       addr->domain, addr->bus, addr->slot, addr->function);
+    }
+
+    virDirClose(dir);
+    return ret;
+}
+
+/**
+ * qemuProcessOpenVfioDeviceFd:
+ * @hostdev: host device definition
+ * @vfioFd: returned file descriptor
+ *
+ * Opens the VFIO device file descriptor for a hostdev.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+static int
+qemuProcessOpenVfioDeviceFd(virDomainHostdevDef *hostdev,
+                            int *vfioFd)
+{
+    g_autofree char *vfioPath = NULL;
+    int fd = -1;
+
+    if (qemuProcessGetVfioDevicePath(hostdev, &vfioPath) < 0)
+        return -1;
+
+    VIR_DEBUG("Opening VFIO device %s", vfioPath);
+
+    if ((fd = open(vfioPath, O_RDWR | O_CLOEXEC)) < 0) {
+        if (errno == ENOENT) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("VFIO device %1$s not found - ensure device is bound to vfio-pci driver"),
+                           vfioPath);
+        } else {
+            virReportSystemError(errno,
+                                 _("cannot open VFIO device %1$s"), vfioPath);
+        }
+        return -1;
+    }
+
+    *vfioFd = fd;
+    VIR_DEBUG("Opened VFIO device FD %d for %s", *vfioFd, vfioPath);
+    return 0;
+}
+
+/**
+ * qemuProcessOpenVfioFds:
+ * @vm: domain object
+ *
+ * Opens all necessary VFIO file descriptors for the domain.
+ *
+ * Returns: 0 on success, -1 on failure
+ */
+int
+qemuProcessOpenVfioFds(virDomainObj *vm)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    bool needsIommuFd = false;
+    size_t i;
+
+    /* Check if we have any hostdevs that need VFIO FDs */
+    for (i = 0; i < vm->def->nhostdevs; i++) {
+        virDomainHostdevDef *hostdev = vm->def->hostdevs[i];
+        int vfioFd = -1;
+        g_autofree char *fdname = NULL;
+
+        if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
+            hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
+
+            /* Check if this hostdev uses VFIO with IOMMU FD */
+            if (hostdev->source.subsys.u.pci.driver.name == VIR_DEVICE_HOSTDEV_PCI_DRIVER_NAME_VFIO &&
+                hostdev->source.subsys.u.pci.driver.iommufd) {
+
+                needsIommuFd = true;
+
+                /* Open VFIO device FD */
+                if (qemuProcessOpenVfioDeviceFd(hostdev, &vfioFd) < 0)
+                    goto error;
+
+                /* Store the FD */
+                fdname = g_strdup_printf("vfio-%04x:%02x:%02x.%d",
+                                         hostdev->source.subsys.u.pci.addr.domain,
+                                         hostdev->source.subsys.u.pci.addr.bus,
+                                         hostdev->source.subsys.u.pci.addr.slot,
+                                         hostdev->source.subsys.u.pci.addr.function);
+
+                g_hash_table_insert(priv->vfioDeviceFds, g_steal_pointer(&fdname), GINT_TO_POINTER(vfioFd));
+
+                VIR_DEBUG("Stored VFIO FD for device %s", fdname);
+            }
+        }
+    }
+
+    /* Open IOMMU FD if needed */
+    if (needsIommuFd) {
+        int iommuFd = -1;
+
+        if (qemuProcessOpenIommuFd(vm, &iommuFd) < 0)
+            goto error;
+
+        priv->iommufd = iommuFd;
+
+        VIR_DEBUG("Stored IOMMU FD");
+    }
+
+    return 0;
+
+ error:
+    qemuProcessCloseVfioFds(vm);
+    return -1;
+}
+
+/**
+ * qemuProcessCloseVfioFds:
+ * @vm: domain object
+ *
+ * Closes all VFIO file descriptors for the domain.
+ */
+void
+qemuProcessCloseVfioFds(virDomainObj *vm)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    GHashTableIter iter;
+    gpointer key, value;
+
+    /* Close all VFIO device FDs */
+    if (priv->vfioDeviceFds) {
+        g_hash_table_iter_init(&iter, priv->vfioDeviceFds);
+        while (g_hash_table_iter_next(&iter, &key, &value)) {
+            int fd = GPOINTER_TO_INT(value);
+            VIR_DEBUG("Closing VFIO device FD %d for %s", fd, (char*)key);
+            VIR_FORCE_CLOSE(fd);
+        }
+        g_hash_table_remove_all(priv->vfioDeviceFds);
+    }
+
+    /* Close IOMMU FD */
+    if (priv->iommufd >= 0) {
+        VIR_DEBUG("Closing IOMMU FD %d", priv->iommufd);
+        VIR_FORCE_CLOSE(priv->iommufd);
+    }
+}
-- 
2.43.0
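
Illustrative note (not part of the patch): the patch keys each VFIO device FD by a "vfio-DDDD:BB:SS.F" string and resolves the character device by scanning the device's vfio-dev/ directory in sysfs for an entry starting with "vfio". The standalone sketch below shows those two steps with plain POSIX calls instead of the libvirt and GLib helpers. The function names vfio_fd_key() and find_vfio_cdev_path() are hypothetical and exist only for this illustration; the format string and the /dev/vfio/devices/ prefix are taken from the patch, except that the function number is printed with %u because it is passed as unsigned here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the per-device key with the same layout the
 * patch uses for its vfioDeviceFds table ("vfio-DDDD:BB:SS.F").
 * Returns 0 on success, -1 if the buffer is too small. */
int
vfio_fd_key(char *buf, size_t len,
            unsigned int domain, unsigned int bus,
            unsigned int slot, unsigned int function)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "vfio-%04x:%02x:%02x.%u",
                 domain, bus, slot, function);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: scan a directory laid out like
 * /sys/bus/pci/devices/<addr>/vfio-dev/ and build the matching
 * /dev/vfio/devices/<name> path from the first entry starting with "vfio".
 * Returns 0 on success, -1 if the directory cannot be opened, no entry
 * matches, or the output buffer is too small. */
int
find_vfio_cdev_path(const char *sysfs_dir, char *out, size_t len)
{
    DIR *dir;
    struct dirent *entry;
    int ret = -1;

    assert(sysfs_dir != NULL && out != NULL && len > 0);

    if (!(dir = opendir(sysfs_dir)))
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        if (strncmp(entry->d_name, "vfio", 4) == 0) {
            int n = snprintf(out, len, "/dev/vfio/devices/%s", entry->d_name);
            if (n >= 0 && (size_t)n < len)
                ret = 0;
            break;
        }
    }

    closedir(dir);
    return ret;
}
```

A caller would typically format the key once per hostdev, open the path returned by find_vfio_cdev_path() with O_RDWR | O_CLOEXEC, and store the resulting descriptor under that key, which mirrors what qemuProcessOpenVfioFds() does with the libvirt helpers.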