From nobody Sat Jun 13 00:25:45 2026
From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev", Jason Gunthorpe,
 Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy,
 Nicolin Chen, "Tian, Kevin", Yi Liu
Cc: Saurabh Sengar, skhawaja@google.com, pasha.tatashin@soleen.com,
 Will Deacon, Jacob Pan, Baolu Lu
Subject: [PATCH v5 1/9] vfio: Rename VFIO_NOIOMMU to VFIO_GROUP_NOIOMMU
Date: Mon, 11 May 2026 11:41:06 -0700
Message-ID: <20260511184116.3687392-2-jacob.pan@linux.microsoft.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260511184116.3687392-1-jacob.pan@linux.microsoft.com>
References: <20260511184116.3687392-1-jacob.pan@linux.microsoft.com>
X-Mailing-List: linux-kernel@vger.kernel.org

In preparation for adding cdev-based noiommu support under iommufd,
rename CONFIG_VFIO_NOIOMMU to CONFIG_VFIO_GROUP_NOIOMMU to clearly scope
it to the legacy group/container path. Also rename the helper
vfio_device_is_noiommu() to vfio_device_is_group_noiommu() to match.

Add an explicit dependency on VFIO_CONTAINER or IOMMUFD_VFIO_CONTAINER
since the group-based noiommu path is only meaningful when container
support is enabled.

This is a pure rename with no functional change, laying the groundwork
for a separate VFIO_CDEV_NOIOMMU config that enables noiommu mode through
the iommufd cdev interface.

Link: https://lore.kernel.org/linux-iommu/20260416144915.4fe38481@shazbot.org/
Suggested-by: Alex Williamson
Signed-off-by: Jacob Pan
Reviewed-by: Jason Gunthorpe
---
 drivers/iommu/iommufd/vfio_compat.c |  4 ++--
 drivers/vfio/Kconfig                |  6 +++---
 drivers/vfio/container.c            |  6 +++---
 drivers/vfio/group.c                |  4 ++--
 drivers/vfio/iommufd.c              |  6 +++---
 drivers/vfio/vfio.h                 | 12 ++++++------
 drivers/vfio/vfio_main.c            |  4 ++--
 7 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/drivers/iommu/iommufd/vfio_compat.c b/drivers/iommu/iommufd/vfio_compat.c
index acb48cdd3b00..51f4870ec2b3 100644
--- a/drivers/iommu/iommufd/vfio_compat.c
+++ b/drivers/iommu/iommufd/vfio_compat.c
@@ -286,7 +286,7 @@ static int iommufd_vfio_check_extension(struct iommufd_ctx *ictx,
 		return !ictx->no_iommu_mode;
 
 	case VFIO_NOIOMMU_IOMMU:
-		return IS_ENABLED(CONFIG_VFIO_NOIOMMU);
+		return IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU);
 
 	case VFIO_DMA_CC_IOMMU:
 		return iommufd_vfio_cc_iommu(ictx);
@@ -318,7 +318,7 @@ static int iommufd_vfio_set_iommu(struct iommufd_ctx *ictx, unsigned long type)
 	 * other ioctls. We let them keep working but they mostly fail since no
 	 * IOAS should exist.
 	 */
-	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && type == VFIO_NOIOMMU_IOMMU &&
+	if (IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) && type == VFIO_NOIOMMU_IOMMU &&
 	    no_iommu_mode) {
 		if (!capable(CAP_SYS_RAWIO))
 			return -EPERM;
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index ceae52fd7586..39939be2908e 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -60,9 +60,9 @@ config VFIO_IOMMU_SPAPR_TCE
 	default VFIO
 endif
 
-config VFIO_NOIOMMU
-	bool "VFIO No-IOMMU support"
-	depends on VFIO_GROUP
+config VFIO_GROUP_NOIOMMU
+	bool "VFIO group No-IOMMU support"
+	depends on VFIO_GROUP && (VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER)
 	help
 	  VFIO is built on the ability to isolate devices using the IOMMU.
 	  Only with an IOMMU can userspace access to DMA capable devices be
diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index 003281dbf8bc..9b8cdc5317d8 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -80,7 +80,7 @@ static const struct vfio_iommu_driver_ops vfio_noiommu_ops = {
 static bool vfio_iommu_driver_allowed(struct vfio_container *container,
 				      const struct vfio_iommu_driver *driver)
 {
-	if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU))
+	if (!IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU))
 		return true;
 	return container->noiommu == (driver->ops == &vfio_noiommu_ops);
 }
@@ -583,7 +583,7 @@ int __init vfio_container_init(void)
 		return ret;
 	}
 
-	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU)) {
+	if (IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU)) {
 		ret = vfio_register_iommu_driver(&vfio_noiommu_ops);
 		if (ret)
 			goto err_misc;
@@ -597,7 +597,7 @@ int __init vfio_container_init(void)
 
 void vfio_container_cleanup(void)
 {
-	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU))
+	if (IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU))
 		vfio_unregister_iommu_driver(&vfio_noiommu_ops);
 	misc_deregister(&vfio_dev);
 	mutex_destroy(&vfio.iommu_drivers_lock);
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index b2299e5bc6df..5b9329df04e5 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -137,7 +137,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 
 	iommufd = iommufd_ctx_from_file(fd_file(f));
 	if (!IS_ERR(iommufd)) {
-		if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
+		if (IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) &&
 		    group->type == VFIO_NO_IOMMU)
 			ret = iommufd_vfio_compat_set_no_iommu(iommufd);
 		else
@@ -190,7 +190,7 @@ static int vfio_df_group_open(struct vfio_device_file *df)
 	vfio_device_group_get_kvm_safe(device);
 
 	df->iommufd = device->group->iommufd;
-	if (df->iommufd && vfio_device_is_noiommu(device) && device->open_count == 0) {
+	if (df->iommufd && vfio_device_is_group_noiommu(device) && device->open_count == 0) {
 		/*
 		 * Require no compat ioas to be assigned to proceed. The basic
 		 * statement is that the user cannot have done something that
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index a38d262c6028..39079ab27f92 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -26,7 +26,7 @@ int vfio_df_iommufd_bind(struct vfio_device_file *df)
 	lockdep_assert_held(&vdev->dev_set->lock);
 
 	/* Returns 0 to permit device opening under noiommu mode */
-	if (vfio_device_is_noiommu(vdev))
+	if (vfio_device_is_group_noiommu(vdev))
 		return 0;
 
 	return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
@@ -41,7 +41,7 @@ int vfio_iommufd_compat_attach_ioas(struct vfio_device *vdev,
 	lockdep_assert_held(&vdev->dev_set->lock);
 
 	/* compat noiommu does not need to do ioas attach */
-	if (vfio_device_is_noiommu(vdev))
+	if (vfio_device_is_group_noiommu(vdev))
 		return 0;
 
 	ret = iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id);
@@ -58,7 +58,7 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	if (vfio_device_is_noiommu(vdev))
+	if (vfio_device_is_group_noiommu(vdev))
 		return;
 
 	if (vdev->ops->unbind_iommufd)
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index e4b72e79b7e3..602623cacfc0 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -36,7 +36,7 @@ vfio_allocate_device_file(struct vfio_device *device);
 
 extern const struct file_operations vfio_device_fops;
 
-#ifdef CONFIG_VFIO_NOIOMMU
+#ifdef CONFIG_VFIO_GROUP_NOIOMMU
 extern bool vfio_noiommu __read_mostly;
 #else
 enum { vfio_noiommu = false };
@@ -112,9 +112,9 @@ bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
 
-static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
+static inline bool vfio_device_is_group_noiommu(struct vfio_device *vdev)
 {
-	return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
+	return IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) &&
 	       vdev->group->type == VFIO_NO_IOMMU;
 }
 #else
@@ -188,7 +188,7 @@ static inline void vfio_group_cleanup(void)
 {
 }
 
-static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
+static inline bool vfio_device_is_group_noiommu(struct vfio_device *vdev)
 {
 	return false;
 }
@@ -359,7 +359,7 @@ void vfio_init_device_cdev(struct vfio_device *device);
 static inline int vfio_device_add(struct vfio_device *device)
 {
 	/* cdev does not support noiommu device */
-	if (vfio_device_is_noiommu(device))
+	if (vfio_device_is_group_noiommu(device))
 		return device_add(&device->device);
 	vfio_init_device_cdev(device);
 	return cdev_device_add(&device->cdev, &device->device);
@@ -367,7 +367,7 @@ static inline int vfio_device_add(struct vfio_device *device)
 
 static inline void vfio_device_del(struct vfio_device *device)
 {
-	if (vfio_device_is_noiommu(device))
+	if (vfio_device_is_group_noiommu(device))
 		device_del(&device->device);
 	else
 		cdev_device_del(&device->cdev, &device->device);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 6222376ab6ab..4d940ce6f114 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -54,7 +54,7 @@ static struct vfio {
 	int fs_count;
 } vfio;
 
-#ifdef CONFIG_VFIO_NOIOMMU
+#ifdef CONFIG_VFIO_GROUP_NOIOMMU
 bool vfio_noiommu __read_mostly;
 module_param_named(enable_unsafe_noiommu_mode,
 		   vfio_noiommu, bool, S_IRUGO | S_IWUSR);
@@ -353,7 +353,7 @@ static int __vfio_register_dev(struct vfio_device *device,
 	 * restore cache coherency. It has to be checked here because it is only
 	 * valid for cases where we are using iommu groups.
 	 */
-	if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
+	if (type == VFIO_IOMMU && !vfio_device_is_group_noiommu(device) &&
 	    !device_iommu_capable(device->dev, IOMMU_CAP_CACHE_COHERENCY)) {
 		ret = -EINVAL;
 		goto err_out;
-- 
2.43.0

From nobody Sat Jun 13 00:25:45 2026
From: Jacob Pan
Subject: [PATCH v5 2/9] iommufd: Support a HWPT without an iommu driver for noiommu
Date: Mon, 11 May 2026 11:41:07 -0700
Message-ID: <20260511184116.3687392-3-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260511184116.3687392-1-jacob.pan@linux.microsoft.com>
References: <20260511184116.3687392-1-jacob.pan@linux.microsoft.com>

From: Jason Gunthorpe

Create just a little part of a real iommu driver, enough to slot in under
the dev_iommu_ops() and allow iommufd to call domain_alloc_paging_flags()
and fail everything else.

This allows explicitly creating a HWPT under an IOAS.

A new Kconfig option IOMMUFD_NOIOMMU is introduced to differentiate from
the VFIO group/container based noiommu mode.

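The ops selection described above can be illustrated with a small userspace model. This is only a sketch: every `model_*` name, the trimmed-down `struct iommu_ops` and the `noiommu_enabled` flag (standing in for the Kconfig check) are hypothetical, not kernel code. It assumes that an ops table which fills in only the paging-domain allocator makes callers fail any operation whose callback is absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Trimmed-down stand-in for the kernel's struct iommu_ops (hypothetical). */
struct iommu_ops {
	int (*domain_alloc_paging_flags)(void);
	int (*hw_info)(void);
};

static int model_alloc_ok(void) { return 0; }
static int model_hw_info_ok(void) { return 0; }

/* Noiommu table: only the paging-domain allocator is provided. */
static const struct iommu_ops model_noiommu_ops = {
	.domain_alloc_paging_flags = model_alloc_ok,
};

/* Table of a device backed by a real IOMMU driver. */
static const struct iommu_ops model_hw_ops = {
	.domain_alloc_paging_flags = model_alloc_ok,
	.hw_info = model_hw_info_ok,
};

struct model_device {
	const struct iommu_ops *hw_ops;	/* NULL when no IOMMU driver is bound */
	bool has_iommu_group;
};

/* Pick the noiommu table for group-less devices, else the driver's table. */
static const struct iommu_ops *model_get_ops(const struct model_device *dev,
					     bool noiommu_enabled)
{
	if (noiommu_enabled && !dev->has_iommu_group)
		return &model_noiommu_ops;
	return dev->hw_ops;	/* may be NULL; the caller has to reject it */
}

/* Returns -1 when the table is missing or lacks the callback. */
static int model_call_hw_info(const struct iommu_ops *ops)
{
	if (!ops || !ops->hw_info)
		return -1;
	return ops->hw_info();
}
```

In this model a group-less device gets a table whose allocator works while `model_call_hw_info()` reports failure, and a device with a bound driver keeps its own table.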
Signed-off-by: Jason Gunthorpe
Signed-off-by: Jacob Pan
Reviewed-by: Lu Baolu
Reviewed-by: Samiullah Khawaja
---
v5:
 - Use the new IOMMUFD_NOIOMMU Kconfig instead of VFIO_NOIOMMU
 - Use consistent wording referring to VFIO noiommu mode (Kevin)
 - Copyright date fix (Kevin)
v4:
 - Make iommufd_noiommu_ops const
v3:
 - Add comment to explain the design difference over the legacy noiommu
   VFIO code.
---
 drivers/iommu/iommufd/Kconfig           |  13 +++
 drivers/iommu/iommufd/Makefile          |   1 +
 drivers/iommu/iommufd/hw_pagetable.c    |  15 +++-
 drivers/iommu/iommufd/hwpt_noiommu.c    | 102 ++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |   2 +
 5 files changed, 131 insertions(+), 2 deletions(-)
 create mode 100644 drivers/iommu/iommufd/hwpt_noiommu.c

diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
index 455bac0351f2..74d6ea5b5b3b 100644
--- a/drivers/iommu/iommufd/Kconfig
+++ b/drivers/iommu/iommufd/Kconfig
@@ -16,6 +16,19 @@ config IOMMUFD
 	  If you don't know what to do here, say N.
 
 if IOMMUFD
+config IOMMUFD_NOIOMMU
+	bool
+	depends on !GENERIC_ATOMIC64  # IOMMU_PT_AMDV1 requires cmpxchg64
+	select GENERIC_PT
+	select IOMMU_PT
+	select IOMMU_PT_AMDV1
+	help
+	  Provides a SW-only IO page table for devices without hardware
+	  IOMMU backing. This uses the AMDV1 page table format for
+	  IOVA-to-PA lookups only, not for hardware DMA translation.
+
+	  Selected by VFIO_CDEV_NOIOMMU. Not intended to be enabled directly.
+
 config IOMMUFD_VFIO_CONTAINER
 	bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
 	depends on VFIO_GROUP && !VFIO_CONTAINER
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 71d692c9a8f4..67207914bb6e 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -10,6 +10,7 @@ iommufd-y := \
 	vfio_compat.o \
 	viommu.o
 
+iommufd-$(CONFIG_IOMMUFD_NOIOMMU) += hwpt_noiommu.o
 iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
 
 obj-$(CONFIG_IOMMUFD) += iommufd.o
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index fe789c2dc0c9..0ae14cd3fc72 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,6 +8,15 @@
 #include "../iommu-priv.h"
 #include "iommufd_private.h"
 
+static const struct iommu_ops *get_iommu_ops(struct iommufd_device *idev)
+{
+	if (IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->igroup->group)
+		return &iommufd_noiommu_ops;
+	if (WARN_ON_ONCE(!idev->dev->iommu))
+		return NULL;
+	return dev_iommu_ops(idev->dev);
+}
+
 static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
 {
 	if (hwpt->domain)
@@ -114,11 +123,13 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 		IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
 		IOMMU_HWPT_FAULT_ID_VALID |
 		IOMMU_HWPT_ALLOC_PASID;
-	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+	const struct iommu_ops *ops = get_iommu_ops(idev);
 	struct iommufd_hwpt_paging *hwpt_paging;
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
 
+	if (!ops)
+		return ERR_PTR(-ENODEV);
 	lockdep_assert_held(&ioas->mutex);
 
 	if ((flags || user_data) && !ops->domain_alloc_paging_flags)
@@ -229,7 +240,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 			  struct iommufd_device *idev, u32 flags,
 			  const struct iommu_user_data *user_data)
 {
-	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+	const struct iommu_ops *ops = get_iommu_ops(idev);
 	struct iommufd_hwpt_nested *hwpt_nested;
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
diff --git a/drivers/iommu/iommufd/hwpt_noiommu.c b/drivers/iommu/iommufd/hwpt_noiommu.c
new file mode 100644
index 000000000000..b1efc4bca880
--- /dev/null
+++ b/drivers/iommu/iommufd/hwpt_noiommu.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES
+ */
+#include
+#include
+#include "iommufd_private.h"
+
+static const struct iommu_domain_ops noiommu_amdv1_ops;
+
+struct noiommu_domain {
+	union {
+		struct iommu_domain domain;
+		struct pt_iommu_amdv1 amdv1;
+	};
+	spinlock_t lock;
+};
+PT_IOMMU_CHECK_DOMAIN(struct noiommu_domain, amdv1.iommu, domain);
+
+static void noiommu_change_top(struct pt_iommu *iommu_table,
+			       phys_addr_t top_paddr, unsigned int top_level)
+{
+}
+
+static spinlock_t *noiommu_get_top_lock(struct pt_iommu *iommupt)
+{
+	struct noiommu_domain *domain =
+		container_of(iommupt, struct noiommu_domain, amdv1.iommu);
+
+	return &domain->lock;
+}
+
+static const struct pt_iommu_driver_ops noiommu_driver_ops = {
+	.get_top_lock = noiommu_get_top_lock,
+	.change_top = noiommu_change_top,
+};
+
+static struct iommu_domain *
+noiommu_alloc_paging_flags(struct device *dev, u32 flags,
+			   const struct iommu_user_data *user_data)
+{
+	struct pt_iommu_amdv1_cfg cfg = {};
+	struct noiommu_domain *dom;
+	int rc;
+
+	if (flags || user_data)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	cfg.common.hw_max_vasz_lg2 = 64;
+	cfg.common.hw_max_oasz_lg2 = 52;
+	cfg.starting_level = 2;
+	cfg.common.features =
+		(BIT(PT_FEAT_DYNAMIC_TOP) | BIT(PT_FEAT_AMDV1_ENCRYPT_TABLES) |
+		 BIT(PT_FEAT_AMDV1_FORCE_COHERENCE));
+
+	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
+	if (!dom)
+		return ERR_PTR(-ENOMEM);
+
+	spin_lock_init(&dom->lock);
+	dom->amdv1.iommu.nid = NUMA_NO_NODE;
+	dom->amdv1.iommu.driver_ops = &noiommu_driver_ops;
+	dom->domain.ops = &noiommu_amdv1_ops;
+
+	/* Use mock page table which is based on AMDV1 */
+	rc = pt_iommu_amdv1_init(&dom->amdv1, &cfg, GFP_KERNEL);
+	if (rc) {
+		kfree(dom);
+		return ERR_PTR(rc);
+	}
+
+	return &dom->domain;
+}
+
+static void noiommu_domain_free(struct iommu_domain *iommu_domain)
+{
+	struct noiommu_domain *domain =
+		container_of(iommu_domain, struct noiommu_domain, domain);
+
+	pt_iommu_deinit(&domain->amdv1.iommu);
+	kfree(domain);
+}
+
+/*
+ * AMDV1 is used as a SW-only page table for no-IOMMU mode, similar to the
+ * iommufd selftest mock page table.
+ * Unlike the VFIO group-container based no-IOMMU mode, where no container
+ * level APIs are supported, this allows IOAS and hwpt objects to exist
+ * without hardware IOMMU support. IOVAs are used only for IOVA-to-PA
+ * lookups not for hardware translation in DMA.
+ *
+ * This is only used with iommufd and cdev-based interfaces and does not
+ * apply to the VFIO group-container based noiommu mode.
+ */
+static const struct iommu_domain_ops noiommu_amdv1_ops = {
+	IOMMU_PT_DOMAIN_OPS(amdv1),
+	.free = noiommu_domain_free,
+};
+
+const struct iommu_ops iommufd_noiommu_ops = {
+	.domain_alloc_paging_flags = noiommu_alloc_paging_flags,
+};
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 6ac1965199e9..2682b5baa6e9 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -464,6 +464,8 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
 	refcount_dec(&hwpt->obj.users);
 }
 
+extern const struct iommu_ops iommufd_noiommu_ops;
+
 struct iommufd_attach;
 
 struct iommufd_group {
-- 
2.43.0

From nobody Sat Jun 13 00:25:45 2026
From: Jacob Pan
Subject: [PATCH v5 3/9] iommufd: Move igroup allocation to a function
Date: Mon, 11 May 2026 11:41:08 -0700
Message-ID: <20260511184116.3687392-4-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260511184116.3687392-1-jacob.pan@linux.microsoft.com>
References: <20260511184116.3687392-1-jacob.pan@linux.microsoft.com>

From: Jason Gunthorpe

So it can be reused in the next patch which allows binding to noiommu
device.

Reviewed-by: Samiullah Khawaja
Reviewed-by: Yi Liu
Reviewed-by: Kevin Tian
Signed-off-by: Jason Gunthorpe
Signed-off-by: Jacob Pan
Reviewed-by: Lu Baolu
---
v5:
 - Add NULL group to the error handling path of iommufd_group_setup_msi()
v3:
 - New patch
---
 drivers/iommu/iommufd/device.c | 43 +++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 170a7005f0bc..d03076fcf3c2 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -56,6 +56,30 @@ static bool iommufd_group_try_get(struct iommufd_group *igroup,
 	return kref_get_unless_zero(&igroup->ref);
 }
 
+static struct iommufd_group *iommufd_alloc_group(struct iommufd_ctx *ictx,
+						 struct iommu_group *group)
+{
+	struct iommufd_group *new_igroup;
+
+	new_igroup = kzalloc(sizeof(*new_igroup), GFP_KERNEL);
+	if (!new_igroup)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&new_igroup->ref);
+	mutex_init(&new_igroup->lock);
+	xa_init(&new_igroup->pasid_attach);
+	new_igroup->sw_msi_start = PHYS_ADDR_MAX;
+	/* group reference moves into new_igroup */
+	new_igroup->group = group;
+
+	/*
+	 * The ictx is not additionally refcounted here because all objects using
+	 * an igroup must put it before their destroy completes.
+	 */
+	new_igroup->ictx = ictx;
+	return new_igroup;
+}
+
 /*
  * iommufd needs to store some more data for each iommu_group, we keep a
  * parallel xarray indexed by iommu_group id to hold this instead of putting it
@@ -87,25 +111,12 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
 	}
 	xa_unlock(&ictx->groups);
 
-	new_igroup = kzalloc_obj(*new_igroup);
-	if (!new_igroup) {
+	new_igroup = iommufd_alloc_group(ictx, group);
+	if (IS_ERR(new_igroup)) {
 		iommu_group_put(group);
-		return ERR_PTR(-ENOMEM);
+		return new_igroup;
 	}
 
-	kref_init(&new_igroup->ref);
-	mutex_init(&new_igroup->lock);
-	xa_init(&new_igroup->pasid_attach);
-	new_igroup->sw_msi_start = PHYS_ADDR_MAX;
-	/* group reference moves into new_igroup */
-	new_igroup->group = group;
-
-	/*
-	 * The ictx is not additionally refcounted here becase all objects using
-	 * an igroup must put it before their destroy completes.
-	 */
-	new_igroup->ictx = ictx;
-
 	/*
 	 * We dropped the lock so igroup is invalid. NULL is a safe and likely
 	 * value to assume for the xa_cmpxchg algorithm.
-- 
2.43.0

From nobody Sat Jun 13 00:25:45 2026
From: Jacob Pan
Subject: [PATCH v5 4/9] iommufd: Allow binding to a noiommu device
Date: Mon, 11 May 2026 11:41:09 -0700
Message-ID: <20260511184116.3687392-5-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260511184116.3687392-1-jacob.pan@linux.microsoft.com>
References: <20260511184116.3687392-1-jacob.pan@linux.microsoft.com>

From: Jason Gunthorpe

Allow iommufd to bind devices without an IOMMU (noiommu mode) by creating
a dummy IOMMU group for such devices and skipping hwpt operations.

This enables noiommu devices to operate through the same iommufd API as
IOMMU-capable devices.

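The dummy-group idea can be sketched in plain C. The model below is an illustration under assumptions, not the kernel implementation: the `model_*` types and functions are hypothetical, and a NULL `group` pointer is taken to mark the dummy (noiommu) case, so the release path only drops a group reference when a real group is present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted iommu_group. */
struct model_iommu_group {
	int refs;
};

/* Hypothetical stand-in for iommufd's per-group bookkeeping object. */
struct model_igroup {
	struct model_iommu_group *group;	/* NULL marks the dummy noiommu case */
};

/* Allocate the bookkeeping object; a non-NULL group reference moves into it. */
static struct model_igroup *model_igroup_alloc(struct model_iommu_group *group)
{
	struct model_igroup *ig = calloc(1, sizeof(*ig));

	if (!ig)
		return NULL;
	ig->group = group;
	return ig;
}

/* Free the object; returns how many group references were dropped (0 or 1). */
static int model_igroup_release(struct model_igroup *ig)
{
	int dropped = 0;

	if (ig->group) {
		ig->group->refs--;
		dropped = 1;
	}
	free(ig);
	return dropped;
}
```

Releasing a dummy object touches no group at all, while releasing a real one gives back exactly the reference it took over at allocation time.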
Signed-off-by: Jason Gunthorpe
Signed-off-by: Jacob Pan
Reviewed-by: Yi Liu
---
v5:
 - simplify logic and rename iommufd_device_is_noiommu (Kevin, Yi)
 - use a helper iommufd_bind_noiommu instead of open coding (Kevin)
 - move IOMMU cap check under iommufd_bind_iommu() (Yi)
 - reword comments for partial init (Yi)
 - misc minor clean up
v4:
 - Update the description of the module parameter (Alex)
v3:
 - Consolidate into fewer patches
---
 drivers/iommu/iommufd/device.c | 148 ++++++++++++++++++++++++---------
 1 file changed, 109 insertions(+), 39 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index d03076fcf3c2..4d75720432cc 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -23,6 +23,16 @@ struct iommufd_attach {
 	struct xarray device_array;
 };
 
+/*
+ * A noiommu device has no IOMMU driver attached regardless of whether it
+ * enters via the cdev path (no iommu_group) or the group path (fake
+ * noiommu iommu_group). In both cases dev->iommu is NULL.
+ */
+static bool iommufd_device_is_noiommu(struct iommufd_device *idev)
+{
+	return IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->dev->iommu;
+}
+
 static void iommufd_group_release(struct kref *kref)
 {
 	struct iommufd_group *igroup =
@@ -30,9 +40,11 @@ static void iommufd_group_release(struct kref *kref)
 
 	WARN_ON(!xa_empty(&igroup->pasid_attach));
 
-	xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
-		   NULL, GFP_KERNEL);
-	iommu_group_put(igroup->group);
+	if (igroup->group) {
+		xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group),
+			   igroup, NULL, GFP_KERNEL);
+		iommu_group_put(igroup->group);
+	}
 	mutex_destroy(&igroup->lock);
 	kfree(igroup);
 }
@@ -204,32 +216,19 @@ void iommufd_device_destroy(struct iommufd_object *obj)
 	struct iommufd_device *idev =
 		container_of(obj, struct iommufd_device, obj);
 
-	iommu_device_release_dma_owner(idev->dev);
+	if (!idev->igroup)
+		return;
+	if (!iommufd_device_is_noiommu(idev))
+		iommu_device_release_dma_owner(idev->dev);
 	iommufd_put_group(idev->igroup);
 	if (!iommufd_selftest_is_mock_dev(idev->dev))
 		iommufd_ctx_put(idev->ictx);
 }
 
-/**
- * iommufd_device_bind - Bind a physical device to an iommu fd
- * @ictx: iommufd file descriptor
- * @dev: Pointer to a physical device struct
- * @id: Output ID number to return to userspace for this device
- *
- * A successful bind establishes an ownership over the device and returns
- * struct iommufd_device pointer, otherwise returns error pointer.
- *
- * A driver using this API must set driver_managed_dma and must not touch
- * the device until this routine succeeds and establishes ownership.
- *
- * Binding a PCI device places the entire RID under iommufd control.
- *
- * The caller must undo this with iommufd_device_unbind()
- */
-struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
-					   struct device *dev, u32 *id)
+static int iommufd_bind_iommu(struct iommufd_device *idev)
 {
-	struct iommufd_device *idev;
+	struct iommufd_ctx *ictx = idev->ictx;
+	struct device *dev = idev->dev;
 	struct iommufd_group *igroup;
 	int rc;
 
@@ -238,11 +237,11 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	 * to restore cache coherency.
 	 */
 	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
-		return ERR_PTR(-EINVAL);
+		return -EINVAL;
 
 	igroup = iommufd_get_group(ictx, dev);
 	if (IS_ERR(igroup))
-		return ERR_CAST(igroup);
+		return PTR_ERR(igroup);
 
 	/*
 	 * For historical compat with VFIO the insecure interrupt path is
@@ -268,21 +267,80 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	if (rc)
 		goto out_group_put;
 
+	/* igroup refcount moves into iommufd_device */
+	idev->igroup = igroup;
+	idev->enforce_cache_coherency =
+		device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
+	return 0;
+
+out_group_put:
+	iommufd_put_group(igroup);
+	return rc;
+}
+
+/*
+ * Noiommu devices have no real IOMMU group. Create a dummy igroup so that
+ * internal code paths that expect idev->igroup to be present still work.
+ * A NULL igroup->group distinguishes this from a real IOMMU-backed group.
+ */
+static int iommufd_bind_noiommu(struct iommufd_device *idev)
+{
+	struct iommufd_group *igroup;
+
+	igroup = iommufd_alloc_group(idev->ictx, NULL);
+	if (IS_ERR(igroup))
+		return PTR_ERR(igroup);
+	idev->igroup = igroup;
+	return 0;
+}
+
+/**
+ * iommufd_device_bind - Bind a physical device to an iommu fd
+ * @ictx: iommufd file descriptor
+ * @dev: Pointer to a physical device struct
+ * @id: Output ID number to return to userspace for this device
+ *
+ * A successful bind establishes an ownership over the device and returns
+ * struct iommufd_device pointer, otherwise returns error pointer.
+ *
+ * A driver using this API must set driver_managed_dma and must not touch
+ * the device until this routine succeeds and establishes ownership.
+ *
+ * Binding a PCI device places the entire RID under iommufd control.
+ *
+ * The caller must undo this with iommufd_device_unbind()
+ */
+struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
+					   struct device *dev, u32 *id)
+{
+	struct iommufd_device *idev;
+	int rc;
+
 	idev = iommufd_object_alloc(ictx, idev, IOMMUFD_OBJ_DEVICE);
-	if (IS_ERR(idev)) {
-		rc = PTR_ERR(idev);
-		goto out_release_owner;
-	}
+	if (IS_ERR(idev))
+		return idev;
+
 	idev->ictx = ictx;
+	idev->dev = dev;
+
+	if (!iommufd_device_is_noiommu(idev)) {
+		rc = iommufd_bind_iommu(idev);
+		if (rc)
+			goto err_out;
+	} else {
+		rc = iommufd_bind_noiommu(idev);
+		if (rc)
+			goto err_out;
+	}
+
+	/*
+	 * Take a ctx reference after bind succeeds. This must happen here
+	 * so that iommufd_device_destroy() can handle partial initialization
+	 */
 	if (!iommufd_selftest_is_mock_dev(dev))
 		iommufd_ctx_get(ictx);
-	idev->dev = dev;
-	idev->enforce_cache_coherency =
-		device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
 	/* The calling driver is a user until iommufd_device_unbind() */
 	refcount_inc(&idev->obj.users);
-	/* igroup refcount moves into iommufd_device */
-	idev->igroup = igroup;
 
 	/*
 	 * If the caller fails after this success it must call
@@ -294,11 +352,14 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	*id = idev->obj.id;
 	return idev;
 
-out_release_owner:
-	iommu_device_release_dma_owner(dev);
-out_group_put:
-	iommufd_put_group(igroup);
+err_out:
+	/*
+	 * iommufd_device_destroy() handles partially initialized idev,
+	 * so iommufd_object_abort_and_destroy() is safe to call here.
+	 */
+	iommufd_object_abort_and_destroy(ictx, &idev->obj);
 	return ERR_PTR(rc);
+
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, "IOMMUFD");
 
@@ -512,6 +573,9 @@ static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
 	struct iommufd_attach_handle *handle;
 	int rc;
 
+	if (iommufd_device_is_noiommu(idev))
+		return 0;
+
 	if (!iommufd_hwpt_compatible_device(hwpt, idev))
 		return -EINVAL;
 
@@ -559,6 +623,9 @@ static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
 {
 	struct iommufd_attach_handle *handle;
 
+	if (iommufd_device_is_noiommu(idev))
+		return;
+
 	handle = iommufd_device_get_attach_handle(idev, pasid);
 	if (pasid == IOMMU_NO_PASID)
 		iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
@@ -577,6 +644,9 @@ static int iommufd_hwpt_replace_device(struct iommufd_device *idev,
 	struct iommufd_attach_handle *handle, *old_handle;
 	int rc;
 
+	if (iommufd_device_is_noiommu(idev))
+		return 0;
+
 	if (!iommufd_hwpt_compatible_device(hwpt, idev))
 		return -EINVAL;
 
@@ -652,7 +722,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 		goto err_release_devid;
 	}
 
-	if (attach_resv) {
+	if (attach_resv && !iommufd_device_is_noiommu(idev)) {
 		rc = iommufd_device_attach_reserved_iova(idev, hwpt_paging);
 		if (rc)
 			goto err_release_devid;
-- 
2.43.0

From nobody Sat Jun 13 00:25:45 2026
From: Jacob Pan
Subject: [PATCH v5 5/9] iommufd: Add an ioctl to query PA from IOVA for noiommu mode
Date: Mon, 11 May 2026 11:41:10 -0700
Message-ID: <20260511184116.3687392-6-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260511184116.3687392-1-jacob.pan@linux.microsoft.com>

To support no-IOMMU mode where userspace drivers perform unsafe DMA
using physical addresses, introduce a new API to retrieve the physical
address of a user-allocated DMA buffer that has been mapped to an IOVA
via IOAS. The mapping is backed by SW-only I/O page tables maintained by
the generic IOMMUPT framework.
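As a rough illustration of how userspace might consume this API, the
sketch below walks an IOVA range and records one segment per physically
contiguous run, using the returned length to skip ahead instead of
querying page by page. It is only a sketch under stated assumptions: the
request struct is a local mirror of the layout proposed in this patch,
and `query_fn`, `walk_iova_range()`, `struct segment` and `mock_query()`
are hypothetical names introduced for the example. A real caller would
issue ioctl(iommufd, IOMMU_IOAS_NOIOMMU_GET_PA, &req) where the mock is
called.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the request layout proposed in this patch (assumption). */
struct get_pa_req {
	uint32_t size;
	uint32_t flags;
	uint32_t ioas_id;
	uint32_t reserved;
	uint64_t iova;
	uint64_t out_length;
	uint64_t out_phys;
};

/* Hypothetical stand-in for the ioctl call; returns 0 on success. */
typedef int (*query_fn)(struct get_pa_req *req);

/* One physically contiguous run backing part of the IOVA range. */
struct segment {
	uint64_t iova;
	uint64_t phys;
	uint64_t len;
};

/*
 * Walk [iova, iova + len) and fill segs[] with one entry per contiguous
 * physical run. Returns the number of segments, or -1 on a failed query
 * or when max_segs is too small.
 */
static int walk_iova_range(query_fn query, uint32_t ioas_id, uint64_t iova,
			   uint64_t len, struct segment *segs, int max_segs)
{
	int n = 0;

	assert(query && segs);
	while (len) {
		struct get_pa_req req;
		uint64_t chunk;

		if (n == max_segs)
			return -1;

		/* flags and reserved must be zero, so clear everything first */
		memset(&req, 0, sizeof(req));
		req.size = sizeof(req);
		req.ioas_id = ioas_id;
		req.iova = iova;
		if (query(&req) || !req.out_length)
			return -1;

		/* Clamp the contiguous run to what is left of the range */
		chunk = req.out_length < len ? req.out_length : len;
		segs[n++] = (struct segment){ iova, req.out_phys, chunk };
		iova += chunk;
		len -= chunk;
	}
	return n;
}

/*
 * Mock backend for the example: IOVA 0x10000-0x11fff maps to PA
 * 0x80000000 (two contiguous pages), IOVA 0x12000-0x12fff maps to PA
 * 0x90000000. Anything else is unmapped.
 */
static int mock_query(struct get_pa_req *req)
{
	if (req->iova >= 0x10000 && req->iova < 0x12000) {
		req->out_phys = 0x80000000ULL + (req->iova - 0x10000);
		req->out_length = 0x12000 - req->iova;
		return 0;
	}
	if (req->iova >= 0x12000 && req->iova < 0x13000) {
		req->out_phys = 0x90000000ULL + (req->iova - 0x12000);
		req->out_length = 0x13000 - req->iova;
		return 0;
	}
	return -1;
}
```

Passing the query as a callback keeps the walk testable without a
patched kernel; the clamp to the remaining length matters because the
reported contiguous run may extend past the range the caller asked for.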

Suggested-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
Signed-off-by: Jacob Pan
Reviewed-by: Lu Baolu
---
v5:
 - Add header stubs for iopt_get_phys() and iommufd_ioas_noiommu_get_pa()
   to avoid ifdef at call sites (Kevin)
v4:
 - Fix ioctl return type (Yi Liu)
v2:
 - New patch
---
 drivers/iommu/iommufd/io_pagetable.c    | 62 +++++++++++++++++++++++++
 drivers/iommu/iommufd/ioas.c            | 30 ++++++++++++
 drivers/iommu/iommufd/iommufd_private.h | 18 +++++++
 drivers/iommu/iommufd/main.c            |  3 ++
 include/uapi/linux/iommufd.h            | 25 ++++++++++
 5 files changed, 138 insertions(+)

diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 24d4917105d9..1ee7c8e6408c 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -859,6 +859,68 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
 	return iopt_unmap_iova_range(iopt, iova, iova_last, unmapped);
 }
 
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
+		  u64 *length)
+{
+	struct iopt_area *area;
+	u64 tmp_length = 0;
+	u64 tmp_paddr = 0;
+	int rc = 0;
+
+	down_read(&iopt->iova_rwsem);
+	area = iopt_area_iter_first(iopt, iova, iova);
+	if (!area || !area->pages) {
+		rc = -ENOENT;
+		goto unlock_exit;
+	}
+
+	if (!area->storage_domain ||
+	    area->storage_domain->owner != &iommufd_noiommu_ops) {
+		rc = -EOPNOTSUPP;
+		goto unlock_exit;
+	}
+
+	*paddr = iommu_iova_to_phys(area->storage_domain, iova);
+	if (!*paddr) {
+		rc = -EINVAL;
+		goto unlock_exit;
+	}
+
+	tmp_length = PAGE_SIZE - offset_in_page(iova);
+	tmp_paddr = *paddr;
+	/*
+	 * Scan the domain for the contiguous physical address length so that
+	 * userspace search can be optimized for fewer ioctls.
+	 */
+	while (iova < iopt_area_last_iova(area)) {
+		unsigned long next_iova;
+		u64 next_paddr;
+
+		if (check_add_overflow(iova, PAGE_SIZE, &next_iova))
+			break;
+
+		if (next_iova > iopt_area_last_iova(area))
+			break;
+
+		next_paddr = iommu_iova_to_phys(area->storage_domain, next_iova);
+
+		if (!next_paddr || next_paddr != tmp_paddr + PAGE_SIZE)
+			break;
+
+		iova = next_iova;
+		tmp_paddr += PAGE_SIZE;
+		tmp_length += PAGE_SIZE;
+	}
+	*length = tmp_length;
+
+unlock_exit:
+	up_read(&iopt->iova_rwsem);
+
+	return rc;
+}
+#endif
+
 int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped)
 {
 	/* If the IOVAs are empty then unmap all succeeds */
diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
index fed06c2b728e..666440e32c9e 100644
--- a/drivers/iommu/iommufd/ioas.c
+++ b/drivers/iommu/iommufd/ioas.c
@@ -375,6 +375,36 @@ int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd)
 	return rc;
 }
 
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_ioas_noiommu_get_pa *cmd = ucmd->cmd;
+	struct iommufd_ioas *ioas;
+	int rc;
+
+	if (!capable(CAP_SYS_RAWIO))
+		return -EPERM;
+
+	if (cmd->flags || cmd->__reserved)
+		return -EOPNOTSUPP;
+
+	ioas = iommufd_get_ioas(ucmd->ictx, cmd->ioas_id);
+	if (IS_ERR(ioas))
+		return PTR_ERR(ioas);
+
+	rc = iopt_get_phys(&ioas->iopt, cmd->iova, &cmd->out_phys,
+			   &cmd->out_length);
+	if (rc)
+		goto out_put;
+
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+out_put:
+	iommufd_put_object(ucmd->ictx, &ioas->obj);
+
+	return rc;
+}
+#endif
+
 static void iommufd_release_all_iova_rwsem(struct iommufd_ctx *ictx,
 					   struct xarray *ioas_list)
 {
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 2682b5baa6e9..13f1506d8066 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -118,6 +118,16 @@ int iopt_map_pages(struct io_pagetable *iopt, struct list_head *pages_list,
 int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
 		    unsigned long length, unsigned long *unmapped);
 int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped);
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
+		  u64 *length);
+#else
+static inline int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova,
+				u64 *paddr, u64 *length)
+{
+	return -EOPNOTSUPP;
+}
+#endif
 
 int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
 				   struct iommu_domain *domain,
@@ -346,6 +356,14 @@ int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_copy(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd);
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd);
+#else
+static inline int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd)
+{
+	return -EOPNOTSUPP;
+}
+#endif
 int iommufd_ioas_option(struct iommufd_ucmd *ucmd);
 int iommufd_option_rlimit_mode(struct iommu_option *cmd,
 			       struct iommufd_ctx *ictx);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 8c6d43601afb..3b4192d70570 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -424,6 +424,7 @@ union ucmd_buffer {
 	struct iommu_ioas_alloc alloc;
 	struct iommu_ioas_allow_iovas allow_iovas;
 	struct iommu_ioas_copy ioas_copy;
+	struct iommu_ioas_noiommu_get_pa noiommu_get_pa;
 	struct iommu_ioas_iova_ranges iova_ranges;
 	struct iommu_ioas_map map;
 	struct iommu_ioas_unmap unmap;
@@ -482,6 +483,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 	IOCTL_OP(IOMMU_IOAS_MAP, iommufd_ioas_map, struct iommu_ioas_map, iova),
 	IOCTL_OP(IOMMU_IOAS_MAP_FILE, iommufd_ioas_map_file,
 		 struct iommu_ioas_map_file, iova),
+	IOCTL_OP(IOMMU_IOAS_NOIOMMU_GET_PA, iommufd_ioas_noiommu_get_pa, struct iommu_ioas_noiommu_get_pa,
+		 out_phys),
 	IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap,
 		 length),
 	IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option, val64),
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index e998dfbd6960..7df366d161f1 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -57,6 +57,7 @@ enum {
 	IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
 	IOMMUFD_CMD_VEVENTQ_ALLOC = 0x93,
 	IOMMUFD_CMD_HW_QUEUE_ALLOC = 0x94,
+	IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA = 0x95,
 };
 
 /**
@@ -219,6 +220,30 @@ struct iommu_ioas_map {
 };
 #define IOMMU_IOAS_MAP _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP)
 
+/**
+ * struct iommu_ioas_noiommu_get_pa - ioctl(IOMMU_IOAS_NOIOMMU_GET_PA)
+ * @size: sizeof(struct iommu_ioas_noiommu_get_pa)
+ * @flags: Reserved, must be 0 for now
+ * @ioas_id: IOAS ID to query IOVA to PA mapping from
+ * @__reserved: Must be 0
+ * @iova: IOVA to query
+ * @out_length: Number of bytes contiguous physical address starting from phys
+ * @out_phys: Output physical address the IOVA maps to
+ *
+ * Query the physical address backing an IOVA range. The entire range must be
+ * mapped already. For noiommu devices doing unsafe DMA only.
+ */
+struct iommu_ioas_noiommu_get_pa {
+	__u32 size;
+	__u32 flags;
+	__u32 ioas_id;
+	__u32 __reserved;
+	__aligned_u64 iova;
+	__aligned_u64 out_length;
+	__aligned_u64 out_phys;
+};
+#define IOMMU_IOAS_NOIOMMU_GET_PA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA)
+
 /**
  * struct iommu_ioas_map_file - ioctl(IOMMU_IOAS_MAP_FILE)
  * @size: sizeof(struct iommu_ioas_map_file)
-- 
2.43.0

From nobody Sat Jun 13 00:25:45 2026
From: Jacob Pan
Subject: [PATCH v5 6/9] vfio/group: Add VFIO_CDEV_NOIOMMU Kconfig and tolerate NULL group
Date: Mon, 11 May 2026 11:41:11 -0700
Message-ID: <20260511184116.3687392-7-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260511184116.3687392-1-jacob.pan@linux.microsoft.com>

Add a new CONFIG_VFIO_CDEV_NOIOMMU option, independent of
CONFIG_VFIO_GROUP, to support noiommu mode via the cdev interface.

Since CONFIG_VFIO_GROUP can be enabled while CONFIG_VFIO_GROUP_NOIOMMU is
not, guard the noiommu group allocation in vfio_group_find_or_alloc()
with IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) to prevent creating spurious
/dev/vfio/noiommu-N group files when only cdev noiommu is configured.

For cdev noiommu devices that have no group, let vfio_device_set_group()
return success with a NULL group pointer and add null guards in group
functions that may be called during device lifecycle. These guards are
contained within group.c and are dead code for IOMMU-enabled devices
where device->group is always non-NULL.

Signed-off-by: Jacob Pan
---
 drivers/vfio/Kconfig | 17 +++++++++++++++++
 drivers/vfio/group.c | 31 +++++++++++++++++++++++++++++--
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 39939be2908e..b1b1633412a9 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -75,6 +75,23 @@ config VFIO_GROUP_NOIOMMU
 
 	  If you don't know what to do here, say N.
 
+config VFIO_CDEV_NOIOMMU
+	bool "VFIO cdev No-IOMMU support"
+	depends on VFIO_DEVICE_CDEV
+	select IOMMUFD_NOIOMMU
+	help
+	  VFIO cdev no-IOMMU mode enables device access via the cdev
+	  interface without hardware IOMMU backing. This relies on
+	  IOMMUFD_NOIOMMU to provide a SW-only IO page table for
+	  IOVA-to-PA lookups.
+
+	  Use of this mode will result in an unsupportable kernel and
+	  will therefore taint the kernel. Device assignment to virtual
+	  machines is also not possible with this mode since there is
+	  no IOMMU to provide DMA translation.
+
+	  If you don't know what to do here, say N.
+
 config VFIO_VIRQFD
 	bool
 	select EVENTFD
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 5b9329df04e5..c8a75ee28f20 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -386,6 +386,9 @@ int vfio_device_block_group(struct vfio_device *device)
 	struct vfio_group *group = device->group;
 	int ret = 0;
 
+	if (!group)
+		return 0;
+
 	mutex_lock(&group->group_lock);
 	if (group->opened_file) {
 		ret = -EBUSY;
@@ -403,6 +406,9 @@ void vfio_device_unblock_group(struct vfio_device *device)
 {
 	struct vfio_group *group = device->group;
 
+	if (!group)
+		return;
+
 	mutex_lock(&group->group_lock);
 	group->cdev_device_open_cnt--;
 	mutex_unlock(&group->group_lock);
@@ -641,7 +647,8 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
 	struct vfio_group *group;
 
 	iommu_group = iommu_group_get(dev);
-	if (!iommu_group && vfio_noiommu) {
+	if (!iommu_group && IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) &&
+	    vfio_noiommu) {
 		/*
 		 * With noiommu enabled, create an IOMMU group for devices that
 		 * don't already have one, implying no IOMMU hardware/driver
@@ -686,8 +693,19 @@ int vfio_device_set_group(struct vfio_device *device,
 	else
 		group = vfio_noiommu_group_alloc(device->dev, type);
 
-	if (IS_ERR(group))
+	if (IS_ERR(group)) {
+		/*
+		 * Cdev noiommu devices don't need a vfio_group. When
+		 * CONFIG_VFIO_GROUP_NOIOMMU is not set, the group alloc
+		 * above returns -EINVAL for devices without an IOMMU.
+		 * That's fine - a NULL group is expected and iommufd
+		 * handles these devices directly.
+		 */
+		if (IS_ENABLED(CONFIG_VFIO_CDEV_NOIOMMU) &&
+		    vfio_noiommu && !device->dev->iommu)
+			return 0;
 		return PTR_ERR(group);
+	}
 
 	/* Our reference on group is moved to the device */
 	device->group = group;
@@ -699,6 +717,9 @@ void vfio_device_remove_group(struct vfio_device *device)
 	struct vfio_group *group = device->group;
 	struct iommu_group *iommu_group;
 
+	if (!group)
+		return;
+
 	if (group->type == VFIO_NO_IOMMU || group->type == VFIO_EMULATED_IOMMU)
 		iommu_group_remove_device(device->dev);
 
@@ -742,6 +763,8 @@ void vfio_device_remove_group(struct vfio_device *device)
 
 void vfio_device_group_register(struct vfio_device *device)
 {
+	if (!device->group)
+		return;
 	mutex_lock(&device->group->device_lock);
 	list_add(&device->group_next, &device->group->device_list);
 	mutex_unlock(&device->group->device_lock);
@@ -749,6 +772,8 @@ void vfio_device_group_register(struct vfio_device *device)
 
 void vfio_device_group_unregister(struct vfio_device *device)
 {
+	if (!device->group)
+		return;
 	mutex_lock(&device->group->device_lock);
 	list_del(&device->group_next);
 	mutex_unlock(&device->group->device_lock);
@@ -786,6 +811,8 @@ void vfio_device_group_unuse_iommu(struct vfio_device *device)
 
 bool vfio_device_has_container(struct vfio_device *device)
 {
+	if (!device->group)
+		return false;
 	return device->group->container;
 }
 
-- 
2.43.0

From nobody Sat Jun 13 00:25:45 2026
From: Jacob Pan
Subject: [PATCH v5 7/9] vfio: Enable cdev noiommu mode under iommufd
Date: Mon, 11 May 2026 11:41:12 -0700
Message-ID: <20260511184116.3687392-8-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260511184116.3687392-1-jacob.pan@linux.microsoft.com>

Now that devices under noiommu mode can bind with IOMMUFD and perform
IOAS operations, lift restrictions on cdev from the VFIO side.

Remove the vfio_device_is_group_noiommu() early returns in
vfio_df_iommufd_bind() and vfio_df_iommufd_unbind() so that both group
and cdev noiommu devices go through the standard iommufd bind path. This
is safe because iommufd_device_bind() now handles noiommu devices via
its own iommufd_device_is_noiommu() check.

Add CAP_SYS_RAWIO checks for cdev open and bind under noiommu to
maintain security parity with the group noiommu path.

Noiommu cdevs are explicitly named with a noiommu prefix, e.g.

/dev/vfio/
|-- devices
|   `-- noiommu-vfio0
`-- vfio

Signed-off-by: Jacob Pan
---
v5:
 - Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU and its
   dependencies
 - Add comment to explain vfio_noiommu conditional definition (Alex)
 - Removed early return for group noiommu in bind/unbind
 - Use consistent wording referring to VFIO noiommu mode (Kevin)
 - Update unsafe_noiommu Kconfig help text (Kevin)
 - Change dev_warn to dev_info for noiommu enabling msg (Kevin)
v4:
 - Remove early return in iommufd_bind for noiommu (Alex)
v3:
 - Consolidate into fewer patches
v2:
 - removed unnecessary device->noiommu set in
   iommufd_vfio_compat_ioas_get_id()
---
 drivers/vfio/Kconfig       |  3 +--
 drivers/vfio/device_cdev.c | 10 ++++++++++
 drivers/vfio/iommufd.c     |  7 -------
 drivers/vfio/vfio.h        | 22 ++++++++++++++--------
 drivers/vfio/vfio_main.c   | 25 ++++++++++++++++++++-----
 include/linux/vfio.h       |  1 +
 6 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index b1b1633412a9..b1a260b6054c 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
 	  The VFIO device cdev is another way for userspace to get device
 	  access. Userspace gets device fd by opening device cdev under
 	  /dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
-	  to set up secure DMA context for device access.  This interface does
-	  not support noiommu.
+	  to set up secure DMA context for device access.
 
 	  If you don't know what to do here, say N.
 
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index 54abf312cf04..46a808244398 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -27,6 +27,9 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
 	struct vfio_device_file *df;
 	int ret;
 
+	if (device->noiommu && !capable(CAP_SYS_RAWIO))
+		return -EPERM;
+
 	/* Paired with the put in vfio_device_fops_release() */
 	if (!vfio_device_try_get_registration(device))
 		return -ENODEV;
@@ -110,6 +113,13 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
 	if (df->group)
 		return -EINVAL;
 
+	/*
+	 * CAP_SYS_RAWIO is already checked at cdev open, recheck here
+	 * in case the fd was passed to a less privileged process.
+	 */
+	if (device->noiommu && !capable(CAP_SYS_RAWIO))
+		return -EPERM;
+
 	ret = vfio_device_block_group(device);
 	if (ret)
 		return ret;
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 39079ab27f92..bc80056c74d3 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -25,10 +25,6 @@ int vfio_df_iommufd_bind(struct vfio_device_file *df)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	/* Returns 0 to permit device opening under noiommu mode */
-	if (vfio_device_is_group_noiommu(vdev))
-		return 0;
-
 	return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
 }
 
@@ -58,9 +54,6 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	if (vfio_device_is_group_noiommu(vdev))
-		return;
-
 	if (vdev->ops->unbind_iommufd)
 		vdev->ops->unbind_iommufd(vdev);
 }
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 602623cacfc0..ac79b1a2fce9 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -36,7 +36,7 @@ vfio_allocate_device_file(struct vfio_device *device);
 
 extern const struct file_operations vfio_device_fops;
 
-#ifdef CONFIG_VFIO_GROUP_NOIOMMU
+#if IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) || IS_ENABLED(CONFIG_VFIO_CDEV_NOIOMMU)
extern bool vfio_noiommu __read_mostly; #else enum { vfio_noiommu =3D false }; @@ -358,19 +358,13 @@ void vfio_init_device_cdev(struct vfio_device *device= ); =20 static inline int vfio_device_add(struct vfio_device *device) { - /* cdev does not support noiommu device */ - if (vfio_device_is_group_noiommu(device)) - return device_add(&device->device); vfio_init_device_cdev(device); return cdev_device_add(&device->cdev, &device->device); } =20 static inline void vfio_device_del(struct vfio_device *device) { - if (vfio_device_is_group_noiommu(device)) - device_del(&device->device); - else - cdev_device_del(&device->cdev, &device->device); + cdev_device_del(&device->cdev, &device->device); } =20 int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep); @@ -420,6 +414,18 @@ static inline void vfio_cdev_cleanup(void) } #endif /* CONFIG_VFIO_DEVICE_CDEV */ =20 +#if IS_ENABLED(CONFIG_VFIO_CDEV_NOIOMMU) +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev) +{ + return vdev->noiommu; +} +#else +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev) +{ + return false; +} +#endif + #if IS_ENABLED(CONFIG_VFIO_VIRQFD) int __init vfio_virqfd_init(void); void vfio_virqfd_exit(void); diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 4d940ce6f114..1ba0f282d746 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -54,7 +54,7 @@ static struct vfio { int fs_count; } vfio; =20 -#ifdef CONFIG_VFIO_GROUP_NOIOMMU +#if IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) || IS_ENABLED(CONFIG_VFIO_CDEV_N= OIOMMU) bool vfio_noiommu __read_mostly; module_param_named(enable_unsafe_noiommu_mode, vfio_noiommu, bool, S_IRUGO | S_IWUSR); @@ -321,6 +321,20 @@ static int vfio_init_device(struct vfio_device *device= , struct device *dev, return ret; } =20 +static int vfio_device_set_noiommu_and_name(struct vfio_device *device) +{ + if (IS_ENABLED(CONFIG_VFIO_CDEV_NOIOMMU) && vfio_noiommu && !device->dev-= >iommu) { + 
device->noiommu =3D true; + add_taint(TAINT_USER, LOCKDEP_STILL_OK); + dev_warn(device->dev, + "Adding kernel taint for vfio-noiommu cdev on device\n"); + } + + /* Just to be safe, expose to user explicitly noiommu cdev node */ + return dev_set_name(&device->device, "%svfio%d", + device->noiommu ? "noiommu-" : "", device->index); +} + static int __vfio_register_dev(struct vfio_device *device, enum vfio_group_type type) { @@ -340,20 +354,21 @@ static int __vfio_register_dev(struct vfio_device *de= vice, if (!device->dev_set) vfio_assign_device_set(device, device); =20 - ret =3D dev_set_name(&device->device, "vfio%d", device->index); + ret =3D vfio_device_set_group(device, type); if (ret) return ret; =20 - ret =3D vfio_device_set_group(device, type); + ret =3D vfio_device_set_noiommu_and_name(device); if (ret) - return ret; + goto err_out; =20 /* * VFIO always sets IOMMU_CACHE because we offer no way for userspace to * restore cache coherency. It has to be checked here because it is only * valid for cases where we are using iommu groups. */ - if (type =3D=3D VFIO_IOMMU && !vfio_device_is_group_noiommu(device) && + if (type =3D=3D VFIO_IOMMU && !(vfio_device_is_group_noiommu(device) || + vfio_device_is_cdev_noiommu(device)) && !device_iommu_capable(device->dev, IOMMU_CAP_CACHE_COHERENCY)) { ret =3D -EINVAL; goto err_out; diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 31b826efba00..45f08986359e 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -74,6 +74,7 @@ struct vfio_device { u8 iommufd_attached:1; #endif u8 cdev_opened:1; + u8 noiommu:1; /* * debug_root is a static property of the vfio_device * which must be set prior to registering the vfio_device. 
-- 
2.43.0

From nobody Sat Jun 13 00:25:45 2026
From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev",
 Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh,
 David Matlack, Robin Murphy, Nicolin Chen, "Tian, Kevin", Yi Liu
Cc: Saurabh Sengar, skhawaja@google.com, pasha.tatashin@soleen.com,
 Will Deacon, Jacob Pan, Baolu Lu
Subject: [PATCH v5 8/9] selftests/vfio: Add iommufd noiommu mode selftest for cdev
Date: Mon, 11 May 2026 11:41:13 -0700
Message-ID: <20260511184116.3687392-9-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260511184116.3687392-1-jacob.pan@linux.microsoft.com>
References: <20260511184116.3687392-1-jacob.pan@linux.microsoft.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Add comprehensive selftest for VFIO device operations with iommufd in
noiommu mode.
Tests cover: - Device binding to iommufd - IOAS (I/O Address Space) allocation, mapping with dummy IOVA - Retrieve PA from dummy IOVA - Device attach/detach operations as usual Signed-off-by: Jacob Pan --- v4: - squash DSA specific selftest changes v2: - New selftest for generic noiommu bind/unbind --- tools/testing/selftests/vfio/Makefile | 1 + .../lib/include/libvfio/vfio_pci_device.h | 16 + .../selftests/vfio/lib/vfio_pci_device.c | 5 +- .../vfio/vfio_iommufd_noiommu_test.c | 567 ++++++++++++++++++ 4 files changed, 587 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftest= s/vfio/Makefile index 0684932d91bf..c9c02fdfd946 100644 --- a/tools/testing/selftests/vfio/Makefile +++ b/tools/testing/selftests/vfio/Makefile @@ -9,6 +9,7 @@ CFLAGS =3D $(KHDR_INCLUDES) TEST_GEN_PROGS +=3D vfio_dma_mapping_test TEST_GEN_PROGS +=3D vfio_dma_mapping_mmio_test TEST_GEN_PROGS +=3D vfio_iommufd_setup_test +TEST_GEN_PROGS +=3D vfio_iommufd_noiommu_test TEST_GEN_PROGS +=3D vfio_pci_device_test TEST_GEN_PROGS +=3D vfio_pci_device_init_perf_test TEST_GEN_PROGS +=3D vfio_pci_driver_test diff --git a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_devi= ce.h b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h index 2858885a89bb..6218c91776b3 100644 --- a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h +++ b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h @@ -122,4 +122,20 @@ static inline bool vfio_pci_device_match(struct vfio_p= ci_device *device, =20 const char *vfio_pci_get_cdev_path(const char *bdf); =20 +static inline bool vfio_pci_noiommu_mode_enabled(void) +{ + char buf[8] =3D {}; + int fd, n; + + fd =3D open("/sys/module/vfio/parameters/enable_unsafe_noiommu_mode", + O_RDONLY); + if (fd < 0) + return false; + + n =3D read(fd, buf, sizeof(buf) - 1); + close(fd); + + return n > 0 && buf[0] 
=3D=3D 'Y'; +} + #endif /* SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_VFIO_PCI_DEVICE_H */ diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/tes= ting/selftests/vfio/lib/vfio_pci_device.c index fc75e04ef010..1a91658e812d 100644 --- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c +++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c @@ -308,8 +308,9 @@ const char *vfio_pci_get_cdev_path(const char *bdf) VFIO_ASSERT_NOT_NULL(dir, "Failed to open directory %s\n", dir_path); =20 while ((entry =3D readdir(dir)) !=3D NULL) { - /* Find the file that starts with "vfio" */ - if (strncmp("vfio", entry->d_name, 4)) + /* Find the file that starts with "vfio" or "noiommu-vfio" */ + if (strncmp("vfio", entry->d_name, 4) && + strncmp("noiommu-vfio", entry->d_name, 12)) continue; =20 snprintf(cdev_path, PATH_MAX, "/dev/vfio/devices/%s", entry->d_name); diff --git a/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c b/too= ls/testing/selftests/vfio/vfio_iommufd_noiommu_test.c new file mode 100644 index 000000000000..2df7cf40daff --- /dev/null +++ b/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c @@ -0,0 +1,567 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * VFIO iommufd NoIOMMU Mode Selftest + * + * Tests VFIO device operations with iommufd in noiommu mode, including: + * - Device binding to iommufd + * - IOAS (I/O Address Space) allocation and management + * - Device attach/detach to IOAS + * - Memory mapping in IOAS + * - Device info queries and reset + */ + +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include "kselftest_harness.h" + +static const char iommu_dev_path[] =3D "/dev/iommu"; +static const char *cdev_path; + +static char *vfio_noiommu_get_device_id(const char *bdf) +{ + char *path =3D NULL; + char *vfio_id =3D NULL; + struct dirent *dentry; + DIR *dp; + + if (asprintf(&path, "/sys/bus/pci/devices/%s/vfio-dev", bdf) < 0) + return NULL; + + dp 
=3D opendir(path); + if (!dp) { + free(path); + return NULL; + } + + while ((dentry =3D readdir(dp)) !=3D NULL) { + if (strncmp("noiommu-vfio", dentry->d_name, 12) =3D=3D 0) { + vfio_id =3D strdup(dentry->d_name); + break; + } + } + + closedir(dp); + free(path); + return vfio_id; +} + +static char *vfio_noiommu_get_cdev_path(const char *bdf) +{ + char *vfio_id =3D vfio_noiommu_get_device_id(bdf); + char *cdev =3D NULL; + + if (vfio_id) { + asprintf(&cdev, "/dev/vfio/devices/%s", vfio_id); + free(vfio_id); + } + return cdev; +} + +static int vfio_device_bind_iommufd_ioctl(int cdev_fd, int iommufd) +{ + struct vfio_device_bind_iommufd bind_args =3D { + .argsz =3D sizeof(bind_args), + .iommufd =3D iommufd, + }; + + return ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind_args); +} + +static int vfio_device_get_info_ioctl(int cdev_fd, + struct vfio_device_info *info) +{ + info->argsz =3D sizeof(*info); + return ioctl(cdev_fd, VFIO_DEVICE_GET_INFO, info); +} + +static int vfio_device_ioas_alloc_ioctl(int iommufd, + struct iommu_ioas_alloc *alloc_args) +{ + alloc_args->size =3D sizeof(*alloc_args); + alloc_args->flags =3D 0; + return ioctl(iommufd, IOMMU_IOAS_ALLOC, alloc_args); +} + +static int vfio_device_attach_iommufd_pt_ioctl(int cdev_fd, u32 pt_id) +{ + struct vfio_device_attach_iommufd_pt attach_args =3D { + .argsz =3D sizeof(attach_args), + .pt_id =3D pt_id, + }; + + return ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_args); +} + +static int vfio_device_detach_iommufd_pt_ioctl(int cdev_fd) +{ + struct vfio_device_detach_iommufd_pt detach_args =3D { + .argsz =3D sizeof(detach_args), + }; + + return ioctl(cdev_fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_args); +} + +static int vfio_device_get_region_info_ioctl(int cdev_fd, uint32_t index, + struct vfio_region_info *info) +{ + info->argsz =3D sizeof(*info); + info->index =3D index; + return ioctl(cdev_fd, VFIO_DEVICE_GET_REGION_INFO, info); +} + +static int vfio_device_reset_ioctl(int cdev_fd) +{ + return 
ioctl(cdev_fd, VFIO_DEVICE_RESET); +} + +static int ioas_map_pages(int iommufd, uint32_t ioas_id, uint64_t iova, + size_t length, bool hugepages) +{ + struct iommu_ioas_map map_args =3D { + .size =3D sizeof(map_args), + .ioas_id =3D ioas_id, + .iova =3D iova, + .length =3D length, + .flags =3D IOMMU_IOAS_MAP_READABLE | IOMMU_IOAS_MAP_WRITEABLE | IOMMU_IO= AS_MAP_FIXED_IOVA, + }; + void *pages; + int ret; + + /* Allocate test pages */ + if (hugepages) + pages =3D mmap(NULL, length, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0); + else + pages =3D mmap(NULL, length, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (pages =3D=3D MAP_FAILED) { + printf("mmap failed for length 0x%lx\n", (unsigned long)length); + return -ENOMEM; + } + + /* Set up page pointer for mapping */ + map_args.user_va =3D (uintptr_t)pages; + + printf(" ioas_map_pages: ioas_id=3D%u, iova=3D0x%lx, length=3D0x%lx, use= r_va=3D%p\n", + ioas_id, (unsigned long)iova, (unsigned long)length, pages); + + /* Map into IOAS */ + ret =3D ioctl(iommufd, IOMMU_IOAS_MAP, &map_args); + if (ret !=3D 0) + printf(" IOMMU_IOAS_MAP failed: %d (%s)\n", ret, strerror(errno)); + else + printf(" IOMMU_IOAS_MAP succeeded, IOVA=3D0x%lx\n", (unsigned long)map_= args.iova); + + munmap(pages, length); + return ret; +} + +static int ioas_unmap_pages(int iommufd, uint32_t ioas_id, uint64_t iova, + size_t length) +{ + struct iommu_ioas_unmap unmap_args =3D { + .size =3D sizeof(unmap_args), + .ioas_id =3D ioas_id, + .iova =3D iova, + .length =3D length, + }; + + return ioctl(iommufd, IOMMU_IOAS_UNMAP, &unmap_args); +} + +static int ioas_destroy_ioctl(int iommufd, uint32_t ioas_id) +{ + struct iommu_destroy destroy_args =3D { + .size =3D sizeof(destroy_args), + .id =3D ioas_id, + }; + + return ioctl(iommufd, IOMMU_DESTROY, &destroy_args); +} + +static int ioas_noiommu_get_pa_ioctl(int iommufd, uint32_t ioas_id, uint64= _t iova, + uint64_t *phys_out, uint64_t *length_out) +{ + struct 
{ + __u32 size; + __u32 flags; + __u32 ioas_id; + __u32 __reserved; + __u64 iova; + __u64 out_length; + __u64 out_phys; + } get_pa =3D { + .size =3D sizeof(get_pa), + .flags =3D 0, + .ioas_id =3D ioas_id, + .iova =3D iova, + }; + + printf(" ioas_noiommu_get_pa_ioctl: ioas_id=3D%u, iova=3D0x%lx\n", + ioas_id, (unsigned long)iova); + + if (ioctl(iommufd, IOMMU_IOAS_NOIOMMU_GET_PA, &get_pa) !=3D 0) { + printf(" IOMMU_IOAS_NOIOMMU_GET_PA failed: %s (errno=3D%d)\n", + strerror(errno), errno); + return -1; + } + + printf(" IOMMU_IOAS_NOIOMMU_GET_PA succeeded: PA=3D0x%lx, length=3D0x%lx= \n", + (unsigned long)get_pa.out_phys, (unsigned long)get_pa.out_length); + + if (phys_out) + *phys_out =3D get_pa.out_phys; + if (length_out) + *length_out =3D get_pa.out_length; + + return 0; +} + +FIXTURE(vfio_noiommu) { + int cdev_fd; + int iommufd; +}; + +FIXTURE_SETUP(vfio_noiommu) +{ + ASSERT_LE(0, (self->cdev_fd =3D open(cdev_path, O_RDWR, 0))); + ASSERT_LE(0, (self->iommufd =3D open(iommu_dev_path, O_RDWR, 0))); +} + +FIXTURE_TEARDOWN(vfio_noiommu) +{ + if (self->cdev_fd >=3D 0) + close(self->cdev_fd); + if (self->iommufd >=3D 0) + close(self->iommufd); +} + +/* + * Test: Device cdev can be opened + */ +TEST_F(vfio_noiommu, device_cdev_open) +{ + ASSERT_LE(0, self->cdev_fd); +} + +/* + * Test: Device can be bound to iommufd + */ +TEST_F(vfio_noiommu, device_bind_iommufd) +{ + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); +} + +/* + * Test: Device info can be queried after binding + */ +TEST_F(vfio_noiommu, device_get_info_after_bind) +{ + struct vfio_device_info info; + + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &info)); + ASSERT_NE(0, info.argsz); +} + +/* + * Test: Getting device info fails without bind + */ +TEST_F(vfio_noiommu, device_get_info_without_bind_fails) +{ + struct vfio_device_info info; + + ASSERT_NE(0, 
vfio_device_get_info_ioctl(self->cdev_fd, &info)); +} + +/* + * Test: Binding with invalid iommufd fails + */ +TEST_F(vfio_noiommu, device_bind_bad_iommufd_fails) +{ + ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, -2)); +} + +/* + * Test: Cannot bind twice to same device + */ +TEST_F(vfio_noiommu, device_repeated_bind_fails) +{ + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); +} + +/* + * Test: IOAS can be allocated + */ +TEST_F(vfio_noiommu, ioas_alloc) +{ + struct iommu_ioas_alloc alloc_args; + + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, + &alloc_args)); + ASSERT_NE(0, alloc_args.out_ioas_id); +} + +/* + * Test: IOAS can be destroyed + */ +TEST_F(vfio_noiommu, ioas_destroy) +{ + struct iommu_ioas_alloc alloc_args; + + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, + &alloc_args)); + ASSERT_EQ(0, ioas_destroy_ioctl(self->iommufd, + alloc_args.out_ioas_id)); +} + +/* + * Test: Device can attach to IOAS after binding + */ +TEST_F(vfio_noiommu, device_attach_to_ioas) +{ + struct iommu_ioas_alloc alloc_args; + + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, + &alloc_args)); + ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd, + alloc_args.out_ioas_id)); +} + +/* + * Test: Attaching to invalid IOAS fails + */ +TEST_F(vfio_noiommu, device_attach_invalid_ioas_fails) +{ + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + ASSERT_NE(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd, + UINT32_MAX)); +} + +/* + * Test: Device can detach from IOAS + */ +TEST_F(vfio_noiommu, device_detach_from_ioas) +{ + struct iommu_ioas_alloc alloc_args; + + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, + &alloc_args)); + 
ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd, + alloc_args.out_ioas_id)); + ASSERT_EQ(0, vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd)); +} + +/* + * Test: Full lifecycle - bind, attach, detach, reset + */ +TEST_F(vfio_noiommu, device_lifecycle) +{ + struct iommu_ioas_alloc alloc_args; + struct vfio_device_info info; + + /* Bind device to iommufd */ + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + + /* Allocate IOAS */ + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, + &alloc_args)); + + /* Attach device to IOAS */ + ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd, + alloc_args.out_ioas_id)); + + /* Query device info */ + ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &info)); + + /* Detach device from IOAS */ + ASSERT_EQ(0, vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd)); + + /* Reset device */ + ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd)); +} + +/* + * Test: Get region info + */ +TEST_F(vfio_noiommu, device_get_region_info) +{ + struct vfio_device_info dev_info; + struct vfio_region_info region_info; + + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &dev_info)); + + /* Try to get first region info if device has regions */ + if (dev_info.num_regions > 0) { + ASSERT_EQ(0, vfio_device_get_region_info_ioctl(self->cdev_fd, 0, + &region_info)); + ASSERT_NE(0, region_info.argsz); + } +} + +TEST_F(vfio_noiommu, device_reset) +{ + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd)); +} + +TEST_F(vfio_noiommu, ioas_map_pages) +{ + struct iommu_ioas_alloc alloc_args; + long page_size =3D sysconf(_SC_PAGESIZE); + uint64_t iova =3D 0x10000; + int i; + + ASSERT_GT(page_size, 0); + + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, + &alloc_args)); + + printf("Page size: %ld bytes\n", page_size); + /* Test 
mapping regions of different sizes: 1, 2, 4, 8 pages */ + for (i =3D 0; i < 4; i++) { + size_t map_size =3D page_size * (1 << i); /* 1, 2, 4, 8 pages */ + uint64_t test_iova =3D iova + (i * 0x100000); + + /* Attempt to map each region (may fail if not supported) */ + ioas_map_pages(self->iommufd, alloc_args.out_ioas_id, + test_iova, map_size, false); + } +} + +TEST_F(vfio_noiommu, multiple_ioas_alloc) +{ + struct iommu_ioas_alloc alloc1, alloc2; + + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc1)); + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc2)); + ASSERT_NE(alloc1.out_ioas_id, alloc2.out_ioas_id); +} + +/* + * Test: Query physical address for IOVA + * Tests IOMMU_IOAS_NOIOMMU_GET_PA ioctl to translate IOVA to physical add= ress + * Note: Device must be attached to IOAS for PA query to work + */ +#define NR_PAGES 32 +TEST_F(vfio_noiommu, ioas_noiommu_get_pa_mapped) +{ + struct iommu_ioas_alloc alloc_args; + long page_size =3D sysconf(_SC_PAGESIZE); + uint64_t iova =3D 0x200000; + uint64_t phys =3D 0; + uint64_t length =3D 0; + int ret; + + ASSERT_GT(page_size, 0); + + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, + &alloc_args)); + + ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd, + alloc_args.out_ioas_id)); + + /* + * Map a page into an arbitrary IOAS, used as a cookie for lookup. + * Use hugepages to test contiguous PA. Make sure hugepages are + * available. e.g. 
echo 64 > /proc/sys/vm/nr_hugepages + */ + ret =3D ioas_map_pages(self->iommufd, alloc_args.out_ioas_id, + iova, page_size * NR_PAGES, true); + if (ret !=3D 0) + return; + + /* Query the physical address for the mapped dummy IOVA */ + ret =3D ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id, + iova, &phys, &length); + + if (ret =3D=3D 0) { + /* If we got a result, verify it's valid */ + ASSERT_NE(0, phys); + ASSERT_GE((uint64_t)page_size * NR_PAGES, length); + } + + /* + * Query with a non-page-aligned IOVA. The returned length must + * not exceed the actual contiguous range starting from that + * offset, i.e. it must be reduced by the sub-page offset. + */ + phys =3D 0; + length =3D 0; + ret =3D ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id, + iova + 0x80, &phys, &length); + if (ret =3D=3D 0) { + ASSERT_NE(0, phys); + /* Length must account for the sub-page offset */ + ASSERT_GE((uint64_t)page_size * NR_PAGES - 0x80, length); + ASSERT_LE(length, (uint64_t)page_size * NR_PAGES - 0x80); + /* Must not overshoot into the next page boundary */ + ASSERT_EQ(0, (phys + length) % page_size); + } +} + +TEST_F(vfio_noiommu, ioas_noiommu_get_pa_unmapped_fails) +{ + struct iommu_ioas_alloc alloc_args; + + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, + &alloc_args)); + + /* Try to retrieve unmapped IOVA (should fail) */ + ASSERT_NE(0, ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas= _id, + 0x10000, NULL, NULL)); +} + +int main(int argc, char *argv[]) +{ + const char *device_bdf =3D vfio_selftests_get_bdf(&argc, argv); + char *cdev =3D NULL; + + if (!device_bdf) { + ksft_print_msg("No device BDF provided\n"); + return KSFT_SKIP; + } + + cdev =3D vfio_noiommu_get_cdev_path(device_bdf); + if (!cdev) { + ksft_print_msg("Could not find cdev for device %s\n", + device_bdf); + return KSFT_SKIP; + } + + cdev_path =3D cdev; + ksft_print_msg("Using cdev device %s for BDF %s\n", cdev_path, + device_bdf); + + return 
test_harness_run(argc, argv);
+}
-- 
2.43.0

From nobody Sat Jun 13 00:25:45 2026
From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev",
 Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh,
 David Matlack, Robin Murphy, Nicolin Chen, "Tian, Kevin", Yi Liu
Cc: Saurabh Sengar, skhawaja@google.com, pasha.tatashin@soleen.com,
 Will Deacon, Jacob Pan, Baolu Lu
Subject: [PATCH v5 9/9] Documentation: Update VFIO NOIOMMU mode
Date: Mon, 11 May 2026 11:41:14 -0700
Message-ID: <20260511184116.3687392-10-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260511184116.3687392-1-jacob.pan@linux.microsoft.com>
References: <20260511184116.3687392-1-jacob.pan@linux.microsoft.com>

Document the NOIOMMU mode with newly added cdev support under iommufd.

Cc: Jonathan Corbet
Signed-off-by: Jacob Pan
---
 Documentation/driver-api/vfio.rst | 46 +++++++++++++++++++++++++++++--
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 2a21a42c9386..d97017d80b98 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -275,8 +275,6 @@ in a VFIO group.
 With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
 by directly opening a character device /dev/vfio/devices/vfioX where
 "X" is the number allocated uniquely by VFIO for registered devices.
-cdev interface does not support noiommu devices, so user should use
-the legacy group interface if noiommu is wanted.
 
 The cdev only works with IOMMUFD.  Both VFIO drivers and applications
 must adapt to the new cdev security model which requires using
@@ -370,6 +368,50 @@ IOMMUFD IOAS/HWPT to enable userspace DMA::
 
 	/* Other device operations as stated in "VFIO Usage Example" */
 
+VFIO NOIOMMU mode
+-------------------------------------------------------------------------------
+VFIO also supports a no-IOMMU mode, intended for usages where unsafe DMA can
+be performed by userspace drivers w/o physical IOMMU protection. This mode
+is controlled by the parameter:
+
+/sys/module/vfio/parameters/enable_unsafe_noiommu_mode
+
+Upon enabling this mode, with an assigned device, the user will be presented
+with a VFIO group and device file, e.g.::
+
+  /dev/vfio/
+  |-- devices
+  |   `-- noiommu-vfio0 /* VFIO device cdev */
+  |-- noiommu-0         /* VFIO group */
+  `-- vfio
+
+The capabilities vary depending on the device programming interface and kernel
+configuration used. The following table summarizes the differences:
+
++-------------------+---------------------+----------------------+
+| Feature           | VFIO group          | VFIO device cdev     |
++===================+=====================+======================+
+| VFIO device UAPI  | Yes                 | Yes                  |
++-------------------+---------------------+----------------------+
+| VFIO container    | No                  | No                   |
++-------------------+---------------------+----------------------+
+| IOMMUFD IOAS      | No                  | Yes*                 |
++-------------------+---------------------+----------------------+
+
+Note that the VFIO container case includes IOMMUFD provided VFIO compatibility
+interfaces when either CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER is
+enabled.
+
+* IOMMUFD UAPI is available for VFIO device cdev to pin and map user memory with
+  the ability to retrieve physical addresses for DMA command submission.
+
+A new IOMMUFD ioctl IOMMU_IOAS_NOIOMMU_GET_PA is added to retrieve the physical
+address for a given IOVA. Although there is no physical DMA remapping hardware,
+IOMMU_IOAS_MAP_FIXED_IOVA is still used to establish IOVA-to-PA mappings in the
+software page table for later IOMMU_IOAS_NOIOMMU_GET_PA lookups.
+tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c provides an example of
+using this ioctl in no-IOMMU mode.
+
 VFIO User API
 -------------------------------------------------------------------------------
 
-- 
2.43.0
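
For readers following the documentation patch, the discovery steps it describes (check the module parameter, then look for a "noiommu-vfioX" node under /dev/vfio/devices) might be sketched in userspace roughly as below. This is only an illustration and not part of the series: the helper names noiommu_param_is_set(), cdev_name_is_noiommu() and noiommu_mode_enabled() are hypothetical. The sysfs path and the "noiommu-vfio" prefix are taken from the patches, and the 'Y' check mirrors vfio_pci_noiommu_mode_enabled() in patch 8/9.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Module parameter path as given in the documentation patch. */
#define NOIOMMU_PARAM "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"

/*
 * Hypothetical helper: interpret the text read from the parameter file.
 * The selftest in patch 8/9 treats a leading 'Y' as "enabled".
 */
bool noiommu_param_is_set(const char *text)
{
	assert(text != NULL);
	return text[0] == 'Y';
}

/*
 * Hypothetical helper: true if a /dev/vfio/devices entry name carries the
 * "noiommu-vfio" prefix that patch 7/9 gives to noiommu cdev nodes.
 */
bool cdev_name_is_noiommu(const char *name)
{
	static const char prefix[] = "noiommu-vfio";

	assert(name != NULL);
	return strncmp(name, prefix, sizeof(prefix) - 1) == 0;
}

/*
 * Hypothetical helper: read the parameter from sysfs. Returns false when
 * the file cannot be opened (for example, vfio is not loaded).
 */
bool noiommu_mode_enabled(void)
{
	char buf[8] = "";
	FILE *f = fopen(NOIOMMU_PARAM, "r");
	bool on = false;

	if (!f)
		return false;
	if (fgets(buf, sizeof(buf), f))
		on = noiommu_param_is_set(buf);
	fclose(f);
	return on;
}
```

A caller would presumably check noiommu_mode_enabled() first, then scan /dev/vfio/devices and open the first entry for which cdev_name_is_noiommu() returns true. Per patch 7/9, that open is refused with -EPERM unless the caller has CAP_SYS_RAWIO.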