From: Jacob Pan
To: linux-kernel@vger.kernel.org, iommu@lists.linux.dev, Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy, Nicolin Chen, "Tian, Kevin", Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon, Jacob Pan
Subject: [PATCH v8 1/6] iommufd: Support a HWPT without an iommu driver for noiommu
Date: Wed, 3 Jun 2026 15:02:06 -0700
Message-ID: <20260603220211.2584590-2-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260603220211.2584590-1-jacob.pan@linux.microsoft.com>
References: <20260603220211.2584590-1-jacob.pan@linux.microsoft.com>

From: Jason Gunthorpe

Create just a little part of a real iommu driver, enough to slot in under
the dev_iommu_ops() and allow iommufd to call domain_alloc_paging_flags()
and fail everything else.

This allows explicitly creating a HWPT under an IOAS. A new Kconfig option
IOMMUFD_NOIOMMU is introduced to differentiate from the VFIO
group/container based noiommu mode.

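For readers skimming the series, the fallback described above can be modeled in plain userspace C. This is only an illustrative sketch, not code from the patch: struct mock_ops, struct mock_device, mock_get_ops() and the numeric error value are hypothetical stand-ins for struct iommu_ops, struct iommufd_device and get_iommu_ops().

```c
/* Userspace model only; every type and name here is a hypothetical stand-in. */
#include <stddef.h>

#define MOCK_EOPNOTSUPP 95	/* stand-in for the kernel's EOPNOTSUPP */

struct mock_ops {
	const char *name;
	int (*domain_alloc_paging_flags)(unsigned int flags);
	int (*hw_info)(void);	/* example of an op the stub leaves unset */
};

/* The stub accepts only a plain allocation and rejects any flags. */
int mock_noiommu_alloc(unsigned int flags)
{
	return flags ? -MOCK_EOPNOTSUPP : 0;
}

/* Only one callback is populated; every other op stays NULL and so "fails". */
static const struct mock_ops mock_noiommu_ops = {
	.name = "noiommu",
	.domain_alloc_paging_flags = mock_noiommu_alloc,
};

struct mock_device {
	const struct mock_ops *driver_ops;	/* NULL models dev->iommu == NULL */
	int has_group;				/* 0 models igroup->group == NULL */
};

/* Devices without a real group get the stub ops, others their driver's ops. */
const struct mock_ops *mock_get_ops(const struct mock_device *dev)
{
	if (!dev->has_group)
		return &mock_noiommu_ops;
	return dev->driver_ops;	/* may be NULL; callers must check */
}
```

A caller in this model checks the returned pointer for NULL and then checks each callback before use, which is how "fail everything else" falls out of a mostly empty ops table.
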
Reviewed-by: Lu Baolu
Reviewed-by: Samiullah Khawaja
Reviewed-by: Kevin Tian
Signed-off-by: Jason Gunthorpe
Signed-off-by: Jacob Pan
---
v8:
 - Guard vIOMMU and vDevice allocation paths for noiommu (Sashiko)
v7:
 - Drain no-IOMMU generic-PT freelist (Sashiko)
 - Import generic-PT IOMMU namespace (Sashiko)
v6: (Yi)
 - Sort includes alphabetically (iommu.h after generic_pt/iommu.h)
 - Fix comment: s/mock page table/SW-only page table/ to avoid
   confusion with selftest mock
 - Rewrite noiommu_amdv1_ops comment: explain why AMDV1 format is
   chosen (multi-page size options), remove references to
   group-container mode distinction
v5:
 - Use the new IOMMUFD_NOIOMMU Kconfig instead of VFIO_NOIOMMU
 - Use consistent wording referring to VFIO noiommu mode (Kevin)
 - Copyright date fix (Kevin)
v4:
 - Make iommufd_noiommu_ops const
v3:
 - Add comment to explain the design difference over the legacy
   noiommu VFIO code.
---
 drivers/iommu/iommufd/Kconfig           |  12 +++
 drivers/iommu/iommufd/Makefile          |   1 +
 drivers/iommu/iommufd/hw_pagetable.c    |  19 ++++-
 drivers/iommu/iommufd/hwpt_noiommu.c    | 105 ++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |  12 +++
 drivers/iommu/iommufd/main.c            |   1 +
 drivers/iommu/iommufd/viommu.c          |  14 +++-
 7 files changed, 158 insertions(+), 6 deletions(-)
 create mode 100644 drivers/iommu/iommufd/hwpt_noiommu.c

diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
index 455bac0351f2..6c3bea83631b 100644
--- a/drivers/iommu/iommufd/Kconfig
+++ b/drivers/iommu/iommufd/Kconfig
@@ -16,6 +16,18 @@ config IOMMUFD
 	  If you don't know what to do here, say N.
 
 if IOMMUFD
+config IOMMUFD_NOIOMMU
+	bool
+	depends on !GENERIC_ATOMIC64 # IOMMU_PT_AMDV1 requires cmpxchg64
+	select GENERIC_PT
+	select IOMMU_PT
+	select IOMMU_PT_AMDV1
+	help
+	  Provides a SW-only IO page table for devices without hardware
+	  IOMMU backing. This uses the AMDV1 page table format for
+	  IOVA-to-PA lookups only, not for hardware DMA translation.
+	  To be selected by VFIO_NOIOMMU when VFIO_DEVICE_CDEV is enabled.
+
 config IOMMUFD_VFIO_CONTAINER
 	bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
 	depends on VFIO_GROUP && !VFIO_CONTAINER
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 71d692c9a8f4..67207914bb6e 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -10,6 +10,7 @@ iommufd-y := \
 	vfio_compat.o \
 	viommu.o
 
+iommufd-$(CONFIG_IOMMUFD_NOIOMMU) += hwpt_noiommu.o
 iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
 
 obj-$(CONFIG_IOMMUFD) += iommufd.o
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index fe789c2dc0c9..8f95c75d47f3 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,6 +8,15 @@
 #include "../iommu-priv.h"
 #include "iommufd_private.h"
 
+static const struct iommu_ops *get_iommu_ops(struct iommufd_device *idev)
+{
+	if (IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->igroup->group)
+		return &iommufd_noiommu_ops;
+	if (WARN_ON_ONCE(!idev->dev->iommu))
+		return NULL;
+	return dev_iommu_ops(idev->dev);
+}
+
 static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
 {
 	if (hwpt->domain)
@@ -114,11 +123,13 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 				IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
 				IOMMU_HWPT_FAULT_ID_VALID |
 				IOMMU_HWPT_ALLOC_PASID;
-	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+	const struct iommu_ops *ops = get_iommu_ops(idev);
 	struct iommufd_hwpt_paging *hwpt_paging;
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
 
+	if (!ops)
+		return ERR_PTR(-ENODEV);
 	lockdep_assert_held(&ioas->mutex);
 
 	if ((flags || user_data) && !ops->domain_alloc_paging_flags)
@@ -229,7 +240,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 			  struct iommufd_device *idev, u32 flags,
 			  const struct iommu_user_data *user_data)
 {
-	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+	const struct iommu_ops *ops = get_iommu_ops(idev);
 	struct iommufd_hwpt_nested *hwpt_nested;
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
@@ -389,10 +400,12 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 		hwpt = &hwpt_nested->common;
 	} else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) {
 		struct iommufd_hwpt_nested *hwpt_nested;
+		struct iommu_device *iommu_dev;
 		struct iommufd_viommu *viommu;
 
 		viommu = container_of(pt_obj, struct iommufd_viommu, obj);
-		if (viommu->iommu_dev != __iommu_get_iommu_dev(idev->dev)) {
+		iommu_dev = iommufd_device_get_iommu_dev(idev);
+		if (!iommu_dev || viommu->iommu_dev != iommu_dev) {
 			rc = -EINVAL;
 			goto out_unlock;
 		}
diff --git a/drivers/iommu/iommufd/hwpt_noiommu.c b/drivers/iommu/iommufd/hwpt_noiommu.c
new file mode 100644
index 000000000000..9b8b5eb71491
--- /dev/null
+++ b/drivers/iommu/iommufd/hwpt_noiommu.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES
+ */
+#include <linux/generic_pt/iommu.h>
+#include <linux/iommu.h>
+#include "../iommu-pages.h"
+#include "iommufd_private.h"
+
+static const struct iommu_domain_ops noiommu_amdv1_ops;
+
+struct noiommu_domain {
+	union {
+		struct iommu_domain domain;
+		struct pt_iommu_amdv1 amdv1;
+	};
+	spinlock_t lock;
+};
+PT_IOMMU_CHECK_DOMAIN(struct noiommu_domain, amdv1.iommu, domain);
+
+static void noiommu_change_top(struct pt_iommu *iommu_table,
+			       phys_addr_t top_paddr, unsigned int top_level)
+{
+}
+
+static spinlock_t *noiommu_get_top_lock(struct pt_iommu *iommupt)
+{
+	struct noiommu_domain *domain =
+		container_of(iommupt, struct noiommu_domain, amdv1.iommu);
+
+	return &domain->lock;
+}
+
+static const struct pt_iommu_driver_ops noiommu_driver_ops = {
+	.get_top_lock = noiommu_get_top_lock,
+	.change_top = noiommu_change_top,
+};
+
+static struct iommu_domain *
+noiommu_alloc_paging_flags(struct device *dev, u32 flags,
+			   const struct iommu_user_data *user_data)
+{
+	struct pt_iommu_amdv1_cfg cfg = {};
+	struct noiommu_domain *dom;
+	int rc;
+
+	if (flags || user_data)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	cfg.common.hw_max_vasz_lg2 = 64;
+	cfg.common.hw_max_oasz_lg2 = 52;
+	cfg.starting_level = 2;
+	cfg.common.features =
+		(BIT(PT_FEAT_DYNAMIC_TOP) | BIT(PT_FEAT_AMDV1_ENCRYPT_TABLES) |
+		 BIT(PT_FEAT_AMDV1_FORCE_COHERENCE));
+
+	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
+	if (!dom)
+		return ERR_PTR(-ENOMEM);
+
+	spin_lock_init(&dom->lock);
+	dom->amdv1.iommu.nid = NUMA_NO_NODE;
+	dom->amdv1.iommu.driver_ops = &noiommu_driver_ops;
+	dom->domain.ops = &noiommu_amdv1_ops;
+
+	/* Use SW-only page table which is based on AMDV1 */
+	rc = pt_iommu_amdv1_init(&dom->amdv1, &cfg, GFP_KERNEL);
+	if (rc) {
+		kfree(dom);
+		return ERR_PTR(rc);
+	}
+
+	return &dom->domain;
+}
+
+static void noiommu_domain_free(struct iommu_domain *iommu_domain)
+{
+	struct noiommu_domain *domain =
+		container_of(iommu_domain, struct noiommu_domain, domain);
+
+	pt_iommu_deinit(&domain->amdv1.iommu);
+	kfree(domain);
+}
+
+static void noiommu_iotlb_sync(struct iommu_domain *domain,
+			       struct iommu_iotlb_gather *gather)
+{
+	iommu_put_pages_list(&gather->freelist);
+}
+
+/*
+ * Domain ops for iommufd no-IOMMU mode. Uses AMDV1 format as a
+ * SW-only IOPT because it has the best multi-page size options
+ * of all the formats. IOVAs serve only for IOVA-to-PA lookups,
+ * not for hardware DMA translation.
+ */
+static const struct iommu_domain_ops noiommu_amdv1_ops = {
+	IOMMU_PT_DOMAIN_OPS(amdv1),
+	.iotlb_sync = noiommu_iotlb_sync,
+	.free = noiommu_domain_free,
+};
+
+const struct iommu_ops iommufd_noiommu_ops = {
+	.domain_alloc_paging_flags = noiommu_alloc_paging_flags,
+};
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 6ac1965199e9..c8ed612e896a 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -464,6 +464,8 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
 	refcount_dec(&hwpt->obj.users);
 }
 
+extern const struct iommu_ops iommufd_noiommu_ops;
+
 struct iommufd_attach;
 
 struct iommufd_group {
@@ -501,6 +503,16 @@ iommufd_get_device(struct iommufd_ucmd *ucmd, u32 id)
 			    struct iommufd_device, obj);
 }
 
+static inline struct iommu_device *
+iommufd_device_get_iommu_dev(struct iommufd_device *idev)
+{
+	if (IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->igroup->group)
+		return NULL;
+	if (WARN_ON_ONCE(!idev->dev->iommu))
+		return NULL;
+	return __iommu_get_iommu_dev(idev->dev);
+}
+
 void iommufd_device_pre_destroy(struct iommufd_object *obj);
 void iommufd_device_destroy(struct iommufd_object *obj);
 int iommufd_get_hw_info(struct iommufd_ucmd *ucmd);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 8c6d43601afb..f6ae60bd3f70 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -804,5 +804,6 @@ MODULE_ALIAS("devname:vfio/vfio");
 MODULE_IMPORT_NS("IOMMUFD_INTERNAL");
 MODULE_IMPORT_NS("IOMMUFD");
 MODULE_IMPORT_NS("DMA_BUF");
+MODULE_IMPORT_NS("GENERIC_PT_IOMMU");
 MODULE_DESCRIPTION("I/O Address Space Management for passthrough devices");
 MODULE_LICENSE("GPL");
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 4081deda9b33..b51f67fdf4e3 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -25,6 +25,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	struct iommufd_hwpt_paging *hwpt_paging;
 	struct iommufd_viommu *viommu;
 	struct iommufd_device *idev;
+	struct iommu_device *iommu_dev;
 	const struct iommu_ops *ops;
 	size_t viommu_size;
 	int rc;
@@ -36,7 +37,12 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	if (IS_ERR(idev))
 		return PTR_ERR(idev);
 
-	ops = dev_iommu_ops(idev->dev);
+	iommu_dev = iommufd_device_get_iommu_dev(idev);
+	if (!iommu_dev) {
+		rc = -EOPNOTSUPP;
+		goto out_put_idev;
+	}
+	ops = iommu_dev->ops;
 	if (!ops->get_viommu_size || !ops->viommu_init) {
 		rc = -EOPNOTSUPP;
 		goto out_put_idev;
@@ -87,7 +93,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	 * pluggable IOMMU instance (if exists) is responsible for refcounting
 	 * on its own.
 	 */
-	viommu->iommu_dev = __iommu_get_iommu_dev(idev->dev);
+	viommu->iommu_dev = iommu_dev;
 
 	rc = ops->viommu_init(viommu, hwpt_paging->common.domain,
 			      user_data.len ? &user_data : NULL);
@@ -146,6 +152,7 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	struct iommufd_vdevice *vdev, *curr;
 	size_t vdev_size = sizeof(*vdev);
 	struct iommufd_viommu *viommu;
+	struct iommu_device *iommu_dev;
 	struct iommufd_device *idev;
 	u64 virt_id = cmd->virt_id;
 	int rc = 0;
@@ -164,7 +171,8 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 		goto out_put_viommu;
 	}
 
-	if (viommu->iommu_dev != __iommu_get_iommu_dev(idev->dev)) {
+	iommu_dev = iommufd_device_get_iommu_dev(idev);
+	if (!iommu_dev || viommu->iommu_dev != iommu_dev) {
 		rc = -EINVAL;
 		goto out_put_idev;
 	}
-- 
2.43.0

From: Jacob Pan
Subject: [PATCH v8 2/6] iommufd: Move igroup allocation to a function
Date: Wed, 3 Jun 2026 15:02:07 -0700
Message-ID: <20260603220211.2584590-3-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260603220211.2584590-1-jacob.pan@linux.microsoft.com>

From: Jason Gunthorpe

So it can be reused in the next patch which allows binding to noiommu
device.

Reviewed-by: Samiullah Khawaja
Reviewed-by: Yi Liu
Reviewed-by: Kevin Tian
Reviewed-by: Lu Baolu
Signed-off-by: Jason Gunthorpe
Signed-off-by: Jacob Pan
---
v5:
 - Add NULL group to the error handling path of iommufd_group_setup_msi()
v3:
 - New patch
---
 drivers/iommu/iommufd/device.c | 43 +++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 170a7005f0bc..d03076fcf3c2 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -56,6 +56,30 @@ static bool iommufd_group_try_get(struct iommufd_group *igroup,
 	return kref_get_unless_zero(&igroup->ref);
 }
 
+static struct iommufd_group *iommufd_alloc_group(struct iommufd_ctx *ictx,
+						 struct iommu_group *group)
+{
+	struct iommufd_group *new_igroup;
+
+	new_igroup = kzalloc(sizeof(*new_igroup), GFP_KERNEL);
+	if (!new_igroup)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&new_igroup->ref);
+	mutex_init(&new_igroup->lock);
+	xa_init(&new_igroup->pasid_attach);
+	new_igroup->sw_msi_start = PHYS_ADDR_MAX;
+	/* group reference moves into new_igroup */
+	new_igroup->group = group;
+
+	/*
+	 * The ictx is not additionally refcounted here because all objects using
+	 * an igroup must put it before their destroy completes.
+	 */
+	new_igroup->ictx = ictx;
+	return new_igroup;
+}
+
 /*
  * iommufd needs to store some more data for each iommu_group, we keep a
  * parallel xarray indexed by iommu_group id to hold this instead of putting it
@@ -87,25 +111,12 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
 	}
 	xa_unlock(&ictx->groups);
 
-	new_igroup = kzalloc_obj(*new_igroup);
-	if (!new_igroup) {
+	new_igroup = iommufd_alloc_group(ictx, group);
+	if (IS_ERR(new_igroup)) {
 		iommu_group_put(group);
-		return ERR_PTR(-ENOMEM);
+		return new_igroup;
 	}
 
-	kref_init(&new_igroup->ref);
-	mutex_init(&new_igroup->lock);
-	xa_init(&new_igroup->pasid_attach);
-	new_igroup->sw_msi_start = PHYS_ADDR_MAX;
-	/* group reference moves into new_igroup */
-	new_igroup->group = group;
-
-	/*
-	 * The ictx is not additionally refcounted here becase all objects using
-	 * an igroup must put it before their destroy completes.
-	 */
-	new_igroup->ictx = ictx;
-
 	/*
 	 * We dropped the lock so igroup is invalid. NULL is a safe and likely
 	 * value to assume for the xa_cmpxchg algorithm.
-- 
2.43.0

From: Jacob Pan
Subject: [PATCH v8 3/6] iommufd: Allow binding to a noiommu device
Date: Wed, 3 Jun 2026 15:02:08 -0700
Message-ID: <20260603220211.2584590-4-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260603220211.2584590-1-jacob.pan@linux.microsoft.com>

From: Jason Gunthorpe

Allow iommufd to bind devices without an IOMMU (noiommu mode) by creating
a dummy igroup for such devices and skipping hwpt operations.

This enables noiommu devices to operate through the same iommufd API as
IOMMU-capable devices.

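As a rough userspace model of the behaviour described above (a sketch under stated assumptions, not code from the patch): the mock_* types and functions below are hypothetical stand-ins for iommufd_device, iommufd_group, the bind helpers and iommufd_hwpt_attach_device(). A noiommu device gets an igroup whose group pointer is NULL, and attach becomes a successful no-op.

```c
/* Userspace model only; every type and name here is a hypothetical stand-in. */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct mock_group {
	int id;
};

struct mock_igroup {
	struct mock_group *group;	/* NULL marks the dummy noiommu igroup */
};

struct mock_idev {
	bool has_iommu;			/* models dev->iommu != NULL */
	struct mock_igroup *igroup;
	int attach_calls;		/* counts attaches that reach "hardware" */
};

/* Bind always creates an igroup; only IOMMU-backed devices get a real group. */
int mock_bind(struct mock_idev *idev, struct mock_group *real_group)
{
	struct mock_igroup *ig = calloc(1, sizeof(*ig));

	if (!ig)
		return -1;
	ig->group = idev->has_iommu ? real_group : NULL;
	idev->igroup = ig;
	return 0;
}

/* Attach reports success for noiommu devices without doing any work. */
int mock_attach(struct mock_idev *idev)
{
	if (!idev->has_iommu)
		return 0;
	idev->attach_calls++;
	return 0;
}

void mock_unbind(struct mock_idev *idev)
{
	free(idev->igroup);
	idev->igroup = NULL;
}
```

Keeping an igroup present in both cases is what lets callers that dereference idev->igroup stay unchanged, at the cost of a NULL check wherever igroup->group itself is used.
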
Reviewed-by: Kevin Tian
Reviewed-by: Yi Liu
Reviewed-by: Lu Baolu
Signed-off-by: Jason Gunthorpe
Signed-off-by: Jacob Pan
---
v7:
 - Block get hw info for noiommu
v6:
 - Expand iommufd_device_is_noiommu() comment to explain why dev->iommu
   is checked instead of device_iommu_mapped() (Yi & Baolu)
 - Simplify bind error handling by factoring out duplicated rc check (Yi)
v5:
 - simplify logic and rename iommufd_device_is_noiommu (Kevin, Yi)
 - use a helper iommufd_bind_noiommu instead of open coding (Kevin)
 - move IOMMU cap check under iommufd_bind_iommu() (Yi)
 - reword comments for partial init (Yi)
 - misc minor clean up
v4:
 - Update the description of the module parameter (Alex)
v3:
 - Consolidate into fewer patches
---
 drivers/iommu/iommufd/device.c | 154 ++++++++++++++++++++++++---------
 1 file changed, 115 insertions(+), 39 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index d03076fcf3c2..670349ff65ea 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -23,6 +23,19 @@ struct iommufd_attach {
 	struct xarray device_array;
 };
 
+/*
+ * Detect a noiommu device for the cdev path. We check dev->iommu rather than
+ * using device_iommu_mapped() (which checks dev->iommu_group) because when
+ * both group and cdev interfaces coexist, the group path assigns a fake
+ * noiommu iommu_group to the device. That would cause device_iommu_mapped()
+ * to return true and hide the noiommu case from the cdev path. dev->iommu is
+ * reliably NULL when no IOMMU driver is managing the device.
+ */
+static bool iommufd_device_is_noiommu(struct iommufd_device *idev)
+{
+	return IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->dev->iommu;
+}
+
 static void iommufd_group_release(struct kref *kref)
 {
 	struct iommufd_group *igroup =
@@ -30,9 +43,11 @@ static void iommufd_group_release(struct kref *kref)
 
 	WARN_ON(!xa_empty(&igroup->pasid_attach));
 
-	xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
-		   NULL, GFP_KERNEL);
-	iommu_group_put(igroup->group);
+	if (igroup->group) {
+		xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group),
+			   igroup, NULL, GFP_KERNEL);
+		iommu_group_put(igroup->group);
+	}
 	mutex_destroy(&igroup->lock);
 	kfree(igroup);
 }
@@ -204,32 +219,20 @@ void iommufd_device_destroy(struct iommufd_object *obj)
 	struct iommufd_device *idev =
 		container_of(obj, struct iommufd_device, obj);
 
-	iommu_device_release_dma_owner(idev->dev);
+	/* igroup is NULL when destroy called during bind error cleanup */
+	if (!idev->igroup)
+		return;
+	if (!iommufd_device_is_noiommu(idev))
+		iommu_device_release_dma_owner(idev->dev);
 	iommufd_put_group(idev->igroup);
 	if (!iommufd_selftest_is_mock_dev(idev->dev))
 		iommufd_ctx_put(idev->ictx);
 }
 
-/**
- * iommufd_device_bind - Bind a physical device to an iommu fd
- * @ictx: iommufd file descriptor
- * @dev: Pointer to a physical device struct
- * @id: Output ID number to return to userspace for this device
- *
- * A successful bind establishes an ownership over the device and returns
- * struct iommufd_device pointer, otherwise returns error pointer.
- *
- * A driver using this API must set driver_managed_dma and must not touch
- * the device until this routine succeeds and establishes ownership.
- *
- * Binding a PCI device places the entire RID under iommufd control.
- *
- * The caller must undo this with iommufd_device_unbind()
- */
-struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
-					   struct device *dev, u32 *id)
+static int iommufd_bind_iommu(struct iommufd_device *idev)
 {
-	struct iommufd_device *idev;
+	struct iommufd_ctx *ictx = idev->ictx;
+	struct device *dev = idev->dev;
 	struct iommufd_group *igroup;
 	int rc;
 
@@ -238,11 +241,11 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	 * to restore cache coherency.
 	 */
 	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
-		return ERR_PTR(-EINVAL);
+		return -EINVAL;
 
 	igroup = iommufd_get_group(ictx, dev);
 	if (IS_ERR(igroup))
-		return ERR_CAST(igroup);
+		return PTR_ERR(igroup);
 
 	/*
 	 * For historical compat with VFIO the insecure interrupt path is
@@ -268,21 +271,77 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	if (rc)
 		goto out_group_put;
 
+	/* igroup refcount moves into iommufd_device */
+	idev->igroup = igroup;
+	idev->enforce_cache_coherency =
+		device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
+	return 0;
+
+out_group_put:
+	iommufd_put_group(igroup);
+	return rc;
+}
+
+/*
+ * Noiommu devices have no real IOMMU group. Create a dummy igroup so that
+ * internal code paths that expect idev->igroup to be present still work.
+ * A NULL igroup->group distinguishes this from a real IOMMU-backed group.
+ */ +static int iommufd_bind_noiommu(struct iommufd_device *idev) +{ + struct iommufd_group *igroup; + + igroup =3D iommufd_alloc_group(idev->ictx, NULL); + if (IS_ERR(igroup)) + return PTR_ERR(igroup); + idev->igroup =3D igroup; + return 0; +} + +/** + * iommufd_device_bind - Bind a physical device to an iommu fd + * @ictx: iommufd file descriptor + * @dev: Pointer to a physical device struct + * @id: Output ID number to return to userspace for this device + * + * A successful bind establishes an ownership over the device and returns + * struct iommufd_device pointer, otherwise returns error pointer. + * + * A driver using this API must set driver_managed_dma and must not touch + * the device until this routine succeeds and establishes ownership. + * + * Binding a PCI device places the entire RID under iommufd control. + * + * The caller must undo this with iommufd_device_unbind() + */ +struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx, + struct device *dev, u32 *id) +{ + struct iommufd_device *idev; + int rc; + idev =3D iommufd_object_alloc(ictx, idev, IOMMUFD_OBJ_DEVICE); - if (IS_ERR(idev)) { - rc =3D PTR_ERR(idev); - goto out_release_owner; - } + if (IS_ERR(idev)) + return idev; + idev->ictx =3D ictx; + idev->dev =3D dev; + + if (!iommufd_device_is_noiommu(idev)) + rc =3D iommufd_bind_iommu(idev); + else + rc =3D iommufd_bind_noiommu(idev); + if (rc) + goto err_out; + + /* + * Take a ctx reference after bind succeeds. 
This must happen here + * so that iommufd_device_destroy() can handle partial initialization + */ if (!iommufd_selftest_is_mock_dev(dev)) iommufd_ctx_get(ictx); - idev->dev =3D dev; - idev->enforce_cache_coherency =3D - device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY); /* The calling driver is a user until iommufd_device_unbind() */ refcount_inc(&idev->obj.users); - /* igroup refcount moves into iommufd_device */ - idev->igroup =3D igroup; =20 /* * If the caller fails after this success it must call @@ -294,11 +353,14 @@ struct iommufd_device *iommufd_device_bind(struct iom= mufd_ctx *ictx, *id =3D idev->obj.id; return idev; =20 -out_release_owner: - iommu_device_release_dma_owner(dev); -out_group_put: - iommufd_put_group(igroup); +err_out: + /* + * iommufd_device_destroy() handles partially initialized idev, + * so iommufd_object_abort_and_destroy() is safe to call here. + */ + iommufd_object_abort_and_destroy(ictx, &idev->obj); return ERR_PTR(rc); + } EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, "IOMMUFD"); =20 @@ -512,6 +574,9 @@ static int iommufd_hwpt_attach_device(struct iommufd_hw= _pagetable *hwpt, struct iommufd_attach_handle *handle; int rc; =20 + if (iommufd_device_is_noiommu(idev)) + return 0; + if (!iommufd_hwpt_compatible_device(hwpt, idev)) return -EINVAL; =20 @@ -559,6 +624,9 @@ static void iommufd_hwpt_detach_device(struct iommufd_h= w_pagetable *hwpt, { struct iommufd_attach_handle *handle; =20 + if (iommufd_device_is_noiommu(idev)) + return; + handle =3D iommufd_device_get_attach_handle(idev, pasid); if (pasid =3D=3D IOMMU_NO_PASID) iommu_detach_group_handle(hwpt->domain, idev->igroup->group); @@ -577,6 +645,9 @@ static int iommufd_hwpt_replace_device(struct iommufd_d= evice *idev, struct iommufd_attach_handle *handle, *old_handle; int rc; =20 + if (iommufd_device_is_noiommu(idev)) + return 0; + if (!iommufd_hwpt_compatible_device(hwpt, idev)) return -EINVAL; =20 @@ -652,7 +723,7 @@ int iommufd_hw_pagetable_attach(struct 
iommufd_hw_paget= able *hwpt, goto err_release_devid; } =20 - if (attach_resv) { + if (attach_resv && !iommufd_device_is_noiommu(idev)) { rc =3D iommufd_device_attach_reserved_iova(idev, hwpt_paging); if (rc) goto err_release_devid; @@ -1585,6 +1656,11 @@ int iommufd_get_hw_info(struct iommufd_ucmd *ucmd) if (IS_ERR(idev)) return PTR_ERR(idev); =20 + if (iommufd_device_is_noiommu(idev)) { + rc =3D -EOPNOTSUPP; + goto out_put; + } + ops =3D dev_iommu_ops(idev->dev); if (ops->hw_info) { data =3D ops->hw_info(idev->dev, &data_len, &cmd->out_data_type); --=20 2.43.0 From nobody Mon Jun 8 06:35:45 2026
From: Jacob Pan To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev" , Jason Gunthorpe , Alex Williamson , Joerg Roedel , Mostafa Saleh , David Matlack , Robin Murphy , Nicolin Chen , "Tian, Kevin" , Yi Liu , Baolu Lu Cc: Saurabh Sengar , skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon , Jacob Pan Subject: [PATCH v8 4/6] iommufd: Add an ioctl to query PA from IOVA for noiommu mode Date: Wed, 3 Jun 2026 15:02:09 -0700 Message-ID: <20260603220211.2584590-5-jacob.pan@linux.microsoft.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260603220211.2584590-1-jacob.pan@linux.microsoft.com> References: <20260603220211.2584590-1-jacob.pan@linux.microsoft.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" To support no-IOMMU mode where userspace drivers perform unsafe DMA using physical addresses, introduce a new API to retrieve the physical address of a user-allocated DMA buffer that has been mapped to an IOVA via IOMMU_IOAS_MAP. 
The mapping is backed by SW-only I/O page tables maintained by the GENERIC_PT framework. Reviewed-by: Lu Baolu Reviewed-by: Kevin Tian Suggested-by: Jason Gunthorpe Co-developed-by: Jason Gunthorpe Signed-off-by: Jason Gunthorpe Signed-off-by: Jacob Pan --- v8: - Fix comment on start IOVA range (Kevin) v7: - Fix commit message (Yi) - Avoid duplicated tmp_length setting (Yi) - Handle race with dma-buf revoke pages (Sashiko) v6: - Limit search length (Baolu, Jason) v5: - Fix next_iova exceeds iopt_area_last_iova (Alex) - Rename IOCTL more specific to NOIOMMU, i.e. IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA (Kevin) - Add header stubs for iopt_get_phys() v4: - Fix ioctl return type (Yi Liu) fix comment get_pa Signed-off-by: Jacob Pan --- drivers/iommu/iommufd/io_pagetable.c | 80 +++++++++++++++++++++++++ drivers/iommu/iommufd/ioas.c | 33 ++++++++++ drivers/iommu/iommufd/iommufd_private.h | 18 ++++++ drivers/iommu/iommufd/main.c | 3 + include/uapi/linux/iommufd.h | 27 +++++++++ 5 files changed, 161 insertions(+) diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/i= o_pagetable.c index 24d4917105d9..667c2d07e08b 100644 --- a/drivers/iommu/iommufd/io_pagetable.c +++ b/drivers/iommu/iommufd/io_pagetable.c @@ -859,6 +859,86 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigne= d long iova, return iopt_unmap_iova_range(iopt, iova, iova_last, unmapped); } =20 +#ifdef CONFIG_IOMMUFD_NOIOMMU +int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *padd= r, + u64 *length) +{ + struct iopt_area *area; + struct iopt_pages *pages; + u64 max_length =3D *length; + u64 tmp_length =3D 0; + u64 tmp_paddr =3D 0; + int rc =3D 0; + + down_read(&iopt->iova_rwsem); + area =3D iopt_area_iter_first(iopt, iova, iova); + if (!area || !area->pages) { + rc =3D -ENOENT; + goto unlock_exit; + } + + pages =3D area->pages; + mutex_lock(&pages->mutex); + if (iopt_dmabuf_revoked(pages)) { + rc =3D -EINVAL; + goto unlock_pages; + } + + if (!area->storage_domain || + 
area->storage_domain->owner !=3D &iommufd_noiommu_ops) { + rc =3D -EOPNOTSUPP; + goto unlock_pages; + } + + *paddr =3D iommu_iova_to_phys(area->storage_domain, iova); + if (!*paddr) { + rc =3D -EINVAL; + goto unlock_pages; + } + + tmp_length =3D PAGE_SIZE - offset_in_page(iova); + tmp_paddr =3D *paddr; + /* + * Scan the domain for the contiguous physical address length so that + * userspace search can be optimized for fewer ioctls. A max_length of + * 0 means no limit. + */ + while (iova < iopt_area_last_iova(area)) { + unsigned long next_iova; + u64 next_paddr; + + if (max_length && tmp_length >=3D max_length) + break; + + if (check_add_overflow(iova, PAGE_SIZE, &next_iova)) + break; + + if (next_iova > iopt_area_last_iova(area)) + break; + + next_paddr =3D iommu_iova_to_phys(area->storage_domain, next_iova); + + if (!next_paddr || next_paddr !=3D tmp_paddr + PAGE_SIZE) + break; + + iova =3D next_iova; + tmp_paddr +=3D PAGE_SIZE; + tmp_length +=3D PAGE_SIZE; + } + + if (max_length && tmp_length > max_length) + tmp_length =3D max_length; + *length =3D tmp_length; + +unlock_pages: + mutex_unlock(&pages->mutex); +unlock_exit: + up_read(&iopt->iova_rwsem); + + return rc; +} +#endif + int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped) { /* If the IOVAs are empty then unmap all succeeds */ diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c index fed06c2b728e..ad1c3031f6a9 100644 --- a/drivers/iommu/iommufd/ioas.c +++ b/drivers/iommu/iommufd/ioas.c @@ -375,6 +375,39 @@ int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd) return rc; } =20 +#ifdef CONFIG_IOMMUFD_NOIOMMU +int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd) +{ + struct iommu_ioas_noiommu_get_pa *cmd =3D ucmd->cmd; + struct iommufd_ioas *ioas; + int rc; + + if (!capable(CAP_SYS_RAWIO)) + return -EPERM; + + if (cmd->flags || cmd->__reserved) + return -EOPNOTSUPP; + + if (cmd->iova >=3D ULONG_MAX) + return -EOVERFLOW; + + ioas =3D iommufd_get_ioas(ucmd->ictx, 
cmd->ioas_id); + if (IS_ERR(ioas)) + return PTR_ERR(ioas); + + rc =3D iopt_get_phys(&ioas->iopt, cmd->iova, &cmd->out_phys, + &cmd->length); + if (rc) + goto out_put; + + rc =3D iommufd_ucmd_respond(ucmd, sizeof(*cmd)); +out_put: + iommufd_put_object(ucmd->ictx, &ioas->obj); + + return rc; +} +#endif + static void iommufd_release_all_iova_rwsem(struct iommufd_ctx *ictx, struct xarray *ioas_list) { diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommuf= d/iommufd_private.h index c8ed612e896a..15909ba75c18 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -118,6 +118,16 @@ int iopt_map_pages(struct io_pagetable *iopt, struct l= ist_head *pages_list, int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova, unsigned long length, unsigned long *unmapped); int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped); +#ifdef CONFIG_IOMMUFD_NOIOMMU +int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *padd= r, + u64 *length); +#else +static inline int iopt_get_phys(struct io_pagetable *iopt, unsigned long i= ova, + u64 *paddr, u64 *length) +{ + return -EOPNOTSUPP; +} +#endif =20 int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt, struct iommu_domain *domain, @@ -346,6 +356,14 @@ int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd); int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd); int iommufd_ioas_copy(struct iommufd_ucmd *ucmd); int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd); +#ifdef CONFIG_IOMMUFD_NOIOMMU +int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd); +#else +static inline int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd) +{ + return -EOPNOTSUPP; +} +#endif int iommufd_ioas_option(struct iommufd_ucmd *ucmd); int iommufd_option_rlimit_mode(struct iommu_option *cmd, struct iommufd_ctx *ictx); diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index f6ae60bd3f70..a4668995269c 100644 --- 
a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -424,6 +424,7 @@ union ucmd_buffer { struct iommu_ioas_alloc alloc; struct iommu_ioas_allow_iovas allow_iovas; struct iommu_ioas_copy ioas_copy; + struct iommu_ioas_noiommu_get_pa noiommu_get_pa; struct iommu_ioas_iova_ranges iova_ranges; struct iommu_ioas_map map; struct iommu_ioas_unmap unmap; @@ -482,6 +483,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[= ] =3D { IOCTL_OP(IOMMU_IOAS_MAP, iommufd_ioas_map, struct iommu_ioas_map, iova), IOCTL_OP(IOMMU_IOAS_MAP_FILE, iommufd_ioas_map_file, struct iommu_ioas_map_file, iova), + IOCTL_OP(IOMMU_IOAS_NOIOMMU_GET_PA, iommufd_ioas_noiommu_get_pa, struct i= ommu_ioas_noiommu_get_pa, + out_phys), IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap, length), IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option, val64), diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index e998dfbd6960..552bc5c096b4 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -57,6 +57,7 @@ enum { IOMMUFD_CMD_IOAS_CHANGE_PROCESS =3D 0x92, IOMMUFD_CMD_VEVENTQ_ALLOC =3D 0x93, IOMMUFD_CMD_HW_QUEUE_ALLOC =3D 0x94, + IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA =3D 0x95, }; =20 /** @@ -219,6 +220,32 @@ struct iommu_ioas_map { }; #define IOMMU_IOAS_MAP _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP) =20 +/** + * struct iommu_ioas_noiommu_get_pa - ioctl(IOMMU_IOAS_NOIOMMU_GET_PA) + * @size: sizeof(struct iommu_ioas_noiommu_get_pa) + * @flags: Reserved, must be 0 for now + * @ioas_id: IOAS ID to query IOVA to PA mapping from + * @__reserved: Must be 0 + * @iova: IOVA to query + * @length: On input, maximum number of bytes to scan for contiguity (0 me= ans + * no limit). On output, actual number of contiguous bytes starti= ng + * from out_phys. + * @out_phys: Output physical address the IOVA maps to + * + * Query the physical address backing an IOVA range. The beginning of the + * range must be mapped already. 
For noiommu devices doing unsafe DMA only. + */ +struct iommu_ioas_noiommu_get_pa { + __u32 size; + __u32 flags; + __u32 ioas_id; + __u32 __reserved; + __aligned_u64 iova; + __aligned_u64 length; + __aligned_u64 out_phys; +}; +#define IOMMU_IOAS_NOIOMMU_GET_PA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_NOIOM= MU_GET_PA) + /** * struct iommu_ioas_map_file - ioctl(IOMMU_IOAS_MAP_FILE) * @size: sizeof(struct iommu_ioas_map_file) --=20 2.43.0 From nobody Mon Jun 8 06:35:45 2026
From: Jacob Pan To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev" , Jason Gunthorpe , Alex Williamson , Joerg Roedel , Mostafa Saleh , David Matlack , Robin Murphy , Nicolin Chen , "Tian, Kevin" , Yi Liu , Baolu Lu Cc: Saurabh Sengar , skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon , Jacob Pan Subject: [PATCH v8 5/6] vfio: Enable cdev noiommu mode under iommufd Date: Wed, 3 Jun 2026 15:02:10 -0700 Message-ID: <20260603220211.2584590-6-jacob.pan@linux.microsoft.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260603220211.2584590-1-jacob.pan@linux.microsoft.com> References: <20260603220211.2584590-1-jacob.pan@linux.microsoft.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Now that devices under noiommu mode can bind with IOMMUFD and perform IOAS operations, lift restrictions on cdev from VFIO side. 
Use cases are documented in Documentation/driver-api/vfio.rst Reviewed-by: Kevin Tian Signed-off-by: Jacob Pan --- v8: - Fix warning message (Kevin) v7: - Avoid treating emulated device as noiommu device (Sashiko) - Keep platforms w/ GENERIC_ATOMIC64 to use VFIO group noiommu as before (Sashiko) - Restore order of group & cdev init for noiommu (Yi) - Consolidate noiommu helper for cdev & group (Yi) v6: - Revert back to unified VFIO_NOIOMMU Kconfig for both cdev and group. Use Kconfig dependency to restrict usages and avoid null group checks. (Alex & Yi) - Add CAP_SYS_RAWIO checks for cdev open to maintain security parity with the group noiommu path. (Alex) v5: - Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU and its dependencies - Add comment to explain vfio_noiommu conditional definition (Alex) - Removed early return for group noiommu in bind/unbind - Use consistent wording referring to VFIO noiommu mode (Kevin) - Update unsafe_noiommu Kconfig help text (Kevin) - Change dev_warn to dev_info for noiommu enabling msg (Kevin) v4: - Remove early return in iommufd_bind for noiommu (Alex) v3: - Consolidate into fewer patches v2: - removed unnecessary device->noiommu set in iommufd_vfio_compat_ioas_get_id() --- drivers/vfio/Kconfig | 7 ++++--- drivers/vfio/device_cdev.c | 3 +++ drivers/vfio/iommufd.c | 12 ++++++++---- drivers/vfio/vfio.h | 23 +++++++++-------------- drivers/vfio/vfio_main.c | 26 +++++++++++++++++++++++++- include/linux/vfio.h | 1 + 6 files changed, 50 insertions(+), 22 deletions(-) diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index ceae52fd7586..b9d6e1c22aed 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV The VFIO device cdev is another way for userspace to get device access. Userspace gets device fd by opening device cdev under /dev/vfio/devices/vfioX, and then bind the device fd with an iommufd - to set up secure DMA context for device access. 
This interface does - not support noiommu. + to set up secure DMA context for device access. =20 If you don't know what to do here, say N. =20 @@ -62,7 +61,9 @@ endif =20 config VFIO_NOIOMMU bool "VFIO No-IOMMU support" - depends on VFIO_GROUP + depends on VFIO_GROUP || (VFIO_DEVICE_CDEV && !GENERIC_ATOMIC64) + depends on !VFIO_GROUP || VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER + select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV && !GENERIC_ATOMIC64 help VFIO is built on the ability to isolate devices using the IOMMU. Only with an IOMMU can userspace access to DMA capable devices be diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c index 54abf312cf04..5ca14979b56e 100644 --- a/drivers/vfio/device_cdev.c +++ b/drivers/vfio/device_cdev.c @@ -27,6 +27,9 @@ int vfio_device_fops_cdev_open(struct inode *inode, struc= t file *filep) struct vfio_device_file *df; int ret; =20 + if (vfio_device_is_noiommu(device) && !capable(CAP_SYS_RAWIO)) + return -EPERM; + /* Paired with the put in vfio_device_fops_release() */ if (!vfio_device_try_get_registration(device)) return -ENODEV; diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c index a38d262c6028..e9893d34d07b 100644 --- a/drivers/vfio/iommufd.c +++ b/drivers/vfio/iommufd.c @@ -25,8 +25,8 @@ int vfio_df_iommufd_bind(struct vfio_device_file *df) =20 lockdep_assert_held(&vdev->dev_set->lock); =20 - /* Returns 0 to permit device opening under noiommu mode */ - if (vfio_device_is_noiommu(vdev)) + /* Group noiommu via iommufd compat needs no device binding */ + if (df->group && vfio_device_is_noiommu(vdev)) return 0; =20 return vdev->ops->bind_iommufd(vdev, ictx, &df->devid); @@ -40,7 +40,11 @@ int vfio_iommufd_compat_attach_ioas(struct vfio_device *= vdev, =20 lockdep_assert_held(&vdev->dev_set->lock); =20 - /* compat noiommu does not need to do ioas attach */ + /* + * Compat noiommu does not need to do ioas attach. 
This helper is + * only called from the legacy group/iommufd compat path, so no + * explicit df->group check is needed. + */ if (vfio_device_is_noiommu(vdev)) return 0; =20 @@ -58,7 +62,7 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df) =20 lockdep_assert_held(&vdev->dev_set->lock); =20 - if (vfio_device_is_noiommu(vdev)) + if (df->group && vfio_device_is_noiommu(vdev)) return; =20 if (vdev->ops->unbind_iommufd) diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index e4b72e79b7e3..7728bc99b63d 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -112,11 +112,6 @@ bool vfio_device_has_container(struct vfio_device *dev= ice); int __init vfio_group_init(void); void vfio_group_cleanup(void); =20 -static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) -{ - return IS_ENABLED(CONFIG_VFIO_NOIOMMU) && - vdev->group->type =3D=3D VFIO_NO_IOMMU; -} #else struct vfio_group; =20 @@ -188,11 +183,17 @@ static inline void vfio_group_cleanup(void) { } =20 +#endif /* CONFIG_VFIO_GROUP */ + static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) { - return false; +#if IS_ENABLED(CONFIG_VFIO_GROUP) + if (vdev->group && vdev->group->type =3D=3D VFIO_NO_IOMMU) + return true; +#endif + + return IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && vdev->noiommu; } -#endif /* CONFIG_VFIO_GROUP */ =20 #if IS_ENABLED(CONFIG_VFIO_CONTAINER) /** @@ -358,19 +359,13 @@ void vfio_init_device_cdev(struct vfio_device *device= ); =20 static inline int vfio_device_add(struct vfio_device *device) { - /* cdev does not support noiommu device */ - if (vfio_device_is_noiommu(device)) - return device_add(&device->device); vfio_init_device_cdev(device); return cdev_device_add(&device->cdev, &device->device); } =20 static inline void vfio_device_del(struct vfio_device *device) { - if (vfio_device_is_noiommu(device)) - device_del(&device->device); - else - cdev_device_del(&device->cdev, &device->device); + cdev_device_del(&device->cdev, &device->device); } =20 int 
vfio_device_fops_cdev_open(struct inode *inode, struct file *filep); diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 6222376ab6ab..fc8a50941aac 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -321,6 +321,24 @@ static int vfio_init_device(struct vfio_device *device= , struct device *dev, return ret; } =20 +static int vfio_device_set_noiommu_and_name(struct vfio_device *device, en= um vfio_group_type type) +{ + if (IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && vfio_noiommu && + !device->dev->iommu && type =3D=3D VFIO_IOMMU) + device->noiommu =3D true; + + /* + * device->noiommu records no-IOMMU support for the standalone cdev + * interface. VFIO_NOIOMMU enables both group and cdev no-IOMMU; when + * cdev no-IOMMU is available, device->noiommu is set before + * vfio_device_set_group(), so the cdev is named noiommu-vfio%d up + * front. There cannot be a combination of a plain vfio%d cdev name and + * a no-IOMMU group because VFIO_NOIOMMU selects IOMMUFD_NOIOMMU. + */ + return dev_set_name(&device->device, "%svfio%d", + device->noiommu ? "noiommu-" : "", device->index); +} + static int __vfio_register_dev(struct vfio_device *device, enum vfio_group_type type) { @@ -340,7 +358,7 @@ static int __vfio_register_dev(struct vfio_device *devi= ce, if (!device->dev_set) vfio_assign_device_set(device, device); =20 - ret =3D dev_set_name(&device->device, "vfio%d", device->index); + ret =3D vfio_device_set_noiommu_and_name(device, type); if (ret) return ret; =20 @@ -348,6 +366,12 @@ static int __vfio_register_dev(struct vfio_device *dev= ice, if (ret) return ret; =20 + if (vfio_device_is_noiommu(device) && IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU))= { + add_taint(TAINT_USER, LOCKDEP_STILL_OK); + dev_warn(device->dev, + "Adding kernel taint for vfio-noiommu cdev\n"); + } + /* * VFIO always sets IOMMU_CACHE because we offer no way for userspace to * restore cache coherency. 
It has to be checked here because it is only diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 31b826efba00..45f08986359e 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -74,6 +74,7 @@ struct vfio_device { u8 iommufd_attached:1; #endif u8 cdev_opened:1; + u8 noiommu:1; /* * debug_root is a static property of the vfio_device * which must be set prior to registering the vfio_device. --=20 2.43.0 From nobody Mon Jun 8 06:35:45 2026
From: Jacob Pan To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev" , Jason Gunthorpe , Alex Williamson , Joerg Roedel , Mostafa Saleh , David Matlack , Robin Murphy , Nicolin Chen , "Tian, Kevin" , Yi Liu , Baolu Lu Cc: Saurabh Sengar , skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon , Jacob Pan Subject: [PATCH v8 6/6] Documentation: Update VFIO NOIOMMU mode Date: Wed, 3 Jun 2026 15:02:11 -0700 Message-ID: <20260603220211.2584590-7-jacob.pan@linux.microsoft.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260603220211.2584590-1-jacob.pan@linux.microsoft.com> References: <20260603220211.2584590-1-jacob.pan@linux.microsoft.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Document the NOIOMMU mode with newly added cdev support under iommufd. Cc: Jonathan Corbet Reviewed-by: Yi Liu Reviewed-by: Kevin Tian Signed-off-by: Jacob Pan --- V8: - Remove reference about self test. 
v7: - Added Kconfig matrix v6: - Generalize device node names (noiommu-vfioX, noiommu-Y) in the tree example (Yi) - Clarify table column descriptions for Yes/No meanings (Yi) --- Documentation/driver-api/vfio.rst | 81 ++++++++++++++++++++++++++++++- 1 file changed, 79 insertions(+), 2 deletions(-) diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/v= fio.rst index 2a21a42c9386..bf0632a43bc6 100644 --- a/Documentation/driver-api/vfio.rst +++ b/Documentation/driver-api/vfio.rst @@ -275,8 +275,6 @@ in a VFIO group. With CONFIG_VFIO_DEVICE_CDEV=3Dy the user can now acquire a device fd by directly opening a character device /dev/vfio/devices/vfioX where "X" is the number allocated uniquely by VFIO for registered devices. -cdev interface does not support noiommu devices, so user should use -the legacy group interface if noiommu is wanted. =20 The cdev only works with IOMMUFD. Both VFIO drivers and applications must adapt to the new cdev security model which requires using @@ -370,6 +368,85 @@ IOMMUFD IOAS/HWPT to enable userspace DMA:: =20 /* Other device operations as stated in "VFIO Usage Example" */ =20 +VFIO NOIOMMU mode +--------------------------------------------------------------------------= ----- +VFIO also supports a no-IOMMU mode, intended for usages where unsafe DMA c= an +be performed by userspace drivers w/o physical IOMMU protection. This mode +is controlled by the parameter: + +/sys/module/vfio/parameters/enable_unsafe_noiommu_mode + +Upon enabling this mode, with an assigned device, the user will be present= ed +with a VFIO group and device file, e.g.:: + + /dev/vfio/ + |-- devices + | `-- noiommu-vfioX /* VFIO device cdev */ + |-- noiommu-Y /* VFIO group */ + `-- vfio + +The capabilities vary depending on the device programming interface and ke= rnel +configuration used. 
The following table summarizes the differences ("Yes" = means +the UAPI is accessible and functional in noiommu mode, "No" means the UAPI= is +not supported): + ++-------------------+---------------------+----------------------+ +| Feature | VFIO group | VFIO device cdev | ++=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ +| VFIO device UAPI | Yes | Yes | ++-------------------+---------------------+----------------------+ +| VFIO container | No | No | ++-------------------+---------------------+----------------------+ +| IOMMUFD IOAS | No | Yes* | ++-------------------+---------------------+----------------------+ + +Note that the VFIO container case includes IOMMUFD provided VFIO compatibi= lity +interfaces when either CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAI= NER is +enabled. + +* IOMMUFD UAPI is available for VFIO device cdev to pin and map user memor= y with + the ability to retrieve physical addresses for DMA command submission. + +Kconfig Support Matrix +^^^^^^^^^^^^^^^^^^^^^^ + +The visibility of CONFIG_VFIO_NOIOMMU depends on the combination of +CONFIG_VFIO_GROUP, CONFIG_VFIO_DEVICE_CDEV, and whether a container backend +(CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER) is configured. T= he +Kconfig dependencies enforce the following constraints: + +- At least one access path (group or cdev) must be available. +- If VFIO_GROUP is enabled, a container backend is required; otherwise the + group node would be unusable in noiommu mode. 
+ +The resulting support matrix: + ++------+-------+-----------+------+---------+---------------------------+ +| Case | GROUP | Container | CDEV | NOIOMMU | Notes | ++=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D+=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ +| 1 | y | y | n | yes | Group noiommu works | ++------+-------+-----------+------+---------+---------------------------+ +| 2 | y | n | n | no | Blocked - no container | ++------+-------+-----------+------+---------+---------------------------+ +| 3 | y | y | y | yes | Both paths work | ++------+-------+-----------+------+---------+---------------------------+ +| 4 | y | n | y | no | Blocked - no container | ++------+-------+-----------+------+---------+---------------------------+ +| 5 | n | - | y | yes | Cdev-only works | ++------+-------+-----------+------+---------+---------------------------+ +| 6 | n | - | n | no | No access path | ++------+-------+-----------+------+---------+---------------------------+ + +Container =3D CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER (eith= er +suffices). Case 4 is intentionally blocked: allowing NOIOMMU with GROUP +enabled but no container would create unusable group nodes. Users who want +cdev-only noiommu should set CONFIG_VFIO_GROUP=3Dn (case 5). + +A new IOMMUFD ioctl IOMMU_IOAS_NOIOMMU_GET_PA is added to retrieve the phy= sical +address for a given IOVA. Although there is no physical DMA remapping hard= ware, +IOMMU_IOAS_MAP_FIXED_IOVA is still used to establish IOVA-to-PA mappings i= n the +software page table for later IOMMU_IOAS_NOIOMMU_GET_PA lookups. + VFIO User API --------------------------------------------------------------------------= ----- =20 --=20 2.43.0
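
For readers following the series, the lookup loop that the new ioctl
enables can be sketched in userspace as below. This is a minimal sketch
and not part of the patches: the struct layout is copied from the uAPI
hunk in patch 4/6, while get_pa_fn, struct pa_segment,
walk_iova_range() and mock_get_pa() are hypothetical names introduced
only for illustration. The mock stands in for the kernel so the loop
can be exercised without a device; it imitates the documented length
semantics (input is a scan limit, 0 means no limit; output is the
contiguous byte count).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Layout copied from the uAPI hunk in patch 4/6 (not in released headers). */
struct iommu_ioas_noiommu_get_pa {
	uint32_t size;
	uint32_t flags;
	uint32_t ioas_id;
	uint32_t __reserved;
	uint64_t iova;
	uint64_t length;
	uint64_t out_phys;
};

/* Hypothetical callback: returns 0 or a negative errno, fills cmd outputs. */
typedef int (*get_pa_fn)(struct iommu_ioas_noiommu_get_pa *cmd, void *ctx);

/* Hypothetical result record: one physically contiguous run. */
struct pa_segment {
	uint64_t iova;
	uint64_t phys;
	uint64_t len;
};

/* Split [iova, iova + len) into contiguous runs; returns the run count. */
static int walk_iova_range(uint32_t ioas_id, uint64_t iova, uint64_t len,
			   get_pa_fn get_pa, void *ctx,
			   struct pa_segment *segs, int max_segs)
{
	int n = 0;

	assert(segs && get_pa);
	while (len) {
		struct iommu_ioas_noiommu_get_pa cmd;
		int rc;

		if (n == max_segs)
			return -ENOSPC;
		memset(&cmd, 0, sizeof(cmd));
		cmd.size = sizeof(cmd);
		cmd.ioas_id = ioas_id;
		cmd.iova = iova;
		cmd.length = len;	/* limit the scan to what is still needed */
		rc = get_pa(&cmd, ctx);
		if (rc)
			return rc;
		if (!cmd.length || cmd.length > len)
			return -EIO;	/* defensive: lookup broke its contract */
		segs[n].iova = iova;
		segs[n].phys = cmd.out_phys;
		segs[n].len = cmd.length;
		n++;
		iova += cmd.length;
		len -= cmd.length;
	}
	return n;
}

/* Mock lookup: three 4 KiB pages at IOVA 0x10000, the third discontiguous. */
#define MOCK_PAGE 4096ULL
static const uint64_t mock_iova_base = 0x10000;
static const uint64_t mock_phys[] = { 0x200000, 0x201000, 0x500000 };

static int mock_get_pa(struct iommu_ioas_noiommu_get_pa *cmd, void *ctx)
{
	uint64_t npages = sizeof(mock_phys) / sizeof(mock_phys[0]);
	uint64_t idx, off, run;

	(void)ctx;
	if (cmd->iova < mock_iova_base)
		return -ENOENT;
	idx = (cmd->iova - mock_iova_base) / MOCK_PAGE;
	off = cmd->iova % MOCK_PAGE;
	if (idx >= npages)
		return -ENOENT;
	cmd->out_phys = mock_phys[idx] + off;
	run = MOCK_PAGE - off;
	/* Extend the run while the next page is contiguous and under the limit. */
	while (idx + 1 < npages &&
	       mock_phys[idx + 1] == mock_phys[idx] + MOCK_PAGE &&
	       (!cmd->length || run < cmd->length)) {
		run += MOCK_PAGE;
		idx++;
	}
	if (cmd->length && run > cmd->length)
		run = cmd->length;
	cmd->length = run;
	return 0;
}
```

In a real program the callback would presumably wrap
ioctl(iommufd, IOMMU_IOAS_NOIOMMU_GET_PA, cmd) and return -errno on
failure, with the caller holding CAP_SYS_RAWIO. Passing the remaining
length as the scan limit keeps each call from scanning further than the
caller needs.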