From nobody Sun May 24 20:33:26 2026
From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev", Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy, Nicolin Chen, "Tian, Kevin", Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon, Jacob Pan
Subject: [PATCH v6 1/7] iommufd: Support a HWPT without an iommu driver for noiommu
Date: Thu, 21 May 2026 15:11:48 -0700
Message-ID: <20260521221155.1375144-2-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260521221155.1375144-1-jacob.pan@linux.microsoft.com>
References: <20260521221155.1375144-1-jacob.pan@linux.microsoft.com>

From: Jason Gunthorpe

Implement just enough of an iommu driver to slot in under dev_iommu_ops(): iommufd can call domain_alloc_paging_flags(), and everything else fails. This allows a HWPT to be created explicitly under an IOAS for a device that has no IOMMU.

A new Kconfig option, IOMMUFD_NOIOMMU, distinguishes this mode from the VFIO group/container-based noiommu mode.
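For reviewers, the dispatch that get_iommu_ops() adds in this patch can be modeled in plain userspace C. This is only a sketch under stated assumptions: every model_* name is hypothetical, the booleans stand in for idev->igroup->group and idev->dev->iommu, and nothing here is kernel API. It shows the decision order, and that the noiommu ops table fills in only the paging-domain allocator, leaving every other op NULL (unsupported).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct model_ops {
	/* Returns 0 on success or a negative errno. */
	int (*domain_alloc_paging_flags)(uint32_t flags, const void *user_data);
	int (*attach_dev)(void); /* any other op; NULL means "unsupported" */
};

struct model_device {
	bool has_iommu_group;           /* stands in for idev->igroup->group */
	bool has_iommu_driver;          /* stands in for idev->dev->iommu */
	const struct model_ops *hw_ops; /* stands in for dev_iommu_ops() */
};

/* Like the noiommu allocator: no flags and no driver data are accepted. */
static int model_noiommu_alloc(uint32_t flags, const void *user_data)
{
	if (flags || user_data)
		return -EOPNOTSUPP;
	return 0; /* the real code builds a SW-only AMDV1 page table here */
}

/* Only one op is filled in; every other op stays NULL. */
static const struct model_ops model_noiommu_ops = {
	.domain_alloc_paging_flags = model_noiommu_alloc,
};

/* Same decision order as get_iommu_ops() in the patch. */
static const struct model_ops *model_get_ops(const struct model_device *d,
					     bool noiommu_enabled)
{
	assert(d != NULL);
	if (noiommu_enabled && !d->has_iommu_group)
		return &model_noiommu_ops;
	if (!d->has_iommu_driver)
		return NULL; /* the patch turns this into ERR_PTR(-ENODEV) */
	return d->hw_ops;
}
```

A device with no group gets the stub table, so HWPT allocation can succeed while any other operation finds a NULL op and is rejected.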
Reviewed-by: Lu Baolu
Reviewed-by: Samiullah Khawaja
Signed-off-by: Jason Gunthorpe
Signed-off-by: Jacob Pan
---
v6: (Yi)
- Sort includes alphabetically (iommu.h after generic_pt/iommu.h)
- Fix comment: s/mock page table/SW-only page table/ to avoid confusion
  with selftest mock
- Rewrite noiommu_amdv1_ops comment: explain why AMDV1 format is chosen
  (multi-page size options), remove references to group-container mode
  distinction
v5:
- Use the new IOMMUFD_NOIOMMU Kconfig instead of VFIO_NOIOMMU
- Use consistent wording referring to VFIO noiommu mode (Kevin)
- Copyright date fix (Kevin)
v4:
- Make iommufd_noiommu_ops const
v3:
- Add comment to explain the design difference over the legacy noiommu
  VFIO code.
---
 drivers/iommu/iommufd/Kconfig           | 12 +++
 drivers/iommu/iommufd/Makefile          |  1 +
 drivers/iommu/iommufd/hw_pagetable.c    | 15 +++-
 drivers/iommu/iommufd/hwpt_noiommu.c    | 97 +++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |  2 +
 5 files changed, 125 insertions(+), 2 deletions(-)
 create mode 100644 drivers/iommu/iommufd/hwpt_noiommu.c

diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
index 455bac0351f2..6c3bea83631b 100644
--- a/drivers/iommu/iommufd/Kconfig
+++ b/drivers/iommu/iommufd/Kconfig
@@ -16,6 +16,18 @@ config IOMMUFD
 	  If you don't know what to do here, say N.
 
 if IOMMUFD
+config IOMMUFD_NOIOMMU
+	bool
+	depends on !GENERIC_ATOMIC64 # IOMMU_PT_AMDV1 requires cmpxchg64
+	select GENERIC_PT
+	select IOMMU_PT
+	select IOMMU_PT_AMDV1
+	help
+	  Provides a SW-only IO page table for devices without hardware
+	  IOMMU backing. This uses the AMDV1 page table format for
+	  IOVA-to-PA lookups only, not for hardware DMA translation.
+	  To be selected by VFIO_NOIOMMU when VFIO_DEVICE_CDEV is enabled.
+
 config IOMMUFD_VFIO_CONTAINER
 	bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
 	depends on VFIO_GROUP && !VFIO_CONTAINER
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 71d692c9a8f4..67207914bb6e 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -10,6 +10,7 @@ iommufd-y := \
 	vfio_compat.o \
 	viommu.o
 
+iommufd-$(CONFIG_IOMMUFD_NOIOMMU) += hwpt_noiommu.o
 iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
 
 obj-$(CONFIG_IOMMUFD) += iommufd.o
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index fe789c2dc0c9..0ae14cd3fc72 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,6 +8,15 @@
 #include "../iommu-priv.h"
 #include "iommufd_private.h"
 
+static const struct iommu_ops *get_iommu_ops(struct iommufd_device *idev)
+{
+	if (IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->igroup->group)
+		return &iommufd_noiommu_ops;
+	if (WARN_ON_ONCE(!idev->dev->iommu))
+		return NULL;
+	return dev_iommu_ops(idev->dev);
+}
+
 static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
 {
 	if (hwpt->domain)
@@ -114,11 +123,13 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 				IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
 				IOMMU_HWPT_FAULT_ID_VALID |
 				IOMMU_HWPT_ALLOC_PASID;
-	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+	const struct iommu_ops *ops = get_iommu_ops(idev);
 	struct iommufd_hwpt_paging *hwpt_paging;
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
 
+	if (!ops)
+		return ERR_PTR(-ENODEV);
 	lockdep_assert_held(&ioas->mutex);
 
 	if ((flags || user_data) && !ops->domain_alloc_paging_flags)
@@ -229,7 +240,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 			  struct iommufd_device *idev, u32 flags,
 			  const struct iommu_user_data *user_data)
 {
-	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+	const struct iommu_ops *ops = get_iommu_ops(idev);
 	struct iommufd_hwpt_nested *hwpt_nested;
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
diff --git a/drivers/iommu/iommufd/hwpt_noiommu.c b/drivers/iommu/iommufd/hwpt_noiommu.c
new file mode 100644
index 000000000000..62a44f4b9164
--- /dev/null
+++ b/drivers/iommu/iommufd/hwpt_noiommu.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES
+ */
+#include
+#include
+#include "iommufd_private.h"
+
+static const struct iommu_domain_ops noiommu_amdv1_ops;
+
+struct noiommu_domain {
+	union {
+		struct iommu_domain domain;
+		struct pt_iommu_amdv1 amdv1;
+	};
+	spinlock_t lock;
+};
+PT_IOMMU_CHECK_DOMAIN(struct noiommu_domain, amdv1.iommu, domain);
+
+static void noiommu_change_top(struct pt_iommu *iommu_table,
+			       phys_addr_t top_paddr, unsigned int top_level)
+{
+}
+
+static spinlock_t *noiommu_get_top_lock(struct pt_iommu *iommupt)
+{
+	struct noiommu_domain *domain =
+		container_of(iommupt, struct noiommu_domain, amdv1.iommu);
+
+	return &domain->lock;
+}
+
+static const struct pt_iommu_driver_ops noiommu_driver_ops = {
+	.get_top_lock = noiommu_get_top_lock,
+	.change_top = noiommu_change_top,
+};
+
+static struct iommu_domain *
+noiommu_alloc_paging_flags(struct device *dev, u32 flags,
+			   const struct iommu_user_data *user_data)
+{
+	struct pt_iommu_amdv1_cfg cfg = {};
+	struct noiommu_domain *dom;
+	int rc;
+
+	if (flags || user_data)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	cfg.common.hw_max_vasz_lg2 = 64;
+	cfg.common.hw_max_oasz_lg2 = 52;
+	cfg.starting_level = 2;
+	cfg.common.features =
+		(BIT(PT_FEAT_DYNAMIC_TOP) | BIT(PT_FEAT_AMDV1_ENCRYPT_TABLES) |
+		 BIT(PT_FEAT_AMDV1_FORCE_COHERENCE));
+
+	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
+	if (!dom)
+		return ERR_PTR(-ENOMEM);
+
+	spin_lock_init(&dom->lock);
+	dom->amdv1.iommu.nid = NUMA_NO_NODE;
+	dom->amdv1.iommu.driver_ops = &noiommu_driver_ops;
+	dom->domain.ops = &noiommu_amdv1_ops;
+
+	/* Use SW-only page table which is based on AMDV1 */
+	rc = pt_iommu_amdv1_init(&dom->amdv1, &cfg, GFP_KERNEL);
+	if (rc) {
+		kfree(dom);
+		return ERR_PTR(rc);
+	}
+
+	return &dom->domain;
+}
+
+static void noiommu_domain_free(struct iommu_domain *iommu_domain)
+{
+	struct noiommu_domain *domain =
+		container_of(iommu_domain, struct noiommu_domain, domain);
+
+	pt_iommu_deinit(&domain->amdv1.iommu);
+	kfree(domain);
+}
+
+/*
+ * Domain ops for iommufd no-IOMMU mode. Uses AMDV1 format as a
+ * SW-only IOPT because it has the best multi-page size options
+ * of all the formats. IOVAs serve only for IOVA-to-PA lookups,
+ * not for hardware DMA translation.
+ */
+static const struct iommu_domain_ops noiommu_amdv1_ops = {
+	IOMMU_PT_DOMAIN_OPS(amdv1),
+	.free = noiommu_domain_free,
+};
+
+const struct iommu_ops iommufd_noiommu_ops = {
+	.domain_alloc_paging_flags = noiommu_alloc_paging_flags,
+};
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 6ac1965199e9..2682b5baa6e9 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -464,6 +464,8 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
 	refcount_dec(&hwpt->obj.users);
 }
 
+extern const struct iommu_ops iommufd_noiommu_ops;
+
 struct iommufd_attach;
 
 struct iommufd_group {
-- 
2.43.0

From nobody Sun May 24 20:33:26 2026
From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev", Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy, Nicolin Chen, "Tian, Kevin", Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon, Jacob Pan
Subject: [PATCH v6 2/7] iommufd: Move igroup allocation to a function
Date: Thu, 21 May 2026 15:11:49 -0700
Message-ID: <20260521221155.1375144-3-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260521221155.1375144-1-jacob.pan@linux.microsoft.com>
References: <20260521221155.1375144-1-jacob.pan@linux.microsoft.com>

From: Jason Gunthorpe

Move the igroup allocation into its own function so that it can be reused by the next patch, which allows binding to a noiommu device.

Reviewed-by: Samiullah Khawaja
Reviewed-by: Yi Liu
Reviewed-by: Kevin Tian
Signed-off-by: Jason Gunthorpe
Signed-off-by: Jacob Pan
Reviewed-by: Lu Baolu
---
v5:
- Add NULL group to the error handling path of iommufd_group_setup_msi()
v3:
- New patch
---
 drivers/iommu/iommufd/device.c | 43 +++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 170a7005f0bc..d03076fcf3c2 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -56,6 +56,30 @@ static bool iommufd_group_try_get(struct iommufd_group *igroup,
 	return kref_get_unless_zero(&igroup->ref);
 }
 
+static struct iommufd_group *iommufd_alloc_group(struct iommufd_ctx *ictx,
+						 struct iommu_group *group)
+{
+	struct iommufd_group *new_igroup;
+
+	new_igroup = kzalloc(sizeof(*new_igroup), GFP_KERNEL);
+	if (!new_igroup)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&new_igroup->ref);
+	mutex_init(&new_igroup->lock);
+	xa_init(&new_igroup->pasid_attach);
+	new_igroup->sw_msi_start = PHYS_ADDR_MAX;
+	/* group reference moves into new_igroup */
+	new_igroup->group = group;
+
+	/*
+	 * The ictx is not additionally refcounted here because all objects using
+	 * an igroup must put it before their destroy completes.
+	 */
+	new_igroup->ictx = ictx;
+	return new_igroup;
+}
+
 /*
  * iommufd needs to store some more data for each iommu_group, we keep a
  * parallel xarray indexed by iommu_group id to hold this instead of putting it
@@ -87,25 +111,12 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
 	}
 	xa_unlock(&ictx->groups);
 
-	new_igroup = kzalloc_obj(*new_igroup);
-	if (!new_igroup) {
+	new_igroup = iommufd_alloc_group(ictx, group);
+	if (IS_ERR(new_igroup)) {
 		iommu_group_put(group);
-		return ERR_PTR(-ENOMEM);
+		return new_igroup;
 	}
 
-	kref_init(&new_igroup->ref);
-	mutex_init(&new_igroup->lock);
-	xa_init(&new_igroup->pasid_attach);
-	new_igroup->sw_msi_start = PHYS_ADDR_MAX;
-	/* group reference moves into new_igroup */
-	new_igroup->group = group;
-
-	/*
-	 * The ictx is not additionally refcounted here becase all objects using
-	 * an igroup must put it before their destroy completes.
-	 */
-	new_igroup->ictx = ictx;
-
 	/*
 	 * We dropped the lock so igroup is invalid. NULL is a safe and likely
 	 * value to assume for the xa_cmpxchg algorithm.
-- 
2.43.0

From nobody Sun May 24 20:33:26 2026
From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev", Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy, Nicolin Chen, "Tian, Kevin", Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon, Jacob Pan
Subject: [PATCH v6 3/7] iommufd: Allow binding to a noiommu device
Date: Thu, 21 May 2026 15:11:50 -0700
Message-ID: <20260521221155.1375144-4-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260521221155.1375144-1-jacob.pan@linux.microsoft.com>
References: <20260521221155.1375144-1-jacob.pan@linux.microsoft.com>

From: Jason Gunthorpe

Allow iommufd to bind devices that have no IOMMU (noiommu mode). Such a device gets a dummy iommufd group with no backing iommu_group, and the HWPT attach, replace and detach paths become no-ops for it. This lets noiommu devices use the same iommufd API as IOMMU-capable devices.
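A rough userspace model of the bind/unbind lifetime described above may help when reviewing the error paths. All model_* names are hypothetical simplifications, not iommufd code: in the patch the dummy igroup comes from iommufd_alloc_group(ictx, NULL), iommufd_group_release() only touches igroup->group when it is non-NULL, and iommufd_hwpt_attach_device() returns early for noiommu devices. The sketch only mirrors those rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for iommu_group / iommufd_group / iommufd_device. */
struct model_group { int id; int refs; };

struct model_igroup {
	struct model_group *group; /* NULL marks the noiommu dummy igroup */
};

struct model_idev {
	bool has_iommu;              /* stands in for dev->iommu != NULL */
	struct model_igroup *igroup; /* NULL until bind succeeds */
	bool dma_owner_claimed;
};

static bool model_is_noiommu(const struct model_idev *idev)
{
	return !idev->has_iommu;
}

static int model_bind(struct model_idev *idev, struct model_group *real_group)
{
	struct model_igroup *ig = calloc(1, sizeof(*ig));

	if (!ig)
		return -1;
	if (!model_is_noiommu(idev)) {
		assert(real_group != NULL);
		real_group->refs++;          /* group reference moves into ig */
		ig->group = real_group;
		idev->dma_owner_claimed = true;
	}
	/* noiommu: ig->group stays NULL and no DMA ownership is claimed */
	idev->igroup = ig;
	return 0;
}

/* Attach is skipped entirely for noiommu devices. */
static int model_attach(const struct model_idev *idev)
{
	if (model_is_noiommu(idev))
		return 0;
	return idev->igroup->group ? 0 : -1;
}

static void model_unbind(struct model_idev *idev)
{
	if (!idev->igroup)
		return; /* partially initialized: nothing to undo */
	if (!model_is_noiommu(idev))
		idev->dma_owner_claimed = false;
	if (idev->igroup->group) /* only a real group holds a reference */
		idev->igroup->group->refs--;
	free(idev->igroup);
	idev->igroup = NULL;
}
```

The key design point carried over from the patch is that a NULL group pointer, rather than a separate flag, marks the dummy igroup, so teardown needs a single NULL check.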
Reviewed-by: Yi Liu
Signed-off-by: Jason Gunthorpe
Signed-off-by: Jacob Pan
Reviewed-by: Lu Baolu
---
v6:
- Expand iommufd_device_is_noiommu() comment to explain why dev->iommu
  is checked instead of device_iommu_mapped() (Yi & Baolu)
- Simplify bind error handling by factoring out duplicated rc check (Yi)
v5:
- simplify logic and rename iommufd_device_is_noiommu (Kevin, Yi)
- use a helper iommufd_bind_noiommu instead of open coding (Kevin)
- move IOMMU cap check under iommufd_bind_iommu() (Yi)
- reword comments for partial init (Yi)
- misc minor clean up
v4:
- Update the description of the module parameter (Alex)
v3:
- Consolidate into fewer patches

fix baolu comment

Signed-off-by: Jacob Pan
---
 drivers/iommu/iommufd/device.c | 149 ++++++++++++++++++++++++---------
 1 file changed, 110 insertions(+), 39 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index d03076fcf3c2..ff7f7bff5058 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -23,6 +23,19 @@ struct iommufd_attach {
 	struct xarray device_array;
 };
 
+/*
+ * Detect a noiommu device for the cdev path. We check dev->iommu rather than
+ * using device_iommu_mapped() (which checks dev->iommu_group) because when
+ * both group and cdev interfaces coexist, the group path assigns a fake
+ * noiommu iommu_group to the device. That would cause device_iommu_mapped()
+ * to return true and hide the noiommu case from the cdev path. dev->iommu is
+ * reliably NULL when no IOMMU driver is managing the device.
+ */
+static bool iommufd_device_is_noiommu(struct iommufd_device *idev)
+{
+	return IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->dev->iommu;
+}
+
 static void iommufd_group_release(struct kref *kref)
 {
 	struct iommufd_group *igroup =
@@ -30,9 +43,11 @@ static void iommufd_group_release(struct kref *kref)
 
 	WARN_ON(!xa_empty(&igroup->pasid_attach));
 
-	xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
-		   NULL, GFP_KERNEL);
-	iommu_group_put(igroup->group);
+	if (igroup->group) {
+		xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group),
+			   igroup, NULL, GFP_KERNEL);
+		iommu_group_put(igroup->group);
+	}
 	mutex_destroy(&igroup->lock);
 	kfree(igroup);
 }
@@ -204,32 +219,20 @@ void iommufd_device_destroy(struct iommufd_object *obj)
 	struct iommufd_device *idev =
 		container_of(obj, struct iommufd_device, obj);
 
-	iommu_device_release_dma_owner(idev->dev);
+	/* igroup is NULL when destroy called during bind error cleanup */
+	if (!idev->igroup)
+		return;
+	if (!iommufd_device_is_noiommu(idev))
+		iommu_device_release_dma_owner(idev->dev);
 	iommufd_put_group(idev->igroup);
 	if (!iommufd_selftest_is_mock_dev(idev->dev))
 		iommufd_ctx_put(idev->ictx);
 }
 
-/**
- * iommufd_device_bind - Bind a physical device to an iommu fd
- * @ictx: iommufd file descriptor
- * @dev: Pointer to a physical device struct
- * @id: Output ID number to return to userspace for this device
- *
- * A successful bind establishes an ownership over the device and returns
- * struct iommufd_device pointer, otherwise returns error pointer.
- *
- * A driver using this API must set driver_managed_dma and must not touch
- * the device until this routine succeeds and establishes ownership.
- *
- * Binding a PCI device places the entire RID under iommufd control.
- *
- * The caller must undo this with iommufd_device_unbind()
- */
-struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
-					   struct device *dev, u32 *id)
+static int iommufd_bind_iommu(struct iommufd_device *idev)
 {
-	struct iommufd_device *idev;
+	struct iommufd_ctx *ictx = idev->ictx;
+	struct device *dev = idev->dev;
 	struct iommufd_group *igroup;
 	int rc;
 
@@ -238,11 +241,11 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	 * to restore cache coherency.
 	 */
 	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
-		return ERR_PTR(-EINVAL);
+		return -EINVAL;
 
 	igroup = iommufd_get_group(ictx, dev);
 	if (IS_ERR(igroup))
-		return ERR_CAST(igroup);
+		return PTR_ERR(igroup);
 
 	/*
 	 * For historical compat with VFIO the insecure interrupt path is
@@ -268,21 +271,77 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	if (rc)
 		goto out_group_put;
 
+	/* igroup refcount moves into iommufd_device */
+	idev->igroup = igroup;
+	idev->enforce_cache_coherency =
+		device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
+	return 0;
+
+out_group_put:
+	iommufd_put_group(igroup);
+	return rc;
+}
+
+/*
+ * Noiommu devices have no real IOMMU group. Create a dummy igroup so that
+ * internal code paths that expect idev->igroup to be present still work.
+ * A NULL igroup->group distinguishes this from a real IOMMU-backed group.
+ */
+static int iommufd_bind_noiommu(struct iommufd_device *idev)
+{
+	struct iommufd_group *igroup;
+
+	igroup = iommufd_alloc_group(idev->ictx, NULL);
+	if (IS_ERR(igroup))
+		return PTR_ERR(igroup);
+	idev->igroup = igroup;
+	return 0;
+}
+
+/**
+ * iommufd_device_bind - Bind a physical device to an iommu fd
+ * @ictx: iommufd file descriptor
+ * @dev: Pointer to a physical device struct
+ * @id: Output ID number to return to userspace for this device
+ *
+ * A successful bind establishes an ownership over the device and returns
+ * struct iommufd_device pointer, otherwise returns error pointer.
+ *
+ * A driver using this API must set driver_managed_dma and must not touch
+ * the device until this routine succeeds and establishes ownership.
+ *
+ * Binding a PCI device places the entire RID under iommufd control.
+ *
+ * The caller must undo this with iommufd_device_unbind()
+ */
+struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
+					   struct device *dev, u32 *id)
+{
+	struct iommufd_device *idev;
+	int rc;
+
 	idev = iommufd_object_alloc(ictx, idev, IOMMUFD_OBJ_DEVICE);
-	if (IS_ERR(idev)) {
-		rc = PTR_ERR(idev);
-		goto out_release_owner;
-	}
+	if (IS_ERR(idev))
+		return idev;
+
 	idev->ictx = ictx;
+	idev->dev = dev;
+
+	if (!iommufd_device_is_noiommu(idev))
+		rc = iommufd_bind_iommu(idev);
+	else
+		rc = iommufd_bind_noiommu(idev);
+	if (rc)
+		goto err_out;
+
+	/*
+	 * Take a ctx reference after bind succeeds. This must happen here
+	 * so that iommufd_device_destroy() can handle partial initialization.
+	 */
 	if (!iommufd_selftest_is_mock_dev(dev))
 		iommufd_ctx_get(ictx);
-	idev->dev = dev;
-	idev->enforce_cache_coherency =
-		device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
 	/* The calling driver is a user until iommufd_device_unbind() */
 	refcount_inc(&idev->obj.users);
-	/* igroup refcount moves into iommufd_device */
-	idev->igroup = igroup;
 
 	/*
 	 * If the caller fails after this success it must call
@@ -294,11 +353,14 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	*id = idev->obj.id;
 	return idev;
 
-out_release_owner:
-	iommu_device_release_dma_owner(dev);
-out_group_put:
-	iommufd_put_group(igroup);
+err_out:
+	/*
+	 * iommufd_device_destroy() handles partially initialized idev,
+	 * so iommufd_object_abort_and_destroy() is safe to call here.
+	 */
+	iommufd_object_abort_and_destroy(ictx, &idev->obj);
 	return ERR_PTR(rc);
+
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, "IOMMUFD");
 
@@ -512,6 +574,9 @@ static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
 	struct iommufd_attach_handle *handle;
 	int rc;
 
+	if (iommufd_device_is_noiommu(idev))
+		return 0;
+
 	if (!iommufd_hwpt_compatible_device(hwpt, idev))
 		return -EINVAL;
 
@@ -559,6 +624,9 @@ static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
 {
 	struct iommufd_attach_handle *handle;
 
+	if (iommufd_device_is_noiommu(idev))
+		return;
+
 	handle = iommufd_device_get_attach_handle(idev, pasid);
 	if (pasid == IOMMU_NO_PASID)
 		iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
@@ -577,6 +645,9 @@ static int iommufd_hwpt_replace_device(struct iommufd_device *idev,
 	struct iommufd_attach_handle *handle, *old_handle;
 	int rc;
 
+	if (iommufd_device_is_noiommu(idev))
+		return 0;
+
 	if (!iommufd_hwpt_compatible_device(hwpt, idev))
 		return -EINVAL;
 
@@ -652,7 +723,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 		goto err_release_devid;
 	}
 
-	if (attach_resv) {
+	if (attach_resv && !iommufd_device_is_noiommu(idev)) {
 		rc = iommufd_device_attach_reserved_iova(idev, hwpt_paging);
 		if (rc)
 			goto err_release_devid;
-- 
2.43.0

From nobody Sun May 24 20:33:26 2026
From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev", Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy, Nicolin Chen, "Tian, Kevin", Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon, Jacob Pan
Subject: [PATCH v6 4/7] iommufd: Add an ioctl to query PA from IOVA for noiommu mode
Date: Thu, 21 May 2026 15:11:51 -0700
Message-ID: <20260521221155.1375144-5-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260521221155.1375144-1-jacob.pan@linux.microsoft.com>
References: <20260521221155.1375144-1-jacob.pan@linux.microsoft.com>

To support no-IOMMU mode, where userspace drivers perform unsafe DMA using physical addresses, introduce a new ioctl that returns the physical address behind an IOVA of a user-allocated DMA buffer mapped through an IOAS. The mapping is backed by SW-only I/O page tables maintained by the generic IOMMUPT framework.

The ioctl also reports how many bytes starting at that IOVA are physically contiguous, optionally capped by the length passed in, so userspace can cover a buffer with fewer calls. It requires CAP_SYS_RAWIO.
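The contiguity scan that iopt_get_phys() performs in this patch can be illustrated with a self-contained C model. This is a sketch, assuming a flat per-page translation table in place of the AMDV1 SW page table and a 4 KiB page; model_get_phys() and the other model_* names are hypothetical. Like the patch, it treats an input length of 0 as "no limit", counts the first page only from the IOVA's offset onward, and trims the result to the cap.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL /* assumed page size */

/* Hypothetical mapping: pa_of_page[i] is the PA of IOVA page i, 0 if unmapped. */
struct model_map {
	uint64_t iova_base; /* must be page aligned */
	const uint64_t *pa_of_page;
	uint64_t npages;
};

static uint64_t model_iova_to_phys(const struct model_map *m, uint64_t iova)
{
	uint64_t idx, pa;

	assert(m->iova_base % MODEL_PAGE_SIZE == 0);
	if (iova < m->iova_base)
		return 0;
	idx = (iova - m->iova_base) / MODEL_PAGE_SIZE;
	if (idx >= m->npages)
		return 0;
	pa = m->pa_of_page[idx];
	return pa ? pa + (iova % MODEL_PAGE_SIZE) : 0;
}

/*
 * Translate @iova, then walk forward page by page while the PAs stay
 * contiguous. *length is an input cap (0 = no cap) and an output byte count.
 */
static int model_get_phys(const struct model_map *m, uint64_t iova,
			  uint64_t *paddr, uint64_t *length)
{
	uint64_t last_iova, max_length = *length, len, cur_pa;

	if (m->npages == 0)
		return -ENOENT;
	last_iova = m->iova_base + m->npages * MODEL_PAGE_SIZE - 1;
	if (iova < m->iova_base || iova > last_iova)
		return -ENOENT; /* no mapping covers @iova */

	*paddr = model_iova_to_phys(m, iova);
	if (!*paddr)
		return -EINVAL;

	len = MODEL_PAGE_SIZE - (iova % MODEL_PAGE_SIZE);
	cur_pa = *paddr;
	while (iova < last_iova) {
		uint64_t next_iova = iova + MODEL_PAGE_SIZE, next_pa;

		if (max_length && len >= max_length)
			break;
		if (next_iova < iova || next_iova > last_iova)
			break; /* overflow or past the end of the mapping */
		next_pa = model_iova_to_phys(m, next_iova);
		if (!next_pa || next_pa != cur_pa + MODEL_PAGE_SIZE)
			break; /* hole or physical discontinuity */
		iova = next_iova;
		cur_pa = next_pa;
		len += MODEL_PAGE_SIZE;
	}
	if (max_length && len > max_length)
		len = max_length;
	*length = len;
	return 0;
}
```

For example, with pages mapped at PAs 0x200000, 0x201000, 0x202000, 0x500000, a query at the first IOVA reports 3 pages (12288 bytes), since the fourth page breaks physical contiguity.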
Reviewed-by: Lu Baolu
Suggested-by: Jason Gunthorpe
Co-developed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
Signed-off-by: Jacob Pan
---
v6:
- Limit search length (Baolu, Jason)
v5:
- Fix next_iova exceeds iopt_area_last_iova (Alex)
- Rename IOCTL more specific to NOIOMMU, i.e.
  IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA (Kevin)
- Add header stubs for iopt_get_phys()
v4:
- Fix ioctl return type (Yi Liu)
---
 drivers/iommu/iommufd/io_pagetable.c    | 72 +++++++++++++++++++++++++
 drivers/iommu/iommufd/ioas.c            | 30 +++++++++++
 drivers/iommu/iommufd/iommufd_private.h | 18 +++++++
 drivers/iommu/iommufd/main.c            |  3 ++
 include/uapi/linux/iommufd.h            | 27 ++++++++++
 5 files changed, 150 insertions(+)

diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 24d4917105d9..4369447e2125 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -859,6 +859,78 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
 	return iopt_unmap_iova_range(iopt, iova, iova_last, unmapped);
 }
 
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
+		  u64 *length)
+{
+	struct iopt_area *area;
+	u64 max_length = *length;
+	u64 tmp_length = 0;
+	u64 tmp_paddr = 0;
+	int rc = 0;
+
+	down_read(&iopt->iova_rwsem);
+	area = iopt_area_iter_first(iopt, iova, iova);
+	if (!area || !area->pages) {
+		rc = -ENOENT;
+		goto unlock_exit;
+	}
+
+	if (!area->storage_domain ||
+	    area->storage_domain->owner != &iommufd_noiommu_ops) {
+		rc = -EOPNOTSUPP;
+		goto unlock_exit;
+	}
+
+	*paddr = iommu_iova_to_phys(area->storage_domain, iova);
+	if (!*paddr) {
+		rc = -EINVAL;
+		goto unlock_exit;
+	}
+
+	tmp_length = PAGE_SIZE - offset_in_page(iova);
+	tmp_paddr = *paddr;
+	/*
+	 * Scan the domain for the contiguous physical address length so that
+	 * userspace search can be optimized for fewer ioctls. A max_length of
+	 * 0 means no limit.
+	 */
+	while (iova < iopt_area_last_iova(area)) {
+		unsigned long next_iova;
+		u64 next_paddr;
+
+		if (max_length && tmp_length >= max_length) {
+			tmp_length = max_length;
+			break;
+		}
+
+		if (check_add_overflow(iova, PAGE_SIZE, &next_iova))
+			break;
+
+		if (next_iova > iopt_area_last_iova(area))
+			break;
+
+		next_paddr = iommu_iova_to_phys(area->storage_domain, next_iova);
+
+		if (!next_paddr || next_paddr != tmp_paddr + PAGE_SIZE)
+			break;
+
+		iova = next_iova;
+		tmp_paddr += PAGE_SIZE;
+		tmp_length += PAGE_SIZE;
+	}
+
+	if (max_length && tmp_length > max_length)
+		tmp_length = max_length;
+	*length = tmp_length;
+
+unlock_exit:
+	up_read(&iopt->iova_rwsem);
+
+	return rc;
+}
+#endif
+
 int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped)
 {
 	/* If the IOVAs are empty then unmap all succeeds */
diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
index fed06c2b728e..82bbc0c2357e 100644
--- a/drivers/iommu/iommufd/ioas.c
+++ b/drivers/iommu/iommufd/ioas.c
@@ -375,6 +375,36 @@ int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd)
 	return rc;
 }
 
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_ioas_noiommu_get_pa *cmd = ucmd->cmd;
+	struct iommufd_ioas *ioas;
+	int rc;
+
+	if (!capable(CAP_SYS_RAWIO))
+		return -EPERM;
+
+	if (cmd->flags || cmd->__reserved)
+		return -EOPNOTSUPP;
+
+	ioas = iommufd_get_ioas(ucmd->ictx, cmd->ioas_id);
+	if (IS_ERR(ioas))
+		return PTR_ERR(ioas);
+
+	rc = iopt_get_phys(&ioas->iopt, cmd->iova, &cmd->out_phys,
+			   &cmd->length);
+	if (rc)
+		goto out_put;
+
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+out_put:
+	iommufd_put_object(ucmd->ictx, &ioas->obj);
+
+	return rc;
+}
+#endif
+
 static void iommufd_release_all_iova_rwsem(struct iommufd_ctx *ictx,
 					   struct xarray *ioas_list)
 {
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 2682b5baa6e9..13f1506d8066 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -118,6 +118,16 @@ int iopt_map_pages(struct io_pagetable *iopt, struct list_head *pages_list,
 int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
 		    unsigned long length, unsigned long *unmapped);
 int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped);
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
+		  u64 *length);
+#else
+static inline int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova,
+				u64 *paddr, u64 *length)
+{
+	return -EOPNOTSUPP;
+}
+#endif
 
 int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
 				   struct iommu_domain *domain,
@@ -346,6 +356,14 @@ int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_copy(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd);
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd);
+#else
+static inline int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd)
+{
+	return -EOPNOTSUPP;
+}
+#endif
 int iommufd_ioas_option(struct iommufd_ucmd *ucmd);
 int iommufd_option_rlimit_mode(struct iommu_option *cmd,
 			       struct iommufd_ctx *ictx);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 8c6d43601afb..3b4192d70570 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -424,6 +424,7 @@ union ucmd_buffer {
 	struct iommu_ioas_alloc alloc;
 	struct iommu_ioas_allow_iovas allow_iovas;
 	struct iommu_ioas_copy ioas_copy;
+	struct iommu_ioas_noiommu_get_pa noiommu_get_pa;
 	struct iommu_ioas_iova_ranges iova_ranges;
 	struct iommu_ioas_map map;
 	struct iommu_ioas_unmap unmap;
@@ -482,6 +483,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 	IOCTL_OP(IOMMU_IOAS_MAP, iommufd_ioas_map, struct iommu_ioas_map, iova),
 	IOCTL_OP(IOMMU_IOAS_MAP_FILE, iommufd_ioas_map_file,
 		 struct iommu_ioas_map_file, iova),
+	IOCTL_OP(IOMMU_IOAS_NOIOMMU_GET_PA, iommufd_ioas_noiommu_get_pa, struct iommu_ioas_noiommu_get_pa,
+		 out_phys),
 	IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap,
 		 length),
 	IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option, val64),
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index e998dfbd6960..26b4998439e8 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -57,6 +57,7 @@ enum {
 	IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
 	IOMMUFD_CMD_VEVENTQ_ALLOC = 0x93,
 	IOMMUFD_CMD_HW_QUEUE_ALLOC = 0x94,
+	IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA = 0x95,
 };
 
 /**
@@ -219,6 +220,32 @@ struct iommu_ioas_map {
 };
 #define IOMMU_IOAS_MAP _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP)
 
+/**
+ * struct iommu_ioas_noiommu_get_pa - ioctl(IOMMU_IOAS_NOIOMMU_GET_PA)
+ * @size: sizeof(struct iommu_ioas_noiommu_get_pa)
+ * @flags: Reserved, must be 0 for now
+ * @ioas_id: IOAS ID to query IOVA to PA mapping from
+ * @__reserved: Must be 0
+ * @iova: IOVA to query
+ * @length: On input, maximum number of bytes to scan for contiguity (0 means
+ *          no limit). On output, actual number of contiguous bytes starting
+ *          from out_phys.
+ * @out_phys: Output physical address the IOVA maps to
+ *
+ * Query the physical address backing an IOVA range. The entire range must be
+ * mapped already. For noiommu devices doing unsafe DMA only.
+ */
+struct iommu_ioas_noiommu_get_pa {
+	__u32 size;
+	__u32 flags;
+	__u32 ioas_id;
+	__u32 __reserved;
+	__aligned_u64 iova;
+	__aligned_u64 length;
+	__aligned_u64 out_phys;
+};
+#define IOMMU_IOAS_NOIOMMU_GET_PA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA)
+
 /**
  * struct iommu_ioas_map_file - ioctl(IOMMU_IOAS_MAP_FILE)
  * @size: sizeof(struct iommu_ioas_map_file)
-- 
2.43.0

From nobody Sun May 24 20:33:26 2026
From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev", Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy, Nicolin Chen, "Tian, Kevin", Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon, Jacob Pan
Subject: [PATCH v6 5/7] vfio: Enable cdev noiommu mode under iommufd
Date: Thu, 21 May 2026 15:11:52 -0700
Message-ID: <20260521221155.1375144-6-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260521221155.1375144-1-jacob.pan@linux.microsoft.com>
References: <20260521221155.1375144-1-jacob.pan@linux.microsoft.com>

Now that devices in noiommu mode can bind with IOMMUFD and perform IOAS
operations, lift the cdev restrictions on the VFIO side.

Use cases are documented in Documentation/driver-api/vfio.rst.

Signed-off-by: Jacob Pan
---
v6:
- Revert back to unified VFIO_NOIOMMU Kconfig for both cdev and group.
  Use Kconfig dependency to restrict usages and avoid null group checks.
  (Alex & Yi)
- Add CAP_SYS_RAWIO checks for cdev open to maintain security parity with
  the group noiommu path. (Alex)
v5:
- Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU and its
  dependencies
- Add comment to explain vfio_noiommu conditional definition (Alex)
- Removed early return for group noiommu in bind/unbind
- Use consistent wording referring to VFIO noiommu mode (Kevin)
- Update unsafe_noiommu Kconfig help text (Kevin)
- Change dev_warn to dev_info for noiommu enabling msg (Kevin)
v4:
- Remove early return in iommufd_bind for noiommu (Alex)
v3:
- Consolidate into fewer patches
v2:
- removed unnecessary device->noiommu set in iommufd_vfio_compat_ioas_get_id()

Signed-off-by: Jacob Pan
---
 drivers/vfio/Kconfig       |  8 +++++---
 drivers/vfio/device_cdev.c |  3 +++
 drivers/vfio/iommufd.c     |  6 +++---
 drivers/vfio/vfio.h        | 20 +++++++++++++-------
 drivers/vfio/vfio_main.c   | 23 +++++++++++++++++++----
 include/linux/vfio.h       |  1 +
 6 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index ceae52fd7586..d3d8fef2855c 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
 	  The VFIO device cdev is another way for userspace to get device
 	  access. Userspace gets device fd by opening device cdev under
 	  /dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
-	  to set up secure DMA context for device access. This interface does
-	  not support noiommu.
+	  to set up secure DMA context for device access.
 
 	  If you don't know what to do here, say N.
 
@@ -62,7 +61,10 @@ endif
 
 config VFIO_NOIOMMU
 	bool "VFIO No-IOMMU support"
-	depends on VFIO_GROUP
+	depends on VFIO_GROUP || VFIO_DEVICE_CDEV
+	depends on !VFIO_GROUP || VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER
+	depends on !VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64
+	select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV
 	help
 	  VFIO is built on the ability to isolate devices using the IOMMU.
 	  Only with an IOMMU can userspace access to DMA capable devices be
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index 54abf312cf04..4e2c1e4fc1f8 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -27,6 +27,9 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
 	struct vfio_device_file *df;
 	int ret;
 
+	if (device->noiommu && !capable(CAP_SYS_RAWIO))
+		return -EPERM;
+
 	/* Paired with the put in vfio_device_fops_release() */
 	if (!vfio_device_try_get_registration(device))
 		return -ENODEV;
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index a38d262c6028..d4f2e2a0f2f3 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -25,8 +25,8 @@ int vfio_df_iommufd_bind(struct vfio_device_file *df)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	/* Returns 0 to permit device opening under noiommu mode */
-	if (vfio_device_is_noiommu(vdev))
+	/* Group noiommu via iommufd compat needs no device binding */
+	if (df->group && vfio_device_is_noiommu(vdev))
 		return 0;
 
 	return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
@@ -58,7 +58,7 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	if (vfio_device_is_noiommu(vdev))
+	if (df->group && vfio_device_is_noiommu(vdev))
 		return;
 
 	if (vdev->ops->unbind_iommufd)
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index e4b72e79b7e3..6f0a2dfc8a00 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -358,19 +358,13 @@ void vfio_init_device_cdev(struct vfio_device *device);
 
 static inline int vfio_device_add(struct vfio_device *device)
 {
-	/* cdev does not support noiommu device */
-	if (vfio_device_is_noiommu(device))
-		return device_add(&device->device);
 	vfio_init_device_cdev(device);
 	return cdev_device_add(&device->cdev, &device->device);
 }
 
 static inline void vfio_device_del(struct vfio_device *device)
 {
-	if (vfio_device_is_noiommu(device))
-		device_del(&device->device);
-	else
-		cdev_device_del(&device->cdev, &device->device);
+	cdev_device_del(&device->cdev, &device->device);
 }
 
 int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
@@ -420,6 +414,18 @@ static inline void vfio_cdev_cleanup(void)
 }
 #endif /* CONFIG_VFIO_DEVICE_CDEV */
 
+#if IS_ENABLED(CONFIG_VFIO_NOIOMMU)
+static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev)
+{
+	return vdev->noiommu;
+}
+#else
+static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev)
+{
+	return false;
+}
+#endif
+
 #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
 int __init vfio_virqfd_init(void);
 void vfio_virqfd_exit(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 6222376ab6ab..84381c500623 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -321,6 +321,20 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
 	return ret;
 }
 
+static int vfio_device_set_noiommu_and_name(struct vfio_device *device)
+{
+	if (IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) && vfio_noiommu && !device->dev->iommu) {
+		device->noiommu = true;
+		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+		dev_warn(device->dev,
+			 "Adding kernel taint for vfio-noiommu cdev on device\n");
+	}
+
+	/* Just to be safe, expose to user explicitly noiommu cdev node */
+	return dev_set_name(&device->device, "%svfio%d",
+			    device->noiommu ? "noiommu-" : "", device->index);
+}
+
 static int __vfio_register_dev(struct vfio_device *device,
 		enum vfio_group_type type)
 {
@@ -340,20 +354,21 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (!device->dev_set)
 		vfio_assign_device_set(device, device);
 
-	ret = dev_set_name(&device->device, "vfio%d", device->index);
+	ret = vfio_device_set_group(device, type);
 	if (ret)
 		return ret;
 
-	ret = vfio_device_set_group(device, type);
+	ret = vfio_device_set_noiommu_and_name(device);
 	if (ret)
-		return ret;
+		goto err_out;
 
 	/*
 	 * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
 	 * restore cache coherency. It has to be checked here because it is only
 	 * valid for cases where we are using iommu groups.
 	 */
-	if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
+	if (type == VFIO_IOMMU && !(vfio_device_is_noiommu(device) ||
+	    vfio_device_is_cdev_noiommu(device)) &&
 	    !device_iommu_capable(device->dev, IOMMU_CAP_CACHE_COHERENCY)) {
 		ret = -EINVAL;
 		goto err_out;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 31b826efba00..45f08986359e 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -74,6 +74,7 @@ struct vfio_device {
 	u8 iommufd_attached:1;
 #endif
 	u8 cdev_opened:1;
+	u8 noiommu:1;
 	/*
 	 * debug_root is a static property of the vfio_device
 	 * which must be set prior to registering the vfio_device.
-- 
2.43.0

From nobody Sun May 24 20:33:26 2026
From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev", Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy, Nicolin Chen, "Tian, Kevin", Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon, Jacob Pan
Subject: [PATCH v6 6/7] selftests/vfio: Add iommufd noiommu mode selftest for cdev
Date: Thu, 21 May 2026 15:11:53 -0700
Message-ID: <20260521221155.1375144-7-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260521221155.1375144-1-jacob.pan@linux.microsoft.com>
References: <20260521221155.1375144-1-jacob.pan@linux.microsoft.com>

Add a comprehensive selftest for VFIO device operations with iommufd in
noiommu mode.
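With patch 5/7, a noiommu device's cdev node is named noiommu-vfioN instead of vfioN, and the selftest library below matches either prefix with strncmp(). If a test also needed the device index and the noiommu flag, a stricter parse could look like the following hypothetical sketch; sketch_parse_vfio_cdev_name() is an illustration only and not a selftest library function.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse a /dev/vfio/devices entry name of the form
 * "vfioN" or "noiommu-vfioN". Returns N, or -1 if the name does not match.
 * *noiommu is written only on success.
 */
static int sketch_parse_vfio_cdev_name(const char *name, bool *noiommu)
{
	static const char prefix[] = "noiommu-";
	const char *p = name;
	bool is_noiommu = false;
	char *end;
	long idx;

	assert(name != NULL && noiommu != NULL);

	/* Optional "noiommu-" prefix, as set by vfio_device_set_noiommu_and_name() */
	if (strncmp(p, prefix, sizeof(prefix) - 1) == 0) {
		is_noiommu = true;
		p += sizeof(prefix) - 1;
	}
	if (strncmp(p, "vfio", 4) != 0)
		return -1;
	p += 4;

	/* Require at least one digit and nothing after the number. */
	if (*p < '0' || *p > '9')
		return -1;
	errno = 0;
	idx = strtol(p, &end, 10);
	if (errno == ERANGE || *end != '\0' || idx > INT_MAX)
		return -1;

	*noiommu = is_noiommu;
	return (int)idx;
}
```

For example, "noiommu-vfio12" yields 12 with the flag set, while "vfio" and "noiommu-vfio7x" are rejected. Unlike the prefix match, this variant also extracts the index and the noiommu flag.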
Tests cover:
- Device binding to iommufd
- IOAS (I/O Address Space) allocation, mapping with dummy IOVA
- Retrieve PA from dummy IOVA
- Device attach/detach operations as usual

Signed-off-by: Jacob Pan
---
v6:
- Add test cases for get_pa length limit
v4:
- squash DSA specific selftest changes
v2:
- New selftest for generic noiommu bind/unbind
---
 tools/testing/selftests/vfio/Makefile         |   1 +
 .../lib/include/libvfio/vfio_pci_device.h     |  16 +
 .../selftests/vfio/lib/vfio_pci_device.c      |   5 +-
 .../vfio/vfio_iommufd_noiommu_test.c          | 664 ++++++++++++++++++
 4 files changed, 684 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c

diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile
index 0684932d91bf..c9c02fdfd946 100644
--- a/tools/testing/selftests/vfio/Makefile
+++ b/tools/testing/selftests/vfio/Makefile
@@ -9,6 +9,7 @@ CFLAGS = $(KHDR_INCLUDES)
 TEST_GEN_PROGS += vfio_dma_mapping_test
 TEST_GEN_PROGS += vfio_dma_mapping_mmio_test
 TEST_GEN_PROGS += vfio_iommufd_setup_test
+TEST_GEN_PROGS += vfio_iommufd_noiommu_test
 TEST_GEN_PROGS += vfio_pci_device_test
 TEST_GEN_PROGS += vfio_pci_device_init_perf_test
 TEST_GEN_PROGS += vfio_pci_driver_test
diff --git a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
index 2858885a89bb..6218c91776b3 100644
--- a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
+++ b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
@@ -122,4 +122,20 @@ static inline bool vfio_pci_device_match(struct vfio_pci_device *device,
 
 const char *vfio_pci_get_cdev_path(const char *bdf);
 
+static inline bool vfio_pci_noiommu_mode_enabled(void)
+{
+	char buf[8] = {};
+	int fd, n;
+
+	fd = open("/sys/module/vfio/parameters/enable_unsafe_noiommu_mode",
+		  O_RDONLY);
+	if (fd < 0)
+		return false;
+
+	n = read(fd, buf, sizeof(buf) - 1);
+	close(fd);
+
+	return n > 0 && buf[0] == 'Y';
+}
+
 #endif /* SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_VFIO_PCI_DEVICE_H */
diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
index fc75e04ef010..1a91658e812d 100644
--- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
+++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
@@ -308,8 +308,9 @@ const char *vfio_pci_get_cdev_path(const char *bdf)
 	VFIO_ASSERT_NOT_NULL(dir, "Failed to open directory %s\n", dir_path);
 
 	while ((entry = readdir(dir)) != NULL) {
-		/* Find the file that starts with "vfio" */
-		if (strncmp("vfio", entry->d_name, 4))
+		/* Find the file that starts with "vfio" or "noiommu-vfio" */
+		if (strncmp("vfio", entry->d_name, 4) &&
+		    strncmp("noiommu-vfio", entry->d_name, 12))
 			continue;
 
 		snprintf(cdev_path, PATH_MAX, "/dev/vfio/devices/%s", entry->d_name);
diff --git a/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c b/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
new file mode 100644
index 000000000000..d91b505fc60d
--- /dev/null
+++ b/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
@@ -0,0 +1,664 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * VFIO iommufd NoIOMMU Mode Selftest
+ *
+ * Tests VFIO device operations with iommufd in noiommu mode, including:
+ * - Device binding to iommufd
+ * - IOAS (I/O Address Space) allocation and management
+ * - Device attach/detach to IOAS
+ * - Memory mapping in IOAS
+ * - Device info queries and reset
+ */
+
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include "kselftest_harness.h"
+
+static const char iommu_dev_path[] = "/dev/iommu";
+static const char *cdev_path;
+
+static char *vfio_noiommu_get_device_id(const char *bdf)
+{
+	char *path = NULL;
+	char *vfio_id = NULL;
+	struct dirent *dentry;
+	DIR *dp;
+
+	if (asprintf(&path, "/sys/bus/pci/devices/%s/vfio-dev", bdf) < 0)
+		return NULL;
+
+	dp = opendir(path);
+	if (!dp) {
+		free(path);
+		return NULL;
+	}
+
+	while ((dentry = readdir(dp)) != NULL) {
+		if (strncmp("noiommu-vfio", dentry->d_name, 12) == 0) {
+			vfio_id = strdup(dentry->d_name);
+			break;
+		}
+	}
+
+	closedir(dp);
+	free(path);
+	return vfio_id;
+}
+
+static char *vfio_noiommu_get_cdev_path(const char *bdf)
+{
+	char *vfio_id = vfio_noiommu_get_device_id(bdf);
+	char *cdev = NULL;
+
+	if (vfio_id) {
+		asprintf(&cdev, "/dev/vfio/devices/%s", vfio_id);
+		free(vfio_id);
+	}
+	return cdev;
+}
+
+static int vfio_device_bind_iommufd_ioctl(int cdev_fd, int iommufd)
+{
+	struct vfio_device_bind_iommufd bind_args = {
+		.argsz = sizeof(bind_args),
+		.iommufd = iommufd,
+	};
+
+	return ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind_args);
+}
+
+static int vfio_device_get_info_ioctl(int cdev_fd,
+				      struct vfio_device_info *info)
+{
+	info->argsz = sizeof(*info);
+	return ioctl(cdev_fd, VFIO_DEVICE_GET_INFO, info);
+}
+
+static int vfio_device_ioas_alloc_ioctl(int iommufd,
+					struct iommu_ioas_alloc *alloc_args)
+{
+	alloc_args->size = sizeof(*alloc_args);
+	alloc_args->flags = 0;
+	return ioctl(iommufd, IOMMU_IOAS_ALLOC, alloc_args);
+}
+
+static int vfio_device_attach_iommufd_pt_ioctl(int cdev_fd, u32 pt_id)
+{
+	struct vfio_device_attach_iommufd_pt attach_args = {
+		.argsz = sizeof(attach_args),
+		.pt_id = pt_id,
+	};
+
+	return ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_args);
+}
+
+static int vfio_device_detach_iommufd_pt_ioctl(int cdev_fd)
+{
+	struct vfio_device_detach_iommufd_pt detach_args = {
+		.argsz = sizeof(detach_args),
+	};
+
+	return ioctl(cdev_fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_args);
+}
+
+static int vfio_device_get_region_info_ioctl(int cdev_fd, uint32_t index,
+					     struct vfio_region_info *info)
+{
+	info->argsz = sizeof(*info);
+	info->index = index;
+	return ioctl(cdev_fd, VFIO_DEVICE_GET_REGION_INFO, info);
+}
+ +static int vfio_device_reset_ioctl(int cdev_fd) +{ + return ioctl(cdev_fd, VFIO_DEVICE_RESET); +} + +static int ioas_map_pages(int iommufd, uint32_t ioas_id, uint64_t iova, + size_t length, bool hugepages) +{ + struct iommu_ioas_map map_args =3D { + .size =3D sizeof(map_args), + .ioas_id =3D ioas_id, + .iova =3D iova, + .length =3D length, + .flags =3D IOMMU_IOAS_MAP_READABLE | IOMMU_IOAS_MAP_WRITEABLE | IOMMU_IO= AS_MAP_FIXED_IOVA, + }; + void *pages; + int ret; + + /* Allocate test pages */ + if (hugepages) + pages =3D mmap(NULL, length, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0); + else + pages =3D mmap(NULL, length, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (pages =3D=3D MAP_FAILED) { + printf("mmap failed for length 0x%lx\n", (unsigned long)length); + return -ENOMEM; + } + + /* Set up page pointer for mapping */ + map_args.user_va =3D (uintptr_t)pages; + + printf(" ioas_map_pages: ioas_id=3D%u, iova=3D0x%lx, length=3D0x%lx, use= r_va=3D%p\n", + ioas_id, (unsigned long)iova, (unsigned long)length, pages); + + /* Map into IOAS */ + ret =3D ioctl(iommufd, IOMMU_IOAS_MAP, &map_args); + if (ret !=3D 0) + printf(" IOMMU_IOAS_MAP failed: %d (%s)\n", ret, strerror(errno)); + else + printf(" IOMMU_IOAS_MAP succeeded, IOVA=3D0x%lx\n", (unsigned long)map_= args.iova); + + munmap(pages, length); + return ret; +} + +static int ioas_unmap_pages(int iommufd, uint32_t ioas_id, uint64_t iova, + size_t length) +{ + struct iommu_ioas_unmap unmap_args =3D { + .size =3D sizeof(unmap_args), + .ioas_id =3D ioas_id, + .iova =3D iova, + .length =3D length, + }; + + return ioctl(iommufd, IOMMU_IOAS_UNMAP, &unmap_args); +} + +static int ioas_destroy_ioctl(int iommufd, uint32_t ioas_id) +{ + struct iommu_destroy destroy_args =3D { + .size =3D sizeof(destroy_args), + .id =3D ioas_id, + }; + + return ioctl(iommufd, IOMMU_DESTROY, &destroy_args); +} + +static int ioas_noiommu_get_pa_ioctl_len(int iommufd, uint32_t ioas_id, + 
uint64_t iova, uint64_t max_length, + uint64_t *phys_out, uint64_t *length_out) +{ + struct iommu_ioas_noiommu_get_pa get_pa =3D { + .size =3D sizeof(get_pa), + .flags =3D 0, + .ioas_id =3D ioas_id, + .iova =3D iova, + .length =3D max_length, + }; + + printf(" ioas_noiommu_get_pa_ioctl: ioas_id=3D%u, iova=3D0x%lx, max_leng= th=3D0x%lx\n", + ioas_id, (unsigned long)iova, (unsigned long)max_length); + + if (ioctl(iommufd, IOMMU_IOAS_NOIOMMU_GET_PA, &get_pa) !=3D 0) { + printf(" IOMMU_IOAS_NOIOMMU_GET_PA failed: %s (errno=3D%d)\n", + strerror(errno), errno); + return -1; + } + + printf(" IOMMU_IOAS_NOIOMMU_GET_PA succeeded: PA=3D0x%lx, length=3D0x%lx= \n", + (unsigned long)get_pa.out_phys, (unsigned long)get_pa.length); + + if (phys_out) + *phys_out =3D get_pa.out_phys; + if (length_out) + *length_out =3D get_pa.length; + + return 0; +} + +static int ioas_noiommu_get_pa_ioctl(int iommufd, uint32_t ioas_id, uint64= _t iova, + uint64_t *phys_out, uint64_t *length_out) +{ + return ioas_noiommu_get_pa_ioctl_len(iommufd, ioas_id, iova, 0, + phys_out, length_out); +} + +FIXTURE(vfio_noiommu) { + int cdev_fd; + int iommufd; +}; + +FIXTURE_SETUP(vfio_noiommu) +{ + ASSERT_LE(0, (self->cdev_fd =3D open(cdev_path, O_RDWR, 0))); + ASSERT_LE(0, (self->iommufd =3D open(iommu_dev_path, O_RDWR, 0))); +} + +FIXTURE_TEARDOWN(vfio_noiommu) +{ + if (self->cdev_fd >=3D 0) + close(self->cdev_fd); + if (self->iommufd >=3D 0) + close(self->iommufd); +} + +/* + * Test: Device cdev can be opened + */ +TEST_F(vfio_noiommu, device_cdev_open) +{ + ASSERT_LE(0, self->cdev_fd); +} + +/* + * Test: Device can be bound to iommufd + */ +TEST_F(vfio_noiommu, device_bind_iommufd) +{ + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); +} + +/* + * Test: Device info can be queried after binding + */ +TEST_F(vfio_noiommu, device_get_info_after_bind) +{ + struct vfio_device_info info; + + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + ASSERT_EQ(0, 
vfio_device_get_info_ioctl(self->cdev_fd, &info)); + ASSERT_NE(0, info.argsz); +} + +/* + * Test: Getting device info fails without bind + */ +TEST_F(vfio_noiommu, device_get_info_without_bind_fails) +{ + struct vfio_device_info info; + + ASSERT_NE(0, vfio_device_get_info_ioctl(self->cdev_fd, &info)); +} + +/* + * Test: Binding with invalid iommufd fails + */ +TEST_F(vfio_noiommu, device_bind_bad_iommufd_fails) +{ + ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, -2)); +} + +/* + * Test: Cannot bind twice to same device + */ +TEST_F(vfio_noiommu, device_repeated_bind_fails) +{ + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); +} + +/* + * Test: IOAS can be allocated + */ +TEST_F(vfio_noiommu, ioas_alloc) +{ + struct iommu_ioas_alloc alloc_args; + + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, + &alloc_args)); + ASSERT_NE(0, alloc_args.out_ioas_id); +} + +/* + * Test: IOAS can be destroyed + */ +TEST_F(vfio_noiommu, ioas_destroy) +{ + struct iommu_ioas_alloc alloc_args; + + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, + &alloc_args)); + ASSERT_EQ(0, ioas_destroy_ioctl(self->iommufd, + alloc_args.out_ioas_id)); +} + +/* + * Test: Device can attach to IOAS after binding + */ +TEST_F(vfio_noiommu, device_attach_to_ioas) +{ + struct iommu_ioas_alloc alloc_args; + + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, + &alloc_args)); + ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd, + alloc_args.out_ioas_id)); +} + +/* + * Test: Attaching to invalid IOAS fails + */ +TEST_F(vfio_noiommu, device_attach_invalid_ioas_fails) +{ + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + ASSERT_NE(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd, + UINT32_MAX)); +} + +/* + * Test: Device can detach from IOAS + 
*/ +TEST_F(vfio_noiommu, device_detach_from_ioas) +{ + struct iommu_ioas_alloc alloc_args; + + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, + &alloc_args)); + ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd, + alloc_args.out_ioas_id)); + ASSERT_EQ(0, vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd)); +} + +/* + * Test: Full lifecycle - bind, attach, detach, reset + */ +TEST_F(vfio_noiommu, device_lifecycle) +{ + struct iommu_ioas_alloc alloc_args; + struct vfio_device_info info; + + /* Bind device to iommufd */ + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + + /* Allocate IOAS */ + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, + &alloc_args)); + + /* Attach device to IOAS */ + ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd, + alloc_args.out_ioas_id)); + + /* Query device info */ + ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &info)); + + /* Detach device from IOAS */ + ASSERT_EQ(0, vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd)); + + /* Reset device */ + ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd)); +} + +/* + * Test: Get region info + */ +TEST_F(vfio_noiommu, device_get_region_info) +{ + struct vfio_device_info dev_info; + struct vfio_region_info region_info; + + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &dev_info)); + + /* Try to get first region info if device has regions */ + if (dev_info.num_regions > 0) { + ASSERT_EQ(0, vfio_device_get_region_info_ioctl(self->cdev_fd, 0, + ®ion_info)); + ASSERT_NE(0, region_info.argsz); + } +} + +TEST_F(vfio_noiommu, device_reset) +{ + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, + self->iommufd)); + ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd)); +} + +TEST_F(vfio_noiommu, ioas_map_pages) +{ + struct iommu_ioas_alloc 
alloc_args;
+	long page_size = sysconf(_SC_PAGESIZE);
+	uint64_t iova = 0x10000;
+	int i;
+
+	ASSERT_GT(page_size, 0);
+
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+			&alloc_args));
+
+	printf("Page size: %ld bytes\n", page_size);
+	/* Test mapping regions of different sizes: 1, 2, 4, 8 pages */
+	for (i = 0; i < 4; i++) {
+		size_t map_size = page_size * (1 << i); /* 1, 2, 4, 8 pages */
+		uint64_t test_iova = iova + (i * 0x100000);
+
+		/* Attempt to map each region (may fail if not supported) */
+		ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
+				test_iova, map_size, false);
+	}
+}
+
+TEST_F(vfio_noiommu, multiple_ioas_alloc)
+{
+	struct iommu_ioas_alloc alloc1, alloc2;
+
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc1));
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc2));
+	ASSERT_NE(alloc1.out_ioas_id, alloc2.out_ioas_id);
+}
+
+/*
+ * Test: Query physical address for IOVA
+ * Tests IOMMU_IOAS_NOIOMMU_GET_PA ioctl to translate IOVA to physical address
+ * Note: Device must be attached to IOAS for PA query to work
+ */
+#define NR_PAGES 32
+TEST_F(vfio_noiommu, ioas_noiommu_get_pa_mapped)
+{
+	struct iommu_ioas_alloc alloc_args;
+	long page_size = sysconf(_SC_PAGESIZE);
+	uint64_t iova = 0x200000;
+	uint64_t phys = 0;
+	uint64_t length = 0;
+	int ret;
+
+	ASSERT_GT(page_size, 0);
+
+	ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+			self->iommufd));
+
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+			&alloc_args));
+
+	ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+			alloc_args.out_ioas_id));
+
+	/*
+	 * Map NR_PAGES pages at an arbitrary IOVA, which serves as a cookie
+	 * for the lookup. Use hugepages to test contiguous PA. Make sure
+	 * hugepages are available, e.g.
+	 *   echo 64 > /proc/sys/vm/nr_hugepages
+	 */
+	ret = ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
+			iova, page_size * NR_PAGES, true);
+	if (ret != 0)
+		return;
+
+	/* Query the physical address for the mapped dummy IOVA */
+	ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+			iova, &phys, &length);
+
+	if (ret == 0) {
+		/* If we got a result, verify it's valid */
+		ASSERT_NE(0, phys);
+		ASSERT_GE((uint64_t)page_size * NR_PAGES, length);
+	}
+
+	/*
+	 * Query with a non-page-aligned IOVA. The returned length must
+	 * not exceed the actual contiguous range starting from that
+	 * offset, i.e. it must be reduced by the sub-page offset.
+	 */
+	phys = 0;
+	length = 0;
+	ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+			iova + 0x80, &phys, &length);
+	if (ret == 0) {
+		ASSERT_NE(0, phys);
+		/* Length must account for the sub-page offset */
+		ASSERT_GE((uint64_t)page_size * NR_PAGES - 0x80, length);
+		ASSERT_LE(length, (uint64_t)page_size * NR_PAGES - 0x80);
+		/* Must not overshoot into the next page boundary */
+		ASSERT_EQ(0, (phys + length) % page_size);
+	}
+}
+
+TEST_F(vfio_noiommu, ioas_noiommu_get_pa_unmapped_fails)
+{
+	struct iommu_ioas_alloc alloc_args;
+
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+			&alloc_args));
+
+	/* Try to retrieve unmapped IOVA (should fail) */
+	ASSERT_NE(0, ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+			0x10000, NULL, NULL));
+}
+
+/*
+ * Test: length == 0 means no limit (backward compat default)
+ */
+TEST_F(vfio_noiommu, ioas_noiommu_get_pa_length_zero_no_limit)
+{
+	struct iommu_ioas_alloc alloc_args;
+	long page_size = sysconf(_SC_PAGESIZE);
+	uint64_t iova = 0x200000;
+	uint64_t phys_nolimit = 0, phys_zero = 0;
+	uint64_t len_nolimit = 0, len_zero = 0;
+	int ret;
+
+	ASSERT_GT(page_size, 0);
+
+	ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+			self->iommufd));
+	ASSERT_EQ(0,
+			vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc_args));
+	ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+			alloc_args.out_ioas_id));
+
+	ret = ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
+			iova, page_size * NR_PAGES, true);
+	if (ret != 0)
+		return;
+
+	/* Query with length=0 (no limit, default behavior) */
+	ret = ioas_noiommu_get_pa_ioctl_len(self->iommufd, alloc_args.out_ioas_id,
+			iova, 0, &phys_zero, &len_zero);
+	if (ret != 0)
+		return;
+
+	/* Query with the wrapper (also passes 0); results must match */
+	ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+			iova, &phys_nolimit, &len_nolimit);
+	ASSERT_EQ(0, ret);
+	ASSERT_EQ(phys_zero, phys_nolimit);
+	ASSERT_EQ(len_zero, len_nolimit);
+}
+
+/*
+ * Test: length caps the returned contiguous range
+ */
+TEST_F(vfio_noiommu, ioas_noiommu_get_pa_length_capped)
+{
+	struct iommu_ioas_alloc alloc_args;
+	long page_size = sysconf(_SC_PAGESIZE);
+	uint64_t iova = 0x200000;
+	uint64_t phys = 0;
+	uint64_t len_full = 0, len_capped = 0;
+	uint64_t cap;
+	int ret;
+
+	ASSERT_GT(page_size, 0);
+
+	ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+			self->iommufd));
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc_args));
+	ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+			alloc_args.out_ioas_id));
+
+	ret = ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
+			iova, page_size * NR_PAGES, true);
+	if (ret != 0)
+		return;
+
+	/* First get the full uncapped length */
+	ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+			iova, &phys, &len_full);
+	if (ret != 0)
+		return;
+
+	ASSERT_NE(0, phys);
+	ASSERT_NE(0, len_full);
+
+	/* Cap to a single page: returned length must not exceed it */
+	cap = page_size;
+	ret = ioas_noiommu_get_pa_ioctl_len(self->iommufd, alloc_args.out_ioas_id,
+			iova, cap, &phys, &len_capped);
+	ASSERT_EQ(0, ret);
+	ASSERT_LE(len_capped, cap);
+	ASSERT_NE(0, len_capped);
+
+	/*
+	 * If full length was larger than one page, confirm capping works.
+	 * Otherwise the mapping wasn't contiguous enough to test.
+	 */
+	if (len_full > cap)
+		ASSERT_GT(len_full, len_capped);
+
+	/* Cap to a very large value: should return the same as uncapped */
+	ret = ioas_noiommu_get_pa_ioctl_len(self->iommufd, alloc_args.out_ioas_id,
+			iova, UINT64_MAX, &phys, &len_capped);
+	ASSERT_EQ(0, ret);
+	ASSERT_EQ(len_full, len_capped);
+}
+
+int main(int argc, char *argv[])
+{
+	const char *device_bdf = vfio_selftests_get_bdf(&argc, argv);
+	char *cdev = NULL;
+
+	if (!device_bdf) {
+		ksft_print_msg("No device BDF provided\n");
+		return KSFT_SKIP;
+	}
+
+	cdev = vfio_noiommu_get_cdev_path(device_bdf);
+	if (!cdev) {
+		ksft_print_msg("Could not find cdev for device %s\n",
+				device_bdf);
+		return KSFT_SKIP;
+	}
+
+	cdev_path = cdev;
+	ksft_print_msg("Using cdev device %s for BDF %s\n", cdev_path,
+			device_bdf);
+
+	return test_harness_run(argc, argv);
+}
-- 
2.43.0

From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev" , Jason Gunthorpe , Alex Williamson , Joerg Roedel , Mostafa Saleh , David Matlack , Robin Murphy , Nicolin Chen , "Tian, Kevin" , Yi Liu , Baolu Lu
Cc: Saurabh Sengar , skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon , Jacob Pan
Subject: [PATCH v6 7/7] Documentation: Update VFIO NOIOMMU mode
Date: Thu, 21 May 2026 15:11:54 -0700
Message-ID: <20260521221155.1375144-8-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260521221155.1375144-1-jacob.pan@linux.microsoft.com>
References: <20260521221155.1375144-1-jacob.pan@linux.microsoft.com>

Document the NOIOMMU mode with newly added cdev support under iommufd.

Cc: Jonathan Corbet
Signed-off-by: Jacob Pan
---
v6:
- Generalize device node names (noiommu-vfioX, noiommu-Y) in the tree
  example (Yi)
- Clarify table column descriptions for Yes/No meanings (Yi)
---
 Documentation/driver-api/vfio.rst | 83 ++++++++++++++++++++++++++++++-
 1 file changed, 81 insertions(+), 2 deletions(-)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 2a21a42c9386..739576a22de6 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -275,8 +275,6 @@ in a VFIO group.
 With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
 by directly opening a character device /dev/vfio/devices/vfioX where
 "X" is the number allocated uniquely by VFIO for registered devices.
-cdev interface does not support noiommu devices, so user should use
-the legacy group interface if noiommu is wanted.
 
 The cdev only works with IOMMUFD. Both VFIO drivers and applications
 must adapt to the new cdev security model which requires using
@@ -370,6 +368,87 @@ IOMMUFD IOAS/HWPT to enable userspace DMA::
 
 	/* Other device operations as stated in "VFIO Usage Example" */
 
+VFIO NOIOMMU mode
+-------------------------------------------------------------------------------
+VFIO also supports a no-IOMMU mode, intended for usages where unsafe DMA can
+be performed by userspace drivers without physical IOMMU protection.
+This mode is controlled by the module parameter::
+
+	/sys/module/vfio/parameters/enable_unsafe_noiommu_mode
+
+When this mode is enabled and a device is assigned, the user is presented
+with a VFIO group and a device file, e.g.::
+
+	/dev/vfio/
+	|-- devices
+	|   `-- noiommu-vfioX   /* VFIO device cdev */
+	|-- noiommu-Y           /* VFIO group */
+	`-- vfio
+
+The capabilities vary depending on the device programming interface and kernel
+configuration used. The following table summarizes the differences ("Yes" means
+the UAPI is accessible and functional in noiommu mode, "No" means the UAPI is
+not supported):
+
++-------------------+---------------------+----------------------+
+| Feature           | VFIO group          | VFIO device cdev     |
++===================+=====================+======================+
+| VFIO device UAPI  | Yes                 | Yes                  |
++-------------------+---------------------+----------------------+
+| VFIO container    | No                  | No                   |
++-------------------+---------------------+----------------------+
+| IOMMUFD IOAS      | No                  | Yes*                 |
++-------------------+---------------------+----------------------+
+
+Note that the VFIO container case includes the IOMMUFD-provided VFIO
+compatibility interfaces when either CONFIG_VFIO_CONTAINER or
+CONFIG_IOMMUFD_VFIO_CONTAINER is enabled.
+
+* With the VFIO device cdev, the IOMMUFD UAPI can be used to pin and map user
+  memory and to retrieve physical addresses for DMA command submission.
+
+Kconfig Support Matrix
+^^^^^^^^^^^^^^^^^^^^^^
+
+The visibility of CONFIG_VFIO_NOIOMMU depends on the combination of
+CONFIG_VFIO_GROUP, CONFIG_VFIO_DEVICE_CDEV, and whether a container backend
+(CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER) is configured. The
+Kconfig dependencies enforce the following constraints:
+
+- At least one access path (group or cdev) must be available.
+- If VFIO_GROUP is enabled, a container backend is required; otherwise the
+  group node would be unusable in noiommu mode.
+
+The resulting support matrix:
+
++------+-------+-----------+------+---------+---------------------------+
+| Case | GROUP | Container | CDEV | NOIOMMU | Notes                     |
++======+=======+===========+======+=========+===========================+
+| 1    | y     | y         | n    | yes     | Group noiommu works       |
++------+-------+-----------+------+---------+---------------------------+
+| 2    | y     | n         | n    | no      | Blocked - no container    |
++------+-------+-----------+------+---------+---------------------------+
+| 3    | y     | y         | y    | yes     | Both paths work           |
++------+-------+-----------+------+---------+---------------------------+
+| 4    | y     | n         | y    | no      | Blocked - no container    |
++------+-------+-----------+------+---------+---------------------------+
+| 5    | n     | -         | y    | yes     | Cdev-only works           |
++------+-------+-----------+------+---------+---------------------------+
+| 6    | n     | -         | n    | no      | No access path            |
++------+-------+-----------+------+---------+---------------------------+
+
+Container = CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER (either
+suffices). Case 4 is intentionally blocked: allowing NOIOMMU with GROUP
+enabled but no container would create unusable group nodes. Users who want
+cdev-only noiommu should set CONFIG_VFIO_GROUP=n (case 5).
+
+A new IOMMUFD ioctl, IOMMU_IOAS_NOIOMMU_GET_PA, retrieves the physical address
+for a given IOVA. Although there is no DMA remapping hardware, IOMMU_IOAS_MAP
+with the IOMMU_IOAS_MAP_FIXED_IOVA flag is still used to establish IOVA-to-PA
+mappings in the software page table for later IOMMU_IOAS_NOIOMMU_GET_PA lookups.
+tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c provides an example of
+using this ioctl in no-IOMMU mode.
+
 VFIO User API
 -------------------------------------------------------------------------------
 
-- 
2.43.0
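The noiommu selftests in this series check three properties of the length reported by IOMMU_IOAS_NOIOMMU_GET_PA: an input length of 0 means no limit, a non-zero input caps the result, and querying at an offset into a contiguous run shortens the result by that offset. Together these amount to an upper bound. The sketch below is a hypothetical userspace model of that bound only, assuming the run is described by a page-aligned base IOVA and its count of physically contiguous bytes. It is not the kernel implementation, and `expected_pa_len_bound()` is an illustrative name, not part of any UAPI or selftest helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model (not the kernel implementation) of the upper bound
 * the noiommu selftests place on the length reported by
 * IOMMU_IOAS_NOIOMMU_GET_PA.
 *
 * base_iova:  start of a physically contiguous run (assumed page aligned)
 * contig_len: number of physically contiguous bytes in that run
 * iova:       IOVA being queried; must lie inside the run
 * max_len:    caller-supplied cap, where 0 means "no limit"
 */
static uint64_t expected_pa_len_bound(uint64_t base_iova, uint64_t contig_len,
				      uint64_t iova, uint64_t max_len)
{
	uint64_t avail;

	/* The queried IOVA must fall inside the contiguous run. */
	assert(iova >= base_iova && iova - base_iova < contig_len);

	/* An offset into the run shortens what is left of it. */
	avail = contig_len - (iova - base_iova);

	/* A non-zero cap limits the result; 0 leaves it unbounded. */
	if (max_len != 0 && max_len < avail)
		return max_len;
	return avail;
}
```

For instance, assuming a 4 KiB page size and the selftests' 32 pages mapped at IOVA 0x200000, a query at 0x200080 with no limit is bounded by `32 * 4096 - 0x80` bytes, the same bound asserted in `ioas_noiommu_get_pa_mapped`. A cap of 4096 bounds the result to one page, and a cap of `UINT64_MAX` leaves the full run, mirroring `ioas_noiommu_get_pa_length_capped`.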
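The two constraints listed under "Kconfig Support Matrix" reduce to one boolean rule: NOIOMMU is allowed when at least one of group or cdev is enabled, and a container backend is present whenever group is enabled. The sketch below is a hypothetical C restatement that checks this rule against every row of the documented matrix, expanding each "-" entry to both values. It does not reproduce the actual Kconfig `depends on` expressions, and `noiommu_allowed()`, `documented_cases` and `count_matrix_mismatches()` are illustrative names.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical restatement of the documented CONFIG_VFIO_NOIOMMU rules.
 * "container" means CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER.
 */
static bool noiommu_allowed(bool group, bool container, bool cdev)
{
	/* Rule 1: at least one access path (group or cdev) must exist. */
	if (!group && !cdev)
		return false;
	/* Rule 2: a group node is only usable with a container backend. */
	if (group && !container)
		return false;
	return true;
}

/* One row of the documented support matrix. */
struct noiommu_case {
	int id;
	bool group, container, cdev, expected;
};

/* Cases 5 and 6 list the container as "-", so both values are checked. */
static const struct noiommu_case documented_cases[] = {
	{ 1, true,  true,  false, true  },
	{ 2, true,  false, false, false },
	{ 3, true,  true,  true,  true  },
	{ 4, true,  false, true,  false },
	{ 5, false, false, true,  true  },
	{ 5, false, true,  true,  true  },
	{ 6, false, false, false, false },
	{ 6, false, true,  false, false },
};

/* Return the number of matrix rows that disagree with the rule. */
static size_t count_matrix_mismatches(void)
{
	size_t n = sizeof(documented_cases) / sizeof(documented_cases[0]);
	size_t i, bad = 0;

	for (i = 0; i < n; i++) {
		const struct noiommu_case *c = &documented_cases[i];

		if (noiommu_allowed(c->group, c->container, c->cdev) != c->expected)
			bad++;
	}
	return bad;
}
```

`count_matrix_mismatches()` returns 0 for all six documented cases, which is a quick way to confirm that the table and the two stated constraints agree.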