From nobody Thu Apr 16 12:24:56 2026 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 8B0C137EFF2 for ; Tue, 14 Apr 2026 21:14:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776201257; cv=none; b=Yo91XW8tMsilpeCPnPsMpUtzcGefDqAvY2q0PHAyvHqWBYCY+rfXrV55W332iun9orKE3BsIpsCTp/OUhwLISa9fduzH/fAsO034v9XItcC256aJKi7xIzw3LPb83iK8RcbpxLEIg0RiS5aWYIIZmvadke3ppfs/CqhnWwhZNmM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776201257; c=relaxed/simple; bh=Lhuq5xAlpwVPJn07YR/Cag2fu1WPE3u3MTdnXtAAdOY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=i/5c46/FOzHbNxVVr9KPMzwzwTFhNg6u2wu//ytV5Qb3kf41yxUlqKMO+mH9kLiHuCMjGoGF7CQljuBHPgEx/DqN82qs4oXlr8iawgBVCwjlI1/g1FhNxqhfmR+TOFVfdyWJhDWLhp+QzH/N9WiB7+yNIhd3/DS1B8+1qbiZVCk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b=MlRp2nCR; arc=none smtp.client-ip=13.77.154.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b="MlRp2nCR" Received: from DESKTOP-0403QTC.corp.microsoft.com (unknown [20.191.74.188]) by linux.microsoft.com (Postfix) with ESMTPSA id CE4D120B6F08; Tue, 14 Apr 2026 14:14:14 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com CE4D120B6F08 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1776201255; 
bh=DueQqGsTeBZ5f9epBqPjmXrmVIT9Xztuh7ie4O7QWxY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=MlRp2nCRMjo3p5MuVZNqNBiIb4zcwNBqx6YZY56JszJAra5mlEdnyczHLzhI7BwNr BYZZGQhx9ygYjAzNLC1MhyMmlHa0iJ0lKDJwsNhRLGoBgcmY5sm5MgnaOfhobRCnyh ISP19pgS8GAqHBKUh/GiG7T4441XWm2dbIe3Iiok= From: Jacob Pan To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev" , Jason Gunthorpe , Alex Williamson , Joerg Roedel , Mostafa Saleh , David Matlack , Robin Murphy , Nicolin Chen , "Tian, Kevin" , Yi Liu Cc: skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon , Jacob Pan , Baolu Lu Subject: [PATCH V4 01/10] iommufd: Support a HWPT without an iommu driver for noiommu Date: Tue, 14 Apr 2026 14:14:03 -0700 Message-Id: <20260414211412.2729-2-jacob.pan@linux.microsoft.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260414211412.2729-1-jacob.pan@linux.microsoft.com> References: <20260414211412.2729-1-jacob.pan@linux.microsoft.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Jason Gunthorpe Create just a little part of a real iommu driver, enough to slot in under the dev_iommu_ops() and allow iommufd to call domain_alloc_paging_flags() and fail everything else. This allows explicitly creating a HWPT under an IOAS. Signed-off-by: Jason Gunthorpe Signed-off-by: Jacob Pan --- v4: - Make iommufd_noiommu_ops const v3: - Add comment to explain the design difference over the legacy noiommu VFIO code. 
fix const hwpt --- drivers/iommu/iommufd/Makefile | 1 + drivers/iommu/iommufd/hw_pagetable.c | 11 ++- drivers/iommu/iommufd/hwpt_noiommu.c | 102 ++++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 2 + 4 files changed, 114 insertions(+), 2 deletions(-) create mode 100644 drivers/iommu/iommufd/hwpt_noiommu.c diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile index 71d692c9a8f4..2b1a020b14a6 100644 --- a/drivers/iommu/iommufd/Makefile +++ b/drivers/iommu/iommufd/Makefile @@ -10,6 +10,7 @@ iommufd-y :=3D \ vfio_compat.o \ viommu.o =20 +iommufd-$(CONFIG_VFIO_NOIOMMU) +=3D hwpt_noiommu.o iommufd-$(CONFIG_IOMMUFD_TEST) +=3D selftest.o =20 obj-$(CONFIG_IOMMUFD) +=3D iommufd.o diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/h= w_pagetable.c index fe789c2dc0c9..37316d77277d 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -8,6 +8,13 @@ #include "../iommu-priv.h" #include "iommufd_private.h" =20 +static const struct iommu_ops *get_iommu_ops(struct iommufd_device *idev) +{ + if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && !idev->igroup->group) + return &iommufd_noiommu_ops; + return dev_iommu_ops(idev->dev); +} + static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt) { if (hwpt->domain) @@ -114,7 +121,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, str= uct iommufd_ioas *ioas, IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_FAULT_ID_VALID | IOMMU_HWPT_ALLOC_PASID; - const struct iommu_ops *ops =3D dev_iommu_ops(idev->dev); + const struct iommu_ops *ops =3D get_iommu_ops(idev); struct iommufd_hwpt_paging *hwpt_paging; struct iommufd_hw_pagetable *hwpt; int rc; @@ -229,7 +236,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx, struct iommufd_device *idev, u32 flags, const struct iommu_user_data *user_data) { - const struct iommu_ops *ops =3D dev_iommu_ops(idev->dev); + const struct iommu_ops *ops =3D get_iommu_ops(idev); struct iommufd_hwpt_nested 
*hwpt_nested; struct iommufd_hw_pagetable *hwpt; int rc; diff --git a/drivers/iommu/iommufd/hwpt_noiommu.c b/drivers/iommu/iommufd/h= wpt_noiommu.c new file mode 100644 index 000000000000..1c8cae02beec --- /dev/null +++ b/drivers/iommu/iommufd/hwpt_noiommu.c @@ -0,0 +1,102 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES + */ +#include +#include +#include "iommufd_private.h" + +static const struct iommu_domain_ops noiommu_amdv1_ops; + +struct noiommu_domain { + union { + struct iommu_domain domain; + struct pt_iommu_amdv1 amdv1; + }; + spinlock_t lock; +}; +PT_IOMMU_CHECK_DOMAIN(struct noiommu_domain, amdv1.iommu, domain); + +static void noiommu_change_top(struct pt_iommu *iommu_table, + phys_addr_t top_paddr, unsigned int top_level) +{ +} + +static spinlock_t *noiommu_get_top_lock(struct pt_iommu *iommupt) +{ + struct noiommu_domain *domain =3D + container_of(iommupt, struct noiommu_domain, amdv1.iommu); + + return &domain->lock; +} + +static const struct pt_iommu_driver_ops noiommu_driver_ops =3D { + .get_top_lock =3D noiommu_get_top_lock, + .change_top =3D noiommu_change_top, +}; + +static struct iommu_domain * +noiommu_alloc_paging_flags(struct device *dev, u32 flags, + const struct iommu_user_data *user_data) +{ + struct pt_iommu_amdv1_cfg cfg =3D {}; + struct noiommu_domain *dom; + int rc; + + if (flags || user_data) + return ERR_PTR(-EOPNOTSUPP); + + cfg.common.hw_max_vasz_lg2 =3D 64; + cfg.common.hw_max_oasz_lg2 =3D 52; + cfg.starting_level =3D 2; + cfg.common.features =3D + (BIT(PT_FEAT_DYNAMIC_TOP) | BIT(PT_FEAT_AMDV1_ENCRYPT_TABLES) | + BIT(PT_FEAT_AMDV1_FORCE_COHERENCE)); + + dom =3D kzalloc(sizeof(*dom), GFP_KERNEL); + if (!dom) + return ERR_PTR(-ENOMEM); + + spin_lock_init(&dom->lock); + dom->amdv1.iommu.nid =3D NUMA_NO_NODE; + dom->amdv1.iommu.driver_ops =3D &noiommu_driver_ops; + dom->domain.ops =3D &noiommu_amdv1_ops; + + /* Use mock page table which is based on AMDV1 */ + rc =3D 
pt_iommu_amdv1_init(&dom->amdv1, &cfg, GFP_KERNEL); + if (rc) { + kfree(dom); + return ERR_PTR(rc); + } + + return &dom->domain; +} + +static void noiommu_domain_free(struct iommu_domain *iommu_domain) +{ + struct noiommu_domain *domain =3D + container_of(iommu_domain, struct noiommu_domain, domain); + + pt_iommu_deinit(&domain->amdv1.iommu); + kfree(domain); +} + +/* + * AMDV1 is used as a dummy page table for no-IOMMU mode, similar to the + * iommufd selftest mock page table. + * Unlike legacy VFIO no-IOMMU mode, where no container level APIs are + * supported, this allows IOAS and hwpt objects to exist without hardware + * IOMMU support. IOVAs are used only for IOVA-to-PA lookups not for + * hardware translation in DMA. + * + * This is only used with iommufd and cdev-based interfaces and does not + * apply to legacy VFIO group-container based noiommu mode. + */ +static const struct iommu_domain_ops noiommu_amdv1_ops =3D { + IOMMU_PT_DOMAIN_OPS(amdv1), + .free =3D noiommu_domain_free, +}; + +const struct iommu_ops iommufd_noiommu_ops =3D { + .domain_alloc_paging_flags =3D noiommu_alloc_paging_flags, +}; diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommuf= d/iommufd_private.h index 6ac1965199e9..2682b5baa6e9 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -464,6 +464,8 @@ static inline void iommufd_hw_pagetable_put(struct iomm= ufd_ctx *ictx, refcount_dec(&hwpt->obj.users); } =20 +extern const struct iommu_ops iommufd_noiommu_ops; + struct iommufd_attach; =20 struct iommufd_group { --=20 2.34.1 From nobody Thu Apr 16 12:24:56 2026 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 4A66137E308 for ; Tue, 14 Apr 2026 21:14:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776201257; cv=none; 
b=hhIXemHXYuaO+Vjl0enKlG85lE+xbskjqNKG2rZWMRJ0Ud0mq3qWU7KONg/wuhp+6xtkiF0sTeNkPOKDE8B43jJv8h6HDcEa6IlzzyGON8BkQoXEg+Lmxo0SaIIzgpoPBfNqaN16iM/WId+5fWqDrRhKwd5Tl6hU2s2M8phmJJ0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776201257; c=relaxed/simple; bh=P82U9b1A4HStqPYDnOm9GD+DzgbwLf1lAwUYP1ER5wg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=iD70DfvC0vJzGcYgunJAvyY4/HVndwf7v+UgKbKfmKRvHDwwIMKkYFJYWv30VXHjXnUZ+c++acRIZzgF2729tjuDZeZLwPejF28Y+5O9fM+PRKey+ng7ZcrgLNfu/sIv3KN9vSq/ExqKsI+3bpuKmbN1x7YeoC/FRVO2oJYeIKs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b=PwbZfXha; arc=none smtp.client-ip=13.77.154.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b="PwbZfXha" Received: from DESKTOP-0403QTC.corp.microsoft.com (unknown [20.191.74.188]) by linux.microsoft.com (Postfix) with ESMTPSA id 9BFF020B6F0C; Tue, 14 Apr 2026 14:14:15 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 9BFF020B6F0C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1776201256; bh=iwqcK02WLd+CWksCwPDzhz12+1Xn7Wqds38ku7rZIyY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=PwbZfXhaN1XS5bMzuyOFODVYXTjUVG1I+xEKccvMMmPZk2EYf/F0OSfvNIUktJGDn d9UoQJvHwVENsblzHqWvH8ZONqcg8B3mKx/04KipD5jukXMBZic5xuozH1MavuM0Dg RY8bWcDZ7VfIW+uRXWFlgd4OEKlwJ8bMDLPOgTmc= From: Jacob Pan To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev" , Jason Gunthorpe , Alex Williamson , Joerg 
Roedel , Mostafa Saleh , David Matlack , Robin Murphy , Nicolin Chen , "Tian, Kevin" , Yi Liu Cc: skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon , Jacob Pan , Baolu Lu Subject: [PATCH V4 02/10] iommufd: Move igroup allocation to a function Date: Tue, 14 Apr 2026 14:14:04 -0700 Message-Id: <20260414211412.2729-3-jacob.pan@linux.microsoft.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260414211412.2729-1-jacob.pan@linux.microsoft.com> References: <20260414211412.2729-1-jacob.pan@linux.microsoft.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Jason Gunthorpe So it can be reused in the next patch which allows binding to noiommu device. Reviewed-by: Samiullah Khawaja Signed-off-by: Jason Gunthorpe Signed-off-by: Jacob Pan Reviewed-by: Kevin Tian --- v3: - Moved null group check out to the next patch (Mostafa) --- drivers/iommu/iommufd/device.c | 43 +++++++++++++++++++++------------- 1 file changed, 27 insertions(+), 16 deletions(-) diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 344d620cdecc..2f30ea069f66 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -56,6 +56,30 @@ static bool iommufd_group_try_get(struct iommufd_group *= igroup, return kref_get_unless_zero(&igroup->ref); } =20 +static struct iommufd_group *iommufd_alloc_group(struct iommufd_ctx *ictx, + struct iommu_group *group) +{ + struct iommufd_group *new_igroup; + + new_igroup =3D kzalloc(sizeof(*new_igroup), GFP_KERNEL); + if (!new_igroup) + return ERR_PTR(-ENOMEM); + + kref_init(&new_igroup->ref); + mutex_init(&new_igroup->lock); + xa_init(&new_igroup->pasid_attach); + new_igroup->sw_msi_start =3D PHYS_ADDR_MAX; + /* group reference moves into new_igroup */ + new_igroup->group =3D group; + + /* + * The ictx is not additionally refcounted here becase 
all objects using + * an igroup must put it before their destroy completes. + */ + new_igroup->ictx =3D ictx; + return new_igroup; +} + /* * iommufd needs to store some more data for each iommu_group, we keep a * parallel xarray indexed by iommu_group id to hold this instead of putti= ng it @@ -87,25 +111,12 @@ static struct iommufd_group *iommufd_get_group(struct = iommufd_ctx *ictx, } xa_unlock(&ictx->groups); =20 - new_igroup =3D kzalloc_obj(*new_igroup); - if (!new_igroup) { + new_igroup =3D iommufd_alloc_group(ictx, group); + if (IS_ERR(new_igroup)) { iommu_group_put(group); - return ERR_PTR(-ENOMEM); + return new_igroup; } =20 - kref_init(&new_igroup->ref); - mutex_init(&new_igroup->lock); - xa_init(&new_igroup->pasid_attach); - new_igroup->sw_msi_start =3D PHYS_ADDR_MAX; - /* group reference moves into new_igroup */ - new_igroup->group =3D group; - - /* - * The ictx is not additionally refcounted here becase all objects using - * an igroup must put it before their destroy completes. - */ - new_igroup->ictx =3D ictx; - /* * We dropped the lock so igroup is invalid. NULL is a safe and likely * value to assume for the xa_cmpxchg algorithm. 
--=20 2.34.1 From nobody Thu Apr 16 12:24:56 2026 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 06ADB37F753 for ; Tue, 14 Apr 2026 21:14:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776201260; cv=none; b=Z+cB5KNs39/4Yeyo12icRMGLNPUG3nfabW9wNrj6Xk1c0SVQ16xo8Ey8HOob0PfAi7akSqWqtMBF6XvnKuOtd8zCurbm0KubROy1gF9INrmTCge3fW/AraBH4aPuIJX16nZj9LfpQQooTBGp5GyTBRFPQQzZQVzycGc7y3Kb9a0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776201260; c=relaxed/simple; bh=TEwn8xnIfnQ5g5EDCk0MnMQrKh7dunzIdSPDGthiQDc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=fOctdim4K/2e4Hvpvc69OtHJPyvfoWsg7N896Ce96D/se83BIE3TeliiOaL/l5SRrajWHoOIGEvzL9l8Ri/8phxebm1jwZ4MevVURKP31aX0CZGFPkH6i/EJ3m8mVxlxc6Gt1HXWyXC2TxB4iv4WynoO8sP1Vy8FfMkOKhsNzc0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b=SKMjYCtr; arc=none smtp.client-ip=13.77.154.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b="SKMjYCtr" Received: from DESKTOP-0403QTC.corp.microsoft.com (unknown [20.191.74.188]) by linux.microsoft.com (Postfix) with ESMTPSA id 61C3920B6F12; Tue, 14 Apr 2026 14:14:16 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 61C3920B6F12 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; 
t=1776201256; bh=yLzSFbvWIQHjbdo3Y7KeS/89Fv6fjcp/ThL3mVaX6FY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=SKMjYCtrmwdTBqNKVXJHn4xtw5t6kYDSlu5Sb48YAiwMZ1uhUYnOeCiZDZ+QzRMsp qgH6ah51Kc4FDoOJI80gx9QxP6BasCgQLdfQBo+WlkG/RaiEQKm1N55uhsPxfutIjU Gl2+oX+yfwWgL61pW3kBtR04r+DSuJq/g/l1d5xk= From: Jacob Pan To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev" , Jason Gunthorpe , Alex Williamson , Joerg Roedel , Mostafa Saleh , David Matlack , Robin Murphy , Nicolin Chen , "Tian, Kevin" , Yi Liu Cc: skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon , Jacob Pan , Baolu Lu Subject: [PATCH V4 03/10] iommufd: Allow binding to a noiommu device Date: Tue, 14 Apr 2026 14:14:05 -0700 Message-Id: <20260414211412.2729-4-jacob.pan@linux.microsoft.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260414211412.2729-1-jacob.pan@linux.microsoft.com> References: <20260414211412.2729-1-jacob.pan@linux.microsoft.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Jason Gunthorpe Allow iommufd to bind devices without an IOMMU (noiommu mode) by creating a dummy IOMMU group for such devices and skipping hwpt operations. This enables noiommu devices to operate through the same iommufd API as IOM= MU- capable devices. Signed-off-by: Jason Gunthorpe Signed-off-by: Jacob Pan --- v4: - handle partially initialized idev w/o igroup (Sashiko) v3: - handle new error cases iommufd_device_bind. 
(Mostafa) --- drivers/iommu/iommufd/device.c | 132 +++++++++++++++++++++++---------- 1 file changed, 93 insertions(+), 39 deletions(-) diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 2f30ea069f66..0283ac39be55 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -23,6 +23,11 @@ struct iommufd_attach { struct xarray device_array; }; =20 +static bool is_vfio_noiommu(struct iommufd_device *idev) +{ + return !device_iommu_mapped(idev->dev) || !idev->dev->iommu; +} + static void iommufd_group_release(struct kref *kref) { struct iommufd_group *igroup =3D @@ -30,9 +35,11 @@ static void iommufd_group_release(struct kref *kref) =20 WARN_ON(!xa_empty(&igroup->pasid_attach)); =20 - xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup, - NULL, GFP_KERNEL); - iommu_group_put(igroup->group); + if (igroup->group) { + xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), + igroup, NULL, GFP_KERNEL); + iommu_group_put(igroup->group); + } mutex_destroy(&igroup->lock); kfree(igroup); } @@ -204,32 +211,19 @@ void iommufd_device_destroy(struct iommufd_object *ob= j) struct iommufd_device *idev =3D container_of(obj, struct iommufd_device, obj); =20 - iommu_device_release_dma_owner(idev->dev); + if (!idev->igroup) + return; + if (!is_vfio_noiommu(idev)) + iommu_device_release_dma_owner(idev->dev); iommufd_put_group(idev->igroup); if (!iommufd_selftest_is_mock_dev(idev->dev)) iommufd_ctx_put(idev->ictx); } =20 -/** - * iommufd_device_bind - Bind a physical device to an iommu fd - * @ictx: iommufd file descriptor - * @dev: Pointer to a physical device struct - * @id: Output ID number to return to userspace for this device - * - * A successful bind establishes an ownership over the device and returns - * struct iommufd_device pointer, otherwise returns error pointer. 
- * - * A driver using this API must set driver_managed_dma and must not touch - * the device until this routine succeeds and establishes ownership. - * - * Binding a PCI device places the entire RID under iommufd control. - * - * The caller must undo this with iommufd_device_unbind() - */ -struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx, - struct device *dev, u32 *id) +static int iommufd_bind_iommu(struct iommufd_device *idev) { - struct iommufd_device *idev; + struct iommufd_ctx *ictx =3D idev->ictx; + struct device *dev =3D idev->dev; struct iommufd_group *igroup; int rc; =20 @@ -238,11 +232,11 @@ struct iommufd_device *iommufd_device_bind(struct iom= mufd_ctx *ictx, * to restore cache coherency. */ if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY)) - return ERR_PTR(-EINVAL); + return -EINVAL; =20 - igroup =3D iommufd_get_group(ictx, dev); + igroup =3D iommufd_get_group(idev->ictx, dev); if (IS_ERR(igroup)) - return ERR_CAST(igroup); + return PTR_ERR(igroup); =20 /* * For historical compat with VFIO the insecure interrupt path is @@ -268,21 +262,69 @@ struct iommufd_device *iommufd_device_bind(struct iom= mufd_ctx *ictx, if (rc) goto out_group_put; =20 + /* igroup refcount moves into iommufd_device */ + idev->igroup =3D igroup; + return 0; + +out_group_put: + iommufd_put_group(igroup); + return rc; +} + +/** + * iommufd_device_bind - Bind a physical device to an iommu fd + * @ictx: iommufd file descriptor + * @dev: Pointer to a physical device struct + * @id: Output ID number to return to userspace for this device + * + * A successful bind establishes an ownership over the device and returns + * struct iommufd_device pointer, otherwise returns error pointer. + * + * A driver using this API must set driver_managed_dma and must not touch + * the device until this routine succeeds and establishes ownership. + * + * Binding a PCI device places the entire RID under iommufd control. 
+ * + * The caller must undo this with iommufd_device_unbind() + */ +struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx, + struct device *dev, u32 *id) +{ + struct iommufd_device *idev; + int rc; + idev =3D iommufd_object_alloc(ictx, idev, IOMMUFD_OBJ_DEVICE); - if (IS_ERR(idev)) { - rc =3D PTR_ERR(idev); - goto out_release_owner; - } + if (IS_ERR(idev)) + return idev; + idev->ictx =3D ictx; - if (!iommufd_selftest_is_mock_dev(dev)) - iommufd_ctx_get(ictx); idev->dev =3D dev; idev->enforce_cache_coherency =3D device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY); + + if (!is_vfio_noiommu(idev)) { + rc =3D iommufd_bind_iommu(idev); + if (rc) + goto err_out; + } else { + struct iommufd_group *igroup; + + /* + * Create a dummy igroup, lots of stuff expects ths igroup to be + * present, but a NULL igroup->group is OK + */ + igroup =3D iommufd_alloc_group(ictx, NULL); + if (IS_ERR(igroup)) { + rc =3D PTR_ERR(igroup); + goto err_out; + } + idev->igroup =3D igroup; + } + + if (!iommufd_selftest_is_mock_dev(dev)) + iommufd_ctx_get(ictx); /* The calling driver is a user until iommufd_device_unbind() */ refcount_inc(&idev->obj.users); - /* igroup refcount moves into iommufd_device */ - idev->igroup =3D igroup; =20 /* * If the caller fails after this success it must call @@ -294,11 +336,14 @@ struct iommufd_device *iommufd_device_bind(struct iom= mufd_ctx *ictx, *id =3D idev->obj.id; return idev; =20 -out_release_owner: - iommu_device_release_dma_owner(dev); -out_group_put: - iommufd_put_group(igroup); +err_out: + /* + * Be careful that iommufd_device_destroy() can handle partial + * initialization. 
+ */ + iommufd_object_abort_and_destroy(ictx, &idev->obj); return ERR_PTR(rc); + } EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, "IOMMUFD"); =20 @@ -512,6 +557,9 @@ static int iommufd_hwpt_attach_device(struct iommufd_hw= _pagetable *hwpt, struct iommufd_attach_handle *handle; int rc; =20 + if (is_vfio_noiommu(idev)) + return 0; + if (!iommufd_hwpt_compatible_device(hwpt, idev)) return -EINVAL; =20 @@ -559,6 +607,9 @@ static void iommufd_hwpt_detach_device(struct iommufd_h= w_pagetable *hwpt, { struct iommufd_attach_handle *handle; =20 + if (is_vfio_noiommu(idev)) + return; + handle =3D iommufd_device_get_attach_handle(idev, pasid); if (pasid =3D=3D IOMMU_NO_PASID) iommu_detach_group_handle(hwpt->domain, idev->igroup->group); @@ -577,6 +628,9 @@ static int iommufd_hwpt_replace_device(struct iommufd_d= evice *idev, struct iommufd_attach_handle *handle, *old_handle; int rc; =20 + if (is_vfio_noiommu(idev)) + return 0; + if (!iommufd_hwpt_compatible_device(hwpt, idev)) return -EINVAL; =20 @@ -652,7 +706,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_paget= able *hwpt, goto err_release_devid; } =20 - if (attach_resv) { + if (attach_resv && !is_vfio_noiommu(idev)) { rc =3D iommufd_device_attach_reserved_iova(idev, hwpt_paging); if (rc) goto err_release_devid; --=20 2.34.1 From nobody Thu Apr 16 12:24:56 2026 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id D3CB537FF40 for ; Tue, 14 Apr 2026 21:14:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776201261; cv=none; b=SIOmFw2KB23JBllV1i45kOteUsgIo9pUcXbxrwXOCAGpF2uBWaO1lK+sFK/womWoVGjOUWMK1eaPAiI3jVctGuLNB8OGKBeFsGKi7jUhjAStbKJ32Xuu3hqkC2lk5yIznd5zhzryRvbMyabb2cmcW+jj+pqZe4qGV6hSlulBDFo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776201261; c=relaxed/simple; 
bh=39rXe6HTWbMAUae/IYSp8fWWwmdUXGJxwvbo1xOQQEg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Hu64Gd1NilPwaS5XLsHOWDAK3cTIWFb4Th7R1rooT/fjkWvvguD5V8YDHmVsxHLoULiMeOB/45bu9KWcs9Qf62436bjg9NKNqOXCdiVYHz74CUAvK5/uRrAABWx7RIf4oQKxjpXv4VpNYlvxScoT837huZcTZZt4Oouvo0HITcY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b=J2L3tWbf; arc=none smtp.client-ip=13.77.154.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b="J2L3tWbf" Received: from DESKTOP-0403QTC.corp.microsoft.com (unknown [20.191.74.188]) by linux.microsoft.com (Postfix) with ESMTPSA id 2A44B20B6F15; Tue, 14 Apr 2026 14:14:17 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 2A44B20B6F15 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1776201257; bh=rmU24ZNo7dAZHd5IsL9tCiEHiIK7TBCDrNLbO3ETZ9I=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=J2L3tWbfsjP8QM0xshnB5PHjClpeRetm099FFZD9Cu+r6k2hihW0eJXo6cbZ7bDrM y7C2r36pqyL8YeYY0uYCiv4tO3ljgMmv6nzu2bKiG19XGM7l2B+Nh28uwC4pVaP5Ow vCid1kAV+ovsc5T2ulbE2I9+D+havlu0hpt8Kabg= From: Jacob Pan To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev" , Jason Gunthorpe , Alex Williamson , Joerg Roedel , Mostafa Saleh , David Matlack , Robin Murphy , Nicolin Chen , "Tian, Kevin" , Yi Liu Cc: skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon , Jacob Pan , Baolu Lu Subject: [PATCH V4 04/10] iommufd: Add an ioctl IOMMU_IOAS_GET_PA to query PA from IOVA Date: Tue, 14 Apr 
2026 14:14:06 -0700 Message-Id: <20260414211412.2729-5-jacob.pan@linux.microsoft.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260414211412.2729-1-jacob.pan@linux.microsoft.com> References: <20260414211412.2729-1-jacob.pan@linux.microsoft.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" To support no-IOMMU mode where userspace drivers perform unsafe DMA using physical addresses, introduce a new API to retrieve the physical address of a user-allocated DMA buffer that has been mapped to an IOVA via IOAS. The mapping is backed by mock I/O page tables maintained by generic IOMMUPT framework. Suggested-by: Jason Gunthorpe Signed-off-by: Jacob Pan Signed-off-by: Jason Gunthorpe --- v4: - Fix unaligned IOVA length (Sashiko) v2: - Scan the contiguous physical-address span beyond the first page and re= turn its length. --- drivers/iommu/iommufd/io_pagetable.c | 60 +++++++++++++++++++++++++ drivers/iommu/iommufd/ioas.c | 25 +++++++++++ drivers/iommu/iommufd/iommufd_private.h | 3 ++ drivers/iommu/iommufd/main.c | 3 ++ include/uapi/linux/iommufd.h | 25 +++++++++++ 5 files changed, 116 insertions(+) diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/i= o_pagetable.c index ee003bb2f647..04336a8e12f5 100644 --- a/drivers/iommu/iommufd/io_pagetable.c +++ b/drivers/iommu/iommufd/io_pagetable.c @@ -849,6 +849,66 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigne= d long iova, return iopt_unmap_iova_range(iopt, iova, iova_last, unmapped); } =20 +int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *padd= r, + u64 *length) +{ + struct iopt_area *area; + u64 tmp_length =3D 0; + u64 tmp_paddr =3D 0; + int rc =3D 0; + + if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU)) + return -EOPNOTSUPP; + + down_read(&iopt->iova_rwsem); + area =3D iopt_area_iter_first(iopt, iova, iova); + if (!area 
|| !area->pages) {
+		rc = -ENOENT;
+		goto unlock_exit;
+	}
+
+	if (!area->storage_domain ||
+	    area->storage_domain->owner != &iommufd_noiommu_ops) {
+		rc = -EOPNOTSUPP;
+		goto unlock_exit;
+	}
+
+	*paddr = iommu_iova_to_phys(area->storage_domain, iova);
+	if (!*paddr) {
+		rc = -EINVAL;
+		goto unlock_exit;
+	}
+
+	tmp_length = PAGE_SIZE - offset_in_page(iova);
+	tmp_paddr = *paddr;
+	/*
+	 * Scan the domain for the contiguous physical address length so that
+	 * userspace search can be optimized for fewer ioctls.
+	 */
+	while (iova < iopt_area_last_iova(area)) {
+		unsigned long next_iova;
+		u64 next_paddr;
+
+		if (check_add_overflow(iova, PAGE_SIZE, &next_iova))
+			break;
+
+		next_paddr = iommu_iova_to_phys(area->storage_domain, next_iova);
+
+		if (!next_paddr || next_paddr != tmp_paddr + PAGE_SIZE)
+			break;
+
+		iova = next_iova;
+		tmp_paddr += PAGE_SIZE;
+		tmp_length += PAGE_SIZE;
+	}
+	*length = tmp_length;
+
+unlock_exit:
+	up_read(&iopt->iova_rwsem);
+
+	return rc;
+}
+
 int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped)
 {
 	/* If the IOVAs are empty then unmap all succeeds */
diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
index fed06c2b728e..93cebb4c23bd 100644
--- a/drivers/iommu/iommufd/ioas.c
+++ b/drivers/iommu/iommufd/ioas.c
@@ -375,6 +375,31 @@ int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd)
 	return rc;
 }
 
+int iommufd_ioas_get_pa(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_ioas_get_pa *cmd = ucmd->cmd;
+	struct iommufd_ioas *ioas;
+	int rc;
+
+	if (cmd->flags || cmd->__reserved)
+		return -EOPNOTSUPP;
+
+	ioas = iommufd_get_ioas(ucmd->ictx, cmd->ioas_id);
+	if (IS_ERR(ioas))
+		return PTR_ERR(ioas);
+
+	rc = iopt_get_phys(&ioas->iopt, cmd->iova, &cmd->out_phys,
+			   &cmd->out_length);
+	if (rc)
+		goto out_put;
+
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+out_put:
+	iommufd_put_object(ucmd->ictx, &ioas->obj);
+
+	return rc;
+}
+
 static void iommufd_release_all_iova_rwsem(struct iommufd_ctx *ictx,
 					   struct xarray *ioas_list)
 {
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 2682b5baa6e9..0e772882aee9 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -118,6 +118,8 @@ int iopt_map_pages(struct io_pagetable *iopt, struct list_head *pages_list,
 int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
 		    unsigned long length, unsigned long *unmapped);
 int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped);
+int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
+		  u64 *length);
 
 int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
 				   struct iommu_domain *domain,
@@ -346,6 +348,7 @@ int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_copy(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd);
+int iommufd_ioas_get_pa(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_option(struct iommufd_ucmd *ucmd);
 int iommufd_option_rlimit_mode(struct iommu_option *cmd,
 			       struct iommufd_ctx *ictx);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 8c6d43601afb..ebae01ed947d 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -432,6 +432,7 @@ union ucmd_buffer {
 	struct iommu_veventq_alloc veventq;
 	struct iommu_vfio_ioas vfio_ioas;
 	struct iommu_viommu_alloc viommu;
+	struct iommu_ioas_get_pa get_pa;
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
@@ -484,6 +485,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 struct iommu_ioas_map_file, iova),
 	IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap,
 		 length),
+	IOCTL_OP(IOMMU_IOAS_GET_PA, iommufd_ioas_get_pa, struct iommu_ioas_get_pa,
+		 out_phys),
 	IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option, val64),
 	IOCTL_OP(IOMMU_VDEVICE_ALLOC, iommufd_vdevice_alloc_ioctl,
 		 struct iommu_vdevice_alloc, virt_id),
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 1dafbc552d37..9afe0a1b11a0 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -57,6 +57,7 @@ enum {
 	IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
 	IOMMUFD_CMD_VEVENTQ_ALLOC = 0x93,
 	IOMMUFD_CMD_HW_QUEUE_ALLOC = 0x94,
+	IOMMUFD_CMD_IOAS_GET_PA = 0x95,
 };
 
 /**
@@ -219,6 +220,30 @@ struct iommu_ioas_map {
 };
 #define IOMMU_IOAS_MAP _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP)
 
+/**
+ * struct iommu_ioas_get_pa - ioctl(IOMMU_IOAS_GET_PA)
+ * @size: sizeof(struct iommu_ioas_get_pa)
+ * @flags: Reserved, must be 0 for now
+ * @ioas_id: IOAS ID to query IOVA to PA mapping from
+ * @__reserved: Must be 0
+ * @iova: IOVA to query
+ * @out_length: Number of bytes contiguous physical address starting from phys
+ * @out_phys: Output physical address the IOVA maps to
+ *
+ * Query the physical address backing an IOVA range. The entire range must be
+ * mapped already. For noiommu devices doing unsafe DMA only.
+ */
+struct iommu_ioas_get_pa {
+	__u32 size;
+	__u32 flags;
+	__u32 ioas_id;
+	__u32 __reserved;
+	__aligned_u64 iova;
+	__aligned_u64 out_length;
+	__aligned_u64 out_phys;
+};
+#define IOMMU_IOAS_GET_PA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_GET_PA)
+
 /**
  * struct iommu_ioas_map_file - ioctl(IOMMU_IOAS_MAP_FILE)
  * @size: sizeof(struct iommu_ioas_map_file)
-- 
2.34.1

From nobody Thu Apr 16 12:24:56 2026
From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev", Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy, Nicolin Chen, "Tian, Kevin", Yi Liu
Cc: skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon, Jacob Pan, Baolu Lu
Subject: [PATCH V4 05/10] vfio: Allow null group for noiommu without containers
Date: Tue, 14 Apr 2026 14:14:07 -0700
Message-Id: <20260414211412.2729-6-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260414211412.2729-1-jacob.pan@linux.microsoft.com>
References: <20260414211412.2729-1-jacob.pan@linux.microsoft.com>

When noiommu mode is enabled for a VFIO cdev and neither the VFIO container
nor the IOMMUFD-provided compatibility container is enabled, there is no
need to create a dummy group. Update the group operations to tolerate a NULL
group pointer.
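For illustration only, the NULL-group tolerance can be sketched as a small
user-space model. This is not the kernel code in this patch: the model_*
names, the struct layouts and the boolean inputs are hypothetical stand-ins
for struct vfio_group, struct vfio_device and the Kconfig/module-parameter
state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct vfio_group / struct vfio_device. */
struct model_group {
	int cdev_device_open_cnt;
	bool opened_file;
};

struct model_device {
	struct model_group *group;	/* NULL when no dummy group was created */
};

/*
 * Models vfio_null_group_allowed(): true only when noiommu is on and no
 * container flavour is configured. Both inputs are assumptions supplied by
 * the caller rather than read from Kconfig or a module parameter.
 */
static bool model_null_group_allowed(bool noiommu, bool any_container)
{
	return noiommu && !any_container;
}

/* Models the guard added to vfio_device_block_group(). */
static int model_block_group(struct model_device *dev, bool null_ok)
{
	assert(dev != NULL);

	/* No group exists, so there is nothing to lock or block. */
	if (null_ok && !dev->group)
		return 0;

	if (dev->group->opened_file)
		return -EBUSY;
	dev->group->cdev_device_open_cnt++;
	return 0;
}
```

The point of the early return is that every later statement dereferences
the group, so the check has to come before the first use.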
Signed-off-by: Jacob Pan
---
v4: (Jason)
- Avoid null pointer deref in error unwind
- Add null group check in vfio_device_group_unregister
- repartition to include vfio_device_has_group() in this patch
---
 drivers/vfio/group.c     | 20 ++++++++++++++++++++
 drivers/vfio/vfio.h      | 17 +++++++++++++++++
 drivers/vfio/vfio_main.c | 14 ++++++++++++++
 include/linux/vfio.h     |  9 +++++++++
 4 files changed, 60 insertions(+)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 0fa9761b13d3..451e49d851f8 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -390,6 +390,9 @@ int vfio_device_block_group(struct vfio_device *device)
 	struct vfio_group *group = device->group;
 	int ret = 0;
 
+	if (vfio_null_group_allowed() && !group)
+		return 0;
+
 	mutex_lock(&group->group_lock);
 	if (group->opened_file) {
 		ret = -EBUSY;
@@ -407,6 +410,9 @@ void vfio_device_unblock_group(struct vfio_device *device)
 {
 	struct vfio_group *group = device->group;
 
+	if (vfio_null_group_allowed() && !group)
+		return;
+
 	mutex_lock(&group->group_lock);
 	group->cdev_device_open_cnt--;
 	mutex_unlock(&group->group_lock);
@@ -598,6 +604,14 @@ static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
 	struct vfio_group *group;
 	int ret;
 
+	/*
+	 * With noiommu enabled under cdev interface only, there is no need to
+	 * create a vfio_group if the group based containers are not enabled.
+	 * The cdev interface is exclusively used for iommufd.
+	 */
+	if (vfio_null_group_allowed())
+		return NULL;
+
 	iommu_group = iommu_group_alloc();
 	if (IS_ERR(iommu_group))
 		return ERR_CAST(iommu_group);
@@ -705,6 +719,9 @@ void vfio_device_remove_group(struct vfio_device *device)
 	struct vfio_group *group = device->group;
 	struct iommu_group *iommu_group;
 
+	if (!group)
+		return;
+
 	if (group->type == VFIO_NO_IOMMU || group->type == VFIO_EMULATED_IOMMU)
 		iommu_group_remove_device(device->dev);
 
@@ -756,6 +773,9 @@ void vfio_device_group_register(struct vfio_device *device)
 
 void vfio_device_group_unregister(struct vfio_device *device)
 {
+	if (!device->group)
+		return;
+
 	mutex_lock(&device->group->device_lock);
 	list_del(&device->group_next);
 	mutex_unlock(&device->group->device_lock);
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 8fcc98cf9577..db1530bb1716 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -114,6 +114,18 @@ bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
 
+/*
+ * With noiommu enabled and no containers are supported, allow devices that
+ * don't have a dummy group.
+ */
+static inline bool vfio_null_group_allowed(void)
+{
+	if (vfio_noiommu && (!IS_ENABLED(CONFIG_VFIO_CONTAINER) && !IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER)))
+		return true;
+
+	return false;
+}
+
 static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
 {
 	return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
@@ -190,6 +202,11 @@ static inline void vfio_group_cleanup(void)
 {
 }
 
+static inline bool vfio_null_group_allowed(void)
+{
+	return false;
+}
+
 static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
 {
 	return false;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index e5886235cad4..5d7c2d014689 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -358,6 +358,10 @@ static int __vfio_register_dev(struct vfio_device *device,
 	/* Refcounting can't start until the driver calls register */
 	refcount_set(&device->refcount, 1);
 
+	/* noiommu device w/o container may have NULL group */
+	if (!vfio_device_has_group(device))
+		return 0;
+
 	vfio_device_group_register(device);
 	vfio_device_debugfs_init(device);
 
@@ -392,6 +396,16 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 	bool interrupted = false;
 	long rc;
 
+	/*
+	 * For noiommu devices without a container, thus no dummy group,
+	 * simply delete and unregister to balance refcount.
+	 */
+	if (!vfio_device_has_group(device)) {
+		vfio_device_del(device);
+		vfio_device_put_registration(device);
+		return;
+	}
+
 	/*
 	 * Prevent new device opened by userspace via the
 	 * VFIO_GROUP_GET_DEVICE_FD in the group path.
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 7384965d15d7..ceb5034c3a2e 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -328,6 +328,10 @@ struct iommu_group *vfio_file_iommu_group(struct file *file);
 #if IS_ENABLED(CONFIG_VFIO_GROUP)
 bool vfio_file_is_group(struct file *file);
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
+static inline bool vfio_device_has_group(struct vfio_device *device)
+{
+	return device->group;
+}
 #else
 static inline bool vfio_file_is_group(struct file *file)
 {
@@ -338,6 +342,11 @@ static inline bool vfio_file_has_dev(struct file *file, struct vfio_device *devi
 {
 	return false;
 }
+
+static inline bool vfio_device_has_group(struct vfio_device *device)
+{
+	return false;
+}
 #endif
 bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
-- 
2.34.1

From nobody Thu Apr 16 12:24:56 2026
From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev", Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy, Nicolin Chen, "Tian, Kevin", Yi Liu
Cc: skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon, Jacob Pan, Baolu Lu
Subject: [PATCH V4 06/10] vfio: Introduce and set noiommu flag on vfio_device
Date: Tue, 14 Apr 2026 14:14:08 -0700
Message-Id: <20260414211412.2729-7-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260414211412.2729-1-jacob.pan@linux.microsoft.com>
References: <20260414211412.2729-1-jacob.pan@linux.microsoft.com>

When a VFIO device is added to a noiommu group, set the noiommu flag on the
vfio_device structure to indicate that the device operates in noiommu mode.
Also update the function signatures to take a struct vfio_device instead of
a struct device, since the former gives direct access to the noiommu flag.

Reviewed-by: Mostafa Saleh
Signed-off-by: Jacob Pan
---
v3:
- Squashed with vfio noiommu logic check patch 7/11 of v2 (Mostafa)
---
 drivers/vfio/group.c | 21 +++++++++++----------
 drivers/vfio/vfio.h  |  3 +--
 include/linux/vfio.h |  1 +
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 451e49d851f8..b7009b44703b 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -597,7 +597,7 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 	return ret;
 }
 
-static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
+static struct vfio_group *vfio_noiommu_group_alloc(struct vfio_device *vdev,
 		enum vfio_group_type type)
 {
 	struct iommu_group *iommu_group;
@@ -619,7 +619,7 @@ static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
 	ret = iommu_group_set_name(iommu_group, "vfio-noiommu");
 	if (ret)
 		goto out_put_group;
-	ret = iommu_group_add_device(iommu_group, dev);
+	ret = iommu_group_add_device(iommu_group, vdev->dev);
 	if (ret)
 		goto out_put_group;
 
@@ -634,7 +634,7 @@ static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
 	return group;
 
 out_remove_device:
-	iommu_group_remove_device(dev);
+	iommu_group_remove_device(vdev->dev);
 out_put_group:
 	iommu_group_put(iommu_group);
 	return ERR_PTR(ret);
@@ -655,23 +655,24 @@ static bool vfio_group_has_device(struct vfio_group *group, struct device *dev)
 	return false;
 }
 
-static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
+static struct vfio_group *vfio_group_find_or_alloc(struct vfio_device *vdev)
 {
 	struct iommu_group *iommu_group;
 	struct vfio_group *group;
 
-	iommu_group = iommu_group_get(dev);
+	iommu_group = iommu_group_get(vdev->dev);
 	if (!iommu_group && vfio_noiommu) {
+		vdev->noiommu = 1;
 		/*
 		 * With noiommu enabled, create an IOMMU group for devices that
 		 * don't already have one, implying no IOMMU hardware/driver
 		 * exists. Taint the kernel because we're about to give a DMA
 		 * capable device to a user without IOMMU protection.
 		 */
-		group = vfio_noiommu_group_alloc(dev, VFIO_NO_IOMMU);
+		group = vfio_noiommu_group_alloc(vdev, VFIO_NO_IOMMU);
 		if (!IS_ERR(group)) {
 			add_taint(TAINT_USER, LOCKDEP_STILL_OK);
-			dev_warn(dev, "Adding kernel taint for vfio-noiommu group on device\n");
+			dev_warn(vdev->dev, "Adding kernel taint for vfio-noiommu group on device\n");
 		}
 		return group;
 	}
@@ -682,7 +683,7 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
 	mutex_lock(&vfio.group_lock);
 	group = vfio_group_find_from_iommu(iommu_group);
 	if (group) {
-		if (WARN_ON(vfio_group_has_device(group, dev)))
+		if (WARN_ON(vfio_group_has_device(group, vdev->dev)))
 			group = ERR_PTR(-EINVAL);
 		else
 			refcount_inc(&group->drivers);
@@ -702,9 +703,9 @@ int vfio_device_set_group(struct vfio_device *device,
 	struct vfio_group *group;
 
 	if (type == VFIO_IOMMU)
-		group = vfio_group_find_or_alloc(device->dev);
+		group = vfio_group_find_or_alloc(device);
 	else
-		group = vfio_noiommu_group_alloc(device->dev, type);
+		group = vfio_noiommu_group_alloc(device, type);
 
 	if (IS_ERR(group))
 		return PTR_ERR(group);
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index db1530bb1716..9e25605da564 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -128,8 +128,7 @@ static inline bool vfio_null_group_allowed(void)
 
 static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
 {
-	return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
-	       vdev->group->type == VFIO_NO_IOMMU;
+	return IS_ENABLED(CONFIG_VFIO_NOIOMMU) && vdev->noiommu;
 }
 #else
 struct vfio_group;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ceb5034c3a2e..502be18a1390 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -72,6 +72,7 @@ struct vfio_device {
 	u8 iommufd_attached:1;
 #endif
 	u8 cdev_opened:1;
+	u8 noiommu:1;
 #ifdef CONFIG_DEBUG_FS
 	/*
 	 * debug_root is a static property of the vfio_device
-- 
2.34.1

From nobody Thu Apr 16 12:24:56 2026
From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev", Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy, Nicolin Chen, "Tian, Kevin", Yi Liu
Cc: skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon, Jacob Pan, Baolu Lu
Subject: [PATCH V4 07/10] vfio: Enable cdev noiommu mode under iommufd
Date: Tue, 14 Apr 2026 14:14:09 -0700
Message-Id: <20260414211412.2729-8-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260414211412.2729-1-jacob.pan@linux.microsoft.com>
References: <20260414211412.2729-1-jacob.pan@linux.microsoft.com>

Now that devices in noiommu mode can bind to IOMMUFD and perform IOAS
operations, lift the cdev restrictions on the VFIO side. Noiommu cdevs are
explicitly named with a noiommu- prefix, e.g.

/dev/vfio/
|-- 7
|-- devices
|   `-- noiommu-vfio0
`-- vfio

Signed-off-by: Jacob Pan
---
v4:
- Move vfio_device_has_group() related out to 5/10
- Keep wait loop in vfio_unregister_group_dev (Jason)
v3:
- Add explicit dependency on !GENERIC_ATOMIC64
v2:
- Fix build dependency on IOMMU_SUPPORT
---
 drivers/vfio/Kconfig     |  8 ++++++--
 drivers/vfio/iommufd.c   |  7 -------
 drivers/vfio/vfio.h      |  8 +-------
 drivers/vfio/vfio_main.c | 20 ++++++--------------
 4 files changed, 13 insertions(+), 30 deletions(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index ceae52fd7586..c013255bf7f1 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
 	  The VFIO device cdev is another way for userspace to get device
 	  access. Userspace gets device fd by opening device cdev under
 	  /dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
-	  to set up secure DMA context for device access. This interface does
-	  not support noiommu.
+	  to set up secure DMA context for device access.
 
 	  If you don't know what to do here, say N.
 
@@ -63,6 +62,11 @@ endif
 config VFIO_NOIOMMU
 	bool "VFIO No-IOMMU support"
 	depends on VFIO_GROUP
+	depends on !GENERIC_ATOMIC64 # IOMMU_PT_AMDV1 requires cmpxchg64
+	select GENERIC_PT
+	select IOMMU_PT
+	select IOMMU_PT_AMDV1
+	depends on IOMMU_SUPPORT
 	help
 	  VFIO is built on the ability to isolate devices using the IOMMU.
 	  Only with an IOMMU can userspace access to DMA capable devices be
Only with an IOMMU can userspace access to DMA capable devices be diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c index a38d262c6028..26c9c3068c77 100644 --- a/drivers/vfio/iommufd.c +++ b/drivers/vfio/iommufd.c @@ -25,10 +25,6 @@ int vfio_df_iommufd_bind(struct vfio_device_file *df) =20 lockdep_assert_held(&vdev->dev_set->lock); =20 - /* Returns 0 to permit device opening under noiommu mode */ - if (vfio_device_is_noiommu(vdev)) - return 0; - return vdev->ops->bind_iommufd(vdev, ictx, &df->devid); } =20 @@ -58,9 +54,6 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df) =20 lockdep_assert_held(&vdev->dev_set->lock); =20 - if (vfio_device_is_noiommu(vdev)) - return; - if (vdev->ops->unbind_iommufd) vdev->ops->unbind_iommufd(vdev); } diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index 9e25605da564..ad9e09f6d095 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -376,19 +376,13 @@ void vfio_init_device_cdev(struct vfio_device *device= ); =20 static inline int vfio_device_add(struct vfio_device *device) { - /* cdev does not support noiommu device */ - if (vfio_device_is_noiommu(device)) - return device_add(&device->device); vfio_init_device_cdev(device); return cdev_device_add(&device->cdev, &device->device); } =20 static inline void vfio_device_del(struct vfio_device *device) { - if (vfio_device_is_noiommu(device)) - device_del(&device->device); - else - cdev_device_del(&device->cdev, &device->device); + cdev_device_del(&device->cdev, &device->device); } =20 int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep); diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 5d7c2d014689..3ae3d34c21cc 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -332,13 +332,15 @@ static int __vfio_register_dev(struct vfio_device *de= vice, if (!device->dev_set) vfio_assign_device_set(device, device); =20 - ret =3D dev_set_name(&device->device, "vfio%d", device->index); + ret =3D 
vfio_device_set_group(device, type); if (ret) return ret; =20 - ret =3D vfio_device_set_group(device, type); + /* Just to be safe, expose to user explicitly noiommu cdev node */ + ret =3D dev_set_name(&device->device, "%svfio%d", + device->noiommu ? "noiommu-" : "", device->index); if (ret) - return ret; + goto err_out; =20 /* * VFIO always sets IOMMU_CACHE because we offer no way for userspace to @@ -359,7 +361,7 @@ static int __vfio_register_dev(struct vfio_device *devi= ce, refcount_set(&device->refcount, 1); =20 /* noiommu device w/o container may have NULL group */ - if (!vfio_device_has_group(device)) + if (vfio_device_is_noiommu(device) && !vfio_device_has_group(device)) return 0; =20 vfio_device_group_register(device); @@ -396,16 +398,6 @@ void vfio_unregister_group_dev(struct vfio_device *dev= ice) bool interrupted =3D false; long rc; =20 - /* - * For noiommu devices without a container, thus no dummy group, - * simply delete and unregister to balance refcount. - */ - if (!vfio_device_has_group(device)) { - vfio_device_del(device); - vfio_device_put_registration(device); - return; - } - /* * Prevent new device opened by userspace via the * VFIO_GROUP_GET_DEVICE_FD in the group path. 
-- 
2.34.1

From nobody Thu Apr 16 12:24:56 2026
From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev", Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy, Nicolin Chen, "Tian, Kevin", Yi Liu
Cc: skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon, Jacob Pan, Baolu Lu
Subject: [PATCH V4 08/10] vfio:selftest: Handle VFIO noiommu cdev
Date: Tue, 14 Apr 2026 14:14:10 -0700
Message-Id: <20260414211412.2729-9-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260414211412.2729-1-jacob.pan@linux.microsoft.com>
References: <20260414211412.2729-1-jacob.pan@linux.microsoft.com>

In unsafe-DMA noiommu mode, the VFIO device nodes are prefixed with
noiommu-, e.g.

/dev/vfio/
|-- devices
|   `-- noiommu-vfio0
|-- noiommu-0
`-- vfio

Let the vfio tests, such as the luo kexec test, accommodate the noiommu
device files.
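As a rough sketch of the name matching this implies (the helper name
cdev_name_matches() and its bool parameter are hypothetical and not part of
the selftest library), the cdev lookup reduces to a prefix comparison whose
prefix depends on the mode:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether an entry name found under
 * /dev/vfio/devices is the cdev node to open, given whether noiommu mode
 * is enabled. The caller is assumed to have determined the mode already.
 */
static bool cdev_name_matches(const char *name, bool noiommu)
{
	const char *prefix = noiommu ? "noiommu-vfio" : "vfio";

	assert(name != NULL);

	/* strncmp() returns 0 when the first strlen(prefix) bytes are equal. */
	return strncmp(prefix, name, strlen(prefix)) == 0;
}
```

With noiommu enabled, a plain "vfio0" entry is rejected; with it disabled,
"noiommu-vfio0" is rejected because it does not start with "vfio".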
Signed-off-by: Jacob Pan
---
 .../lib/include/libvfio/vfio_pci_device.h     |  1 +
 .../selftests/vfio/lib/vfio_pci_device.c      | 32 ++++++++++++++++---
 .../vfio/vfio_pci_liveupdate_kexec_test.c     |  9 ++++++
 3 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
index 2389c7698335..e2721e36b37e 100644
--- a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
+++ b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
@@ -124,6 +124,7 @@ static inline bool vfio_pci_device_match(struct vfio_pci_device *device,
 }
 
 const char *vfio_pci_get_cdev_path(const char *bdf);
+int vfio_pci_noiommu_mode_enabled(void);
 
 /* Low-level routines for setting up a struct vfio_pci_device */
 struct vfio_pci_device *vfio_pci_device_alloc(const char *bdf, struct iommu *iommu);
diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
index 66ee268110e2..1ba81d169208 100644
--- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
+++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
@@ -225,11 +225,14 @@ void vfio_pci_group_setup(struct vfio_pci_device *device)
 	struct vfio_group_status group_status = {
 		.argsz = sizeof(group_status),
 	};
-	char group_path[32];
+	char group_path[64];
 	int group;
 
 	group = vfio_pci_get_group_from_dev(device->bdf);
-	snprintf(group_path, sizeof(group_path), "/dev/vfio/%d", group);
+	if (vfio_pci_noiommu_mode_enabled())
+		snprintf(group_path, sizeof(group_path), "/dev/vfio/noiommu-%d", group);
+	else
+		snprintf(group_path, sizeof(group_path), "/dev/vfio/%d", group);
 
 	device->group_fd = open(group_path, O_RDWR);
 	VFIO_ASSERT_GE(device->group_fd, 0, "open(%s) failed\n", group_path);
@@ -294,6 +297,24 @@ static void vfio_pci_device_setup(struct vfio_pci_device *device)
 		device->msi_eventfds[i] = -1;
 }
 
+
+int vfio_pci_noiommu_mode_enabled(void)
+{
+	const char *path = "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode";
+	FILE *f;
+	int c;
+
+	f = fopen(path, "re");
+	if (!f)
+		return 0;
+
+	c = fgetc(f);
+	fclose(f);
+	if (c == 'Y' || c == 'y')
+		return 1;
+	return 0;
+}
+
 const char *vfio_pci_get_cdev_path(const char *bdf)
 {
 	char dir_path[PATH_MAX];
@@ -310,8 +331,11 @@ const char *vfio_pci_get_cdev_path(const char *bdf)
 	VFIO_ASSERT_NOT_NULL(dir, "Failed to open directory %s\n", dir_path);
 
 	while ((entry = readdir(dir)) != NULL) {
-		/* Find the file that starts with "vfio" */
-		if (strncmp("vfio", entry->d_name, 4))
+		/* Find the file that starts with "noiommu-vfio" or "vfio" */
+		if (vfio_pci_noiommu_mode_enabled()) {
+			if (strncmp("noiommu-vfio", entry->d_name, strlen("noiommu-vfio")))
+				continue;
+		} else if (strncmp("vfio", entry->d_name, 4))
 			continue;
 
 		snprintf(cdev_path, PATH_MAX, "/dev/vfio/devices/%s", entry->d_name);
diff --git a/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c b/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c
index 36bddfbb88ed..d72f6a58e3e6 100644
--- a/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c
+++ b/tools/testing/selftests/vfio/vfio_pci_liveupdate_kexec_test.c
@@ -159,6 +159,15 @@ static void check_open_vfio_device_fails(void)
 	VFIO_ASSERT_EQ(errno, EBUSY);
 	free((void *)cdev_path);
 
+	/*
+	 * In no-IOMMU mode the group lives at /dev/vfio/noiommu- and
+	 * cannot be added to a Type1 IOMMU container, so the container-based
+	 * access check below doesn't apply. The cdev check above already
+	 * covers that the device is inaccessible in this mode.
+	 */
+	if (vfio_pci_noiommu_mode_enabled())
+		return;
+
 	for (i = 0; i < nr_iommu_modes; i++) {
 		if (!iommu_modes[i].container_path)
 			continue;
-- 
2.34.1

From nobody Thu Apr 16 12:24:56 2026
From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev", Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy, Nicolin Chen, "Tian, Kevin", Yi Liu
Cc: skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon, Jacob Pan, Baolu Lu
Subject: [PATCH V4 09/10] selftests/vfio: Add iommufd noiommu mode selftest for cdev
Date: Tue, 14 Apr 2026 14:14:11 -0700
Message-Id: <20260414211412.2729-10-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260414211412.2729-1-jacob.pan@linux.microsoft.com>
References: <20260414211412.2729-1-jacob.pan@linux.microsoft.com>

Add comprehensive selftest for VFIO device operations with iommufd in
noiommu mode. Tests cover:
- Device binding to iommufd
- IOAS (I/O Address Space) allocation, mapping with dummy IOVA
- Retrieve PA from dummy IOVA
- Device attach/detach operations as usual

Signed-off-by: Jacob Pan
---
v4:
- Add a test case for unaligned IOVA.
v2:
- Use huge page ioas map to test GET_PA searching for contiguous PA range.
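Reviewer note on the v4 unaligned-IOVA case: the bound it checks can be
sketched on its own, as below. This is only an illustration under an
assumption, namely that IOMMU_IOAS_GET_PA reports at most the bytes left
in the mapping from the queried IOVA. remaining_in_mapping() is a
hypothetical helper and does not appear in the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of this patch. Assumes a mapping of
 * map_len bytes starting at map_iova, and returns how many bytes remain
 * from iova to the end of that mapping, or 0 if iova lies outside it.
 */
static uint64_t remaining_in_mapping(uint64_t map_iova, uint64_t map_len,
				     uint64_t iova)
{
	if (iova < map_iova || iova - map_iova >= map_len)
		return 0;

	return map_len - (iova - map_iova);
}
```

With 4 KiB pages, a 32-page mapping at 0x200000 queried at 0x200080
leaves 0x20000 - 0x80 = 0x1ff80 bytes, which is the value the test
compares the returned length against.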
---
 tools/testing/selftests/vfio/Makefile         |   1 +
 .../vfio/vfio_iommufd_noiommu_test.c          | 567 ++++++++++++++++++
 2 files changed, 568 insertions(+)
 create mode 100644 tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c

diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile
index 792c4245d4f7..ce76f3f3df60 100644
--- a/tools/testing/selftests/vfio/Makefile
+++ b/tools/testing/selftests/vfio/Makefile
@@ -9,6 +9,7 @@ CFLAGS = $(KHDR_INCLUDES)
 TEST_GEN_PROGS += vfio_dma_mapping_test
 TEST_GEN_PROGS += vfio_dma_mapping_mmio_test
 TEST_GEN_PROGS += vfio_iommufd_setup_test
+TEST_GEN_PROGS += vfio_iommufd_noiommu_test
 TEST_GEN_PROGS += vfio_pci_device_test
 TEST_GEN_PROGS += vfio_pci_device_init_perf_test
 TEST_GEN_PROGS += vfio_pci_driver_test
diff --git a/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c b/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
new file mode 100644
index 000000000000..0e217cc29b5b
--- /dev/null
+++ b/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
@@ -0,0 +1,567 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * VFIO iommufd NoIOMMU Mode Selftest
+ *
+ * Tests VFIO device operations with iommufd in noiommu mode, including:
+ * - Device binding to iommufd
+ * - IOAS (I/O Address Space) allocation and management
+ * - Device attach/detach to IOAS
+ * - Memory mapping in IOAS
+ * - Device info queries and reset
+ */
+
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include "kselftest_harness.h"
+
+static const char iommu_dev_path[] = "/dev/iommu";
+static const char *cdev_path;
+
+static char *vfio_noiommu_get_device_id(const char *bdf)
+{
+	char *path = NULL;
+	char *vfio_id = NULL;
+	struct dirent *dentry;
+	DIR *dp;
+
+	if (asprintf(&path, "/sys/bus/pci/devices/%s/vfio-dev", bdf) < 0)
+		return NULL;
+
+	dp = opendir(path);
+	if (!dp) {
+		free(path);
+		return NULL;
+	}
+	while ((dentry = readdir(dp)) != NULL) {
+		if (strncmp("noiommu-vfio", dentry->d_name, 12) == 0) {
+			vfio_id = strdup(dentry->d_name);
+			break;
+		}
+	}
+
+	closedir(dp);
+	free(path);
+	return vfio_id;
+}
+
+static char *vfio_noiommu_get_cdev_path(const char *bdf)
+{
+	char *vfio_id = vfio_noiommu_get_device_id(bdf);
+	char *cdev = NULL;
+
+	if (vfio_id) {
+		asprintf(&cdev, "/dev/vfio/devices/%s", vfio_id);
+		free(vfio_id);
+	}
+	return cdev;
+}
+
+static int vfio_device_bind_iommufd_ioctl(int cdev_fd, int iommufd)
+{
+	struct vfio_device_bind_iommufd bind_args = {
+		.argsz = sizeof(bind_args),
+		.iommufd = iommufd,
+	};
+
+	return ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind_args);
+}
+
+static int vfio_device_get_info_ioctl(int cdev_fd,
+				      struct vfio_device_info *info)
+{
+	info->argsz = sizeof(*info);
+	return ioctl(cdev_fd, VFIO_DEVICE_GET_INFO, info);
+}
+
+static int vfio_device_ioas_alloc_ioctl(int iommufd,
+					struct iommu_ioas_alloc *alloc_args)
+{
+	alloc_args->size = sizeof(*alloc_args);
+	alloc_args->flags = 0;
+	return ioctl(iommufd, IOMMU_IOAS_ALLOC, alloc_args);
+}
+
+static int vfio_device_attach_iommufd_pt_ioctl(int cdev_fd, u32 pt_id)
+{
+	struct vfio_device_attach_iommufd_pt attach_args = {
+		.argsz = sizeof(attach_args),
+		.pt_id = pt_id,
+	};
+
+	return ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_args);
+}
+
+static int vfio_device_detach_iommufd_pt_ioctl(int cdev_fd)
+{
+	struct vfio_device_detach_iommufd_pt detach_args = {
+		.argsz = sizeof(detach_args),
+	};
+
+	return ioctl(cdev_fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_args);
+}
+
+static int vfio_device_get_region_info_ioctl(int cdev_fd, uint32_t index,
+					     struct vfio_region_info *info)
+{
+	info->argsz = sizeof(*info);
+	info->index = index;
+	return ioctl(cdev_fd, VFIO_DEVICE_GET_REGION_INFO, info);
+}
+
+static int vfio_device_reset_ioctl(int cdev_fd)
+{
+	return ioctl(cdev_fd, VFIO_DEVICE_RESET);
+}
+
+static int ioas_map_pages(int iommufd, uint32_t ioas_id, uint64_t iova,
+			  size_t length, bool hugepages)
+{
+	struct iommu_ioas_map map_args = {
+		.size = sizeof(map_args),
+		.ioas_id = ioas_id,
+		.iova = iova,
+		.length = length,
+		.flags = IOMMU_IOAS_MAP_READABLE | IOMMU_IOAS_MAP_WRITEABLE | IOMMU_IOAS_MAP_FIXED_IOVA,
+	};
+	void *pages;
+	int ret;
+
+	/* Allocate test pages */
+	if (hugepages)
+		pages = mmap(NULL, length, PROT_READ | PROT_WRITE,
+			     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
+	else
+		pages = mmap(NULL, length, PROT_READ | PROT_WRITE,
+			     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (pages == MAP_FAILED) {
+		printf("mmap failed for length 0x%lx\n", (unsigned long)length);
+		return -ENOMEM;
+	}
+
+	/* Set up page pointer for mapping */
+	map_args.user_va = (uintptr_t)pages;
+
+	printf(" ioas_map_pages: ioas_id=%u, iova=0x%lx, length=0x%lx, user_va=%p\n",
+	       ioas_id, (unsigned long)iova, (unsigned long)length, pages);
+
+	/* Map into IOAS */
+	ret = ioctl(iommufd, IOMMU_IOAS_MAP, &map_args);
+	if (ret != 0)
+		printf(" IOMMU_IOAS_MAP failed: %d (%s)\n", ret, strerror(errno));
+	else
+		printf(" IOMMU_IOAS_MAP succeeded, IOVA=0x%lx\n", (unsigned long)map_args.iova);
+
+	munmap(pages, length);
+	return ret;
+}
+
+static int ioas_unmap_pages(int iommufd, uint32_t ioas_id, uint64_t iova,
+			    size_t length)
+{
+	struct iommu_ioas_unmap unmap_args = {
+		.size = sizeof(unmap_args),
+		.ioas_id = ioas_id,
+		.iova = iova,
+		.length = length,
+	};
+
+	return ioctl(iommufd, IOMMU_IOAS_UNMAP, &unmap_args);
+}
+
+static int ioas_destroy_ioctl(int iommufd, uint32_t ioas_id)
+{
+	struct iommu_destroy destroy_args = {
+		.size = sizeof(destroy_args),
+		.id = ioas_id,
+	};
+
+	return ioctl(iommufd, IOMMU_DESTROY, &destroy_args);
+}
+
+static int ioas_get_pa_ioctl(int iommufd, uint32_t ioas_id, uint64_t iova,
+			     uint64_t *phys_out, uint64_t *length_out)
+{
+	struct {
+		__u32 size;
+		__u32 flags;
+		__u32 ioas_id;
+		__u32 __reserved;
+		__u64 iova;
+		__u64 out_length;
+		__u64 out_phys;
+	} get_pa = {
+		.size = sizeof(get_pa),
+		.flags = 0,
+		.ioas_id = ioas_id,
+		.iova = iova,
+	};
+
+	printf(" ioas_get_pa_ioctl: ioas_id=%u, iova=0x%lx\n",
+	       ioas_id, (unsigned long)iova);
+
+	if (ioctl(iommufd, IOMMU_IOAS_GET_PA, &get_pa) != 0) {
+		printf(" IOMMU_IOAS_GET_PA failed: %s (errno=%d)\n",
+		       strerror(errno), errno);
+		return -1;
+	}
+
+	printf(" IOMMU_IOAS_GET_PA succeeded: PA=0x%lx, length=0x%lx\n",
+	       (unsigned long)get_pa.out_phys, (unsigned long)get_pa.out_length);
+
+	if (phys_out)
+		*phys_out = get_pa.out_phys;
+	if (length_out)
+		*length_out = get_pa.out_length;
+
+	return 0;
+}
+
+FIXTURE(vfio_noiommu) {
+	int cdev_fd;
+	int iommufd;
+};
+
+FIXTURE_SETUP(vfio_noiommu)
+{
+	ASSERT_LE(0, (self->cdev_fd = open(cdev_path, O_RDWR, 0)));
+	ASSERT_LE(0, (self->iommufd = open(iommu_dev_path, O_RDWR, 0)));
+}
+
+FIXTURE_TEARDOWN(vfio_noiommu)
+{
+	if (self->cdev_fd >= 0)
+		close(self->cdev_fd);
+	if (self->iommufd >= 0)
+		close(self->iommufd);
+}
+
+/*
+ * Test: Device cdev can be opened
+ */
+TEST_F(vfio_noiommu, device_cdev_open)
+{
+	ASSERT_LE(0, self->cdev_fd);
+}
+
+/*
+ * Test: Device can be bound to iommufd
+ */
+TEST_F(vfio_noiommu, device_bind_iommufd)
+{
+	ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+						    self->iommufd));
+}
+
+/*
+ * Test: Device info can be queried after binding
+ */
+TEST_F(vfio_noiommu, device_get_info_after_bind)
+{
+	struct vfio_device_info info;
+
+	ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+						    self->iommufd));
+	ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &info));
+	ASSERT_NE(0, info.argsz);
+}
+
+/*
+ * Test: Getting device info fails without bind
+ */
+TEST_F(vfio_noiommu, device_get_info_without_bind_fails)
+{
+	struct vfio_device_info info;
+
+	ASSERT_NE(0, vfio_device_get_info_ioctl(self->cdev_fd, &info));
+}
+
+/*
+ * Test: Binding with invalid iommufd fails
+ */
+TEST_F(vfio_noiommu, device_bind_bad_iommufd_fails)
+{
+	ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, -2));
+}
+
+/*
+ * Test: Cannot bind twice to same device
+ */
+TEST_F(vfio_noiommu, device_repeated_bind_fails)
+{
+	ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+						    self->iommufd));
+	ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+						    self->iommufd));
+}
+
+/*
+ * Test: IOAS can be allocated
+ */
+TEST_F(vfio_noiommu, ioas_alloc)
+{
+	struct iommu_ioas_alloc alloc_args;
+
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+						  &alloc_args));
+	ASSERT_NE(0, alloc_args.out_ioas_id);
+}
+
+/*
+ * Test: IOAS can be destroyed
+ */
+TEST_F(vfio_noiommu, ioas_destroy)
+{
+	struct iommu_ioas_alloc alloc_args;
+
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+						  &alloc_args));
+	ASSERT_EQ(0, ioas_destroy_ioctl(self->iommufd,
+					alloc_args.out_ioas_id));
+}
+
+/*
+ * Test: Device can attach to IOAS after binding
+ */
+TEST_F(vfio_noiommu, device_attach_to_ioas)
+{
+	struct iommu_ioas_alloc alloc_args;
+
+	ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+						    self->iommufd));
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+						  &alloc_args));
+	ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+							 alloc_args.out_ioas_id));
+}
+
+/*
+ * Test: Attaching to invalid IOAS fails
+ */
+TEST_F(vfio_noiommu, device_attach_invalid_ioas_fails)
+{
+	ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+						    self->iommufd));
+	ASSERT_NE(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+							 UINT32_MAX));
+}
+
+/*
+ * Test: Device can detach from IOAS
+ */
+TEST_F(vfio_noiommu, device_detach_from_ioas)
+{
+	struct iommu_ioas_alloc alloc_args;
+
+	ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+						    self->iommufd));
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+						  &alloc_args));
+	ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+							 alloc_args.out_ioas_id));
+	ASSERT_EQ(0, vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd));
+}
+
+/*
+ * Test: Full lifecycle - bind, attach, detach, reset
+ */
+TEST_F(vfio_noiommu, device_lifecycle)
+{
+	struct iommu_ioas_alloc alloc_args;
+	struct vfio_device_info info;
+
+	/* Bind device to iommufd */
+	ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+						    self->iommufd));
+
+	/* Allocate IOAS */
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+						  &alloc_args));
+
+	/* Attach device to IOAS */
+	ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+							 alloc_args.out_ioas_id));
+
+	/* Query device info */
+	ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &info));
+
+	/* Detach device from IOAS */
+	ASSERT_EQ(0, vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd));
+
+	/* Reset device */
+	ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd));
+}
+
+/*
+ * Test: Get region info
+ */
+TEST_F(vfio_noiommu, device_get_region_info)
+{
+	struct vfio_device_info dev_info;
+	struct vfio_region_info region_info;
+
+	ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+						    self->iommufd));
+	ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &dev_info));
+
+	/* Try to get first region info if device has regions */
+	if (dev_info.num_regions > 0) {
+		ASSERT_EQ(0, vfio_device_get_region_info_ioctl(self->cdev_fd, 0,
+							       &region_info));
+		ASSERT_NE(0, region_info.argsz);
+	}
+}
+
+TEST_F(vfio_noiommu, device_reset)
+{
+	ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+						    self->iommufd));
+	ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd));
+}
+
+TEST_F(vfio_noiommu, ioas_map_pages)
+{
+	struct iommu_ioas_alloc alloc_args;
+	long page_size = sysconf(_SC_PAGESIZE);
+	uint64_t iova = 0x10000;
+	int i;
+
+	ASSERT_GT(page_size, 0);
+
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+						  &alloc_args));
+
+	printf("Page size: %ld bytes\n", page_size);
+	/* Test mapping regions of different sizes: 1, 2, 4, 8 pages */
+	for (i = 0; i < 4; i++) {
+		size_t map_size = page_size * (1 << i); /* 1, 2, 4, 8 pages */
+		uint64_t test_iova = iova + (i * 0x100000);
+
+		/* Attempt to map each region (may fail if not supported) */
+		ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
+			       test_iova, map_size, false);
+	}
+}
+
+TEST_F(vfio_noiommu, multiple_ioas_alloc)
+{
+	struct iommu_ioas_alloc alloc1, alloc2;
+
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc1));
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc2));
+	ASSERT_NE(alloc1.out_ioas_id, alloc2.out_ioas_id);
+}
+
+/*
+ * Test: Query physical address for IOVA
+ * Tests IOMMU_IOAS_GET_PA ioctl to translate IOVA to physical address
+ * Note: Device must be attached to IOAS for PA query to work
+ */
+#define NR_PAGES 32
+TEST_F(vfio_noiommu, ioas_get_pa_mapped)
+{
+	struct iommu_ioas_alloc alloc_args;
+	long page_size = sysconf(_SC_PAGESIZE);
+	uint64_t iova = 0x200000;
+	uint64_t phys = 0;
+	uint64_t length = 0;
+	int ret;
+
+	ASSERT_GT(page_size, 0);
+
+	ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+						    self->iommufd));
+
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+						  &alloc_args));
+
+	ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+							 alloc_args.out_ioas_id));
+
+	/*
+	 * Map a page into an arbitrary IOAS, used as a cookie for lookup.
+	 * Use hugepages to test contiguous PA. Make sure hugepages are
+	 * available. e.g. echo 64 > /proc/sys/vm/nr_hugepages
+	 */
+	ret = ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
+			     iova, page_size * NR_PAGES, true);
+	if (ret != 0)
+		return;
+
+	/* Query the physical address for the mapped dummy IOVA */
+	ret = ioas_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+				iova, &phys, &length);
+
+	if (ret == 0) {
+		/* If we got a result, verify it's valid */
+		ASSERT_NE(0, phys);
+		ASSERT_GE((uint64_t)page_size * NR_PAGES, length);
+	}
+
+	/*
+	 * Query with a non-page-aligned IOVA. The returned length must
+	 * not exceed the actual contiguous range starting from that
+	 * offset, i.e. it must be reduced by the sub-page offset.
+	 */
+	phys = 0;
+	length = 0;
+	ret = ioas_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+				iova + 0x80, &phys, &length);
+	if (ret == 0) {
+		ASSERT_NE(0, phys);
+		/* Length must account for the sub-page offset */
+		ASSERT_GE((uint64_t)page_size * NR_PAGES - 0x80, length);
+		ASSERT_LE(length, (uint64_t)page_size * NR_PAGES - 0x80);
+		/* Must not overshoot into the next page boundary */
+		ASSERT_EQ(0, (phys + length) % page_size);
+	}
+}
+
+TEST_F(vfio_noiommu, ioas_get_pa_unmapped_fails)
+{
+	struct iommu_ioas_alloc alloc_args;
+
+	ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+						  &alloc_args));
+
+	/* Try to retrieve unmapped IOVA (should fail) */
+	ASSERT_NE(0, ioas_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+				       0x10000, NULL, NULL));
+}
+
+int main(int argc, char *argv[])
+{
+	const char *device_bdf = vfio_selftests_get_bdf(&argc, argv);
+	char *cdev = NULL;
+
+	if (!device_bdf) {
+		ksft_print_msg("No device BDF provided\n");
+		return KSFT_SKIP;
+	}
+
+	cdev = vfio_noiommu_get_cdev_path(device_bdf);
+	if (!cdev) {
+		ksft_print_msg("Could not find cdev for device %s\n",
+			       device_bdf);
+		return KSFT_SKIP;
+	}
+
+	cdev_path = cdev;
+	ksft_print_msg("Using cdev device %s for BDF %s\n", cdev_path,
+		       device_bdf);
+
+	return test_harness_run(argc, argv);
+}
-- 
2.34.1

From nobody Thu Apr 16 12:24:56 2026
From: Jacob Pan
To: linux-kernel@vger.kernel.org, "iommu@lists.linux.dev", Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy, Nicolin Chen, "Tian, Kevin", Yi Liu
Cc: skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon, Jacob Pan, Baolu Lu
Subject: [PATCH V4 10/10] Documentation: Update VFIO NOIOMMU mode
Date: Tue, 14 Apr 2026 14:14:12 -0700
Message-Id: <20260414211412.2729-11-jacob.pan@linux.microsoft.com>
In-Reply-To: <20260414211412.2729-1-jacob.pan@linux.microsoft.com>
References: <20260414211412.2729-1-jacob.pan@linux.microsoft.com>

Document the NOIOMMU mode with newly added cdev support under iommufd.

Cc: Jonathan Corbet
Signed-off-by: Jacob Pan
---
 Documentation/driver-api/vfio.rst | 45 +++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 2a21a42c9386..da6f77414c3b 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -275,8 +275,6 @@ in a VFIO group.
 With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
 by directly opening a character device /dev/vfio/devices/vfioX where
 "X" is the number allocated uniquely by VFIO for registered devices.
-cdev interface does not support noiommu devices, so user should use
-the legacy group interface if noiommu is wanted.
 
 The cdev only works with IOMMUFD.  Both VFIO drivers and applications
 must adapt to the new cdev security model which requires using
@@ -370,6 +368,49 @@ IOMMUFD IOAS/HWPT to enable userspace DMA::
 
 	/* Other device operations as stated in "VFIO Usage Example" */
 
+VFIO NOIOMMU mode
+-------------------------------------------------------------------------------
+VFIO also supports a no-IOMMU mode, intended for usages where unsafe DMA can
+be performed by userspace drivers without physical IOMMU protection. This mode
+is controlled by the parameter:
+
+/sys/module/vfio/parameters/enable_unsafe_noiommu_mode
+
+Upon enabling this mode, with an assigned device, the user will be presented
+with a VFIO group and device file, e.g.::
+
+  /dev/vfio/
+  |-- devices
+  |   `-- noiommu-vfio0  /* VFIO device cdev */
+  |-- noiommu-0          /* VFIO group */
+  `-- vfio
+
+The capabilities vary depending on the device programming interface and kernel
+configuration used. The following table summarizes the differences:
+
++-------------------+---------------------+----------------------+
+| Feature           | VFIO group          | VFIO device cdev     |
++===================+=====================+======================+
+| VFIO device UAPI  | Yes                 | Yes                  |
++-------------------+---------------------+----------------------+
+| VFIO container    | No                  | No                   |
++-------------------+---------------------+----------------------+
+| IOMMUFD IOAS      | No                  | Yes*                 |
++-------------------+---------------------+----------------------+
+
+Note that the VFIO container case includes IOMMUFD provided VFIO compatibility
+interfaces when either CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER is
+enabled.
+
+* IOMMUFD UAPI is available for VFIO device cdev to pin and map user memory with
+  the ability to retrieve physical addresses for DMA command submission.
+
+A new IOMMUFD ioctl IOMMU_IOAS_GET_PA is added to retrieve the physical address
+for a given user virtual address. Note that the IOMMU_IOAS_MAP_FIXED_IOVA flag is
+ignored in no-IOMMU mode since there is no physical DMA remapping hardware.
+tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c provides an example of
+using this ioctl in no-IOMMU mode.
+
 VFIO User API
 -------------------------------------------------------------------------------
 
-- 
2.34.1
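
For readers of the documentation patch above, the naming of the no-IOMMU
nodes can be sketched as below. The helper names are hypothetical and are
not part of this series; the path formats are the ones used in
tools/testing/selftests/vfio/lib/vfio_pci_device.c and in the tree shown
in vfio.rst.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: path of the no-IOMMU group node for a group number. */
static int noiommu_group_path(char *buf, size_t len, int group)
{
	int n = snprintf(buf, len, "/dev/vfio/noiommu-%d", group);

	/* Report truncation or an encoding error as failure */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: path of the no-IOMMU device cdev for an index. */
static int noiommu_cdev_path(char *buf, size_t len, int index)
{
	int n = snprintf(buf, len, "/dev/vfio/devices/noiommu-vfio%d", index);

	/* Report truncation or an encoding error as failure */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Both helpers return -1 rather than a truncated path when the buffer is
too small, which is presumably also why the series grows group_path from
32 to 64 bytes for the longer "noiommu-" prefix.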