From nobody Wed Oct 8 23:42:18 2025 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9341623CEF9 for ; Mon, 23 Jun 2025 09:57:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.7 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750672645; cv=none; b=gKTWkD73jSEmXugcpEMoVW93OXCY2ORXAeAg2fJjbM11nPakoyOdVY71xPDe0BRlmez3EsMMTreF4J7S+K+oicekk4YSKNDsiH9LFuB+faHNKZHx+Ogpmz63j4zs+2zDOVZjBc5q6H2HbeUumykfYazGSdUhDaN8Awl/PQQdbl0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750672645; c=relaxed/simple; bh=R4ymocCvY35VgWBYLTK+OrWIsRUfApFcm1v81xqi+wc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=twtJyq+L5SkTqCGizjjhZ8v7sMgTK//ebWyD0KEV5W33DfHqo5WTqO1OJprwBJ9c8ADi32HriVi4gn0qiESWqH4htEn2oXDBh7xc1YGsPbsBvXzrMfR+lbUZGh+s9IYv2x5beSYm/78FFm2u77sYT1R4MAdhker2FnwI6+NShvI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ZEG3duXC; arc=none smtp.client-ip=192.198.163.7 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ZEG3duXC" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1750672643; x=1782208643; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=R4ymocCvY35VgWBYLTK+OrWIsRUfApFcm1v81xqi+wc=; 
b=ZEG3duXCl/Sm4xD+3K9ZtrlmijM6e0Y0mJHow5aX/PzMZjivh0C6ecsk Vs6nzZwe58oALLAcFIqirevV7lC0ZdO2fTawvuy6ed71tNyGltPyQ8Aiz 6bkyAZ+tPLUR+zKmecD3Xn0CUSUAXy8WJCRPU8HahT0wtuZxOdr0iGGFk xD+w/BzhkI7IJmPzh+B+OjDgBj8SIj5szpGQzY9cYb/LjMF38pf9rVD7L V/tQ/pNj5YsS12eWTo+y//BL8bChFNfOFX7YBmxTx1KBodUnjHZTWyPv4 reKSBWrOoH7C8LZNHOvPsj4roUhFcI8USHOoxU0ao+q4Q6B3kBdm081U9 g==; X-CSE-ConnectionGUID: v6yFW1UoScyv2RUw6NISwQ== X-CSE-MsgGUID: gWLqA8O7QkKVy+3nv9+H7w== X-IronPort-AV: E=McAfee;i="6800,10657,11472"; a="78285794" X-IronPort-AV: E=Sophos;i="6.16,258,1744095600"; d="scan'208";a="78285794" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by fmvoesa101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Jun 2025 02:57:23 -0700 X-CSE-ConnectionGUID: XB7CLA9uS/eCizvlqNrJWA== X-CSE-MsgGUID: SrRtBjALRLGNUhwSn/eBkw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,258,1744095600"; d="scan'208";a="155859340" Received: from yilunxu-optiplex-7050.sh.intel.com ([10.239.159.165]) by fmviesa005.fm.intel.com with ESMTP; 23 Jun 2025 02:57:20 -0700 From: Xu Yilun To: jgg@nvidia.com, jgg@ziepe.ca, kevin.tian@intel.com, will@kernel.org, aneesh.kumar@kernel.org Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, dan.j.williams@intel.com, baolu.lu@linux.intel.com, yilun.xu@linux.intel.com, yilun.xu@intel.com Subject: [PATCH v2 1/4] iommufd: Add iommufd_object_tombstone_user() helper Date: Mon, 23 Jun 2025 17:49:43 +0800 Message-Id: <20250623094946.1714996-2-yilun.xu@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20250623094946.1714996-1-yilun.xu@linux.intel.com> References: <20250623094946.1714996-1-yilun.xu@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add the iommufd_object_tombstone_user() helper, 
which allows the caller to destroy an iommufd object created by userspace.

This is useful on some destroy paths when the kernel caller finds that
the object should have been removed by userspace but is still alive.
With this helper, the caller destroys the object but leaves the object ID
reserved (a so-called tombstone). The tombstone prevents the object ID
from being repurposed without the original user being aware of it.

Since this only happens on abnormal userspace behavior, for simplicity
the tombstoned object ID is permanently leaked until
iommufd_fops_release(), i.e. the original user gets an error when calling
ioctl(IOMMU_DESTROY) on that ID.

The first use case is to ensure that the iommufd_vdevice can't outlive
the associated iommufd_device.

Suggested-by: Jason Gunthorpe
Signed-off-by: Xu Yilun
---
 drivers/iommu/iommufd/iommufd_private.h | 23 +++++++++++++++++-
 drivers/iommu/iommufd/main.c            | 31 ++++++++++++++++++-------
 2 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 9ccc83341f32..fbc9ef78d81f 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -187,7 +187,8 @@ void iommufd_object_finalize(struct iommufd_ctx *ictx,
 			     struct iommufd_object *obj);
 
 enum {
-	REMOVE_WAIT_SHORTTERM = 1,
+	REMOVE_WAIT_SHORTTERM = BIT(0),
+	REMOVE_OBJ_TOMBSTONE = BIT(1),
 };
 int iommufd_object_remove(struct iommufd_ctx *ictx,
 			  struct iommufd_object *to_destroy, u32 id,
@@ -213,6 +214,26 @@ static inline void iommufd_object_destroy_user(struct iommufd_ctx *ictx,
 	WARN_ON(ret);
 }
 
+/*
+ * Similar to iommufd_object_destroy_user(), except that the object ID is left
+ * reserved/tombstoned.
+ */
+static inline void iommufd_object_tombstone_user(struct iommufd_ctx *ictx,
+						 struct iommufd_object *obj)
+{
+	int ret;
+
+	ret = iommufd_object_remove(ictx, obj, obj->id,
+				    REMOVE_WAIT_SHORTTERM | REMOVE_OBJ_TOMBSTONE);
+
+	/*
+	 * If there is a bug and we couldn't destroy the object then we did put
+	 * back the caller's users refcount and will eventually try to free it
+	 * again during close.
+	 */
+	WARN_ON(ret);
+}
+
 /*
  * The HWPT allocated by autodomains is used in possibly many devices and
  * is automatically destroyed when its refcount reaches zero.
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 3df468f64e7d..5fd75aba068b 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -167,7 +167,7 @@ int iommufd_object_remove(struct iommufd_ctx *ictx,
 		goto err_xa;
 	}
 
-	xas_store(&xas, NULL);
+	xas_store(&xas, (flags & REMOVE_OBJ_TOMBSTONE) ? XA_ZERO_ENTRY : NULL);
 	if (ictx->vfio_ioas == container_of(obj, struct iommufd_ioas, obj))
 		ictx->vfio_ioas = NULL;
 	xa_unlock(&ictx->objects);
@@ -238,6 +238,7 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
 	struct iommufd_ctx *ictx = filp->private_data;
 	struct iommufd_sw_msi_map *next;
 	struct iommufd_sw_msi_map *cur;
+	XA_STATE(xas, &ictx->objects, 0);
 	struct iommufd_object *obj;
 
 	/*
@@ -251,16 +252,30 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
 	 */
 	while (!xa_empty(&ictx->objects)) {
 		unsigned int destroyed = 0;
-		unsigned long index;
 
-		xa_for_each(&ictx->objects, index, obj) {
-			if (!refcount_dec_if_one(&obj->users))
-				continue;
+		xas_set(&xas, 0);
+		for (;;) {
+			rcu_read_lock();
+			obj = xas_find(&xas, ULONG_MAX);
+			rcu_read_unlock();
+
+			if (!obj)
+				break;
+
+			if (!xa_is_zero(obj)) {
+				if (!refcount_dec_if_one(&obj->users))
+					continue;
+
+				iommufd_object_ops[obj->type].destroy(obj);
+				kfree(obj);
+			}
+
 			destroyed++;
-			xa_erase(&ictx->objects, index);
-			iommufd_object_ops[obj->type].destroy(obj);
-			kfree(obj);
+			xas_lock(&xas);
+			xas_store(&xas, NULL);
+			xas_unlock(&xas);
 		}
+
 		/* Bug related to users refcount */
 		if (WARN_ON(!destroyed))
 			break;
-- 
2.25.1

From nobody Wed Oct 8 23:42:18 2025 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 216E823E235 for ; Mon, 23 Jun 2025 09:57:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.7 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750672648; cv=none; b=FvJwx8KqvEJibaIaYmqv5jO/RdJDyU4Aao3HN+AYBbIhkWWWU4SpYB2SxuRL9WxJ8/AmJJ/+8+WSkfXNma8j3xacmcbz3aoPhahw8Zj3JubHdzQsKK5NOR2sfloOOYxOOmw+273jREZBUpCnNTG5XHQbZL46LYr2HvC8QvmDWek= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750672648; c=relaxed/simple; bh=DpL5RcR7IOCGHlGzPA7k9n8Ag2B6IqmddaRdUeckuJM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=RfcmEQwZHEJ8is/C4muUa9+GG1xWJevDSw5TubcdW/O03nScCjk3qWWyW48+aBGkOGIMUjr1tx0ekdkzj6xksygFOD9HVZ9/UtLj6dY/RA8aZr0q76WTTHw6UzESpdMaRJrJ2R6Q8GmaYhvGE70wL7WhNIZVaeG4proyI+AhK4E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=jnZRVmGG; arc=none smtp.client-ip=192.198.163.7 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="jnZRVmGG" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; 
t=1750672647; x=1782208647; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DpL5RcR7IOCGHlGzPA7k9n8Ag2B6IqmddaRdUeckuJM=; b=jnZRVmGGUEafI+zScESvIv7V9wS+AZRTiKf7R3YDGQ5ivo33UKdwqrMS MBcW8fk9TreBFcRvSBqWAXvrnL6HcRdX6JaZ0MTdqMgsvLZt8jScRXfV/ nBM0OfT+KbL6Mr+mAOIar5SLrSO4exvLaHX2xBGbXOgjcMR4GsxmE1aZe IZifRm9u0TbAgOomAMNHq3mAhadQd/n4DCqE3q/brn2InVdOTSFZh2Mdi 0ckZeAe7N4s78fV2MpHTZ7OXIolg04HOc9P5QSJnhF2elN5d/JcqlNeqA 0ZQlk32PBIV410iEVInPq0szE6aH8xiRoB61vkVZyVkp4Olk+U1b45dK3 A==; X-CSE-ConnectionGUID: pBTPF4BOSSeGwoW3torcrQ== X-CSE-MsgGUID: mAzFMMMvSQSgaBWQ78b+6w== X-IronPort-AV: E=McAfee;i="6800,10657,11472"; a="78285817" X-IronPort-AV: E=Sophos;i="6.16,258,1744095600"; d="scan'208";a="78285817" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by fmvoesa101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Jun 2025 02:57:27 -0700 X-CSE-ConnectionGUID: hr4raBz2TZCSn+mKIFbKow== X-CSE-MsgGUID: fRUYoG3eTImDl/HVBHekWw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,258,1744095600"; d="scan'208";a="155859354" Received: from yilunxu-optiplex-7050.sh.intel.com ([10.239.159.165]) by fmviesa005.fm.intel.com with ESMTP; 23 Jun 2025 02:57:23 -0700 From: Xu Yilun To: jgg@nvidia.com, jgg@ziepe.ca, kevin.tian@intel.com, will@kernel.org, aneesh.kumar@kernel.org Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, dan.j.williams@intel.com, baolu.lu@linux.intel.com, yilun.xu@linux.intel.com, yilun.xu@intel.com Subject: [PATCH v2 2/4] iommufd/viommu: Fix the uninitialized iommufd_vdevice::ictx Date: Mon, 23 Jun 2025 17:49:44 +0800 Message-Id: <20250623094946.1714996-3-yilun.xu@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20250623094946.1714996-1-yilun.xu@linux.intel.com> References: <20250623094946.1714996-1-yilun.xu@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: 
List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Fix the uninitialized iommufd_vdevice::ictx. No code was using this
field before, but later vdevice will use it to sync up with idevice on
destroy paths.

Fixes: 0ce5c2477af2 ("iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl")
Cc:
Signed-off-by: Xu Yilun
---
 drivers/iommu/iommufd/viommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 01df2b985f02..4577b88c8560 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -130,6 +130,7 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 		goto out_put_idev;
 	}
 
+	vdev->ictx = ucmd->ictx;
 	vdev->id = virt_id;
 	vdev->dev = idev->dev;
 	get_device(idev->dev);
-- 
2.25.1

From nobody Wed Oct 8 23:42:18 2025 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D255E23F404 for ; Mon, 23 Jun 2025 09:57:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.7 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750672652; cv=none; b=LukeJiiaaD/DQEazyo3ssf27/ToOXqhlmv3BOprbgJzjb53E05DkMmrnOSA14U4JRkz3MydwsjrhbV72yZwQ0aL2a68IUZ0IOwzQuamOcvrk88x6PHPssL0rsCkuA8ERV6njJnNPtS5SpxUWF7UFl/qQGG+fWvqBYHQ16bhQi44= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750672652; c=relaxed/simple; bh=eG1eg5apZTSueD/Cu1CtjPn9WRibNu7IhiHyzCb/Dgg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=FP4qpy2ulKXtvcePc7Qf7T0ihCH0clZbSvqA8VFtFGC6enlYJjfHG7BlXUffc5TPex6vGyGsvW/EvTL/0qpvk7vmOZ2NsFwv1QzRqFCnLRZCDT+5b9vARG3w74MF5J/7dGZ7arEvKJ/1Lf0v5JHtNPVWxx3oMWZVdYjuGVVmGPY= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=lv6I4wVW; arc=none smtp.client-ip=192.198.163.7 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="lv6I4wVW" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1750672651; x=1782208651; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=eG1eg5apZTSueD/Cu1CtjPn9WRibNu7IhiHyzCb/Dgg=; b=lv6I4wVWkpwINbjawrh4jsuYcE0clt9HOKRRAw6uYxs0pGRm6oBJpKeT 3undZCVoe0lakUCLLLJi5R2c4AO52zBhjbFZFZmKZbcrSFiw/VwcA1Z6b lXhNSofBvB68pIM+OyOQh9ndbvUlZ0ra2QgK+op+MpsO5x4bg0u8d5gUi 2CBzwgA5IPNVVdQg8ehss++DjNBFR01IKLdNWY5cyuMDVX4sz6kDTFNzi weH1N565/OMO6L/c7VBoJBXIX48CV3E3czzo1fY6Sh4JwRYV1eKbB3ZXA ykzskqn1u/BLlRlh0FooqVJeeYW4nXP6BpRqXaFh5k1WFveT+UKiHhhDV g==; X-CSE-ConnectionGUID: fDvB8pA3SCOQVtZxXqaQpQ== X-CSE-MsgGUID: TFY/MjTCSVecpdE9Ae89uA== X-IronPort-AV: E=McAfee;i="6800,10657,11472"; a="78285838" X-IronPort-AV: E=Sophos;i="6.16,258,1744095600"; d="scan'208";a="78285838" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by fmvoesa101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Jun 2025 02:57:30 -0700 X-CSE-ConnectionGUID: 5yqTpPfdSxSyH0O7PGnmaw== X-CSE-MsgGUID: 3a5agaUaSj+qgsFXUznEzQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,258,1744095600"; d="scan'208";a="155859363" Received: from yilunxu-optiplex-7050.sh.intel.com ([10.239.159.165]) by fmviesa005.fm.intel.com with ESMTP; 23 Jun 2025 02:57:27 -0700 From: Xu Yilun To: jgg@nvidia.com, jgg@ziepe.ca, kevin.tian@intel.com, 
will@kernel.org, aneesh.kumar@kernel.org Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, dan.j.williams@intel.com, baolu.lu@linux.intel.com, yilun.xu@linux.intel.com, yilun.xu@intel.com Subject: [PATCH v2 3/4] iommufd: Destroy vdevice on idevice destroy Date: Mon, 23 Jun 2025 17:49:45 +0800 Message-Id: <20250623094946.1714996-4-yilun.xu@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20250623094946.1714996-1-yilun.xu@linux.intel.com> References: <20250623094946.1714996-1-yilun.xu@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Destroy iommufd_vdevice (vdev) on iommufd_device (idev) destroy so that
vdev can't outlive idev.

iommufd_device (idev) represents the physical device bound to iommufd,
while iommufd_vdevice (vdev) represents the virtual instance of the
physical device in the VM. The lifecycle of the vdev should not be longer
than that of the idev. This doesn't cause a real problem in existing use
cases because vdev doesn't impact the physical device; it only provides
virtualization information. But extending vdev for Confidential
Computing (CC) requires secure configuration of the vdev, e.g. TSM
Bind/Unbind. These configurations should be rolled back on idev destroy,
or the functionality of the external driver (VFIO) may be impacted.

Building the association between idev and vdev requires the two objects
to point to each other without holding references on each other. This
requires proper locking, which is done by reviving some of Nicolin's
patch [1].

There are 3 cases on idev destroy:

  1. vdev is already destroyed by userspace. No extra handling needed.
  2. vdev is still alive. Use iommufd_object_tombstone_user() to
     destroy vdev and tombstone the vdev ID.
  3. vdev is being destroyed by userspace. The vdev ID is already freed,
     but the vdev destroy handler has not completed. The idev destroy
     handler should wait for vdev destroy completion.

[1]: https://lore.kernel.org/all/53025c827c44d68edb6469bfd940a8e8bc6147a5.1729897278.git.nicolinc@nvidia.com/

Original-by: Nicolin Chen
Original-by: Aneesh Kumar K.V (Arm)
Signed-off-by: Xu Yilun
---
 drivers/iommu/iommufd/device.c          | 43 +++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h | 11 +++++++
 drivers/iommu/iommufd/main.c            |  1 +
 drivers/iommu/iommufd/viommu.c          | 33 +++++++++++++++++--
 4 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 86244403b532..908a94a27bab 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -137,11 +137,54 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
 	}
 }
 
+static void iommufd_device_remove_vdev(struct iommufd_device *idev)
+{
+	bool vdev_removing = false;
+
+	mutex_lock(&idev->igroup->lock);
+	if (idev->vdev) {
+		struct iommufd_vdevice *vdev;
+
+		vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id);
+		if (IS_ERR(vdev)) {
+			/* vdev is removed from xarray, but is not destroyed/freed */
+			vdev_removing = true;
+			goto unlock;
+		}
+
+		/* Should never happen */
+		if (WARN_ON(vdev != idev->vdev)) {
+			idev->vdev = NULL;
+			iommufd_put_object(idev->ictx, &vdev->obj);
+			goto unlock;
+		}
+
+		/*
+		 * vdev cannot be destroyed after refcount_inc, safe to release
+		 * idev->igroup->lock and use idev->vdev afterward.
+		 */
+		refcount_inc(&idev->vdev->obj.users);
+		iommufd_put_object(idev->ictx, &idev->vdev->obj);
+	}
+unlock:
+	mutex_unlock(&idev->igroup->lock);
+
+	if (vdev_removing) {
+		if (!wait_event_timeout(idev->ictx->destroy_wait,
+					!idev->vdev,
+					msecs_to_jiffies(60000)))
+			pr_crit("Time out waiting for iommufd vdevice removed\n");
+	} else if (idev->vdev) {
+		iommufd_object_tombstone_user(idev->ictx, &idev->vdev->obj);
+	}
+}
+
 void iommufd_device_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_device *idev =
 		container_of(obj, struct iommufd_device, obj);
 
+	iommufd_device_remove_vdev(idev);
 	iommu_device_release_dma_owner(idev->dev);
 	iommufd_put_group(idev->igroup);
 	if (!iommufd_selftest_is_mock_dev(idev->dev))
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index fbc9ef78d81f..f58aa4439c53 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -446,6 +446,7 @@ struct iommufd_device {
 	/* always the physical device */
 	struct device *dev;
 	bool enforce_cache_coherency;
+	struct iommufd_vdevice *vdev;
 };
 
 static inline struct iommufd_device *
@@ -621,6 +622,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
 void iommufd_viommu_destroy(struct iommufd_object *obj);
 int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd);
 void iommufd_vdevice_destroy(struct iommufd_object *obj);
+void iommufd_vdevice_abort(struct iommufd_object *obj);
 
 struct iommufd_vdevice {
 	struct iommufd_object obj;
@@ -628,8 +630,17 @@ struct iommufd_vdevice {
 	struct iommufd_viommu *viommu;
 	struct device *dev;
 	u64 id; /* per-vIOMMU virtual ID */
+	struct iommufd_device *idev;
 };
 
+static inline struct iommufd_vdevice *
+iommufd_get_vdevice(struct iommufd_ctx *ictx, u32 id)
+{
+	return container_of(iommufd_get_object(ictx, id,
+					       IOMMUFD_OBJ_VDEVICE),
+			    struct iommufd_vdevice, obj);
+}
+
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
 void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 5fd75aba068b..3f955a123095 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -531,6 +531,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
 	},
 	[IOMMUFD_OBJ_VDEVICE] = {
 		.destroy = iommufd_vdevice_destroy,
+		.abort = iommufd_vdevice_abort,
 	},
 	[IOMMUFD_OBJ_VEVENTQ] = {
 		.destroy = iommufd_veventq_destroy,
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 4577b88c8560..9b062e651ea5 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -84,16 +84,31 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	return rc;
 }
 
-void iommufd_vdevice_destroy(struct iommufd_object *obj)
+void iommufd_vdevice_abort(struct iommufd_object *obj)
 {
 	struct iommufd_vdevice *vdev =
 		container_of(obj, struct iommufd_vdevice, obj);
 	struct iommufd_viommu *viommu = vdev->viommu;
+	struct iommufd_device *idev = vdev->idev;
+
+	lockdep_assert_held(&idev->igroup->lock);
 
 	/* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */
 	xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
 	refcount_dec(&viommu->obj.users);
 	put_device(vdev->dev);
+	idev->vdev = NULL;
+}
+
+void iommufd_vdevice_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_vdevice *vdev =
+		container_of(obj, struct iommufd_vdevice, obj);
+
+	mutex_lock(&vdev->idev->igroup->lock);
+	iommufd_vdevice_abort(obj);
+	mutex_unlock(&vdev->idev->igroup->lock);
+	wake_up_interruptible_all(&vdev->ictx->destroy_wait);
 }
 
 int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
@@ -124,18 +139,28 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 		goto out_put_idev;
 	}
 
+	mutex_lock(&idev->igroup->lock);
+	if (idev->vdev) {
+		rc = -EEXIST;
+		goto out_unlock_igroup;
+	}
+
 	vdev = iommufd_object_alloc(ucmd->ictx, vdev, IOMMUFD_OBJ_VDEVICE);
 	if (IS_ERR(vdev)) {
 		rc = PTR_ERR(vdev);
-		goto out_put_idev;
+		goto out_unlock_igroup;
 	}
 
+	/* vdev can't outlive idev, vdev->idev is always valid, need no refcnt */
+	vdev->idev = idev;
 	vdev->ictx = ucmd->ictx;
 	vdev->id = virt_id;
 	vdev->dev = idev->dev;
 	get_device(idev->dev);
 	vdev->viommu = viommu;
 	refcount_inc(&viommu->obj.users);
+	/* idev->vdev is protected by idev->igroup->lock, need no refcnt */
+	idev->vdev = vdev;
 
 	curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
 	if (curr) {
@@ -148,10 +173,12 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	if (rc)
 		goto out_abort;
 	iommufd_object_finalize(ucmd->ictx, &vdev->obj);
-	goto out_put_idev;
+	goto out_unlock_igroup;
 
 out_abort:
 	iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
+out_unlock_igroup:
+	mutex_unlock(&idev->igroup->lock);
 out_put_idev:
 	iommufd_put_object(ucmd->ictx, &idev->obj);
 out_put_viommu:
-- 
2.25.1

From nobody Wed Oct 8 23:42:18 2025 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E86671F4604 for ; Mon, 23 Jun 2025 09:57:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.7 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750672657; cv=none; b=LcnYhVKDlBBxF91XPxxlaFBV/7+HqQi36sr0rbbrYZjVCcyB9jEPGY7Bw8rx86RQyq8IcJrNa1cwisNpxB+aOdMHBOf2IgNn1CrNyfKgmtQ4uZdthXwC+v1FDWADeNByUdon7byw3MOgZFljdUhwgV/VwGGtJ7fsjAa8j6Qw/PU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1750672657; c=relaxed/simple; bh=l8eKALXzeaaC5h8btBdADZUaPZ7XJqqocvLm82cf354=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=hGwl+L6DLO5+FKQed6uW1HkDPXYqgQqPgk9eTQ4I9RHrP2yw9rScJC5aVma71J+YBu1aircegEKW+39VW7DkeFK9oklM3wA+jbpcwHNvAEAe3QzAxzK3U0fSHsOy7hEPc/qtclJn9Y5ptil9iq8aqy5Dj40zKwzMXnOl/JO6FJ8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ZcriAQaG; arc=none smtp.client-ip=192.198.163.7 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ZcriAQaG" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1750672656; x=1782208656; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=l8eKALXzeaaC5h8btBdADZUaPZ7XJqqocvLm82cf354=; b=ZcriAQaGW/ZfiqQ8geJjU2Vzgyt/q+j4zczTrBOSR8TKgyWYYuX/m0LO l/PgThKq2REsZ59UbcTEFrPbBW1Ky+/F48PzewMRhl/G9dBNY3d651ejp 28JUMZ6VBrtwjT9tamcZMu0KF6IAAPLF/MQvNMx1PjUWdlm41h1DfvTPi cD2P6joJfTl1zKsCtrAeH8+Ek334T/mIM1fjwgBT2pjPhdVlswD9eNUY6 x+KN0holSo0OjurXKWUrMVVJd2DKnCwNqVdqLeql4FQyo6DMMt3RnLpzC t+qqwcxP42vlVFRrHZAITybeUGaH1ZMrkmTHtkvUajfJm09A62vlu0aUj w==; X-CSE-ConnectionGUID: D+JZiSc8QM6wYp/JefXJKA== X-CSE-MsgGUID: BodaUS1+S3qyDjodyH6faw== X-IronPort-AV: E=McAfee;i="6800,10657,11472"; a="78285870" X-IronPort-AV: E=Sophos;i="6.16,258,1744095600"; d="scan'208";a="78285870" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by fmvoesa101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Jun 2025 02:57:34 -0700 X-CSE-ConnectionGUID: GQ6xhPXaQLW+y0PzodT/cA== X-CSE-MsgGUID: cq5kHF8wTiapbOmvRqb/SA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,258,1744095600"; d="scan'208";a="155859371" Received: from 
yilunxu-optiplex-7050.sh.intel.com ([10.239.159.165]) by fmviesa005.fm.intel.com with ESMTP; 23 Jun 2025 02:57:31 -0700 From: Xu Yilun To: jgg@nvidia.com, jgg@ziepe.ca, kevin.tian@intel.com, will@kernel.org, aneesh.kumar@kernel.org Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, dan.j.williams@intel.com, baolu.lu@linux.intel.com, yilun.xu@linux.intel.com, yilun.xu@intel.com Subject: [PATCH v2 4/4] iommufd/selftest: Add coverage for vdevice tombstone Date: Mon, 23 Jun 2025 17:49:46 +0800 Message-Id: <20250623094946.1714996-5-yilun.xu@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20250623094946.1714996-1-yilun.xu@linux.intel.com> References: <20250623094946.1714996-1-yilun.xu@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

This tests the flow that tombstones the vdevice when the idevice is
unbound before the vdevice is destroyed. The expected results are:

 - Unbinding the idevice tombstones the vdevice ID, so the ID can't be
   repurposed anymore.
 - Even ioctl(IOMMU_DESTROY) can't free the tombstoned ID.
 - iommufd_fops_release() can still free everything.

Signed-off-by: Xu Yilun
---
 tools/testing/selftests/iommu/iommufd.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 1a8e85afe9aa..092e1344447e 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -3014,6 +3014,19 @@ TEST_F(iommufd_viommu, vdevice_cache)
 	}
 }
 
+TEST_F(iommufd_viommu, vdevice_tombstone)
+{
+	uint32_t viommu_id = self->viommu_id;
+	uint32_t dev_id = self->device_id;
+	uint32_t vdev_id = 0;
+
+	if (dev_id) {
+		test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+		test_ioctl_destroy(self->stdev_id);
+		EXPECT_ERRNO(ENOENT, _test_ioctl_destroy(self->fd, vdev_id));
+	}
+}
+
 FIXTURE(iommufd_device_pasid)
 {
 	int fd;
-- 
2.25.1
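
The tombstone semantics exercised by this series can be modelled outside
the kernel. What follows is only a minimal userspace sketch, not the
iommufd implementation: struct table, the TOMBSTONE sentinel, and the
table_alloc(), table_remove(), table_release() and tombstone_demo()
helpers are hypothetical names chosen for illustration. They stand in
for the object xarray, XA_ZERO_ENTRY, and the remove/release paths that
patch 1 touches. The sketch assumes a small fixed-size ID table with
lowest-free-ID allocation and ignores refcounts and locking entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define TABLE_SIZE 8

/* Hypothetical stand-in for an iommufd object; only the ID matters here. */
struct obj {
	int id;
};

/* Hypothetical sentinel: a reserved-but-empty slot, like XA_ZERO_ENTRY. */
static struct obj tombstone_marker;
#define TOMBSTONE (&tombstone_marker)

struct table {
	struct obj *slots[TABLE_SIZE];
};

/* Allocate the lowest free ID. A tombstoned slot is not free. */
static int table_alloc(struct table *t)
{
	for (int id = 0; id < TABLE_SIZE; id++) {
		struct obj *o;

		if (t->slots[id])
			continue;
		o = malloc(sizeof(*o));
		if (!o)
			return -ENOMEM;
		o->id = id;
		t->slots[id] = o;
		return id;
	}
	return -ENOSPC;
}

/*
 * Destroy the object at @id. With @tombstone set, the slot stays reserved
 * so the ID cannot be handed out again. Removing an empty or tombstoned
 * slot fails with -ENOENT, as a user destroy on a tombstoned ID would.
 */
static int table_remove(struct table *t, int id, int tombstone)
{
	struct obj *o;

	if (id < 0 || id >= TABLE_SIZE)
		return -ENOENT;
	o = t->slots[id];
	if (!o || o == TOMBSTONE)
		return -ENOENT;
	free(o);
	t->slots[id] = tombstone ? TOMBSTONE : NULL;
	return 0;
}

/* Final release: free live objects and clear tombstones alike. */
static int table_release(struct table *t)
{
	int cleared = 0;

	for (int id = 0; id < TABLE_SIZE; id++) {
		if (!t->slots[id])
			continue;
		if (t->slots[id] != TOMBSTONE)
			free(t->slots[id]);
		t->slots[id] = NULL;
		cleared++;
	}
	return cleared;
}

/* Walk the sequence: allocate, tombstone, failed destroy, no reuse, release. */
static int tombstone_demo(void)
{
	struct table t = { 0 };

	assert(table_alloc(&t) == 0);
	assert(table_alloc(&t) == 1);
	assert(table_remove(&t, 0, 1) == 0);        /* kernel-side tombstone */
	assert(table_remove(&t, 0, 0) == -ENOENT);  /* user destroy now fails */
	assert(table_alloc(&t) == 2);               /* ID 0 is not repurposed */
	assert(table_release(&t) == 3);             /* tombstone + 2 live objects */
	return 0;
}
```

Calling tombstone_demo() from a main() follows roughly the same order as
the vdevice_tombstone selftest: the ID is tombstoned by the "kernel"
side, a later destroy by the original user reports ENOENT, and only the
final release clears the reserved slot.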