From nobody Fri Dec 19 14:08:09 2025 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 335412033F; Wed, 20 Dec 2023 01:29:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="a4pqoZSa" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1703035758; x=1734571758; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dNWjTkTwu0fVl5MQ2bItdZlWIfXllntPZIk4DSP0fII=; b=a4pqoZSadMhw0B2vY9cJ05idcmrlcli2DGnb7C/bZwnzC/e4pncAYXmn 3kXRx/xfcBTh8WdXBHQGvrW8R5VZVdvlBQvH46OjAAf5GrD1lfV+F/PPo FATcclgYSYqfxjEcaSDARWldNmdy2wgdXTz/9JpfG8SCzrp1BBU8sULRT VxuF8NMZXveOAIc9KH8UAdINuHnHG2y+C7LOpS9qX8OpxAJjmvlE75IQp OLtJSiS0EAVlKoLLyxSB12rmn9Htl06xIEmxFJ0LCjXfhJPhwUdLQYUqL hG66IzmoQ/+y9UnAzrSI5ousrALLrnnoLlXZ6hx7fqMOkvTa647agOoLL Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10929"; a="2965854" X-IronPort-AV: E=Sophos;i="6.04,290,1695711600"; d="scan'208";a="2965854" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Dec 2023 17:29:18 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10929"; a="1023319199" X-IronPort-AV: E=Sophos;i="6.04,290,1695711600"; d="scan'208";a="1023319199" Received: from allen-box.sh.intel.com ([10.239.159.127]) by fmsmga006.fm.intel.com with ESMTP; 19 Dec 2023 17:29:14 -0800 From: Lu Baolu To: Joerg Roedel , Will Deacon , Robin Murphy , Jason Gunthorpe , Kevin Tian , Jean-Philippe Brucker , Nicolin Chen Cc: Yi Liu , Jacob Pan , Longfang Liu , Yan Zhao , iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Lu Baolu , Jason Gunthorpe Subject: [PATCH v9 12/14] iommu: Use refcount for fault data access Date: Wed, 20 Dec 2023 09:23:30 +0800 Message-Id: <20231220012332.168188-13-baolu.lu@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231220012332.168188-1-baolu.lu@linux.intel.com> References: <20231220012332.168188-1-baolu.lu@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The per-device fault data structure stores information about faults occurring on a device. Its lifetime spans from IOPF enablement to disablement. Multiple paths, including IOPF reporting, handling, and responding, may access it concurrently. Previously, a mutex protected the fault data from use after free. But this is not performance friendly due to the critical nature of IOPF handling paths. Refine this with a refcount-based approach. The fault data pointer is obtained within an RCU read region with a refcount. The fault data pointer is returned for usage only when the pointer is valid and a refcount is successfully obtained. The fault data is freed with kfree_rcu(), ensuring data is only freed after all RCU critical regions complete. An iopf handling work starts once an iopf group is created. The handling work continues until iommu_page_response() is called to respond to the iopf and the iopf group is freed. During this time, the device fault parameter should always be available. Add a pointer to the device fault parameter in the iopf_group structure and hold the reference until the iopf_group is freed. Make iommu_page_response() static as it is only used in io-pgfault.c. Co-developed-by: Jason Gunthorpe Signed-off-by: Jason Gunthorpe Signed-off-by: Lu Baolu Tested-by: Yan Zhao Reviewed-by: Jason Gunthorpe --- include/linux/iommu.h | 17 +++-- drivers/iommu/io-pgfault.c | 129 +++++++++++++++++++++++-------------- drivers/iommu/iommu-sva.c | 2 +- 3 files changed, 90 insertions(+), 58 deletions(-) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index dbad2cb9eca2..c2416aa79922 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -41,6 +41,7 @@ struct iommu_dirty_ops; struct notifier_block; struct iommu_sva; struct iommu_dma_cookie; +struct iommu_fault_param; =20 #define IOMMU_FAULT_PERM_READ (1 << 0) /* read */ #define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */ @@ -129,8 +130,9 @@ struct iopf_group { struct iopf_fault last_fault; struct list_head faults; struct work_struct work; - struct device *dev; struct iommu_domain *domain; + /* The device's fault data parameter. */ + struct iommu_fault_param *fault_param; }; =20 /** @@ -602,6 +604,8 @@ struct iommu_device { /** * struct iommu_fault_param - per-device IOMMU fault data * @lock: protect pending faults list + * @users: user counter to manage the lifetime of the data + * @ruc: rcu head for kfree_rcu() * @dev: the device that owns this param * @queue: IOPF queue * @queue_list: index into queue->devices @@ -611,6 +615,8 @@ struct iommu_device { */ struct iommu_fault_param { struct mutex lock; + refcount_t users; + struct rcu_head rcu; =20 struct device *dev; struct iopf_queue *queue; @@ -638,7 +644,7 @@ struct iommu_fault_param { */ struct dev_iommu { struct mutex lock; - struct iommu_fault_param *fault_param; + struct iommu_fault_param __rcu *fault_param; struct iommu_fwspec *fwspec; struct iommu_device *iommu_dev; void *priv; @@ -1466,7 +1472,6 @@ void iopf_queue_free(struct iopf_queue *queue); int iopf_queue_discard_partial(struct iopf_queue *queue); void iopf_free_group(struct iopf_group *group); int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt); -int iommu_page_response(struct device *dev, struct iommu_page_response *ms= g); int iopf_group_response(struct iopf_group *group, enum iommu_page_response_code status); #else @@ -1511,12 +1516,6 @@ iommu_report_device_fault(struct device *dev, struct= iopf_fault *evt) return -ENODEV; } =20 -static inline int -iommu_page_response(struct device *dev, struct iommu_page_response *msg) -{ - return -ENODEV; -} - static inline int iopf_group_response(struct iopf_group *group, enum iommu_page_response_code status) { diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c index 5aea8402be47..3a907bad2fcb 100644 --- a/drivers/iommu/io-pgfault.c +++ b/drivers/iommu/io-pgfault.c @@ -13,6 +13,32 @@ =20 #include "iommu-priv.h" =20 +/* + * Return the fault parameter of a device if it exists. Otherwise, return = NULL. + * On a successful return, the caller takes a reference of this parameter = and + * should put it after use by calling iopf_put_dev_fault_param(). + */ +static struct iommu_fault_param *iopf_get_dev_fault_param(struct device *d= ev) +{ + struct dev_iommu *param =3D dev->iommu; + struct iommu_fault_param *fault_param; + + rcu_read_lock(); + fault_param =3D rcu_dereference(param->fault_param); + if (fault_param && !refcount_inc_not_zero(&fault_param->users)) + fault_param =3D NULL; + rcu_read_unlock(); + + return fault_param; +} + +/* Caller must hold a reference of the fault parameter. */ +static void iopf_put_dev_fault_param(struct iommu_fault_param *fault_param) +{ + if (refcount_dec_and_test(&fault_param->users)) + kfree_rcu(fault_param, rcu); +} + void iopf_free_group(struct iopf_group *group) { struct iopf_fault *iopf, *next; @@ -22,6 +48,8 @@ void iopf_free_group(struct iopf_group *group) kfree(iopf); } =20 + /* Pair with iommu_report_device_fault(). */ + iopf_put_dev_fault_param(group->fault_param); kfree(group); } EXPORT_SYMBOL_GPL(iopf_free_group); @@ -135,7 +163,7 @@ static int iommu_handle_iopf(struct iommu_fault *fault, goto cleanup_partial; } =20 - group->dev =3D dev; + group->fault_param =3D iopf_param; group->last_fault.fault =3D *fault; INIT_LIST_HEAD(&group->faults); group->domain =3D domain; @@ -178,64 +206,61 @@ static int iommu_handle_iopf(struct iommu_fault *faul= t, */ int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt) { + bool last_prq =3D evt->fault.type =3D=3D IOMMU_FAULT_PAGE_REQ && + (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE); struct iommu_fault_param *fault_param; - struct iopf_fault *evt_pending =3D NULL; - struct dev_iommu *param =3D dev->iommu; - int ret =3D 0; + struct iopf_fault *evt_pending; + int ret; =20 - mutex_lock(¶m->lock); - fault_param =3D param->fault_param; - if (!fault_param) { - mutex_unlock(¶m->lock); + fault_param =3D iopf_get_dev_fault_param(dev); + if (!fault_param) return -EINVAL; - } =20 mutex_lock(&fault_param->lock); - if (evt->fault.type =3D=3D IOMMU_FAULT_PAGE_REQ && - (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) { + if (last_prq) { evt_pending =3D kmemdup(evt, sizeof(struct iopf_fault), GFP_KERNEL); if (!evt_pending) { ret =3D -ENOMEM; - goto done_unlock; + goto err_unlock; } list_add_tail(&evt_pending->list, &fault_param->faults); } =20 ret =3D iommu_handle_iopf(&evt->fault, fault_param); - if (ret && evt_pending) { + if (ret) + goto err_free; + + mutex_unlock(&fault_param->lock); + /* The reference count of fault_param is now held by iopf_group. */ + if (!last_prq) + iopf_put_dev_fault_param(fault_param); + + return 0; +err_free: + if (last_prq) { list_del(&evt_pending->list); kfree(evt_pending); } -done_unlock: +err_unlock: mutex_unlock(&fault_param->lock); - mutex_unlock(¶m->lock); + iopf_put_dev_fault_param(fault_param); =20 return ret; } EXPORT_SYMBOL_GPL(iommu_report_device_fault); =20 -int iommu_page_response(struct device *dev, - struct iommu_page_response *msg) +static int iommu_page_response(struct iopf_group *group, + struct iommu_page_response *msg) { bool needs_pasid; int ret =3D -EINVAL; struct iopf_fault *evt; struct iommu_fault_page_request *prm; - struct dev_iommu *param =3D dev->iommu; - struct iommu_fault_param *fault_param; + struct device *dev =3D group->fault_param->dev; const struct iommu_ops *ops =3D dev_iommu_ops(dev); bool has_pasid =3D msg->flags & IOMMU_PAGE_RESP_PASID_VALID; - - if (!ops->page_response) - return -ENODEV; - - mutex_lock(¶m->lock); - fault_param =3D param->fault_param; - if (!fault_param) { - mutex_unlock(¶m->lock); - return -EINVAL; - } + struct iommu_fault_param *fault_param =3D group->fault_param; =20 /* Only send response if there is a fault report pending */ mutex_lock(&fault_param->lock); @@ -276,10 +301,9 @@ int iommu_page_response(struct device *dev, =20 done_unlock: mutex_unlock(&fault_param->lock); - mutex_unlock(¶m->lock); + return ret; } -EXPORT_SYMBOL_GPL(iommu_page_response); =20 /** * iopf_queue_flush_dev - Ensure that all queued faults have been processed @@ -295,22 +319,20 @@ EXPORT_SYMBOL_GPL(iommu_page_response); */ int iopf_queue_flush_dev(struct device *dev) { - int ret =3D 0; struct iommu_fault_param *iopf_param; - struct dev_iommu *param =3D dev->iommu; =20 - if (!param) + /* + * It's a driver bug to be here after iopf_queue_remove_device(). + * Therefore, it's safe to dereference the fault parameter without + * holding the lock. + */ + iopf_param =3D rcu_dereference_check(dev->iommu->fault_param, true); + if (WARN_ON(!iopf_param)) return -ENODEV; =20 - mutex_lock(¶m->lock); - iopf_param =3D param->fault_param; - if (iopf_param) - flush_workqueue(iopf_param->queue->wq); - else - ret =3D -ENODEV; - mutex_unlock(¶m->lock); + flush_workqueue(iopf_param->queue->wq); =20 - return ret; + return 0; } EXPORT_SYMBOL_GPL(iopf_queue_flush_dev); =20 @@ -335,7 +357,7 @@ int iopf_group_response(struct iopf_group *group, (iopf->fault.prm.flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID)) resp.flags =3D IOMMU_PAGE_RESP_PASID_VALID; =20 - return iommu_page_response(group->dev, &resp); + return iommu_page_response(group, &resp); } EXPORT_SYMBOL_GPL(iopf_group_response); =20 @@ -384,10 +406,15 @@ int iopf_queue_add_device(struct iopf_queue *queue, s= truct device *dev) int ret =3D 0; struct dev_iommu *param =3D dev->iommu; struct iommu_fault_param *fault_param; + const struct iommu_ops *ops =3D dev_iommu_ops(dev); + + if (!ops->page_response) + return -ENODEV; =20 mutex_lock(&queue->lock); mutex_lock(¶m->lock); - if (param->fault_param) { + if (rcu_dereference_check(param->fault_param, + lockdep_is_held(¶m->lock))) { ret =3D -EBUSY; goto done_unlock; } @@ -402,10 +429,12 @@ int iopf_queue_add_device(struct iopf_queue *queue, s= truct device *dev) INIT_LIST_HEAD(&fault_param->faults); INIT_LIST_HEAD(&fault_param->partial); fault_param->dev =3D dev; + refcount_set(&fault_param->users, 1); + init_rcu_head(&fault_param->rcu); list_add(&fault_param->queue_list, &queue->devices); fault_param->queue =3D queue; =20 - param->fault_param =3D fault_param; + rcu_assign_pointer(param->fault_param, fault_param); =20 done_unlock: mutex_unlock(¶m->lock); @@ -429,10 +458,12 @@ int iopf_queue_remove_device(struct iopf_queue *queue= , struct device *dev) int ret =3D 0; struct iopf_fault *iopf, *next; struct dev_iommu *param =3D dev->iommu; - struct iommu_fault_param *fault_param =3D param->fault_param; + struct iommu_fault_param *fault_param; =20 mutex_lock(&queue->lock); mutex_lock(¶m->lock); + fault_param =3D rcu_dereference_check(param->fault_param, + lockdep_is_held(¶m->lock)); if (!fault_param) { ret =3D -ENODEV; goto unlock; @@ -454,8 +485,10 @@ int iopf_queue_remove_device(struct iopf_queue *queue,= struct device *dev) list_for_each_entry_safe(iopf, next, &fault_param->partial, list) kfree(iopf); =20 - param->fault_param =3D NULL; - kfree(fault_param); + /* dec the ref owned by iopf_queue_add_device() */ + rcu_assign_pointer(param->fault_param, NULL); + if (refcount_dec_and_test(&fault_param->users)) + kfree_rcu(fault_param, rcu); unlock: mutex_unlock(¶m->lock); mutex_unlock(&queue->lock); diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c index 9de878e40413..b51995b4fe90 100644 --- a/drivers/iommu/iommu-sva.c +++ b/drivers/iommu/iommu-sva.c @@ -251,7 +251,7 @@ static void iommu_sva_handle_iopf(struct work_struct *w= ork) =20 static int iommu_sva_iopf_handler(struct iopf_group *group) { - struct iommu_fault_param *fault_param =3D group->dev->iommu->fault_param; + struct iommu_fault_param *fault_param =3D group->fault_param; =20 INIT_WORK(&group->work, iommu_sva_handle_iopf); if (!queue_work(fault_param->queue->wq, &group->work)) --=20 2.34.1