From nobody Fri Jun 19 13:27:36 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id F3590C433EF
	for ; Mon, 4 Apr 2022 11:33:08 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1359669AbiDDLfD (ORCPT ); Mon, 4 Apr 2022 07:35:03 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45854 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1349295AbiDDLfA (ORCPT ); Mon, 4 Apr 2022 07:35:00 -0400
Received: from frasgout.his.huawei.com (frasgout.his.huawei.com [185.176.79.56])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F1C643C739
	for ; Mon, 4 Apr 2022 04:33:03 -0700 (PDT)
Received: from fraeml704-chm.china.huawei.com (unknown [172.18.147.207])
	by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4KX7qQ6l2nz67tf3;
	Mon, 4 Apr 2022 19:30:10 +0800 (CST)
Received: from lhreml724-chm.china.huawei.com (10.201.108.75) by
	fraeml704-chm.china.huawei.com (10.206.15.53) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id
	15.1.2375.24; Mon, 4 Apr 2022 13:33:01 +0200
Received: from localhost.localdomain (10.69.192.58) by
	lhreml724-chm.china.huawei.com (10.201.108.75) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
	15.1.2375.24; Mon, 4 Apr 2022 12:32:58 +0100
From: John Garry
To: , ,
CC: , , , , , , , , , John Garry
Subject: [PATCH RESEND v5 1/5] iommu: Refactor iommu_group_store_type()
Date: Mon, 4 Apr 2022 19:27:10 +0800
Message-ID: <1649071634-188535-2-git-send-email-john.garry@huawei.com>
X-Mailer: git-send-email 2.8.1
In-Reply-To: <1649071634-188535-1-git-send-email-john.garry@huawei.com>
References: <1649071634-188535-1-git-send-email-john.garry@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.69.192.58]
X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To
	lhreml724-chm.china.huawei.com (10.201.108.75)
X-CFilter-Loop: Reflected
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Function iommu_group_store_type() supports changing the default domain
of an IOMMU group.

Many conditions need to be satisfied and steps taken for this action to be
successful.

Satisfying these conditions and steps will be required for setting other
IOMMU group attributes, so factor into a common part and a part specific
to updating the IOMMU group attribute.

No functional change intended.

Some code comments are tidied up also.

Signed-off-by: John Garry
Acked-by: Will Deacon
Reviewed-by: Zhen Lei
---
 drivers/iommu/iommu.c | 96 ++++++++++++++++++++++++++++---------------
 1 file changed, 62 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f2c45b85b9fc..0dd766030baf 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3000,21 +3000,57 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
 	return ret;
 }
 
+enum iommu_group_op {
+	CHANGE_GROUP_TYPE,
+};
+
+static int __iommu_group_store_type(const char *buf, struct iommu_group *group,
+				    struct device *dev)
+{
+	int type;
+
+	if (sysfs_streq(buf, "identity"))
+		type = IOMMU_DOMAIN_IDENTITY;
+	else if (sysfs_streq(buf, "DMA"))
+		type = IOMMU_DOMAIN_DMA;
+	else if (sysfs_streq(buf, "DMA-FQ"))
+		type = IOMMU_DOMAIN_DMA_FQ;
+	else if (sysfs_streq(buf, "auto"))
+		type = 0;
+	else
+		return -EINVAL;
+
+	/*
+	 * Check if the only device in the group still has a driver bound or
+	 * we're transitioning from DMA -> DMA-FQ
+	 */
+	if (device_is_bound(dev) && !(type == IOMMU_DOMAIN_DMA_FQ &&
+	    group->default_domain->type == IOMMU_DOMAIN_DMA)) {
+		pr_err_ratelimited("Device is still bound to driver\n");
+		return -EINVAL;
+	}
+
+	return iommu_change_dev_def_domain(group, dev, type);
+}
+
 /*
  * Changing the default domain through sysfs requires the users to unbind the
  * drivers from the devices in the iommu group, except for a DMA -> DMA-FQ
- * transition. Return failure if this isn't met.
+ * transition. Changing any other IOMMU group attribute still requires the
+ * user to unbind the drivers from the devices in the iommu group. Return
+ * failure if these conditions are not met.
  *
  * We need to consider the race between this and the device release path.
  * device_lock(dev) is used here to guarantee that the device release path
  * will not be entered at the same time.
  */
-static ssize_t iommu_group_store_type(struct iommu_group *group,
-				      const char *buf, size_t count)
+static ssize_t iommu_group_store_common(struct iommu_group *group,
+					enum iommu_group_op op,
+					const char *buf, size_t count)
 {
 	struct group_device *grp_dev;
 	struct device *dev;
-	int ret, req_type;
+	int ret;
 
 	if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
 		return -EACCES;
@@ -3022,27 +3058,16 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 	if (WARN_ON(!group))
 		return -EINVAL;
 
-	if (sysfs_streq(buf, "identity"))
-		req_type = IOMMU_DOMAIN_IDENTITY;
-	else if (sysfs_streq(buf, "DMA"))
-		req_type = IOMMU_DOMAIN_DMA;
-	else if (sysfs_streq(buf, "DMA-FQ"))
-		req_type = IOMMU_DOMAIN_DMA_FQ;
-	else if (sysfs_streq(buf, "auto"))
-		req_type = 0;
-	else
-		return -EINVAL;
-
 	/*
 	 * Lock/Unlock the group mutex here before device lock to
-	 * 1. Make sure that the iommu group has only one device (this is a
+	 * 1. Make sure that the IOMMU group has only one device (this is a
 	 *    prerequisite for step 2)
 	 * 2. Get struct *dev which is needed to lock device
 	 */
 	mutex_lock(&group->mutex);
 	if (iommu_group_device_count(group) != 1) {
 		mutex_unlock(&group->mutex);
-		pr_err_ratelimited("Cannot change default domain: Group has more than one device\n");
+		pr_err_ratelimited("Cannot change IOMMU group default domain attribute: Group has more than one device\n");
 		return -EINVAL;
 	}
 
@@ -3054,16 +3079,16 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 	/*
 	 * Don't hold the group mutex because taking group mutex first and then
 	 * the device lock could potentially cause a deadlock as below. Assume
-	 * two threads T1 and T2. T1 is trying to change default domain of an
-	 * iommu group and T2 is trying to hot unplug a device or release [1] VF
-	 * of a PCIe device which is in the same iommu group. T1 takes group
-	 * mutex and before it could take device lock assume T2 has taken device
-	 * lock and is yet to take group mutex. Now, both the threads will be
-	 * waiting for the other thread to release lock. Below, lock order was
-	 * suggested.
+	 * two threads, T1 and T2. T1 is trying to change default domain
+	 * attribute of an IOMMU group and T2 is trying to hot unplug a device
+	 * or release [1] VF of a PCIe device which is in the same IOMMU group.
+	 * T1 takes the group mutex and before it could take device lock T2 may
+	 * have taken device lock and is yet to take group mutex. Now, both the
+	 * threads will be waiting for the other thread to release lock. Below,
+	 * lock order was suggested.
 	 *	device_lock(dev);
 	 *	mutex_lock(&group->mutex);
-	 *	iommu_change_dev_def_domain();
+	 *	cb->iommu_change_dev_def_domain(); [example cb]
 	 *	mutex_unlock(&group->mutex);
 	 *	device_unlock(dev);
 	 *
@@ -3077,21 +3102,24 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 	 */
 	mutex_unlock(&group->mutex);
 
-	/* Check if the device in the group still has a driver bound to it */
 	device_lock(dev);
-	if (device_is_bound(dev) && !(req_type == IOMMU_DOMAIN_DMA_FQ &&
-	    group->default_domain->type == IOMMU_DOMAIN_DMA)) {
-		pr_err_ratelimited("Device is still bound to driver\n");
-		ret = -EBUSY;
-		goto out;
+	switch (op) {
+	case CHANGE_GROUP_TYPE:
+		ret = __iommu_group_store_type(buf, group, dev);
+		break;
+	default:
+		ret = -EINVAL;
 	}
-
-	ret = iommu_change_dev_def_domain(group, dev, req_type);
 	ret = ret ?: count;
 
-out:
 	device_unlock(dev);
 	put_device(dev);
 
 	return ret;
 }
+
+static ssize_t iommu_group_store_type(struct iommu_group *group,
+				      const char *buf, size_t count)
+{
+	return iommu_group_store_common(group, CHANGE_GROUP_TYPE, buf, count);
+}
-- 
2.26.2

From nobody Fri Jun 19 13:27:36 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 3FDD5C433F5
	for ; Mon, 4 Apr 2022 11:33:12 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1359762AbiDDLfG (ORCPT ); Mon, 4 Apr 2022 07:35:06 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45942 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1349295AbiDDLfD (ORCPT ); Mon, 4 Apr 2022 07:35:03 -0400
Received: from frasgout.his.huawei.com (frasgout.his.huawei.com [185.176.79.56])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 497183C73A
	for ; Mon, 4 Apr 2022 04:33:07 -0700 (PDT)
Received: from
	fraeml703-chm.china.huawei.com (unknown [172.18.147.207])
	by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4KX7rT4dzdz67Lnh;
	Mon, 4 Apr 2022 19:31:05 +0800 (CST)
Received: from lhreml724-chm.china.huawei.com (10.201.108.75) by
	fraeml703-chm.china.huawei.com (10.206.15.52) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id
	15.1.2375.24; Mon, 4 Apr 2022 13:33:05 +0200
Received: from localhost.localdomain (10.69.192.58) by
	lhreml724-chm.china.huawei.com (10.201.108.75) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
	15.1.2375.24; Mon, 4 Apr 2022 12:33:01 +0100
From: John Garry
To: , ,
CC: , , , , , , , , , John Garry
Subject: [PATCH RESEND v5 2/5] iova: Allow rcache range upper limit to be flexible
Date: Mon, 4 Apr 2022 19:27:11 +0800
Message-ID: <1649071634-188535-3-git-send-email-john.garry@huawei.com>
X-Mailer: git-send-email 2.8.1
In-Reply-To: <1649071634-188535-1-git-send-email-john.garry@huawei.com>
References: <1649071634-188535-1-git-send-email-john.garry@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.69.192.58]
X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To
	lhreml724-chm.china.huawei.com (10.201.108.75)
X-CFilter-Loop: Reflected
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Some low-level drivers may request DMA mappings whose IOVA length exceeds
that of the current rcache upper limit.

This means that allocations for those IOVAs will never be cached, and
must always be allocated and freed from the RB tree per DMA mapping cycle.
This has a significant effect on performance, more so since commit
4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search
fails"), as discussed at [0].

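To make the limit concrete, the size check that the rcache code applies can
be sketched in userspace as below. This is only an illustration: the helper
names order_base_2_sketch() and iova_len_is_cacheable() are made up for this
note and are not kernel API, and the first one merely imitates the kernel's
order_base_2(). The comparison itself follows the log_size test in
iova_rcache_insert() and iova_rcache_get().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel's order_base_2(): ceil(log2(n)).
 * Hypothetical helper, for illustration only.
 */
static unsigned int order_base_2_sketch(unsigned long n)
{
	unsigned int order = 0;
	unsigned long v;

	assert(n > 0);

	/* Count the bits needed to represent n - 1. */
	for (v = n - 1; v; v >>= 1)
		order++;

	return order;
}

/*
 * An IOVA of 'pages' pages can only use the rcache when its log2 size is
 * below the limit; this mirrors the "log_size >= rcache_max_size" test.
 */
static bool iova_len_is_cacheable(unsigned long pages,
				  unsigned long rcache_max_size)
{
	return order_base_2_sketch(pages) < rcache_max_size;
}
```

With the present limit of 6, a 32-page IOVA is cacheable while a 33-page one
is not; assuming a 4 KiB granule, that puts the cut-off at 128 KiB.
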
As a first step towards allowing the rcache range upper limit to be
configured, hold this value in the IOVA domain structure (struct
iova_domain), and allocate the rcaches separately.

Delete macro IOVA_RANGE_CACHE_MAX_SIZE in case it's reused by mistake.

[0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leizhen@huawei.com/

Signed-off-by: John Garry
Acked-by: Will Deacon
---
 drivers/iommu/iova.c | 20 ++++++++++----------
 include/linux/iova.h |  3 +++
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145..5c22b9187b79 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -15,8 +15,6 @@
 /* The anchor node sits above the top of the usable address space */
 #define IOVA_ANCHOR	~0UL
 
-#define IOVA_RANGE_CACHE_MAX_SIZE 6	/* log of max cached IOVA range size (in pages) */
-
 static bool iova_rcache_insert(struct iova_domain *iovad,
 			       unsigned long pfn,
 			       unsigned long size);
@@ -443,7 +441,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
 	 * rounding up anything cacheable to make sure that can't happen. The
 	 * order of the unadjusted size will still match upon freeing.
 	 */
-	if (size < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
+	if (size < (1 << (iovad->rcache_max_size - 1)))
 		size = roundup_pow_of_two(size);
 
 	iova_pfn = iova_rcache_get(iovad, size, limit_pfn + 1);
@@ -713,13 +711,15 @@ int iova_domain_init_rcaches(struct iova_domain *iovad)
 	unsigned int cpu;
 	int i, ret;
 
-	iovad->rcaches = kcalloc(IOVA_RANGE_CACHE_MAX_SIZE,
+	iovad->rcache_max_size = 6; /* Arbitrarily high default */
+
+	iovad->rcaches = kcalloc(iovad->rcache_max_size,
 				 sizeof(struct iova_rcache),
 				 GFP_KERNEL);
 	if (!iovad->rcaches)
 		return -ENOMEM;
 
-	for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+	for (i = 0; i < iovad->rcache_max_size; ++i) {
 		struct iova_cpu_rcache *cpu_rcache;
 		struct iova_rcache *rcache;
 
@@ -816,7 +816,7 @@ static bool iova_rcache_insert(struct iova_domain *iovad, unsigned long pfn,
 {
 	unsigned int log_size = order_base_2(size);
 
-	if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE)
+	if (log_size >= iovad->rcache_max_size)
 		return false;
 
 	return __iova_rcache_insert(iovad, &iovad->rcaches[log_size], pfn);
@@ -872,7 +872,7 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
 {
 	unsigned int log_size = order_base_2(size);
 
-	if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE || !iovad->rcaches)
+	if (log_size >= iovad->rcache_max_size || !iovad->rcaches)
 		return 0;
 
 	return __iova_rcache_get(&iovad->rcaches[log_size], limit_pfn - size);
@@ -888,7 +888,7 @@ static void free_iova_rcaches(struct iova_domain *iovad)
 	unsigned int cpu;
 	int i, j;
 
-	for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+	for (i = 0; i < iovad->rcache_max_size; ++i) {
 		rcache = &iovad->rcaches[i];
 		if (!rcache->cpu_rcaches)
 			break;
@@ -916,7 +916,7 @@ static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad)
 	unsigned long flags;
 	int i;
 
-	for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+	for (i = 0; i < iovad->rcache_max_size; ++i) {
 		rcache = &iovad->rcaches[i];
 		cpu_rcache = per_cpu_ptr(rcache->cpu_rcaches, cpu);
 		spin_lock_irqsave(&cpu_rcache->lock, flags);
@@ -935,7 +935,7 @@ static void free_global_cached_iovas(struct iova_domain *iovad)
 	unsigned long flags;
 	int i, j;
 
-	for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+	for (i = 0; i < iovad->rcache_max_size; ++i) {
 		rcache = &iovad->rcaches[i];
 		spin_lock_irqsave(&rcache->lock, flags);
 		for (j = 0; j < rcache->depot_size; ++j) {
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 320a70e40233..02f7222fa85a 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -38,6 +38,9 @@ struct iova_domain {
 
 	struct iova_rcache	*rcaches;
 	struct hlist_node	cpuhp_dead;
+
+	/* log of max cached IOVA range size (in pages) */
+	unsigned long		rcache_max_size;
 };
 
 static inline unsigned long iova_size(struct iova *iova)
-- 
2.26.2

From nobody Fri Jun 19 13:27:36 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 53D43C433EF
	for ; Mon, 4 Apr 2022 11:33:18 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1359709AbiDDLfM (ORCPT ); Mon, 4 Apr 2022 07:35:12 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46156 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1359792AbiDDLfJ (ORCPT ); Mon, 4 Apr 2022 07:35:09 -0400
Received: from frasgout.his.huawei.com (frasgout.his.huawei.com [185.176.79.56])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6652E3D1F6
	for ; Mon, 4 Apr 2022 04:33:10 -0700 (PDT)
Received: from fraeml702-chm.china.huawei.com (unknown [172.18.147.207])
	by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4KX7rX4qvzz67Lqc;
	Mon, 4 Apr 2022 19:31:08 +0800 (CST)
Received: from lhreml724-chm.china.huawei.com (10.201.108.75) by
	fraeml702-chm.china.huawei.com (10.206.15.51) with
	Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id
	15.1.2375.24; Mon, 4 Apr 2022 13:33:08 +0200
Received: from localhost.localdomain (10.69.192.58) by
	lhreml724-chm.china.huawei.com (10.201.108.75) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
	15.1.2375.24; Mon, 4 Apr 2022 12:33:05 +0100
From: John Garry
To: , ,
CC: , , , , , , , , , John Garry
Subject: [PATCH RESEND v5 3/5] iommu: Allow iommu_change_dev_def_domain() realloc same default domain type
Date: Mon, 4 Apr 2022 19:27:12 +0800
Message-ID: <1649071634-188535-4-git-send-email-john.garry@huawei.com>
X-Mailer: git-send-email 2.8.1
In-Reply-To: <1649071634-188535-1-git-send-email-john.garry@huawei.com>
References: <1649071634-188535-1-git-send-email-john.garry@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.69.192.58]
X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To
	lhreml724-chm.china.huawei.com (10.201.108.75)
X-CFilter-Loop: Reflected
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Allow iommu_change_dev_def_domain() to create a new default domain of the
same type as the current one.

Also remove the comment about the function's purpose, which will become
stale.

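The type resolution this patch introduces can be sketched in userspace as
below. This is a rough model under stated assumptions, not the kernel code:
the enum values, struct sketch_result and sketch_resolve() are hypothetical
names invented for this note, and SKETCH_SAME only plays the role of
__IOMMU_DOMAIN_SAME. The point it shows is that a "same" request keeps the
type but still reallocates, whereas an explicit request for the existing
type stays a no-op.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the kernel's real domain-type values differ. */
enum sketch_type {
	SKETCH_AUTO	= 0,		/* "auto": no specific type requested */
	SKETCH_IDENTITY	= 1,
	SKETCH_DMA	= 2,
	SKETCH_SAME	= 1 << 4,	/* role of __IOMMU_DOMAIN_SAME */
};

struct sketch_result {
	int err;	/* 0 or -EINVAL */
	int type;	/* resolved domain type, meaningful when err == 0 */
	bool realloc;	/* true if a new default domain would be allocated */
};

static struct sketch_result sketch_resolve(int requested, int prev_type,
					   int dev_def, int sys_def)
{
	struct sketch_result res = { 0, prev_type, false };

	/* The existing default domain always has a concrete type. */
	assert(prev_type != SKETCH_SAME && prev_type != SKETCH_AUTO);

	if (requested == SKETCH_SAME) {
		/* Keep the type, but still go on to reallocate the domain. */
		res.realloc = true;
		return res;
	}

	if (requested == SKETCH_AUTO) {
		/* Use the device's required type, else the system default. */
		requested = dev_def ? dev_def : sys_def;
	} else if (dev_def && requested != dev_def) {
		/* The device cannot be placed in the requested domain type. */
		res.err = -EINVAL;
		return res;
	}

	res.type = requested;
	/* An explicit request for the existing type remains a no-op. */
	res.realloc = (requested != prev_type);
	return res;
}
```

A later caller that only wants to change a group attribute would pass the
"same" value so that the default domain is rebuilt without a type change.
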
Signed-off-by: John Garry
---
 drivers/iommu/iommu.c | 49 ++++++++++++++++++++++---------------------
 include/linux/iommu.h |  1 +
 2 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0dd766030baf..10bb10c2a210 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2863,6 +2863,7 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle)
 }
 EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
 
+
 /*
  * Changes the default domain of an iommu group that has *only* one device
  *
@@ -2873,10 +2874,6 @@ EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
  *
  * Returns 0 on success and error code on failure
  *
- * Note:
- * 1. Presently, this function is called only when user requests to change the
- *    group's default domain type through /sys/kernel/iommu_groups//type
- *    Please take a closer look if intended to use for other purposes.
  */
 static int iommu_change_dev_def_domain(struct iommu_group *group,
 				       struct device *prev_dev, int type)
@@ -2929,28 +2926,32 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
 		goto out;
 	}
 
-	dev_def_dom = iommu_get_def_domain_type(dev);
-	if (!type) {
+	if (type == __IOMMU_DOMAIN_SAME) {
+		type = prev_dom->type;
+	} else {
+		dev_def_dom = iommu_get_def_domain_type(dev);
+		if (!type) {
+			/*
+			 * If the user hasn't requested any specific type of domain and
+			 * if the device supports both the domains, then default to the
+			 * domain the device was booted with
+			 */
+			type = dev_def_dom ? : iommu_def_domain_type;
+		} else if (dev_def_dom && type != dev_def_dom) {
+			dev_err_ratelimited(prev_dev, "Device cannot be in %s domain\n",
+					    iommu_domain_type_str(type));
+			ret = -EINVAL;
+			goto out;
+		}
+
 		/*
-		 * If the user hasn't requested any specific type of domain and
-		 * if the device supports both the domains, then default to the
-		 * domain the device was booted with
+		 * Switch to a new domain only if the requested domain type is different
+		 * from the existing default domain type
 		 */
-		type = dev_def_dom ? : iommu_def_domain_type;
-	} else if (dev_def_dom && type != dev_def_dom) {
-		dev_err_ratelimited(prev_dev, "Device cannot be in %s domain\n",
-				    iommu_domain_type_str(type));
-		ret = -EINVAL;
-		goto out;
-	}
-
-	/*
-	 * Switch to a new domain only if the requested domain type is different
-	 * from the existing default domain type
-	 */
-	if (prev_dom->type == type) {
-		ret = 0;
-		goto out;
+		if (prev_dom->type == type) {
+			ret = 0;
+			goto out;
+		}
 	}
 
 	/* We can bring up a flush queue without tearing down the domain */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9208eca4b0d1..b141cf71c7af 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -63,6 +63,7 @@ struct iommu_domain_geometry {
 					      implementation              */
 #define __IOMMU_DOMAIN_PT	(1U << 2)  /* Domain is identity mapped   */
 #define __IOMMU_DOMAIN_DMA_FQ	(1U << 3)  /* DMA-API uses flush queue    */
+#define __IOMMU_DOMAIN_SAME	(1U << 4)  /* Keep same type (internal)   */
 
 /*
  * This are the possible domain-types
-- 
2.26.2

From nobody Fri Jun 19 13:27:36 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E7463C433EF
	for ; Mon, 4 Apr 2022 11:33:20 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1376297AbiDDLfP (ORCPT ); Mon, 4 Apr 2022
	07:35:15 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46284 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1359676AbiDDLfL (ORCPT ); Mon, 4 Apr 2022 07:35:11 -0400
Received: from frasgout.his.huawei.com (frasgout.his.huawei.com [185.176.79.56])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E153E3D497
	for ; Mon, 4 Apr 2022 04:33:13 -0700 (PDT)
Received: from fraeml701-chm.china.huawei.com (unknown [172.18.147.200])
	by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4KX7qc626lz67yKr;
	Mon, 4 Apr 2022 19:30:20 +0800 (CST)
Received: from lhreml724-chm.china.huawei.com (10.201.108.75) by
	fraeml701-chm.china.huawei.com (10.206.15.50) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id
	15.1.2375.24; Mon, 4 Apr 2022 13:33:11 +0200
Received: from localhost.localdomain (10.69.192.58) by
	lhreml724-chm.china.huawei.com (10.201.108.75) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
	15.1.2375.24; Mon, 4 Apr 2022 12:33:08 +0100
From: John Garry
To: , ,
CC: , , , , , , , , , John Garry
Subject: [PATCH RESEND v5 4/5] iommu: Allow max opt DMA len be set for a group via sysfs
Date: Mon, 4 Apr 2022 19:27:13 +0800
Message-ID: <1649071634-188535-5-git-send-email-john.garry@huawei.com>
X-Mailer: git-send-email 2.8.1
In-Reply-To: <1649071634-188535-1-git-send-email-john.garry@huawei.com>
References: <1649071634-188535-1-git-send-email-john.garry@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.69.192.58]
X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To
	lhreml724-chm.china.huawei.com (10.201.108.75)
X-CFilter-Loop: Reflected
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Add support to allow the maximum optimised DMA length to be set for an
IOMMU group via sysfs.
This is much the same as the method to change the default domain type for
a group.

Signed-off-by: John Garry
---
 .../ABI/testing/sysfs-kernel-iommu_groups     | 16 +++++
 drivers/iommu/iommu.c                         | 59 ++++++++++++++++++-
 include/linux/iommu.h                         |  6 ++
 3 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
index b15af6a5bc08..ed6f72794f6c 100644
--- a/Documentation/ABI/testing/sysfs-kernel-iommu_groups
+++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
@@ -63,3 +63,19 @@ Description:	/sys/kernel/iommu_groups//type shows the type of default
 		system could lead to catastrophic effects (the users might
 		need to reboot the machine to get it to normal state). So, it's
 		expected that the users understand what they're doing.
+
+What:		/sys/kernel/iommu_groups//max_opt_dma_size
+Date:		Feb 2022
+KernelVersion:	v5.18
+Contact:	iommu@lists.linux-foundation.org
+Description:	/sys/kernel/iommu_groups//max_opt_dma_size shows the
+		max optimised DMA size for the default IOMMU domain associated
+		with the group.
+		Each IOMMU domain has an IOVA domain. The IOVA domain caches
+		IOVAs up to a certain size as a performance optimisation.
+		This sysfs file allows the range of the IOVA domain caching to be
+		set, such that larger than default IOVAs may be cached.
+		A value of 0 means that the default caching range is chosen.
+		A privileged user could request the kernel to change the range
+		by writing to this file. For this to happen, the same rules
+		and procedure apply as in changing the default domain type.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 10bb10c2a210..7c7258f19bed 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -48,6 +48,7 @@ struct iommu_group {
 	struct iommu_domain *default_domain;
 	struct iommu_domain *domain;
 	struct list_head entry;
+	size_t max_opt_dma_size;
 };
 
 struct group_device {
@@ -89,6 +90,9 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
 static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
 static ssize_t iommu_group_store_type(struct iommu_group *group,
 				      const char *buf, size_t count);
+static ssize_t iommu_group_store_max_opt_dma_size(struct iommu_group *group,
+						  const char *buf,
+						  size_t count);
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
 struct iommu_group_attribute iommu_group_attr_##_name =		\
@@ -571,6 +575,12 @@ static ssize_t iommu_group_show_type(struct iommu_group *group,
 	return strlen(type);
 }
 
+static ssize_t iommu_group_show_max_opt_dma_size(struct iommu_group *group,
+						 char *buf)
+{
+	return sprintf(buf, "%zu\n", group->max_opt_dma_size);
+}
+
 static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL);
 
 static IOMMU_GROUP_ATTR(reserved_regions, 0444,
@@ -579,6 +589,9 @@ static IOMMU_GROUP_ATTR(reserved_regions, 0444,
 static IOMMU_GROUP_ATTR(type, 0644, iommu_group_show_type,
 			iommu_group_store_type);
 
+static IOMMU_GROUP_ATTR(max_opt_dma_size, 0644, iommu_group_show_max_opt_dma_size,
+			iommu_group_store_max_opt_dma_size);
+
 static void iommu_group_release(struct kobject *kobj)
 {
 	struct iommu_group *group = to_iommu_group(kobj);
@@ -665,6 +678,10 @@ struct iommu_group *iommu_group_alloc(void)
 	if (ret)
 		return ERR_PTR(ret);
 
+	ret = iommu_group_create_file(group, &iommu_group_attr_max_opt_dma_size);
+	if (ret)
+		return ERR_PTR(ret);
+
 	pr_debug("Allocated group %d\n", group->id);
 
 	return group;
@@ -2087,6 +2104,11 @@ struct iommu_domain *iommu_get_dma_domain(struct device *dev)
 	return dev->iommu_group->default_domain;
 }
 
+size_t iommu_group_get_max_opt_dma_size(struct iommu_group *group)
+{
+	return group->max_opt_dma_size;
+}
+
 /*
  * IOMMU groups are really the natural working unit of the IOMMU, but
  * the IOMMU API works on domains and devices.  Bridge that gap by
@@ -2871,12 +2893,14 @@ EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
  * @prev_dev: The device in the group (this is used to make sure that the device
  *	 hasn't changed after the caller has called this function)
  * @type: The type of the new default domain that gets associated with the group
+ * @max_opt_dma_size: Set the IOMMU group max_opt_dma_size if non-zero
  *
  * Returns 0 on success and error code on failure
  *
  */
 static int iommu_change_dev_def_domain(struct iommu_group *group,
-				       struct device *prev_dev, int type)
+				       struct device *prev_dev, int type,
+				       unsigned long max_opt_dma_size)
 {
 	struct iommu_domain *prev_dom;
 	struct group_device *grp_dev;
@@ -2977,6 +3001,9 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
 
 	group->domain = group->default_domain;
 
+	if (max_opt_dma_size)
+		group->max_opt_dma_size = max_opt_dma_size;
+
 	/*
 	 * Release the mutex here because ops->probe_finalize() call-back of
 	 * some vendor IOMMU drivers calls arm_iommu_attach_device() which
@@ -3003,6 +3030,7 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
 
 enum iommu_group_op {
 	CHANGE_GROUP_TYPE,
+	CHANGE_DMA_OPT_SIZE,
 };
 
 static int __iommu_group_store_type(const char *buf, struct iommu_group *group,
@@ -3031,7 +3059,24 @@ static int __iommu_group_store_type(const char *buf, struct iommu_group *group,
 		return -EINVAL;
 	}
 
-	return iommu_change_dev_def_domain(group, dev, type);
+	return iommu_change_dev_def_domain(group, dev, type, 0);
+}
+
+static int __iommu_group_store_max_opt_dma_size(const char *buf,
+						struct iommu_group *group,
+						struct device *dev)
+{
+	unsigned long val;
+
+	if (kstrtoul(buf, 0, &val) || !val)
+		return -EINVAL;
+
+	if (device_is_bound(dev)) {
+		pr_err_ratelimited("Device is still bound to driver\n");
+		return -EINVAL;
+	}
+
+	return iommu_change_dev_def_domain(group, dev, __IOMMU_DOMAIN_SAME, val);
 }
 
 /*
@@ -3108,6 +3153,9 @@ static ssize_t iommu_group_store_common(struct iommu_group *group,
 	case CHANGE_GROUP_TYPE:
 		ret = __iommu_group_store_type(buf, group, dev);
 		break;
+	case CHANGE_DMA_OPT_SIZE:
+		ret = __iommu_group_store_max_opt_dma_size(buf, group, dev);
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -3124,3 +3172,10 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 {
 	return iommu_group_store_common(group, CHANGE_GROUP_TYPE, buf, count);
 }
+
+static ssize_t iommu_group_store_max_opt_dma_size(struct iommu_group *group,
+						  const char *buf,
+						  size_t count)
+{
+	return iommu_group_store_common(group, CHANGE_DMA_OPT_SIZE, buf, count);
+}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b141cf71c7af..6915e68c40b7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -430,6 +430,7 @@ extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
 				   struct device *dev, ioasid_t pasid);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
+extern size_t iommu_group_get_max_opt_dma_size(struct iommu_group *group);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
 extern int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
@@ -725,6 +726,11 @@ static inline struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
 	return NULL;
 }
 
+static inline size_t iommu_group_get_max_opt_dma_size(struct iommu_group *group)
+{
+	return 0;
+}
+
 static inline int iommu_map(struct iommu_domain *domain, unsigned long iova,
 			    phys_addr_t paddr, size_t size, int prot)
 {
-- 
2.26.2

From nobody Fri Jun 19 13:27:36 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin
	3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id BC639C433F5
	for ; Mon, 4 Apr 2022 11:33:34 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1359792AbiDDLf1 (ORCPT ); Mon, 4 Apr 2022 07:35:27 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46438 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1376275AbiDDLfO (ORCPT ); Mon, 4 Apr 2022 07:35:14 -0400
Received: from frasgout.his.huawei.com (frasgout.his.huawei.com [185.176.79.56])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 858E53D1FA
	for ; Mon, 4 Apr 2022 04:33:17 -0700 (PDT)
Received: from fraeml745-chm.china.huawei.com (unknown [172.18.147.200])
	by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4KX7qh0hzfz67tf3;
	Mon, 4 Apr 2022 19:30:24 +0800 (CST)
Received: from lhreml724-chm.china.huawei.com (10.201.108.75) by
	fraeml745-chm.china.huawei.com (10.206.15.226) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
	15.1.2375.24; Mon, 4 Apr 2022 13:33:15 +0200
Received: from localhost.localdomain (10.69.192.58) by
	lhreml724-chm.china.huawei.com (10.201.108.75) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
	15.1.2375.24; Mon, 4 Apr 2022 12:33:11 +0100
From: John Garry
To: , ,
CC: , , , , , , , , , John Garry
Subject: [PATCH RESEND v5 5/5] iova: Add iova_len argument to iova_domain_init_rcaches()
Date: Mon, 4 Apr 2022 19:27:14 +0800
Message-ID: <1649071634-188535-6-git-send-email-john.garry@huawei.com>
X-Mailer: git-send-email 2.8.1
In-Reply-To: <1649071634-188535-1-git-send-email-john.garry@huawei.com>
References: <1649071634-188535-1-git-send-email-john.garry@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.69.192.58]
X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To
lhreml724-chm.china.huawei.com (10.201.108.75) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add an iova_len argument to iova_domain_init_rcaches(), and use it to set the rcaches range. Also fix up all users to set this value (to 0, meaning use the default), including a wrapper for that, iova_domain_init_rcaches_default(). For dma-iommu.c we derive the iova_len argument from the IOMMU group max opt DMA size. Signed-off-by: John Garry --- drivers/iommu/dma-iommu.c | 15 ++++++++++++++- drivers/iommu/iova.c | 19 ++++++++++++++++--- drivers/vdpa/vdpa_user/iova_domain.c | 4 ++-- include/linux/iova.h | 3 ++- 4 files changed, 34 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 42ca42ff1b5d..19f35624611c 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -525,6 +525,8 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, struct iommu_dma_cookie *cookie = domain->iova_cookie; unsigned long order, base_pfn; struct iova_domain *iovad; + size_t max_opt_dma_size; + unsigned long iova_len = 0; int ret; if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE) @@ -560,7 +562,18 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, } init_iova_domain(iovad, 1UL << order, base_pfn); - ret = iova_domain_init_rcaches(iovad); + + max_opt_dma_size = iommu_group_get_max_opt_dma_size(dev->iommu_group); + if (max_opt_dma_size) { + unsigned long shift = __ffs(1UL << order); + + iova_len = roundup_pow_of_two(max_opt_dma_size); + iova_len >>= shift; + if (!iova_len) + iova_len = 1; + } + + ret = iova_domain_init_rcaches(iovad, iova_len); if (ret) return ret; diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 5c22b9187b79..d65e79e132ee 100644 --- a/drivers/iommu/iova.c +++ 
b/drivers/iommu/iova.c @@ -706,12 +706,20 @@ static void iova_magazine_push(struct iova_magazine *mag, unsigned long pfn) mag->pfns[mag->size++] = pfn; } -int iova_domain_init_rcaches(struct iova_domain *iovad) +static unsigned long iova_len_to_rcache_max(unsigned long iova_len) +{ + return order_base_2(iova_len) + 1; +} + +int iova_domain_init_rcaches(struct iova_domain *iovad, unsigned long iova_len) { unsigned int cpu; int i, ret; - iovad->rcache_max_size = 6; /* Arbitrarily high default */ + if (iova_len) + iovad->rcache_max_size = iova_len_to_rcache_max(iova_len); + else + iovad->rcache_max_size = 6; /* Arbitrarily high default */ iovad->rcaches = kcalloc(iovad->rcache_max_size, sizeof(struct iova_rcache), @@ -755,7 +763,12 @@ int iova_domain_init_rcaches(struct iova_domain *iovad) free_iova_rcaches(iovad); return ret; } -EXPORT_SYMBOL_GPL(iova_domain_init_rcaches); + +int iova_domain_init_rcaches_default(struct iova_domain *iovad) +{ + return iova_domain_init_rcaches(iovad, 0); +} +EXPORT_SYMBOL_GPL(iova_domain_init_rcaches_default); /* * Try inserting IOVA range starting with 'iova_pfn' into 'rcache', and diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c index 6daa3978d290..3a2acef98a4a 100644 --- a/drivers/vdpa/vdpa_user/iova_domain.c +++ b/drivers/vdpa/vdpa_user/iova_domain.c @@ -514,12 +514,12 @@ vduse_domain_create(unsigned long iova_limit, size_t bounce_size) spin_lock_init(&domain->iotlb_lock); init_iova_domain(&domain->stream_iovad, PAGE_SIZE, IOVA_START_PFN); - ret = iova_domain_init_rcaches(&domain->stream_iovad); + ret = iova_domain_init_rcaches_default(&domain->stream_iovad); if (ret) goto err_iovad_stream; init_iova_domain(&domain->consistent_iovad, PAGE_SIZE, bounce_pfns); - ret = iova_domain_init_rcaches(&domain->consistent_iovad); + ret = iova_domain_init_rcaches_default(&domain->consistent_iovad); if (ret) goto err_iovad_consistent; diff --git 
a/include/linux/iova.h b/include/linux/iova.h index 02f7222fa85a..56281434ce0c 100644 --- a/include/linux/iova.h +++ b/include/linux/iova.h @@ -95,7 +95,8 @@ struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo, unsigned long pfn_hi); void init_iova_domain(struct iova_domain *iovad, unsigned long granule, unsigned long start_pfn); -int iova_domain_init_rcaches(struct iova_domain *iovad); +int iova_domain_init_rcaches(struct iova_domain *iovad, unsigned long iova_len); +int iova_domain_init_rcaches_default(struct iova_domain *iovad); struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn); void put_iova_domain(struct iova_domain *iovad); #else -- 2.26.2
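
For anyone following the arithmetic in patch 5/5, here is a minimal userspace sketch of how a group's max opt DMA size would turn into iova_len and then into rcache_max_size. It is not kernel code: the sketch_* helpers are hypothetical stand-ins written for illustration, approximating roundup_pow_of_two() and order_base_2() for small inputs, and the shift is taken to be the granule order since __ffs(1UL << order) equals order.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's roundup_pow_of_two().
 * Assumes n is non-zero and small enough that the loop cannot overflow.
 */
static unsigned long sketch_roundup_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical stand-in for order_base_2(): ceil(log2(n)), with n of
 * 0 or 1 giving 0. Assumes n is well below the top bit of the type.
 */
static unsigned int sketch_order_base_2(unsigned long n)
{
	unsigned int order = 0;

	while ((1UL << order) < n)
		order++;
	return order;
}

/*
 * Mirrors the dma-iommu.c hunk: round the size up to a power of two,
 * convert bytes to granules, and clamp to at least one granule.
 * A size of 0 yields 0, which the caller treats as "use the default".
 */
static unsigned long sketch_iova_len(size_t max_opt_dma_size,
				     unsigned long order)
{
	unsigned long iova_len;

	if (!max_opt_dma_size)
		return 0;

	iova_len = sketch_roundup_pow_of_two(max_opt_dma_size) >> order;
	return iova_len ? iova_len : 1;
}

/*
 * Mirrors the iova.c hunk: one rcache per power-of-two size up to
 * iova_len, or the existing default of 6 when iova_len is 0.
 */
static unsigned int sketch_rcache_max_size(unsigned long iova_len)
{
	if (!iova_len)
		return 6;
	return sketch_order_base_2(iova_len) + 1;
}
```

With a 4K granule (order 12), a max opt DMA size of 128K gives iova_len 32 and 6 rcaches, which happens to match the existing default; 1M gives iova_len 256 and 9 rcaches. A size smaller than the granule is clamped to iova_len 1, giving a single rcache.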