In addition to IOMMU_DOMAIN_DMA, def_domain_type can also return
IOMMU_DOMAIN_DMA_FQ when applicable, else flush queues will never be
used.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
include/linux/iommu.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 05279109c732..d0da1918d2de 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -585,6 +585,7 @@ iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size,
* @def_domain_type: device default domain type, return value:
* - IOMMU_DOMAIN_IDENTITY: must use an identity domain
* - IOMMU_DOMAIN_DMA: must use a dma domain
+ * - IOMMU_DOMAIN_DMA_FQ: dma domain with batch invalidation
* - 0: use the default setting
* @default_domain_ops: the default ops for domains
* @remove_dev_pasid: Remove any translation configurations of a specific
--
2.47.0
On 12/10/24 03:24, Matthew Rosato wrote: > In addition to IOMMU_DOMAIN_DMA, def_domain_type can also return > IOMMU_DOMAIN_DMA_FQ when applicable, else flush queues will never be > used. > > Signed-off-by: Matthew Rosato<mjrosato@linux.ibm.com> > --- > include/linux/iommu.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > index 05279109c732..d0da1918d2de 100644 > --- a/include/linux/iommu.h > +++ b/include/linux/iommu.h > @@ -585,6 +585,7 @@ iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size, > * @def_domain_type: device default domain type, return value: > * - IOMMU_DOMAIN_IDENTITY: must use an identity domain > * - IOMMU_DOMAIN_DMA: must use a dma domain > + * - IOMMU_DOMAIN_DMA_FQ: dma domain with batch invalidation In which case must an iommu driver return IOMMU_DOMAIN_DMA_FQ? The flush queue is a policy of "when and how to synchronize the IOTLB" in dma-iommu.c. The iommu driver actually has no need to understand such policy. By the way, "dma domain" in this comment is a bit confusing, it reads better if we make it like: - IOMMU_DOMAIN_IDENTITY: must use an identity domain - IOMMU_DOMAIN_DMA: must use a paging domain Thanks, baolu
On 12/9/24 9:57 PM, Baolu Lu wrote: > On 12/10/24 03:24, Matthew Rosato wrote: >> In addition to IOMMU_DOMAIN_DMA, def_domain_type can also return >> IOMMU_DOMAIN_DMA_FQ when applicable, else flush queues will never be >> used. >> >> Signed-off-by: Matthew Rosato<mjrosato@linux.ibm.com> >> --- >> include/linux/iommu.h | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/include/linux/iommu.h b/include/linux/iommu.h >> index 05279109c732..d0da1918d2de 100644 >> --- a/include/linux/iommu.h >> +++ b/include/linux/iommu.h >> @@ -585,6 +585,7 @@ iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size, >> * @def_domain_type: device default domain type, return value: >> * - IOMMU_DOMAIN_IDENTITY: must use an identity domain >> * - IOMMU_DOMAIN_DMA: must use a dma domain >> + * - IOMMU_DOMAIN_DMA_FQ: dma domain with batch invalidation > > In which case must an iommu driver return IOMMU_DOMAIN_DMA_FQ? > > The flush queue is a policy of "when and how to synchronize the IOTLB" > in dma-iommu.c. The iommu driver actually has no need to understand such > policy. If you look ahead to the next patch where I implement def_domain_type for s390, I found that if I only ever return IOMMU_DOMAIN_DMA from ops->def_domain_type then when go through iommu_dma_init_domain() we will never call iommu_dma_init_fq() regardless of IOMMU_CAP_DEFERRED_FLUSH because of the if (domain->type == IOMMU_DOMAIN_DMA_FQ) check. So something isn't right here. It looks to me like the following is happening: We first have the iommu_def_domain_type set in iommu_subsys_init or via one of the set_default routines, e.g.: if (!iommu_default_passthrough() && !iommu_dma_strict) iommu_def_domain_type = IOMMU_DOMAIN_DMA_FQ; But when we arrive at iommu_group_alloc_default_domain()... if we have no ops->def_domain_type() defined we will call __iommu_group_alloc_default_domain using what is in iommu_def_domain_type, which could be IOMMU_DOMAIN_DMA, IOMMU_DOMAIN_DMA_FQ or IOMMU_DOMAIN_IDENTITY based on strict/passthrough settings. Testing an s390 scenario today without this series applied, we will call __iommu_group_alloc_default_domain with IOMMU_DOMAIN_DMA_FQ, as long as iommu.strict/passthrough is not specified, so then later in dma-iommu:iommu_dma_init_domain() we can use FQ based on IOMMU_CAP_DEFERRED_FLUSH. but once we add ops->def_domain_type() then we end up calling iommu_group_alloc_default_domain() with a req_type == the return value from ops->def_domain_type(), which by the current definition can only be IOMMU_DOMAIN_DMA or IOMMU_DOMAIN_IDENTITY. We will then call __iommu_group_alloc_default_domain with that req_type; so without this patch + the DMA_FQ path in patch 6 we would always end up allocating IOMMU_DOMAIN_DMA instead of IOMMU_DOMAIN_DMA_FQ by default, so when we arrive at dma:iommu_dma_init_domain() we won't check for IOMMU_CAP_DEFERRED_FLUSH because of the type. So unless I'm missing something I think either we have to 1) be more flexible in what ops->default_domain_type() is allowed to return as this patch does or 2) iommu core needs to look at the return from ops->default_domain_type() and decide whether it's OK to convert IOMMU_DOMAIN_DMA->IOMMU_DOMAIN_DMA_FQ based on strict setting. This removes the decision from the individual drivers and dma-iommu can later decide whether or not to use it or not based on IOMMU_CAP_DEFERRED_FLUSH? But would also affect other users of def_domain_type() today that perhaps did not want DMA_FQ? Unsure. What I mean is something like (untested): diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 6bdede4177ff..275daa7f819d 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1744,9 +1744,11 @@ static int iommu_get_def_domain_type(struct iommu_group *group, */ type = ops->default_domain->type; } else { - if (ops->def_domain_type) + if (ops->def_domain_type) { type = ops->def_domain_type(dev); - else + if (type == IOMMU_DOMAIN_DMA && !iommu_dma_strict) + type = IOMMU_DOMAIN_DMA_FQ; + } else return cur_type; } if (!type || cur_type == type)
On 10/12/2024 4:26 pm, Matthew Rosato wrote: > On 12/9/24 9:57 PM, Baolu Lu wrote: >> On 12/10/24 03:24, Matthew Rosato wrote: >>> In addition to IOMMU_DOMAIN_DMA, def_domain_type can also return >>> IOMMU_DOMAIN_DMA_FQ when applicable, else flush queues will never be >>> used. >>> >>> Signed-off-by: Matthew Rosato<mjrosato@linux.ibm.com> >>> --- >>> include/linux/iommu.h | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h >>> index 05279109c732..d0da1918d2de 100644 >>> --- a/include/linux/iommu.h >>> +++ b/include/linux/iommu.h >>> @@ -585,6 +585,7 @@ iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size, >>> * @def_domain_type: device default domain type, return value: >>> * - IOMMU_DOMAIN_IDENTITY: must use an identity domain >>> * - IOMMU_DOMAIN_DMA: must use a dma domain >>> + * - IOMMU_DOMAIN_DMA_FQ: dma domain with batch invalidation >> >> In which case must an iommu driver return IOMMU_DOMAIN_DMA_FQ? >> >> The flush queue is a policy of "when and how to synchronize the IOTLB" >> in dma-iommu.c. The iommu driver actually has no need to understand such >> policy. > > If you look ahead to the next patch where I implement def_domain_type for s390, I found that if I only ever return IOMMU_DOMAIN_DMA from ops->def_domain_type then when go through iommu_dma_init_domain() we will never call iommu_dma_init_fq() regardless of IOMMU_CAP_DEFERRED_FLUSH because of the if (domain->type == IOMMU_DOMAIN_DMA_FQ) check. So something isn't right here. Conceptually I don't think it ever makes sense for a driver to *require* a device to use deferred invalidation. Furthermore it's been the whole design for a while now that drivers should never see nor have to acknowledge IOMMU_DOMAIN_DMA_FQ, it's now just an internal type which exists largely for the sake of making the sysfs interface work really neatly. Also beware that a major reason for overriding iommu_def_domain_type with a paging domain is for untrusted devices, so massaging the result based on iommu_dma_strict is still not necessarily appropriate anyway. It appears the real underlying issue is that, like everyone else in the same situation, you're doing def_domain_type wrong. If and when you can't support IOMMU_DOMAIN_IDENTITY, the expectation is that you make __iommu_alloc_identity_domain() fail, such that if iommu_def_domain_type is then ever set to passthrough, iommu_group_alloc_default_domain() falls back to IOMMU_DOMAIN_DMA by itself, and the user gets told they did a silly thing. What you see apple-dart doing is a hack around the old bus-based iommu_domain_alloc() API where there wasn't enough information at the right point to necessarily do the right thing. Thanks, Robin. > It looks to me like the following is happening: > > We first have the iommu_def_domain_type set in iommu_subsys_init or via one of the set_default routines, e.g.: > if (!iommu_default_passthrough() && !iommu_dma_strict) > iommu_def_domain_type = IOMMU_DOMAIN_DMA_FQ; > > But when we arrive at iommu_group_alloc_default_domain()... > > if we have no ops->def_domain_type() defined we will call __iommu_group_alloc_default_domain using what is in iommu_def_domain_type, which could be IOMMU_DOMAIN_DMA, IOMMU_DOMAIN_DMA_FQ or IOMMU_DOMAIN_IDENTITY based on strict/passthrough settings. Testing an s390 scenario today without this series applied, we will call __iommu_group_alloc_default_domain with IOMMU_DOMAIN_DMA_FQ, as long as iommu.strict/passthrough is not specified, so then later in dma-iommu:iommu_dma_init_domain() we can use FQ based on IOMMU_CAP_DEFERRED_FLUSH. > > but once we add ops->def_domain_type() then we end up calling iommu_group_alloc_default_domain() with a req_type == the return value from ops->def_domain_type(), which by the current definition can only be IOMMU_DOMAIN_DMA or IOMMU_DOMAIN_IDENTITY. We will then call __iommu_group_alloc_default_domain with that req_type; so without this patch + the DMA_FQ path in patch 6 we would always end up allocating IOMMU_DOMAIN_DMA instead of IOMMU_DOMAIN_DMA_FQ by default, so when we arrive at dma:iommu_dma_init_domain() we won't check for IOMMU_CAP_DEFERRED_FLUSH because of the type. > > So unless I'm missing something I think either we have to > 1) be more flexible in what ops->default_domain_type() is allowed to return as this patch does > or > 2) iommu core needs to look at the return from ops->default_domain_type() and decide whether it's OK to convert IOMMU_DOMAIN_DMA->IOMMU_DOMAIN_DMA_FQ based on strict setting. This removes the decision from the individual drivers and dma-iommu can later decide whether or not to use it or not based on IOMMU_CAP_DEFERRED_FLUSH? But would also affect other users of def_domain_type() today that perhaps did not want DMA_FQ? Unsure. What I mean is something like (untested): > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 6bdede4177ff..275daa7f819d 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -1744,9 +1744,11 @@ static int iommu_get_def_domain_type(struct iommu_group *group, > */ > type = ops->default_domain->type; > } else { > - if (ops->def_domain_type) > + if (ops->def_domain_type) { > type = ops->def_domain_type(dev); > - else > + if (type == IOMMU_DOMAIN_DMA && !iommu_dma_strict) > + type = IOMMU_DOMAIN_DMA_FQ; > + } else > return cur_type; > } > if (!type || cur_type == type) > >
On 12/10/24 1:42 PM, Robin Murphy wrote: > On 10/12/2024 4:26 pm, Matthew Rosato wrote: >> On 12/9/24 9:57 PM, Baolu Lu wrote: >>> On 12/10/24 03:24, Matthew Rosato wrote: >>>> In addition to IOMMU_DOMAIN_DMA, def_domain_type can also return >>>> IOMMU_DOMAIN_DMA_FQ when applicable, else flush queues will never be >>>> used. >>>> >>>> Signed-off-by: Matthew Rosato<mjrosato@linux.ibm.com> >>>> --- >>>> include/linux/iommu.h | 1 + >>>> 1 file changed, 1 insertion(+) >>>> >>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h >>>> index 05279109c732..d0da1918d2de 100644 >>>> --- a/include/linux/iommu.h >>>> +++ b/include/linux/iommu.h >>>> @@ -585,6 +585,7 @@ iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size, >>>> * @def_domain_type: device default domain type, return value: >>>> * - IOMMU_DOMAIN_IDENTITY: must use an identity domain >>>> * - IOMMU_DOMAIN_DMA: must use a dma domain >>>> + * - IOMMU_DOMAIN_DMA_FQ: dma domain with batch invalidation >>> >>> In which case must an iommu driver return IOMMU_DOMAIN_DMA_FQ? >>> >>> The flush queue is a policy of "when and how to synchronize the IOTLB" >>> in dma-iommu.c. The iommu driver actually has no need to understand such >>> policy. >> >> If you look ahead to the next patch where I implement def_domain_type for s390, I found that if I only ever return IOMMU_DOMAIN_DMA from ops->def_domain_type then when go through iommu_dma_init_domain() we will never call iommu_dma_init_fq() regardless of IOMMU_CAP_DEFERRED_FLUSH because of the if (domain->type == IOMMU_DOMAIN_DMA_FQ) check. So something isn't right here. > > Conceptually I don't think it ever makes sense for a driver to *require* a device to use deferred invalidation. Furthermore it's been the whole design for a while now that drivers should never see nor have to acknowledge IOMMU_DOMAIN_DMA_FQ, it's now just an internal type which exists largely for the sake of making the sysfs interface work really neatly. Also beware that a major reason for overriding iommu_def_domain_type with a paging domain is for untrusted devices, so massaging the result based on iommu_dma_strict is still not necessarily appropriate anyway. > > It appears the real underlying issue is that, like everyone else in the same situation, you're doing def_domain_type wrong. If and when you can't support IOMMU_DOMAIN_IDENTITY, the expectation is that you make __iommu_alloc_identity_domain() fail, such that if iommu_def_domain_type is then ever set to passthrough, iommu_group_alloc_default_domain() falls back to IOMMU_DOMAIN_DMA by itself, and the user gets told they did a silly thing. OK, I almost see where this all fits to throw out def_domain_type for this series... but looking at __iommu_alloc_identity_domain, the preferred approach is using a static identity domain which turns __iommu_alloc_identity_domain into a nofail case once you define the identity_domain: if (ops->identity_domain) return ops->identity_domain; So it seems to me to be an all-or-nothing thing, whereas what I'm trying to achieve is a device-based decision on whether the group is allowed to use that identity domain. Which reminds me that this is ultimately why I ended up looking into def_domain_type in the first place. If I need __iommu_alloc_identity_domain to fail, I guess what I'm looking to do boils down to something like... if (ops->identity_domain) { if (!ops->allow_identity || ops->allow_identity(dev)) return ops->identity_domain; else return ERR_PTR(-EOPNOTSUPP); }
On 10/12/2024 10:06 pm, Matthew Rosato wrote: > On 12/10/24 1:42 PM, Robin Murphy wrote: >> On 10/12/2024 4:26 pm, Matthew Rosato wrote: >>> On 12/9/24 9:57 PM, Baolu Lu wrote: >>>> On 12/10/24 03:24, Matthew Rosato wrote: >>>>> In addition to IOMMU_DOMAIN_DMA, def_domain_type can also return >>>>> IOMMU_DOMAIN_DMA_FQ when applicable, else flush queues will never be >>>>> used. >>>>> >>>>> Signed-off-by: Matthew Rosato<mjrosato@linux.ibm.com> >>>>> --- >>>>> include/linux/iommu.h | 1 + >>>>> 1 file changed, 1 insertion(+) >>>>> >>>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h >>>>> index 05279109c732..d0da1918d2de 100644 >>>>> --- a/include/linux/iommu.h >>>>> +++ b/include/linux/iommu.h >>>>> @@ -585,6 +585,7 @@ iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size, >>>>> * @def_domain_type: device default domain type, return value: >>>>> * - IOMMU_DOMAIN_IDENTITY: must use an identity domain >>>>> * - IOMMU_DOMAIN_DMA: must use a dma domain >>>>> + * - IOMMU_DOMAIN_DMA_FQ: dma domain with batch invalidation >>>> >>>> In which case must an iommu driver return IOMMU_DOMAIN_DMA_FQ? >>>> >>>> The flush queue is a policy of "when and how to synchronize the IOTLB" >>>> in dma-iommu.c. The iommu driver actually has no need to understand such >>>> policy. >>> >>> If you look ahead to the next patch where I implement def_domain_type for s390, I found that if I only ever return IOMMU_DOMAIN_DMA from ops->def_domain_type then when go through iommu_dma_init_domain() we will never call iommu_dma_init_fq() regardless of IOMMU_CAP_DEFERRED_FLUSH because of the if (domain->type == IOMMU_DOMAIN_DMA_FQ) check. So something isn't right here. >> >> Conceptually I don't think it ever makes sense for a driver to *require* a device to use deferred invalidation. Furthermore it's been the whole design for a while now that drivers should never see nor have to acknowledge IOMMU_DOMAIN_DMA_FQ, it's now just an internal type which exists largely for the sake of making the sysfs interface work really neatly. Also beware that a major reason for overriding iommu_def_domain_type with a paging domain is for untrusted devices, so massaging the result based on iommu_dma_strict is still not necessarily appropriate anyway. >> >> It appears the real underlying issue is that, like everyone else in the same situation, you're doing def_domain_type wrong. If and when you can't support IOMMU_DOMAIN_IDENTITY, the expectation is that you make __iommu_alloc_identity_domain() fail, such that if iommu_def_domain_type is then ever set to passthrough, iommu_group_alloc_default_domain() falls back to IOMMU_DOMAIN_DMA by itself, and the user gets told they did a silly thing. > > OK, I almost see where this all fits to throw out def_domain_type for this series... but looking at __iommu_alloc_identity_domain, the preferred approach is using a static identity domain which turns __iommu_alloc_identity_domain into a nofail case once you define the identity_domain: > > if (ops->identity_domain) > return ops->identity_domain; > > So it seems to me to be an all-or-nothing thing, whereas what I'm trying to achieve is a device-based decision on whether the group is allowed to use that identity domain. Which reminds me that this is ultimately why I ended up looking into def_domain_type in the first place. Well, yeah, the static domain is very much designed for the typical case where it *is* unconditionally available. Simple options right now would be to have two sets of ops and select them per-instance, or if it's a system-wide property perhaps have non-const ops and populate/clear identity_domain as appropriate, or possibly even just stick with the ops->domain_alloc(IOMMU_DOMAIN_IDENTITY) path (perhaps eventually evolving into a specialised domain_alloc_identity op to also become per-device?). What we should not have is mechanisms like the def_domain_type hack where something claims to be available, but then some other thing tries to say "oh but you mustn't touch that". Thanks, Robin. > If I need __iommu_alloc_identity_domain to fail, I guess what I'm looking to do boils down to something like... > > if (ops->identity_domain) { > if (!ops->allow_identity || ops->allow_identity(dev)) > return ops->identity_domain; > else > return ERR_PTR(-EOPNOTSUPP); > } > > > > > >
© 2016 - 2024 Red Hat, Inc.