[PATCH] iommu/arm-smmu-qcom: Only enable stall on smmu-v2

Rob Clark posted 1 patch 1 year, 1 month ago
There is a newer version of this series
drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
[PATCH] iommu/arm-smmu-qcom: Only enable stall on smmu-v2
Posted by Rob Clark 1 year, 1 month ago
From: Rob Clark <robdclark@chromium.org>

On mmu-500, stall-on-fault seems to stall all context banks, causing the
GMU to misbehave.  So limit this feature to smmu-v2 for now.

This fixes an issue with an older mesa bug taking outo the system
because of GMU going off into the year.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index c4c52f7bd09a..1c881e88fc4d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -331,8 +331,10 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 	priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
 	priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
 	priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
-	priv->set_stall = qcom_adreno_smmu_set_stall;
-	priv->resume_translation = qcom_adreno_smmu_resume_translation;
+	if (of_device_is_compatible(np, "qcom,smmu-v2")) {
+		priv->set_stall = qcom_adreno_smmu_set_stall;
+		priv->resume_translation = qcom_adreno_smmu_resume_translation;
+	}
 	priv->set_prr_bit = NULL;
 	priv->set_prr_addr = NULL;
 
-- 
2.47.1
Re: [PATCH] iommu/arm-smmu-qcom: Only enable stall on smmu-v2
Posted by Will Deacon 1 year, 1 month ago
On Mon, Dec 16, 2024 at 09:10:17AM -0800, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> On mmu-500, stall-on-fault seems to stall all context banks, causing the
> GMU to misbehave.  So limit this feature to smmu-v2 for now.

MMU-500 has public documentation so please can you dig up what the
actual behaviour is rather than guess?

> This fixes an issue with an older mesa bug taking outo the system
> because of GMU going off into the year.

Sorry, but I don't understand this sentence.

Will
Re: [PATCH] iommu/arm-smmu-qcom: Only enable stall on smmu-v2
Posted by Robin Murphy 1 year, 1 month ago
On 2024-12-19 11:30 am, Will Deacon wrote:
> On Mon, Dec 16, 2024 at 09:10:17AM -0800, Rob Clark wrote:
>> From: Rob Clark <robdclark@chromium.org>
>>
>> On mmu-500, stall-on-fault seems to stall all context banks, causing the
>> GMU to misbehave.  So limit this feature to smmu-v2 for now.
> 
> MMU-500 has public documentation so please can you dig up what the
> actual behaviour is rather than guess?

Yeah, I'm pretty sure that's not true as stated, especially with 
SCTLR.HUPCF set as qcom_adreno_smmu_write_sctlr() does. However it is 
plausible that at the system interconnect level, a sufficient number of 
stalled transactions might backpressure other transactions from entering 
the same TBU, even if they are destined for a different context. That's 
more about the configuration and integration of individual SoCs than the 
SMMU IP used.

Robin.

>> This fixes an issue with an older mesa bug taking outo the system
>> because of GMU going off into the year.
> 
> Sorry, but I don't understand this sentence.
> 
> Will
Re: [PATCH] iommu/arm-smmu-qcom: Only enable stall on smmu-v2
Posted by Rob Clark 1 year, 1 month ago
On Thu, Dec 19, 2024 at 4:08 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2024-12-19 11:30 am, Will Deacon wrote:
> > On Mon, Dec 16, 2024 at 09:10:17AM -0800, Rob Clark wrote:
> >> From: Rob Clark <robdclark@chromium.org>
> >>
> >> On mmu-500, stall-on-fault seems to stall all context banks, causing the
> >> GMU to misbehave.  So limit this feature to smmu-v2 for now.
> >
> > MMU-500 has public documentation so please can you dig up what the
> > actual behaviour is rather than guess?
>
> Yeah, I'm pretty sure that's not true as stated, especially with
> SCTLR.HUPCF set as qcom_adreno_smmu_write_sctlr() does. However it is
> plausible that at the system interconnect level, a sufficient number of
> stalled transactions might backpressure other transactions from entering
> the same TBU, even if they are destined for a different context. That's
> more about the configuration and integration of individual SoCs than the
> SMMU IP used.

I am aware of the docs and I've spent most of the last couple days
going thru them, as well as the errata, since it would be unfortunate
for debugging to disable this ;-)

The scenario where things lock up involves at least a few thousand
faults in rapid succession.  Disabling CFIE in the irq handler and
re-enabling when I resume translation does stop the flood of irq's but
not the lockup.  It might well be something about how the smmu is
integrated with the interconnect.

BR,
-R

> Robin.
>
> >> This fixes an issue with an older mesa bug taking outo the system
> >> because of GMU going off into the year.
> >
> > Sorry, but I don't understand this sentence.
> >
> > Will
>
Re: [PATCH] iommu/arm-smmu-qcom: Only enable stall on smmu-v2
Posted by Akhil P Oommen 1 year, 1 month ago
On 12/16/2024 10:40 PM, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> On mmu-500, stall-on-fault seems to stall all context banks, causing the
> GMU to misbehave.  So limit this feature to smmu-v2 for now.
> 
> This fixes an issue with an older mesa bug taking outo the system
> because of GMU going off into the year.
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index c4c52f7bd09a..1c881e88fc4d 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -331,8 +331,10 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
>  	priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
>  	priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
>  	priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
> -	priv->set_stall = qcom_adreno_smmu_set_stall;
> -	priv->resume_translation = qcom_adreno_smmu_resume_translation;
> +	if (of_device_is_compatible(np, "qcom,smmu-v2")) {
> +		priv->set_stall = qcom_adreno_smmu_set_stall;
> +		priv->resume_translation = qcom_adreno_smmu_resume_translation;
> +	}

Shall we disable this from the driver instead? A debugfs knob to trigger
coredump after a pagefault is very convenient.

-Akhil

>  	priv->set_prr_bit = NULL;
>  	priv->set_prr_addr = NULL;
>
Re: [PATCH] iommu/arm-smmu-qcom: Only enable stall on smmu-v2
Posted by Rob Clark 1 year, 1 month ago
On Mon, Dec 16, 2024 at 12:28 PM Akhil P Oommen
<quic_akhilpo@quicinc.com> wrote:
>
> On 12/16/2024 10:40 PM, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > On mmu-500, stall-on-fault seems to stall all context banks, causing the
> > GMU to misbehave.  So limit this feature to smmu-v2 for now.
> >
> > This fixes an issue with an older mesa bug taking outo the system
> > because of GMU going off into the year.
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > index c4c52f7bd09a..1c881e88fc4d 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > @@ -331,8 +331,10 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> >       priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
> >       priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
> >       priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
> > -     priv->set_stall = qcom_adreno_smmu_set_stall;
> > -     priv->resume_translation = qcom_adreno_smmu_resume_translation;
> > +     if (of_device_is_compatible(np, "qcom,smmu-v2")) {
> > +             priv->set_stall = qcom_adreno_smmu_set_stall;
> > +             priv->resume_translation = qcom_adreno_smmu_resume_translation;
> > +     }
>
> Shall we disable this from the driver instead? A debugfs knob to trigger
> coredump after a pagefault is very convenient.

It would require the driver to find the compatible for the smmu, so it
could differentiate btwn smmu-v2 and mmu-500, which seemed like it
might be a bit uglier.

Ideally/hopefully we could figure out how to make GMU a bit more
resilient in the face of stalled translations, because it is useful to
get accurate devcore dumps on smmu faults.  At that point we could
revert this change.

BR,
-R

>
> -Akhil
>
> >       priv->set_prr_bit = NULL;
> >       priv->set_prr_addr = NULL;
> >
>