[PATCH] xen/arm: smmuv1: remove iommu group when deassign a device

Rahul Singh posted 1 patch 2 years ago
Test gitlab-ci failed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/a19f7238f428deb610df643944f60e1e79e273cf.1651075797.git.rahul.singh@arm.com
xen/drivers/passthrough/arm/smmu.c | 2 ++
1 file changed, 2 insertions(+)
[PATCH] xen/arm: smmuv1: remove iommu group when deassign a device
Posted by Rahul Singh 2 years ago
When a device is deassigned from the domain it is required to remove the
iommu group.

If we don't remove the group, the next time when we assign
a device, SME and S2CR will not be setup correctly for the device
because of that SMMU fault will be observed.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
---
 xen/drivers/passthrough/arm/smmu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 5cacb2dd99..9a31c332d0 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -1690,6 +1690,8 @@ static void arm_smmu_detach_dev(struct iommu_domain *domain, struct device *dev)
 	if (cfg)
 		arm_smmu_master_free_smes(cfg);
 
+	iommu_group_put(dev_iommu_group(dev));
+	dev_iommu_group(dev) = NULL;
 }
 
 #if 0 /*
-- 
2.25.1


Re: [PATCH] xen/arm: smmuv1: remove iommu group when deassign a device
Posted by Julien Grall 2 years ago
Hi,

On 27/04/2022 17:15, Rahul Singh wrote:
> When a device is deassigned from the domain it is required to remove the
> iommu group.

This read wrong to me. We should not need to re-create the IOMMU group 
(and call arm_smmu_add_device()) every time a device is re-assigned.

> 
> If we don't remove the group, the next time when we assign
> a device, SME and S2CR will not be setup correctly for the device
> because of that SMMU fault will be observed.

I think this is a bug fix for 0435784cc75dcfef3b5f59c29deb1dbb84265ddb. 
If so, please add a Fixes tag.

> 
> Signed-off-by: Rahul Singh <rahul.singh@arm.com>
> ---
>   xen/drivers/passthrough/arm/smmu.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
> index 5cacb2dd99..9a31c332d0 100644
> --- a/xen/drivers/passthrough/arm/smmu.c
> +++ b/xen/drivers/passthrough/arm/smmu.c
> @@ -1690,6 +1690,8 @@ static void arm_smmu_detach_dev(struct iommu_domain *domain, struct device *dev)
>   	if (cfg)
>   		arm_smmu_master_free_smes(cfg);
>   
> +	iommu_group_put(dev_iommu_group(dev));
> +	dev_iommu_group(dev) = NULL;
>   }

The goal of arm_smmu_detach_dev() is to revert the change made in 
arm_smmu_attach_dev(). But looking at the code, neither the IOMMU group 
nor the smes are allocated in arm_smmu_attach_dev().

Are the SMES meant to be re-allocated everytime we assign to a different 
domain? If yes, the allocation should be done in arm_smmu_attach_dev().

If not, then we should not free the SMES here

IIUC, the SMES have to be re-allocated every time a device is assigned. 
Therefore, I think we should move the call to 
arm_smmu_master_alloc_smes() out of the detach callback and in a helper 
that would be used when removing a device (not yet supported by Xen).

Cheers,

-- 
Julien Grall

Re: [PATCH] xen/arm: smmuv1: remove iommu group when deassign a device
Posted by Rahul Singh 2 years ago
Hi Julien,

> On 27 Apr 2022, at 6:42 pm, Julien Grall <julien@xen.org> wrote:
> 
> Hi,
> 
> On 27/04/2022 17:15, Rahul Singh wrote:
>> When a device is deassigned from the domain it is required to remove the
>> iommu group.
> 
> This read wrong to me. We should not need to re-create the IOMMU group (and call arm_smmu_add_device()) every time a device is re-assigned.
Ok.
> 
>> If we don't remove the group, the next time when we assign
>> a device, SME and S2CR will not be setup correctly for the device
>> because of that SMMU fault will be observed.
> 
> I think this is a bug fix for 0435784cc75dcfef3b5f59c29deb1dbb84265ddb. If so, please add a Fixes tag.

Ok Let me add the Fixes tag in next version.
> 
>> Signed-off-by: Rahul Singh <rahul.singh@arm.com>
>> ---
>> xen/drivers/passthrough/arm/smmu.c | 2 ++
>> 1 file changed, 2 insertions(+)
>> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
>> index 5cacb2dd99..9a31c332d0 100644
>> --- a/xen/drivers/passthrough/arm/smmu.c
>> +++ b/xen/drivers/passthrough/arm/smmu.c
>> @@ -1690,6 +1690,8 @@ static void arm_smmu_detach_dev(struct iommu_domain *domain, struct device *dev)
>> 	if (cfg)
>> 		arm_smmu_master_free_smes(cfg);
>> +	iommu_group_put(dev_iommu_group(dev));
>> +	dev_iommu_group(dev) = NULL;
>> }
> 
> The goal of arm_smmu_detach_dev() is to revert the change made in arm_smmu_attach_dev(). But looking at the code, neither the IOMMU group nor the smes are allocated in arm_smmu_attach_dev().
> 
> Are the SMES meant to be re-allocated everytime we assign to a different domain? If yes, the allocation should be done in arm_smmu_attach_dev().

Yes SMES have to be re-allocated every time a device is assigned.

Is that okay if I will move the function arm_smmu_master_alloc_smes() from arm_smmu_add_device() to arm_smmu_attach_dev().
In this case we don’t need to remove the IOMMU group and also arm_smmu_detach_dev() will also revert the  change made in arm_smmu_attach_dev().

diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 5cacb2dd99..ff1b73d3d8 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -1680,6 +1680,10 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
        if (!cfg)
                return -ENODEV;
 
+       ret = arm_smmu_master_alloc_smes(dev);
+       if (ret)
+               return ret;
+
        return arm_smmu_domain_add_master(smmu_domain, cfg);
 }
 
@@ -2075,7 +2079,7 @@ static int arm_smmu_add_device(struct device *dev)
        iommu_group_add_device(group, dev);
        iommu_group_put(group);
 
-       return arm_smmu_master_alloc_smes(dev);
+       return 0;
 }

Regards,
Rahul
> 
> If not, then we should not free the SMES here
> 
> IIUC, the SMES have to be re-allocated every time a device is assigned. Therefore, I think we should move the call to arm_smmu_master_alloc_smes() out of the detach callback and in a helper that would be used when removing a device (not yet supported by Xen).
> 
> Cheers,
> 
> -- 
> Julien Grall

Re: [PATCH] xen/arm: smmuv1: remove iommu group when deassign a device
Posted by Julien Grall 2 years ago

On 29/04/2022 15:33, Rahul Singh wrote:
> Hi Julien,

Hi Rahul,

>> On 27 Apr 2022, at 6:42 pm, Julien Grall <julien@xen.org> wrote:
>>
>> Hi,
>>
>> On 27/04/2022 17:15, Rahul Singh wrote:
>>> When a device is deassigned from the domain it is required to remove the
>>> iommu group.
>>
>> This read wrong to me. We should not need to re-create the IOMMU group (and call arm_smmu_add_device()) every time a device is re-assigned.
> Ok.
>>
>>> If we don't remove the group, the next time when we assign
>>> a device, SME and S2CR will not be setup correctly for the device
>>> because of that SMMU fault will be observed.
>>
>> I think this is a bug fix for 0435784cc75dcfef3b5f59c29deb1dbb84265ddb. If so, please add a Fixes tag.
> 
> Ok Let me add the Fixes tag in next version.
>>
>>> Signed-off-by: Rahul Singh <rahul.singh@arm.com>
>>> ---
>>> xen/drivers/passthrough/arm/smmu.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
>>> index 5cacb2dd99..9a31c332d0 100644
>>> --- a/xen/drivers/passthrough/arm/smmu.c
>>> +++ b/xen/drivers/passthrough/arm/smmu.c
>>> @@ -1690,6 +1690,8 @@ static void arm_smmu_detach_dev(struct iommu_domain *domain, struct device *dev)
>>> 	if (cfg)
>>> 		arm_smmu_master_free_smes(cfg);
>>> +	iommu_group_put(dev_iommu_group(dev));
>>> +	dev_iommu_group(dev) = NULL;
>>> }
>>
>> The goal of arm_smmu_detach_dev() is to revert the change made in arm_smmu_attach_dev(). But looking at the code, neither the IOMMU group nor the smes are allocated in arm_smmu_attach_dev().
>>
>> Are the SMES meant to be re-allocated everytime we assign to a different domain? If yes, the allocation should be done in arm_smmu_attach_dev().
> 
> Yes SMES have to be re-allocated every time a device is assigned.

Hmmmm.... Looking at the code, arm_smmu_alloc_smes() doesn't seem to use 
the domain information. So why would it need to be done every time it is 
assigned?

> 
> Is that okay if I will move the function arm_smmu_master_alloc_smes() from arm_smmu_add_device() to arm_smmu_attach_dev().
> In this case we don’t need to remove the IOMMU group and also arm_smmu_detach_dev() will also revert the  change made in arm_smmu_attach_dev().
> 
> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
> index 5cacb2dd99..ff1b73d3d8 100644
> --- a/xen/drivers/passthrough/arm/smmu.c
> +++ b/xen/drivers/passthrough/arm/smmu.c
> @@ -1680,6 +1680,10 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>          if (!cfg)
>                  return -ENODEV;
>   
> +       ret = arm_smmu_master_alloc_smes(dev);
> +       if (ret)
> +               return ret;
> +
>          return arm_smmu_domain_add_master(smmu_domain, cfg);

If we go down this route, then you will likely need to revert the change 
made by arm_smmu_master_alloc_smes().

>   }
>   
> @@ -2075,7 +2079,7 @@ static int arm_smmu_add_device(struct device *dev)
>          iommu_group_add_device(group, dev);
>          iommu_group_put(group);
>   
> -       return arm_smmu_master_alloc_smes(dev);
> +       return 0;
>   }
> 
> Regards,
> Rahul
>>
>> If not, then we should not free the SMES here
>>
>> IIUC, the SMES have to be re-allocated every time a device is assigned. Therefore, I think we should move the call to arm_smmu_master_alloc_smes() out of the detach callback and in a helper that would be used when removing a device (not yet supported by Xen).
>>
>> Cheers,
>>
>> -- 
>> Julien Grall
> 

-- 
Julien Grall

Re: [PATCH] xen/arm: smmuv1: remove iommu group when deassign a device
Posted by Mykyta Poturai 1 year, 10 months ago
Hi Julien, Rahul
I've encountered a similar problem with IMX8 GPU recently. It wasn't probing
properly after the domain reboot.  After some digging, I came to the same
solution as Rahul and found this thread. I also encountered the occasional
"Unexpected global fault, this could be serious" error message when destroying
a domain with an actively-working GPU.

>Hmmmm.... Looking at the code, arm_smmu_alloc_smes() doesn't seem to use
>the domain information. So why would it need to be done every time it is assigned?
Indeed after removing the arm_smmu_master_free_smes() call, both reboot and global
fault issues are gone. If I understand correctly, device removing is not yet
supported, so I can't find a proper place for the arm_smmu_master_free_smes() call.
Should we remove the function completely or just left it commented for later or
something else?

Rahul, are you still working on this or could I send my patch?

Regards,
Mykyta
Re: [PATCH] xen/arm: smmuv1: remove iommu group when deassign a device
Posted by Rahul Singh 1 year, 10 months ago
Hi Mykyta,

> On 16 Jun 2022, at 8:48 am, Mykyta Poturai <mykyta.poturai@gmail.com> wrote:
> 
> Hi Julien, Rahul
> I've encountered a similar problem with IMX8 GPU recently. It wasn't probing
> properly after the domain reboot.  After some digging, I came to the same
> solution as Rahul and found this thread. I also encountered the occasional
> "Unexpected global fault, this could be serious" error message when destroying
> a domain with an actively-working GPU.
> 
>> Hmmmm.... Looking at the code, arm_smmu_alloc_smes() doesn't seem to use
>> the domain information. So why would it need to be done every time it is assigned?
> Indeed after removing the arm_smmu_master_free_smes() call, both reboot and global
> fault issues are gone. If I understand correctly, device removing is not yet
> supported, so I can't find a proper place for the arm_smmu_master_free_smes() call.
> Should we remove the function completely or just left it commented for later or
> something else?
> 
> Rahul, are you still working on this or could I send my patch?

Yes, I have this on my to-do list but I was busy with other work and it got delayed. 

I created another solution for this issue, in which we don’t need to call arm_smmu_master_free_smes() 
in arm_smmu_detach_dev() but we can configure the s2cr value to type fault in detach function.

Will call new function arm_smmu_domain_remove_master() in detach function that will revert the changes done 
by arm_smmu_domain_add_master()  in attach function.

I don’t have any board to test the patch. If it is okay, Could you please test the patch and let me know the result.

diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 69511683b4..da3adf8e7f 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -1598,21 +1598,6 @@ out_err:
        return ret;
 }
 
-static void arm_smmu_master_free_smes(struct arm_smmu_master_cfg *cfg)
-{
-    struct arm_smmu_device *smmu = cfg->smmu;
-       int i, idx;
-       struct iommu_fwspec *fwspec = arm_smmu_get_fwspec(cfg);
-
-       spin_lock(&smmu->stream_map_lock);
-       for_each_cfg_sme(cfg, i, idx, fwspec->num_ids) {
-               if (arm_smmu_free_sme(smmu, idx))
-                       arm_smmu_write_sme(smmu, idx);
-               cfg->smendx[i] = INVALID_SMENDX;
-       }
-       spin_unlock(&smmu->stream_map_lock);
-}
-
 static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
                                      struct arm_smmu_master_cfg *cfg)
 {
@@ -1635,6 +1620,20 @@ static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
        return 0;
 }
 
+static void arm_smmu_domain_remove_master(struct arm_smmu_domain *smmu_domain,
+                                     struct arm_smmu_master_cfg *cfg)
+{
+       struct arm_smmu_device *smmu = smmu_domain->smmu;
+       struct arm_smmu_s2cr *s2cr = smmu->s2crs;
+       struct iommu_fwspec *fwspec = arm_smmu_get_fwspec(cfg);
+       int i, idx;
+
+       for_each_cfg_sme(cfg, i, idx, fwspec->num_ids) {
+               s2cr[idx] = s2cr_init_val;
+               arm_smmu_write_s2cr(smmu, idx);
+       }
+}
+
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
        int ret;
@@ -1684,10 +1683,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 
 static void arm_smmu_detach_dev(struct iommu_domain *domain, struct device *dev)
 {
+       struct arm_smmu_domain *smmu_domain = domain->priv;
        struct arm_smmu_master_cfg *cfg = find_smmu_master_cfg(dev);
 
        if (cfg)
-               arm_smmu_master_free_smes(cfg);
+               return arm_smmu_domain_remove_master(smmu_domain, cfg);
 
 }

Regards,
Rahul
Re: [PATCH] xen/arm: smmuv1: remove iommu group when deassign a device
Posted by Mykyta Poturai 1 year, 10 months ago
> Hi Mykyta,
> 
>> On 16 Jun 2022, at 8:48 am, Mykyta Poturai <mykyta.poturai@gmail.com> wrote:
>> 
>> Hi Julien, Rahul
>> I've encountered a similar problem with IMX8 GPU recently. It wasn't probing
>> properly after the domain reboot.  After some digging, I came to the same
>> solution as Rahul and found this thread. I also encountered the occasional
>> "Unexpected global fault, this could be serious" error message when destroying
>> a domain with an actively-working GPU.
>> 
>>> Hmmmm.... Looking at the code, arm_smmu_alloc_smes() doesn't seem to use
>>> the domain information. So why would it need to be done every time it is assigned?
>> Indeed after removing the arm_smmu_master_free_smes() call, both reboot and global
>> fault issues are gone. If I understand correctly, device removing is not yet
>> supported, so I can't find a proper place for the arm_smmu_master_free_smes() call.
>> Should we remove the function completely or just left it commented for later or
>> something else?
>> 
>> Rahul, are you still working on this or could I send my patch?
> 
> Yes, I have this on my to-do list but I was busy with other work and it got delayed. 
> 
> I created another solution for this issue, in which we don’t need to call arm_smmu_master_free_smes() 
> in arm_smmu_detach_dev() but we can configure the s2cr value to type fault in detach function.
> 
> Will call new function arm_smmu_domain_remove_master() in detach function that will revert the changes done 
> by arm_smmu_domain_add_master()  in attach function.
> 
> I don’t have any board to test the patch. If it is okay, Could you please test the patch and let me know the result.
> 
> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
> index 69511683b4..da3adf8e7f 100644
> --- a/xen/drivers/passthrough/arm/smmu.c
> +++ b/xen/drivers/passthrough/arm/smmu.c
> @@ -1598,21 +1598,6 @@ out_err:
>         return ret;
>  }
>  
> -static void arm_smmu_master_free_smes(struct arm_smmu_master_cfg *cfg)
> -{
> -    struct arm_smmu_device *smmu = cfg->smmu;
> -       int i, idx;
> -       struct iommu_fwspec *fwspec = arm_smmu_get_fwspec(cfg);
> -
> -       spin_lock(&smmu->stream_map_lock);
> -       for_each_cfg_sme(cfg, i, idx, fwspec->num_ids) {
> -               if (arm_smmu_free_sme(smmu, idx))
> -                       arm_smmu_write_sme(smmu, idx);
> -               cfg->smendx[i] = INVALID_SMENDX;
> -       }
> -       spin_unlock(&smmu->stream_map_lock);
> -}
> -
>  static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
>                                       struct arm_smmu_master_cfg *cfg)
>  {
> @@ -1635,6 +1620,20 @@ static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
>         return 0;
>  }
>  
> +static void arm_smmu_domain_remove_master(struct arm_smmu_domain *smmu_domain,
> +                                     struct arm_smmu_master_cfg *cfg)
> +{
> +       struct arm_smmu_device *smmu = smmu_domain->smmu;
> +       struct arm_smmu_s2cr *s2cr = smmu->s2crs;
> +       struct iommu_fwspec *fwspec = arm_smmu_get_fwspec(cfg);
> +       int i, idx;
> +
> +       for_each_cfg_sme(cfg, i, idx, fwspec->num_ids) {
> +               s2cr[idx] = s2cr_init_val;
> +               arm_smmu_write_s2cr(smmu, idx);
> +       }
> +}
> +
>  static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  {
>         int ret;
> @@ -1684,10 +1683,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  
>  static void arm_smmu_detach_dev(struct iommu_domain *domain, struct device *dev)
>  {
> +       struct arm_smmu_domain *smmu_domain = domain->priv;
>         struct arm_smmu_master_cfg *cfg = find_smmu_master_cfg(dev);
>  
>         if (cfg)
> -               arm_smmu_master_free_smes(cfg);
> +               return arm_smmu_domain_remove_master(smmu_domain, cfg);
>  
>  }
> 
> Regards,
> Rahul

Hello Rahul,

For me, this patch fixed the issue with the GPU not probing after domain reboot.
But not fixed the "Unexpected Global fault" that occasionally happens when destroying
the domain with an actively working GPU. Although, I am not sure if this issue
is relevant here.

Regards,
Mykyta

Re: [PATCH] xen/arm: smmuv1: remove iommu group when deassign a device
Posted by Rahul Singh 1 year, 10 months ago
Hi Mykyta,

> On 17 Jun 2022, at 12:15 pm, Mykyta Poturai <mykyta.poturai@gmail.com> wrote:
> 
>> Hi Mykyta,
>> 
>>> On 16 Jun 2022, at 8:48 am, Mykyta Poturai <mykyta.poturai@gmail.com> wrote:
>>> 
>>> Hi Julien, Rahul
>>> I've encountered a similar problem with IMX8 GPU recently. It wasn't probing
>>> properly after the domain reboot.  After some digging, I came to the same
>>> solution as Rahul and found this thread. I also encountered the occasional
>>> "Unexpected global fault, this could be serious" error message when destroying
>>> a domain with an actively-working GPU.
>>> 
>>>> Hmmmm.... Looking at the code, arm_smmu_alloc_smes() doesn't seem to use
>>>> the domain information. So why would it need to be done every time it is assigned?
>>> Indeed after removing the arm_smmu_master_free_smes() call, both reboot and global
>>> fault issues are gone. If I understand correctly, device removing is not yet
>>> supported, so I can't find a proper place for the arm_smmu_master_free_smes() call.
>>> Should we remove the function completely or just left it commented for later or
>>> something else?
>>> 
>>> Rahul, are you still working on this or could I send my patch?
>> 
>> Yes, I have this on my to-do list but I was busy with other work and it got delayed. 
>> 
>> I created another solution for this issue, in which we don’t need to call arm_smmu_master_free_smes() 
>> in arm_smmu_detach_dev() but we can configure the s2cr value to type fault in detach function.
>> 
>> Will call new function arm_smmu_domain_remove_master() in detach function that will revert the changes done 
>> by arm_smmu_domain_add_master()  in attach function.
>> 
>> I don’t have any board to test the patch. If it is okay, Could you please test the patch and let me know the result.
>> 
>> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
>> index 69511683b4..da3adf8e7f 100644
>> --- a/xen/drivers/passthrough/arm/smmu.c
>> +++ b/xen/drivers/passthrough/arm/smmu.c
>> @@ -1598,21 +1598,6 @@ out_err:
>>        return ret;
>> }
>> 
>> -static void arm_smmu_master_free_smes(struct arm_smmu_master_cfg *cfg)
>> -{
>> -    struct arm_smmu_device *smmu = cfg->smmu;
>> -       int i, idx;
>> -       struct iommu_fwspec *fwspec = arm_smmu_get_fwspec(cfg);
>> -
>> -       spin_lock(&smmu->stream_map_lock);
>> -       for_each_cfg_sme(cfg, i, idx, fwspec->num_ids) {
>> -               if (arm_smmu_free_sme(smmu, idx))
>> -                       arm_smmu_write_sme(smmu, idx);
>> -               cfg->smendx[i] = INVALID_SMENDX;
>> -       }
>> -       spin_unlock(&smmu->stream_map_lock);
>> -}
>> -
>> static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
>>                                      struct arm_smmu_master_cfg *cfg)
>> {
>> @@ -1635,6 +1620,20 @@ static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
>>        return 0;
>> }
>> 
>> +static void arm_smmu_domain_remove_master(struct arm_smmu_domain *smmu_domain,
>> +                                     struct arm_smmu_master_cfg *cfg)
>> +{
>> +       struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +       struct arm_smmu_s2cr *s2cr = smmu->s2crs;
>> +       struct iommu_fwspec *fwspec = arm_smmu_get_fwspec(cfg);
>> +       int i, idx;
>> +
>> +       for_each_cfg_sme(cfg, i, idx, fwspec->num_ids) {
>> +               s2cr[idx] = s2cr_init_val;
>> +               arm_smmu_write_s2cr(smmu, idx);
>> +       }
>> +}
>> +
>> static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>> {
>>        int ret;
>> @@ -1684,10 +1683,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>> 
>> static void arm_smmu_detach_dev(struct iommu_domain *domain, struct device *dev)
>> {
>> +       struct arm_smmu_domain *smmu_domain = domain->priv;
>>        struct arm_smmu_master_cfg *cfg = find_smmu_master_cfg(dev);
>> 
>>        if (cfg)
>> -               arm_smmu_master_free_smes(cfg);
>> +               return arm_smmu_domain_remove_master(smmu_domain, cfg);
>> 
>> }
>> 
>> Regards,
>> Rahul
> 
> Hello Rahul,
> 
> For me, this patch fixed the issue with the GPU not probing after domain reboot.

Thanks for testing the patch.
> But not fixed the "Unexpected Global fault" that occasionally happens when destroying
> the domain with an actively working GPU. Although, I am not sure if this issue
> is relevant here.

Can you please if possible share the more details and logs so that I can look if this issue is relevant here ?

Thanks in advance.

Regards,
Rahul
Re: [PATCH] xen/arm: smmuv1: remove iommu group when deassign a device
Posted by Mykyta Poturai 1 year, 10 months ago
> Thanks for testing the patch.
>> But not fixed the "Unexpected Global fault" that occasionally happens when destroying
>> the domain with an actively working GPU. Although, I am not sure if this issue
>> is relevant here.
>
> Can you please if possible share the more details and logs so that I can look if this issue is relevant here ?

So in my setup I have a board with IMX8 chip and 2 core Vivante GPU. GPU is split between domains.
One core goes to Dom0 and one to DomU.

Steps to trigger this issue:
1. Start DomU
2. Start wayland and glmark2-es2-wayland inside DomU
3. Destroy DomU

Sometimes it destroys fine but roughly 1 of 8 times I get logs like this:

root@dom0:~# xl dest DomU
[12725.412940] xenbr0: port 1(vif8.0) entered disabled state
[12725.671033] xenbr0: port 1(vif8.0) entered disabled state
[12725.689923] device vif8.0 left promiscuous mode
[12725.696736] xenbr0: port 1(vif8.0) entered disabled state
[12725.696989] audit: type=1700 audit(1616594240.068:39): dev=vif8.0 prom=0 old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967295
(XEN) smmu: /iommu@51400000: Unexpected global fault, this could be serious
(XEN) smmu: /iommu@51400000:    GFSR 0x00000001, GFSYNR0 0x00000004, GFSYNR1 0x00001055, GFSYNR2 0x00000000

My guess is that this happens because GPU continues to access memory when the context is already invalidated,
and therefore triggers the "Invalid context fault".

Regards,
Mykyta
Re: [PATCH] xen/arm: smmuv1: remove iommu group when deassign a device
Posted by Rahul Singh 1 year, 10 months ago
Hi Mykyta,

> On 21 Jun 2022, at 10:38 am, Mykyta Poturai <mykyta.poturai@gmail.com> wrote:
> 
>> Thanks for testing the patch.
>>> But not fixed the "Unexpected Global fault" that occasionally happens when destroying
>>> the domain with an actively working GPU. Although, I am not sure if this issue
>>> is relevant here.
>> 
>> Can you please if possible share the more details and logs so that I can look if this issue is relevant here ?
> 
> So in my setup I have a board with IMX8 chip and 2 core Vivante GPU. GPU is split between domains.
> One core goes to Dom0 and one to DomU.
> 
> Steps to trigger this issue:
> 1. Start DomU
> 2. Start wayland and glmark2-es2-wayland inside DomU
> 3. Destroy DomU
> 
> Sometimes it destroys fine but roughly 1 of 8 times I get logs like this:
> 
> root@dom0:~# xl dest DomU
> [12725.412940] xenbr0: port 1(vif8.0) entered disabled state
> [12725.671033] xenbr0: port 1(vif8.0) entered disabled state
> [12725.689923] device vif8.0 left promiscuous mode
> [12725.696736] xenbr0: port 1(vif8.0) entered disabled state
> [12725.696989] audit: type=1700 audit(1616594240.068:39): dev=vif8.0 prom=0 old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967295
> (XEN) smmu: /iommu@51400000: Unexpected global fault, this could be serious
> (XEN) smmu: /iommu@51400000:    GFSR 0x00000001, GFSYNR0 0x00000004, GFSYNR1 0x00001055, GFSYNR2 0x00000000
> 
> My guess is that this happens because GPU continues to access memory when the context is already invalidated,
> and therefore triggers the "Invalid context fault".

Yes you are right in this case GPU trying to do DMA operation after Xen destroyed the guest and configures
the S2CR type value to fault. Solution to this issue is the patch that I shared earlier.

You can try this patch and confirm.This patch will solve both the issues.

diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 5cacb2dd99..ff1b73d3d8 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -1680,6 +1680,10 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
if (!cfg)
return -ENODEV;

+ ret = arm_smmu_master_alloc_smes(dev);
+ if (ret)
+ return ret;
+
return arm_smmu_domain_add_master(smmu_domain, cfg);
}

@@ -2075,7 +2079,7 @@ static int arm_smmu_add_device(struct device *dev)
iommu_group_add_device(group, dev);
iommu_group_put(group);

- return arm_smmu_master_alloc_smes(dev);
+ return 0;
}


Regards,
Rahul
Re: [PATCH] xen/arm: smmuv1: remove iommu group when deassign a device
Posted by Mykyta Poturai 1 year, 10 months ago
> Hi Mykyta,
> 
>> On 21 Jun 2022, at 10:38 am, Mykyta Poturai <mykyta.poturai@gmail.com> wrote:
>> 
>>> Thanks for testing the patch.
>>>> But not fixed the "Unexpected Global fault" that occasionally happens when destroying
>>>> the domain with an actively working GPU. Although, I am not sure if this issue
>>>> is relevant here.
>>> 
>>> Can you please if possible share the more details and logs so that I can look if this issue is relevant here ?
>> 
>> So in my setup I have a board with IMX8 chip and 2 core Vivante GPU. GPU is split between domains.
>> One core goes to Dom0 and one to DomU.
>> 
>> Steps to trigger this issue:
>> 1. Start DomU
>> 2. Start wayland and glmark2-es2-wayland inside DomU
>> 3. Destroy DomU
>> 
>> Sometimes it destroys fine but roughly 1 of 8 times I get logs like this:
>> 
>> root@dom0:~# xl dest DomU
>> [12725.412940] xenbr0: port 1(vif8.0) entered disabled state
>> [12725.671033] xenbr0: port 1(vif8.0) entered disabled state
>> [12725.689923] device vif8.0 left promiscuous mode
>> [12725.696736] xenbr0: port 1(vif8.0) entered disabled state
>> [12725.696989] audit: type=1700 audit(1616594240.068:39): dev=vif8.0 prom=0 old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967295
>> (XEN) smmu: /iommu@51400000: Unexpected global fault, this could be serious
>> (XEN) smmu: /iommu@51400000:    GFSR 0x00000001, GFSYNR0 0x00000004, GFSYNR1 0x00001055, GFSYNR2 0x00000000
>> 
>> My guess is that this happens because GPU continues to access memory when the context is already invalidated,
>> and therefore triggers the "Invalid context fault".
> 
> Yes you are right in this case GPU trying to do DMA operation after Xen destroyed the guest and configures
> the S2CR type value to fault. Solution to this issue is the patch that I shared earlier.
> 
> You can try this patch and confirm.This patch will solve both the issues.
> 
> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
> index 5cacb2dd99..ff1b73d3d8 100644
> --- a/xen/drivers/passthrough/arm/smmu.c
> +++ b/xen/drivers/passthrough/arm/smmu.c
> @@ -1680,6 +1680,10 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
> if (!cfg)
> return -ENODEV;
> 
> + ret = arm_smmu_master_alloc_smes(dev);
> + if (ret)
> + return ret;
> +
> return arm_smmu_domain_add_master(smmu_domain, cfg);
> }
> 
> @@ -2075,7 +2079,7 @@ static int arm_smmu_add_device(struct device *dev)
> iommu_group_add_device(group, dev);
> iommu_group_put(group);
> 
> - return arm_smmu_master_alloc_smes(dev);
> + return 0;
> }
> 
> 
> Regards,
> Rahul

Hi Rahul,

With this patch I get the same results, here is the error message:

(XEN) smmu: /iommu@51400000: Unexpected global fault, this could be serious
(XEN) smmu: /iommu@51400000:    GFSR 0x00000001, GFSYNR0 0x00000004, GFSYNR1 0x00001055, GFSYNR2 0x00000000

Regards,
Mykyta
Re: [PATCH] xen/arm: smmuv1: remove iommu group when deassign a device
Posted by Rahul Singh 1 year, 9 months ago
Hi ,

> On 28 Jun 2022, at 6:23 pm, Mykyta Poturai <mykyta.poturai@gmail.com> wrote:
> 
>> Hi Mykyta,
>> 
>>> On 21 Jun 2022, at 10:38 am, Mykyta Poturai <mykyta.poturai@gmail.com> wrote:
>>> 
>>>> Thanks for testing the patch.
>>>>> But not fixed the "Unexpected Global fault" that occasionally happens when destroying
>>>>> the domain with an actively working GPU. Although, I am not sure if this issue
>>>>> is relevant here.
>>>> 
>>>> Can you please if possible share the more details and logs so that I can look if this issue is relevant here ?
>>> 
>>> So in my setup I have a board with IMX8 chip and 2 core Vivante GPU. GPU is split between domains.
>>> One core goes to Dom0 and one to DomU.
>>> 
>>> Steps to trigger this issue:
>>> 1. Start DomU
>>> 2. Start wayland and glmark2-es2-wayland inside DomU
>>> 3. Destroy DomU
>>> 
>>> Sometimes it destroys fine but roughly 1 of 8 times I get logs like this:
>>> 
>>> root@dom0:~# xl dest DomU
>>> [12725.412940] xenbr0: port 1(vif8.0) entered disabled state
>>> [12725.671033] xenbr0: port 1(vif8.0) entered disabled state
>>> [12725.689923] device vif8.0 left promiscuous mode
>>> [12725.696736] xenbr0: port 1(vif8.0) entered disabled state
>>> [12725.696989] audit: type=1700 audit(1616594240.068:39): dev=vif8.0 prom=0 old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967295
>>> (XEN) smmu: /iommu@51400000: Unexpected global fault, this could be serious
>>> (XEN) smmu: /iommu@51400000:    GFSR 0x00000001, GFSYNR0 0x00000004, GFSYNR1 0x00001055, GFSYNR2 0x00000000
>>> 
>>> My guess is that this happens because GPU continues to access memory when the context is already invalidated,
>>> and therefore triggers the "Invalid context fault".
>> 
>> Yes you are right in this case GPU trying to do DMA operation after Xen destroyed the guest and configures
>> the S2CR type value to fault. Solution to this issue is the patch that I shared earlier.
>> 
>> You can try this patch and confirm.This patch will solve both the issues.
>> 
>> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
>> index 5cacb2dd99..ff1b73d3d8 100644
>> --- a/xen/drivers/passthrough/arm/smmu.c
>> +++ b/xen/drivers/passthrough/arm/smmu.c
>> @@ -1680,6 +1680,10 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>> if (!cfg)
>> return -ENODEV;
>> 
>> + ret = arm_smmu_master_alloc_smes(dev);
>> + if (ret)
>> + return ret;
>> +
>> return arm_smmu_domain_add_master(smmu_domain, cfg);
>> }
>> 
>> @@ -2075,7 +2079,7 @@ static int arm_smmu_add_device(struct device *dev)
>> iommu_group_add_device(group, dev);
>> iommu_group_put(group);
>> 
>> - return arm_smmu_master_alloc_smes(dev);
>> + return 0;
>> }
>> 
>> 
>> Regards,
>> Rahul
> 
> Hi Rahul,
> 
> With this patch I get the same results, here is the error message:
> 
> (XEN) smmu: /iommu@51400000: Unexpected global fault, this could be serious
> (XEN) smmu: /iommu@51400000:    GFSR 0x00000001, GFSYNR0 0x00000004, GFSYNR1 0x00001055, GFSYNR2 0x00000000
> 

As you mentioned earlier, this fault is observed because GPU continues to access memory when the context is
already invalidated, and therefore triggers the "Invalid context fault".  This is a different issue and is not related to this patch.

@Julien are you okay if I will send the below patch for official review as this issue pending for a long time?

In this patch we don’t need to call arm_smmu_master_free_smes() in arm_smmu_detach_dev() but we will implement new
function arm_smmu_domain_remove_master() that will configure the s2cr value to type fault in detach function.
arm_smmu_domain_remove_master() will revert the changes done by arm_smmu_domain_add_master() in attach function.


diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 69511683b4..da3adf8e7f 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -1598,21 +1598,6 @@ out_err:
return ret;
}

-static void arm_smmu_master_free_smes(struct arm_smmu_master_cfg *cfg)
-{
- struct arm_smmu_device *smmu = cfg->smmu;
- int i, idx;
- struct iommu_fwspec *fwspec = arm_smmu_get_fwspec(cfg);
-
- spin_lock(&smmu->stream_map_lock);
- for_each_cfg_sme(cfg, i, idx, fwspec->num_ids) {
- if (arm_smmu_free_sme(smmu, idx))
- arm_smmu_write_sme(smmu, idx);
- cfg->smendx[i] = INVALID_SMENDX;
- }
- spin_unlock(&smmu->stream_map_lock);
-}
-
static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
struct arm_smmu_master_cfg *cfg)
{
@@ -1635,6 +1620,20 @@ static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
return 0;
}

+static void arm_smmu_domain_remove_master(struct arm_smmu_domain *smmu_domain,
+ struct arm_smmu_master_cfg *cfg)
+{
+ struct arm_smmu_device *smmu = smmu_domain->smmu;
+ struct arm_smmu_s2cr *s2cr = smmu->s2crs;
+ struct iommu_fwspec *fwspec = arm_smmu_get_fwspec(cfg);
+ int i, idx;
+
+ for_each_cfg_sme(cfg, i, idx, fwspec->num_ids) {
+ s2cr[idx] = s2cr_init_val;
+ arm_smmu_write_s2cr(smmu, idx);
+ }
+}
+
static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
{
int ret;
@@ -1684,10 +1683,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)

static void arm_smmu_detach_dev(struct iommu_domain *domain, struct device *dev)
{
+ struct arm_smmu_domain *smmu_domain = domain->priv;
struct arm_smmu_master_cfg *cfg = find_smmu_master_cfg(dev);

if (cfg)
- arm_smmu_master_free_smes(cfg);
+ return arm_smmu_domain_remove_master(smmu_domain, cfg);

}


Regards,
Rahul
Re: [PATCH] xen/arm: smmuv1: remove iommu group when deassign a device
Posted by Julien Grall 1 year, 9 months ago
Hi,

On 21/07/2022 16:53, Rahul Singh wrote:
>> On 28 Jun 2022, at 6:23 pm, Mykyta Poturai <mykyta.poturai@gmail.com> wrote:
>> With this patch I get the same results, here is the error message:
>>
>> (XEN) smmu: /iommu@51400000: Unexpected global fault, this could be serious
>> (XEN) smmu: /iommu@51400000:    GFSR 0x00000001, GFSYNR0 0x00000004, GFSYNR1 0x00001055, GFSYNR2 0x00000000
>>
> 
> As you mentioned earlier, this fault is observed because GPU continues to access memory when the context is
> already invalidated, and therefore triggers the "Invalid context fault".  This is a different issue and is not related to this patch.
> 
> @Julien are you okay if I will send the below patch for official review as this issue pending for a long time?

I am OK in principle. I will do a proper review on you send a formal patch.

Cheers,

-- 
Julien Grall
Re: [PATCH] xen/arm: smmuv1: remove iommu group when deassign a device
Posted by Rahul Singh 1 year, 9 months ago
Hi Julien,

> On 26 Jul 2022, at 6:35 pm, Julien Grall <julien@xen.org> wrote:
> 
> Hi,
> 
> On 21/07/2022 16:53, Rahul Singh wrote:
>>> On 28 Jun 2022, at 6:23 pm, Mykyta Poturai <mykyta.poturai@gmail.com> wrote:
>>> With this patch I get the same results, here is the error message:
>>> 
>>> (XEN) smmu: /iommu@51400000: Unexpected global fault, this could be serious
>>> (XEN) smmu: /iommu@51400000:    GFSR 0x00000001, GFSYNR0 0x00000004, GFSYNR1 0x00001055, GFSYNR2 0x00000000
>>> 
>> As you mentioned earlier, this fault is observed because GPU continues to access memory when the context is
>> already invalidated, and therefore triggers the "Invalid context fault".  This is a different issue and is not related to this patch.
>> @Julien are you okay if I will send the below patch for official review as this issue pending for a long time?
> 
> I am OK in principle. I will do a proper review on you send a formal patch.

Thanks for the confirmation. I will send the formal patch for review.

Regards.
Rahul