[PATCH 1/3] drm/panthor: Update memattr programing to align with GPU spec

Akash Goel posted 3 patches 1 month ago
There is a newer version of this series
[PATCH 1/3] drm/panthor: Update memattr programing to align with GPU spec
Posted by Akash Goel 1 month ago
Mali GPU Arch spec forbids the GPU PTEs to indicate Inner or Outer
shareability when no_coherency protocol is selected. Doing so results in
unexpected or undesired snooping of the CPU caches on some platforms,
such as Juno FPGA, causing functional issues. For example the boot of
MCU firmware fails as GPU ends up reading stale data for the FW memory
pages from the CPU's cache. The FW memory pages are initialized with
uncached mapping when the device is not reported to be dma-coherent.
The shareability bits are set to inner-shareable when IOMMU_CACHE flag
is passed to map_pages() callback and IOMMU_CACHE flag is passed by
Panthor driver when memory needs to be mapped as cached on the GPU side.

IOMMU_CACHE seems to imply cache coherent and is probably not fit for
purpose for the memory that is mapped as cached on GPU side but doesn't
need to remain coherent with the CPU.

This commit updates the programming of MEMATTR register to use
MIDGARD_INNER instead of CPU_INNER when coherency is disabled. That way
the inner-shareability specified in the GPU PTEs would map to Mali's
internal-shareable mode, which is always supported by the GPU regardless
of the coherency protocal and is required by the Userspace driver to
ensure coherency between the shader cores.

Signed-off-by: Akash Goel <akash.goel@arm.com>
---
 drivers/gpu/drm/panthor/panthor_mmu.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index f3ee5d2753f1..f522a116c1b1 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -1927,7 +1927,7 @@ struct panthor_heap_pool *panthor_vm_get_heap_pool(struct panthor_vm *vm, bool c
 	return pool;
 }
 
-static u64 mair_to_memattr(u64 mair)
+static u64 mair_to_memattr(u64 mair, bool coherent)
 {
 	u64 memattr = 0;
 	u32 i;
@@ -1946,14 +1946,21 @@ static u64 mair_to_memattr(u64 mair)
 				   AS_MEMATTR_AARCH64_SH_MIDGARD_INNER |
 				   AS_MEMATTR_AARCH64_INNER_ALLOC_EXPL(false, false);
 		} else {
-			/* Use SH_CPU_INNER mode so SH_IS, which is used when
-			 * IOMMU_CACHE is set, actually maps to the standard
-			 * definition of inner-shareable and not Mali's
-			 * internal-shareable mode.
-			 */
 			out_attr = AS_MEMATTR_AARCH64_INNER_OUTER_WB |
-				   AS_MEMATTR_AARCH64_SH_CPU_INNER |
 				   AS_MEMATTR_AARCH64_INNER_ALLOC_EXPL(inner & 1, inner & 2);
+			/* Use SH_MIDGARD_INNER mode when device isn't coherent,
+			 * so SH_IS, which is used when IOMMU_CACHE is set, maps
+			 * to Mali's internal-shareable mode. As per the Mali
+			 * Spec, inner and outer-shareable modes aren't allowed
+			 * for WB memory when coherency is disabled.
+			 * Use SH_CPU_INNER mode when coherency is enabled, so
+			 * that SH_IS actually maps to the standard definition of
+			 * inner-shareable.
+			 */
+			if (!coherent)
+				out_attr |= AS_MEMATTR_AARCH64_SH_MIDGARD_INNER;
+			else
+				out_attr |= AS_MEMATTR_AARCH64_SH_CPU_INNER;
 		}
 
 		memattr |= (u64)out_attr << (8 * i);
@@ -2325,7 +2332,7 @@ panthor_vm_create(struct panthor_device *ptdev, bool for_mcu,
 		goto err_sched_fini;
 
 	mair = io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg.arm_lpae_s1_cfg.mair;
-	vm->memattr = mair_to_memattr(mair);
+	vm->memattr = mair_to_memattr(mair, ptdev->coherent);
 
 	mutex_lock(&ptdev->mmu->vm.lock);
 	list_add_tail(&vm->node, &ptdev->mmu->vm.list);
-- 
2.25.1
Re: [PATCH 1/3] drm/panthor: Update memattr programing to align with GPU spec
Posted by Steven Price 1 month ago
On 24/10/2024 15:54, Akash Goel wrote:
> Mali GPU Arch spec forbids the GPU PTEs to indicate Inner or Outer
> shareability when no_coherency protocol is selected. Doing so results in
> unexpected or undesired snooping of the CPU caches on some platforms,
> such as Juno FPGA, causing functional issues. For example the boot of
> MCU firmware fails as GPU ends up reading stale data for the FW memory
> pages from the CPU's cache. The FW memory pages are initialized with
> uncached mapping when the device is not reported to be dma-coherent.
> The shareability bits are set to inner-shareable when IOMMU_CACHE flag
> is passed to map_pages() callback and IOMMU_CACHE flag is passed by
> Panthor driver when memory needs to be mapped as cached on the GPU side.
> 
> IOMMU_CACHE seems to imply cache coherent and is probably not fit for
> purpose for the memory that is mapped as cached on GPU side but doesn't
> need to remain coherent with the CPU.
> 
> This commit updates the programming of MEMATTR register to use
> MIDGARD_INNER instead of CPU_INNER when coherency is disabled. That way
> the inner-shareability specified in the GPU PTEs would map to Mali's
> internal-shareable mode, which is always supported by the GPU regardless
> of the coherency protocal and is required by the Userspace driver to
> ensure coherency between the shader cores.
> 
> Signed-off-by: Akash Goel <akash.goel@arm.com>

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>  drivers/gpu/drm/panthor/panthor_mmu.c | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> index f3ee5d2753f1..f522a116c1b1 100644
> --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> @@ -1927,7 +1927,7 @@ struct panthor_heap_pool *panthor_vm_get_heap_pool(struct panthor_vm *vm, bool c
>  	return pool;
>  }
>  
> -static u64 mair_to_memattr(u64 mair)
> +static u64 mair_to_memattr(u64 mair, bool coherent)
>  {
>  	u64 memattr = 0;
>  	u32 i;
> @@ -1946,14 +1946,21 @@ static u64 mair_to_memattr(u64 mair)
>  				   AS_MEMATTR_AARCH64_SH_MIDGARD_INNER |
>  				   AS_MEMATTR_AARCH64_INNER_ALLOC_EXPL(false, false);
>  		} else {
> -			/* Use SH_CPU_INNER mode so SH_IS, which is used when
> -			 * IOMMU_CACHE is set, actually maps to the standard
> -			 * definition of inner-shareable and not Mali's
> -			 * internal-shareable mode.
> -			 */
>  			out_attr = AS_MEMATTR_AARCH64_INNER_OUTER_WB |
> -				   AS_MEMATTR_AARCH64_SH_CPU_INNER |
>  				   AS_MEMATTR_AARCH64_INNER_ALLOC_EXPL(inner & 1, inner & 2);
> +			/* Use SH_MIDGARD_INNER mode when device isn't coherent,
> +			 * so SH_IS, which is used when IOMMU_CACHE is set, maps
> +			 * to Mali's internal-shareable mode. As per the Mali
> +			 * Spec, inner and outer-shareable modes aren't allowed
> +			 * for WB memory when coherency is disabled.
> +			 * Use SH_CPU_INNER mode when coherency is enabled, so
> +			 * that SH_IS actually maps to the standard definition of
> +			 * inner-shareable.
> +			 */
> +			if (!coherent)
> +				out_attr |= AS_MEMATTR_AARCH64_SH_MIDGARD_INNER;
> +			else
> +				out_attr |= AS_MEMATTR_AARCH64_SH_CPU_INNER;
>  		}
>  
>  		memattr |= (u64)out_attr << (8 * i);
> @@ -2325,7 +2332,7 @@ panthor_vm_create(struct panthor_device *ptdev, bool for_mcu,
>  		goto err_sched_fini;
>  
>  	mair = io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg.arm_lpae_s1_cfg.mair;
> -	vm->memattr = mair_to_memattr(mair);
> +	vm->memattr = mair_to_memattr(mair, ptdev->coherent);
>  
>  	mutex_lock(&ptdev->mmu->vm.lock);
>  	list_add_tail(&vm->node, &ptdev->mmu->vm.list);
Re: [PATCH 1/3] drm/panthor: Update memattr programing to align with GPU spec
Posted by Boris Brezillon 1 month ago
+Robin for the MMU details

On Thu, 24 Oct 2024 15:54:30 +0100
Akash Goel <akash.goel@arm.com> wrote:

> Mali GPU Arch spec forbids the GPU PTEs to indicate Inner or Outer
> shareability when no_coherency protocol is selected. Doing so results in
> unexpected or undesired snooping of the CPU caches on some platforms,
> such as Juno FPGA, causing functional issues. For example the boot of
> MCU firmware fails as GPU ends up reading stale data for the FW memory
> pages from the CPU's cache. The FW memory pages are initialized with
> uncached mapping when the device is not reported to be dma-coherent.
> The shareability bits are set to inner-shareable when IOMMU_CACHE flag
> is passed to map_pages() callback and IOMMU_CACHE flag is passed by
> Panthor driver when memory needs to be mapped as cached on the GPU side.
> 
> IOMMU_CACHE seems to imply cache coherent and is probably not fit for
> purpose for the memory that is mapped as cached on GPU side but doesn't
> need to remain coherent with the CPU.

Yeah, IIRC I've been abusing the _CACHE flag to mean GPU-cached, not
cache-coherent. I think it be good to sit down with Rob and add the
necessary IOMMU_ flags so we can express all the shareability and
cacheability variants we have with the "Mali" MMU. For instance, I
think the shareability between MCU/GPU can be expressed properly at the
moment, and we unconditionally map things uncached because of that.

> 
> This commit updates the programming of MEMATTR register to use
> MIDGARD_INNER instead of CPU_INNER when coherency is disabled. That way
> the inner-shareability specified in the GPU PTEs would map to Mali's
> internal-shareable mode, which is always supported by the GPU regardless
> of the coherency protocal and is required by the Userspace driver to
> ensure coherency between the shader cores.
> 
> Signed-off-by: Akash Goel <akash.goel@arm.com>

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

> ---
>  drivers/gpu/drm/panthor/panthor_mmu.c | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> index f3ee5d2753f1..f522a116c1b1 100644
> --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> @@ -1927,7 +1927,7 @@ struct panthor_heap_pool *panthor_vm_get_heap_pool(struct panthor_vm *vm, bool c
>  	return pool;
>  }
>  
> -static u64 mair_to_memattr(u64 mair)
> +static u64 mair_to_memattr(u64 mair, bool coherent)
>  {
>  	u64 memattr = 0;
>  	u32 i;
> @@ -1946,14 +1946,21 @@ static u64 mair_to_memattr(u64 mair)
>  				   AS_MEMATTR_AARCH64_SH_MIDGARD_INNER |
>  				   AS_MEMATTR_AARCH64_INNER_ALLOC_EXPL(false, false);
>  		} else {
> -			/* Use SH_CPU_INNER mode so SH_IS, which is used when
> -			 * IOMMU_CACHE is set, actually maps to the standard
> -			 * definition of inner-shareable and not Mali's
> -			 * internal-shareable mode.
> -			 */
>  			out_attr = AS_MEMATTR_AARCH64_INNER_OUTER_WB |
> -				   AS_MEMATTR_AARCH64_SH_CPU_INNER |
>  				   AS_MEMATTR_AARCH64_INNER_ALLOC_EXPL(inner & 1, inner & 2);
> +			/* Use SH_MIDGARD_INNER mode when device isn't coherent,
> +			 * so SH_IS, which is used when IOMMU_CACHE is set, maps
> +			 * to Mali's internal-shareable mode. As per the Mali
> +			 * Spec, inner and outer-shareable modes aren't allowed
> +			 * for WB memory when coherency is disabled.
> +			 * Use SH_CPU_INNER mode when coherency is enabled, so
> +			 * that SH_IS actually maps to the standard definition of
> +			 * inner-shareable.
> +			 */
> +			if (!coherent)
> +				out_attr |= AS_MEMATTR_AARCH64_SH_MIDGARD_INNER;
> +			else
> +				out_attr |= AS_MEMATTR_AARCH64_SH_CPU_INNER;
>  		}
>  
>  		memattr |= (u64)out_attr << (8 * i);
> @@ -2325,7 +2332,7 @@ panthor_vm_create(struct panthor_device *ptdev, bool for_mcu,
>  		goto err_sched_fini;
>  
>  	mair = io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg.arm_lpae_s1_cfg.mair;
> -	vm->memattr = mair_to_memattr(mair);
> +	vm->memattr = mair_to_memattr(mair, ptdev->coherent);
>  
>  	mutex_lock(&ptdev->mmu->vm.lock);
>  	list_add_tail(&vm->node, &ptdev->mmu->vm.list);
Re: [PATCH 1/3] drm/panthor: Update memattr programing to align with GPU spec
Posted by Liviu Dudau 1 month ago
On Thu, Oct 24, 2024 at 05:49:44PM +0200, Boris Brezillon wrote:
> +Robin for the MMU details
> 
> On Thu, 24 Oct 2024 15:54:30 +0100
> Akash Goel <akash.goel@arm.com> wrote:
> 
> > Mali GPU Arch spec forbids the GPU PTEs to indicate Inner or Outer
> > shareability when no_coherency protocol is selected. Doing so results in
> > unexpected or undesired snooping of the CPU caches on some platforms,
> > such as Juno FPGA, causing functional issues. For example the boot of
> > MCU firmware fails as GPU ends up reading stale data for the FW memory
> > pages from the CPU's cache. The FW memory pages are initialized with
> > uncached mapping when the device is not reported to be dma-coherent.
> > The shareability bits are set to inner-shareable when IOMMU_CACHE flag
> > is passed to map_pages() callback and IOMMU_CACHE flag is passed by
> > Panthor driver when memory needs to be mapped as cached on the GPU side.
> > 
> > IOMMU_CACHE seems to imply cache coherent and is probably not fit for
> > purpose for the memory that is mapped as cached on GPU side but doesn't
> > need to remain coherent with the CPU.
> 
> Yeah, IIRC I've been abusing the _CACHE flag to mean GPU-cached, not
> cache-coherent. I think it be good to sit down with Rob and add the
> necessary IOMMU_ flags so we can express all the shareability and
> cacheability variants we have with the "Mali" MMU. For instance, I
> think the shareability between MCU/GPU can be expressed properly at the
> moment, and we unconditionally map things uncached because of that.

Boris, did you mean to say "shareability between MCU/GPU *can't* be expressed
properly" ? Currently the sentence reads a bit strange, as if there was a
negation somewhere.

Our GPU's architecture dictates a lot of coherency attributes, especially at
the read-write/read-only L1$, so using _CACHE as a flag for something else
is indeed tempting. We should talk with Rob to see how we can improve things
here.

> 
> > 
> > This commit updates the programming of MEMATTR register to use
> > MIDGARD_INNER instead of CPU_INNER when coherency is disabled. That way
> > the inner-shareability specified in the GPU PTEs would map to Mali's
> > internal-shareable mode, which is always supported by the GPU regardless
> > of the coherency protocal and is required by the Userspace driver to
> > ensure coherency between the shader cores.
> > 
> > Signed-off-by: Akash Goel <akash.goel@arm.com>
> 
> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>

Best regards,
Liviu

> 
> > ---
> >  drivers/gpu/drm/panthor/panthor_mmu.c | 23 +++++++++++++++--------
> >  1 file changed, 15 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> > index f3ee5d2753f1..f522a116c1b1 100644
> > --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> > +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> > @@ -1927,7 +1927,7 @@ struct panthor_heap_pool *panthor_vm_get_heap_pool(struct panthor_vm *vm, bool c
> >  	return pool;
> >  }
> >  
> > -static u64 mair_to_memattr(u64 mair)
> > +static u64 mair_to_memattr(u64 mair, bool coherent)
> >  {
> >  	u64 memattr = 0;
> >  	u32 i;
> > @@ -1946,14 +1946,21 @@ static u64 mair_to_memattr(u64 mair)
> >  				   AS_MEMATTR_AARCH64_SH_MIDGARD_INNER |
> >  				   AS_MEMATTR_AARCH64_INNER_ALLOC_EXPL(false, false);
> >  		} else {
> > -			/* Use SH_CPU_INNER mode so SH_IS, which is used when
> > -			 * IOMMU_CACHE is set, actually maps to the standard
> > -			 * definition of inner-shareable and not Mali's
> > -			 * internal-shareable mode.
> > -			 */
> >  			out_attr = AS_MEMATTR_AARCH64_INNER_OUTER_WB |
> > -				   AS_MEMATTR_AARCH64_SH_CPU_INNER |
> >  				   AS_MEMATTR_AARCH64_INNER_ALLOC_EXPL(inner & 1, inner & 2);
> > +			/* Use SH_MIDGARD_INNER mode when device isn't coherent,
> > +			 * so SH_IS, which is used when IOMMU_CACHE is set, maps
> > +			 * to Mali's internal-shareable mode. As per the Mali
> > +			 * Spec, inner and outer-shareable modes aren't allowed
> > +			 * for WB memory when coherency is disabled.
> > +			 * Use SH_CPU_INNER mode when coherency is enabled, so
> > +			 * that SH_IS actually maps to the standard definition of
> > +			 * inner-shareable.
> > +			 */
> > +			if (!coherent)
> > +				out_attr |= AS_MEMATTR_AARCH64_SH_MIDGARD_INNER;
> > +			else
> > +				out_attr |= AS_MEMATTR_AARCH64_SH_CPU_INNER;
> >  		}
> >  
> >  		memattr |= (u64)out_attr << (8 * i);
> > @@ -2325,7 +2332,7 @@ panthor_vm_create(struct panthor_device *ptdev, bool for_mcu,
> >  		goto err_sched_fini;
> >  
> >  	mair = io_pgtable_ops_to_pgtable(vm->pgtbl_ops)->cfg.arm_lpae_s1_cfg.mair;
> > -	vm->memattr = mair_to_memattr(mair);
> > +	vm->memattr = mair_to_memattr(mair, ptdev->coherent);
> >  
> >  	mutex_lock(&ptdev->mmu->vm.lock);
> >  	list_add_tail(&vm->node, &ptdev->mmu->vm.list);
> 

-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯
Re: [PATCH 1/3] drm/panthor: Update memattr programing to align with GPU spec
Posted by Boris Brezillon 1 month ago
On Fri, 25 Oct 2024 10:24:32 +0100
Liviu Dudau <liviu.dudau@arm.com> wrote:

> On Thu, Oct 24, 2024 at 05:49:44PM +0200, Boris Brezillon wrote:
> > +Robin for the MMU details
> > 
> > On Thu, 24 Oct 2024 15:54:30 +0100
> > Akash Goel <akash.goel@arm.com> wrote:
> >   
> > > Mali GPU Arch spec forbids the GPU PTEs to indicate Inner or Outer
> > > shareability when no_coherency protocol is selected. Doing so results in
> > > unexpected or undesired snooping of the CPU caches on some platforms,
> > > such as Juno FPGA, causing functional issues. For example the boot of
> > > MCU firmware fails as GPU ends up reading stale data for the FW memory
> > > pages from the CPU's cache. The FW memory pages are initialized with
> > > uncached mapping when the device is not reported to be dma-coherent.
> > > The shareability bits are set to inner-shareable when IOMMU_CACHE flag
> > > is passed to map_pages() callback and IOMMU_CACHE flag is passed by
> > > Panthor driver when memory needs to be mapped as cached on the GPU side.
> > > 
> > > IOMMU_CACHE seems to imply cache coherent and is probably not fit for
> > > purpose for the memory that is mapped as cached on GPU side but doesn't
> > > need to remain coherent with the CPU.  
> > 
> > Yeah, IIRC I've been abusing the _CACHE flag to mean GPU-cached, not
> > cache-coherent. I think it be good to sit down with Rob and add the
> > necessary IOMMU_ flags so we can express all the shareability and
> > cacheability variants we have with the "Mali" MMU. For instance, I
> > think the shareability between MCU/GPU can be expressed properly at the
> > moment, and we unconditionally map things uncached because of that.  
> 
> Boris, did you mean to say "shareability between MCU/GPU *can't* be expressed
> properly" ? Currently the sentence reads a bit strange, as if there was a
> negation somewhere.

Yes, sorry, I meant "can't".