drm/ttm: support device w/o coherency

[RFC PATCH 2/2] drm/ttm: downgrade cached to write_combined when snooping not available

Posted by Icenowy Zheng 1 year, 7 months ago

As we can now acquire the presence of the full DMA coherency (snooping
capability) from ttm_device, we can now map the CPU side memory as
write-combined when cached is requested and snooping is not avilable.

Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 4 ++++
 drivers/gpu/drm/ttm/ttm_tt.c      | 4 ++++
 include/drm/ttm/ttm_caching.h     | 3 ++-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 0b3f4267130c4..6519ce047787d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -302,6 +302,10 @@ pgprot_t ttm_io_prot(struct ttm_buffer_object *bo, struct ttm_resource *res,
 		caching = res->bus.caching;
 	}
 
+	/* Downgrade cached mapping for non-snooping devices */
+	if (!bo->bdev->dma_coherent && caching == ttm_cached)
+		caching = ttm_write_combined;
+
 	return ttm_prot_from_caching(caching, tmp);
 }
 EXPORT_SYMBOL(ttm_io_prot);
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 7b00ddf0ce49f..3335df45fba5e 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -152,6 +152,10 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
 			       enum ttm_caching caching,
 			       unsigned long extra_pages)
 {
+	/* Downgrade cached mapping for non-snooping devices */
+	if (!bo->bdev->dma_coherent && caching == ttm_cached)
+		caching = ttm_write_combined;
+
 	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
 	ttm->page_flags = page_flags;
 	ttm->dma_address = NULL;
diff --git a/include/drm/ttm/ttm_caching.h b/include/drm/ttm/ttm_caching.h
index a18f43e93abab..f92d7911f50e4 100644
--- a/include/drm/ttm/ttm_caching.h
+++ b/include/drm/ttm/ttm_caching.h
@@ -47,7 +47,8 @@ enum ttm_caching {
 
 	/**
 	 * @ttm_cached: Fully cached like normal system memory, requires that
-	 * devices snoop the CPU cache on accesses.
+	 * devices snoop the CPU cache on accesses. Downgraded to
+	 * ttm_write_combined when the snooping capaiblity is missing.
 	 */
 	ttm_cached
 };
-- 
2.45.2

Re: [RFC PATCH 2/2] drm/ttm: downgrade cached to write_combined when snooping not available

Posted by Jiaxun Yang 1 year, 7 months ago


在2024年6月29日六月 上午6:22，Icenowy Zheng写道：
[...]
> @@ -302,6 +302,10 @@ pgprot_t ttm_io_prot(struct ttm_buffer_object *bo, 
> struct ttm_resource *res,
>  		caching = res->bus.caching;
>  	}
> 
> +	/* Downgrade cached mapping for non-snooping devices */
> +	if (!bo->bdev->dma_coherent && caching == ttm_cached)
> +		caching = ttm_write_combined;
Hi Icenowy,

Thanks for your patch! You saved many non-coh PCIe host implementations a day!.

Unfortunately I don't think we can safely ttm_cached to ttm_write_comnined, we've
had enough drama with write combine behaviour on all different platforms.

See drm_arch_can_wc_memory in drm_cache.h.

Thanks 

> +
>  	return ttm_prot_from_caching(caching, tmp);
>  }
>  EXPORT_SYMBOL(ttm_io_prot);
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index 7b00ddf0ce49f..3335df45fba5e 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -152,6 +152,10 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
>  			       enum ttm_caching caching,
>  			       unsigned long extra_pages)
>  {
> +	/* Downgrade cached mapping for non-snooping devices */
> +	if (!bo->bdev->dma_coherent && caching == ttm_cached)
> +		caching = ttm_write_combined;
> +
>  	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
>  	ttm->page_flags = page_flags;
>  	ttm->dma_address = NULL;
> diff --git a/include/drm/ttm/ttm_caching.h b/include/drm/ttm/ttm_caching.h
> index a18f43e93abab..f92d7911f50e4 100644
> --- a/include/drm/ttm/ttm_caching.h
> +++ b/include/drm/ttm/ttm_caching.h
> @@ -47,7 +47,8 @@ enum ttm_caching {
> 
>  	/**
>  	 * @ttm_cached: Fully cached like normal system memory, requires that
> -	 * devices snoop the CPU cache on accesses.
> +	 * devices snoop the CPU cache on accesses. Downgraded to
> +	 * ttm_write_combined when the snooping capaiblity is missing.
>  	 */
>  	ttm_cached
>  };
> -- 
> 2.45.2

-- 
- Jiaxun

Re: [RFC PATCH 2/2] drm/ttm: downgrade cached to write_combined when snooping not available

Posted by Icenowy Zheng 1 year, 7 months ago

在 2024-06-29星期六的 20:57 +0100，Jiaxun Yang写道：
> 
> 
> 在2024年6月29日六月 上午6:22，Icenowy Zheng写道：
> [...]
> > @@ -302,6 +302,10 @@ pgprot_t ttm_io_prot(struct ttm_buffer_object
> > *bo, 
> > struct ttm_resource *res,
> >                 caching = res->bus.caching;
> >         }
> > 
> > +       /* Downgrade cached mapping for non-snooping devices */
> > +       if (!bo->bdev->dma_coherent && caching == ttm_cached)
> > +               caching = ttm_write_combined;
> Hi Icenowy,
> 
> Thanks for your patch! You saved many non-coh PCIe host
> implementations a day!.
> 
> Unfortunately I don't think we can safely ttm_cached to
> ttm_write_comnined, we've
> had enough drama with write combine behaviour on all different
> platforms.

I think on these platforms, ttm_write_combined should be prevented to
be mapped to pgprot_writecombine instead, but downgrade ttm_cached to
ttm_write_combined (and then being treated same as ttm_uncached) is
acceptable.

> 
> See drm_arch_can_wc_memory in drm_cache.h.
> 
> Thanks 
> 
> > +
> >         return ttm_prot_from_caching(caching, tmp);
> >  }
> >  EXPORT_SYMBOL(ttm_io_prot);
> > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c
> > b/drivers/gpu/drm/ttm/ttm_tt.c
> > index 7b00ddf0ce49f..3335df45fba5e 100644
> > --- a/drivers/gpu/drm/ttm/ttm_tt.c
> > +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> > @@ -152,6 +152,10 @@ static void ttm_tt_init_fields(struct ttm_tt
> > *ttm,
> >                                enum ttm_caching caching,
> >                                unsigned long extra_pages)
> >  {
> > +       /* Downgrade cached mapping for non-snooping devices */
> > +       if (!bo->bdev->dma_coherent && caching == ttm_cached)
> > +               caching = ttm_write_combined;
> > +
> >         ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT)
> > + extra_pages;
> >         ttm->page_flags = page_flags;
> >         ttm->dma_address = NULL;
> > diff --git a/include/drm/ttm/ttm_caching.h
> > b/include/drm/ttm/ttm_caching.h
> > index a18f43e93abab..f92d7911f50e4 100644
> > --- a/include/drm/ttm/ttm_caching.h
> > +++ b/include/drm/ttm/ttm_caching.h
> > @@ -47,7 +47,8 @@ enum ttm_caching {
> > 
> >         /**
> >          * @ttm_cached: Fully cached like normal system memory,
> > requires that
> > -        * devices snoop the CPU cache on accesses.
> > +        * devices snoop the CPU cache on accesses. Downgraded to
> > +        * ttm_write_combined when the snooping capaiblity is
> > missing.
> >          */
> >         ttm_cached
> >  };
> > -- 
> > 2.45.2
>

Re: [RFC PATCH 2/2] drm/ttm: downgrade cached to write_combined when snooping not available

Posted by Icenowy Zheng 1 year, 7 months ago


于 2024年6月30日 GMT+08:00 03:57:47，Jiaxun Yang <jiaxun.yang@flygoat.com> 写道：
>
>
>在2024年6月29日六月 上午6:22，Icenowy Zheng写道：
>[...]
>> @@ -302,6 +302,10 @@ pgprot_t ttm_io_prot(struct ttm_buffer_object *bo, 
>> struct ttm_resource *res,
>>  		caching = res->bus.caching;
>>  	}
>> 
>> +	/* Downgrade cached mapping for non-snooping devices */
>> +	if (!bo->bdev->dma_coherent && caching == ttm_cached)
>> +		caching = ttm_write_combined;
>Hi Icenowy,
>
>Thanks for your patch! You saved many non-coh PCIe host implementations a day!.
>
>Unfortunately I don't think we can safely ttm_cached to ttm_write_comnined, we've
>had enough drama with write combine behaviour on all different platforms.
>
>See drm_arch_can_wc_memory in drm_cache.h.
>

Yes this really sounds like an issue.

Maybe the behavior of ttm_write_combined should furtherly be decided
by drm_arch_can_wc_memory() in case of quirks?

>Thanks 
>
>> +
>>  	return ttm_prot_from_caching(caching, tmp);
>>  }
>>  EXPORT_SYMBOL(ttm_io_prot);
>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>> index 7b00ddf0ce49f..3335df45fba5e 100644
>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>> @@ -152,6 +152,10 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
>>  			       enum ttm_caching caching,
>>  			       unsigned long extra_pages)
>>  {
>> +	/* Downgrade cached mapping for non-snooping devices */
>> +	if (!bo->bdev->dma_coherent && caching == ttm_cached)
>> +		caching = ttm_write_combined;
>> +
>>  	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
>>  	ttm->page_flags = page_flags;
>>  	ttm->dma_address = NULL;
>> diff --git a/include/drm/ttm/ttm_caching.h b/include/drm/ttm/ttm_caching.h
>> index a18f43e93abab..f92d7911f50e4 100644
>> --- a/include/drm/ttm/ttm_caching.h
>> +++ b/include/drm/ttm/ttm_caching.h
>> @@ -47,7 +47,8 @@ enum ttm_caching {
>> 
>>  	/**
>>  	 * @ttm_cached: Fully cached like normal system memory, requires that
>> -	 * devices snoop the CPU cache on accesses.
>> +	 * devices snoop the CPU cache on accesses. Downgraded to
>> +	 * ttm_write_combined when the snooping capaiblity is missing.
>>  	 */
>>  	ttm_cached
>>  };
>> -- 
>> 2.45.2
>

Re: [RFC PATCH 2/2] drm/ttm: downgrade cached to write_combined when snooping not available

Posted by Christian König 1 year, 7 months ago

Am 29.06.24 um 22:51 schrieb Icenowy Zheng:
>
> 于 2024年6月30日 GMT+08:00 03:57:47，Jiaxun Yang <jiaxun.yang@flygoat.com> 写道：
>>
>> 在2024年6月29日六月 上午6:22，Icenowy Zheng写道：
>> [...]
>>> @@ -302,6 +302,10 @@ pgprot_t ttm_io_prot(struct ttm_buffer_object *bo,
>>> struct ttm_resource *res,
>>>   		caching = res->bus.caching;
>>>   	}
>>>
>>> +	/* Downgrade cached mapping for non-snooping devices */
>>> +	if (!bo->bdev->dma_coherent && caching == ttm_cached)
>>> +		caching = ttm_write_combined;
>> Hi Icenowy,
>>
>> Thanks for your patch! You saved many non-coh PCIe host implementations a day!.

Ah, wait a second.

Such a thing as non-coherent PCIe implementation doesn't exist. The PCIe 
specification makes it mandatory for memory access to be cache coherent.

There are a bunch of non-compliant PCIe implementations which have 
broken cache coherency, but those explicitly violate the specification 
and because of that are not supported.

Regards,
Christian.

>>
>> Unfortunately I don't think we can safely ttm_cached to ttm_write_comnined, we've
>> had enough drama with write combine behaviour on all different platforms.
>>
>> See drm_arch_can_wc_memory in drm_cache.h.
>>
> Yes this really sounds like an issue.
>
> Maybe the behavior of ttm_write_combined should furtherly be decided
> by drm_arch_can_wc_memory() in case of quirks?
>
>> Thanks
>>
>>> +
>>>   	return ttm_prot_from_caching(caching, tmp);
>>>   }
>>>   EXPORT_SYMBOL(ttm_io_prot);
>>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>>> index 7b00ddf0ce49f..3335df45fba5e 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>> @@ -152,6 +152,10 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
>>>   			       enum ttm_caching caching,
>>>   			       unsigned long extra_pages)
>>>   {
>>> +	/* Downgrade cached mapping for non-snooping devices */
>>> +	if (!bo->bdev->dma_coherent && caching == ttm_cached)
>>> +		caching = ttm_write_combined;
>>> +
>>>   	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
>>>   	ttm->page_flags = page_flags;
>>>   	ttm->dma_address = NULL;
>>> diff --git a/include/drm/ttm/ttm_caching.h b/include/drm/ttm/ttm_caching.h
>>> index a18f43e93abab..f92d7911f50e4 100644
>>> --- a/include/drm/ttm/ttm_caching.h
>>> +++ b/include/drm/ttm/ttm_caching.h
>>> @@ -47,7 +47,8 @@ enum ttm_caching {
>>>
>>>   	/**
>>>   	 * @ttm_cached: Fully cached like normal system memory, requires that
>>> -	 * devices snoop the CPU cache on accesses.
>>> +	 * devices snoop the CPU cache on accesses. Downgraded to
>>> +	 * ttm_write_combined when the snooping capaiblity is missing.
>>>   	 */
>>>   	ttm_cached
>>>   };
>>> -- 
>>> 2.45.2

Re: [RFC PATCH 2/2] drm/ttm: downgrade cached to write_combined when snooping not available

Posted by Icenowy Zheng 1 year, 7 months ago

在 2024-07-01星期一的 13:40 +0200，Christian König写道：
> Am 29.06.24 um 22:51 schrieb Icenowy Zheng:
> > 
> > 于 2024年6月30日 GMT+08:00 03:57:47，Jiaxun Yang
> > <jiaxun.yang@flygoat.com> 写道：
> > > 
> > > 在2024年6月29日六月 上午6:22，Icenowy Zheng写道：
> > > [...]
> > > > @@ -302,6 +302,10 @@ pgprot_t ttm_io_prot(struct
> > > > ttm_buffer_object *bo,
> > > > struct ttm_resource *res,
> > > >                 caching = res->bus.caching;
> > > >         }
> > > > 
> > > > +       /* Downgrade cached mapping for non-snooping devices */
> > > > +       if (!bo->bdev->dma_coherent && caching == ttm_cached)
> > > > +               caching = ttm_write_combined;
> > > Hi Icenowy,
> > > 
> > > Thanks for your patch! You saved many non-coh PCIe host
> > > implementations a day!.
> 
> Ah, wait a second.
> 
> Such a thing as non-coherent PCIe implementation doesn't exist. The
> PCIe 
> specification makes it mandatory for memory access to be cache
> coherent.

Really? I tried to get PCIe spec 2.0, PCI spec 3.0 and PCI-X addendum
1.0, none of this explicitly requires the PCIe controller and the CPU
being fully coherent. The PCI-X spec even says "Note that PCI-X, like
conventional PCI, does not require systems to support coherent caches
for addresses accessed by PCI-X requesters".

In addition, in the perspective of Linux, I think bypassing CPU cache
of shared memory is considered as coherent access too, see
dma_alloc_coherent() function's naming.

> 
> There are a bunch of non-compliant PCIe implementations which have 
> broken cache coherency, but those explicitly violate the
> specification 
> and because of that are not supported.

Regardless of it violating the spec or not, these devices work with
Linux subsystems that use dma_alloc_coherent to allocate DMA buffers
(which is the most common case), and GPU drivers just give out cryptic
error messages like "ring gfx test failed" without any mention of
coherency issues at all, which makes the fact that Linux DRM/TTM
subsystem currently requires PCIe snooping CPU cache more obscure.

> 
> Regards,
> Christian.
> 
> > > 
> > > Unfortunately I don't think we can safely ttm_cached to
> > > ttm_write_comnined, we've
> > > had enough drama with write combine behaviour on all different
> > > platforms.
> > > 
> > > See drm_arch_can_wc_memory in drm_cache.h.
> > > 
> > Yes this really sounds like an issue.
> > 
> > Maybe the behavior of ttm_write_combined should furtherly be
> > decided
> > by drm_arch_can_wc_memory() in case of quirks?
> > 
> > > Thanks
> > > 
> > > > +
> > > >         return ttm_prot_from_caching(caching, tmp);
> > > >   }
> > > >   EXPORT_SYMBOL(ttm_io_prot);
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c
> > > > b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > index 7b00ddf0ce49f..3335df45fba5e 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_tt.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > @@ -152,6 +152,10 @@ static void ttm_tt_init_fields(struct
> > > > ttm_tt *ttm,
> > > >                                enum ttm_caching caching,
> > > >                                unsigned long extra_pages)
> > > >   {
> > > > +       /* Downgrade cached mapping for non-snooping devices */
> > > > +       if (!bo->bdev->dma_coherent && caching == ttm_cached)
> > > > +               caching = ttm_write_combined;
> > > > +
> > > >         ttm->num_pages = (PAGE_ALIGN(bo->base.size) >>
> > > > PAGE_SHIFT) + extra_pages;
> > > >         ttm->page_flags = page_flags;
> > > >         ttm->dma_address = NULL;
> > > > diff --git a/include/drm/ttm/ttm_caching.h
> > > > b/include/drm/ttm/ttm_caching.h
> > > > index a18f43e93abab..f92d7911f50e4 100644
> > > > --- a/include/drm/ttm/ttm_caching.h
> > > > +++ b/include/drm/ttm/ttm_caching.h
> > > > @@ -47,7 +47,8 @@ enum ttm_caching {
> > > > 
> > > >         /**
> > > >          * @ttm_cached: Fully cached like normal system memory,
> > > > requires that
> > > > -        * devices snoop the CPU cache on accesses.
> > > > +        * devices snoop the CPU cache on accesses. Downgraded
> > > > to
> > > > +        * ttm_write_combined when the snooping capaiblity is
> > > > missing.
> > > >          */
> > > >         ttm_cached
> > > >   };
> > > > -- 
> > > > 2.45.2
>

Re: [RFC PATCH 2/2] drm/ttm: downgrade cached to write_combined when snooping not available

Posted by Jiaxun Yang 1 year, 7 months ago


在2024年7月1日七月 下午12:40，Christian König写道：
[...]
>
> Ah, wait a second.
>
> Such a thing as non-coherent PCIe implementation doesn't exist. The PCIe 
> specification makes it mandatory for memory access to be cache coherent.

There are some non-PCIe TTM GPU being hit by this pitfall, we have non-coherent
Vivante GPU on some devices.

Handling it at TTM core makes more sense on reducing per-driver effort on dealing
platform issues.

>
> There are a bunch of non-compliant PCIe implementations which have 
> broken cache coherency, but those explicitly violate the specification 
> and because of that are not supported.

I don't really understand, "doesn't exist" and "bunch of" seems to be contradicting
with each other.

>
> Regards,
> Christian.
>
>>>
>>> Unfortunately I don't think we can safely ttm_cached to ttm_write_comnined, we've
>>> had enough drama with write combine behaviour on all different platforms.
>>>
>>> See drm_arch_can_wc_memory in drm_cache.h.
>>>
>> Yes this really sounds like an issue.
>>
>> Maybe the behavior of ttm_write_combined should furtherly be decided
>> by drm_arch_can_wc_memory() in case of quirks?

IMO for DMA mappings, use dma_pgprot at mapping makes more sense :-)

Thanks 
- Jiaxun
>>
>>> Thanks
>>>
>>>> +
>>>>   	return ttm_prot_from_caching(caching, tmp);
>>>>   }
>>>>   EXPORT_SYMBOL(ttm_io_prot);
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>>>> index 7b00ddf0ce49f..3335df45fba5e 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>>> @@ -152,6 +152,10 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
>>>>   			       enum ttm_caching caching,
>>>>   			       unsigned long extra_pages)
>>>>   {
>>>> +	/* Downgrade cached mapping for non-snooping devices */
>>>> +	if (!bo->bdev->dma_coherent && caching == ttm_cached)
>>>> +		caching = ttm_write_combined;
>>>> +
>>>>   	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
>>>>   	ttm->page_flags = page_flags;
>>>>   	ttm->dma_address = NULL;
>>>> diff --git a/include/drm/ttm/ttm_caching.h b/include/drm/ttm/ttm_caching.h
>>>> index a18f43e93abab..f92d7911f50e4 100644
>>>> --- a/include/drm/ttm/ttm_caching.h
>>>> +++ b/include/drm/ttm/ttm_caching.h
>>>> @@ -47,7 +47,8 @@ enum ttm_caching {
>>>>
>>>>   	/**
>>>>   	 * @ttm_cached: Fully cached like normal system memory, requires that
>>>> -	 * devices snoop the CPU cache on accesses.
>>>> +	 * devices snoop the CPU cache on accesses. Downgraded to
>>>> +	 * ttm_write_combined when the snooping capaiblity is missing.
>>>>   	 */
>>>>   	ttm_cached
>>>>   };
>>>> -- 
>>>> 2.45.2

-- 
- Jiaxun

Re: [RFC PATCH 2/2] drm/ttm: downgrade cached to write_combined when snooping not available

Posted by Christian König 1 year, 7 months ago

Am 01.07.24 um 13:52 schrieb Jiaxun Yang:
> 在2024年7月1日七月 下午12:40，Christian König写道：
> [...]
>> Ah, wait a second.
>>
>> Such a thing as non-coherent PCIe implementation doesn't exist. The PCIe
>> specification makes it mandatory for memory access to be cache coherent.
> There are some non-PCIe TTM GPU being hit by this pitfall, we have non-coherent
> Vivante GPU on some devices.

Yeah, but those are perfectly supported.

If you have a non PCIe device which needs uncached or write combined 
system memory allocations you can just specify this at memory allocation 
time.

> Handling it at TTM core makes more sense on reducing per-driver effort on dealing
> platform issues.
>
>> There are a bunch of non-compliant PCIe implementations which have
>> broken cache coherency, but those explicitly violate the specification
>> and because of that are not supported.
> I don't really understand, "doesn't exist" and "bunch of" seems to be contradicting
> with each other.

A compliant non-coherent PCIe implementation doesn't exists, that's made 
mandatory by the PCIe standard.

What does exists are some non compliant non coherent PCIe 
implementations, but as far as I know those are then not supported at 
all by Linux.

We already had tons of problems with platform chips which intentionally 
doesn't correctly implement the PCIe specification because it is cheaper.

At least for AMDs GPU driver we reverted to rejecting patches for 
platform bugs cause by incorrectly wired up PCIe root complexes.

Regards,
Christian.

>
>> Regards,
>> Christian.
>>
>>>> Unfortunately I don't think we can safely ttm_cached to ttm_write_comnined, we've
>>>> had enough drama with write combine behaviour on all different platforms.
>>>>
>>>> See drm_arch_can_wc_memory in drm_cache.h.
>>>>
>>> Yes this really sounds like an issue.
>>>
>>> Maybe the behavior of ttm_write_combined should furtherly be decided
>>> by drm_arch_can_wc_memory() in case of quirks?
> IMO for DMA mappings, use dma_pgprot at mapping makes more sense :-)
>
> Thanks
> - Jiaxun
>>>> Thanks
>>>>
>>>>> +
>>>>>    	return ttm_prot_from_caching(caching, tmp);
>>>>>    }
>>>>>    EXPORT_SYMBOL(ttm_io_prot);
>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>> index 7b00ddf0ce49f..3335df45fba5e 100644
>>>>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>>>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>> @@ -152,6 +152,10 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
>>>>>    			       enum ttm_caching caching,
>>>>>    			       unsigned long extra_pages)
>>>>>    {
>>>>> +	/* Downgrade cached mapping for non-snooping devices */
>>>>> +	if (!bo->bdev->dma_coherent && caching == ttm_cached)
>>>>> +		caching = ttm_write_combined;
>>>>> +
>>>>>    	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
>>>>>    	ttm->page_flags = page_flags;
>>>>>    	ttm->dma_address = NULL;
>>>>> diff --git a/include/drm/ttm/ttm_caching.h b/include/drm/ttm/ttm_caching.h
>>>>> index a18f43e93abab..f92d7911f50e4 100644
>>>>> --- a/include/drm/ttm/ttm_caching.h
>>>>> +++ b/include/drm/ttm/ttm_caching.h
>>>>> @@ -47,7 +47,8 @@ enum ttm_caching {
>>>>>
>>>>>    	/**
>>>>>    	 * @ttm_cached: Fully cached like normal system memory, requires that
>>>>> -	 * devices snoop the CPU cache on accesses.
>>>>> +	 * devices snoop the CPU cache on accesses. Downgraded to
>>>>> +	 * ttm_write_combined when the snooping capaiblity is missing.
>>>>>    	 */
>>>>>    	ttm_cached
>>>>>    };
>>>>> -- 
>>>>> 2.45.2