[PATCH] drm: ttm: do not direct reclaim when allocating high order pages

Thadeu Lima de Souza Cascardo posted 1 patch 3 weeks, 1 day ago
drivers/gpu/drm/ttm/ttm_pool.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
[PATCH] drm: ttm: do not direct reclaim when allocating high order pages
Posted by Thadeu Lima de Souza Cascardo 3 weeks, 1 day ago
When the TTM pool tries to allocate new pages, it starts with the maximum
order. If no pages of that order are ready in the system, the page
allocator will start reclaim. If direct reclaim fails, the allocator will
reduce the order until it gets all the pages it wants, at whatever orders
it succeeds in reclaiming.

However, while the allocator is reclaiming, lower order pages might be
available, which would work just fine for the pool allocator. Doing direct
reclaim just introduces latency in allocating memory.

The system should still start reclaiming in the background with kswapd, but
the pool allocator should try to allocate a lower order page instead of
directly reclaiming.

If not even an order-1 page is available, the TTM pool allocator will
eventually fall back to allocating order-0 pages, at which point it should,
and will, directly reclaim.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index baf27c70a4193a121fbc8b4e67cd6feb4c612b85..6124a53cd15634c833bce379093b557d2a2660fd 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -144,9 +144,11 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 	 * Mapping pages directly into an userspace process and calling
 	 * put_page() on a TTM allocated page is illegal.
 	 */
-	if (order)
+	if (order) {
 		gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
 			__GFP_THISNODE;
+		gfp_flags &= ~__GFP_DIRECT_RECLAIM;
+	}
 
 	if (!pool->use_dma_alloc) {
 		p = alloc_pages_node(pool->nid, gfp_flags, order);

---
base-commit: b320789d6883cc00ac78ce83bccbfe7ed58afcf0
change-id: 20250909-ttm_pool_no_direct_reclaim-ee0807a2d3fe

Best regards,
-- 
Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
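[Editor's note: the descent described in the commit message can be sketched with a toy model. All names and flag values below are illustrative stand-ins, not the actual ttm_pool or gfp.h definitions; direct reclaim is pessimistically modelled as a costly call that never manages to assemble a high-order block, which is the fragmented-system case the patch targets.]

```c
#include <assert.h>

/* Toy GFP bits -- values are illustrative, not the kernel's. */
#define TOY_DIRECT_RECLAIM (1u << 0)
#define TOY_KSWAPD_RECLAIM (1u << 1)

#define TOY_MAX_ORDER 10

/* Toy buddy allocator: free_blocks[i] = number of free blocks of order i.
 * Direct reclaim is counted in *reclaim_calls and is assumed to fail to
 * build a block of the requested order. */
static int toy_alloc(int free_blocks[], int order, unsigned int gfp,
                     int *reclaim_calls)
{
	if (free_blocks[order] > 0) {
		free_blocks[order]--;
		return 0;           /* page was ready, no reclaim needed */
	}
	if (gfp & TOY_DIRECT_RECLAIM)
		(*reclaim_calls)++; /* this is where the latency comes from */
	return -1;
}

/* Mimics the pool's descent from high order down to order 0.  With
 * clear_direct_reclaim set (this patch), direct reclaim is only allowed
 * for the final order-0 attempt. */
static int toy_pool_alloc(int free_blocks[], unsigned int gfp,
                          int clear_direct_reclaim, int *reclaim_calls)
{
	for (int order = TOY_MAX_ORDER; order >= 0; order--) {
		unsigned int flags = gfp;

		if (order && clear_direct_reclaim)
			flags &= ~TOY_DIRECT_RECLAIM;
		if (toy_alloc(free_blocks, order, flags, reclaim_calls) == 0)
			return order;
	}
	return -1;
}
```

With only an order-3 block free, the unpatched descent pays a direct-reclaim attempt at each of orders 10 through 4 before succeeding at order 3; the patched descent succeeds at order 3 with zero reclaim attempts, leaving background reclaim to kswapd.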
Re: [PATCH] drm: ttm: do not direct reclaim when allocating high order pages
Posted by Christian König 3 weeks, 1 day ago
On 10.09.25 13:59, Thadeu Lima de Souza Cascardo wrote:
> When the TTM pool tries to allocate new pages, it stats with max order. If
> there are no pages ready in the system, the page allocator will start
> reclaim. If direct reclaim fails, the allocator will reduce the order until
> it gets all the pages it wants with whatever order the allocator succeeds
> to reclaim.
> 
> However, while the allocator is reclaiming, lower order pages might be
> available, which would work just fine for the pool allocator. Doing direct
> reclaim just introduces latency in allocating memory.
> 
> The system should still start reclaiming in the background with kswapd, but
> the pool allocator should try to allocate a lower order page instead of
> directly reclaiming.
> 
> If not even a order-1 page is available, the TTM pool allocator will
> eventually get to start allocating order-0 pages, at which point it should
> and will directly reclaim.

Yeah, that was discussed quite a bit before, but at least for AMD GPUs that is absolutely not something we should do.

The performance difference between using high- and low-order pages can be up to 30%. So tolerating the added latency is vital for good performance.

We could of course make that depend on the HW you use if it isn't necessary for some other GPU, but at least both NVidia and Intel seem to have pretty much the same HW restrictions.

NVidia has been working on extending this to even use 1GiB pages to reduce the TLB overhead even further.

Regards,
Christian.


> 
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
> ---
>  drivers/gpu/drm/ttm/ttm_pool.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index baf27c70a4193a121fbc8b4e67cd6feb4c612b85..6124a53cd15634c833bce379093b557d2a2660fd 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -144,9 +144,11 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>  	 * Mapping pages directly into an userspace process and calling
>  	 * put_page() on a TTM allocated page is illegal.
>  	 */
> -	if (order)
> +	if (order) {
>  		gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
>  			__GFP_THISNODE;
> +		gfp_flags &= ~__GFP_DIRECT_RECLAIM;
> +	}
>  
>  	if (!pool->use_dma_alloc) {
>  		p = alloc_pages_node(pool->nid, gfp_flags, order);
> 
> ---
> base-commit: b320789d6883cc00ac78ce83bccbfe7ed58afcf0
> change-id: 20250909-ttm_pool_no_direct_reclaim-ee0807a2d3fe
> 
> Best regards,
Re: [PATCH] drm: ttm: do not direct reclaim when allocating high order pages
Posted by Thadeu Lima de Souza Cascardo 3 weeks, 1 day ago
On Wed, Sep 10, 2025 at 02:11:58PM +0200, Christian König wrote:
> On 10.09.25 13:59, Thadeu Lima de Souza Cascardo wrote:
> > When the TTM pool tries to allocate new pages, it stats with max order. If
> > there are no pages ready in the system, the page allocator will start
> > reclaim. If direct reclaim fails, the allocator will reduce the order until
> > it gets all the pages it wants with whatever order the allocator succeeds
> > to reclaim.
> > 
> > However, while the allocator is reclaiming, lower order pages might be
> > available, which would work just fine for the pool allocator. Doing direct
> > reclaim just introduces latency in allocating memory.
> > 
> > The system should still start reclaiming in the background with kswapd, but
> > the pool allocator should try to allocate a lower order page instead of
> > directly reclaiming.
> > 
> > If not even a order-1 page is available, the TTM pool allocator will
> > eventually get to start allocating order-0 pages, at which point it should
> > and will directly reclaim.
> 
> Yeah that was discussed before quite a bit but at least for AMD GPUs that is absolutely not something we should do.
> 
> The performance difference between using high and low order pages can be up to 30%. So the added extra latency is just vital for good performance.
> 
> We could of course make that depend on the HW you use if it isn't necessary for some other GPU, but at least both NVidia and Intel seem to have pretty much the same HW restrictions.
> 
> NVidia has been working on extending this to even use 1GiB pages to reduce the TLB overhead even further.
> 
> Regards,
> Christian.
> 

But if the system cannot reclaim, or is working hard on reclaiming, it will
not allocate that page, and the pool allocator will resort to lower-order
pages anyway.

If the system has pages available, it will use them. I think there is a
balance to strike here, and I find this one reasonable. If the system is
not under pressure, it will allocate those higher-order pages, as expected.

I can look into the behavior when the system is fragmented, but I still
believe the pool offers such protection by keeping those higher-order pages
around. It is when the system is under memory pressure that we need to
resort to lower-order pages.

What we are seeing here, on a low-memory (4GiB) single-node system with an
APU, is a lot of latency when allocating memory: direct reclaim tries to
satisfy order-10 allocations, fails, and steps down order by order until it
reaches order-4 or order-3. With this change, we no longer see those
latencies, and memory pressure goes down as well.
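[Editor's note: the descent described above can be quantified with a toy count. This is an illustrative model, not measured data; `reclaim_attempts` is a hypothetical helper, and the assumption is that direct reclaim is attempted once per failed order and the first satisfiable order succeeds immediately.]

```c
#include <assert.h>

/* Toy count of direct-reclaim attempts during the pool's descent from
 * order `max_order` when the first order with a free block is
 * `ready_order` (-1 if nothing is free).  `patched` models this change:
 * direct reclaim is only permitted at order 0. */
static int reclaim_attempts(int max_order, int ready_order, int patched)
{
	int attempts = 0;

	for (int order = max_order; order >= 0; order--) {
		if (order == ready_order)
			return attempts;  /* allocation succeeds here */
		if (!patched || order == 0)
			attempts++;       /* direct reclaim attempted */
	}
	return attempts;                  /* nothing free at any order */
}
```

In this model, the order-10-down-to-order-3 case above costs 7 direct-reclaim attempts before the patch and 0 after it, while the fully exhausted case still performs the single order-0 reclaim the commit message promises.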

Cascardo.

> 
> > 
> > Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
> > ---
> >  drivers/gpu/drm/ttm/ttm_pool.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> > index baf27c70a4193a121fbc8b4e67cd6feb4c612b85..6124a53cd15634c833bce379093b557d2a2660fd 100644
> > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > @@ -144,9 +144,11 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> >  	 * Mapping pages directly into an userspace process and calling
> >  	 * put_page() on a TTM allocated page is illegal.
> >  	 */
> > -	if (order)
> > +	if (order) {
> >  		gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
> >  			__GFP_THISNODE;
> > +		gfp_flags &= ~__GFP_DIRECT_RECLAIM;
> > +	}
> >  
> >  	if (!pool->use_dma_alloc) {
> >  		p = alloc_pages_node(pool->nid, gfp_flags, order);
> > 
> > ---
> > base-commit: b320789d6883cc00ac78ce83bccbfe7ed58afcf0
> > change-id: 20250909-ttm_pool_no_direct_reclaim-ee0807a2d3fe
> > 
> > Best regards,
> 
Re: [PATCH] drm: ttm: do not direct reclaim when allocating high order pages
Posted by Michel Dänzer 3 weeks ago
On 10.09.25 14:52, Thadeu Lima de Souza Cascardo wrote:
> On Wed, Sep 10, 2025 at 02:11:58PM +0200, Christian König wrote:
>> On 10.09.25 13:59, Thadeu Lima de Souza Cascardo wrote:
>>> When the TTM pool tries to allocate new pages, it stats with max order. If
>>> there are no pages ready in the system, the page allocator will start
>>> reclaim. If direct reclaim fails, the allocator will reduce the order until
>>> it gets all the pages it wants with whatever order the allocator succeeds
>>> to reclaim.
>>>
>>> However, while the allocator is reclaiming, lower order pages might be
>>> available, which would work just fine for the pool allocator. Doing direct
>>> reclaim just introduces latency in allocating memory.
>>>
>>> The system should still start reclaiming in the background with kswapd, but
>>> the pool allocator should try to allocate a lower order page instead of
>>> directly reclaiming.
>>>
>>> If not even a order-1 page is available, the TTM pool allocator will
>>> eventually get to start allocating order-0 pages, at which point it should
>>> and will directly reclaim.
>>
>> Yeah that was discussed before quite a bit but at least for AMD GPUs that is absolutely not something we should do.
>>
>> The performance difference between using high and low order pages can be up to 30%. So the added extra latency is just vital for good performance.
>>
>> We could of course make that depend on the HW you use if it isn't necessary for some other GPU, but at least both NVidia and Intel seem to have pretty much the same HW restrictions.
>>
>> NVidia has been working on extending this to even use 1GiB pages to reduce the TLB overhead even further.
> 
> But if the system cannot reclaim or is working hard on reclaiming, it will
> not allocate that page and the pool allocator will resort to lower order
> pages anyway.
> 
> In case the system has pages available, it will use them. I think there is
> a balance here and I find this one is reasonable. If the system is not
> under pressure, it will allocate those higher order pages, as expected.
> 
> I can look into the behavior when the system might be fragmented, but I
> still believe that the pool is offering such a protection by keeping those
> higher order pages around. It is when the system is under memory presure
> that we need to resort to lower order pages.
> 
> What we are seeing here is on a low memory (4GiB) single node system with
> an APU, that it will have lots of latencies trying to allocate memory by
> doing direct reclaim trying to allocate order-10 pages, which will fail and
> down it goes until it gets to order-4 or order-3. With this change, we
> don't see those latencies anymore and memory pressure goes down as well.

That reminds me of the scenario I described in the 00862edba135 ("drm/ttm: Use GFP_TRANSHUGE_LIGHT for allocating huge pages") commit log, where taking a filesystem backup could cause Firefox to freeze for on the order of a minute.

Something like that can't just be ignored as "not a problem" for a potential 30% performance gain.


-- 
Earthling Michel Dänzer       \        GNOME / Xwayland / Mesa developer
https://redhat.com             \               Libre software enthusiast
Re: [PATCH] drm: ttm: do not direct reclaim when allocating high order pages
Posted by Christian König 3 weeks ago
On 11.09.25 10:26, Michel Dänzer wrote:
> On 10.09.25 14:52, Thadeu Lima de Souza Cascardo wrote:
>> On Wed, Sep 10, 2025 at 02:11:58PM +0200, Christian König wrote:
>>> On 10.09.25 13:59, Thadeu Lima de Souza Cascardo wrote:
>>>> When the TTM pool tries to allocate new pages, it stats with max order. If
>>>> there are no pages ready in the system, the page allocator will start
>>>> reclaim. If direct reclaim fails, the allocator will reduce the order until
>>>> it gets all the pages it wants with whatever order the allocator succeeds
>>>> to reclaim.
>>>>
>>>> However, while the allocator is reclaiming, lower order pages might be
>>>> available, which would work just fine for the pool allocator. Doing direct
>>>> reclaim just introduces latency in allocating memory.
>>>>
>>>> The system should still start reclaiming in the background with kswapd, but
>>>> the pool allocator should try to allocate a lower order page instead of
>>>> directly reclaiming.
>>>>
>>>> If not even a order-1 page is available, the TTM pool allocator will
>>>> eventually get to start allocating order-0 pages, at which point it should
>>>> and will directly reclaim.
>>>
>>> Yeah that was discussed before quite a bit but at least for AMD GPUs that is absolutely not something we should do.
>>>
>>> The performance difference between using high and low order pages can be up to 30%. So the added extra latency is just vital for good performance.
>>>
>>> We could of course make that depend on the HW you use if it isn't necessary for some other GPU, but at least both NVidia and Intel seem to have pretty much the same HW restrictions.
>>>
>>> NVidia has been working on extending this to even use 1GiB pages to reduce the TLB overhead even further.
>>
>> But if the system cannot reclaim or is working hard on reclaiming, it will
>> not allocate that page and the pool allocator will resort to lower order
>> pages anyway.
>>
>> In case the system has pages available, it will use them. I think there is
>> a balance here and I find this one is reasonable. If the system is not
>> under pressure, it will allocate those higher order pages, as expected.
>>
>> I can look into the behavior when the system might be fragmented, but I
>> still believe that the pool is offering such a protection by keeping those
>> higher order pages around. It is when the system is under memory presure
>> that we need to resort to lower order pages.
>>
>> What we are seeing here is on a low memory (4GiB) single node system with
>> an APU, that it will have lots of latencies trying to allocate memory by
>> doing direct reclaim trying to allocate order-10 pages, which will fail and
>> down it goes until it gets to order-4 or order-3. With this change, we
>> don't see those latencies anymore and memory pressure goes down as well.
> That reminds me of the scenario I described in the 00862edba135 ("drm/ttm: Use GFP_TRANSHUGE_LIGHT for allocating huge pages") commit log, where taking a filesystem backup could cause Firefox to freeze for on the order of a minute.
> 
> Something like that can't just be ignored as "not a problem" for a potential 30% performance gain.

Well, using 2MiB pages is actually a must-have for certain HW features, and we have quite a lot of people pushing to always use them.

That TTM still falls back to lower-order allocations is just a compromise to avoid triggering the OOM killer.

What we could do is remove the fallback, but then Cascardo's use case wouldn't work at all any more.

Regards,
Christian.
Re: [PATCH] drm: ttm: do not direct reclaim when allocating high order pages
Posted by Michel Dänzer 3 weeks ago
On 11.09.25 11:07, Christian König wrote:
> On 11.09.25 10:26, Michel Dänzer wrote:
>> On 10.09.25 14:52, Thadeu Lima de Souza Cascardo wrote:
>>> On Wed, Sep 10, 2025 at 02:11:58PM +0200, Christian König wrote:
>>>> On 10.09.25 13:59, Thadeu Lima de Souza Cascardo wrote:
>>>>> When the TTM pool tries to allocate new pages, it stats with max order. If
>>>>> there are no pages ready in the system, the page allocator will start
>>>>> reclaim. If direct reclaim fails, the allocator will reduce the order until
>>>>> it gets all the pages it wants with whatever order the allocator succeeds
>>>>> to reclaim.
>>>>>
>>>>> However, while the allocator is reclaiming, lower order pages might be
>>>>> available, which would work just fine for the pool allocator. Doing direct
>>>>> reclaim just introduces latency in allocating memory.
>>>>>
>>>>> The system should still start reclaiming in the background with kswapd, but
>>>>> the pool allocator should try to allocate a lower order page instead of
>>>>> directly reclaiming.
>>>>>
>>>>> If not even a order-1 page is available, the TTM pool allocator will
>>>>> eventually get to start allocating order-0 pages, at which point it should
>>>>> and will directly reclaim.
>>>>
>>>> Yeah that was discussed before quite a bit but at least for AMD GPUs that is absolutely not something we should do.
>>>>
>>>> The performance difference between using high and low order pages can be up to 30%. So the added extra latency is just vital for good performance.
>>>>
>>>> We could of course make that depend on the HW you use if it isn't necessary for some other GPU, but at least both NVidia and Intel seem to have pretty much the same HW restrictions.
>>>>
>>>> NVidia has been working on extending this to even use 1GiB pages to reduce the TLB overhead even further.
>>>
>>> But if the system cannot reclaim or is working hard on reclaiming, it will
>>> not allocate that page and the pool allocator will resort to lower order
>>> pages anyway.
>>>
>>> In case the system has pages available, it will use them. I think there is
>>> a balance here and I find this one is reasonable. If the system is not
>>> under pressure, it will allocate those higher order pages, as expected.
>>>
>>> I can look into the behavior when the system might be fragmented, but I
>>> still believe that the pool is offering such a protection by keeping those
>>> higher order pages around. It is when the system is under memory presure
>>> that we need to resort to lower order pages.
>>>
>>> What we are seeing here is on a low memory (4GiB) single node system with
>>> an APU, that it will have lots of latencies trying to allocate memory by
>>> doing direct reclaim trying to allocate order-10 pages, which will fail and
>>> down it goes until it gets to order-4 or order-3. With this change, we
>>> don't see those latencies anymore and memory pressure goes down as well.
>> That reminds me of the scenario I described in the 00862edba135 ("drm/ttm: Use GFP_TRANSHUGE_LIGHT for allocating huge pages") commit log, where taking a filesystem backup could cause Firefox to freeze for on the order of a minute.
>>
>> Something like that can't just be ignored as "not a problem" for a potential 30% performance gain.
> 
> Well using 2MiB is actually a must have for certain HW features and we have quite a lot of people pushing to always using them.

Latency can't just be ignored though. Interactive apps intermittently freezing because this code desperately tries to reclaim huge pages while the system is under memory pressure isn't acceptable.


Maybe there could be some kind of mechanism which periodically scans BOs for sub-optimal page orders and tries migrating their storage to more optimal pages.


> So that TTM still falls back to lower order allocations is just a compromise to not trigger the OOM killer.
> 
> What we could do is to remove the fallback, but then Cascardos use case wouldn't be working any more at all.

Surely the issue is direct reclaim, not the fallback.


-- 
Earthling Michel Dänzer       \        GNOME / Xwayland / Mesa developer
https://redhat.com             \               Libre software enthusiast
Re: [PATCH] drm: ttm: do not direct reclaim when allocating high order pages
Posted by Christian König 3 weeks ago
On 11.09.25 14:49, Michel Dänzer wrote:
>>>> What we are seeing here is on a low memory (4GiB) single node system with
>>>> an APU, that it will have lots of latencies trying to allocate memory by
>>>> doing direct reclaim trying to allocate order-10 pages, which will fail and
>>>> down it goes until it gets to order-4 or order-3. With this change, we
>>>> don't see those latencies anymore and memory pressure goes down as well.
>>> That reminds me of the scenario I described in the 00862edba135 ("drm/ttm: Use GFP_TRANSHUGE_LIGHT for allocating huge pages") commit log, where taking a filesystem backup could cause Firefox to freeze for on the order of a minute.
>>>
>>> Something like that can't just be ignored as "not a problem" for a potential 30% performance gain.
>>
>> Well using 2MiB is actually a must have for certain HW features and we have quite a lot of people pushing to always using them.
> 
> Latency can't just be ignored though. Interactive apps intermittently freezing because this code desperately tries to reclaim huge pages while the system is under memory pressure isn't acceptable.

Why should that not be acceptable?

The purpose of the fallback is to allow displaying messages like "Your system is low on memory, please close some application!" instead of triggering the OOM killer directly.

In that situation latency is not really a priority any more, but rather functionality.

> Maybe there could be some kind of mechanism which periodically scans BOs for sub-optimal page orders and tries migrating their storage to more optimal pages.

Well, the problem usually happens because automatic page de-fragmentation is turned off; we had quite a number of bug reports about that.

So you are basically suggesting to implement something at the BO level which the system administrator has previously turned off at the page level.

On the other hand, in this particular case it could be that the system simply doesn't have enough memory for the particular use case.

>> So that TTM still falls back to lower order allocations is just a compromise to not trigger the OOM killer.
>>
>> What we could do is to remove the fallback, but then Cascardos use case wouldn't be working any more at all.
> 
> Surely the issue is direct reclaim, not the fallback.

I would rather say the issue is that the fallback makes people think that direct reclaim isn't mandatory.

Regards,
Christian.
Re: [PATCH] drm: ttm: do not direct reclaim when allocating high order pages
Posted by Michel Dänzer 3 weeks ago
On 11.09.25 16:31, Christian König wrote:
> On 11.09.25 14:49, Michel Dänzer wrote:
>>>>> What we are seeing here is on a low memory (4GiB) single node system with
>>>>> an APU, that it will have lots of latencies trying to allocate memory by
>>>>> doing direct reclaim trying to allocate order-10 pages, which will fail and
>>>>> down it goes until it gets to order-4 or order-3. With this change, we
>>>>> don't see those latencies anymore and memory pressure goes down as well.
>>>> That reminds me of the scenario I described in the 00862edba135 ("drm/ttm: Use GFP_TRANSHUGE_LIGHT for allocating huge pages") commit log, where taking a filesystem backup could cause Firefox to freeze for on the order of a minute.
>>>>
>>>> Something like that can't just be ignored as "not a problem" for a potential 30% performance gain.
>>>
>>> Well using 2MiB is actually a must have for certain HW features and we have quite a lot of people pushing to always using them.
>>
>> Latency can't just be ignored though. Interactive apps intermittently freezing because this code desperately tries to reclaim huge pages while the system is under memory pressure isn't acceptable.
> 
> Why should that not be acceptable?

Sounds like you didn't read / understand the scenario in the 00862edba135 commit log:

I was trying to use Firefox while restic was taking a filesystem backup, and it froze for up to a minute. After disabling direct reclaim, Firefox was perfectly usable without noticeable freezes in the same scenario.

Show me the user who finds it acceptable to wait for a minute for interactive apps to respond, just in case some GPU operations might be 30% faster.


> The purpose of the fallback is to allow displaying messages like "Your system is low on memory, please close some application!" instead of triggering the OOM killer directly.

That's not the issue here.


-- 
Earthling Michel Dänzer       \        GNOME / Xwayland / Mesa developer
https://redhat.com             \               Libre software enthusiast
Re: [PATCH] drm: ttm: do not direct reclaim when allocating high order pages
Posted by Christian König 3 weeks ago
On 11.09.25 16:48, Michel Dänzer wrote:
> On 11.09.25 16:31, Christian König wrote:
>> On 11.09.25 14:49, Michel Dänzer wrote:
>>>>>> What we are seeing here is on a low memory (4GiB) single node system with
>>>>>> an APU, that it will have lots of latencies trying to allocate memory by
>>>>>> doing direct reclaim trying to allocate order-10 pages, which will fail and
>>>>>> down it goes until it gets to order-4 or order-3. With this change, we
>>>>>> don't see those latencies anymore and memory pressure goes down as well.
>>>>> That reminds me of the scenario I described in the 00862edba135 ("drm/ttm: Use GFP_TRANSHUGE_LIGHT for allocating huge pages") commit log, where taking a filesystem backup could cause Firefox to freeze for on the order of a minute.
>>>>>
>>>>> Something like that can't just be ignored as "not a problem" for a potential 30% performance gain.
>>>>
>>>> Well using 2MiB is actually a must have for certain HW features and we have quite a lot of people pushing to always using them.
>>>
>>> Latency can't just be ignored though. Interactive apps intermittently freezing because this code desperately tries to reclaim huge pages while the system is under memory pressure isn't acceptable.
>>
>> Why should that not be acceptable?
> 
> Sounds like you didn't read / understand the scenario in the 00862edba135 commit log:
> 
> I was trying to use Firefox while restic was taking a filesystem backup, and it froze for up to a minute. After disabling direct reclaim, Firefox was perfectly usable without noticeable freezes in the same scenario.
> 
> Show me the user who finds it acceptable to wait for a minute for interactive apps to respond, just in case some GPU operations might be 30% faster.

Ok, granted, a minute is rather extreme. But IIRC the issue you described was solved by using __GFP_NORETRY; what is proposed here is completely disabling direct reclaim.

As far as I know, with __GFP_NORETRY set, the direct reclaim path results in latency in the milliseconds range.

The key point is that we tried completely disabling direct reclaim before, and that made the datacenter customers scream because performance became totally unstable.

E.g. compiling software first and then running a benchmark was around 30% slower than running the benchmark directly after boot.

Regards,
Christian.
Re: [PATCH] drm: ttm: do not direct reclaim when allocating high order pages
Posted by Christian König 3 weeks, 1 day ago
On 10.09.25 14:52, Thadeu Lima de Souza Cascardo wrote:
> On Wed, Sep 10, 2025 at 02:11:58PM +0200, Christian König wrote:
>> On 10.09.25 13:59, Thadeu Lima de Souza Cascardo wrote:
>>> When the TTM pool tries to allocate new pages, it stats with max order. If
>>> there are no pages ready in the system, the page allocator will start
>>> reclaim. If direct reclaim fails, the allocator will reduce the order until
>>> it gets all the pages it wants with whatever order the allocator succeeds
>>> to reclaim.
>>>
>>> However, while the allocator is reclaiming, lower order pages might be
>>> available, which would work just fine for the pool allocator. Doing direct
>>> reclaim just introduces latency in allocating memory.
>>>
>>> The system should still start reclaiming in the background with kswapd, but
>>> the pool allocator should try to allocate a lower order page instead of
>>> directly reclaiming.
>>>
>>> If not even a order-1 page is available, the TTM pool allocator will
>>> eventually get to start allocating order-0 pages, at which point it should
>>> and will directly reclaim.
>>
>> Yeah that was discussed before quite a bit but at least for AMD GPUs that is absolutely not something we should do.
>>
>> The performance difference between using high and low order pages can be up to 30%. So the added extra latency is just vital for good performance.
>>
>> We could of course make that depend on the HW you use if it isn't necessary for some other GPU, but at least both NVidia and Intel seem to have pretty much the same HW restrictions.
>>
>> NVidia has been working on extending this to even use 1GiB pages to reduce the TLB overhead even further.
>>
>> Regards,
>> Christian.
>>
> 
> But if the system cannot reclaim or is working hard on reclaiming, it will
> not allocate that page and the pool allocator will resort to lower order
> pages anyway.
> 
> In case the system has pages available, it will use them. I think there is
> a balance here and I find this one is reasonable. If the system is not
> under pressure, it will allocate those higher order pages, as expected.

Well that is not even remotely correct.

We have seen all kinds of problems with this, especially on Fedora, where automatic de-fragmentation is disabled by default.

The result is that a use case which causes strong memory fragmentation massively affects the performance of GPU-based applications run later, even when reclaim is enabled.

Disabling this would massively worsen the problem. Falling back to lower-order pages is basically just a workaround to avoid the OOM killer under heavy memory fragmentation.

> I can look into the behavior when the system might be fragmented, but I
> still believe that the pool is offering such a protection by keeping those
> higher order pages around. It is when the system is under memory presure
> that we need to resort to lower order pages.
>
> What we are seeing here is on a low memory (4GiB) single node system with
> an APU, that it will have lots of latencies trying to allocate memory by
> doing direct reclaim trying to allocate order-10 pages, which will fail and
> down it goes until it gets to order-4 or order-3. With this change, we
> don't see those latencies anymore and memory pressure goes down as well.

Yeah, and that you see memory pressure going down is a clear indicator that something is going wrong here.

If this is for an AMD based GPU then that is an absolutely clear no-go from my side.

Regards,
Christian.

> 
> Cascardo.
> 
>>
>>>
>>> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
>>> ---
>>>  drivers/gpu/drm/ttm/ttm_pool.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
>>> index baf27c70a4193a121fbc8b4e67cd6feb4c612b85..6124a53cd15634c833bce379093b557d2a2660fd 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>>> @@ -144,9 +144,11 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>>>  	 * Mapping pages directly into an userspace process and calling
>>>  	 * put_page() on a TTM allocated page is illegal.
>>>  	 */
>>> -	if (order)
>>> +	if (order) {
>>>  		gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
>>>  			__GFP_THISNODE;
>>> +		gfp_flags &= ~__GFP_DIRECT_RECLAIM;
>>> +	}
>>>  
>>>  	if (!pool->use_dma_alloc) {
>>>  		p = alloc_pages_node(pool->nid, gfp_flags, order);
>>>
>>> ---
>>> base-commit: b320789d6883cc00ac78ce83bccbfe7ed58afcf0
>>> change-id: 20250909-ttm_pool_no_direct_reclaim-ee0807a2d3fe
>>>
>>> Best regards,
>>