[PATCH v2 0/3] dma-buf: heaps: cma: enable dmem cgroup accounting
Posted by Eric Chanudet 1 month, 1 week ago
An earlier series[1] from Maxime introduced dmem to the cma allocator in
an attempt to use it generally for dma-buf. Restart from there and apply
the charge in the narrower context of the CMA dma-buf heap instead.

In line with the introduction of cgroup accounting for the system heap[2], this
behavior is gated by the dma_heap.mem_accounting parameter and disabled by default.

dmem is chosen for CMA heaps as it allows limits to be set for each
region backing each heap. The charge is only applied in the dma-buf heap
for now, as that guarantees it can be accounted against the userspace
process that requested the allocation.
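For illustration, a hedged sketch of how this could be driven from userspace once the series is applied; the region name, the sysfs paths, and the exact dmem.max line format below are assumptions rather than part of this series (the dmem interface files are described in Documentation/admin-guide/cgroup-v2.rst):

```shell
# Hypothetical usage sketch: the "linux,cma" region name and the paths
# are assumptions; they depend on the platform's CMA configuration.
# Enable accounting in the dma-buf heap core (disabled by default):
echo 1 > /sys/module/dma_heap/parameters/mem_accounting
# Cap a cgroup's allocations from the CMA region backing the heap:
mkdir -p /sys/fs/cgroup/camera
echo "cma/linux,cma 67108864" > /sys/fs/cgroup/camera/dmem.max  # 64 MiB
echo "$$" > /sys/fs/cgroup/camera/cgroup.procs
```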

[1] https://lore.kernel.org/all/20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.org/
[2] https://lore.kernel.org/all/20260116-dmabuf-heap-system-memcg-v3-0-ecc6b62cc446@redhat.com/

Signed-off-by: Eric Chanudet <echanude@redhat.com>
---
Changes in v2:
- Rebase on Maxime's introduction of dmem to the cma allocator:
  https://lore.kernel.org/all/20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.org/
- Remove the dmem region registration from the cma dma-buf heap
- Remove the misplaced logic for the default region.
- Link to v1: https://lore.kernel.org/r/20260130-dmabuf-heap-cma-dmem-v1-1-3647ea993e99@redhat.com

---
Eric Chanudet (1):
      dma-buf: heaps: cma: charge each cma heap's dmem

Maxime Ripard (2):
      cma: Register dmem region for each cma region
      cma: Provide accessor to cma dmem region

 drivers/dma-buf/heaps/cma_heap.c | 15 ++++++++++++++-
 include/linux/cma.h              |  9 +++++++++
 mm/cma.c                         | 20 +++++++++++++++++++-
 mm/cma.h                         |  3 +++
 4 files changed, 45 insertions(+), 2 deletions(-)
---
base-commit: 948e195dfaa56e48eabda591f97630502ff7e27e
change-id: 20260128-dmabuf-heap-cma-dmem-f4120a2df4a8

Best regards,
-- 
Eric Chanudet <echanude@redhat.com>
Re: [PATCH v2 0/3] dma-buf: heaps: cma: enable dmem cgroup accounting
Posted by T.J. Mercier 1 month, 1 week ago
On Wed, Feb 18, 2026 at 9:15 AM Eric Chanudet <echanude@redhat.com> wrote:

Hi Eric,

> An earlier series[1] from Maxime introduced dmem to the cma allocator in
> an attempt to use it generally for dma-buf. Restart from there and apply
> the charge in the narrower context of the CMA dma-buf heap instead.
>
> In line with introducing cgroup to the system heap[2], this behavior is
> enabled based on dma_heap.mem_accounting, disabled by default.
>
> dmem is chosen for CMA heaps as it allows limits to be set for each
> region backing each heap. The charge is only put in the dma-buf heap for
> now as it guaranties it can be accounted against a userspace process
> that requested the allocation.

But CMA memory is system memory, and regular (non-CMA) movable
allocations can occur out of these CMA areas. So this splits system
memory accounting between memcg (from [2]) and dmem. If I want to put
a limit on system memory use I have to adjust multiple limits (memcg +
dmems) and know how to divide the total between them all.
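Concretely, the split described above might look like this (paths, region name,
and dmem.max line format are hypothetical): a single system-memory budget has to
be divided by hand across two controllers:

```shell
# Hypothetical illustration of the accounting split: to cap a workload
# at 1 GiB of system memory, the administrator must partition the budget
# manually, since neither controller sees the other's charges.
CG=/sys/fs/cgroup/app
echo $((768 * 1024 * 1024)) > "$CG/memory.max"               # memcg share
echo "cma/linux,cma $((256 * 1024 * 1024))" > "$CG/dmem.max" # dmem share
```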

How do you envision using this combination of different controllers?

Thanks,
T.J.

> [1] https://lore.kernel.org/all/20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.org/
> [2] https://lore.kernel.org/all/20260116-dmabuf-heap-system-memcg-v3-0-ecc6b62cc446@redhat.com/
>
> Signed-off-by: Eric Chanudet <echanude@redhat.com>
> ---
> Changes in v2:
> - Rebase on Maxime's introduction of dmem to the cma allocator:
>   https://lore.kernel.org/all/20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.org/
> - Remove the dmem region registration from the cma dma-buf heap
> - Remove the misplaced logic for the default region.
> - Link to v1: https://lore.kernel.org/r/20260130-dmabuf-heap-cma-dmem-v1-1-3647ea993e99@redhat.com
>
> ---
> Eric Chanudet (1):
>       dma-buf: heaps: cma: charge each cma heap's dmem
>
> Maxime Ripard (2):
>       cma: Register dmem region for each cma region
>       cma: Provide accessor to cma dmem region
>
>  drivers/dma-buf/heaps/cma_heap.c | 15 ++++++++++++++-
>  include/linux/cma.h              |  9 +++++++++
>  mm/cma.c                         | 20 +++++++++++++++++++-
>  mm/cma.h                         |  3 +++
>  4 files changed, 45 insertions(+), 2 deletions(-)
> ---
> base-commit: 948e195dfaa56e48eabda591f97630502ff7e27e
> change-id: 20260128-dmabuf-heap-cma-dmem-f4120a2df4a8
>
> Best regards,
> --
> Eric Chanudet <echanude@redhat.com>
>
Re: [PATCH v2 0/3] dma-buf: heaps: cma: enable dmem cgroup accounting
Posted by Maxime Ripard 1 month, 1 week ago
Hi TJ,

On Thu, Feb 19, 2026 at 05:14:42PM -0800, T.J. Mercier wrote:
> On Wed, Feb 18, 2026 at 9:15 AM Eric Chanudet <echanude@redhat.com> wrote:
> > An earlier series[1] from Maxime introduced dmem to the cma allocator in
> > an attempt to use it generally for dma-buf. Restart from there and apply
> > the charge in the narrower context of the CMA dma-buf heap instead.
> >
> > In line with introducing cgroup to the system heap[2], this behavior is
> > enabled based on dma_heap.mem_accounting, disabled by default.
> >
> > dmem is chosen for CMA heaps as it allows limits to be set for each
> > region backing each heap. The charge is only put in the dma-buf heap for
> > now as it guaranties it can be accounted against a userspace process
> > that requested the allocation.
> 
> But CMA memory is system memory, and regular (non-CMA) movable
> allocations can occur out of these CMA areas. So this splits system
> memory accounting between memcg (from [2]) and dmem. If I want to put
> a limit on system memory use I have to adjust multiple limits (memcg +
> dmems) and know how to divide the total between them all.
> 
> How do you envision using this combination of different controllers?

I feel like it can be argued either way, and I don't really see a way
out of supporting both.

Like you pointed out, CMA can indeed be seen as system memory, but it's
also a limited pool that you might want to place arbitrary limits on
since, unlike system memory, it can't be reclaimed, will not trigger the
OOM killer, and is generally a much scarcer resource.

Maxime
Re: [PATCH v2 0/3] dma-buf: heaps: cma: enable dmem cgroup accounting
Posted by T.J. Mercier 1 month ago
On Tue, Feb 24, 2026 at 1:42 AM Maxime Ripard <mripard@redhat.com> wrote:
>
> Hi TJ,
>
> On Thu, Feb 19, 2026 at 05:14:42PM -0800, T.J. Mercier wrote:
> > On Wed, Feb 18, 2026 at 9:15 AM Eric Chanudet <echanude@redhat.com> wrote:
> > > An earlier series[1] from Maxime introduced dmem to the cma allocator in
> > > an attempt to use it generally for dma-buf. Restart from there and apply
> > > the charge in the narrower context of the CMA dma-buf heap instead.
> > >
> > > In line with introducing cgroup to the system heap[2], this behavior is
> > > enabled based on dma_heap.mem_accounting, disabled by default.
> > >
> > > dmem is chosen for CMA heaps as it allows limits to be set for each
> > > region backing each heap. The charge is only put in the dma-buf heap for
> > > now as it guaranties it can be accounted against a userspace process
> > > that requested the allocation.
> >
> > But CMA memory is system memory, and regular (non-CMA) movable
> > allocations can occur out of these CMA areas. So this splits system
> > memory accounting between memcg (from [2]) and dmem. If I want to put
> > a limit on system memory use I have to adjust multiple limits (memcg +
> > dmems) and know how to divide the total between them all.
> >
> > How do you envision using this combination of different controllers?
>
> I feel like it can be argued either way, and I don't really see a way
> out of supporting both.
>
> Like you pointed out, CMA can indeed be seen as system memory, but it's
> also a limited pool that you might want to place arbitrary limits on
> since, unlike system memory, it can't be reclaimed, will not trigger the
> OOM killer, and more generally is a much more sparse resource.

OK, thanks. Yeah, I guess we'll just have to add the accounting
complexity as needed to satisfy everyone's different needs.
Re: [PATCH v2 0/3] dma-buf: heaps: cma: enable dmem cgroup accounting
Posted by Christian König 1 month, 1 week ago
On 2/20/26 02:14, T.J. Mercier wrote:
> On Wed, Feb 18, 2026 at 9:15 AM Eric Chanudet <echanude@redhat.com> wrote:
> 
> Hi Eric,
> 
>> An earlier series[1] from Maxime introduced dmem to the cma allocator in
>> an attempt to use it generally for dma-buf. Restart from there and apply
>> the charge in the narrower context of the CMA dma-buf heap instead.
>>
>> In line with introducing cgroup to the system heap[2], this behavior is
>> enabled based on dma_heap.mem_accounting, disabled by default.
>>
>> dmem is chosen for CMA heaps as it allows limits to be set for each
>> region backing each heap. The charge is only put in the dma-buf heap for
>> now as it guaranties it can be accounted against a userspace process
>> that requested the allocation.
> 
> But CMA memory is system memory, and regular (non-CMA) movable
> allocations can occur out of these CMA areas. So this splits system
> memory accounting between memcg (from [2]) and dmem. If I want to put
> a limit on system memory use I have to adjust multiple limits (memcg +
> dmems) and know how to divide the total between them all.
> 
> How do you envision using this combination of different controllers?

Yeah we have this problem pretty much everywhere.

There are use cases where you want to account device allocations to memcg and use cases where you don't.

From what I know at the moment, it would be best if the administrator could say for each dmem region whether it should additionally be accounted to memcg or not.

Using module parameters to enable/disable it globally is just a workaround as far as I can see.

Regards,
Christian.

> 
> Thanks,
> T.J.
> 
>> [1] https://lore.kernel.org/all/20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.org/
>> [2] https://lore.kernel.org/all/20260116-dmabuf-heap-system-memcg-v3-0-ecc6b62cc446@redhat.com/
>>
>> Signed-off-by: Eric Chanudet <echanude@redhat.com>
>> ---
>> Changes in v2:
>> - Rebase on Maxime's introduction of dmem to the cma allocator:
>>   https://lore.kernel.org/all/20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.org/
>> - Remove the dmem region registration from the cma dma-buf heap
>> - Remove the misplaced logic for the default region.
>> - Link to v1: https://lore.kernel.org/r/20260130-dmabuf-heap-cma-dmem-v1-1-3647ea993e99@redhat.com
>>
>> ---
>> Eric Chanudet (1):
>>       dma-buf: heaps: cma: charge each cma heap's dmem
>>
>> Maxime Ripard (2):
>>       cma: Register dmem region for each cma region
>>       cma: Provide accessor to cma dmem region
>>
>>  drivers/dma-buf/heaps/cma_heap.c | 15 ++++++++++++++-
>>  include/linux/cma.h              |  9 +++++++++
>>  mm/cma.c                         | 20 +++++++++++++++++++-
>>  mm/cma.h                         |  3 +++
>>  4 files changed, 45 insertions(+), 2 deletions(-)
>> ---
>> base-commit: 948e195dfaa56e48eabda591f97630502ff7e27e
>> change-id: 20260128-dmabuf-heap-cma-dmem-f4120a2df4a8
>>
>> Best regards,
>> --
>> Eric Chanudet <echanude@redhat.com>
>>

Re: [PATCH v2 0/3] dma-buf: heaps: cma: enable dmem cgroup accounting
Posted by Maxime Ripard 1 month, 1 week ago
Hi Christian,

On Fri, Feb 20, 2026 at 10:45:08AM +0100, Christian König wrote:
> On 2/20/26 02:14, T.J. Mercier wrote:
> > On Wed, Feb 18, 2026 at 9:15 AM Eric Chanudet <echanude@redhat.com> wrote:
> > 
> > Hi Eric,
> > 
> >> An earlier series[1] from Maxime introduced dmem to the cma allocator in
> >> an attempt to use it generally for dma-buf. Restart from there and apply
> >> the charge in the narrower context of the CMA dma-buf heap instead.
> >>
> >> In line with introducing cgroup to the system heap[2], this behavior is
> >> enabled based on dma_heap.mem_accounting, disabled by default.
> >>
> >> dmem is chosen for CMA heaps as it allows limits to be set for each
> >> region backing each heap. The charge is only put in the dma-buf heap for
> >> now as it guaranties it can be accounted against a userspace process
> >> that requested the allocation.
> > 
> > But CMA memory is system memory, and regular (non-CMA) movable
> > allocations can occur out of these CMA areas. So this splits system
> > memory accounting between memcg (from [2]) and dmem. If I want to put
> > a limit on system memory use I have to adjust multiple limits (memcg +
> > dmems) and know how to divide the total between them all.
> > 
> > How do you envision using this combination of different controllers?
> 
> Yeah we have this problem pretty much everywhere.
> 
> There are both use cases where you want to account device allocations
> to memcg and when you don't want that.
> 
> From what I know at the moment it would be best if the administrator
> could say for each dmem if it should account additionally to memcg or
> not.
> 
> Using module parameters to enable/disable it globally is just a
> workaround as far as I can see.

That's a pretty good idea! It would indeed be a solution that could
satisfy everyone (I assume?).

Maxime
Re: [PATCH v2 0/3] dma-buf: heaps: cma: enable dmem cgroup accounting
Posted by Christian König 1 month, 1 week ago
On 2/24/26 10:43, Maxime Ripard wrote:
> Hi Christian,
> 
> On Fri, Feb 20, 2026 at 10:45:08AM +0100, Christian König wrote:
>> On 2/20/26 02:14, T.J. Mercier wrote:
>>> On Wed, Feb 18, 2026 at 9:15 AM Eric Chanudet <echanude@redhat.com> wrote:
>>>
>>> Hi Eric,
>>>
>>>> An earlier series[1] from Maxime introduced dmem to the cma allocator in
>>>> an attempt to use it generally for dma-buf. Restart from there and apply
>>>> the charge in the narrower context of the CMA dma-buf heap instead.
>>>>
>>>> In line with introducing cgroup to the system heap[2], this behavior is
>>>> enabled based on dma_heap.mem_accounting, disabled by default.
>>>>
>>>> dmem is chosen for CMA heaps as it allows limits to be set for each
>>>> region backing each heap. The charge is only put in the dma-buf heap for
>>>> now as it guaranties it can be accounted against a userspace process
>>>> that requested the allocation.
>>>
>>> But CMA memory is system memory, and regular (non-CMA) movable
>>> allocations can occur out of these CMA areas. So this splits system
>>> memory accounting between memcg (from [2]) and dmem. If I want to put
>>> a limit on system memory use I have to adjust multiple limits (memcg +
>>> dmems) and know how to divide the total between them all.
>>>
>>> How do you envision using this combination of different controllers?
>>
>> Yeah we have this problem pretty much everywhere.
>>
>> There are both use cases where you want to account device allocations
>> to memcg and when you don't want that.
>>
>> From what I know at the moment it would be best if the administrator
>> could say for each dmem if it should account additionally to memcg or
>> not.
>>
>> Using module parameters to enable/disable it globally is just a
>> workaround as far as I can see.
> 
> That's a pretty good idea! It would indeed be a solution that could
> satisfy everyone (I assume?).

I think so yeah.

From what I have seen we have three different use cases:

1. Local device memory (VRAM), GTT/CMA and memcg are completely separate domains, and you want completely separate limit values for them.

2. Local device memory (VRAM) is separate; GTT/CMA are accounted to memcg. You can still set separate limit values so that nobody over-allocates CMA (for example).

3. All three are accounted to memcg, because system memory is actually used as a fallback if applications over-allocate device-local memory.

It's debatable what the default should be, but we clearly need to handle all three use cases, potentially even on the same system.
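As a purely hypothetical sketch (no such knob exists today), a per-region
switch along the lines suggested earlier in the thread could select between
the three cases; the knob name, region names, and line format here are all
invented for illustration:

```shell
# Purely hypothetical: a per-region "also charge to memcg" switch.
# Neither the dmem.memcg_accounting file nor these region names exist;
# this only sketches how the three use cases could be chosen per region.
CG=/sys/fs/cgroup/app
# 1. Fully separate domains: no region double-charges to memcg.
echo "drm/card0/vram0 0" > "$CG/dmem.memcg_accounting"
echo "cma/linux,cma 0"   > "$CG/dmem.memcg_accounting"
# 2. VRAM separate, GTT/CMA additionally charged to memcg:
echo "cma/linux,cma 1"   > "$CG/dmem.memcg_accounting"
# 3. Everything additionally charged to memcg:
echo "drm/card0/vram0 1" > "$CG/dmem.memcg_accounting"
```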

Regards,
Christian.

> 
> Maxime

Re: [PATCH v2 0/3] dma-buf: heaps: cma: enable dmem cgroup accounting
Posted by Dave Airlie 1 month ago
On Tue, 24 Feb 2026 at 20:32, Christian König <christian.koenig@amd.com> wrote:
>
> On 2/24/26 10:43, Maxime Ripard wrote:
> > Hi Christian,
> >
> > On Fri, Feb 20, 2026 at 10:45:08AM +0100, Christian König wrote:
> >> On 2/20/26 02:14, T.J. Mercier wrote:
> >>> On Wed, Feb 18, 2026 at 9:15 AM Eric Chanudet <echanude@redhat.com> wrote:
> >>>
> >>> Hi Eric,
> >>>
> >>>> An earlier series[1] from Maxime introduced dmem to the cma allocator in
> >>>> an attempt to use it generally for dma-buf. Restart from there and apply
> >>>> the charge in the narrower context of the CMA dma-buf heap instead.
> >>>>
> >>>> In line with introducing cgroup to the system heap[2], this behavior is
> >>>> enabled based on dma_heap.mem_accounting, disabled by default.
> >>>>
> >>>> dmem is chosen for CMA heaps as it allows limits to be set for each
> >>>> region backing each heap. The charge is only put in the dma-buf heap for
> >>>> now as it guaranties it can be accounted against a userspace process
> >>>> that requested the allocation.
> >>>
> >>> But CMA memory is system memory, and regular (non-CMA) movable
> >>> allocations can occur out of these CMA areas. So this splits system
> >>> memory accounting between memcg (from [2]) and dmem. If I want to put
> >>> a limit on system memory use I have to adjust multiple limits (memcg +
> >>> dmems) and know how to divide the total between them all.
> >>>
> >>> How do you envision using this combination of different controllers?
> >>
> >> Yeah we have this problem pretty much everywhere.
> >>
> >> There are both use cases where you want to account device allocations
> >> to memcg and when you don't want that.
> >>
> >> From what I know at the moment it would be best if the administrator
> >> could say for each dmem if it should account additionally to memcg or
> >> not.
> >>
> >> Using module parameters to enable/disable it globally is just a
> >> workaround as far as I can see.
> >
> > That's a pretty good idea! It would indeed be a solution that could
> > satisfy everyone (I assume?).
>
> I think so yeah.
>
> From what I have seen we have three different use cases:
>
> 1. local device memory (VRAM), GTT/CMA and memcg are completely separate domains and you want to have completely separate values as limit for them.
>
> 2. local device memory (VRAM) is separate. GTT/CMA are accounted to memcg, you can still have separate values as limit so that nobody over allocates CMA (for example).
>
> 3. All three are accounted to memcg because system memory is actually used as fallback if applications over allocate device local memory.
>
> It's debatable what should be the default, but we clearly need to handle all three use cases. Potentially even on the same system.


Give me cases where 1 or 3 actually make sense in the real world.

I can maybe take 1 if CMA is just old-school CMA carved out pre-boot, so
it's not in the main memory pool, but in that case it's basically
equivalent to device memory.

If something is in the main memory pool, it should be accounted for
using memcg. You cannot remove memory from the main memory pool
without accounting for it. Now we can add GPU limits to memcg; that
was going to be the next step in my series.

Whether we have that as a percentage or a hard limit, we would just
say GPU can consume 95% of the configured max for this cgroup.

3 to me just sounds like we haven't figured out fallback or
suspend/resume accounting yet, which is true, but I'm not sure there
is a reason for 3 to exist outside of the fact that we don't know how
to account for temporary storage of swapped-out VRAM objects.

It might be that we need a limited transfer pool of system memory for
VRAM objects to "live in", but we move them to swap as soon as
possible once we hit the limit on that pool. What we do on systems
where no swap is available gets into I've-no-idea space.

Statically partitioning memory between dmem and memcg isn't going to
solve this; we should solve it inside memcg.

Dave.
Re: [PATCH v2 0/3] dma-buf: heaps: cma: enable dmem cgroup accounting
Posted by Christian König 1 month ago
On 2/26/26 00:43, Dave Airlie wrote:
>>>>
>>>> Using module parameters to enable/disable it globally is just a
>>>> workaround as far as I can see.
>>>
>>> That's a pretty good idea! It would indeed be a solution that could
>>> satisfy everyone (I assume?).
>>
>> I think so yeah.
>>
>> From what I have seen we have three different use cases:
>>
>> 1. local device memory (VRAM), GTT/CMA and memcg are completely separate domains and you want to have completely separate values as limit for them.
>>
>> 2. local device memory (VRAM) is separate. GTT/CMA are accounted to memcg, you can still have separate values as limit so that nobody over allocates CMA (for example).
>>
>> 3. All three are accounted to memcg because system memory is actually used as fallback if applications over allocate device local memory.
>>
>> It's debatable what should be the default, but we clearly need to handle all three use cases. Potentially even on the same system.
> 
> 
> Give me cases where 1 or 3 actually make sense in the real world.
> 
> I can maybe take 1 if CMA is just old school CMA carved out preboot so
> it's not in the main memory pool, but in that case it's just equiv to
> device memory really

Well I think #1 is pretty much the default for dGPUs on a desktop. That's why I mentioned it first.

> If something is in the main memory pool, it should be accounted for
> using memcg. You cannot remove memory from the main memory pool
> without accounting for it.

That's what I'm strongly disagreeing with. See, the page cache is not accounted to memcg either, so when you open a file and the kernel caches the backing pages, that doesn't reduce the amount you can allocate through malloc, does it?

For dGPUs, GTT is basically just the fallback when you over-allocate local memory (plus a few things for uploads).

In other words system memory becomes the swap of device local memory. Just think about why memcg doesn't limits swap but only how much is swapped out.

For those use cases you want a hard static limit on how much system memory can be used as swap. That's why we originally used to have the per-driver gttsize, the global TTM page limit, etc.

The problem is that we weakened those limitations because of the APU use case, and that in turn resulted in all those problems with browsers over-allocating system memory, etc.

Now cgroups should provide an alternative, and I still think this is the right approach to solve this, but within this alternative I think we want to preserve the original idea of separate domains for dGPUs.

> Now we can add gpu limits to memcg, that
> was going to me a next step in my series.
> 
> Whether we have that as a percentage or a hard limit, we would just
> say GPU can consume 95% of the configured max for this cgroup.

That is only useful on APUs which don't have local memory because those make all of their allocations through system memory.

dGPUs should be much more limited in that regard.

> 3 to me just sounds like we haven't figured out fallback or
> suspend/resume accounting yet, which is true, but I'm not sure there
> is a reason for 3 to exist outside of the we don't know how to account
> for temporary storage of swapped out VRAM objects.

Mario has fixed or is at least working on the suspend/resume problems. So I don't consider that an issue any more.

The use case 3 happens on HPC systems where device local memory is basically just a cache. For example this one here: https://en.wikipedia.org/wiki/Frontier_(supercomputer)

In this use case you don't care whether a buffer is in device-local memory or system memory; what you care about is that things are reliable, and for that the task at hand shouldn't exceed a certain limit.

E.g. you run computation A, which can use 100GB of resources, and when computation B starts concurrently you don't want A to suddenly fail because it now fights with B for resources.

> Like it might be we need to have it so we have a limited transfer pool
> of system memory for VRAM objects to "live in" but we move them to
> swap as soon as possible once we get to the limit on that. Now what we
> do on systems where no swap is available, that gets into I've no idea
> space.
> 
> Static partitioning memcg up into a dmem and memcg isn't going to
> solve this, we should solve it inside memcg.

Well it's certainly possible to solve all of this in memcg, but I don't think it's very elegant.

Static partitioning between memcg and dmem for the dGPU case, merged accounting for the APU case by default, and then giving the system administrator the option to eventually switch to use case 3 sounds much more flexible to me.

At least the obvious advantage is that you don't start adding module parameters to TTM, DMA-buf heaps and drivers to say whether they should or should not account to memcg, but rather keep all the logic inside cgroups.

Christian.

> 
> Dave.
Re: [PATCH v2 0/3] dma-buf: heaps: cma: enable dmem cgroup accounting
Posted by Dave Airlie 1 month ago
On Thu, 26 Feb 2026 at 21:32, Christian König <christian.koenig@amd.com> wrote:
>
> On 2/26/26 00:43, Dave Airlie wrote:
> >>>>
> >>>> Using module parameters to enable/disable it globally is just a
> >>>> workaround as far as I can see.
> >>>
> >>> That's a pretty good idea! It would indeed be a solution that could
> >>> satisfy everyone (I assume?).
> >>
> >> I think so yeah.
> >>
> >> From what I have seen we have three different use cases:
> >>
> >> 1. local device memory (VRAM), GTT/CMA and memcg are completely separate domains and you want to have completely separate values as limit for them.
> >>
> >> 2. local device memory (VRAM) is separate. GTT/CMA are accounted to memcg, you can still have separate values as limit so that nobody over allocates CMA (for example).
> >>
> >> 3. All three are accounted to memcg because system memory is actually used as fallback if applications over allocate device local memory.
> >>
> >> It's debatable what should be the default, but we clearly need to handle all three use cases. Potentially even on the same system.
> >
> >
> > Give me cases where 1 or 3 actually make sense in the real world.
> >
> > I can maybe take 1 if CMA is just old school CMA carved out preboot so
> > it's not in the main memory pool, but in that case it's just equiv to
> > device memory really
>
> Well I think #1 is pretty much the default for dGPUs on a desktop. That's why I mentioned it first.

But I don't think that's what we would want: if someone allocates a
system memory object then we should account it to memcg. But this is
the scenario where we really have to face eviction, and maybe here it
makes sense to state that we need to reserve memcg space for swapping
objects, both out of VRAM and into swap itself.

I'm starting to think there isn't another good way to deal with
dynamic power and suspend/resume if we don't have some accounting for
moving objects out of VRAM into system memory. The question is just
whether we can do something special to account for it without
destroying a process on behalf of another process doing the wrong
thing.

>
> > If something is in the main memory pool, it should be accounted for
> > using memcg. You cannot remove memory from the main memory pool
> > without accounting for it.
>
> That's what I'm strongly disagreeing on. See the page cache is not accounted to memcg either, so when you open a file and the kernel caches the backing pages that doesn't reduce the amount you can allocate through malloc, doesn't it?

So the page cache is accounted, according to Shakeel, so can we find
some other example? I really think this is a bad idea: partitioning a
single resource into two competing pools isn't going to work that
well.

>
> In other words system memory becomes the swap of device local memory. Just think about why memcg doesn't limits swap but only how much is swapped out.

But we still need swap for system memory as well, and there are
systems with no swap configured; on those I think we need to
integrate with memcg anyway to make it work.

> For those use cases you want to have a hard static limit on how much system memory can be used as swap. That's why we originally used to have the per driver gttsize, the global TTM page limit etc...
>
> The problem is that we weakened those limitations because of the APU use case and that in turn resulted in all those problems with browsers over allocating system memory etc....
>
> Now cgroups should provide an alternative and I still think that this is the right approach to solve this, but in this alternative I think we want to preserve the original idea of separate domains for dGPUs.
>
> > Now we can add gpu limits to memcg, that
> > was going to me a next step in my series.
> >
> > Whether we have that as a percentage or a hard limit, we would just
> > say GPU can consume 95% of the configured max for this cgroup.
>
> That is only useful on APUs which don't have local memory because those make all of their allocations through system memory.
>
> dGPUs should be much more limited in that regard.

So you think we should limit system memory allocations on dGPUs.
I'm worried about GTT|VRAM allocations which, once evicted, might
have no reason to be pushed back into VRAM, ending up as a backdoor
to allocating a lot of system memory while bypassing memcg. I don't
really like the idea of bypassing memcg at all.

>
> > 3 to me just sounds like we haven't figured out fallback or
> > suspend/resume accounting yet, which is true, but I'm not sure there
> > is a reason for 3 to exist outside of the we don't know how to account
> > for temporary storage of swapped out VRAM objects.
>
> Mario has fixed or is at least working on the suspend/resume problems. So I don't consider that an issue any more.
>
> The use case 3 happens on HPC systems where device local memory is basically just a cache. For example this one here: https://en.wikipedia.org/wiki/Frontier_(supercomputer)
>
> In this use case you don't care if a buffer is in device local memory or system memory, what you care about is that things are reliable and for that your task at hand shouldn't exceeds a certain limit.
>
> E.g. you run computation A which can use 100GB of resources and when computation B starts concurrently you don't want A to suddenly fail because it now fights with B for resources.
>
> > Like it might be we need to have it so we have a limited transfer pool
> > of system memory for VRAM objects to "live in" but we move them to
> > swap as soon as possible once we get to the limit on that. Now what we
> > do on systems where no swap is available, that gets into I've no idea
> > space.
> >
> > Static partitioning memcg up into a dmem and memcg isn't going to
> > solve this, we should solve it inside memcg.
>
> Well it's certainly possible to solve all of this in memcg, but I don't think it's very elegant.
>
> Static partitioning between memcg and dmeme for the dGPU case and merged accounting for the APU case by default and then giving the system administrator to eventually switch to use case 3 sounds much more flexible to me.
>
> At least the obvious advantage is that you don't start to add module parameters to TTM, DMA-buf heaps and drivers if they should or should not account to memcg, but rather keep all the logic inside cgroups.

I don't think we should have to statically partition at all here;
it's just asking for problems later, and without proper accounting it
will cause a bunch of unnecessary reclaim.

Dave.
Re: [PATCH v2 0/3] dma-buf: heaps: cma: enable dmem cgroup accounting
Posted by Shakeel Butt 1 month ago
On Thu, Feb 26, 2026 at 12:32:42PM +0100, Christian König wrote:
> On 2/26/26 00:43, Dave Airlie wrote:
> >>>>
> 
> > If something is in the main memory pool, it should be accounted for
> > using memcg. You cannot remove memory from the main memory pool
> > without accounting for it.
> 
> That's what I'm strongly disagreeing on. See, the page cache is not accounted to memcg either, so when you open a file and the kernel caches the backing pages, that doesn't reduce the amount you can allocate through malloc, does it?

Page cache is accounted/charged to memcg, and usually it is reclaimable, meaning
it most probably doesn't reduce the amount of anon memory you can allocate.

> 
> For dGPUs GTT is basically just the fallback when you over allocate local memory (plus a few things for uploads).
> 
> In other words system memory becomes the swap of device local memory. Just think about why memcg doesn't limit swap itself but only how much is swapped out.

What does "memcg doesn't limits swap" mean?

Re: [PATCH v2 0/3] dma-buf: heaps: cma: enable dmem cgroup accounting
Posted by Eric Chanudet 1 month, 1 week ago
On Fri, Feb 20, 2026 at 10:45:08AM +0100, Christian König wrote:
> On 2/20/26 02:14, T.J. Mercier wrote:
> > On Wed, Feb 18, 2026 at 9:15 AM Eric Chanudet <echanude@redhat.com> wrote:
> > 
> > Hi Eric,
> > 
> >> An earlier series[1] from Maxime introduced dmem to the cma allocator in
> >> an attempt to use it generally for dma-buf. Restart from there and apply
> >> the charge in the narrower context of the CMA dma-buf heap instead.
> >>
> >> In line with introducing cgroup to the system heap[2], this behavior is
> >> enabled based on dma_heap.mem_accounting, disabled by default.
> >>
> >> dmem is chosen for CMA heaps as it allows limits to be set for each
> >> region backing each heap. The charge is only put in the dma-buf heap for
> >> now as it guarantees it can be accounted against a userspace process
> >> that requested the allocation.
> > 
> > But CMA memory is system memory, and regular (non-CMA) movable
> > allocations can occur out of these CMA areas. So this splits system
> > memory accounting between memcg (from [2]) and dmem. If I want to put
> > a limit on system memory use I have to adjust multiple limits (memcg +
> > dmems) and know how to divide the total between them all.
> > 
> > How do you envision using this combination of different controllers?

We are trying to control each CMA heap's use of its CMA region.

Regular allocations would be migrated out should a CMA allocation require
space already taken in the region (except, I suppose, if these end up
pinned...), so I didn't think dmem needed to account for them.

As for accounting for CMA allocations in memcg, I suppose that's the
question prior discussions explored as well.

> Yeah we have this problem pretty much everywhere.
> 
> There are both use cases where you want to account device allocations to memcg and when you don't want that.
> 
> From what I know at the moment it would be best if the administrator could say, for each dmem region, whether it should additionally account to memcg or not.
> 
> Using module parameters to enable/disable it globally is just a workaround as far as I can see.
> 

So, for example, adding a dmem knob so one can:
echo "cma/reserved $SIZE" > /sys/fs/cgroup/user.slice/dmem.max
echo "cma/reserved 1" > /sys/fs/cgroup/user.slice/dmem.charge_memcg

I'll take a look.

> Regards,
> Christian.
> 
> > 
> > Thanks,
> > T.J.
> > 
> >> [1] https://lore.kernel.org/all/20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.org/
> >> [2] https://lore.kernel.org/all/20260116-dmabuf-heap-system-memcg-v3-0-ecc6b62cc446@redhat.com/
> >>
> >> Signed-off-by: Eric Chanudet <echanude@redhat.com>
> >> ---
> >> Changes in v2:
> >> - Rebase on Maxime's introduction of dmem to the cma allocator:
> >>   https://lore.kernel.org/all/20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.org/
> >> - Remove the dmem region registration from the cma dma-buf heap
> >> - Remove the misplaced logic for the default region.
> >> - Link to v1: https://lore.kernel.org/r/20260130-dmabuf-heap-cma-dmem-v1-1-3647ea993e99@redhat.com
> >>
> >> ---
> >> Eric Chanudet (1):
> >>       dma-buf: heaps: cma: charge each cma heap's dmem
> >>
> >> Maxime Ripard (2):
> >>       cma: Register dmem region for each cma region
> >>       cma: Provide accessor to cma dmem region
> >>
> >>  drivers/dma-buf/heaps/cma_heap.c | 15 ++++++++++++++-
> >>  include/linux/cma.h              |  9 +++++++++
> >>  mm/cma.c                         | 20 +++++++++++++++++++-
> >>  mm/cma.h                         |  3 +++
> >>  4 files changed, 45 insertions(+), 2 deletions(-)
> >> ---
> >> base-commit: 948e195dfaa56e48eabda591f97630502ff7e27e
> >> change-id: 20260128-dmabuf-heap-cma-dmem-f4120a2df4a8
> >>
> >> Best regards,
> >> --
> >> Eric Chanudet <echanude@redhat.com>
> >>
> 

-- 
Eric Chanudet

Re: [PATCH v2 0/3] dma-buf: heaps: cma: enable dmem cgroup accounting
Posted by Albert Esteve 1 month, 1 week ago
On Wed, Feb 18, 2026 at 6:15 PM Eric Chanudet <echanude@redhat.com> wrote:
>
> An earlier series[1] from Maxime introduced dmem to the cma allocator in
> an attempt to use it generally for dma-buf. Restart from there and apply
> the charge in the narrower context of the CMA dma-buf heap instead.
>
> In line with introducing cgroup to the system heap[2], this behavior is
> enabled based on dma_heap.mem_accounting, disabled by default.
>
> dmem is chosen for CMA heaps as it allows limits to be set for each
> region backing each heap. The charge is only put in the dma-buf heap for
> now as it guarantees it can be accounted against a userspace process
> that requested the allocation.
>
> [1] https://lore.kernel.org/all/20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.org/
> [2] https://lore.kernel.org/all/20260116-dmabuf-heap-system-memcg-v3-0-ecc6b62cc446@redhat.com/
>
> Signed-off-by: Eric Chanudet <echanude@redhat.com>

Tested-by: Albert Esteve <aesteve@redhat.com>

I tested the series with a Fedora VM, setting the global user.slice
dmem.max value and then trying to allocate buffers of different sizes
with DMA_HEAP_IOCTL_ALLOC. Exceeding the max limit results in
'Resource temporarily unavailable' and the allocation fails.

BR,
Albert

> ---
> Changes in v2:
> - Rebase on Maxime's introduction of dmem to the cma allocator:
>   https://lore.kernel.org/all/20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.org/
> - Remove the dmem region registration from the cma dma-buf heap
> - Remove the misplaced logic for the default region.
> - Link to v1: https://lore.kernel.org/r/20260130-dmabuf-heap-cma-dmem-v1-1-3647ea993e99@redhat.com
>
> ---
> Eric Chanudet (1):
>       dma-buf: heaps: cma: charge each cma heap's dmem
>
> Maxime Ripard (2):
>       cma: Register dmem region for each cma region
>       cma: Provide accessor to cma dmem region
>
>  drivers/dma-buf/heaps/cma_heap.c | 15 ++++++++++++++-
>  include/linux/cma.h              |  9 +++++++++
>  mm/cma.c                         | 20 +++++++++++++++++++-
>  mm/cma.h                         |  3 +++
>  4 files changed, 45 insertions(+), 2 deletions(-)
> ---
> base-commit: 948e195dfaa56e48eabda591f97630502ff7e27e
> change-id: 20260128-dmabuf-heap-cma-dmem-f4120a2df4a8
>
> Best regards,
> --
> Eric Chanudet <echanude@redhat.com>
>