[PATCH v6 0/2] dma-buf: heaps: Create a CMA heap for each CMA reserved region

Maxime Ripard posted 2 patches 2 months, 4 weeks ago
There is a newer version of this series
drivers/dma-buf/heaps/cma_heap.c | 52 +++++++++++++++++++++++++++++++++++++++-
include/linux/dma-map-ops.h      | 13 ++++++++++
kernel/dma/contiguous.c          |  7 ++++++
3 files changed, 71 insertions(+), 1 deletion(-)
[PATCH v6 0/2] dma-buf: heaps: Create a CMA heap for each CMA reserved region
Posted by Maxime Ripard 2 months, 4 weeks ago
Hi,

Here's another attempt at supporting user-space allocations from a
specific carved-out reserved memory region.

The initial problem we were discussing was that I'm currently working on
a platform which has a memory layout with ECC enabled. However, enabling
the ECC has a number of drawbacks on that platform: lower performance,
increased memory usage, etc. So for things like framebuffers, the
trade-off isn't great and thus there's a memory region with ECC disabled
to allocate from for such use cases.

After a suggestion from John, I chose to first start using heap
allocation flags to allow userspace to ask for a particular ECC
setup. This is then backed by a new heap type that runs from reserved
memory chunks flagged as such, and the existing DT properties to specify
the ECC properties.

After further discussion, it was considered that flags were not the
right solution, and relying on the names of the heaps would be enough to
let userspace know the kind of buffer it deals with.

Thus, even though the uAPI part of it had been dropped in this second
version, we still needed a driver to create heaps out of carved-out memory
regions. In addition to the original use case, a similar driver can be
found in BSPs from most vendors, so I believe it would be a useful
addition to the kernel.

Some extra discussion with Rob Herring [1] came to the conclusion that
some specific compatible for this is not great either, and as such a
new driver probably isn't called for either.

Some other discussions we had with John [2] also dropped some hints that
multiple CMA heaps might be a good idea, and some vendors seem to do
that too.

So here's another attempt that doesn't affect the device tree at all and
will just create a heap for every CMA reserved memory region.

It also falls nicely into the current plan we have to support cgroups in
DRM/KMS and v4l2, which is an additional benefit.

Let me know what you think,
Maxime

1: https://lore.kernel.org/all/20250707-cobalt-dingo-of-serenity-dbf92c@houat/
2: https://lore.kernel.org/all/CANDhNCroe6ZBtN_o=c71kzFFaWK-fF5rCdnr9P5h1sgPOWSGSw@mail.gmail.com/

Signed-off-by: Maxime Ripard <mripard@kernel.org>
---
Changes in v6:
- Drop the new driver and allocate a CMA heap for each region now
- Dropped the binding
- Rebased on 6.16-rc5
- Link to v5: https://lore.kernel.org/r/20250617-dma-buf-ecc-heap-v5-0-0abdc5863a4f@kernel.org

Changes in v5:
- Rebased on 6.16-rc2
- Switch from property to dedicated binding
- Link to v4: https://lore.kernel.org/r/20250520-dma-buf-ecc-heap-v4-1-bd2e1f1bb42c@kernel.org

Changes in v4:
- Rebased on 6.15-rc7
- Map buffers only when map is actually called, not at allocation time
- Deal with restricted-dma-pool and shared-dma-pool
- Reword Kconfig options
- Properly report dma_map_sgtable failures
- Link to v3: https://lore.kernel.org/r/20250407-dma-buf-ecc-heap-v3-0-97cdd36a5f29@kernel.org

Changes in v3:
- Reworked global variable patch
- Link to v2: https://lore.kernel.org/r/20250401-dma-buf-ecc-heap-v2-0-043fd006a1af@kernel.org

Changes in v2:
- Add vmap/vunmap operations
- Drop ECC flags uapi
- Rebase on top of 6.14
- Link to v1: https://lore.kernel.org/r/20240515-dma-buf-ecc-heap-v1-0-54cbbd049511@kernel.org

---
Maxime Ripard (2):
      dma/contiguous: Add helper to test reserved memory type
      dma-buf: heaps: cma: Create CMA heap for each CMA reserved region

 drivers/dma-buf/heaps/cma_heap.c | 52 +++++++++++++++++++++++++++++++++++++++-
 include/linux/dma-map-ops.h      | 13 ++++++++++
 kernel/dma/contiguous.c          |  7 ++++++
 3 files changed, 71 insertions(+), 1 deletion(-)
---
base-commit: 47633099a672fc7bfe604ef454e4f116e2c954b1
change-id: 20240515-dma-buf-ecc-heap-28a311d2c94e
prerequisite-message-id: <20250610131231.1724627-1-jkangas@redhat.com>
prerequisite-patch-id: bc44be5968feb187f2bc1b8074af7209462b18e7
prerequisite-patch-id: f02a91b723e5ec01fbfedf3c3905218b43d432da
prerequisite-patch-id: e944d0a3e22f2cdf4d3b3906e5603af934696deb

Best regards,
-- 
Maxime Ripard <mripard@kernel.org>
Re: [PATCH v6 0/2] dma-buf: heaps: Create a CMA heap for each CMA reserved region
Posted by Nicolas Dufresne 2 months, 4 weeks ago
Hi Maxime,

On Wednesday 09 July 2025 at 14:44 +0200, Maxime Ripard wrote:
> Hi,
> 
> Here's another attempt at supporting user-space allocations from a
> specific carved-out reserved memory region.
> 
> The initial problem we were discussing was that I'm currently working on
> a platform which has a memory layout with ECC enabled. However, enabling
> the ECC has a number of drawbacks on that platform: lower performance,
> increased memory usage, etc. So for things like framebuffers, the
> trade-off isn't great and thus there's a memory region with ECC disabled
> to allocate from for such use cases.
> 
> After a suggestion from John, I chose to first start using heap
> allocation flags to allow userspace to ask for a particular ECC
> setup. This is then backed by a new heap type that runs from reserved
> memory chunks flagged as such, and the existing DT properties to specify
> the ECC properties.
> 
> After further discussion, it was considered that flags were not the
> right solution, and relying on the names of the heaps would be enough to
> let userspace know the kind of buffer it deals with.
> 
> Thus, even though the uAPI part of it had been dropped in this second
> version, we still needed a driver to create heaps out of carved-out memory
> regions. In addition to the original use case, a similar driver can be
> found in BSPs from most vendors, so I believe it would be a useful
> addition to the kernel.
> 
> Some extra discussion with Rob Herring [1] came to the conclusion that
> some specific compatible for this is not great either, and as such a
> new driver probably isn't called for either.
> 
> Some other discussions we had with John [2] also dropped some hints that
> multiple CMA heaps might be a good idea, and some vendors seem to do
> that too.
> 
> So here's another attempt that doesn't affect the device tree at all and
> will just create a heap for every CMA reserved memory region.

Does that mean that if we carve out memory for a co-processor operating system,
that memory region is now available to userspace to allocate from? Or is there
a nuance to that?

For other carveouts, such as the RK3588 HDMI receiver, that is clearly a win,
giving users the ability to allocate using externally supplied constraints
rather than having to convince the v4l2 driver to match these, while keeping
the guarantee that this carveout will yield valid addresses for the IP.

Will there be a generic way to find out which driver/device this carveout
belongs to? In V4L2, only complex cameras have userspace drivers; everything
else is generic code.

Nicolas

Re: [PATCH v6 0/2] dma-buf: heaps: Create a CMA heap for each CMA reserved region
Posted by Maxime Ripard 2 months, 4 weeks ago
On Wed, Jul 09, 2025 at 09:10:02AM -0400, Nicolas Dufresne wrote:
> Hi Maxime,
> 
> On Wednesday 09 July 2025 at 14:44 +0200, Maxime Ripard wrote:
> > Hi,
> > 
> > Here's another attempt at supporting user-space allocations from a
> > specific carved-out reserved memory region.
> > 
> > The initial problem we were discussing was that I'm currently working on
> > a platform which has a memory layout with ECC enabled. However, enabling
> > the ECC has a number of drawbacks on that platform: lower performance,
> > increased memory usage, etc. So for things like framebuffers, the
> > trade-off isn't great and thus there's a memory region with ECC disabled
> > to allocate from for such use cases.
> > 
> > After a suggestion from John, I chose to first start using heap
> > allocation flags to allow userspace to ask for a particular ECC
> > setup. This is then backed by a new heap type that runs from reserved
> > memory chunks flagged as such, and the existing DT properties to specify
> > the ECC properties.
> > 
> > After further discussion, it was considered that flags were not the
> > right solution, and relying on the names of the heaps would be enough to
> > let userspace know the kind of buffer it deals with.
> > 
> > Thus, even though the uAPI part of it had been dropped in this second
> > version, we still needed a driver to create heaps out of carved-out memory
> > regions. In addition to the original use case, a similar driver can be
> > found in BSPs from most vendors, so I believe it would be a useful
> > addition to the kernel.
> > 
> > Some extra discussion with Rob Herring [1] came to the conclusion that
> > some specific compatible for this is not great either, and as such a
> > new driver probably isn't called for either.
> > 
> > Some other discussions we had with John [2] also dropped some hints that
> > multiple CMA heaps might be a good idea, and some vendors seem to do
> > that too.
> > 
> > So here's another attempt that doesn't affect the device tree at all and
> > will just create a heap for every CMA reserved memory region.
> 
> Does that mean that if we carve out memory for a co-processor operating system,
> that memory region is now available to userspace to allocate from? Or is there
> a nuance to that?

There is a nuance to that :)

You need to have the "reusable" property set which is documented as:

      The operating system can use the memory in this region with the
      limitation that the device driver(s) owning the region need to be
      able to reclaim it back. Typically that means that the operating
      system can use that region to store volatile or cached data that
      can be otherwise regenerated or migrated elsewhere.

https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/reserved-memory/reserved-memory.yaml#L87

If it's not set, it's not exposed, and I'd expect a coprocessor memory
region wouldn't be flagged as such.
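
As a rough model of that rule, the sketch below mirrors the checks the
kernel's existing CMA setup applies to a reserved-memory node: a
"shared-dma-pool" compatible, "reusable" present, and "no-map" absent. The
struct and function names are hypothetical and only model the decision; the
helper actually added by the series may look different.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flattened view of a reserved-memory node's properties. */
struct rmem_node {
	const char *compatible;	/* value of the "compatible" property */
	bool reusable;		/* "reusable" property present */
	bool no_map;		/* "no-map" property present */
};

/*
 * A region is treated as CMA (and would therefore get a heap) only if
 * it is a shared DMA pool that the OS may reuse and may map.
 */
static bool rmem_is_cma(const struct rmem_node *n)
{
	return n->compatible &&
	       strcmp(n->compatible, "shared-dma-pool") == 0 &&
	       n->reusable && !n->no_map;
}
```

With this model, a coprocessor carveout that is not marked "reusable" (or is
marked "no-map") is rejected and never shows up as a heap.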

> For other carveouts, such as the RK3588 HDMI receiver, that is clearly a win,
> giving users the ability to allocate using externally supplied constraints
> rather than having to convince the v4l2 driver to match these, while keeping
> the guarantee that this carveout will yield valid addresses for the IP.
> 
> Will there be a generic way to find out which driver/device this carveout
> belongs to? In V4L2, only complex cameras have userspace drivers; everything
> else is generic code.

I believe it's a separate discussion, but the current stance is that the
heap name is enough to identify in a platform-specific way where you
allocate from. I've worked on documenting what a good name is so
userspace can pick it up more easily here:

https://lore.kernel.org/r/20250616-dma-buf-heap-names-doc-v2-1-8ae43174cdbf@kernel.org

But it's not really what you expected

Maxime
Re: [PATCH v6 0/2] dma-buf: heaps: Create a CMA heap for each CMA reserved region
Posted by Nicolas Dufresne 2 months, 4 weeks ago
Hi,

On Wednesday 09 July 2025 at 15:38 +0200, Maxime Ripard wrote:
> > Will there be a generic way to find out which driver/device this carveout
> > belongs to? In V4L2, only complex cameras have userspace drivers; everything
> > else is generic code.
> 
> I believe it's a separate discussion, but the current stance is that the
> heap name is enough to identify in a platform-specific way where you
> allocate from. I've worked on documenting what a good name is so
> userspace can pick it up more easily here:
> 
> https://lore.kernel.org/r/20250616-dma-buf-heap-names-doc-v2-1-8ae43174cdbf@kernel.org
> 
> But it's not really what you expected

From a dma-heap API perspective, the naming rules seem necessary, but
suggesting that generic code use a "grep" style of search to match a heap is
extremely fragile. The documentation you propose is (intentionally?) vague.
For me, the naming is more like giving proper names to your function calls so
devs can make sense of them.
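
One way to avoid substring matching is for platform-specific configuration
to supply an ordered list of exact heap names and to pick the first one that
exists. A minimal sketch, assuming heaps appear as entries of a directory
such as /dev/dma_heap; the function names, and any heap names a caller
passes in, are hypothetical.

```c
#include <dirent.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if directory `dir` has an entry named exactly `name`. */
static bool heap_exists_in(const char *dir, const char *name)
{
	struct dirent *ent;
	bool found = false;
	DIR *d = opendir(dir);

	if (!d)
		return false;

	while ((ent = readdir(d)) != NULL) {
		/* Exact comparison only: no prefix or substring matching. */
		if (strcmp(ent->d_name, name) == 0) {
			found = true;
			break;
		}
	}

	closedir(d);
	return found;
}

/*
 * Walk a NULL-terminated preference list and return the first name
 * that exists in `dir`, or NULL if none of them does.
 */
static const char *pick_heap(const char *dir, const char *const *prefs)
{
	for (; *prefs; prefs++)
		if (heap_exists_in(dir, *prefs))
			return *prefs;

	return NULL;
}
```

The preference list still has to come from somewhere platform-specific,
which is exactly the knowledge generic code lacks.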

Stepping back a little, we already opened the door for in-driver use of heaps.
So perhaps the way forward is to have V4L2 drivers utilize heaps from inside the
kernel. Once drivers are fully ported, additional APIs could be added so that
userspace can read which heap(s) are going to be used for the active
configuration, and which other heaps are known to be usable (enumerate them).
There is no need to add properties in that context, since these will derive
from the driver configuration you picked. If you told your driver you are doing
secure memory playback, the driver will filter out what can't be used.

Examples out there often express a simplified view of the problem. Your ECC
video playback case is a good one. Let's say you have performance issues in
both decoder and display due to ECC. You may think that you just allocate from
a non-ECC heap, import these into the decoder, and once filled, import these
into the display driver and you've won.

But in reality, your display buffers might not be the reference buffers, and
most of the memory bandwidth in a modern decoder goes into reading reference
frames and the attached metadata (the latter may or may not be in the same
allocation block).

Even once the reference frames get exposed to userspace (which is a long-term
goal), there will still be a couple of buffers that simply don't fit and must
be kept hidden inside the driver.

My general conclusion is that once these heaps exist, and we guarantee
platform-specific unique names, we should probably build on top. Both userspace
and drivers become consumers of the heaps. And for the case where the
platform-specific knowledge lives inside the kernel, heaps are selected by the
kernel. Also, very little per-driver duplication will be needed, since 90% of
the V4L2 drivers share the allocator implementation.

Does that make any sense to anyone?

Nicolas

Re: [PATCH v6 0/2] dma-buf: heaps: Create a CMA heap for each CMA reserved region
Posted by Maxime Ripard 2 months, 4 weeks ago
On Thu, Jul 10, 2025 at 11:21:02AM -0400, Nicolas Dufresne wrote:
> Hi,
> 
> On Wednesday 09 July 2025 at 15:38 +0200, Maxime Ripard wrote:
> > > Will there be a generic way to find out which driver/device this carveout
> > > belongs to? In V4L2, only complex cameras have userspace drivers; everything
> > > else is generic code.
> > 
> > I believe it's a separate discussion, but the current stance is that the
> > heap name is enough to identify in a platform-specific way where you
> > allocate from. I've worked on documenting what a good name is so
> > userspace can pick it up more easily here:
> > 
> > https://lore.kernel.org/r/20250616-dma-buf-heap-names-doc-v2-1-8ae43174cdbf@kernel.org
> > 
> > But it's not really what you expected
> 
> From a dma-heap API perspective, the naming rules seem necessary, but
> suggesting that generic code use a "grep" style of search to match a heap is
> extremely fragile. The documentation you propose is (intentionally?) vague.
> For me, the naming is more like giving proper names to your function calls so
> devs can make sense of them.

I agree, and made a proposal to implement some kind of heap capabilities
discovery ioctl. The main concern at the time was that Android tried
that with ION and it led to a proliferation of poorly defined flags,
and that names were enough to do so.

I still think that at some point we will need this, but I also don't
have a good idea to address these concerns.

> Stepping back a little, we already opened the door for in-driver use of heaps.
> So perhaps the way forward is to have V4L2 drivers utilize heaps from inside the
> kernel. Once drivers are fully ported, additional APIs could be added so that
> userspace can read which heap(s) are going to be used for the active
> configuration, and which other heaps are known to be usable (enumerate them).
> There is no need to add properties in that context, since these will derive
> from the driver configuration you picked. If you told your driver you are doing
> secure memory playback, the driver will filter out what can't be used.
> 
> Examples out there often express a simplified view of the problem. Your ECC
> video playback case is a good one. Let's say you have performance issues in
> both decoder and display due to ECC. You may think that you just allocate from
> a non-ECC heap, import these into the decoder, and once filled, import these
> into the display driver and you've won.
> 
> But in reality, your display buffers might not be the reference buffers, and
> most of the memory bandwidth in a modern decoder goes into reading reference
> frames and the attached metadata (the latter may or may not be in the same
> allocation block).
> 
> Even once the reference frames get exposed to userspace (which is a long-term
> goal), there will still be a couple of buffers that simply don't fit and must
> be kept hidden inside the driver.
> 
> My general conclusion is that once these heaps exist, and we guarantee
> platform-specific unique names, we should probably build on top. Both userspace
> and drivers become consumers of the heaps. And for the case where the
> platform-specific knowledge lives inside the kernel, heaps are selected by the
> kernel. Also, very little per-driver duplication will be needed, since 90% of
> the V4L2 drivers share the allocator implementation.
> 
> Does that make any sense to anyone?

It does, and it's roughly what we have in mind for the cgroups support
in KMS and v4l2. The main issue with it is that knowing if you allocate
from a dedicated pool (which would use the dmem cgroup controller) or
the main memory pool (which would use memcg) wasn't deterministic and
thus you couldn't properly account.

The solution we have in mind right now is indeed to switch everyone to
using heaps, and then exposing which cgroup that heap allocates from.

Your proposal here has a few extra steps, but the main idea is there
still.

Maxime