drivers/dma-buf/heaps/cma_heap.c | 52 +++++++++++++++++++++++++++++++++++++++- include/linux/dma-map-ops.h | 13 ++++++++++ kernel/dma/contiguous.c | 7 ++++++ 3 files changed, 71 insertions(+), 1 deletion(-)
Hi,

Here's another attempt at supporting user-space allocations from a
specific carved-out reserved memory region.

The initial problem we were discussing was that I'm currently working on
a platform which has a memory layout with ECC enabled. However, enabling
the ECC has a number of drawbacks on that platform: lower performance,
increased memory usage, etc. So for things like framebuffers, the
trade-off isn't great and thus there's a memory region with ECC disabled
to allocate from for such use cases.

After a suggestion from John, I chose to first start using heap
allocation flags to allow userspace to ask for a particular ECC
setup. This is then backed by a new heap type that runs from reserved
memory chunks flagged as such, and the existing DT properties to specify
the ECC properties.

After further discussion, it was considered that flags were not the
right solution, and relying on the names of the heaps would be enough to
let userspace know the kind of buffer it deals with.

Thus, even though the uAPI part of it had been dropped in this second
version, we still needed a driver to create heaps out of carved-out memory
regions. In addition to the original use case, a similar driver can be
found in BSPs from most vendors, so I believe it would be a useful
addition to the kernel.

Some extra discussion with Rob Herring [1] came to the conclusion that
a specific compatible for this is not great either, and as such a
new driver probably isn't called for either.

Some other discussions we had with John [2] also dropped some hints that
multiple CMA heaps might be a good idea, and some vendors seem to do
that too.

So here's another attempt that doesn't affect the device tree at all and
will just create a heap for every CMA reserved memory region.

It also falls nicely into the current plan we have to support cgroups in
DRM/KMS and v4l2, which is an additional benefit.
Let me know what you think,
Maxime

1: https://lore.kernel.org/all/20250707-cobalt-dingo-of-serenity-dbf92c@houat/
2: https://lore.kernel.org/all/CANDhNCroe6ZBtN_o=c71kzFFaWK-fF5rCdnr9P5h1sgPOWSGSw@mail.gmail.com/

Signed-off-by: Maxime Ripard <mripard@kernel.org>
---
Changes in v6:
- Drop the new driver and allocate a CMA heap for each region now
- Dropped the binding
- Rebased on 6.16-rc5
- Link to v5: https://lore.kernel.org/r/20250617-dma-buf-ecc-heap-v5-0-0abdc5863a4f@kernel.org

Changes in v5:
- Rebased on 6.16-rc2
- Switch from property to dedicated binding
- Link to v4: https://lore.kernel.org/r/20250520-dma-buf-ecc-heap-v4-1-bd2e1f1bb42c@kernel.org

Changes in v4:
- Rebased on 6.15-rc7
- Map buffers only when map is actually called, not at allocation time
- Deal with restricted-dma-pool and shared-dma-pool
- Reword Kconfig options
- Properly report dma_map_sgtable failures
- Link to v3: https://lore.kernel.org/r/20250407-dma-buf-ecc-heap-v3-0-97cdd36a5f29@kernel.org

Changes in v3:
- Reworked global variable patch
- Link to v2: https://lore.kernel.org/r/20250401-dma-buf-ecc-heap-v2-0-043fd006a1af@kernel.org

Changes in v2:
- Add vmap/vunmap operations
- Drop ECC flags uapi
- Rebase on top of 6.14
- Link to v1: https://lore.kernel.org/r/20240515-dma-buf-ecc-heap-v1-0-54cbbd049511@kernel.org

---
Maxime Ripard (2):
      dma/contiguous: Add helper to test reserved memory type
      dma-buf: heaps: cma: Create CMA heap for each CMA reserved region

 drivers/dma-buf/heaps/cma_heap.c | 52 +++++++++++++++++++++++++++++++++++++++-
 include/linux/dma-map-ops.h      | 13 ++++++++++
 kernel/dma/contiguous.c          |  7 ++++++
 3 files changed, 71 insertions(+), 1 deletion(-)
---
base-commit: 47633099a672fc7bfe604ef454e4f116e2c954b1
change-id: 20240515-dma-buf-ecc-heap-28a311d2c94e
prerequisite-message-id: <20250610131231.1724627-1-jkangas@redhat.com>
prerequisite-patch-id: bc44be5968feb187f2bc1b8074af7209462b18e7
prerequisite-patch-id: f02a91b723e5ec01fbfedf3c3905218b43d432da
prerequisite-patch-id: e944d0a3e22f2cdf4d3b3906e5603af934696deb

Best regards,
-- 
Maxime Ripard <mripard@kernel.org>
Hi Maxime,

On Wednesday, 9 July 2025 at 14:44 +0200, Maxime Ripard wrote:
> [...]
>
> So here's another attempt that doesn't affect the device tree at all and
> will just create a heap for every CMA reserved memory region.

Does that mean that if we carve out memory for a co-processor operating
system, that memory region is now available to userspace to allocate
from? Or is there a nuance to that?

For other carveouts, such as the RK3588 HDMI receiver, that is clearly a
win: it gives the user the ability to allocate using externally supplied
constraints rather than having to convince the v4l2 driver to match
these, while keeping the safety that this carveout will yield valid
addresses for the IP.

Will there be a generic way to find out which driver/device this carveout
belongs to? In V4L2, only complex cameras have userspace drivers;
everything else is generic code.

Nicolas

> It also falls nicely into the current plan we have to support cgroups in
> DRM/KMS and v4l2, which is an additional benefit.
>
> [...]
On Wed, Jul 09, 2025 at 09:10:02AM -0400, Nicolas Dufresne wrote:
> Hi Maxime,
>
> On Wednesday, 9 July 2025 at 14:44 +0200, Maxime Ripard wrote:
> > [...]
> >
> > So here's another attempt that doesn't affect the device tree at all and
> > will just create a heap for every CMA reserved memory region.
>
> Does that mean that if we carve out memory for a co-processor operating
> system, that memory region is now available to userspace to allocate
> from? Or is there a nuance to that?

There is a nuance to that :)

You need to have the "reusable" property set, which is documented as:

  The operating system can use the memory in this region with the
  limitation that the device driver(s) owning the region need to be
  able to reclaim it back. Typically that means that the operating
  system can use that region to store volatile or cached data that
  can be otherwise regenerated or migrated elsewhere.

https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/reserved-memory/reserved-memory.yaml#L87

If it's not set, the region is not exposed, and I'd expect that a
coprocessor memory region wouldn't be flagged as such.

> For other carveouts, such as the RK3588 HDMI receiver, that is clearly a
> win: it gives the user the ability to allocate using externally supplied
> constraints rather than having to convince the v4l2 driver to match
> these, while keeping the safety that this carveout will yield valid
> addresses for the IP.
>
> Will there be a generic way to find out which driver/device this carveout
> belongs to? In V4L2, only complex cameras have userspace drivers;
> everything else is generic code.

I believe it's a separate discussion, but the current stance is that the
heap name is enough to identify in a platform-specific way where you
allocate from. I've worked on documenting what a good name is so
userspace can pick it up more easily here:

https://lore.kernel.org/r/20250616-dma-buf-heap-names-doc-v2-1-8ae43174cdbf@kernel.org

But it's not really what you expected

Maxime
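
As a rough illustration of the rule described above, the check can be modelled as a small predicate over the properties of a reserved-memory node. This is a simplified, self-contained sketch rather than the kernel's code: the struct rmem_node type and the rmem_is_cma() name are hypothetical. It assumes a region only becomes a CMA area (and would therefore get a heap) when it is a "shared-dma-pool" with "reusable" set and "no-map" absent.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified view of a /reserved-memory child node. */
struct rmem_node {
	const char *compatible;	/* value of the "compatible" property */
	bool reusable;		/* "reusable" property present */
	bool no_map;		/* "no-map" property present */
};

/*
 * True if the region would be set up as a CMA area under the
 * assumptions above, and so would be exposed as a heap.
 */
bool rmem_is_cma(const struct rmem_node *n)
{
	return n->compatible &&
	       strcmp(n->compatible, "shared-dma-pool") == 0 &&
	       n->reusable && !n->no_map;
}
```

Under this model, a coprocessor carveout declared with "no-map" and without "reusable" is never exposed, which matches the expectation stated above.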
Hi,

On Wednesday, 9 July 2025 at 15:38 +0200, Maxime Ripard wrote:
> > Will there be a generic way to find out which driver/device this carveout
> > belongs to? In V4L2, only complex cameras have userspace drivers;
> > everything else is generic code.
>
> I believe it's a separate discussion, but the current stance is that the
> heap name is enough to identify in a platform-specific way where you
> allocate from. I've worked on documenting what a good name is so
> userspace can pick it up more easily here:
>
> https://lore.kernel.org/r/20250616-dma-buf-heap-names-doc-v2-1-8ae43174cdbf@kernel.org
>
> But it's not really what you expected

From a dma-heap API perspective, the naming rules seem necessary, but
suggesting that generic code use a "grep" style of search to match a heap
is extremely fragile. The documentation you propose is (intentionally?)
vague. For me, the naming is more like giving proper names to your
function calls so devs can make sense of them.

Stepping back a little, we already opened the door for in-driver use of
heaps. So perhaps the way forward is to have V4L2 drivers use heaps from
inside the kernel. Once drivers are fully ported, additional APIs could be
added so that userspace can read which heap(s) will be used for the
active configuration, and which other heaps are known to be usable
(enumerate them). There is no need to add properties in that context,
since these will derive from the driver configuration you picked. If you
told your driver you are doing secure memory playback, the driver will
filter out what can't be used.

Examples out there often express a simplified view of the problem. Your
ECC video playback case is a good one. Let's say you have performance
issues in both decoder and display due to ECC. You may think that you
just allocate from a non-ECC heap, import these buffers into the decoder
and, once filled, import them into the display driver, and you've won.

But in reality, your display buffers might not be the reference buffers,
and most of the memory bandwidth in a modern decoder goes into reading
reference frames and the attached metadata (the latter may or may not be
in the same allocation block).

Even once the reference frames get exposed to userspace (which is a
long-term goal), there will still be a couple of buffers that simply
don't fit and must be kept hidden inside the driver.

My general conclusion is that once these heaps exist, and we guarantee
platform-specific unique names, we should probably build on top. Both
userspace and drivers become consumers of the heap. And for the case
where the platform-specific knowledge lives inside the kernel, heaps are
selected by the kernel. Also, very little per-driver duplication will be
needed, since 90% of the V4L2 drivers share the allocator implementation.

Does that make any sense to anyone?

Nicolas
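
To make the fragility concern concrete: today the only generic way for userspace to discover heaps is to list the heap device directory and compare names. The sketch below assumes that setup and is only an illustration. The heap_exists() helper is hypothetical, and it deliberately uses exact comparison, because a substring ("grep"-style) match for one heap name could also hit other heaps whose names merely contain it.

```c
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if directory `dir` contains an
 * entry named exactly `name`. With dir set to "/dev/dma_heap" this
 * checks whether a heap with that exact name is registered.
 */
bool heap_exists(const char *dir, const char *name)
{
	DIR *d = opendir(dir);
	struct dirent *e;
	bool found = false;

	if (!d)
		return false;

	/* Exact match only; partial names must not match. */
	while (!found && (e = readdir(d)) != NULL)
		found = strcmp(e->d_name, name) == 0;

	closedir(d);
	return found;
}
```

Even with exact matching, generic code still has to know the platform-specific name in advance, which is the gap the in-kernel heap selection proposed above would close.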
On Thu, Jul 10, 2025 at 11:21:02AM -0400, Nicolas Dufresne wrote:
> Hi,
>
> On Wednesday, 9 July 2025 at 15:38 +0200, Maxime Ripard wrote:
> > > Will there be a generic way to find out which driver/device this carveout
> > > belongs to? In V4L2, only complex cameras have userspace drivers;
> > > everything else is generic code.
> >
> > I believe it's a separate discussion, but the current stance is that the
> > heap name is enough to identify in a platform-specific way where you
> > allocate from. I've worked on documenting what a good name is so
> > userspace can pick it up more easily here:
> >
> > https://lore.kernel.org/r/20250616-dma-buf-heap-names-doc-v2-1-8ae43174cdbf@kernel.org
> >
> > But it's not really what you expected
>
> From a dma-heap API perspective, the naming rules seem necessary, but
> suggesting that generic code use a "grep" style of search to match a heap
> is extremely fragile. The documentation you propose is (intentionally?)
> vague. For me, the naming is more like giving proper names to your
> function calls so devs can make sense of them.

I agree, and I made a proposal to implement some kind of heap
capabilities discovery ioctl. The main concern at the time was that
Android tried that with ION and it led to a proliferation of poorly
defined flags, and that names were enough to do so.

I still think that at some point we will need this, but I also don't
have a good idea to address these concerns.

> Stepping back a little, we already opened the door for in-driver use of
> heaps. So perhaps the way forward is to have V4L2 drivers use heaps from
> inside the kernel. Once drivers are fully ported, additional APIs could be
> added so that userspace can read which heap(s) will be used for the
> active configuration, and which other heaps are known to be usable
> (enumerate them). There is no need to add properties in that context,
> since these will derive from the driver configuration you picked. If you
> told your driver you are doing secure memory playback, the driver will
> filter out what can't be used.
>
> Examples out there often express a simplified view of the problem. Your
> ECC video playback case is a good one. Let's say you have performance
> issues in both decoder and display due to ECC. You may think that you
> just allocate from a non-ECC heap, import these buffers into the decoder
> and, once filled, import them into the display driver, and you've won.
>
> But in reality, your display buffers might not be the reference buffers,
> and most of the memory bandwidth in a modern decoder goes into reading
> reference frames and the attached metadata (the latter may or may not be
> in the same allocation block).
>
> Even once the reference frames get exposed to userspace (which is a
> long-term goal), there will still be a couple of buffers that simply
> don't fit and must be kept hidden inside the driver.
>
> My general conclusion is that once these heaps exist, and we guarantee
> platform-specific unique names, we should probably build on top. Both
> userspace and drivers become consumers of the heap. And for the case
> where the platform-specific knowledge lives inside the kernel, heaps are
> selected by the kernel. Also, very little per-driver duplication will be
> needed, since 90% of the V4L2 drivers share the allocator implementation.
>
> Does that make any sense to anyone?

It does, and it's roughly what we have in mind for the cgroups support in
KMS and v4l2. The main issue with it is that knowing whether you allocate
from a dedicated pool (which would use the dmem cgroup controller) or the
main memory pool (which would use memcg) wasn't deterministic, and thus
you couldn't properly account for it.

The solution we have in mind right now is indeed to switch everyone to
using heaps, and then to expose which cgroup that heap allocates from.
Your proposal here has a few extra steps, but the main idea is still
there.

Maxime