Hi,
This patch set allocates the restricted DMA-bufs via the TEE subsystem.
The TEE subsystem handles the DMA-buf allocations since it is the TEE
(OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
restrictions for the memory used for the DMA-bufs.
I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
how to allocate the restricted physical memory.
In addition to the size and flags parameters, TEE_IOC_RSTMEM_ALLOC takes a
use-case parameter. The backend TEE driver uses it to decide on the
allocation policy and which devices should be able to access the memory.
Three use-cases (Secure Video Playback, Trusted UI, and Secure Video
Recording) have been identified so far to serve as examples of what can be
expected. More use-cases can be added to the userspace ABI, but it's up to
the backend TEE drivers to provide the implementation.
Each use-case has its own restricted memory pool since different use-cases
require isolation from different parts of the system. A restricted memory
pool can be based on a static carveout instantiated while probing the TEE
backend driver, or dynamically allocated from CMA and made restricted as
needed by the TEE.
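For illustration only, the per-use-case pool idea described above can be sketched in plain C. Everything in the sketch is hypothetical: the sketch_* names, the use-case values, the carveout addresses and the simple bump allocator are assumptions made for the example and are not taken from the patches. The real definitions live in include/uapi/linux/tee.h and in the backend driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical use-case IDs, NOT the values from the real uAPI header. */
enum sketch_use_case {
	SKETCH_UC_SECURE_VIDEO_PLAY = 1,
	SKETCH_UC_TRUSTED_UI,
	SKETCH_UC_SECURE_VIDEO_RECORD,
	SKETCH_UC_COUNT
};

/* Hypothetical pool: a bump allocator over one static carveout. */
struct sketch_pool {
	uint64_t base;	/* start of the carveout (made-up address) */
	uint64_t size;	/* total bytes in the pool */
	uint64_t used;	/* bytes handed out so far */
};

/* One pool per use-case, so use-cases never share restricted memory. */
static struct sketch_pool sketch_pools[SKETCH_UC_COUNT] = {
	[SKETCH_UC_SECURE_VIDEO_PLAY]   = { 0x80000000ULL, 0x400000, 0 },
	[SKETCH_UC_TRUSTED_UI]          = { 0x80400000ULL, 0x100000, 0 },
	[SKETCH_UC_SECURE_VIDEO_RECORD] = { 0x80500000ULL, 0x200000, 0 },
};

/*
 * Pick the pool from the use-case and carve out `size` bytes.
 * Returns the start address of the allocation, or 0 on failure
 * (unknown use-case, zero size, or pool exhausted).
 */
static uint64_t sketch_rstmem_alloc(uint32_t use_case, uint64_t size)
{
	struct sketch_pool *p;
	uint64_t addr;

	if (use_case == 0 || use_case >= SKETCH_UC_COUNT || size == 0)
		return 0;

	p = &sketch_pools[use_case];
	if (size > p->size - p->used)
		return 0;

	addr = p->base + p->used;
	p->used += size;
	assert(p->used <= p->size);
	return addr;
}
```

The point of the sketch is only that the use-case selects the pool before any memory is handed out. A dynamically allocated pool (CMA made restricted by the TEE) would replace the static table but keep the same lookup.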
This can be tested on QEMU with the following steps:
repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
-b prototype/sdp-v4
repo sync -j8
cd build
make toolchains -j$(nproc)
make SPMC_AT_EL=1 all -j$(nproc)
make SPMC_AT_EL=1 run-only
# login and at the prompt:
xtest --sdp-basic
The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at
S-EL1 inside OP-TEE. The parameter can be changed into SPMC_AT_EL=n to test
without FF-A using the original SMC ABI instead. Please remember to do
rm -rf ../trusted-firmware-a/build/qemu
for TF-A to be rebuilt properly using the new configuration.
https://optee.readthedocs.io/en/latest/building/prerequisites.html
lists the dependencies needed to build the above.
The tests are pretty basic, mostly checking that a Trusted Application in
the secure world can access and manipulate the memory. There are also some
negative tests for out of bounds buffers etc.
Thanks,
Jens
Changes since V3:
* Make the use_case and flags fields in struct tee_shm u32's instead of
u16's
* Add more description for TEE_IOC_RSTMEM_ALLOC in the header file
* Import namespace DMA_BUF in module tee, reported by lkp@intel.com
* Added a note in the commit message for "optee: account for direction
while converting parameters" why it's needed
* Factor out dynamic restricted memory allocation from
"optee: support restricted memory allocation" into two new commits
"optee: FF-A: dynamic restricted memory allocation" and
"optee: smc abi: dynamic restricted memory allocation"
* Guard CMA usage with #ifdef CONFIG_CMA, effectively disabling dynamic
restricted memory allocation if CMA isn't configured
Changes since the V2 RFC:
* Based on v6.12
* Replaced the flags for SVP and Trusted UI memory with a u32 field with
a unique id for each use case
* Added dynamic allocation of restricted memory pools
* Added OP-TEE ABI both with and without FF-A for dynamic restricted memory
* Added support for FF-A with FFA_LEND
Changes since the V1 RFC:
* Based on v6.11
* Complete rewrite, replacing the restricted heap with TEE_IOC_RSTMEM_ALLOC
Changes since Olivier's post [2]:
* Based on Yong Wu's post [1] where much of dma-buf handling is done in
the generic restricted heap
* Simplifications and cleanup
* New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap
support"
* Replaced the word "secure" with "restricted" where applicable
Jens Wiklander (6):
tee: add restricted memory allocation
optee: account for direction while converting parameters
optee: sync secure world ABI headers
optee: support restricted memory allocation
optee: FF-A: dynamic restricted memory allocation
optee: smc abi: dynamic restricted memory allocation
drivers/tee/Makefile | 1 +
drivers/tee/optee/Makefile | 1 +
drivers/tee/optee/call.c | 10 +-
drivers/tee/optee/core.c | 1 +
drivers/tee/optee/ffa_abi.c | 178 +++++++++++++-
drivers/tee/optee/optee_ffa.h | 27 ++-
drivers/tee/optee/optee_msg.h | 65 ++++-
drivers/tee/optee/optee_private.h | 75 ++++--
drivers/tee/optee/optee_smc.h | 71 +++++-
drivers/tee/optee/rpc.c | 31 ++-
drivers/tee/optee/rstmem.c | 388 ++++++++++++++++++++++++++++++
drivers/tee/optee/smc_abi.c | 213 ++++++++++++++--
drivers/tee/tee_core.c | 38 ++-
drivers/tee/tee_private.h | 2 +
drivers/tee/tee_rstmem.c | 201 ++++++++++++++++
drivers/tee/tee_shm.c | 2 +
drivers/tee/tee_shm_pool.c | 69 +++++-
include/linux/tee_core.h | 15 ++
include/linux/tee_drv.h | 2 +
include/uapi/linux/tee.h | 44 +++-
20 files changed, 1358 insertions(+), 76 deletions(-)
create mode 100644 drivers/tee/optee/rstmem.c
create mode 100644 drivers/tee/tee_rstmem.c
base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
--
2.43.0
+Florent, who's working on protected-mode support in Panthor.

Hi Jens,

On Tue, 17 Dec 2024 11:07:36 +0100
Jens Wiklander <jens.wiklander@linaro.org> wrote:

> Hi,
>
> This patch set allocates the restricted DMA-bufs via the TEE subsystem.

We're currently working on protected-mode support for Panthor [1] and it
looks like your series (and the OP-TEE implementation that goes with
it) would allow us to have a fully upstream/open solution for the
protected content use case we're trying to support. I need a bit more
time to play with the implementation but this looks very promising
(especially the lend rstmem feature, which might help us allocate our
FW sections that are supposed to execute code accessing protected
content).

>
> The TEE subsystem handles the DMA-buf allocations since it is the TEE
> (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> restrictions for the memory used for the DMA-bufs.
>
> I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> how to allocate the restricted physical memory.

I'll probably have more questions soon, but here's one to start: any
particular reason you didn't go for a dma-heap to expose restricted
buffer allocation to userspace? I see you already have a cdev you can
take ioctl()s from, but my understanding was that dma-heap was the
standard solution for these device-agnostic/central allocators.

Regards,

Boris

[1]https://lwn.net/ml/all/cover.1738228114.git.florent.tomasin@arm.com/#t
Hi Boris,

On Thu, 13 Feb 2025 at 01:26, Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> +Florent, who's working on protected-mode support in Panthor.
>
> Hi Jens,
>
> On Tue, 17 Dec 2024 11:07:36 +0100
> Jens Wiklander <jens.wiklander@linaro.org> wrote:
>
> > Hi,
> >
> > This patch set allocates the restricted DMA-bufs via the TEE subsystem.
>
> We're currently working on protected-mode support for Panthor [1] and it
> looks like your series (and the OP-TEE implementation that goes with
> it) would allow us to have a fully upstream/open solution for the
> protected content use case we're trying to support. I need a bit more
> time to play with the implementation but this looks very promising
> (especially the lend rstmem feature, which might help us allocate our
> FW sections that are supposed to execute code accessing protected
> content).

Glad to hear that, if you can demonstrate an open source use case
based on this series then it will help to land it. We really would
love to see support for restricted DMA-buf consumers be it GPU, crypto
accelerator, media pipeline etc.

>
> >
> > The TEE subsystem handles the DMA-buf allocations since it is the TEE
> > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> > restrictions for the memory used for the DMA-bufs.
> >
> > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> > how to allocate the restricted physical memory.
>
> I'll probably have more questions soon, but here's one to start: any
> particular reason you didn't go for a dma-heap to expose restricted
> buffer allocation to userspace? I see you already have a cdev you can
> take ioctl()s from, but my understanding was that dma-heap was the
> standard solution for these device-agnostic/central allocators.

This series started with the DMA heap approach only here [1] but later
discussions [2] lead us here. To point out specifically:

- DMA heaps require reliance on DT to discover static restricted
regions carve-outs whereas via the TEE implementation driver (eg.
OP-TEE) those can be discovered dynamically.
- Dynamic allocation of buffers and making them restricted requires
vendor specific driver hooks with DMA heaps whereas the TEE subsystem
abstracts that out with underlying TEE implementation (eg. OP-TEE)
managing the dynamic buffer restriction.
- TEE subsystem already has a well defined user-space interface for
managing shared memory buffers with TEE and restricted DMA buffers
will be yet another interface managed along similar lines.

[1] https://lore.kernel.org/lkml/mzur3odofwwrdqnystozjgf3qtvb73wqjm6g2vf5dfsqiehaxk@u67fcarhm6ge/T/
[2] https://lore.kernel.org/lkml/CAFA6WYPtp3H5JhxzgH9=z2EvNL7Kdku3EmG1aDkTS-gjFtNZZA@mail.gmail.com/

-Sumit

>
> Regards,
>
> Boris
>
> [1]https://lwn.net/ml/all/cover.1738228114.git.florent.tomasin@arm.com/#t
Hi,

On Thu, Feb 13, 2025 at 7:42 AM Sumit Garg <sumit.garg@linaro.org> wrote:
>
> Hi Boris,
>
> On Thu, 13 Feb 2025 at 01:26, Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > +Florent, who's working on protected-mode support in Panthor.
> >
> > Hi Jens,
> >
> > On Tue, 17 Dec 2024 11:07:36 +0100
> > Jens Wiklander <jens.wiklander@linaro.org> wrote:
> >
> > > Hi,
> > >
> > > This patch set allocates the restricted DMA-bufs via the TEE subsystem.
> >
> > We're currently working on protected-mode support for Panthor [1] and it
> > looks like your series (and the OP-TEE implementation that goes with
> > it) would allow us to have a fully upstream/open solution for the
> > protected content use case we're trying to support. I need a bit more
> > time to play with the implementation but this looks very promising
> > (especially the lend rstmem feature, which might help us allocate our
> > FW sections that are supposed to execute code accessing protected
> > content).
>
> Glad to hear that, if you can demonstrate an open source use case
> based on this series then it will help to land it. We really would
> love to see support for restricted DMA-buf consumers be it GPU, crypto
> accelerator, media pipeline etc.

I'm preparing a demo based on GStreamer to share. It helps with more
real-world examples to see that APIs etc work.

>
> >
> > >
> > > The TEE subsystem handles the DMA-buf allocations since it is the TEE
> > > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> > > restrictions for the memory used for the DMA-bufs.
> > >
> > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> > > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> > > how to allocate the restricted physical memory.
> >
> > I'll probably have more questions soon, but here's one to start: any
> > particular reason you didn't go for a dma-heap to expose restricted
> > buffer allocation to userspace? I see you already have a cdev you can
> > take ioctl()s from, but my understanding was that dma-heap was the
> > standard solution for these device-agnostic/central allocators.
>
> This series started with the DMA heap approach only here [1] but later
> discussions [2] lead us here. To point out specifically:
>
> - DMA heaps require reliance on DT to discover static restricted
> regions carve-outs whereas via the TEE implementation driver (eg.
> OP-TEE) those can be discovered dynamically.
> - Dynamic allocation of buffers and making them restricted requires
> vendor specific driver hooks with DMA heaps whereas the TEE subsystem
> abstracts that out with underlying TEE implementation (eg. OP-TEE)
> managing the dynamic buffer restriction.
> - TEE subsystem already has a well defined user-space interface for
> managing shared memory buffers with TEE and restricted DMA buffers
> will be yet another interface managed along similar lines.
>
> [1] https://lore.kernel.org/lkml/mzur3odofwwrdqnystozjgf3qtvb73wqjm6g2vf5dfsqiehaxk@u67fcarhm6ge/T/
> [2] https://lore.kernel.org/lkml/CAFA6WYPtp3H5JhxzgH9=z2EvNL7Kdku3EmG1aDkTS-gjFtNZZA@mail.gmail.com/

Thanks for the good summary. :-)

Cheers,
Jens

>
> -Sumit
>
> >
> > Regards,
> >
> > Boris
> >
> > [1]https://lwn.net/ml/all/cover.1738228114.git.florent.tomasin@arm.com/#t
On Thu, 13 Feb 2025 12:11:52 +0530
Sumit Garg <sumit.garg@linaro.org> wrote:

> Hi Boris,
>
> On Thu, 13 Feb 2025 at 01:26, Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > +Florent, who's working on protected-mode support in Panthor.
> >
> > Hi Jens,
> >
> > On Tue, 17 Dec 2024 11:07:36 +0100
> > Jens Wiklander <jens.wiklander@linaro.org> wrote:
> >
> > > Hi,
> > >
> > > This patch set allocates the restricted DMA-bufs via the TEE subsystem.
> >
> > We're currently working on protected-mode support for Panthor [1] and it
> > looks like your series (and the OP-TEE implementation that goes with
> > it) would allow us to have a fully upstream/open solution for the
> > protected content use case we're trying to support. I need a bit more
> > time to play with the implementation but this looks very promising
> > (especially the lend rstmem feature, which might help us allocate our
> > FW sections that are supposed to execute code accessing protected
> > content).
>
> Glad to hear that, if you can demonstrate an open source use case
> based on this series then it will help to land it. We really would
> love to see support for restricted DMA-buf consumers be it GPU, crypto
> accelerator, media pipeline etc.
>
> >
> > >
> > > The TEE subsystem handles the DMA-buf allocations since it is the TEE
> > > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> > > restrictions for the memory used for the DMA-bufs.
> > >
> > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> > > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> > > how to allocate the restricted physical memory.
> >
> > I'll probably have more questions soon, but here's one to start: any
> > particular reason you didn't go for a dma-heap to expose restricted
> > buffer allocation to userspace? I see you already have a cdev you can
> > take ioctl()s from, but my understanding was that dma-heap was the
> > standard solution for these device-agnostic/central allocators.
>
> This series started with the DMA heap approach only here [1] but later
> discussions [2] lead us here. To point out specifically:
>
> - DMA heaps require reliance on DT to discover static restricted
> regions carve-outs whereas via the TEE implementation driver (eg.
> OP-TEE) those can be discovered dynamically.

Hm, the system heap [1] doesn't rely on any DT information AFAICT.
The dynamic allocation scheme, where the TEE implementation allocates a
chunk of protected memory for us would have a similar behavior, I guess.

> - Dynamic allocation of buffers and making them restricted requires
> vendor specific driver hooks with DMA heaps whereas the TEE subsystem
> abstracts that out with underlying TEE implementation (eg. OP-TEE)
> managing the dynamic buffer restriction.

Yeah, the lend rstmem feature is clearly something tee specific, and I
think that's okay to assume the user knows the protection request
should go through the tee subsystem in that case.

> - TEE subsystem already has a well defined user-space interface for
> managing shared memory buffers with TEE and restricted DMA buffers
> will be yet another interface managed along similar lines.

Okay, so the very reason I'm asking about the dma-buf heap interface is
because there might be cases where the protected/restricted allocation
doesn't go through the TEE (Mediatek has a TEE-free implementation
for instance, but I realize vendor implementations are probably not the
best selling point :-/). If we expose things as a dma-heap, we have
a solution where integrators can pick the dma-heap they think is
relevant for protected buffer allocations without the various drivers
(GPU, video codec, ...) having to implement a dispatch function for all
possible implementations. The same goes for userspace allocations,
where passing a dma-heap name, is simpler than supporting different
ioctl()s based on the allocation backend.

[1]https://elixir.bootlin.com/linux/v6.13.2/source/drivers/dma-buf/heaps/system_heap.c#L424
On Thu, 13 Feb 2025 at 14:06, Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Thu, 13 Feb 2025 12:11:52 +0530
> Sumit Garg <sumit.garg@linaro.org> wrote:
>
> > Hi Boris,
> >
> > On Thu, 13 Feb 2025 at 01:26, Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:
> > >
> > > +Florent, who's working on protected-mode support in Panthor.
> > >
> > > Hi Jens,
> > >
> > > On Tue, 17 Dec 2024 11:07:36 +0100
> > > Jens Wiklander <jens.wiklander@linaro.org> wrote:
> > >
> > > > Hi,
> > > >
> > > > This patch set allocates the restricted DMA-bufs via the TEE subsystem.
> > >
> > > We're currently working on protected-mode support for Panthor [1] and it
> > > looks like your series (and the OP-TEE implementation that goes with
> > > it) would allow us to have a fully upstream/open solution for the
> > > protected content use case we're trying to support. I need a bit more
> > > time to play with the implementation but this looks very promising
> > > (especially the lend rstmem feature, which might help us allocate our
> > > FW sections that are supposed to execute code accessing protected
> > > content).
> >
> > Glad to hear that, if you can demonstrate an open source use case
> > based on this series then it will help to land it. We really would
> > love to see support for restricted DMA-buf consumers be it GPU, crypto
> > accelerator, media pipeline etc.
> >
> > >
> > > >
> > > > The TEE subsystem handles the DMA-buf allocations since it is the TEE
> > > > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> > > > restrictions for the memory used for the DMA-bufs.
> > > >
> > > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> > > > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> > > > how to allocate the restricted physical memory.
> > >
> > > I'll probably have more questions soon, but here's one to start: any
> > > particular reason you didn't go for a dma-heap to expose restricted
> > > buffer allocation to userspace? I see you already have a cdev you can
> > > take ioctl()s from, but my understanding was that dma-heap was the
> > > standard solution for these device-agnostic/central allocators.
> >
> > This series started with the DMA heap approach only here [1] but later
> > discussions [2] lead us here. To point out specifically:
> >
> > - DMA heaps require reliance on DT to discover static restricted
> > regions carve-outs whereas via the TEE implementation driver (eg.
> > OP-TEE) those can be discovered dynamically.
>
> Hm, the system heap [1] doesn't rely on any DT information AFAICT.

Yeah but all the prior vendor specific secure/restricted DMA heaps
relied on DT information.

> The dynamic allocation scheme, where the TEE implementation allocates a
> chunk of protected memory for us would have a similar behavior, I guess.

In a dynamic scheme, the allocation will still be from CMA or system
heap depending on TEE implementation capabilities but the restriction
will be enforced via interaction with TEE.

>
> > - Dynamic allocation of buffers and making them restricted requires
> > vendor specific driver hooks with DMA heaps whereas the TEE subsystem
> > abstracts that out with underlying TEE implementation (eg. OP-TEE)
> > managing the dynamic buffer restriction.
>
> Yeah, the lend rstmem feature is clearly something tee specific, and I
> think that's okay to assume the user knows the protection request
> should go through the tee subsystem in that case.

Yeah but how will the user discover that? Rather than that it's better
for the user to directly ask the TEE device to allocate restricted
memory without worrying about how the memory restriction gets
enforced.

>
> > - TEE subsystem already has a well defined user-space interface for
> > managing shared memory buffers with TEE and restricted DMA buffers
> > will be yet another interface managed along similar lines.
>
> Okay, so the very reason I'm asking about the dma-buf heap interface is
> because there might be cases where the protected/restricted allocation
> doesn't go through the TEE (Mediatek has a TEE-free implementation
> for instance, but I realize vendor implementations are probably not the
> best selling point :-/).

You can always have a system with memory and peripheral access
permissions setup during boot (or even have a pre-configured hardware
as a special case) prior to booting up the kernel too. But that even
gets somehow configured by a TEE implementation during boot, so
calling it a TEE-free implementation seems over-simplified and not a
scalable solution. However, this patchset [1] from Mediatek requires
runtime TEE interaction too.

[1] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@mediatek.com/

> If we expose things as a dma-heap, we have
> a solution where integrators can pick the dma-heap they think is
> relevant for protected buffer allocations without the various drivers
> (GPU, video codec, ...) having to implement a dispatch function for all
> possible implementations. The same goes for userspace allocations,
> where passing a dma-heap name, is simpler than supporting different
> ioctl()s based on the allocation backend.

There have been several attempts with DMA heaps in the past which all
resulted in a very vendor specific vertically integrated solution. But
the solution with TEE subsystem aims to make it generic and vendor
agnostic.

>
> [1]https://elixir.bootlin.com/linux/v6.13.2/source/drivers/dma-buf/heaps/system_heap.c#L424

-Sumit
On Thu, 13 Feb 2025 14:46:01 +0530
Sumit Garg <sumit.garg@linaro.org> wrote:

> On Thu, 13 Feb 2025 at 14:06, Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > On Thu, 13 Feb 2025 12:11:52 +0530
> > Sumit Garg <sumit.garg@linaro.org> wrote:
> >
> > > Hi Boris,
> > >
> > > On Thu, 13 Feb 2025 at 01:26, Boris Brezillon
> > > <boris.brezillon@collabora.com> wrote:
> > > >
> > > > +Florent, who's working on protected-mode support in Panthor.
> > > >
> > > > Hi Jens,
> > > >
> > > > On Tue, 17 Dec 2024 11:07:36 +0100
> > > > Jens Wiklander <jens.wiklander@linaro.org> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > This patch set allocates the restricted DMA-bufs via the TEE subsystem.
> > > >
> > > > We're currently working on protected-mode support for Panthor [1] and it
> > > > looks like your series (and the OP-TEE implementation that goes with
> > > > it) would allow us to have a fully upstream/open solution for the
> > > > protected content use case we're trying to support. I need a bit more
> > > > time to play with the implementation but this looks very promising
> > > > (especially the lend rstmem feature, which might help us allocate our
> > > > FW sections that are supposed to execute code accessing protected
> > > > content).
> > >
> > > Glad to hear that, if you can demonstrate an open source use case
> > > based on this series then it will help to land it. We really would
> > > love to see support for restricted DMA-buf consumers be it GPU, crypto
> > > accelerator, media pipeline etc.
> > >
> > > >
> > > > >
> > > > > The TEE subsystem handles the DMA-buf allocations since it is the TEE
> > > > > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> > > > > restrictions for the memory used for the DMA-bufs.
> > > > >
> > > > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> > > > > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> > > > > how to allocate the restricted physical memory.
> > > >
> > > > I'll probably have more questions soon, but here's one to start: any
> > > > particular reason you didn't go for a dma-heap to expose restricted
> > > > buffer allocation to userspace? I see you already have a cdev you can
> > > > take ioctl()s from, but my understanding was that dma-heap was the
> > > > standard solution for these device-agnostic/central allocators.
> > >
> > > This series started with the DMA heap approach only here [1] but later
> > > discussions [2] lead us here. To point out specifically:
> > >
> > > - DMA heaps require reliance on DT to discover static restricted
> > > regions carve-outs whereas via the TEE implementation driver (eg.
> > > OP-TEE) those can be discovered dynamically.
> >
> > Hm, the system heap [1] doesn't rely on any DT information AFAICT.
>
> Yeah but all the prior vendor specific secure/restricted DMA heaps
> relied on DT information.

Right, but there's nothing in the DMA heap provider API forcing that.

>
> > The dynamic allocation scheme, where the TEE implementation allocates a
> > chunk of protected memory for us would have a similar behavior, I guess.
>
> In a dynamic scheme, the allocation will still be from CMA or system
> heap depending on TEE implementation capabilities but the restriction
> will be enforced via interaction with TEE.

Sorry, that's a wording issue. By dynamic allocation I meant the mode
where allocations goes through the TEE, not the lend rstmem thing. BTW,
calling the lend mode dynamic-allocation is kinda confusing, because in
a sense, both modes can be considered dynamic allocation from the user
PoV. I get that when the TEE allocates memory, it's picking from its
fixed address/size pool, hence the name, but when I first read this, I
thought the dynamic mode was the other one, and the static mode was the
one where you reserve a mem range from the DT, query it from the driver
and pass it to the TEE to restrict access post reservation/static
allocation.

>
> >
> > > - Dynamic allocation of buffers and making them restricted requires
> > > vendor specific driver hooks with DMA heaps whereas the TEE subsystem
> > > abstracts that out with underlying TEE implementation (eg. OP-TEE)
> > > managing the dynamic buffer restriction.
> >
> > Yeah, the lend rstmem feature is clearly something tee specific, and I
> > think that's okay to assume the user knows the protection request
> > should go through the tee subsystem in that case.
>
> Yeah but how will the user discover that?

There's nothing to discover here. It would just be explicitly
specified:

- for in-kernel users it can be a module parameter (or a DT prop if
  that's deemed acceptable)
- for userspace, it can be an envvar, a config file, or whatever the
  app/lib uses to get config options

> Rather than that it's better
> for the user to directly ask the TEE device to allocate restricted
> memory without worrying about how the memory restriction gets
> enforced.

If the consensus is that restricted/protected memory allocation should
always be routed to the TEE, sure, but I had the feeling this wasn't as
clear as that. OTOH, using a dma-heap to expose the TEE-SDP
implementation provides the same benefits, without making potential
future non-TEE based implementations a pain for users. The dma-heap
ioctl being common to all implementations, it just becomes a
configuration matter if we want to change the heap we rely on for
protected/restricted buffer allocation. And because heaps have
unique/well-known names, users can still default to (or rely solely on)
the TEE-SPD implementation if they want.

>
> >
> > > - TEE subsystem already has a well defined user-space interface for
> > > managing shared memory buffers with TEE and restricted DMA buffers
> > > will be yet another interface managed along similar lines.
> >
> > Okay, so the very reason I'm asking about the dma-buf heap interface is
> > because there might be cases where the protected/restricted allocation
> > doesn't go through the TEE (Mediatek has a TEE-free implementation
> > for instance, but I realize vendor implementations are probably not the
> > best selling point :-/).
>
> You can always have a system with memory and peripheral access
> permissions setup during boot (or even have a pre-configured hardware
> as a special case) prior to booting up the kernel too. But that even
> gets somehow configured by a TEE implementation during boot, so
> calling it a TEE-free implementation seems over-simplified and not a
> scalable solution. However, this patchset [1] from Mediatek requires
> runtime TEE interaction too.
>
> [1] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@mediatek.com/
>
> > If we expose things as a dma-heap, we have
> > a solution where integrators can pick the dma-heap they think is
> > relevant for protected buffer allocations without the various drivers
> > (GPU, video codec, ...) having to implement a dispatch function for all
> > possible implementations. The same goes for userspace allocations,
> > where passing a dma-heap name, is simpler than supporting different
> > ioctl()s based on the allocation backend.
>
> There have been several attempts with DMA heaps in the past which all
> resulted in a very vendor specific vertically integrated solution. But
> the solution with TEE subsystem aims to make it generic and vendor
> agnostic.

Just because all previous protected/restricted dma-heap effort
failed to make it upstream, doesn't mean dma-heap is the wrong way of
exposing this feature IMHO.

Regards,

Boris
Hi,

On Thu, 13 Feb 2025 at 12:40, Boris Brezillon
<boris.brezillon@collabora.com> wrote:
> On Thu, 13 Feb 2025 14:46:01 +0530 Sumit Garg <sumit.garg@linaro.org> wrote:
> > Yeah but all the prior vendor specific secure/restricted DMA heaps
> > relied on DT information.
>
> Right, but there's nothing in the DMA heap provider API forcing that.

Yeah. DMA heaps are just a way to allocate memory from a specific
place. It allows people to settle on having a single way to do
allocations from weird platform-specific places; the only weird
platform-specific part userspace needs to deal with is figuring out
the name to use. The rest is at least a unified API: the point of
dma-heaps was exactly to have a single coherent API for userspace, not
to create one API for ZONE_CMA and DT ranges and everyone else doing
their own thing.

> > Rather than that it's better
> > for the user to directly ask the TEE device to allocate restricted
> > memory without worrying about how the memory restriction gets
> > enforced.
>
> If the consensus is that restricted/protected memory allocation should
> always be routed to the TEE, sure, but I had the feeling this wasn't as
> clear as that. OTOH, using a dma-heap to expose the TEE-SDP
> implementation provides the same benefits, without making potential
> future non-TEE based implementations a pain for users. The dma-heap
> ioctl being common to all implementations, it just becomes a
> configuration matter if we want to change the heap we rely on for
> protected/restricted buffer allocation. And because heaps have
> unique/well-known names, users can still default to (or rely solely on)
> the TEE-SPD implementation if they want.
>
> > There have been several attempts with DMA heaps in the past which all
> > resulted in a very vendor specific vertically integrated solution. But
> > the solution with TEE subsystem aims to make it generic and vendor
> > agnostic.
>
> Just because all previous protected/restricted dma-heap effort
> failed to make it upstream, doesn't mean dma-heap is the wrong way of
> exposing this feature IMHO.

To be fair, having a TEE implementation does give us a much better
chance of having a sensible cross-vendor plan. And the fact it's
already (sort of accidentally and only on one platform AFAICT) ready
for a 'test' interface, where we can still exercise protected
allocation paths but without having to go through all the
platform-specific setup that is inaccessible to most people, is also
really great! That's probably been the biggest barrier to having this
tested outside of IHVs and OEMs.

But just because TEE is one good backend implementation, doesn't mean
it should be the userspace ABI. Why should userspace care that TEE has
mediated the allocation instead of it being a predefined range within
DT? How does userspace pick which TEE device to use? What advantage
does userspace get from having to have a different codepath to get a
different handle to memory? What about x86?

I think this proposal is looking at it from the wrong direction.
Instead of working upwards from the implementation to userspace, start
with userspace and work downwards. The interesting property to focus
on is allocating memory, not that EL1 is involved behind the scenes.

Cheers,
Daniel
Hi, On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone <daniel@fooishbar.org> wrote: > > Hi, > > On Thu, 13 Feb 2025 at 12:40, Boris Brezillon > <boris.brezillon@collabora.com> wrote: > > On Thu, 13 Feb 2025 14:46:01 +0530 Sumit Garg <sumit.garg@linaro.org> wrote: > > > Yeah but all the prior vendor specific secure/restricted DMA heaps > > > relied on DT information. > > > > Right, but there's nothing in the DMA heap provider API forcing that. > > Yeah. DMA heaps are just a way to allocate memory from a specific > place. It allows people to settle on having a single way to do > allocations from weird platform-specific places; the only weird > platform-specific part userspace needs to deal with is figuring out > the name to use. The rest is at least a unified API: the point of > dma-heaps was exactly to have a single coherent API for userspace, not > to create one API for ZONE_CMA and DT ranges and everyone else doing > their own thing. > > > > Rather than that it's better > > > for the user to directly ask the TEE device to allocate restricted > > > memory without worrying about how the memory restriction gets > > > enforced. > > > > If the consensus is that restricted/protected memory allocation should > > always be routed to the TEE, sure, but I had the feeling this wasn't as > > clear as that. OTOH, using a dma-heap to expose the TEE-SDP > > implementation provides the same benefits, without making potential > > future non-TEE based implementations a pain for users. The dma-heap > > ioctl being common to all implementations, it just becomes a > > configuration matter if we want to change the heap we rely on for > > protected/restricted buffer allocation. And because heaps have > > unique/well-known names, users can still default to (or rely solely on) > > the TEE-SPD implementation if they want. > > > > > There have been several attempts with DMA heaps in the past which all > > > resulted in a very vendor specific vertically integrated solution. 
But > > > the solution with TEE subsystem aims to make it generic and vendor > > > agnostic. > > > > Just because all previous protected/restricted dma-heap effort > > failed to make it upstream, doesn't mean dma-heap is the wrong way of > > exposing this feature IMHO. > > To be fair, having a TEE implementation does give us a much better > chance of having a sensible cross-vendor plan. And the fact it's > already (sort of accidentally and only on one platform AFAICT) ready > for a 'test' interface, where we can still exercise protected > allocation paths but without having to go through all the > platform-specific setup that is inaccessible to most people, is also > really great! That's probably been the biggest barrier to having this > tested outside of IHVs and OEMs. > > But just because TEE is one good backend implementation, doesn't mean > it should be the userspace ABI. Why should userspace care that TEE has > mediated the allocation instead of it being a predefined range within > DT? The TEE may very well use a predefined range that part is abstracted with the interface. > How does userspace pick which TEE device to use? There's normally only one and even if there is more than one it should be safe to assume that only one of them should be used when allocating restricted memory (TEE_GEN_CAP_RSTMEM from TEE_IOC_VERSION). > What advantage > does userspace get from having to have a different codepath to get a > different handle to memory? What about x86? > > I think this proposal is looking at it from the wrong direction. > Instead of working upwards from the implementation to userspace, start > with userspace and work downwards. The interesting property to focus > on is allocating memory, not that EL1 is involved behind the scenes. From what I've gathered from earlier discussions, it wasn't much of a problem for userspace to handle this. If the kernel were to provide it via a different ABI, how would it be easier to implement in the kernel? 
I think we need an example to understand your suggestion.

Cheers,
Jens
Hi, On Thu, 13 Feb 2025 at 15:57, Jens Wiklander <jens.wiklander@linaro.org> wrote: > On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone <daniel@fooishbar.org> wrote: > > But just because TEE is one good backend implementation, doesn't mean > > it should be the userspace ABI. Why should userspace care that TEE has > > mediated the allocation instead of it being a predefined range within > > DT? > > The TEE may very well use a predefined range that part is abstracted > with the interface. Of course. But you can also (and this has been shipped on real devices) handle this without any per-allocation TEE needs by simply allocating from a memory range which is predefined within DT. From the userspace point of view, why should there be one ABI to allocate memory from a predefined range which is delivered by DT to the kernel, and one ABI to allocate memory from a predefined range which is mediated by TEE? > > What advantage > > does userspace get from having to have a different codepath to get a > > different handle to memory? What about x86? > > > > I think this proposal is looking at it from the wrong direction. > > Instead of working upwards from the implementation to userspace, start > > with userspace and work downwards. The interesting property to focus > > on is allocating memory, not that EL1 is involved behind the scenes. > > From what I've gathered from earlier discussions, it wasn't much of a > problem for userspace to handle this. If the kernel were to provide it > via a different ABI, how would it be easier to implement in the > kernel? I think we need an example to understand your suggestion. It is a problem for userspace, because we need to expose acceptable parameters for allocation through the entire stack. If you look at the dmabuf documentation in the kernel for how buffers should be allocated and exchanged, you can see the negotiation flow for modifiers. This permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more. 
Standardising on heaps allows us to add those in a similar way. If we
have to add different allocation mechanisms, then the complexity
increases, permeating not only into all the different userspace APIs,
but also into the drivers which need to support every different
allocation mechanism even if they have no opinion on it - e.g. Mali
doesn't care in any way whether the allocation comes from a heap or
TEE or ACPI or whatever, it cares only that the memory is protected.

Does that help?

Cheers,
Daniel
Hi, On Thu, Feb 13, 2025 at 6:39 PM Daniel Stone <daniel@fooishbar.org> wrote: > > Hi, > > On Thu, 13 Feb 2025 at 15:57, Jens Wiklander <jens.wiklander@linaro.org> wrote: > > On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone <daniel@fooishbar.org> wrote: > > > But just because TEE is one good backend implementation, doesn't mean > > > it should be the userspace ABI. Why should userspace care that TEE has > > > mediated the allocation instead of it being a predefined range within > > > DT? > > > > The TEE may very well use a predefined range that part is abstracted > > with the interface. > > Of course. But you can also (and this has been shipped on real > devices) handle this without any per-allocation TEE needs by simply > allocating from a memory range which is predefined within DT. > > From the userspace point of view, why should there be one ABI to > allocate memory from a predefined range which is delivered by DT to > the kernel, and one ABI to allocate memory from a predefined range > which is mediated by TEE? We need some way to specify the protection profile (or use case as I've called it in the ABI) required for the buffer. Whether it's defined in DT seems irrelevant. > > > > What advantage > > > does userspace get from having to have a different codepath to get a > > > different handle to memory? What about x86? > > > > > > I think this proposal is looking at it from the wrong direction. > > > Instead of working upwards from the implementation to userspace, start > > > with userspace and work downwards. The interesting property to focus > > > on is allocating memory, not that EL1 is involved behind the scenes. > > > > From what I've gathered from earlier discussions, it wasn't much of a > > problem for userspace to handle this. If the kernel were to provide it > > via a different ABI, how would it be easier to implement in the > > kernel? I think we need an example to understand your suggestion. 
> > It is a problem for userspace, because we need to expose acceptable > parameters for allocation through the entire stack. If you look at the > dmabuf documentation in the kernel for how buffers should be allocated > and exchanged, you can see the negotiation flow for modifiers. This > permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more. What dma-buf properties are you referring to? dma_heap_ioctl_allocate() accepts a few flags for the resulting file descriptor and no flags for the heap itself. > > Standardising on heaps allows us to add those in a similar way. How would you solve this with heaps? Would you use one heap for each protection profile (use case), add heap_flags, or do a bit of both? > If we > have to add different allocation mechanisms, then the complexity > increases, permeating not only into all the different userspace APIs, > but also into the drivers which need to support every different > allocation mechanism even if they have no opinion on it - e.g. Mali > doesn't care in any way whether the allocation comes from a heap or > TEE or ACPI or whatever, it cares only that the memory is protected. > > Does that help? I think you're missing the stage where an unprotected buffer is received and decrypted into a protected buffer. If you use the TEE for decryption or to configure the involved devices for the use case, it makes sense to let the TEE allocate the buffers, too. A TEE doesn't have to be an OS in the secure world, it can be an abstraction to support the use case depending on the design. So the restricted buffer is already allocated before we reach Mali in your example. Allocating restricted buffers from the TEE subsystem saves us from maintaining proxy dma-buf heaps. Cheers, Jens > > Cheers, > Daniel
On Fri, 14 Feb 2025 at 15:37, Jens Wiklander <jens.wiklander@linaro.org> wrote: > > Hi, > > On Thu, Feb 13, 2025 at 6:39 PM Daniel Stone <daniel@fooishbar.org> wrote: > > > > Hi, > > > > On Thu, 13 Feb 2025 at 15:57, Jens Wiklander <jens.wiklander@linaro.org> wrote: > > > On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone <daniel@fooishbar.org> wrote: > > > > But just because TEE is one good backend implementation, doesn't mean > > > > it should be the userspace ABI. Why should userspace care that TEE has > > > > mediated the allocation instead of it being a predefined range within > > > > DT? > > > > > > The TEE may very well use a predefined range that part is abstracted > > > with the interface. > > > > Of course. But you can also (and this has been shipped on real > > devices) handle this without any per-allocation TEE needs by simply > > allocating from a memory range which is predefined within DT. > > > > From the userspace point of view, why should there be one ABI to > > allocate memory from a predefined range which is delivered by DT to > > the kernel, and one ABI to allocate memory from a predefined range > > which is mediated by TEE? > > We need some way to specify the protection profile (or use case as > I've called it in the ABI) required for the buffer. Whether it's > defined in DT seems irrelevant. > > > > > > > What advantage > > > > does userspace get from having to have a different codepath to get a > > > > different handle to memory? What about x86? > > > > > > > > I think this proposal is looking at it from the wrong direction. > > > > Instead of working upwards from the implementation to userspace, start > > > > with userspace and work downwards. The interesting property to focus > > > > on is allocating memory, not that EL1 is involved behind the scenes. > > > > > > From what I've gathered from earlier discussions, it wasn't much of a > > > problem for userspace to handle this. 
If the kernel were to provide it > > > via a different ABI, how would it be easier to implement in the > > > kernel? I think we need an example to understand your suggestion. > > > > It is a problem for userspace, because we need to expose acceptable > > parameters for allocation through the entire stack. If you look at the > > dmabuf documentation in the kernel for how buffers should be allocated > > and exchanged, you can see the negotiation flow for modifiers. This > > permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more. > > What dma-buf properties are you referring to? > dma_heap_ioctl_allocate() accepts a few flags for the resulting file > descriptor and no flags for the heap itself. > > > > > Standardising on heaps allows us to add those in a similar way. > > How would you solve this with heaps? Would you use one heap for each > protection profile (use case), add heap_flags, or do a bit of both? Christian gave an historical background here [1] as to why that hasn't worked in the past with DMA heaps given the scalability issues. [1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail.com/ > > > If we > > have to add different allocation mechanisms, then the complexity > > increases, permeating not only into all the different userspace APIs, > > but also into the drivers which need to support every different > > allocation mechanism even if they have no opinion on it - e.g. Mali > > doesn't care in any way whether the allocation comes from a heap or > > TEE or ACPI or whatever, it cares only that the memory is protected. > > > > Does that help? > > I think you're missing the stage where an unprotected buffer is > received and decrypted into a protected buffer. If you use the TEE for > decryption or to configure the involved devices for the use case, it > makes sense to let the TEE allocate the buffers, too. 
> A TEE doesn't have to be an OS in the secure world, it can be an
> abstraction to support the use case depending on the design. So the
> restricted buffer is already allocated before we reach Mali in your
> example.
>
> Allocating restricted buffers from the TEE subsystem saves us from
> maintaining proxy dma-buf heaps.

+1

-Sumit
On Fri, 14 Feb 2025 18:37:14 +0530 Sumit Garg <sumit.garg@linaro.org> wrote: > On Fri, 14 Feb 2025 at 15:37, Jens Wiklander <jens.wiklander@linaro.org> wrote: > > > > Hi, > > > > On Thu, Feb 13, 2025 at 6:39 PM Daniel Stone <daniel@fooishbar.org> wrote: > > > > > > Hi, > > > > > > On Thu, 13 Feb 2025 at 15:57, Jens Wiklander <jens.wiklander@linaro.org> wrote: > > > > On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone <daniel@fooishbar.org> wrote: > > > > > But just because TEE is one good backend implementation, doesn't mean > > > > > it should be the userspace ABI. Why should userspace care that TEE has > > > > > mediated the allocation instead of it being a predefined range within > > > > > DT? > > > > > > > > The TEE may very well use a predefined range that part is abstracted > > > > with the interface. > > > > > > Of course. But you can also (and this has been shipped on real > > > devices) handle this without any per-allocation TEE needs by simply > > > allocating from a memory range which is predefined within DT. > > > > > > From the userspace point of view, why should there be one ABI to > > > allocate memory from a predefined range which is delivered by DT to > > > the kernel, and one ABI to allocate memory from a predefined range > > > which is mediated by TEE? > > > > We need some way to specify the protection profile (or use case as > > I've called it in the ABI) required for the buffer. Whether it's > > defined in DT seems irrelevant. > > > > > > > > > > What advantage > > > > > does userspace get from having to have a different codepath to get a > > > > > different handle to memory? What about x86? > > > > > > > > > > I think this proposal is looking at it from the wrong direction. > > > > > Instead of working upwards from the implementation to userspace, start > > > > > with userspace and work downwards. The interesting property to focus > > > > > on is allocating memory, not that EL1 is involved behind the scenes. 
> > > > > > > > From what I've gathered from earlier discussions, it wasn't much of a > > > > problem for userspace to handle this. If the kernel were to provide it > > > > via a different ABI, how would it be easier to implement in the > > > > kernel? I think we need an example to understand your suggestion. > > > > > > It is a problem for userspace, because we need to expose acceptable > > > parameters for allocation through the entire stack. If you look at the > > > dmabuf documentation in the kernel for how buffers should be allocated > > > and exchanged, you can see the negotiation flow for modifiers. This > > > permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more. > > > > What dma-buf properties are you referring to? > > dma_heap_ioctl_allocate() accepts a few flags for the resulting file > > descriptor and no flags for the heap itself. > > > > > > > > Standardising on heaps allows us to add those in a similar way. > > > > How would you solve this with heaps? Would you use one heap for each > > protection profile (use case), add heap_flags, or do a bit of both? I would say one heap per-profile. > > Christian gave an historical background here [1] as to why that hasn't > worked in the past with DMA heaps given the scalability issues. > > [1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail.com/ Hm, I fail to see where Christian dismiss the dma-heaps solution in this email. He even says: > If the memory is not physically attached to any device, but rather just memory attached to the CPU or a system wide memory controller then expose the memory as DMA-heap with specific requirements (e.g. certain sized pages, contiguous, restricted, encrypted, ...). 
> > > > > > If we > > > have to add different allocation mechanisms, then the complexity > > > increases, permeating not only into all the different userspace APIs, > > > but also into the drivers which need to support every different > > > allocation mechanism even if they have no opinion on it - e.g. Mali > > > doesn't care in any way whether the allocation comes from a heap or > > > TEE or ACPI or whatever, it cares only that the memory is protected. > > > > > > Does that help? > > > > I think you're missing the stage where an unprotected buffer is > > received and decrypted into a protected buffer. If you use the TEE for > > decryption or to configure the involved devices for the use case, it > > makes sense to let the TEE allocate the buffers, too. A TEE doesn't > > have to be an OS in the secure world, it can be an abstraction to > > support the use case depending on the design. So the restricted buffer > > is already allocated before we reach Mali in your example. > > > > Allocating restricted buffers from the TEE subsystem saves us from > > maintaining proxy dma-buf heaps. Honestly, when I look at dma-heap implementations, they seem to be trivial shells around existing (more complex) allocators, and the boiler plate [1] to expose a dma-heap is relatively small. The dma-buf implementation, you already have, so we're talking about a hundred lines of code to maintain, which shouldn't be significantly more than what you have for the new ioctl() to be honest. And I'll insist on what Daniel said, it's a small price to pay to have a standard interface to expose to userspace. If dma-heaps are not used for this kind things, I honestly wonder what they will be used for... Regards, Boris [1]https://elixir.bootlin.com/linux/v6.13.2/source/drivers/dma-buf/heaps/system_heap.c#L314
On Fri, 14 Feb 2025 at 21:19, Boris Brezillon <boris.brezillon@collabora.com> wrote: > > On Fri, 14 Feb 2025 18:37:14 +0530 > Sumit Garg <sumit.garg@linaro.org> wrote: > > > On Fri, 14 Feb 2025 at 15:37, Jens Wiklander <jens.wiklander@linaro.org> wrote: > > > > > > Hi, > > > > > > On Thu, Feb 13, 2025 at 6:39 PM Daniel Stone <daniel@fooishbar.org> wrote: > > > > > > > > Hi, > > > > > > > > On Thu, 13 Feb 2025 at 15:57, Jens Wiklander <jens.wiklander@linaro.org> wrote: > > > > > On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone <daniel@fooishbar.org> wrote: > > > > > > But just because TEE is one good backend implementation, doesn't mean > > > > > > it should be the userspace ABI. Why should userspace care that TEE has > > > > > > mediated the allocation instead of it being a predefined range within > > > > > > DT? > > > > > > > > > > The TEE may very well use a predefined range that part is abstracted > > > > > with the interface. > > > > > > > > Of course. But you can also (and this has been shipped on real > > > > devices) handle this without any per-allocation TEE needs by simply > > > > allocating from a memory range which is predefined within DT. > > > > > > > > From the userspace point of view, why should there be one ABI to > > > > allocate memory from a predefined range which is delivered by DT to > > > > the kernel, and one ABI to allocate memory from a predefined range > > > > which is mediated by TEE? > > > > > > We need some way to specify the protection profile (or use case as > > > I've called it in the ABI) required for the buffer. Whether it's > > > defined in DT seems irrelevant. > > > > > > > > > > > > > What advantage > > > > > > does userspace get from having to have a different codepath to get a > > > > > > different handle to memory? What about x86? > > > > > > > > > > > > I think this proposal is looking at it from the wrong direction. 
> > > > > > Instead of working upwards from the implementation to userspace, start > > > > > > with userspace and work downwards. The interesting property to focus > > > > > > on is allocating memory, not that EL1 is involved behind the scenes. > > > > > > > > > > From what I've gathered from earlier discussions, it wasn't much of a > > > > > problem for userspace to handle this. If the kernel were to provide it > > > > > via a different ABI, how would it be easier to implement in the > > > > > kernel? I think we need an example to understand your suggestion. > > > > > > > > It is a problem for userspace, because we need to expose acceptable > > > > parameters for allocation through the entire stack. If you look at the > > > > dmabuf documentation in the kernel for how buffers should be allocated > > > > and exchanged, you can see the negotiation flow for modifiers. This > > > > permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more. > > > > > > What dma-buf properties are you referring to? > > > dma_heap_ioctl_allocate() accepts a few flags for the resulting file > > > descriptor and no flags for the heap itself. > > > > > > > > > > > Standardising on heaps allows us to add those in a similar way. > > > > > > How would you solve this with heaps? Would you use one heap for each > > > protection profile (use case), add heap_flags, or do a bit of both? > > I would say one heap per-profile. > And then it would have a per vendor multiplication factor as each vendor enforces memory restriction in a platform specific manner which won't scale. > > > > Christian gave an historical background here [1] as to why that hasn't > > worked in the past with DMA heaps given the scalability issues. > > > > [1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail.com/ > > Hm, I fail to see where Christian dismiss the dma-heaps solution in > this email. 
He even says: > > > If the memory is not physically attached to any device, but rather just > memory attached to the CPU or a system wide memory controller then > expose the memory as DMA-heap with specific requirements (e.g. certain > sized pages, contiguous, restricted, encrypted, ...). I am not saying Christian dismissed DMA heaps but rather how scalability is an issue. What we are proposing here is a generic interface via TEE to the firmware/Trusted OS which can perform all the platform specific memory restrictions. This solution will scale across vendors. > > > > > > > > > > If we > > > > have to add different allocation mechanisms, then the complexity > > > > increases, permeating not only into all the different userspace APIs, > > > > but also into the drivers which need to support every different > > > > allocation mechanism even if they have no opinion on it - e.g. Mali > > > > doesn't care in any way whether the allocation comes from a heap or > > > > TEE or ACPI or whatever, it cares only that the memory is protected. > > > > > > > > Does that help? > > > > > > I think you're missing the stage where an unprotected buffer is > > > received and decrypted into a protected buffer. If you use the TEE for > > > decryption or to configure the involved devices for the use case, it > > > makes sense to let the TEE allocate the buffers, too. A TEE doesn't > > > have to be an OS in the secure world, it can be an abstraction to > > > support the use case depending on the design. So the restricted buffer > > > is already allocated before we reach Mali in your example. > > > > > > Allocating restricted buffers from the TEE subsystem saves us from > > > maintaining proxy dma-buf heaps. > > Honestly, when I look at dma-heap implementations, they seem > to be trivial shells around existing (more complex) allocators, and the > boiler plate [1] to expose a dma-heap is relatively small. 
> The dma-buf implementation, you already have, so we're talking about
> a hundred lines of code to maintain, which shouldn't be significantly
> more than what you have for the new ioctl() to be honest.

It would rather be redundant vendor-specific code under DMA heaps
calling into the firmware/Trusted OS to enforce memory restrictions, as
you can see in the Mediatek example [1]. With the TEE subsystem
managing that, it won't be the case, as we will provide a common
abstraction for the communication with the underlying firmware/Trusted
OS.

[1] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@mediatek.com/

> And I'll insist on what
> Daniel said, it's a small price to pay to have a standard interface to
> expose to userspace. If dma-heaps are not used for this kind things, I
> honestly wonder what they will be used for...

Let's try not to forcefully find a use-case for DMA heaps when there
is a better alternative available. I am still failing to see why you
don't consider the following as a standardised user-space interface:

"When user-space has to work with restricted memory, ask TEE device to
allocate it"

-Sumit
Hi Sumit, On Mon, 17 Feb 2025 at 06:13, Sumit Garg <sumit.garg@linaro.org> wrote: > On Fri, 14 Feb 2025 at 21:19, Boris Brezillon <boris.brezillon@collabora.com> wrote: > > I would say one heap per-profile. > > And then it would have a per vendor multiplication factor as each > vendor enforces memory restriction in a platform specific manner which > won't scale. Yes, they do enforce it in a platform-specific manner, but so does TEE. There is no one golden set of semantics which is globally applicable between all hardware and all products in a useful manner. So, if we define protected,secure-video + protected,secure-video-record + protected,trusted-ui heap names, we have exactly the same number of axes. The only change is from uint32_t to string. > > > Christian gave an historical background here [1] as to why that hasn't > > > worked in the past with DMA heaps given the scalability issues. > > > > > > [1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail.com/ > > > > Hm, I fail to see where Christian dismiss the dma-heaps solution in > > this email. He even says: > > > > > If the memory is not physically attached to any device, but rather just > > memory attached to the CPU or a system wide memory controller then > > expose the memory as DMA-heap with specific requirements (e.g. certain > > sized pages, contiguous, restricted, encrypted, ...). > > I am not saying Christian dismissed DMA heaps but rather how > scalability is an issue. What we are proposing here is a generic > interface via TEE to the firmware/Trusted OS which can perform all the > platform specific memory restrictions. This solution will scale across > vendors. I read something completely different into Christian's mail. What Christian is saying is that injecting generic constraint solving into the kernel doesn't scale. 
It's not OK to build out generic
infrastructure in the kernel which queries a bunch of leaf drivers and
attempts to somehow come up with something which satisfies
userspace-provided constraints.

But this isn't the same thing as saying 'dma-heaps is wrong'! Again,
there is no additional complexity in the kernel between a dma-heap
which bridges over to TEE, and a TEE userspace interface which also
bridges over to TEE. Both of them are completely fine according to
what he's said.

> > Honestly, when I look at dma-heap implementations, they seem
> > to be trivial shells around existing (more complex) allocators, and the
> > boiler plate [1] to expose a dma-heap is relatively small. The dma-buf
> > implementation, you already have, so we're talking about a hundred
> > lines of code to maintain, which shouldn't be significantly more than
> > what you have for the new ioctl() to be honest.
>
> It will rather be redundant vendor specific code under DMA heaps
> calling into firmware/Trusted OS to enforce memory restrictions as you
> can look into Mediatek example [1]. With TEE subsystem managing that
> it won't be the case as we will provide a common abstraction for the
> communication with underlying firmware/Trusted OS.

Yes, it's common for everyone who uses TEE to implement SVP. It's not
common for the people who do _not_ use TEE to implement SVP. Which
means that userspace has to type out both, and what we're asking in
this thread is: why?

Why should userspace have to support dma-heap allocation for platforms
supporting SVP via a static DT-defined carveout as well as supporting
TEE API allocation for platforms supporting SVP via a dynamic
carveout? What benefit does it bring to have this surfaced as a
completely separate uAPI?

> > And I'll insist on what
> > Daniel said, it's a small price to pay to have a standard interface to
> > expose to userspace. If dma-heaps are not used for this kind things, I
> > honestly wonder what they will be used for...
>
> Let's try not to forcefully find a use-case for DMA heaps when there
> is a better alternative available.

What makes it better? If you could explain very clearly the benefit
userspace will gain from asking TEE to allocate $n bytes for
TEE_IOC_UC_SECURE_VIDEO_PLAY, compared to asking dma-heap to allocate
$n bytes for protected,secure-video, I think that would really help.
Right now, I don't understand how it would be better in any way
whatsoever for userspace. And I think your decision to implement it as
a separate API is based on a misunderstanding of Christian's position.

> I am still failing to see why you
> don't consider following as a standardised user-space interface:
>
> "When user-space has to work with restricted memory, ask TEE device to
> allocate it"

As far as I can tell, having userspace work with the TEE interface
brings zero benefit (again, please correct me if I'm wrong and explain
how it's better). The direct cost - call it a disbenefit - it brings
is that we have to spend a pile of time typing out support for TEE
allocation in every media/GPU/display driver/application, and when we
do any kind of negotiation, we have to have one protocol definition
for TEE and one for non-TEE.

dma-heaps was created to solve the problem of having too many
'allocate $n bytes from $specialplace' uAPIs. The proliferation was
painful and making it difficult for userspace to do what it needed to
do. Userspace doesn't _yet_ make full use of it, but the solution is
to make userspace make full use of it, not to go create entirely
separate allocation paths for unclear reasons.

Besides, I'm writing this from a platform that implements SVP not via
TEE. I've worked on platforms which implement SVP without any TEE,
where the TEE implementation would be at best a no-op stub, and at
worst flat-out impossible.

So that's 'why not TEE as the single uAPI for SVP'. So, again, let's
please turn this around: _why_ TEE? Who benefits from exposing this as
completely separate to the more generic uAPI that we specifically
designed to handle things like this?

Cheers,
Daniel

On Tue, 18 Feb 2025 at 21:52, Daniel Stone <daniel@fooishbar.org> wrote:
>
> Hi Sumit,
>
> On Mon, 17 Feb 2025 at 06:13, Sumit Garg <sumit.garg@linaro.org> wrote:
> > On Fri, 14 Feb 2025 at 21:19, Boris Brezillon <boris.brezillon@collabora.com> wrote:
> > > I would say one heap per-profile.
> >
> > And then it would have a per vendor multiplication factor as each
> > vendor enforces memory restriction in a platform specific manner which
> > won't scale.
>
> Yes, they do enforce it in a platform-specific manner, but so does
> TEE. There is no one golden set of semantics which is globally
> applicable between all hardware and all products in a useful manner.
>
> So, if we define protected,secure-video +
> protected,secure-video-record + protected,trusted-ui heap names, we
> have exactly the same number of axes. The only change is from uint32_t
> to string.
>
> > > > Christian gave an historical background here [1] as to why that hasn't
> > > > worked in the past with DMA heaps given the scalability issues.
> > > >
> > > > [1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail.com/
> > >
> > > Hm, I fail to see where Christian dismiss the dma-heaps solution in
> > > this email. He even says:
> > >
> > > > If the memory is not physically attached to any device, but rather just
> > > memory attached to the CPU or a system wide memory controller then
> > > expose the memory as DMA-heap with specific requirements (e.g. certain
> > > sized pages, contiguous, restricted, encrypted, ...).
> >
> > I am not saying Christian dismissed DMA heaps but rather how
> > scalability is an issue. What we are proposing here is a generic
> > interface via TEE to the firmware/Trusted OS which can perform all the
> > platform specific memory restrictions. This solution will scale across
> > vendors.
>
> I read something completely different into Christian's mail.
>
> What Christian is saying is that injecting generic constraint solving
> into the kernel doesn't scale. It's not OK to build out generic
> infrastructure in the kernel which queries a bunch of leaf drivers and
> attempts to somehow come up with something which satisfies
> userspace-provided constraints.
>
> But this isn't the same thing as saying 'dma-heaps is wrong'! Again,
> there is no additional complexity in the kernel between a dma-heap
> which bridges over to TEE, and a TEE userspace interface which also
> bridges over to TEE. Both of them are completely fine according to
> what he's said.
>
> > > Honestly, when I look at dma-heap implementations, they seem
> > > to be trivial shells around existing (more complex) allocators, and the
> > > boiler plate [1] to expose a dma-heap is relatively small. The dma-buf
> > > implementation, you already have, so we're talking about a hundred
> > > lines of code to maintain, which shouldn't be significantly more than
> > > what you have for the new ioctl() to be honest.
> >
> > It will rather be redundant vendor specific code under DMA heaps
> > calling into firmware/Trusted OS to enforce memory restrictions as you
> > can look into Mediatek example [1]. With TEE subsystem managing that
> > it won't be the case as we will provide a common abstraction for the
> > communication with underlying firmware/Trusted OS.
>
> Yes, it's common for everyone who uses TEE to implement SVP. It's not
> common for the people who do _not_ use TEE to implement SVP. Which
> means that userspace has to type out both, and what we're asking in
> this thread is: why?
>
> Why should userspace have to support dma-heap allocation for platforms
> supporting SVP via a static DT-defined carveout as well as supporting
> TEE API allocation for platforms supporting SVP via a dynamic
> carveout? What benefit does it bring to have this surfaced as a
> completely separate uAPI?
>
> > > And I'll insist on what
> > > Daniel said, it's a small price to pay to have a standard interface to
> > > expose to userspace. If dma-heaps are not used for this kind things, I
> > > honestly wonder what they will be used for...
> >
> > Let's try not to forcefully find a use-case for DMA heaps when there
> > is a better alternative available.
>
> What makes it better? If you could explain very clearly the benefit
> userspace will gain from asking TEE to allocate $n bytes for
> TEE_IOC_UC_SECURE_VIDEO_PLAY, compared to asking dma-heap to allocate
> $n bytes for protected,secure-video, I think that would really help.
> Right now, I don't understand how it would be better in any way
> whatsoever for userspace. And I think your decision to implement it as
> a separate API is based on a misunderstanding of Christian's position.
>
> > I am still failing to see why you
> > don't consider following as a standardised user-space interface:
> >
> > "When user-space has to work with restricted memory, ask TEE device to
> > allocate it"
>
> As far as I can tell, having userspace work with the TEE interface
> brings zero benefit (again, please correct me if I'm wrong and explain
> how it's better). The direct cost - call it a disbenefit - it brings
> is that we have to spend a pile of time typing out support for TEE
> allocation in every media/GPU/display driver/application, and when we
> do any kind of negotiation, we have to have one protocol definition
> for TEE and one for non-TEE.
>
> dma-heaps was created to solve the problem of having too many
> 'allocate $n bytes from $specialplace' uAPIs. The proliferation was
> painful and making it difficult for userspace to do what it needed to
> do. Userspace doesn't _yet_ make full use of it, but the solution is
> to make userspace make full use of it, not to go create entirely
> separate allocation paths for unclear reasons.
>
> Besides, I'm writing this from a platform that implements SVP not via
> TEE. I've worked on platforms which implement SVP without any TEE,
> where the TEE implementation would be at best a no-op stub, and at
> worst flat-out impossible.

Can you elaborate on the non-TEE use-case for Secure Video Path (SVP) a
bit more? How does the protected/encrypted media content pipeline
work? Which architecture support does your use-case require? Is there
any higher privileged level firmware interaction required to perform
media content decryption into restricted memory? Do you plan to
upstream corresponding support in the near future?

Let me try to elaborate on the Secure Video Path (SVP) flow requiring
a TEE implementation (in general terms, a higher privileged firmware
managing the pipeline, as the kernel/user-space has no access
permissions to the plain text media content):

- Firstly, a content decryption key is securely provisioned into the
TEE implementation.
- Interaction with TEE to set up access permissions of different
peripherals in the media pipeline so that they can access restricted
memory.
- Interaction with TEE to allocate restricted memory buffers.
- Interaction with TEE to decrypt downloaded encrypted media content
from normal memory buffers to restricted memory buffers.
- Then the rest of the media pipeline is able to process the plain media
content in restricted buffers and display it.

>
> So that's 'why not TEE as the single uAPI for SVP'.
Let's try to see if your SVP use-case really converges with TEE based
SVP such that we really need a single uAPI.
> So, again, let's
> please turn this around: _why_ TEE? Who benefits from exposing this as
> completely separate to the more generic uAPI that we specifically
> designed to handle things like this?

The bridging between DMA heaps and TEE would still require user-space
to perform an IOCTL into TEE to register the DMA-bufs, as you can see
here [1]. User-space would then have two handles to manage.

Similarly, during restricted memory allocation/free we need another
glue layer under DMA heaps to the TEE subsystem.

The reason is simply the one that has been reiterated many times in
past threads:

"If user-space has to interact with a TEE device for SVP use-case
then why it's not better to ask TEE to allocate restricted DMA-bufs
too"

[1] https://lkml.indiana.edu/hypermail/linux/kernel/2408.3/08296.html

-Sumit

Hi Sumit,

On Fri, 21 Feb 2025 at 11:24, Sumit Garg <sumit.garg@linaro.org> wrote:
> On Tue, 18 Feb 2025 at 21:52, Daniel Stone <daniel@fooishbar.org> wrote:
> > dma-heaps was created to solve the problem of having too many
> > 'allocate $n bytes from $specialplace' uAPIs. The proliferation was
> > painful and making it difficult for userspace to do what it needed to
> > do. Userspace doesn't _yet_ make full use of it, but the solution is
> > to make userspace make full use of it, not to go create entirely
> > separate allocation paths for unclear reasons.
> >
> > Besides, I'm writing this from a platform that implements SVP not via
> > TEE. I've worked on platforms which implement SVP without any TEE,
> > where the TEE implementation would be at best a no-op stub, and at
> > worst flat-out impossible.
>
> Can you elaborate the non-TEE use-case for Secure Video Path (SVP) a
> bit more? As to how the protected/encrypted media content pipeline
> works? Which architecture support does your use-case require? Is there
> any higher privileged level firmware interaction required to perform
> media content decryption into restricted memory? Do you plan to
> upstream corresponding support in near future?

You can see the MTK SVP patches on list which use the MTK SMC to mediate it.

There are TI Jacinto platforms which implement a 'secure' area
configured statically by (IIRC) BL2, with static permissions defined
for each AXI endpoint, e.g. CPU write + codec RW + dispc read. I've
heard of another SoC vendor doing the same, but I don't think I can
share those details. There is no TEE interaction.

I'm writing this message from an AMD laptop which implements
restricted content paths outside of TEE. I don't have the full picture
of how SVP is implemented on AMD systems, but I do know that I don't
have any TEE devices exposed.

> Let me try to elaborate on the Secure Video Path (SVP) flow requiring
> a TEE implementation (in general terms a higher privileged firmware
> managing the pipeline as the kernel/user-space has no access
> permissions to the plain text media content):
>
> - [...]

Yeah, I totally understand the TEE usecase. I think that TEE is a good
design to implement this. I think that TEE should be used for SVP
where it makes sense.

Please understand that I am _not_ arguing that no-one should use TEE for SVP!

> > So, again, let's
> > please turn this around: _why_ TEE? Who benefits from exposing this as
> > completely separate to the more generic uAPI that we specifically
> > designed to handle things like this?
>
> The bridging between DMA heaps and TEE would still require user-space
> to perform an IOCTL into TEE to register the DMA-bufs as you can see
> here [1]. Then it will rather be two handles for user-space to manage.

Yes, the decoder would need to do this. That's common though: if you
want to share a buffer between V4L2 and DRM, you have three handles:
the V4L2 buffer handle, the DRM GEM handle, and the dmabuf you use to
bridge the two.

> Similarly during restricted memory allocation/free we need another
> glue layer under DMA heaps to TEE subsystem.

Yep.

> The reason is simply which has been iterated over many times in the
> past threads that:
>
> "If user-space has to interact with a TEE device for SVP use-case
> then why it's not better to ask TEE to allocate restricted DMA-bufs
> too"

The first word in your proposition is load-bearing.

Build out the usecase a little more here. You have a DRMed video
stream coming in, which you need to decode (involving TEE for this
usecase). You get a dmabuf handle to the decoded frame. You need to
pass the dmabuf across to the Wayland compositor. The compositor needs
to pass it to EGL/Vulkan to import and do composition, which in turn
passes it to the GPU DRM driver. The output of the composition is in
turn shared between the GPU DRM driver and the separate KMS DRM
driver, with the involvement of GBM.

For the platforms I'm interested in, the GPU DRM driver needs to
switch into protected mode, which has no involvement at all with TEE -
it's architecturally impossible to have TEE involved without moving
most of the GPU driver into TEE and destroying performance. The
display hardware also needs to engage protected mode, which again has
no involvement with TEE and again would need to have half the driver
moved into TEE for no benefit in order to do so. The Wayland
compositor also has no interest in TEE: it tells the GPU DRM driver
about the protected status of its buffers, and that's it.

What these components _are_ opinionated about, is the way buffers are
allocated and managed. We built out dmabuf modifiers for this usecase,
and we have a good negotiation protocol around that. We also really
care about buffer placement in some usecases - e.g. some display/codec
hardware requires buffers to be sourced from contiguous memory, other
hardware needs to know that when it shares buffers with another
device, it needs to place the buffers outside of inaccessible/slow
local RAM. So we built out dma-heaps, so every part of the component
in the stack can communicate their buffer-placement needs in the same
way as we do modifiers, and negotiate an acceptable allocation.

That's my starting point for this discussion. We have a mechanism to
deal with the fact that buffers need to be shared between different IP
blocks which have their own constraints on buffer placement, avoiding
the current problem of having every subsystem reinvent their own
allocation uAPI which was burying us in impedance mismatch and
confusion. That mechanism is dma-heaps. It seems like your starting
point from this discussion is that you've implemented a TEE-centric
design for SVP, and so all of userspace should bypass our existing
cross-subsystem special-purpose allocation mechanism, and write
specifically to one implementation. I believe that is a massive step
backwards and an immediate introduction of technical debt.

Again, having an implementation of SVP via TEE makes a huge amount of
sense. Having _most_ SVP implementations via TEE still makes a lot of
sense. Having _all_ SVP implementations eventually be via TEE would
still make sense. But even if we were at that point - which we aren't
- it still doesn't justify telling userspace 'use the generic dma-heap
uAPI for every device-specific allocation constraint, apart from SVP
which has a completely different way to allocate some bytes'.

Cheers,
Daniel

Hi Daniel,

On Fri, Feb 21, 2025 at 3:12 PM Daniel Stone <daniel@fooishbar.org> wrote:
>
> Hi Sumit,
>
> On Fri, 21 Feb 2025 at 11:24, Sumit Garg <sumit.garg@linaro.org> wrote:
> > On Tue, 18 Feb 2025 at 21:52, Daniel Stone <daniel@fooishbar.org> wrote:
> > > dma-heaps was created to solve the problem of having too many
> > > 'allocate $n bytes from $specialplace' uAPIs. The proliferation was
> > > painful and making it difficult for userspace to do what it needed to
> > > do. Userspace doesn't _yet_ make full use of it, but the solution is
> > > to make userspace make full use of it, not to go create entirely
> > > separate allocation paths for unclear reasons.
> > >
> > > Besides, I'm writing this from a platform that implements SVP not via
> > > TEE. I've worked on platforms which implement SVP without any TEE,
> > > where the TEE implementation would be at best a no-op stub, and at
> > > worst flat-out impossible.
> >
> > Can you elaborate the non-TEE use-case for Secure Video Path (SVP) a
> > bit more? As to how the protected/encrypted media content pipeline
> > works? Which architecture support does your use-case require? Is there
> > any higher privileged level firmware interaction required to perform
> > media content decryption into restricted memory? Do you plan to
> > upstream corresponding support in near future?
>
> You can see the MTK SVP patches on list which use the MTK SMC to mediate it.
>
> There are TI Jacinto platforms which implement a 'secure' area
> configured statically by (IIRC) BL2, with static permissions defined
> for each AXI endpoint, e.g. CPU write + codec RW + dispc read. I've
> heard of another SoC vendor doing the same, but I don't think I can
> share those details. There is no TEE interaction.
>
> I'm writing this message from an AMD laptop which implements
> restricted content paths outside of TEE. I don't have the full picture
> of how SVP is implemented on AMD systems, but I do know that I don't
> have any TEE devices exposed.
>
> > Let me try to elaborate on the Secure Video Path (SVP) flow requiring
> > a TEE implementation (in general terms a higher privileged firmware
> > managing the pipeline as the kernel/user-space has no access
> > permissions to the plain text media content):
> >
> > - [...]
>
> Yeah, I totally understand the TEE usecase. I think that TEE is a good
> design to implement this. I think that TEE should be used for SVP
> where it makes sense.
>
> Please understand that I am _not_ arguing that no-one should use TEE for SVP!
>
> > > So, again, let's
> > > please turn this around: _why_ TEE? Who benefits from exposing this as
> > > completely separate to the more generic uAPI that we specifically
> > > designed to handle things like this?
> >
> > The bridging between DMA heaps and TEE would still require user-space
> > to perform an IOCTL into TEE to register the DMA-bufs as you can see
> > here [1]. Then it will rather be two handles for user-space to manage.
>
> Yes, the decoder would need to do this. That's common though: if you
> want to share a buffer between V4L2 and DRM, you have three handles:
> the V4L2 buffer handle, the DRM GEM handle, and the dmabuf you use to
> bridge the two.
>
> > Similarly during restricted memory allocation/free we need another
> > glue layer under DMA heaps to TEE subsystem.
>
> Yep.
>
> > The reason is simply which has been iterated over many times in the
> > past threads that:
> >
> > "If user-space has to interact with a TEE device for SVP use-case
> > then why it's not better to ask TEE to allocate restricted DMA-bufs
> > too"
>
> The first word in your proposition is load-bearing.
>
> Build out the usecase a little more here. You have a DRMed video
> stream coming in, which you need to decode (involving TEE for this
> usecase). You get a dmabuf handle to the decoded frame. You need to
> pass the dmabuf across to the Wayland compositor. The compositor needs
> to pass it to EGL/Vulkan to import and do composition, which in turn
> passes it to the GPU DRM driver. The output of the composition is in
> turn shared between the GPU DRM driver and the separate KMS DRM
> driver, with the involvement of GBM.
>
> For the platforms I'm interested in, the GPU DRM driver needs to
> switch into protected mode, which has no involvement at all with TEE -
> it's architecturally impossible to have TEE involved without moving
> most of the GPU driver into TEE and destroying performance. The
> display hardware also needs to engage protected mode, which again has
> no involvement with TEE and again would need to have half the driver
> moved into TEE for no benefit in order to do so. The Wayland
> compositor also has no interest in TEE: it tells the GPU DRM driver
> about the protected status of its buffers, and that's it.
>
> What these components _are_ opinionated about, is the way buffers are
> allocated and managed. We built out dmabuf modifiers for this usecase,
> and we have a good negotiation protocol around that. We also really
> care about buffer placement in some usecases - e.g. some display/codec
> hardware requires buffers to be sourced from contiguous memory, other
> hardware needs to know that when it shares buffers with another
> device, it needs to place the buffers outside of inaccessible/slow
> local RAM. So we built out dma-heaps, so every part of the component
> in the stack can communicate their buffer-placement needs in the same
> way as we do modifiers, and negotiate an acceptable allocation.
>
> That's my starting point for this discussion. We have a mechanism to
> deal with the fact that buffers need to be shared between different IP
> blocks which have their own constraints on buffer placement, avoiding
> the current problem of having every subsystem reinvent their own
> allocation uAPI which was burying us in impedance mismatch and
> confusion. That mechanism is dma-heaps. It seems like your starting
> point from this discussion is that you've implemented a TEE-centric
> design for SVP, and so all of userspace should bypass our existing
> cross-subsystem special-purpose allocation mechanism, and write
> specifically to one implementation. I believe that is a massive step
> backwards and an immediate introduction of technical debt.
>
> Again, having an implementation of SVP via TEE makes a huge amount of
> sense. Having _most_ SVP implementations via TEE still makes a lot of
> sense. Having _all_ SVP implementations eventually be via TEE would
> still make sense. But even if we were at that point - which we aren't
> - it still doesn't justify telling userspace 'use the generic dma-heap
> uAPI for every device-specific allocation constraint, apart from SVP
> which has a completely different way to allocate some bytes'.

I must admit that I don't see how this makes a significant difference,
but then I haven't hacked much in the stacks you're talking about, so
I'm going to take your word for it.

I've experimented with providing a dma-heap replacing the TEE API. The
implementation is more complex than I first anticipated, adding about
400 lines to the patch set. From user space, it looks like another
dma-heap. I'm using the names you gave earlier,
protected,secure-video, protected,trusted-ui, and
protected,secure-video-record. However, I wonder if we shouldn't use
"restricted" instead of "protected" since we had agreed to call it
restricted memory earlier.

I'll soon post this in a v6 and an updated demo.

Cheers,
Jens

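As a rough illustration of "from user space, it looks like another dma-heap": heaps show up as character devices under /dev/dma_heap/, so userspace could discover the per-use-case heaps by listing that directory and filtering on a name prefix. The sketch below is only that, a sketch; heap_has_prefix() and list_heaps() are hypothetical helpers, and whether the prefix ends up as "protected," or "restricted," is exactly the open naming question raised above.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if the heap name starts with prefix, else 0. */
static int heap_has_prefix(const char *name, const char *prefix)
{
	assert(name != NULL && prefix != NULL);
	return strncmp(name, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical helper: print every entry in dir whose name starts with
 * prefix. Returns the number of matches, or -1 if dir cannot be opened.
 */
static int list_heaps(const char *dir, const char *prefix)
{
	DIR *d = opendir(dir);
	struct dirent *e;
	int count = 0;

	if (!d)
		return -1;

	while ((e = readdir(d)) != NULL) {
		if (e->d_name[0] == '.')	/* skip ".", ".." and hidden entries */
			continue;
		if (heap_has_prefix(e->d_name, prefix)) {
			printf("%s/%s\n", dir, e->d_name);
			count++;
		}
	}
	closedir(d);

	return count;
}
```

A caller would do something like list_heaps("/dev/dma_heap", "restricted,") and treat a return value of 0 or -1 as "no such heaps on this platform", which keeps the probing logic identical for TEE-backed and non-TEE-backed implementations.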
On Tue, Mar 04, 2025 at 08:17:23AM +0100, Jens Wiklander wrote:
> Hi Daniel,
>
> On Fri, Feb 21, 2025 at 3:12 PM Daniel Stone <daniel@fooishbar.org> wrote:
> >
> > Hi Sumit,
> >
> > On Fri, 21 Feb 2025 at 11:24, Sumit Garg <sumit.garg@linaro.org> wrote:
> > > On Tue, 18 Feb 2025 at 21:52, Daniel Stone <daniel@fooishbar.org> wrote:
> > > > dma-heaps was created to solve the problem of having too many
> > > > 'allocate $n bytes from $specialplace' uAPIs. The proliferation was
> > > > painful and making it difficult for userspace to do what it needed to
> > > > do. Userspace doesn't _yet_ make full use of it, but the solution is
> > > > to make userspace make full use of it, not to go create entirely
> > > > separate allocation paths for unclear reasons.
> > > >
> > > > Besides, I'm writing this from a platform that implements SVP not via
> > > > TEE. I've worked on platforms which implement SVP without any TEE,
> > > > where the TEE implementation would be at best a no-op stub, and at
> > > > worst flat-out impossible.
> > >
> > > Can you elaborate the non-TEE use-case for Secure Video Path (SVP) a
> > > bit more? As to how the protected/encrypted media content pipeline
> > > works? Which architecture support does your use-case require? Is there
> > > any higher privileged level firmware interaction required to perform
> > > media content decryption into restricted memory? Do you plan to
> > > upstream corresponding support in near future?
> >
> > You can see the MTK SVP patches on list which use the MTK SMC to mediate it.
> >
> > There are TI Jacinto platforms which implement a 'secure' area
> > configured statically by (IIRC) BL2, with static permissions defined
> > for each AXI endpoint, e.g. CPU write + codec RW + dispc read. I've
> > heard of another SoC vendor doing the same, but I don't think I can
> > share those details. There is no TEE interaction.
> >
> > I'm writing this message from an AMD laptop which implements
> > restricted content paths outside of TEE. I don't have the full picture
> > of how SVP is implemented on AMD systems, but I do know that I don't
> > have any TEE devices exposed.
> >
> > > Let me try to elaborate on the Secure Video Path (SVP) flow requiring
> > > a TEE implementation (in general terms a higher privileged firmware
> > > managing the pipeline as the kernel/user-space has no access
> > > permissions to the plain text media content):
> > >
> > > - [...]
> >
> > Yeah, I totally understand the TEE usecase. I think that TEE is a good
> > design to implement this. I think that TEE should be used for SVP
> > where it makes sense.
> >
> > Please understand that I am _not_ arguing that no-one should use TEE for SVP!
> >
> > > > So, again, let's
> > > > please turn this around: _why_ TEE? Who benefits from exposing this as
> > > > completely separate to the more generic uAPI that we specifically
> > > > designed to handle things like this?
> > >
> > > The bridging between DMA heaps and TEE would still require user-space
> > > to perform an IOCTL into TEE to register the DMA-bufs as you can see
> > > here [1]. Then it will rather be two handles for user-space to manage.
> >
> > Yes, the decoder would need to do this. That's common though: if you
> > want to share a buffer between V4L2 and DRM, you have three handles:
> > the V4L2 buffer handle, the DRM GEM handle, and the dmabuf you use to
> > bridge the two.
> >
> > > Similarly during restricted memory allocation/free we need another
> > > glue layer under DMA heaps to TEE subsystem.
> >
> > Yep.
> >
> > > The reason is simply which has been iterated over many times in the
> > > past threads that:
> > >
> > > "If user-space has to interact with a TEE device for SVP use-case
> > > then why it's not better to ask TEE to allocate restricted DMA-bufs
> > > too"
> >
> > The first word in your proposition is load-bearing.
> >
> > Build out the usecase a little more here. You have a DRMed video
> > stream coming in, which you need to decode (involving TEE for this
> > usecase). You get a dmabuf handle to the decoded frame. You need to
> > pass the dmabuf across to the Wayland compositor. The compositor needs
> > to pass it to EGL/Vulkan to import and do composition, which in turn
> > passes it to the GPU DRM driver. The output of the composition is in
> > turn shared between the GPU DRM driver and the separate KMS DRM
> > driver, with the involvement of GBM.
> >
> > For the platforms I'm interested in, the GPU DRM driver needs to
> > switch into protected mode, which has no involvement at all with TEE -
> > it's architecturally impossible to have TEE involved without moving
> > most of the GPU driver into TEE and destroying performance. The
> > display hardware also needs to engage protected mode, which again has
> > no involvement with TEE and again would need to have half the driver
> > moved into TEE for no benefit in order to do so. The Wayland
> > compositor also has no interest in TEE: it tells the GPU DRM driver
> > about the protected status of its buffers, and that's it.
> >
> > What these components _are_ opinionated about, is the way buffers are
> > allocated and managed. We built out dmabuf modifiers for this usecase,
> > and we have a good negotiation protocol around that. We also really
> > care about buffer placement in some usecases - e.g. some display/codec
> > hardware requires buffers to be sourced from contiguous memory, other
> > hardware needs to know that when it shares buffers with another
> > device, it needs to place the buffers outside of inaccessible/slow
> > local RAM. So we built out dma-heaps, so every part of the component
> > in the stack can communicate their buffer-placement needs in the same
> > way as we do modifiers, and negotiate an acceptable allocation.
> >
> > That's my starting point for this discussion. We have a mechanism to
> > deal with the fact that buffers need to be shared between different IP
> > blocks which have their own constraints on buffer placement, avoiding
> > the current problem of having every subsystem reinvent their own
> > allocation uAPI which was burying us in impedance mismatch and
> > confusion. That mechanism is dma-heaps. It seems like your starting
> > point from this discussion is that you've implemented a TEE-centric
> > design for SVP, and so all of userspace should bypass our existing
> > cross-subsystem special-purpose allocation mechanism, and write
> > specifically to one implementation. I believe that is a massive step
> > backwards and an immediate introduction of technical debt.
> >
> > Again, having an implementation of SVP via TEE makes a huge amount of
> > sense. Having _most_ SVP implementations via TEE still makes a lot of
> > sense. Having _all_ SVP implementations eventually be via TEE would
> > still make sense. But even if we were at that point - which we aren't
> > - it still doesn't justify telling userspace 'use the generic dma-heap
> > uAPI for every device-specific allocation constraint, apart from SVP
> > which has a completely different way to allocate some bytes'.
>
> I must admit that I don't see how this makes a significant difference,
> but then I haven't hacked much in the stacks you're talking about, so
> I'm going to take your word for it.
>
> I've experimented with providing a dma-heap replacing the TEE API. The
> implementation is more complex than I first anticipated, adding about
> 400 lines to the patch set.

I did anticipate this, but let's give it a try and see if DMA heaps
really add any value from the user-space point of view. If they do,
then it will be worth the maintenance overhead.

> From user space, it looks like another
> dma-heap. I'm using the names you gave earlier,
> protected,secure-video, protected,trusted-ui, and
> protected,secure-video-record. However, I wonder if we shouldn't use
> "restricted" instead of "protected" since we had agreed to call it
> restricted memory earlier.

Let's stick with "restricted" memory buffer references only.

-Sumit

On Tuesday, 4 March 2025 at 13:15 +0530, Sumit Garg wrote:
> On Tue, Mar 04, 2025 at 08:17:23AM +0100, Jens Wiklander wrote:
> > Hi Daniel,
> >
> > On Fri, Feb 21, 2025 at 3:12 PM Daniel Stone <daniel@fooishbar.org> wrote:
> > >
> > > Hi Sumit,
> > >
> > > On Fri, 21 Feb 2025 at 11:24, Sumit Garg <sumit.garg@linaro.org> wrote:
> > > > On Tue, 18 Feb 2025 at 21:52, Daniel Stone <daniel@fooishbar.org> wrote:
> > > > > dma-heaps was created to solve the problem of having too many
> > > > > 'allocate $n bytes from $specialplace' uAPIs. The proliferation was
> > > > > painful and making it difficult for userspace to do what it needed to
> > > > > do. Userspace doesn't _yet_ make full use of it, but the solution is
> > > > > to make userspace make full use of it, not to go create entirely
> > > > > separate allocation paths for unclear reasons.
> > > > >
> > > > > Besides, I'm writing this from a platform that implements SVP not via
> > > > > TEE. I've worked on platforms which implement SVP without any TEE,
> > > > > where the TEE implementation would be at best a no-op stub, and at
> > > > > worst flat-out impossible.
> > > >
> > > > Can you elaborate the non-TEE use-case for Secure Video Path (SVP) a
> > > > bit more? As to how the protected/encrypted media content pipeline
> > > > works? Which architecture support does your use-case require? Is there
> > > > any higher privileged level firmware interaction required to perform
> > > > media content decryption into restricted memory? Do you plan to
> > > > upstream corresponding support in near future?
> > >
> > > You can see the MTK SVP patches on list which use the MTK SMC to mediate it.
> > >
> > > There are TI Jacinto platforms which implement a 'secure' area
> > > configured statically by (IIRC) BL2, with static permissions defined
> > > for each AXI endpoint, e.g. CPU write + codec RW + dispc read. I've
> > > heard of another SoC vendor doing the same, but I don't think I can
> > > share those details. There is no TEE interaction.
> > >
> > > I'm writing this message from an AMD laptop which implements
> > > restricted content paths outside of TEE. I don't have the full picture
> > > of how SVP is implemented on AMD systems, but I do know that I don't
> > > have any TEE devices exposed.
> > >
> > > > Let me try to elaborate on the Secure Video Path (SVP) flow requiring
> > > > a TEE implementation (in general terms a higher privileged firmware
> > > > managing the pipeline as the kernel/user-space has no access
> > > > permissions to the plain text media content):
> > > >
> > > > - [...]
> > >
> > > Yeah, I totally understand the TEE usecase. I think that TEE is a good
> > > design to implement this. I think that TEE should be used for SVP
> > > where it makes sense.
> > >
> > > Please understand that I am _not_ arguing that no-one should use TEE for SVP!
> > >
> > > > > So, again, let's
> > > > > please turn this around: _why_ TEE? Who benefits from exposing this as
> > > > > completely separate to the more generic uAPI that we specifically
> > > > > designed to handle things like this?
> > > >
> > > > The bridging between DMA heaps and TEE would still require user-space
> > > > to perform an IOCTL into TEE to register the DMA-bufs as you can see
> > > > here [1]. Then it will rather be two handles for user-space to manage.
> > >
> > > Yes, the decoder would need to do this. That's common though: if you
> > > want to share a buffer between V4L2 and DRM, you have three handles:
> > > the V4L2 buffer handle, the DRM GEM handle, and the dmabuf you use to
> > > bridge the two.
> > >
> > > > Similarly during restricted memory allocation/free we need another
> > > > glue layer under DMA heaps to TEE subsystem.
> > >
> > > Yep.
> > >
> > > > The reason is simply which has been iterated over many times in the
> > > > past threads that:
> > > >
> > > > "If user-space has to interact with a TEE device for SVP use-case
> > > > then why it's not better to ask TEE to allocate restricted DMA-bufs
> > > > too"
> > >
> > > The first word in your proposition is load-bearing.
> > >
> > > Build out the usecase a little more here. You have a DRMed video
> > > stream coming in, which you need to decode (involving TEE for this
> > > usecase). You get a dmabuf handle to the decoded frame. You need to
> > > pass the dmabuf across to the Wayland compositor. The compositor needs
> > > to pass it to EGL/Vulkan to import and do composition, which in turn
> > > passes it to the GPU DRM driver. The output of the composition is in
> > > turn shared between the GPU DRM driver and the separate KMS DRM
> > > driver, with the involvement of GBM.
> > >
> > > For the platforms I'm interested in, the GPU DRM driver needs to
> > > switch into protected mode, which has no involvement at all with TEE -
> > > it's architecturally impossible to have TEE involved without moving
> > > most of the GPU driver into TEE and destroying performance. The
> > > display hardware also needs to engage protected mode, which again has
> > > no involvement with TEE and again would need to have half the driver
> > > moved into TEE for no benefit in order to do so. The Wayland
> > > compositor also has no interest in TEE: it tells the GPU DRM driver
> > > about the protected status of its buffers, and that's it.
> > >
> > > What these components _are_ opinionated about, is the way buffers are
> > > allocated and managed. We built out dmabuf modifiers for this usecase,
> > > and we have a good negotiation protocol around that. We also really
> > > care about buffer placement in some usecases - e.g.
some display/codec > > > hardware requires buffers to be sourced from contiguous memory, other > > > hardware needs to know that when it shares buffers with another > > > device, it needs to place the buffers outside of inaccessible/slow > > > local RAM. So we built out dma-heaps, so every part of the component > > > in the stack can communicate their buffer-placement needs in the same > > > way as we do modifiers, and negotiate an acceptable allocation. > > > > > > That's my starting point for this discussion. We have a mechanism to > > > deal with the fact that buffers need to be shared between different IP > > > blocks which have their own constraints on buffer placement, avoiding > > > the current problem of having every subsystem reinvent their own > > > allocation uAPI which was burying us in impedance mismatch and > > > confusion. That mechanism is dma-heaps. It seems like your starting > > > point from this discussion is that you've implemented a TEE-centric > > > design for SVP, and so all of userspace should bypass our existing > > > cross-subsystem special-purpose allocation mechanism, and write > > > specifically to one implementation. I believe that is a massive step > > > backwards and an immediate introduction of technical debt. > > > > > > Again, having an implementation of SVP via TEE makes a huge amount of > > > sense. Having _most_ SVP implementations via TEE still makes a lot of > > > sense. Having _all_ SVP implementations eventually be via TEE would > > > still make sense. But even if we were at that point - which we aren't > > > - it still doesn't justify telling userspace 'use the generic dma-heap > > > uAPI for every device-specific allocation constraint, apart from SVP > > > which has a completely different way to allocate some bytes'. > > > > I must admit that I don't see how this makes a significant difference, > > but then I haven't hacked much in the stacks you're talking about, so > > I'm going to take your word for it. 
> >
> > I've experimented with providing a dma-heap replacing the TEE API. The
> > implementation is more complex than I first anticipated, adding about
> > 400 lines to the patch set.
>
> I did anticipate this but let's give it a try and see if DMA heaps
> really adds any value from user-space point of view. If it does then it
> will be worth the maintenance overhead.
>
> > From user space, it looks like another
> > dma-heap. I'm using the names you gave earlier,
> > protected,secure-video, protected,trusted-ui, and
> > protected,secure-video-record. However, I wonder if we shouldn't use
> > "restricted" instead of "protected" since we had agreed to call it
> > restricted memory earlier.
>
> Let's stick with "restricted" memory buffer references only.

Until now, we didn't have a standard to weigh our naming choice against;
we simply wanted to move away from "secure", which didn't mean much, and
"restricted" met our needs. I think the discussion is worth having again,
now that there is a standard that settled on "protected". Matching the
Khronos standard would avoid a lot of confusion.

https://docs.vulkan.org/guide/latest/protected.html

regards,
Nicolas
Hi, On Tue, Mar 18, 2025 at 7:38 PM Nicolas Dufresne <nicolas@ndufresne.ca> wrote: > > On Tuesday, 4 March 2025 at 13:15 +0530, Sumit Garg wrote: > > On Tue, Mar 04, 2025 at 08:17:23AM +0100, Jens Wiklander wrote: > > > Hi Daniel, > > > > > > On Fri, Feb 21, 2025 at 3:12 PM Daniel Stone <daniel@fooishbar.org> wrote: > > > > > > > > Hi Sumit, > > > > > > > > On Fri, 21 Feb 2025 at 11:24, Sumit Garg <sumit.garg@linaro.org> wrote: > > > > > On Tue, 18 Feb 2025 at 21:52, Daniel Stone <daniel@fooishbar.org> wrote: > > > > > > dma-heaps was created to solve the problem of having too many > > > > > > 'allocate $n bytes from $specialplace' uAPIs. The proliferation was > > > > > > painful and making it difficult for userspace to do what it needed to > > > > > > do. Userspace doesn't _yet_ make full use of it, but the solution is > > > > > > to make userspace make full use of it, not to go create entirely > > > > > > separate allocation paths for unclear reasons. > > > > > > > > > > > > Besides, I'm writing this from a platform that implements SVP not via > > > > > > TEE. I've worked on platforms which implement SVP without any TEE, > > > > > > where the TEE implementation would be at best a no-op stub, and at > > > > > > worst flat-out impossible. > > > > > > > > > > Can you elaborate the non-TEE use-case for Secure Video Path (SVP) a > > > > > bit more? As to how the protected/encrypted media content pipeline > > > > > works? Which architecture support does your use-case require? Is there > > > > > any higher privileged level firmware interaction required to perform > > > > > media content decryption into restricted memory? Do you plan to > > > > > upstream corresponding support in near future? > > > > > > > > You can see the MTK SVP patches on list which use the MTK SMC to mediate it. > > > > > > > > There are TI Jacinto platforms which implement a 'secure' area > > > > configured statically by (IIRC) BL2, with static permissions defined > > > > for each AXI endpoint, e.g. 
CPU write + codec RW + dispc read. I've > > > > heard of another SoC vendor doing the same, but I don't think I can > > > > share those details. There is no TEE interaction. > > > > > > > > I'm writing this message from an AMD laptop which implements > > > > restricted content paths outside of TEE. I don't have the full picture > > > > of how SVP is implemented on AMD systems, but I do know that I don't > > > > have any TEE devices exposed. > > > > > > > > > Let me try to elaborate on the Secure Video Path (SVP) flow requiring > > > > > a TEE implementation (in general terms a higher privileged firmware > > > > > managing the pipeline as the kernel/user-space has no access > > > > > permissions to the plain text media content): > > > > > > > > > > - [...] > > > > > > > > Yeah, I totally understand the TEE usecase. I think that TEE is a good > > > > design to implement this. I think that TEE should be used for SVP > > > > where it makes sense. > > > > > > > > Please understand that I am _not_ arguing that no-one should use TEE for SVP! > > > > > > > > > > So, again, let's > > > > > > please turn this around: _why_ TEE? Who benefits from exposing this as > > > > > > completely separate to the more generic uAPI that we specifically > > > > > > designed to handle things like this? > > > > > > > > > > The bridging between DMA heaps and TEE would still require user-space > > > > > to perform an IOCTL into TEE to register the DMA-bufs as you can see > > > > > here [1]. Then it will rather be two handles for user-space to manage. > > > > > > > > Yes, the decoder would need to do this. That's common though: if you > > > > want to share a buffer between V4L2 and DRM, you have three handles: > > > > the V4L2 buffer handle, the DRM GEM handle, and the dmabuf you use to > > > > bridge the two. > > > > > > > > > Similarly during restricted memory allocation/free we need another > > > > > glue layer under DMA heaps to TEE subsystem. > > > > > > > > Yep. 
> > > > > > > > > The reason is simply which has been iterated over many times in the > > > > > past threads that: > > > > > > > > > > "If user-space has to interact with a TEE device for SVP use-case > > > > > then why it's not better to ask TEE to allocate restricted DMA-bufs > > > > > too" > > > > > > > > The first word in your proposition is load-bearing. > > > > > > > > Build out the usecase a little more here. You have a DRMed video > > > > stream coming in, which you need to decode (involving TEE for this > > > > usecase). You get a dmabuf handle to the decoded frame. You need to > > > > pass the dmabuf across to the Wayland compositor. The compositor needs > > > > to pass it to EGL/Vulkan to import and do composition, which in turn > > > > passes it to the GPU DRM driver. The output of the composition is in > > > > turn shared between the GPU DRM driver and the separate KMS DRM > > > > driver, with the involvement of GBM. > > > > > > > > For the platforms I'm interested in, the GPU DRM driver needs to > > > > switch into protected mode, which has no involvement at all with TEE - > > > > it's architecturally impossible to have TEE involved without moving > > > > most of the GPU driver into TEE and destroying performance. The > > > > display hardware also needs to engage protected mode, which again has > > > > no involvement with TEE and again would need to have half the driver > > > > moved into TEE for no benefit in order to do so. The Wayland > > > > compositor also has no interest in TEE: it tells the GPU DRM driver > > > > about the protected status of its buffers, and that's it. > > > > > > > > What these components _are_ opinionated about, is the way buffers are > > > > allocated and managed. We built out dmabuf modifiers for this usecase, > > > > and we have a good negotiation protocol around that. We also really > > > > care about buffer placement in some usecases - e.g. 
some display/codec > > > > hardware requires buffers to be sourced from contiguous memory, other > > > > hardware needs to know that when it shares buffers with another > > > > device, it needs to place the buffers outside of inaccessible/slow > > > > local RAM. So we built out dma-heaps, so every part of the component > > > > in the stack can communicate their buffer-placement needs in the same > > > > way as we do modifiers, and negotiate an acceptable allocation. > > > > > > > > That's my starting point for this discussion. We have a mechanism to > > > > deal with the fact that buffers need to be shared between different IP > > > > blocks which have their own constraints on buffer placement, avoiding > > > > the current problem of having every subsystem reinvent their own > > > > allocation uAPI which was burying us in impedance mismatch and > > > > confusion. That mechanism is dma-heaps. It seems like your starting > > > > point from this discussion is that you've implemented a TEE-centric > > > > design for SVP, and so all of userspace should bypass our existing > > > > cross-subsystem special-purpose allocation mechanism, and write > > > > specifically to one implementation. I believe that is a massive step > > > > backwards and an immediate introduction of technical debt. > > > > > > > > Again, having an implementation of SVP via TEE makes a huge amount of > > > > sense. Having _most_ SVP implementations via TEE still makes a lot of > > > > sense. Having _all_ SVP implementations eventually be via TEE would > > > > still make sense. But even if we were at that point - which we aren't > > > > - it still doesn't justify telling userspace 'use the generic dma-heap > > > > uAPI for every device-specific allocation constraint, apart from SVP > > > > which has a completely different way to allocate some bytes'. 
> > >
> > > I must admit that I don't see how this makes a significant difference,
> > > but then I haven't hacked much in the stacks you're talking about, so
> > > I'm going to take your word for it.
> > >
> > > I've experimented with providing a dma-heap replacing the TEE API. The
> > > implementation is more complex than I first anticipated, adding about
> > > 400 lines to the patch set.
> >
> > I did anticipate this but let's give it a try and see if DMA heaps
> > really adds any value from user-space point of view. If it does then it
> > will be worth the maintenance overhead.
> >
> > > From user space, it looks like another
> > > dma-heap. I'm using the names you gave earlier,
> > > protected,secure-video, protected,trusted-ui, and
> > > protected,secure-video-record. However, I wonder if we shouldn't use
> > > "restricted" instead of "protected" since we had agreed to call it
> > > restricted memory earlier.
> >
> > Let's stick with "restricted" memory buffer references only.
>
> Until now, we didn't have a standard to balance our naming choice, we
> simply wanted to move away from "secure" which didn't mean much, and
> restricted met our needs. I think the discussion is worth having again,
> now that there is a standard that decided toward "protected". Matching
> the Khronos standard means reducing a lot of confusion.
>
> https://docs.vulkan.org/guide/latest/protected.html

Yeah, that's fine with me. I don't mind changing the name again as
long as we progress.

The latest version of the patchset is here [1]. I've published a demo
and changed the patchset to provide a heap interface instead of a
special interface in the TEE subsystem for memory allocations, as
requested. I'm interested in feedback on the patches in general, but
in particular on how the heap interface is provided.

[1] https://lore.kernel.org/lkml/20250305130634.1850178-1-jens.wiklander@linaro.org/

Cheers,
Jens
On Tue, Feb 18, 2025 at 04:22:10PM +0000, Daniel Stone wrote: > Hi Sumit, > > On Mon, 17 Feb 2025 at 06:13, Sumit Garg <sumit.garg@linaro.org> wrote: > > On Fri, 14 Feb 2025 at 21:19, Boris Brezillon <boris.brezillon@collabora.com> wrote: > > > I would say one heap per-profile. > > > > And then it would have a per vendor multiplication factor as each > > vendor enforces memory restriction in a platform specific manner which > > won't scale. > > Yes, they do enforce it in a platform-specific manner, but so does > TEE. There is no one golden set of semantics which is globally > applicable between all hardware and all products in a useful manner. > > So, if we define protected,secure-video + > protected,secure-video-record + protected,trusted-ui heap names, we > have exactly the same number of axes. The only change is from uint32_t > to string. > > > > > Christian gave an historical background here [1] as to why that hasn't > > > > worked in the past with DMA heaps given the scalability issues. > > > > > > > > [1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail.com/ > > > > > > Hm, I fail to see where Christian dismiss the dma-heaps solution in > > > this email. He even says: > > > > > > > If the memory is not physically attached to any device, but rather just > > > memory attached to the CPU or a system wide memory controller then > > > expose the memory as DMA-heap with specific requirements (e.g. certain > > > sized pages, contiguous, restricted, encrypted, ...). > > > > I am not saying Christian dismissed DMA heaps but rather how > > scalability is an issue. What we are proposing here is a generic > > interface via TEE to the firmware/Trusted OS which can perform all the > > platform specific memory restrictions. This solution will scale across > > vendors. > > I read something completely different into Christian's mail. > > What Christian is saying is that injecting generic constraint solving > into the kernel doesn't scale. 
It's not OK to build out generic > infrastructure in the kernel which queries a bunch of leaf drivers and > attempts to somehow come up with something which satisfies > userspace-provided constraints. Fully agreeing. The one thing we discussed, but haven't implemented yet, is that we'd add sysfs links from devices to the dma-heaps they support. Including allowing for priorities and different use-cases on the same device. We just haven't gotten there yet. But even with that it's up to userspace to do the constraint solving, not the kernel. > But this isn't the same thing as saying 'dma-heaps is wrong'! Again, > there is no additional complexity in the kernel between a dma-heap > which bridges over to TEE, and a TEE userspace interface which also > bridges over to TEE. Both of them are completely fine according to > what he's said. > > > > Honestly, when I look at dma-heap implementations, they seem > > > to be trivial shells around existing (more complex) allocators, and the > > > boiler plate [1] to expose a dma-heap is relatively small. The dma-buf > > > implementation, you already have, so we're talking about a hundred > > > lines of code to maintain, which shouldn't be significantly more than > > > what you have for the new ioctl() to be honest. > > > > It will rather be redundant vendor specific code under DMA heaps > > calling into firmware/Trusted OS to enforce memory restrictions as you > > can look into Mediatek example [1]. With TEE subsystem managing that > > it won't be the case as we will provide a common abstraction for the > > communication with underlying firmware/Trusted OS. > > Yes, it's common for everyone who uses TEE to implement SVP. It's not > common for the people who do _not_ use TEE to implement SVP. Which > means that userspace has to type out both, and what we're asking in > this thread is: why? 
> > Why should userspace have to support dma-heap allocation for platforms > supporting SVP via a static DT-defined carveout as well as supporting > TEE API allocation for platforms supporting SVP via a dynamic > carveout? What benefit does it bring to have this surfaced as a > completely separate uAPI? > > > > And I'll insist on what > > > Daniel said, it's a small price to pay to have a standard interface to > > > expose to userspace. If dma-heaps are not used for this kind things, I > > > honestly wonder what they will be used for... > > > > Let's try not to forcefully find a use-case for DMA heaps when there > > is a better alternative available. > > What makes it better? If you could explain very clearly the benefit > userspace will gain from asking TEE to allocate $n bytes for > TEE_IOC_UC_SECURE_VIDEO_PLAY, compared to asking dma-heap to allocate > $n bytes for protected,secure-video, I think that would really help. > Right now, I don't understand how it would be better in any way > whatsoever for userspace. And I think your decision to implement it as > a separate API is based on a misunderstanding of Christian's position. > > > I am still failing to see why you > > don't consider following as a standardised user-space interface: > > > > "When user-space has to work with restricted memory, ask TEE device to > > allocate it" > > As far as I can tell, having userspace work with the TEE interface > brings zero benefit (again, please correct me if I'm wrong and explain > how it's better). The direct cost - call it a disbenefit - it brings > is that we have to spend a pile of time typing out support for TEE > allocation in every media/GPU/display driver/application, and when we > do any kind of negotiation, we have to have one protocol definition > for TEE and one for non-TEE. > > dma-heaps was created to solve the problem of having too many > 'allocate $n bytes from $specialplace' uAPIs. 
The proliferation was > painful and making it difficult for userspace to do what it needed to > do. Userspace doesn't _yet_ make full use of it, but the solution is > to make userspace make full use of it, not to go create entirely > separate allocation paths for unclear reasons. > > Besides, I'm writing this from a platform that implements SVP not via > TEE. I've worked on platforms which implement SVP without any TEE, > where the TEE implementation would be at best a no-op stub, and at > worst flat-out impossible. > > So that's 'why not TEE as the single uAPI for SVP'. So, again, let's > please turn this around: _why_ TEE? Who benefits from exposing this as > completely separate to the more generic uAPI that we specifically > designed to handle things like this? Completely concurring on everything said above. TEE exposed through a dma-buf heap (or maybe special v4l allocation flag for secure video playback) and then we prime import that on the display side. Maybe also through drm render drivers for the EGL/VK protected content extensions. Same for any other hw means to allocate content protected buffers, TEE is not special here at all. Anything else needs seriously good justifications why the entire dma-buf heap design is busted. Cheers, Sima -- Simona Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
On Tue, Dec 17, 2024 at 11:07:36AM +0100, Jens Wiklander wrote: > Hi, > > This patch set allocates the restricted DMA-bufs via the TEE subsystem. > > The TEE subsystem handles the DMA-buf allocations since it is the TEE > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the > restrictions for the memory used for the DMA-bufs. > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose > how to allocate the restricted physical memory. > > TEE_IOC_RSTMEM_ALLOC takes in addition to a size and flags parameters also > a use-case parameter. This is used by the backend TEE driver to decide on > allocation policy and which devices should be able to access the memory. > > Three use-cases (Secure Video Playback, Trusted UI, and Secure Video > Recording) has been identified so far to serve as examples of what can be > expected. More use-cases can be added in userspace ABI, but it's up to the > backend TEE drivers to provide the implementation. > > Each use-case has it's own restricted memory pool since different use-cases > requires isolation from different parts of the system. A restricted memory > pool can be based on a static carveout instantiated while probing the TEE > backend driver, or dynamically allocated from CMA and made restricted as > needed by the TEE. > > This can be tested on QEMU with the following steps: > repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \ > -b prototype/sdp-v4 > repo sync -j8 > cd build > make toolchains -j$(nproc) > make SPMC_AT_EL=1 all -j$(nproc) > make SPMC_AT_EL=1 run-only > # login and at the prompt: > xtest --sdp-basic > > The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at > S-EL1 inside OP-TEE. The parameter can be changed into SPMC_AT_EL=n to test > without FF-A using the original SMC ABI instead. 
Please remember to do > %rm -rf ../trusted-firmware-a/build/qemu > for TF-A to be rebuilt properly using the new configuration. > > https://optee.readthedocs.io/en/latest/building/prerequisites.html > list dependencies needed to build the above. > > The tests are pretty basic, mostly checking that a Trusted Application in > the secure world can access and manipulate the memory. There are also some > negative tests for out of bounds buffers etc. I think I've dropped this on earlier encrypted dma-buf discussions for TEE, but can't find one right now ... Do we have some open source userspace for this? To my knowledge we have two implementations of encrypted/content protected dma-buf in upstream right now in the amd and intel gpu drivers, and unless I'm mistaken they both have some minimal userspace supporting EXT_protected_textures: https://github.com/KhronosGroup/OpenGL-Registry/blob/main/extensions/EXT/EXT_protected_textures.txt It's not great, but it does just barely clear the bar in my opinion. I guess something in gstreamer or similar video pipeline framework would also do the job. Especially with the context of the uapi discussion in the v1/RFC thread I think we need more than a bare-bones testcase to make sure this works in actual use. 
Cheers, Sima > > Thanks, > Jens > > Changes since V3: > * Make the use_case and flags field in struct tee_shm u32's instead of > u16's > * Add more description for TEE_IOC_RSTMEM_ALLOC in the header file > * Import namespace DMA_BUF in module tee, reported by lkp@intel.com > * Added a note in the commit message for "optee: account for direction > while converting parameters" why it's needed > * Factor out dynamic restricted memory allocation from > "optee: support restricted memory allocation" into two new commits > "optee: FF-A: dynamic restricted memory allocation" and > "optee: smc abi: dynamic restricted memory allocation" > * Guard CMA usage with #ifdef CONFIG_CMA, effectively disabling dynamic > restricted memory allocate if CMA isn't configured > > Changes since the V2 RFC: > * Based on v6.12 > * Replaced the flags for SVP and Trusted UID memory with a u32 field with > unique id for each use case > * Added dynamic allocation of restricted memory pools > * Added OP-TEE ABI both with and without FF-A for dynamic restricted memory > * Added support for FF-A with FFA_LEND > > Changes since the V1 RFC: > * Based on v6.11 > * Complete rewrite, replacing the restricted heap with TEE_IOC_RSTMEM_ALLOC > > Changes since Olivier's post [2]: > * Based on Yong Wu's post [1] where much of dma-buf handling is done in > the generic restricted heap > * Simplifications and cleanup > * New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap > support" > * Replaced the word "secure" with "restricted" where applicable > > Jens Wiklander (6): > tee: add restricted memory allocation > optee: account for direction while converting parameters > optee: sync secure world ABI headers > optee: support restricted memory allocation > optee: FF-A: dynamic restricted memory allocation > optee: smc abi: dynamic restricted memory allocation > > drivers/tee/Makefile | 1 + > drivers/tee/optee/Makefile | 1 + > drivers/tee/optee/call.c | 10 +- > drivers/tee/optee/core.c | 1 + > 
drivers/tee/optee/ffa_abi.c | 178 +++++++++++++- > drivers/tee/optee/optee_ffa.h | 27 ++- > drivers/tee/optee/optee_msg.h | 65 ++++- > drivers/tee/optee/optee_private.h | 75 ++++-- > drivers/tee/optee/optee_smc.h | 71 +++++- > drivers/tee/optee/rpc.c | 31 ++- > drivers/tee/optee/rstmem.c | 388 ++++++++++++++++++++++++++++++ > drivers/tee/optee/smc_abi.c | 213 ++++++++++++++-- > drivers/tee/tee_core.c | 38 ++- > drivers/tee/tee_private.h | 2 + > drivers/tee/tee_rstmem.c | 201 ++++++++++++++++ > drivers/tee/tee_shm.c | 2 + > drivers/tee/tee_shm_pool.c | 69 +++++- > include/linux/tee_core.h | 15 ++ > include/linux/tee_drv.h | 2 + > include/uapi/linux/tee.h | 44 +++- > 20 files changed, 1358 insertions(+), 76 deletions(-) > create mode 100644 drivers/tee/optee/rstmem.c > create mode 100644 drivers/tee/tee_rstmem.c > > > base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4 > -- > 2.43.0 > -- Simona Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Hi Simona, On Wed, 18 Dec 2024 at 16:36, Simona Vetter <simona.vetter@ffwll.ch> wrote: > > On Tue, Dec 17, 2024 at 11:07:36AM +0100, Jens Wiklander wrote: > > Hi, > > > > This patch set allocates the restricted DMA-bufs via the TEE subsystem. > > > > The TEE subsystem handles the DMA-buf allocations since it is the TEE > > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the > > restrictions for the memory used for the DMA-bufs. > > > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted > > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose > > how to allocate the restricted physical memory. > > > > TEE_IOC_RSTMEM_ALLOC takes in addition to a size and flags parameters also > > a use-case parameter. This is used by the backend TEE driver to decide on > > allocation policy and which devices should be able to access the memory. > > > > Three use-cases (Secure Video Playback, Trusted UI, and Secure Video > > Recording) has been identified so far to serve as examples of what can be > > expected. More use-cases can be added in userspace ABI, but it's up to the > > backend TEE drivers to provide the implementation. > > > > Each use-case has it's own restricted memory pool since different use-cases > > requires isolation from different parts of the system. A restricted memory > > pool can be based on a static carveout instantiated while probing the TEE > > backend driver, or dynamically allocated from CMA and made restricted as > > needed by the TEE. > > > > This can be tested on QEMU with the following steps: > > repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \ > > -b prototype/sdp-v4 > > repo sync -j8 > > cd build > > make toolchains -j$(nproc) > > make SPMC_AT_EL=1 all -j$(nproc) > > make SPMC_AT_EL=1 run-only > > # login and at the prompt: > > xtest --sdp-basic > > > > The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at > > S-EL1 inside OP-TEE. 
> > The parameter can be changed into SPMC_AT_EL=n to test
> > without FF-A using the original SMC ABI instead. Please remember to do
> > %rm -rf ../trusted-firmware-a/build/qemu
> > for TF-A to be rebuilt properly using the new configuration.
> >
> > https://optee.readthedocs.io/en/latest/building/prerequisites.html
> > list dependencies needed to build the above.
> >
> > The tests are pretty basic, mostly checking that a Trusted Application in
> > the secure world can access and manipulate the memory. There are also some
> > negative tests for out of bounds buffers etc.
>
> I think I've dropped this on earlier encrypted dma-buf discussions for
> TEE, but can't find one right now ...

Thanks for raising this query.

>
> Do we have some open source userspace for this? To my knowledge we have
> two implementations of encrypted/content protected dma-buf in upstream
> right now in the amd and intel gpu drivers, and unless I'm mistaken they
> both have some minimal userspace supporting EXT_protected_textures:

First of all, to clarify: the support Jens is adding here for restricted
shared memory allocation in the TEE subsystem is meant to be generic and
not specific to the secure media pipeline use-case. And here we have not
only open source test applications but open source firmware too (OP-TEE
as a Trusted OS) [1], supporting this as a core feature where we maintain
a stable and extensible ABI between the kernel and the OP-TEE core.

Restricted memory is a feature enforced by hardware-specific firewalls,
where a particular TEE implementation governs which block of memory is
accessible to a particular peripheral or to a CPU running in a higher
privileged mode than the Linux kernel. There can be numerous use-cases
around that, such as:

- Secure media pipeline, where the content gets decrypted and stored in
  a restricted buffer which is then accessible only to the media display
  pipeline peripherals.
- Trusted user interface, where a peripheral takes input from the user
  and stores it in a restricted buffer which is then accessible only to
  the TEE implementation.
- Another possible use-case is for the TEE implementation to store key
  material in a restricted buffer which is only accessible to the
  hardware crypto accelerator.

I am sure there will be more use-cases related to this feature, but those
will only be possible once we provide a stable and extensible restricted
memory interface between the Linux user-space and the secure world
user-space (normally referred to as Trusted Applications).

[1] https://github.com/OP-TEE/optee_os/pull/7159

>
> https://github.com/KhronosGroup/OpenGL-Registry/blob/main/extensions/EXT/EXT_protected_textures.txt
>
> It's not great, but it does just barely clear the bar in my opinion. I
> guess something in gstreamer or similar video pipeline framework would
> also do the job.
>
> Especially with the context of the uapi discussion in the v1/RFC thread I
> think we need more than a bare-bones testcase to make sure this works in
> actual use.

The TEE subsystem already supports a stable ABI for a shared memory
allocator between Linux user-space and secure world user-space [2]. The
stable ABI for restricted memory is along the same lines, meant to be a
vendor-neutral abstraction for user-space access. The current test cases
not only test the interface but also serve as regression tests.

I am also in favour of end-to-end open source use-cases. But I fear that
without progressing step by step, as with this proposal, we would force
developers to upstream all the software pieces in one go, which would be
a chicken-and-egg situation. I am sure that once this feature lands, the
Mediatek folks will be interested in porting their secure video playback
patchset [3] on top of it. Similarly, other silicon vendors like NXP,
Qcom etc. will be motivated to do the same.
[2] https://docs.kernel.org/userspace-api/tee.html [3] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@mediatek.com/ -Sumit > > Cheers, Sima > > > > > Thanks, > > Jens > > > > Changes since V3: > > * Make the use_case and flags field in struct tee_shm u32's instead of > > u16's > > * Add more description for TEE_IOC_RSTMEM_ALLOC in the header file > > * Import namespace DMA_BUF in module tee, reported by lkp@intel.com > > * Added a note in the commit message for "optee: account for direction > > while converting parameters" why it's needed > > * Factor out dynamic restricted memory allocation from > > "optee: support restricted memory allocation" into two new commits > > "optee: FF-A: dynamic restricted memory allocation" and > > "optee: smc abi: dynamic restricted memory allocation" > > * Guard CMA usage with #ifdef CONFIG_CMA, effectively disabling dynamic > > restricted memory allocate if CMA isn't configured > > > > Changes since the V2 RFC: > > * Based on v6.12 > > * Replaced the flags for SVP and Trusted UID memory with a u32 field with > > unique id for each use case > > * Added dynamic allocation of restricted memory pools > > * Added OP-TEE ABI both with and without FF-A for dynamic restricted memory > > * Added support for FF-A with FFA_LEND > > > > Changes since the V1 RFC: > > * Based on v6.11 > > * Complete rewrite, replacing the restricted heap with TEE_IOC_RSTMEM_ALLOC > > > > Changes since Olivier's post [2]: > > * Based on Yong Wu's post [1] where much of dma-buf handling is done in > > the generic restricted heap > > * Simplifications and cleanup > > * New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap > > support" > > * Replaced the word "secure" with "restricted" where applicable > > > > Jens Wiklander (6): > > tee: add restricted memory allocation > > optee: account for direction while converting parameters > > optee: sync secure world ABI headers > > optee: support restricted memory allocation > > 
optee: FF-A: dynamic restricted memory allocation > > optee: smc abi: dynamic restricted memory allocation > > > > drivers/tee/Makefile | 1 + > > drivers/tee/optee/Makefile | 1 + > > drivers/tee/optee/call.c | 10 +- > > drivers/tee/optee/core.c | 1 + > > drivers/tee/optee/ffa_abi.c | 178 +++++++++++++- > > drivers/tee/optee/optee_ffa.h | 27 ++- > > drivers/tee/optee/optee_msg.h | 65 ++++- > > drivers/tee/optee/optee_private.h | 75 ++++-- > > drivers/tee/optee/optee_smc.h | 71 +++++- > > drivers/tee/optee/rpc.c | 31 ++- > > drivers/tee/optee/rstmem.c | 388 ++++++++++++++++++++++++++++++ > > drivers/tee/optee/smc_abi.c | 213 ++++++++++++++-- > > drivers/tee/tee_core.c | 38 ++- > > drivers/tee/tee_private.h | 2 + > > drivers/tee/tee_rstmem.c | 201 ++++++++++++++++ > > drivers/tee/tee_shm.c | 2 + > > drivers/tee/tee_shm_pool.c | 69 +++++- > > include/linux/tee_core.h | 15 ++ > > include/linux/tee_drv.h | 2 + > > include/uapi/linux/tee.h | 44 +++- > > 20 files changed, 1358 insertions(+), 76 deletions(-) > > create mode 100644 drivers/tee/optee/rstmem.c > > create mode 100644 drivers/tee/tee_rstmem.c > > > > > > base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4 > > -- > > 2.43.0 > > > > -- > Simona Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
On Tue, Dec 24, 2024 at 12:05:19PM +0530, Sumit Garg wrote:
> Hi Simona,
>
> On Wed, 18 Dec 2024 at 16:36, Simona Vetter <simona.vetter@ffwll.ch> wrote:
[...]
> > Especially with the context of the uapi discussion in the v1/RFC thread I
> > think we need more than a bare-bones testcase to make sure this works in
> > actual use.
>
> Currently the TEE subsystem already supports a stable ABI for shared
> memory allocator among Linux user-space and secure world user-space
> here [2]. And the stable ABI for restricted memory is also along the
> same lines meant to be a vendor neutral abstraction for the user-space
> access. The current test cases not only test the interface but also
> perform regression tests too.
>
> I am also in favour of end to end open source use-cases. But I fear
> without progressing in a step wise manner as with this proposal we
> would rather force developers to upstream all the software pieces in
> one go which will be kind of a chicken and egg situation. I am sure
> once this feature lands Mediatek folks will be interested to port
> their secure video playback patchset [3] on top of it. Similarly other
> silicon vendors like NXP, Qcom etc. will be motivated to do the same.
>
> [2] https://docs.kernel.org/userspace-api/tee.html
> [3] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@mediatek.com/

We get entire opengl/vulkan driver stacks ready before we merge new drm
drivers, I really don't think this is too hard from a technical pov. And I
think the mediatek patches had the same issue of lacking userspace for it,
so that's not moving things forward.
-Sima

[...]

--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Wed, 8 Jan 2025 at 22:27, Simona Vetter <simona.vetter@ffwll.ch> wrote:
>
> On Tue, Dec 24, 2024 at 12:05:19PM +0530, Sumit Garg wrote:
[...]
> > I am also in favour of end to end open source use-cases. But I fear
> > without progressing in a step wise manner as with this proposal we
> > would rather force developers to upstream all the software pieces in
> > one go which will be kind of a chicken and egg situation. I am sure
> > once this feature lands Mediatek folks will be interested to port
> > their secure video playback patchset [3] on top of it. Similarly other
> > silicon vendors like NXP, Qcom etc. will be motivated to do the same.
> >
> > [2] https://docs.kernel.org/userspace-api/tee.html
> > [3] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@mediatek.com/
>
> We get entire opengl/vulkan driver stacks ready before we merge new drm
> drivers, I really don't think this is too hard from a technical pov. And I
> think the mediatek patches had the same issue of lacking userspace for it,
> so that's not moving things forward.
> -Sima
>

Okay, fair enough, I think I get your point. Currently we are missing
support for at least one peripheral acting as the consumer of these
restricted DMA-bufs. So I discussed with Jens offline that we can try a
crypto peripheral use-case first, which can simply be demonstrated
using the current OP-TEE client user-space.

Also, within the crypto peripheral use-case we can target the symmetric
crypto use-case first, which already has a concept of a hardware backed
symmetric key [1]. IOW, we should be able to come up with a generic
symmetric crypto algorithm which can be supported by different crypto
accelerators using a TEE backed restricted key DMA buffer.

[1] https://www.youtube.com/watch?v=GbcpwUBFGDw

-Sumit
On Tue, Dec 24, 2024 at 12:05:19PM +0530, Sumit Garg wrote:
> Restricted memory is a feature enforced by hardware specific firewalls
> where a particular TEE implementation governs which particular block
> of memory is accessible to a particular peripheral or a CPU running in
> a higher privileged mode than the Linux kernel.
[...]
> - Another possible use-case can be for the TEE implementation to store
> key material in a restricted buffer which is only accessible to the
> hardware crypto accelerator.
Just a heads-up:
For RSA sign/verify operations using rsassa-pkcs1 encoding,
the message to be signed/verified (which I understand could
be located in restricted memory) is prepended by a padding.
The crypto subsystem does the prepending of the padding in software.
The actual signature generation/verification (which is an RSA encrypt
or decrypt operation) may be performed in hardware by a crypto
accelerator.
Before commit 8552cb04e083 ("crypto: rsassa-pkcs1 - Copy source
data for SG list"), the kernel constructed a scatterlist
consisting of the padding on the one hand, and of the message
to be signed/verified on the other hand. I believe this worked
for use cases where the message is located in restricted memory.
However since that commit, the kernel kmalloc's a new buffer and
copies the message to be signed/verified into it. The argument
was that although the *kernel* may be able to access the data,
the crypto accelerator may *not* be able to do so. In particular,
portions of the padding are located in the kernel's .rodata section
which is a valid virtual address on x86 but not on arm64 and
which may be inaccessible to a crypto accelerator.
However in the case of restricted memory, the situation is exactly
the opposite: The kernel may *not* be able to access the data,
but the crypto accelerator can access it just fine.
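
To make the difference between the two approaches concrete, here is a
minimal user-space sketch (not kernel code). It assumes SHA-256 and the
EMSA-PKCS1-v1_5 layout from RFC 8017 (00 01 FF..FF 00, then the DigestInfo
prefix, then the digest). The helper names and struct segment are
hypothetical stand-ins for the kernel's scatterlist handling, chosen only
for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA256_LEN 32

/* DER-encoded DigestInfo prefix for SHA-256 (RFC 8017, section 9.2). */
static const uint8_t sha256_prefix[] = {
	0x30, 0x31, 0x30, 0x0d, 0x06, 0x09, 0x60, 0x86, 0x48, 0x01,
	0x65, 0x03, 0x04, 0x02, 0x01, 0x05, 0x00, 0x04, 0x20,
};

/* Hypothetical stand-in for one scatterlist entry. */
struct segment {
	const void *base;
	size_t len;
};

/*
 * Write 00 01 FF..FF 00 followed by the DigestInfo prefix into pad.
 * k is the RSA modulus size in bytes. Returns the padding length,
 * or 0 if k is too small (RFC 8017 requires at least 8 bytes of FF).
 */
static size_t pkcs1_build_padding(uint8_t *pad, size_t k)
{
	size_t t_len = sizeof(sha256_prefix) + SHA256_LEN;
	size_t ps_len;

	if (k < t_len + 11)
		return 0;
	ps_len = k - t_len - 3;
	pad[0] = 0x00;
	pad[1] = 0x01;
	memset(pad + 2, 0xff, ps_len);
	pad[2 + ps_len] = 0x00;
	memcpy(pad + 3 + ps_len, sha256_prefix, sizeof(sha256_prefix));
	return 3 + ps_len + sizeof(sha256_prefix);
}

/*
 * Approach A (gather list): describe padding and digest as two
 * segments. The CPU never dereferences the digest pointer, so the
 * digest could live in memory only the accelerator can read.
 */
static void pkcs1_gather(struct segment sg[2], const uint8_t *pad,
			 size_t pad_len, const void *digest)
{
	sg[0].base = pad;
	sg[0].len = pad_len;
	sg[1].base = digest;
	sg[1].len = SHA256_LEN;
}

/*
 * Approach B (linearize): copy padding and digest into one contiguous
 * buffer. The memcpy() of the digest needs CPU read access to it.
 */
static size_t pkcs1_linearize(uint8_t *out, const uint8_t *pad,
			      size_t pad_len, const uint8_t *digest)
{
	memcpy(out, pad, pad_len);
	memcpy(out + pad_len, digest, SHA256_LEN);
	return pad_len + SHA256_LEN;
}
```

With a 2048-bit key (k = 256), the padding is 224 bytes and the digest
fills the remaining 32. Only approach B touches the digest from the CPU,
which is the step that cannot work if the digest sits in restricted memory.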
I did raise a concern about this to the maintainer, but to no avail:
https://lore.kernel.org/r/Z1Kym1-9ka8kGHrM@wunner.de/
This is the alternative solution I would have preferred:
https://lore.kernel.org/r/3de5d373c86dcaa5abc36f501c1398c4fbf05f2f.1732865109.git.lukas@wunner.de/
> I am also in favour of end to end open source use-cases. But I fear
> without progressing in a step wise manner as with this proposal we
> would rather force developers to upstream all the software pieces in
> one go which will be kind of a chicken and egg situation. I am sure
> once this feature lands Mediatek folks will be interested to port
> their secure video playback patchset [3] on top of it. Similarly other
> silicon vendors like NXP, Qcom etc. will be motivated to do the same.
The crypto use case may be easier to bring up than the video decoding
use case because you don't need to implement a huge amount of
user space code.
Thanks,
Lukas
Hi Lukas,
On Tue, 24 Dec 2024 at 14:58, Lukas Wunner <lukas@wunner.de> wrote:
>
> On Tue, Dec 24, 2024 at 12:05:19PM +0530, Sumit Garg wrote:
> > Restricted memory is a feature enforced by hardware specific firewalls
> > where a particular TEE implementation governs which particular block
> > of memory is accessible to a particular peripheral or a CPU running in
> > a higher privileged mode than the Linux kernel.
> [...]
> > - Another possible use-case can be for the TEE implementation to store
> > key material in a restricted buffer which is only accessible to the
> > hardware crypto accelerator.
>
> Just a heads-up:
>
> For RSA sign/verify operations using rsassa-pkcs1 encoding,
> the message to be signed/verified (which I understand could
> be located in restricted memory) is prepended by a padding.
>
> The crypto subsystem does the prepending of the padding in software.
> The actual signature generation/verification (which is an RSA encrypt
> or decrypt operation) may be performed in hardware by a crypto
> accelerator.
>
> Before commit 8552cb04e083 ("crypto: rsassa-pkcs1 - Copy source
> data for SG list"), the kernel constructed a scatterlist
> consisting of the padding on the one hand, and of the message
> to be signed/verified on the other hand. I believe this worked
> for use cases where the message is located in restricted memory.
>
> However since that commit, the kernel kmalloc's a new buffer and
> copies the message to be signed/verified into it. The argument
> was that although the *kernel* may be able to access the data,
> the crypto accelerator may *not* be able to do so. In particular,
> portions of the padding are located in the kernel's .rodata section
> which is a valid virtual address on x86 but not on arm64 and
> which may be inaccessible to a crypto accelerator.
>
> However in the case of restricted memory, the situation is exactly
> the opposite: The kernel may *not* be able to access the data,
> but the crypto accelerator can access it just fine.
>
> I did raise a concern about this to the maintainer, but to no avail:
> https://lore.kernel.org/r/Z1Kym1-9ka8kGHrM@wunner.de/
Herbert's point is valid: there is no point in mapping restricted
memory into the kernel virtual address space, as any kernel access to
it can lead to platform specific hardware error scenarios. For that
reason we simply disallow dma_buf_mmap() and don't support
dma_buf_vmap() for DMA-bufs holding TEE restricted memory. The only
consumers of those DMA-bufs will be the DMA capable peripherals granted
access permissions by the TEE implementation. IOW, the kernel's role
here will be just to provide the DMA-buf infrastructure for buffers to
be set up by the TEE and then to set up DMA addresses for peripherals
to access them. The hardware crypto accelerator can be one such
peripheral.
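
As a rough illustration of that policy, here is a self-contained
user-space mock (not the code from this patch set and not the kernel's
struct dma_buf_ops). All names are hypothetical. It models an exporter
that rejects user-space mmap, provides no kernel vmap callback, and only
hands out a device address.

```c
#include <errno.h>
#include <stddef.h>

struct mock_buf;

/* Hypothetical, simplified stand-in for an exporter's callback table. */
struct mock_buf_ops {
	int (*mmap)(struct mock_buf *buf);             /* user-space CPU mapping */
	int (*vmap)(struct mock_buf *buf, void **va);  /* kernel CPU mapping, optional */
	int (*map_dma)(struct mock_buf *buf, unsigned long long *dma_addr);
};

struct mock_buf {
	const struct mock_buf_ops *ops;
	unsigned long long phys;	/* address the device would use */
};

/* Restricted memory must never be CPU-mapped into user space. */
static int rstmem_mmap(struct mock_buf *buf)
{
	(void)buf;
	return -EINVAL;
}

/* Peripherals granted access by the TEE get a device address. */
static int rstmem_map_dma(struct mock_buf *buf, unsigned long long *dma_addr)
{
	*dma_addr = buf->phys;
	return 0;
}

static const struct mock_buf_ops rstmem_ops = {
	.mmap = rstmem_mmap,
	.vmap = NULL,		/* no kernel virtual mapping offered */
	.map_dma = rstmem_map_dma,
};

/* Core-side wrappers: a missing optional callback is reported as an error. */
static int mock_buf_mmap(struct mock_buf *buf)
{
	return buf->ops->mmap(buf);
}

static int mock_buf_vmap(struct mock_buf *buf, void **va)
{
	if (!buf->ops->vmap)
		return -EINVAL;	/* error code is this mock's choice */
	return buf->ops->vmap(buf, va);
}

static int mock_buf_map_dma(struct mock_buf *buf, unsigned long long *dma_addr)
{
	return buf->ops->map_dma(buf, dma_addr);
}
```

In this model any importer that insists on a CPU mapping gets an error
back, while a DMA capable peripheral can still obtain an address for the
buffer.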
>
> This is the alternative solution I would have preferred:
> https://lore.kernel.org/r/3de5d373c86dcaa5abc36f501c1398c4fbf05f2f.1732865109.git.lukas@wunner.de/
>
> > I am also in favour of end to end open source use-cases. But I fear
> > without progressing in a step wise manner as with this proposal we
> > would rather force developers to upstream all the software pieces in
> > one go which will be kind of a chicken and egg situation. I am sure
> > once this feature lands Mediatek folks will be interested to port
> > their secure video playback patchset [3] on top of it. Similarly other
> > silicon vendors like NXP, Qcom etc. will be motivated to do the same.
>
> The crypto use case may be easier to bring up than the video decoding
> use case because you don't need to implement a huge amount of
> user space code.
Agreed. If you already have such a hardware use-case, please feel free
to build upon this patch set.
-Sumit
>
> Thanks,
>
> Lukas
On Thu, Dec 26, 2024 at 11:29:23AM +0530, Sumit Garg wrote:
> On Tue, 24 Dec 2024 at 14:58, Lukas Wunner <lukas@wunner.de> wrote:
> > However in the case of restricted memory, the situation is exactly
> > the opposite: The kernel may *not* be able to access the data,
> > but the crypto accelerator can access it just fine.
> >
> > I did raise a concern about this to the maintainer, but to no avail:
> > https://lore.kernel.org/r/Z1Kym1-9ka8kGHrM@wunner.de/
>
> Herbert's point is valid that there isn't any point for mapping
> restricted memory in the kernel virtual address space as any kernel
> access to that space can lead to platform specific hardware error
> scenarios. And for that reason we simply disallow dma_buf_mmap() and
> don't support dma_buf_vmap() for DMA-bufs holding TEE restricted
> memory.

The API for signature generation/verification (e.g. crypto_sig_sign(),
crypto_sig_verify()) no longer accepts scatterlists, only buffers in
virtual address space:

https://lore.kernel.org/all/ZIrnPcPj9Zbq51jK@gondor.apana.org.au/

Hence in order to use buffers in restricted memory for signature
generation/verification, you'd need to map them into virtual address
space first.

Thanks,

Lukas
On Thu, Dec 26, 2024 at 12:26:29PM +0100, Lukas Wunner wrote:
> On Thu, Dec 26, 2024 at 11:29:23AM +0530, Sumit Garg wrote:
> > On Tue, 24 Dec 2024 at 14:58, Lukas Wunner <lukas@wunner.de> wrote:
> > > However in the case of restricted memory, the situation is exactly
> > > the opposite: The kernel may *not* be able to access the data,
> > > but the crypto accelerator can access it just fine.
> > >
> > > I did raise a concern about this to the maintainer, but to no avail:
> > > https://lore.kernel.org/r/Z1Kym1-9ka8kGHrM@wunner.de/
> >
> > Herbert's point is valid that there isn't any point for mapping
> > restricted memory in the kernel virtual address space as any kernel
> > access to that space can lead to platform specific hardware error
> > scenarios. And for that reason we simply disallow dma_buf_mmap() and
> > don't support dma_buf_vmap() for DMA-bufs holding TEE restricted
> > memory.
>
> The API for signature generation/verification (e.g. crypto_sig_sign(),
> crypto_sig_verify()) no longer accepts scatterlists, only buffers in
> virtual address space:
>
> https://lore.kernel.org/all/ZIrnPcPj9Zbq51jK@gondor.apana.org.au/
>
> Hence in order to use buffers in restricted memory for signature
> generation/verification, you'd need to map them into virtual address
> space first.

Nope, you need to get that old api back. Kernel virtual address space
mappings for dma-buf are very intentionally optional.
-Sima

--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Tue, Dec 24, 2024 at 10:28:31AM +0100, Lukas Wunner wrote:
> I did raise a concern about this to the maintainer, but to no avail:
> https://lore.kernel.org/r/Z1Kym1-9ka8kGHrM@wunner.de/

Sorry, wrong link. This is the one I meant to copy-paste... :(

https://lore.kernel.org/r/Z0rPxCGdD7r8HFKb@wunner.de/
On Tue, Dec 24, 2024 at 10:32:41AM +0100, Lukas Wunner wrote:
> On Tue, Dec 24, 2024 at 10:28:31AM +0100, Lukas Wunner wrote:
> > I did raise a concern about this to the maintainer, but to no avail:
> > https://lore.kernel.org/r/Z1Kym1-9ka8kGHrM@wunner.de/
>
> Sorry, wrong link. This is the one I meant to copy-paste... :(
>
> https://lore.kernel.org/r/Z0rPxCGdD7r8HFKb@wunner.de/

Herbert asked a logical question, which got no response from your side.

--
With best wishes
Dmitry