Hello,

My name is Jiho Chu, and I have been working on device drivers and
system daemons for several years at Samsung Electronics.

The Trinity Neural Processing Unit (NPU) series are hardware
accelerators for neural network processing in embedded systems, which
are integrated into application processors or SoCs. Trinity NPU is
compatible with the AMBA bus architecture and was first launched in
2018 with its first version for vision processing, Trinity Version1
(TRIV1). Its second version, TRIV2, was released in Dec 2021. Another
Trinity NPU, for audio processing, is referred to as TRIA.

TRIV2 ships in many models of 2022 Samsung TVs, providing acceleration
for various AI-based applications, including image recognition and
picture quality improvements for streaming video. These can be
accessed via GStreamer and its neural network plugins, NNStreamer.

This patch set includes the Trinity Vision 2 kernel device driver.
Trinity Vision 2 accelerates image inference for Convolutional Neural
Networks (CNN). The CNN workload is executed by the Deep Learning
Accelerator (DLA), and general neural network layers are executed by
the Digital Signal Processor (DSP). A Control Processor (CP) controls
the DLA and DSP. These three IPs (DLA, DSP, CP) make up the Trinity
Vision 2 NPU, and the device driver mainly supervises the CP to manage
the entire NPU.

DLA and DSP operations are controlled with internal command
instructions. The Trinity instructions are similar to a general
processor's ISA, but specialized for neural processing operations. The
virtual ISA (vISA) is designed to process multiple data with a single
operation, like a modern SIMD processor. The device driver loads a
program to the CP at start-up, and the program can decode a binary
built with the vISA. We call this decoding program the Instruction
Decoding Unit (IDU) program. While the NPU is running, the CP executes
the IDU program to fetch and decode vISA instructions, following the
scheduling policy of the device driver.

The DLA, DSP and CP are loosely coupled using ARM's AMBA, so Trinity
can easily communicate with most ARM processors. Each IP has
memory-mapped registers which can be used to control it, and the CP
provides a Wait-For-Event (WFE) operation to subscribe to interrupt
signals from the DLA and DSP. Also, an embedded Direct Memory Access
Controller (DMAC) manages data transfers between internal SRAM and
external main memory, and an IOMMU module supports a unified memory
space.

A user can control the Trinity NPU with IOCTLs provided by the driver.
These controls include memory management operations to transfer model
data (HWMEM_ALLOC/HWMEM_DEALLOC), NPU workload control operations to
submit workloads (RUN/STOP), and statistics operations to check the
current NPU status (STAT).

The device driver also implements features for developers. It provides
sysfs control attributes such as stop, suspend, sched_test, and
profile. It also provides status attributes such as app status, the
number of total requests, the number of active requests, and memory
usage. For tracing, several ftrace events are defined and embedded at
important points.

I would highly appreciate your feedback: reviews, questions, or
anything else.

Thanks.
Jiho Chu

Jiho Chu (9):
  trinity: Add base driver
  tirnity: Add dma memory module
  trinity: Add load/unload IDU files
  trinity: Add schduler module
  trinity: Add sysfs debugfs module
  trinity: Add pm and ioctl feature
  trinity: Add profile module
  trinity: Add trace module
  MAINTAINERS: add TRINITY driver

 MAINTAINERS                                 |    7 +
 drivers/misc/Kconfig                        |    1 +
 drivers/misc/Makefile                       |    1 +
 drivers/misc/trinity/Kconfig                |   27 +
 drivers/misc/trinity/Makefile               |   12 +
 drivers/misc/trinity/sched/core.c           |  170 ++
 drivers/misc/trinity/sched/priority.c       |  335 +++
 drivers/misc/trinity/sched/priority.h       |   18 +
 drivers/misc/trinity/sched/sched.h          |   52 +
 drivers/misc/trinity/trinity.c              | 1282 +++++++++++
 drivers/misc/trinity/trinity_common.h       |  434 ++++
 drivers/misc/trinity/trinity_debug.c        |  358 ++++
 drivers/misc/trinity/trinity_hwmem.c        |  438 ++++
 drivers/misc/trinity/trinity_hwmem.h        |   45 +
 drivers/misc/trinity/trinity_pm.c           |   76 +
 drivers/misc/trinity/trinity_resv_mem.c     |  264 +++
 drivers/misc/trinity/trinity_resv_mem.h     |   41 +
 drivers/misc/trinity/trinity_stat.c         |  893 ++++++++
 drivers/misc/trinity/trinity_stat.h         |   56 +
 drivers/misc/trinity/trinity_sysfs.c        |  864 ++++++++
 drivers/misc/trinity/trinity_trace.c        |   15 +
 drivers/misc/trinity/trinity_trace.h        |  406 ++++
 drivers/misc/trinity/trinity_vision2_drv.c  | 1893 +++++++++++++++++
 .../misc/trinity/trinity_vision2_profile.h  |  324 +++
 drivers/misc/trinity/trinity_vision2_regs.h |  210 ++
 include/uapi/misc/trinity.h                 |  458 ++++
 26 files changed, 8680 insertions(+)
 create mode 100644 drivers/misc/trinity/Kconfig
 create mode 100644 drivers/misc/trinity/Makefile
 create mode 100644 drivers/misc/trinity/sched/core.c
 create mode 100644 drivers/misc/trinity/sched/priority.c
 create mode 100644 drivers/misc/trinity/sched/priority.h
 create mode 100644 drivers/misc/trinity/sched/sched.h
 create mode 100644 drivers/misc/trinity/trinity.c
 create mode 100644 drivers/misc/trinity/trinity_common.h
 create mode 100644 drivers/misc/trinity/trinity_debug.c
 create mode 100644 drivers/misc/trinity/trinity_hwmem.c
 create mode 100644 drivers/misc/trinity/trinity_hwmem.h
 create mode 100644 drivers/misc/trinity/trinity_pm.c
 create mode 100644 drivers/misc/trinity/trinity_resv_mem.c
 create mode 100644 drivers/misc/trinity/trinity_resv_mem.h
 create mode 100644 drivers/misc/trinity/trinity_stat.c
 create mode 100644 drivers/misc/trinity/trinity_stat.h
 create mode 100644 drivers/misc/trinity/trinity_sysfs.c
 create mode 100644 drivers/misc/trinity/trinity_trace.c
 create mode 100644 drivers/misc/trinity/trinity_trace.h
 create mode 100644 drivers/misc/trinity/trinity_vision2_drv.c
 create mode 100644 drivers/misc/trinity/trinity_vision2_profile.h
 create mode 100644 drivers/misc/trinity/trinity_vision2_regs.h
 create mode 100644 include/uapi/misc/trinity.h

-- 
2.25.1

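The ioctl flow described in the cover letter (allocate hardware memory for the model, submit a run, read statistics, release the memory) might look roughly like the user-space sketch below. The real command numbers and structures are defined in include/uapi/misc/trinity.h, which is not quoted in this thread, so every EX_/ex_ name, the magic value and the struct layouts here are hypothetical placeholders rather than the driver's actual UAPI.

```c
/*
 * Hypothetical sketch only: the real ioctl numbers and structures live in
 * include/uapi/misc/trinity.h, which is not quoted in this thread.
 * Every EX_/ex_ name, the magic value and the struct layouts are assumptions.
 */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

#define EX_TRINITY_MAGIC 0x88 /* assumed magic number */

struct ex_hwmem { uint64_t size; int32_t handle; };          /* assumed layout */
struct ex_run   { int32_t model_handle; uint32_t priority; }; /* assumed layout */
struct ex_stat  { uint32_t total_reqs; uint32_t active_reqs; };

#define EX_HWMEM_ALLOC   _IOWR(EX_TRINITY_MAGIC, 1, struct ex_hwmem)
#define EX_HWMEM_DEALLOC _IOW(EX_TRINITY_MAGIC, 2, struct ex_hwmem)
#define EX_RUN           _IOWR(EX_TRINITY_MAGIC, 3, struct ex_run)
#define EX_STAT          _IOR(EX_TRINITY_MAGIC, 4, struct ex_stat)

/*
 * Allocate memory for a model, submit one run, then read statistics.
 * Returns 0 on success or a negative errno value on failure.
 */
int ex_run_once(int fd, uint64_t model_size, struct ex_stat *out)
{
	struct ex_hwmem mem = { .size = model_size, .handle = -1 };
	struct ex_run run = { 0 };
	int ret = 0;

	if (out == NULL)
		return -EINVAL;

	if (ioctl(fd, EX_HWMEM_ALLOC, &mem) < 0)
		return -errno;

	run.model_handle = mem.handle;
	if (ioctl(fd, EX_RUN, &run) < 0)
		ret = -errno;
	else if (ioctl(fd, EX_STAT, out) < 0)
		ret = -errno;

	/* Release the buffer even if RUN or STAT failed. */
	if (ioctl(fd, EX_HWMEM_DEALLOC, &mem) < 0 && ret == 0)
		ret = -errno;

	return ret;
}
```

Releasing the buffer on every exit path after a successful allocation keeps the sketch free of leaks when RUN or STAT fails; a real client would pass a descriptor obtained by opening the driver's device node, whose name is not given in this thread.
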
On 25/07/2022 08:52, Jiho Chu wrote:
> Hello,
>
> My name is Jiho Chu, and I have been working on device drivers and
> system daemons for several years at Samsung Electronics.
>
> The Trinity Neural Processing Unit (NPU) series are hardware
> accelerators for neural network processing in embedded systems, which
> are integrated into application processors or SoCs. Trinity NPU is
> compatible with the AMBA bus architecture and was first launched in
> 2018 with its first version for vision processing, Trinity Version1
> (TRIV1). Its second version, TRIV2, was released in Dec 2021. Another
> Trinity NPU, for audio processing, is referred to as TRIA.
>

Why are there no bindings? How is it supposed to be used on ARM64
platforms?

Best regards,
Krzysztof

On Tue, 26 Jul 2022 08:57:08 +0200
Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> wrote:

> On 25/07/2022 08:52, Jiho Chu wrote:
> > Hello,
> >
> > My name is Jiho Chu, and I have been working on device drivers and
> > system daemons for several years at Samsung Electronics.
> >
> > The Trinity Neural Processing Unit (NPU) series are hardware
> > accelerators for neural network processing in embedded systems, which
> > are integrated into application processors or SoCs. Trinity NPU is
> > compatible with the AMBA bus architecture and was first launched in
> > 2018 with its first version for vision processing, Trinity Version1
> > (TRIV1). Its second version, TRIV2, was released in Dec 2021. Another
> > Trinity NPU, for audio processing, is referred to as TRIA.
> >
>
> Why are there no bindings? How is it supposed to be used on ARM64
> platforms?
>
> Best regards,
> Krzysztof
>

Hi, Krzysztof

Thanks for your review.

A dt-bindings document under 'bindings/arm/' is being prepared, and it
can be included in v2.

Sincerely,
Jiho Chu

On Mon, Jul 25, 2022 at 03:52:59PM +0900, Jiho Chu wrote:
> Hello,
>
> My name is Jiho Chu, and I have been working on device drivers and
> system daemons for several years at Samsung Electronics.
>
> The Trinity Neural Processing Unit (NPU) series are hardware
> accelerators for neural network processing in embedded systems, which
> are integrated into application processors or SoCs. Trinity NPU is
> compatible with the AMBA bus architecture and was first launched in
> 2018 with its first version for vision processing, Trinity Version1
> (TRIV1). Its second version, TRIV2, was released in Dec 2021. Another
> Trinity NPU, for audio processing, is referred to as TRIA.
>
> TRIV2 ships in many models of 2022 Samsung TVs, providing acceleration
> for various AI-based applications, including image recognition and
> picture quality improvements for streaming video. These can be
> accessed via GStreamer and its neural network plugins, NNStreamer.
>
> This patch set includes the Trinity Vision 2 kernel device driver.
> Trinity Vision 2 accelerates image inference for Convolutional Neural
> Networks (CNN). The CNN workload is executed by the Deep Learning
> Accelerator (DLA), and general neural network layers are executed by
> the Digital Signal Processor (DSP). A Control Processor (CP) controls
> the DLA and DSP. These three IPs (DLA, DSP, CP) make up the Trinity
> Vision 2 NPU, and the device driver mainly supervises the CP to manage
> the entire NPU.
>
> DLA and DSP operations are controlled with internal command
> instructions. The Trinity instructions are similar to a general
> processor's ISA, but specialized for neural processing operations. The
> virtual ISA (vISA) is designed to process multiple data with a single
> operation, like a modern SIMD processor. The device driver loads a
> program to the CP at start-up, and the program can decode a binary
> built with the vISA. We call this decoding program the Instruction
> Decoding Unit (IDU) program. While the NPU is running, the CP executes
> the IDU program to fetch and decode vISA instructions, following the
> scheduling policy of the device driver.
>
> The DLA, DSP and CP are loosely coupled using ARM's AMBA, so Trinity
> can easily communicate with most ARM processors. Each IP has
> memory-mapped registers which can be used to control it, and the CP
> provides a Wait-For-Event (WFE) operation to subscribe to interrupt
> signals from the DLA and DSP. Also, an embedded Direct Memory Access
> Controller (DMAC) manages data transfers between internal SRAM and
> external main memory, and an IOMMU module supports a unified memory
> space.
>
> A user can control the Trinity NPU with IOCTLs provided by the driver.
> These controls include memory management operations to transfer model
> data (HWMEM_ALLOC/HWMEM_DEALLOC), NPU workload control operations to
> submit workloads (RUN/STOP), and statistics operations to check the
> current NPU status (STAT).
>
> The device driver also implements features for developers. It provides
> sysfs control attributes such as stop, suspend, sched_test, and
> profile. It also provides status attributes such as app status, the
> number of total requests, the number of active requests, and memory
> usage. For tracing, several ftrace events are defined and embedded at
> important points.

If you have created sysfs files, you need to document them in
Documentation/ABI/, which I do not see in your diffstat. Perhaps add
that for your next respin?

Also, please remove the "tracing" logic you have in the code and use
ftrace. Don't abuse dev_info() everywhere, that's not needed at all.

thanks,

greg k-h

On Mon, Jul 25, 2022 at 12:02 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Mon, Jul 25, 2022 at 03:52:59PM +0900, Jiho Chu wrote:
> > Hello,
> >
> > My name is Jiho Chu, and I have been working on device drivers and
> > system daemons for several years at Samsung Electronics.
> >
> > The Trinity Neural Processing Unit (NPU) series are hardware
> > accelerators for neural network processing in embedded systems, which
> > are integrated into application processors or SoCs. Trinity NPU is
> > compatible with the AMBA bus architecture and was first launched in
> > 2018 with its first version for vision processing, Trinity Version1
> > (TRIV1). Its second version, TRIV2, was released in Dec 2021. Another
> > Trinity NPU, for audio processing, is referred to as TRIA.
> >
> > TRIV2 ships in many models of 2022 Samsung TVs, providing acceleration
> > for various AI-based applications, including image recognition and
> > picture quality improvements for streaming video. These can be
> > accessed via GStreamer and its neural network plugins, NNStreamer.
> >
> > This patch set includes the Trinity Vision 2 kernel device driver.
> > Trinity Vision 2 accelerates image inference for Convolutional Neural
> > Networks (CNN). The CNN workload is executed by the Deep Learning
> > Accelerator (DLA), and general neural network layers are executed by
> > the Digital Signal Processor (DSP). A Control Processor (CP) controls
> > the DLA and DSP. These three IPs (DLA, DSP, CP) make up the Trinity
> > Vision 2 NPU, and the device driver mainly supervises the CP to manage
> > the entire NPU.
> >
> > DLA and DSP operations are controlled with internal command
> > instructions. The Trinity instructions are similar to a general
> > processor's ISA, but specialized for neural processing operations. The
> > virtual ISA (vISA) is designed to process multiple data with a single
> > operation, like a modern SIMD processor. The device driver loads a
> > program to the CP at start-up, and the program can decode a binary
> > built with the vISA. We call this decoding program the Instruction
> > Decoding Unit (IDU) program. While the NPU is running, the CP executes
> > the IDU program to fetch and decode vISA instructions, following the
> > scheduling policy of the device driver.
> >
> > The DLA, DSP and CP are loosely coupled using ARM's AMBA, so Trinity
> > can easily communicate with most ARM processors. Each IP has
> > memory-mapped registers which can be used to control it, and the CP
> > provides a Wait-For-Event (WFE) operation to subscribe to interrupt
> > signals from the DLA and DSP. Also, an embedded Direct Memory Access
> > Controller (DMAC) manages data transfers between internal SRAM and
> > external main memory, and an IOMMU module supports a unified memory
> > space.
> >
> > A user can control the Trinity NPU with IOCTLs provided by the driver.
> > These controls include memory management operations to transfer model
> > data (HWMEM_ALLOC/HWMEM_DEALLOC), NPU workload control operations to
> > submit workloads (RUN/STOP), and statistics operations to check the
> > current NPU status (STAT).
> >
> > The device driver also implements features for developers. It provides
> > sysfs control attributes such as stop, suspend, sched_test, and
> > profile. It also provides status attributes such as app status, the
> > number of total requests, the number of active requests, and memory
> > usage. For tracing, several ftrace events are defined and embedded at
> > important points.
>
> If you have created sysfs files, you need to document them in
> Documentation/ABI/, which I do not see in your diffstat. Perhaps add
> that for your next respin?
>
> Also, please remove the "tracing" logic you have in the code and use
> ftrace. Don't abuse dev_info() everywhere, that's not needed at all.
>
> thanks,
>
> greg k-h

Hi,
Why isn't this submitted to the soc/ subsystem?
Don't you think that would be more appropriate, given that this IP is
integrated into application processors?

Thanks,
Oded

> Hi,
> Why isn't this submitted to the soc/ subsystem?
> Don't you think that would be more appropriate, given that this IP is
> integrated into application processors?
>
> Thanks,
> Oded

This series (Trinity-V2.3, V2.4, A1, ..) is being integrated into multiple
SoCs, and is not limited to Samsung-designed chips (e.g., Exynos).
It would be a bit weird to have it in /drivers/soc/samsung.

CC: Krzysztof and Alim (Samsung-SoC maintainers)

Cheers,
MyungJoo

On 26/07/2022 04:09, MyungJoo Ham wrote:
>> Hi,
>> Why isn't this submitted to the soc/ subsystem?
>> Don't you think that would be more appropriate, given that this IP is
>> integrated into application processors?
>>
>> Thanks,
>> Oded
>
> This series (Trinity-V2.3, V2.4, A1, ..) is being integrated into multiple
> SoCs, and is not limited to Samsung-designed chips (e.g., Exynos).
> It would be a bit weird to have it in /drivers/soc/samsung.
>
> CC: Krzysztof and Alim (Samsung-SoC maintainers)

If it is not related to Samsung SoCs (or other designs by Samsung
Foundry), then it should not go to drivers/soc. Based on the cover
letter, it looks like this is the case.

Best regards,
Krzysztof

On Tue, Jul 26, 2022 at 8:59 AM Krzysztof Kozlowski
<krzysztof.kozlowski@linaro.org> wrote:
> On 26/07/2022 04:09, MyungJoo Ham wrote:
> >> Hi,
> >> Why isn't this submitted to the soc/ subsystem?
> >> Don't you think that would be more appropriate, given that this IP is
> >> integrated into application processors?
> >>
> >> Thanks,
> >> Oded
> >
> > This series (Trinity-V2.3, V2.4, A1, ..) is being integrated into multiple
> > SoCs, and is not limited to Samsung-designed chips (e.g., Exynos).
> > It would be a bit weird to have it in /drivers/soc/samsung.
> >
> > CC: Krzysztof and Alim (Samsung-SoC maintainers)
>
> If it is not related to Samsung SoCs (or other designs by Samsung
> Foundry), then it should not go to drivers/soc. Based on the cover
> letter, it looks like this is the case.

Agreed, and I also don't want to add any drivers with a user interface
to drivers/soc/. The things we have in there mainly fall into two categories:

- soc_device drivers for identifying the SoC itself from userspace or
  another driver

- drivers that provide exported symbols to other kernel drivers for things
  that do not have a proper subsystem abstraction (yet).

This driver clearly does not fall into those categories. As long as there
is no subsystem for NPUs, the only sensible options are drivers/gpu
and drivers/misc/.

       Arnd

Hi!

> This driver clearly does not fall into those categories. As long as there
> is no subsystem for NPUs, the only sensible options are drivers/gpu
> and drivers/misc/.

Well, we can create drivers/npu. I'm sure these will get more
widespread. And GPU people really should be cc-ed.

Best regards,
								Pavel
--

On Tue, Jul 26, 2022 at 10:51 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Jul 26, 2022 at 8:59 AM Krzysztof Kozlowski
> <krzysztof.kozlowski@linaro.org> wrote:
> > On 26/07/2022 04:09, MyungJoo Ham wrote:
> > >> Hi,
> > >> Why isn't this submitted to the soc/ subsystem?
> > >> Don't you think that would be more appropriate, given that this IP is
> > >> integrated into application processors?
> > >>
> > >> Thanks,
> > >> Oded
> > >
> > > This series (Trinity-V2.3, V2.4, A1, ..) is being integrated into multiple
> > > SoCs, and is not limited to Samsung-designed chips (e.g., Exynos).
> > > It would be a bit weird to have it in /drivers/soc/samsung.
> > >
> > > CC: Krzysztof and Alim (Samsung-SoC maintainers)
> >
> > If it is not related to Samsung SoCs (or other designs by Samsung
> > Foundry), then it should not go to drivers/soc. Based on the cover
> > letter, it looks like this is the case.
>
> Agreed, and I also don't want to add any drivers with a user interface
> to drivers/soc/. The things we have in there mainly fall into two categories:
>
> - soc_device drivers for identifying the SoC itself from userspace or
>   another driver
>
> - drivers that provide exported symbols to other kernel drivers for things
>   that do not have a proper subsystem abstraction (yet).
>
> This driver clearly does not fall into those categories. As long as there
> is no subsystem for NPUs, the only sensible options are drivers/gpu
> and drivers/misc/.
>
>        Arnd

Thanks for the explanation. I wasn't sure what the criteria for getting
into drivers/soc were, but now it is clear.

Oded