Hi,
This is RFC v2 of the NTB/PCI series for Renesas R-Car S4. The ultimate
goal is unchanged, i.e. to improve performance between RC and EP
(with vNTB) over ntb_transport, but the approach has changed drastically.
Based on the feedback from Frank Li in the v1 thread, in particular:
https://lore.kernel.org/all/aQEsip3TsPn4LJY9@lizhi-Precision-Tower-5810/
this RFC v2 instead builds an NTB transport backed by a remote eDMA
architecture and reshapes the series around it. RC->EP interrupts are
now delivered via a dedicated eDMA read channel, so the somewhat
hackish approach from RFC v1 is no longer needed.
Compared to RFC v1, this v2 series enables an NTB transport backed by
remote DW eDMA, so the existing ntb_transport memory-window data path
is no longer needed; direct DMA transfers between EP and RC are used
instead.
I realize this is quite a large series. Sorry for the volume, but for
the RFC stage I believe presenting the full picture in a single set
helps with reviewing the overall architecture. Once the direction is
agreed, I will respin it split by subsystem and topic.
The new architecture
====================
In the new architecture the endpoint exposes a small memory window that
contains the unrolled DesignWare eDMA register block plus a per-channel
control structure and linked-list rings. The endpoint allocates these in
its own memory, then maps them into a peer MW via an inbound iATU
region. The host maps the peer MW and configures a dw-edma engine to
use the remote rings. The data-plane flow is depicted below (Figure 1
and Figure 2).
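A rough C sketch of the memory-window layout described above; the
structure, field names, sizes, and channel count here are all
assumptions for illustration, not the series' actual definitions:

```c
/*
 * Sketch of the EP-exposed memory window: the unrolled eDMA register
 * block followed by per-channel control metadata and linked-list rings.
 * All names and sizes below are assumptions.
 */
#include <stddef.h>
#include <stdint.h>

#define NTB_EDMA_REG_SPACE  0x1000  /* unrolled eDMA registers (assumed size) */
#define NTB_EDMA_RING_DEPTH 128     /* linked-list entries per channel (assumed) */

struct ntb_edma_desc {              /* one linked-list (LL) entry */
	uint32_t control;
	uint32_t size;
	uint64_t sar;               /* DMA-mapped source address */
	uint64_t dar;               /* DMA-mapped destination address */
};

struct ntb_edma_chan_ctrl {         /* per-channel control structure */
	uint32_t head;              /* producer index */
	uint32_t tail;              /* consumer index */
	struct ntb_edma_desc ring[NTB_EDMA_RING_DEPTH];
};

struct ntb_edma_mw_layout {         /* what the inbound iATU region exposes */
	uint8_t edma_regs[NTB_EDMA_REG_SPACE];
	struct ntb_edma_chan_ctrl chan[2];  /* e.g. one read + one write channel */
};
```

The host side would map the peer MW and point its dw-edma instance's
linked-list base addresses at the ring arrays inside such a layout.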
With this design, per-queue PCI memory usage is reduced to control-plane
metadata (ring descriptors and indices). Data buffers live in system
memory and are transferred by the remote eDMA, so even relatively small
BAR windows can theoretically scale to multiple ntb_transport queue
pairs, and the DMA_MEMCPY operation is no longer required. This series
also adds multi-queue support to ntb_netdev to demonstrate the
performance improvement.
The shared-memory ntb_transport backend remains the default. The remote
eDMA mode is compile-time and run-time selectable via
CONFIG_NTB_TRANSPORT_EDMA and the new 'use_remote_edma' module
parameter, and existing users that do not enable it should see no
behavioural change apart from the BAR subrange support described below.
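The compile-/run-time selection above can be pictured as choosing one of
two backend ops tables, modeled on the ntb_transport_backend_ops
introduced in patch 18; the ops fields and helper below are assumptions,
not the series' actual code:

```c
/* Hypothetical backend selection, loosely modeled on patch 18. */
#include <stdbool.h>
#include <stddef.h>

struct backend_ops {
	int (*setup_qp)(void *qp);                          /* hypothetical */
	int (*tx_enqueue)(void *qp, void *buf, size_t len); /* hypothetical */
};

static const struct backend_ops shm_ops  = { 0 };  /* default shared-MW path */
static const struct backend_ops edma_ops = { 0 };  /* remote eDMA path */

static bool use_remote_edma;  /* mirrors the module parameter */

static const struct backend_ops *pick_backend(void)
{
	/* CONFIG_NTB_TRANSPORT_EDMA would gate the eDMA table at compile time. */
	return use_remote_edma ? &edma_ops : &shm_ops;
}
```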
Figure 1. RC->EP traffic via ntb_netdev+ntb_transport
backed by Remote eDMA
EP RC
phys addr phys addr
space space
+-+ +-+
| | | |
| | || | |
+-+-----. || | |
EDMA REG | | \ [A] || | |
+-+----. '---+-+ || | |
| | \ | |<---------[0-a]----------
+-+-----------| |<----------[2]----------.
EDMA LL | | | | || | | :
| | | | || | | :
+-+-----------+-+ || [B] | | :
| | || ++ | | :
---------[0-b]----------->||----------------'
| | ++ || || | |
| | || || ++ | |
| | ||<----------[4]-----------
| | ++ || | |
| | [C] || | |
.--|#|<------------------------[3]------|#|<-.
: |#| || |#| :
[5] | | || | | [1]
: | | || | | :
'->|#| |#|--'
|#| |#|
| | | |
0-a. configure Remote eDMA
0-b. DMA-map and produce DAR
1. memcpy while building skb in ntb_netdev case
2. consume DAR, DMA-map SAR and kick DMA read transfer
3. DMA read transfer (initiated by RC remotely)
4. consume (commit)
5. memcpy to application side
[A]: MemoryWindow that aggregates eDMA regs and LL.
IB iATU translations (Address Match Mode).
[B]: Control plane ring buffer (for "produce")
[C]: Control plane ring buffer (for "consume")
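Steps 0-b, 2, and 4 above amount to single-producer/single-consumer
index rings ([B] for "produce", [C] for "consume"). A minimal userspace
model of that protocol; the names, the depth, and the omission of memory
barriers are all simplifying assumptions:

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_DEPTH 8                    /* power of two, assumed */

struct ctrl_ring {                      /* models ring [B] or [C] */
	uint32_t head;                  /* written by the producer only */
	uint32_t tail;                  /* written by the consumer only */
	uint64_t slot[RING_DEPTH];      /* e.g. DMA-mapped DAR/SAR values */
};

/* EP side, step 0-b: publish a DMA-mapped destination address. */
static bool ring_produce(struct ctrl_ring *r, uint64_t addr)
{
	if (r->head - r->tail == RING_DEPTH)
		return false;           /* ring full */
	r->slot[r->head % RING_DEPTH] = addr;
	r->head++;                      /* real code needs a write barrier here */
	return true;
}

/* RC side, step 2: take the next address and kick the eDMA transfer. */
static bool ring_consume(struct ctrl_ring *r, uint64_t *addr)
{
	if (r->head == r->tail)
		return false;           /* ring empty */
	*addr = r->slot[r->tail % RING_DEPTH];
	r->tail++;                      /* and a barrier here as well */
	return true;
}
```

Step 4 ("consume (commit)") is the same pattern applied to the [C] ring
in the opposite direction.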
Figure 2. EP->RC traffic via ntb_netdev+ntb_transport
backed by Remote eDMA
EP RC
phys addr phys addr
space space
+-+ +-+
| | | |
| | || | |
+-+-----. || | |
EDMA REG | | \ [A] || | |
+-+----. '---+-+ || | |
| | \ | |<----------[0]-----------
+-+-----------| |<----------[3]----------.
EDMA LL | | | | || | | :
| | | | || | | :
+-+-----------+-+ || [B] | | :
| | || ++ | | :
-----------[2]----------->||----------------'
| | ++ || || | |
| | || || ++ | |
| | ||<----------[5]-----------
| | ++ || | |
| | [C] || | |
.->|#|--------[4]---------------------->|#|--.
: |#| || |#| :
[1] | | || | | [6]
: | | || | | :
'--|#| |#|<-'
|#| |#|
| | | |
0. configure Remote eDMA
1. memcpy while building skb in ntb_netdev case
2. DMA-map SAR and "produce"
3. consume SAR, DMA-map DAR and kick DMA write transfer
4. DMA write transfer (initiated by RC remotely)
5. consume (commit)
6. memcpy to application side
[A]: MemoryWindow that aggregates eDMA regs and LL.
IB iATU translations (Address Match Mode).
[B]: Control plane ring buffer (for "produce")
[C]: Control plane ring buffer (for "consume")
Patch layout
============
Patch 01-19 : preparation for Patch 20
- 01-10: support multiple MWs in a BAR
- 11-19: other misc preparations
Patch 20 : main and most important patch, adds remote eDMA support
Patch 21-22 : multi-queue support; performance scales thanks to the
remote eDMA
Patch 23-27 : handle several SoC-specific issues so that the remote eDMA
mode of ntb_transport works on R-Car S4
Changelog
=========
RFCv1->RFCv2 changes:
- Architecture
- Drop the generic interrupt backend + DW eDMA test-interrupt backend
approach and instead adopt the remote eDMA-backed ntb_transport mode
proposed by Frank Li. The BAR-sharing / mwN_offset / inbound
mapping (Address Match Mode) infrastructure from RFC v1 is largely
kept, with only minor refinements and code motion where necessary
to fit the new transport-mode design.
- For Patch 01
- Rework the array_index_nospec() conversion to address review
comments on "[RFC PATCH 01/25]".
RFCv1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/
Tested on
=========
* 2x Renesas R-Car S4 Spider (RC<->EP connected with OcuLink cable)
* Kernel base: next-20251128
Performance measurement
=======================
No serious measurements yet, because:
* For "before the change", even use_dma/use_msi does not work on the
upstream kernel unless we apply some patches for R-Car S4. With some
unmerged patch series I posted earlier, about 7 Gbps was observed for
the RC->EP direction, whereas the pure upstream kernel only achieves
around 500 Mbps.
* For "after the change", measurements are not mature because this
RFC v2 patch series is not yet performance-optimized at this stage.
Also, somewhat unstable behaviour remains around ntb_edma_isr().
Here are the rough measurements showing the achievable performance on
the R-Car S4:
- Before this change:
* ping
64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=12.3 ms
64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=6.58 ms
64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=1.26 ms
64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=7.43 ms
64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.39 ms
64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=7.38 ms
64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=1.42 ms
64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=7.41 ms
* RC->EP (`sudo iperf3 -ub0 -l 65480 -P 2`)
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-10.01 sec 344 MBytes 288 Mbits/sec 3.483 ms 51/5555 (0.92%) receiver
[ 6] 0.00-10.01 sec 342 MBytes 287 Mbits/sec 3.814 ms 38/5517 (0.69%) receiver
[SUM] 0.00-10.01 sec 686 MBytes 575 Mbits/sec 3.648 ms 89/11072 (0.8%) receiver
* EP->RC (`sudo iperf3 -ub0 -l 65480 -P 2`)
[ 5] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 3.164 ms 390/5731 (6.8%) receiver
[ 6] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 2.416 ms 396/5741 (6.9%) receiver
[SUM] 0.00-10.03 sec 667 MBytes 558 Mbits/sec 2.790 ms 786/11472 (6.9%) receiver
Note: with `-P 2`, the best total bitrate (receiver side) was achieved.
- After this change (use_remote_edma=1) [1]:
* ping
64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=1.48 ms
64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=1.03 ms
64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=0.931 ms
64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=0.910 ms
64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.07 ms
64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=0.986 ms
64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=0.910 ms
64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=0.883 ms
* RC->EP (`sudo iperf3 -ub0 -l 65480 -P 4`)
[ 5] 0.00-10.01 sec 3.54 GBytes 3.04 Gbits/sec 0.030 ms 0/58007 (0%) receiver
[ 6] 0.00-10.01 sec 3.71 GBytes 3.19 Gbits/sec 0.453 ms 0/60909 (0%) receiver
[ 9] 0.00-10.01 sec 3.85 GBytes 3.30 Gbits/sec 0.027 ms 0/63072 (0%) receiver
[ 11] 0.00-10.01 sec 3.26 GBytes 2.80 Gbits/sec 0.070 ms 1/53512 (0.0019%) receiver
[SUM] 0.00-10.01 sec 14.4 GBytes 12.3 Gbits/sec 0.145 ms 1/235500 (0.00042%) receiver
* EP->RC (`sudo iperf3 -ub0 -l 65480 -P 4`)
[ 5] 0.00-10.03 sec 3.40 GBytes 2.91 Gbits/sec 0.104 ms 15467/71208 (22%) receiver
[ 6] 0.00-10.03 sec 3.08 GBytes 2.64 Gbits/sec 0.176 ms 12097/62609 (19%) receiver
[ 9] 0.00-10.03 sec 3.38 GBytes 2.90 Gbits/sec 0.270 ms 17212/72710 (24%) receiver
[ 11] 0.00-10.03 sec 2.56 GBytes 2.19 Gbits/sec 0.200 ms 11193/53090 (21%) receiver
[SUM] 0.00-10.03 sec 12.4 GBytes 10.6 Gbits/sec 0.188 ms 55969/259617 (22%) receiver
[1] configfs settings:
# modprobe pci_epf_vntb dyndbg=+pmf
# cd /sys/kernel/config/pci_ep/
# mkdir functions/pci_epf_vntb/func1
# echo 0x1912 > functions/pci_epf_vntb/func1/vendorid
# echo 0x0030 > functions/pci_epf_vntb/func1/deviceid
# echo 32 > functions/pci_epf_vntb/func1/msi_interrupts
# echo 16 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_count
# echo 128 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/spad_count
# echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
# echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
# echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2
# echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_offset
# echo 0x1912 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_vid
# echo 0x0030 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_pid
# echo 0x10 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vbus_number
# echo 0 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/ctrl_bar
# echo 4 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_bar
# echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1_bar
# echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_bar
# ln -s controllers/e65d0000.pcie-ep functions/pci_epf_vntb/func1/primary/
# echo 1 > controllers/e65d0000.pcie-ep/start
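For clarity, the window sizing above can be summarized as follows,
assuming the mwN attributes take sizes in bytes and mw2_offset an offset
within the shared BAR (both windows use BAR2 via mw1_bar/mw2_bar = 2),
which is how the values line up:

```c
/* The configfs values used above, assuming byte units. */
#include <stdint.h>

#define MW1_SIZE   0xe0000u
#define MW2_SIZE   0x20000u
#define MW2_OFFSET 0xe0000u  /* MW2 starts right where MW1 ends */

static inline uint32_t bar2_span(void)
{
	return MW2_OFFSET + MW2_SIZE;  /* total BAR2 range the two MWs occupy */
}
```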
Thanks for taking a look.
Koichiro Den (27):
PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[]
access
PCI: endpoint: pci-epf-vntb: Add mwN_offset configfs attributes
NTB: epf: Handle mwN_offset for inbound MW regions
PCI: endpoint: Add inbound mapping ops to EPC core
PCI: dwc: ep: Implement EPC inbound mapping support
PCI: endpoint: pci-epf-vntb: Use pci_epc_map_inbound() for MW mapping
NTB: Add offset parameter to MW translation APIs
PCI: endpoint: pci-epf-vntb: Propagate MW offset from configfs when
present
NTB: ntb_transport: Support offsetted partial memory windows
NTB: core: Add .get_pci_epc() to ntb_dev_ops
NTB: epf: vntb: Implement .get_pci_epc() callback
dmaengine: dw-edma: Fix MSI data values for multi-vector IMWr
interrupts
NTB: ntb_transport: Use seq_file for QP stats debugfs
NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
NTB: ntb_transport: Dynamically determine qp count
NTB: ntb_transport: Introduce get_dma_dev() helper
NTB: epf: Reserve a subset of MSI vectors for non-NTB users
NTB: ntb_transport: Introduce ntb_transport_backend_ops
PCI: dwc: ep: Cache MSI outbound iATU mapping
NTB: ntb_transport: Introduce remote eDMA backed transport mode
NTB: epf: Provide db_vector_count/db_vector_mask callbacks
ntb_netdev: Multi-queue support
NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
iommu: ipmmu-vmsa: Add support for reserved regions
arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe
eDMA
NTB: epf: Add an additional memory window (MW2) barno mapping on
Renesas R-Car
arch/arm64/boot/dts/renesas/Makefile | 2 +
.../boot/dts/renesas/r8a779f0-spider-ep.dts | 46 +
.../boot/dts/renesas/r8a779f0-spider-rc.dts | 52 +
drivers/dma/dw-edma/dw-edma-core.c | 28 +-
drivers/iommu/ipmmu-vmsa.c | 7 +-
drivers/net/ntb_netdev.c | 341 ++-
drivers/ntb/Kconfig | 11 +
drivers/ntb/Makefile | 3 +
drivers/ntb/hw/amd/ntb_hw_amd.c | 6 +-
drivers/ntb/hw/epf/ntb_hw_epf.c | 177 +-
drivers/ntb/hw/idt/ntb_hw_idt.c | 3 +-
drivers/ntb/hw/intel/ntb_hw_gen1.c | 6 +-
drivers/ntb/hw/intel/ntb_hw_gen1.h | 2 +-
drivers/ntb/hw/intel/ntb_hw_gen3.c | 3 +-
drivers/ntb/hw/intel/ntb_hw_gen4.c | 6 +-
drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 6 +-
drivers/ntb/msi.c | 6 +-
drivers/ntb/ntb_edma.c | 628 ++++++
drivers/ntb/ntb_edma.h | 128 ++
.../{ntb_transport.c => ntb_transport_core.c} | 1829 ++++++++++++++---
drivers/ntb/test/ntb_perf.c | 4 +-
drivers/ntb/test/ntb_tool.c | 6 +-
.../pci/controller/dwc/pcie-designware-ep.c | 287 ++-
drivers/pci/controller/dwc/pcie-designware.h | 7 +
drivers/pci/endpoint/functions/pci-epf-vntb.c | 229 ++-
drivers/pci/endpoint/pci-epc-core.c | 44 +
include/linux/ntb.h | 39 +-
include/linux/ntb_transport.h | 21 +
include/linux/pci-epc.h | 11 +
29 files changed, 3415 insertions(+), 523 deletions(-)
create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
create mode 100644 drivers/ntb/ntb_edma.c
create mode 100644 drivers/ntb/ntb_edma.h
rename drivers/ntb/{ntb_transport.c => ntb_transport_core.c} (59%)
--
2.48.1
On Sun, Nov 30, 2025 at 01:03:38AM +0900, Koichiro Den wrote:
> Hi,
>
> This is RFC v2 of the NTB/PCI series for Renesas R-Car S4. The ultimate
> goal is unchanged, i.e. to improve performance between RC and EP
> (with vNTB) over ntb_transport, but the approach has changed drastically.
> Based on the feedback from Frank Li in the v1 thread, in particular:
> https://lore.kernel.org/all/aQEsip3TsPn4LJY9@lizhi-Precision-Tower-5810/
> this RFC v2 instead builds an NTB transport backed by remote eDMA
> architecture and reshapes the series around it. The RC->EP interruption
> is now achieved using a dedicated eDMA read channel, so the somewhat
> "hack"-ish approach in RFC v1 is no longer needed.
>
> Compared to RFC v1, this v2 series enables NTB transport backed by
> remote DW eDMA, so the current ntb_transport handling of Memory Window
> is no longer needed, and direct DMA transfers between EP and RC are
> used.
>
> I realize this is quite a large series. Sorry for the volume, but for
> the RFC stage I believe presenting the full picture in a single set
> helps with reviewing the overall architecture. Once the direction is
> agreed, I will respin it split by subsystem and topic.
>
>
...
>
> - Before this change:
>
> * ping
> 64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=12.3 ms
> 64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=6.58 ms
> 64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=1.26 ms
> 64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=7.43 ms
> 64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.39 ms
> 64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=7.38 ms
> 64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=1.42 ms
> 64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=7.41 ms
>
> * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 2`)
> [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
> [ 5] 0.00-10.01 sec 344 MBytes 288 Mbits/sec 3.483 ms 51/5555 (0.92%) receiver
> [ 6] 0.00-10.01 sec 342 MBytes 287 Mbits/sec 3.814 ms 38/5517 (0.69%) receiver
> [SUM] 0.00-10.01 sec 686 MBytes 575 Mbits/sec 3.648 ms 89/11072 (0.8%) receiver
>
> * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 2`)
> [ 5] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 3.164 ms 390/5731 (6.8%) receiver
> [ 6] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 2.416 ms 396/5741 (6.9%) receiver
> [SUM] 0.00-10.03 sec 667 MBytes 558 Mbits/sec 2.790 ms 786/11472 (6.9%) receiver
>
> Note: with `-P 2`, the best total bitrate (receiver side) was achieved.
>
> - After this change (use_remote_edma=1) [1]:
>
> * ping
> 64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=1.48 ms
> 64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=1.03 ms
> 64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=0.931 ms
> 64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=0.910 ms
> 64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.07 ms
> 64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=0.986 ms
> 64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=0.910 ms
> 64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=0.883 ms
>
> * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 4`)
> [ 5] 0.00-10.01 sec 3.54 GBytes 3.04 Gbits/sec 0.030 ms 0/58007 (0%) receiver
> [ 6] 0.00-10.01 sec 3.71 GBytes 3.19 Gbits/sec 0.453 ms 0/60909 (0%) receiver
> [ 9] 0.00-10.01 sec 3.85 GBytes 3.30 Gbits/sec 0.027 ms 0/63072 (0%) receiver
> [ 11] 0.00-10.01 sec 3.26 GBytes 2.80 Gbits/sec 0.070 ms 1/53512 (0.0019%) receiver
> [SUM] 0.00-10.01 sec 14.4 GBytes 12.3 Gbits/sec 0.145 ms 1/235500 (0.00042%) receiver
>
> * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 4`)
> [ 5] 0.00-10.03 sec 3.40 GBytes 2.91 Gbits/sec 0.104 ms 15467/71208 (22%) receiver
> [ 6] 0.00-10.03 sec 3.08 GBytes 2.64 Gbits/sec 0.176 ms 12097/62609 (19%) receiver
> [ 9] 0.00-10.03 sec 3.38 GBytes 2.90 Gbits/sec 0.270 ms 17212/72710 (24%) receiver
> [ 11] 0.00-10.03 sec 2.56 GBytes 2.19 Gbits/sec 0.200 ms 11193/53090 (21%) receiver
Almost 10x faster, 2.9G vs 279M? Highlighting this will get more people
interested in this topic.
> [SUM] 0.00-10.03 sec 12.4 GBytes 10.6 Gbits/sec 0.188 ms 55969/259617 (22%) receiver
>
> [1] configfs settings:
> # modprobe pci_epf_vntb dyndbg=+pmf
> # cd /sys/kernel/config/pci_ep/
> # mkdir functions/pci_epf_vntb/func1
> # echo 0x1912 > functions/pci_epf_vntb/func1/vendorid
> # echo 0x0030 > functions/pci_epf_vntb/func1/deviceid
> # echo 32 > functions/pci_epf_vntb/func1/msi_interrupts
> # echo 16 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_count
> # echo 128 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/spad_count
> # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
> # echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
> # echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2
> # echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_offset
It looks like you are trying to create small sub-MW windows.
Would it be cleaner as:
echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1.0
echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1.1
so that mw1.1 naturally continues from the previous one?
Frank
> # echo 0x1912 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_vid
> # echo 0x0030 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_pid
> # echo 0x10 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vbus_number
> # echo 0 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/ctrl_bar
> # echo 4 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_bar
> # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1_bar
> # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_bar
> # ln -s controllers/e65d0000.pcie-ep functions/pci_epf_vntb/func1/primary/
> # echo 1 > controllers/e65d0000.pcie-ep/start
>
>
> Thanks for taking a look.
>
>
> Koichiro Den (27):
> PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[]
> access
> PCI: endpoint: pci-epf-vntb: Add mwN_offset configfs attributes
> NTB: epf: Handle mwN_offset for inbound MW regions
> PCI: endpoint: Add inbound mapping ops to EPC core
> PCI: dwc: ep: Implement EPC inbound mapping support
> PCI: endpoint: pci-epf-vntb: Use pci_epc_map_inbound() for MW mapping
> NTB: Add offset parameter to MW translation APIs
> PCI: endpoint: pci-epf-vntb: Propagate MW offset from configfs when
> present
> NTB: ntb_transport: Support offsetted partial memory windows
> NTB: core: Add .get_pci_epc() to ntb_dev_ops
> NTB: epf: vntb: Implement .get_pci_epc() callback
> damengine: dw-edma: Fix MSI data values for multi-vector IMWr
> interrupts
> NTB: ntb_transport: Use seq_file for QP stats debugfs
> NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
> NTB: ntb_transport: Dynamically determine qp count
> NTB: ntb_transport: Introduce get_dma_dev() helper
> NTB: epf: Reserve a subset of MSI vectors for non-NTB users
> NTB: ntb_transport: Introduce ntb_transport_backend_ops
> PCI: dwc: ep: Cache MSI outbound iATU mapping
> NTB: ntb_transport: Introduce remote eDMA backed transport mode
> NTB: epf: Provide db_vector_count/db_vector_mask callbacks
> ntb_netdev: Multi-queue support
> NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
> iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
> iommu: ipmmu-vmsa: Add support for reserved regions
> arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe
> eDMA
> NTB: epf: Add an additional memory window (MW2) barno mapping on
> Renesas R-Car
>
> arch/arm64/boot/dts/renesas/Makefile | 2 +
> .../boot/dts/renesas/r8a779f0-spider-ep.dts | 46 +
> .../boot/dts/renesas/r8a779f0-spider-rc.dts | 52 +
> drivers/dma/dw-edma/dw-edma-core.c | 28 +-
> drivers/iommu/ipmmu-vmsa.c | 7 +-
> drivers/net/ntb_netdev.c | 341 ++-
> drivers/ntb/Kconfig | 11 +
> drivers/ntb/Makefile | 3 +
> drivers/ntb/hw/amd/ntb_hw_amd.c | 6 +-
> drivers/ntb/hw/epf/ntb_hw_epf.c | 177 +-
> drivers/ntb/hw/idt/ntb_hw_idt.c | 3 +-
> drivers/ntb/hw/intel/ntb_hw_gen1.c | 6 +-
> drivers/ntb/hw/intel/ntb_hw_gen1.h | 2 +-
> drivers/ntb/hw/intel/ntb_hw_gen3.c | 3 +-
> drivers/ntb/hw/intel/ntb_hw_gen4.c | 6 +-
> drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 6 +-
> drivers/ntb/msi.c | 6 +-
> drivers/ntb/ntb_edma.c | 628 ++++++
> drivers/ntb/ntb_edma.h | 128 ++
> .../{ntb_transport.c => ntb_transport_core.c} | 1829 ++++++++++++++---
> drivers/ntb/test/ntb_perf.c | 4 +-
> drivers/ntb/test/ntb_tool.c | 6 +-
> .../pci/controller/dwc/pcie-designware-ep.c | 287 ++-
> drivers/pci/controller/dwc/pcie-designware.h | 7 +
> drivers/pci/endpoint/functions/pci-epf-vntb.c | 229 ++-
> drivers/pci/endpoint/pci-epc-core.c | 44 +
> include/linux/ntb.h | 39 +-
> include/linux/ntb_transport.h | 21 +
> include/linux/pci-epc.h | 11 +
> 29 files changed, 3415 insertions(+), 523 deletions(-)
> create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
> create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
> create mode 100644 drivers/ntb/ntb_edma.c
> create mode 100644 drivers/ntb/ntb_edma.h
> rename drivers/ntb/{ntb_transport.c => ntb_transport_core.c} (59%)
>
> --
> 2.48.1
>
On Mon, Dec 01, 2025 at 05:02:57PM -0500, Frank Li wrote:
> On Sun, Nov 30, 2025 at 01:03:38AM +0900, Koichiro Den wrote:
> > Hi,
> >
> > This is RFC v2 of the NTB/PCI series for Renesas R-Car S4. The ultimate
> > goal is unchanged, i.e. to improve performance between RC and EP
> > (with vNTB) over ntb_transport, but the approach has changed drastically.
> > Based on the feedback from Frank Li in the v1 thread, in particular:
> > https://lore.kernel.org/all/aQEsip3TsPn4LJY9@lizhi-Precision-Tower-5810/
> > this RFC v2 instead builds an NTB transport backed by remote eDMA
> > architecture and reshapes the series around it. The RC->EP interruption
> > is now achieved using a dedicated eDMA read channel, so the somewhat
> > "hack"-ish approach in RFC v1 is no longer needed.
> >
> > Compared to RFC v1, this v2 series enables NTB transport backed by
> > remote DW eDMA, so the current ntb_transport handling of Memory Window
> > is no longer needed, and direct DMA transfers between EP and RC are
> > used.
> >
> > I realize this is quite a large series. Sorry for the volume, but for
> > the RFC stage I believe presenting the full picture in a single set
> > helps with reviewing the overall architecture. Once the direction is
> > agreed, I will respin it split by subsystem and topic.
> >
> >
> ...
> >
> > - Before this change:
> >
> > * ping
> > 64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=12.3 ms
> > 64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=6.58 ms
> > 64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=1.26 ms
> > 64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=7.43 ms
> > 64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.39 ms
> > 64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=7.38 ms
> > 64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=1.42 ms
> > 64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=7.41 ms
> >
> > * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 2`)
> > [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
> > [ 5] 0.00-10.01 sec 344 MBytes 288 Mbits/sec 3.483 ms 51/5555 (0.92%) receiver
> > [ 6] 0.00-10.01 sec 342 MBytes 287 Mbits/sec 3.814 ms 38/5517 (0.69%) receiver
> > [SUM] 0.00-10.01 sec 686 MBytes 575 Mbits/sec 3.648 ms 89/11072 (0.8%) receiver
> >
> > * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 2`)
> > [ 5] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 3.164 ms 390/5731 (6.8%) receiver
> > [ 6] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 2.416 ms 396/5741 (6.9%) receiver
> > [SUM] 0.00-10.03 sec 667 MBytes 558 Mbits/sec 2.790 ms 786/11472 (6.9%) receiver
> >
> > Note: with `-P 2`, the best total bitrate (receiver side) was achieved.
> >
> > - After this change (use_remote_edma=1) [1]:
> >
> > * ping
> > 64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=1.48 ms
> > 64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=1.03 ms
> > 64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=0.931 ms
> > 64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=0.910 ms
> > 64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.07 ms
> > 64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=0.986 ms
> > 64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=0.910 ms
> > 64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=0.883 ms
> >
> > * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 4`)
> > [ 5] 0.00-10.01 sec 3.54 GBytes 3.04 Gbits/sec 0.030 ms 0/58007 (0%) receiver
> > [ 6] 0.00-10.01 sec 3.71 GBytes 3.19 Gbits/sec 0.453 ms 0/60909 (0%) receiver
> > [ 9] 0.00-10.01 sec 3.85 GBytes 3.30 Gbits/sec 0.027 ms 0/63072 (0%) receiver
> > [ 11] 0.00-10.01 sec 3.26 GBytes 2.80 Gbits/sec 0.070 ms 1/53512 (0.0019%) receiver
> > [SUM] 0.00-10.01 sec 14.4 GBytes 12.3 Gbits/sec 0.145 ms 1/235500 (0.00042%) receiver
> >
> > * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 4`)
> > [ 5] 0.00-10.03 sec 3.40 GBytes 2.91 Gbits/sec 0.104 ms 15467/71208 (22%) receiver
> > [ 6] 0.00-10.03 sec 3.08 GBytes 2.64 Gbits/sec 0.176 ms 12097/62609 (19%) receiver
> > [ 9] 0.00-10.03 sec 3.38 GBytes 2.90 Gbits/sec 0.270 ms 17212/72710 (24%) receiver
> > [ 11] 0.00-10.03 sec 2.56 GBytes 2.19 Gbits/sec 0.200 ms 11193/53090 (21%) receiver
>
> Almost 10x fast, 2.9G vs 279M? high light this one will bring more peopole
> interesting about this topic.
Thank you for the review!
OK, I'll highlight this in the next iteration.
By the way, my impression is that we can achieve even higher throughput
with this remote eDMA architecture.
>
> > [SUM] 0.00-10.03 sec 12.4 GBytes 10.6 Gbits/sec 0.188 ms 55969/259617 (22%) receiver
> >
> > [1] configfs settings:
> > # modprobe pci_epf_vntb dyndbg=+pmf
> > # cd /sys/kernel/config/pci_ep/
> > # mkdir functions/pci_epf_vntb/func1
> > # echo 0x1912 > functions/pci_epf_vntb/func1/vendorid
> > # echo 0x0030 > functions/pci_epf_vntb/func1/deviceid
> > # echo 32 > functions/pci_epf_vntb/func1/msi_interrupts
> > # echo 16 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_count
> > # echo 128 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/spad_count
> > # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
> > # echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
> > # echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2
> > # echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_offset
>
> look like, you try to create sub-small mw windows.
>
> Is it more clean ?
>
> echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1.0
> echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1.1
>
> so wm1.1 natively continue from prevous one.
Thanks for the suggestion.
I was trying to keep the small sub-MW windows referred to in the same
way as normal windows, for simplicity and readability, but I agree your
proposal looks more intuitive from a user-experience point of view.
My only concern is that e.g. {mw1.0, mw1.1, mw2.0} may effectively
translate internally into something like {mw1, mw2, mw3}, and that
numbering mismatch might become confusing when reading or debugging the
code.
-Koichiro
>
> Frank
>
> > # echo 0x1912 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_vid
> > # echo 0x0030 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_pid
> > # echo 0x10 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vbus_number
> > # echo 0 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/ctrl_bar
> > # echo 4 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_bar
> > # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1_bar
> > # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_bar
> > # ln -s controllers/e65d0000.pcie-ep functions/pci_epf_vntb/func1/primary/
> > # echo 1 > controllers/e65d0000.pcie-ep/start
> >
> >
> > Thanks for taking a look.
> >
> >
> > Koichiro Den (27):
> > PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[]
> > access
> > PCI: endpoint: pci-epf-vntb: Add mwN_offset configfs attributes
> > NTB: epf: Handle mwN_offset for inbound MW regions
> > PCI: endpoint: Add inbound mapping ops to EPC core
> > PCI: dwc: ep: Implement EPC inbound mapping support
> > PCI: endpoint: pci-epf-vntb: Use pci_epc_map_inbound() for MW mapping
> > NTB: Add offset parameter to MW translation APIs
> > PCI: endpoint: pci-epf-vntb: Propagate MW offset from configfs when
> > present
> > NTB: ntb_transport: Support offsetted partial memory windows
> > NTB: core: Add .get_pci_epc() to ntb_dev_ops
> > NTB: epf: vntb: Implement .get_pci_epc() callback
> > damengine: dw-edma: Fix MSI data values for multi-vector IMWr
> > interrupts
> > NTB: ntb_transport: Use seq_file for QP stats debugfs
> > NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
> > NTB: ntb_transport: Dynamically determine qp count
> > NTB: ntb_transport: Introduce get_dma_dev() helper
> > NTB: epf: Reserve a subset of MSI vectors for non-NTB users
> > NTB: ntb_transport: Introduce ntb_transport_backend_ops
> > PCI: dwc: ep: Cache MSI outbound iATU mapping
> > NTB: ntb_transport: Introduce remote eDMA backed transport mode
> > NTB: epf: Provide db_vector_count/db_vector_mask callbacks
> > ntb_netdev: Multi-queue support
> > NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
> > iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
> > iommu: ipmmu-vmsa: Add support for reserved regions
> > arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe
> > eDMA
> > NTB: epf: Add an additional memory window (MW2) barno mapping on
> > Renesas R-Car
> >
> > arch/arm64/boot/dts/renesas/Makefile | 2 +
> > .../boot/dts/renesas/r8a779f0-spider-ep.dts | 46 +
> > .../boot/dts/renesas/r8a779f0-spider-rc.dts | 52 +
> > drivers/dma/dw-edma/dw-edma-core.c | 28 +-
> > drivers/iommu/ipmmu-vmsa.c | 7 +-
> > drivers/net/ntb_netdev.c | 341 ++-
> > drivers/ntb/Kconfig | 11 +
> > drivers/ntb/Makefile | 3 +
> > drivers/ntb/hw/amd/ntb_hw_amd.c | 6 +-
> > drivers/ntb/hw/epf/ntb_hw_epf.c | 177 +-
> > drivers/ntb/hw/idt/ntb_hw_idt.c | 3 +-
> > drivers/ntb/hw/intel/ntb_hw_gen1.c | 6 +-
> > drivers/ntb/hw/intel/ntb_hw_gen1.h | 2 +-
> > drivers/ntb/hw/intel/ntb_hw_gen3.c | 3 +-
> > drivers/ntb/hw/intel/ntb_hw_gen4.c | 6 +-
> > drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 6 +-
> > drivers/ntb/msi.c | 6 +-
> > drivers/ntb/ntb_edma.c | 628 ++++++
> > drivers/ntb/ntb_edma.h | 128 ++
> > .../{ntb_transport.c => ntb_transport_core.c} | 1829 ++++++++++++++---
> > drivers/ntb/test/ntb_perf.c | 4 +-
> > drivers/ntb/test/ntb_tool.c | 6 +-
> > .../pci/controller/dwc/pcie-designware-ep.c | 287 ++-
> > drivers/pci/controller/dwc/pcie-designware.h | 7 +
> > drivers/pci/endpoint/functions/pci-epf-vntb.c | 229 ++-
> > drivers/pci/endpoint/pci-epc-core.c | 44 +
> > include/linux/ntb.h | 39 +-
> > include/linux/ntb_transport.h | 21 +
> > include/linux/pci-epc.h | 11 +
> > 29 files changed, 3415 insertions(+), 523 deletions(-)
> > create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
> > create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
> > create mode 100644 drivers/ntb/ntb_edma.c
> > create mode 100644 drivers/ntb/ntb_edma.h
> > rename drivers/ntb/{ntb_transport.c => ntb_transport_core.c} (59%)
> >
> > --
> > 2.48.1
> >
On Tue, Dec 02, 2025 at 03:20:01PM +0900, Koichiro Den wrote:
> On Mon, Dec 01, 2025 at 05:02:57PM -0500, Frank Li wrote:
> > On Sun, Nov 30, 2025 at 01:03:38AM +0900, Koichiro Den wrote:
> > > Hi,
> > >
> > > This is RFC v2 of the NTB/PCI series for Renesas R-Car S4. The ultimate
> > > goal is unchanged, i.e. to improve performance between RC and EP
> > > (with vNTB) over ntb_transport, but the approach has changed drastically.
> > > Based on the feedback from Frank Li in the v1 thread, in particular:
> > > https://lore.kernel.org/all/aQEsip3TsPn4LJY9@lizhi-Precision-Tower-5810/
> > > this RFC v2 instead builds an NTB transport backed by remote eDMA
> > > architecture and reshapes the series around it. The RC->EP interruption
> > > is now achieved using a dedicated eDMA read channel, so the somewhat
> > > "hack"-ish approach in RFC v1 is no longer needed.
> > >
> > > Compared to RFC v1, this v2 series enables NTB transport backed by
> > > remote DW eDMA, so the current ntb_transport handling of Memory Window
> > > is no longer needed, and direct DMA transfers between EP and RC are
> > > used.
> > >
> > > I realize this is quite a large series. Sorry for the volume, but for
> > > the RFC stage I believe presenting the full picture in a single set
> > > helps with reviewing the overall architecture. Once the direction is
> > > agreed, I will respin it split by subsystem and topic.
> > >
> > >
> > ...
> > >
> > > - Before this change:
> > >
> > > * ping
> > > 64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=12.3 ms
> > > 64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=6.58 ms
> > > 64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=1.26 ms
> > > 64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=7.43 ms
> > > 64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.39 ms
> > > 64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=7.38 ms
> > > 64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=1.42 ms
> > > 64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=7.41 ms
> > >
> > > * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 2`)
> > > [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
> > > [ 5] 0.00-10.01 sec 344 MBytes 288 Mbits/sec 3.483 ms 51/5555 (0.92%) receiver
> > > [ 6] 0.00-10.01 sec 342 MBytes 287 Mbits/sec 3.814 ms 38/5517 (0.69%) receiver
> > > [SUM] 0.00-10.01 sec 686 MBytes 575 Mbits/sec 3.648 ms 89/11072 (0.8%) receiver
> > >
> > > * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 2`)
> > > [ 5] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 3.164 ms 390/5731 (6.8%) receiver
> > > [ 6] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 2.416 ms 396/5741 (6.9%) receiver
> > > [SUM] 0.00-10.03 sec 667 MBytes 558 Mbits/sec 2.790 ms 786/11472 (6.9%) receiver
> > >
> > > Note: with `-P 2`, the best total bitrate (receiver side) was achieved.
> > >
> > > - After this change (use_remote_edma=1) [1]:
> > >
> > > * ping
> > > 64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=1.48 ms
> > > 64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=1.03 ms
> > > 64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=0.931 ms
> > > 64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=0.910 ms
> > > 64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.07 ms
> > > 64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=0.986 ms
> > > 64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=0.910 ms
> > > 64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=0.883 ms
> > >
> > > * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 4`)
> > > [ 5] 0.00-10.01 sec 3.54 GBytes 3.04 Gbits/sec 0.030 ms 0/58007 (0%) receiver
> > > [ 6] 0.00-10.01 sec 3.71 GBytes 3.19 Gbits/sec 0.453 ms 0/60909 (0%) receiver
> > > [ 9] 0.00-10.01 sec 3.85 GBytes 3.30 Gbits/sec 0.027 ms 0/63072 (0%) receiver
> > > [ 11] 0.00-10.01 sec 3.26 GBytes 2.80 Gbits/sec 0.070 ms 1/53512 (0.0019%) receiver
> > > [SUM] 0.00-10.01 sec 14.4 GBytes 12.3 Gbits/sec 0.145 ms 1/235500 (0.00042%) receiver
> > >
> > > * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 4`)
> > > [ 5] 0.00-10.03 sec 3.40 GBytes 2.91 Gbits/sec 0.104 ms 15467/71208 (22%) receiver
> > > [ 6] 0.00-10.03 sec 3.08 GBytes 2.64 Gbits/sec 0.176 ms 12097/62609 (19%) receiver
> > > [ 9] 0.00-10.03 sec 3.38 GBytes 2.90 Gbits/sec 0.270 ms 17212/72710 (24%) receiver
> > > [ 11] 0.00-10.03 sec 2.56 GBytes 2.19 Gbits/sec 0.200 ms 11193/53090 (21%) receiver
> >
> > Almost 10x faster, 2.9G vs 279M? Highlighting this will get more people
> > interested in this topic.
>
> Thank you for the review!
>
> OK, I'll highlight this in the next iteration.
> By the way, my impression is that we can achieve even higher with this remote
> eDMA architecture.
eDMA eliminates one memory copy and allows a longer TLP data length. Some
years ago I tried using the RDMA framework for this, but it was overly
complex and I stopped that work.
>
> >
> > > [SUM] 0.00-10.03 sec 12.4 GBytes 10.6 Gbits/sec 0.188 ms 55969/259617 (22%) receiver
> > >
> > > [1] configfs settings:
> > > # modprobe pci_epf_vntb dyndbg=+pmf
> > > # cd /sys/kernel/config/pci_ep/
> > > # mkdir functions/pci_epf_vntb/func1
> > > # echo 0x1912 > functions/pci_epf_vntb/func1/vendorid
> > > # echo 0x0030 > functions/pci_epf_vntb/func1/deviceid
> > > # echo 32 > functions/pci_epf_vntb/func1/msi_interrupts
> > > # echo 16 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_count
> > > # echo 128 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/spad_count
> > > # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
> > > # echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
> > > # echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2
> > > # echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_offset
> >
> > It looks like you are trying to create small sub-windows within a mw.
> >
> > Would this be cleaner?
> >
> > echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1.0
> > echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1.1
> >
> > so mw1.1 naturally continues from the previous one.
>
> Thanks for the suggestion.
>
> I was trying to keep the small sub-windows referenced in the same way as
> normal windows, for simplicity and readability, but I agree your proposal
> looks more intuitive from a user-experience point of view.
>
> My only concern is that e.g. {mw1.0, mw1.1, mw2.0} may translate internally
> into something like {mw1, mw2, mw3} effectively, and that numbering
> mismatch might become confusing when reading or debugging the code.
If there are enough BARs, you could first try using one dedicated BAR for
the eDMA register space, with the LL space shared with BAR0 (the control
BAR), to reduce complexity and get better performance.
Frank
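For illustration, the numbering mismatch raised in the exchange above could be sketched as follows. This is a hypothetical sketch, not code from the series; the assumption is simply that each configfs sub-window entry would consume the next flat internal memory-window index:

```shell
# Hypothetical illustration (not from the series): configfs sub-window
# names such as mw1.0/mw1.1/mw2.0 flattening into sequential internal
# memory-window indices, so "mw2.0" ends up as internal mw3.
flat=0
for name in mw1.0 mw1.1 mw2.0; do
	flat=$((flat + 1))
	echo "$name -> internal mw$flat"
done
```

This is exactly the name-vs-index skew that could make debugging confusing: the configfs view says "mw2", the internal view says "mw3".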
>
> -Koichiro
>
> >
> > Frank
> >
> > > # echo 0x1912 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_vid
> > > # echo 0x0030 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_pid
> > > # echo 0x10 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vbus_number
> > > # echo 0 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/ctrl_bar
> > > # echo 4 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_bar
> > > # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1_bar
> > > # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_bar
> > > # ln -s controllers/e65d0000.pcie-ep functions/pci_epf_vntb/func1/primary/
> > > # echo 1 > controllers/e65d0000.pcie-ep/start
> > >
> > >
> > > Thanks for taking a look.
> > >
> > >
> > > Koichiro Den (27):
> > > PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[]
> > > access
> > > PCI: endpoint: pci-epf-vntb: Add mwN_offset configfs attributes
> > > NTB: epf: Handle mwN_offset for inbound MW regions
> > > PCI: endpoint: Add inbound mapping ops to EPC core
> > > PCI: dwc: ep: Implement EPC inbound mapping support
> > > PCI: endpoint: pci-epf-vntb: Use pci_epc_map_inbound() for MW mapping
> > > NTB: Add offset parameter to MW translation APIs
> > > PCI: endpoint: pci-epf-vntb: Propagate MW offset from configfs when
> > > present
> > > NTB: ntb_transport: Support offsetted partial memory windows
> > > NTB: core: Add .get_pci_epc() to ntb_dev_ops
> > > NTB: epf: vntb: Implement .get_pci_epc() callback
> > > damengine: dw-edma: Fix MSI data values for multi-vector IMWr
> > > interrupts
> > > NTB: ntb_transport: Use seq_file for QP stats debugfs
> > > NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
> > > NTB: ntb_transport: Dynamically determine qp count
> > > NTB: ntb_transport: Introduce get_dma_dev() helper
> > > NTB: epf: Reserve a subset of MSI vectors for non-NTB users
> > > NTB: ntb_transport: Introduce ntb_transport_backend_ops
> > > PCI: dwc: ep: Cache MSI outbound iATU mapping
> > > NTB: ntb_transport: Introduce remote eDMA backed transport mode
> > > NTB: epf: Provide db_vector_count/db_vector_mask callbacks
> > > ntb_netdev: Multi-queue support
> > > NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
> > > iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
> > > iommu: ipmmu-vmsa: Add support for reserved regions
> > > arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe
> > > eDMA
> > > NTB: epf: Add an additional memory window (MW2) barno mapping on
> > > Renesas R-Car
> > >
> > > arch/arm64/boot/dts/renesas/Makefile | 2 +
> > > .../boot/dts/renesas/r8a779f0-spider-ep.dts | 46 +
> > > .../boot/dts/renesas/r8a779f0-spider-rc.dts | 52 +
> > > drivers/dma/dw-edma/dw-edma-core.c | 28 +-
> > > drivers/iommu/ipmmu-vmsa.c | 7 +-
> > > drivers/net/ntb_netdev.c | 341 ++-
> > > drivers/ntb/Kconfig | 11 +
> > > drivers/ntb/Makefile | 3 +
> > > drivers/ntb/hw/amd/ntb_hw_amd.c | 6 +-
> > > drivers/ntb/hw/epf/ntb_hw_epf.c | 177 +-
> > > drivers/ntb/hw/idt/ntb_hw_idt.c | 3 +-
> > > drivers/ntb/hw/intel/ntb_hw_gen1.c | 6 +-
> > > drivers/ntb/hw/intel/ntb_hw_gen1.h | 2 +-
> > > drivers/ntb/hw/intel/ntb_hw_gen3.c | 3 +-
> > > drivers/ntb/hw/intel/ntb_hw_gen4.c | 6 +-
> > > drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 6 +-
> > > drivers/ntb/msi.c | 6 +-
> > > drivers/ntb/ntb_edma.c | 628 ++++++
> > > drivers/ntb/ntb_edma.h | 128 ++
> > > .../{ntb_transport.c => ntb_transport_core.c} | 1829 ++++++++++++++---
> > > drivers/ntb/test/ntb_perf.c | 4 +-
> > > drivers/ntb/test/ntb_tool.c | 6 +-
> > > .../pci/controller/dwc/pcie-designware-ep.c | 287 ++-
> > > drivers/pci/controller/dwc/pcie-designware.h | 7 +
> > > drivers/pci/endpoint/functions/pci-epf-vntb.c | 229 ++-
> > > drivers/pci/endpoint/pci-epc-core.c | 44 +
> > > include/linux/ntb.h | 39 +-
> > > include/linux/ntb_transport.h | 21 +
> > > include/linux/pci-epc.h | 11 +
> > > 29 files changed, 3415 insertions(+), 523 deletions(-)
> > > create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
> > > create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
> > > create mode 100644 drivers/ntb/ntb_edma.c
> > > create mode 100644 drivers/ntb/ntb_edma.h
> > > rename drivers/ntb/{ntb_transport.c => ntb_transport_core.c} (59%)
> > >
> > > --
> > > 2.48.1
> > >
On Tue, Dec 02, 2025 at 11:07:23AM -0500, Frank Li wrote:
> On Tue, Dec 02, 2025 at 03:20:01PM +0900, Koichiro Den wrote:
> > On Mon, Dec 01, 2025 at 05:02:57PM -0500, Frank Li wrote:
> > > On Sun, Nov 30, 2025 at 01:03:38AM +0900, Koichiro Den wrote:
> > > > Hi,
> > > >
> > > > This is RFC v2 of the NTB/PCI series for Renesas R-Car S4. The ultimate
> > > > goal is unchanged, i.e. to improve performance between RC and EP
> > > > (with vNTB) over ntb_transport, but the approach has changed drastically.
> > > > Based on the feedback from Frank Li in the v1 thread, in particular:
> > > > https://lore.kernel.org/all/aQEsip3TsPn4LJY9@lizhi-Precision-Tower-5810/
> > > > this RFC v2 instead builds an NTB transport backed by remote eDMA
> > > > architecture and reshapes the series around it. The RC->EP interruption
> > > > is now achieved using a dedicated eDMA read channel, so the somewhat
> > > > "hack"-ish approach in RFC v1 is no longer needed.
> > > >
> > > > Compared to RFC v1, this v2 series enables NTB transport backed by
> > > > remote DW eDMA, so the current ntb_transport handling of Memory Window
> > > > is no longer needed, and direct DMA transfers between EP and RC are
> > > > used.
> > > >
> > > > I realize this is quite a large series. Sorry for the volume, but for
> > > > the RFC stage I believe presenting the full picture in a single set
> > > > helps with reviewing the overall architecture. Once the direction is
> > > > agreed, I will respin it split by subsystem and topic.
> > > >
> > > >
> > > ...
> > > >
> > > > - Before this change:
> > > >
> > > > * ping
> > > > 64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=12.3 ms
> > > > 64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=6.58 ms
> > > > 64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=1.26 ms
> > > > 64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=7.43 ms
> > > > 64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.39 ms
> > > > 64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=7.38 ms
> > > > 64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=1.42 ms
> > > > 64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=7.41 ms
> > > >
> > > > * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 2`)
> > > > [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
> > > > [ 5] 0.00-10.01 sec 344 MBytes 288 Mbits/sec 3.483 ms 51/5555 (0.92%) receiver
> > > > [ 6] 0.00-10.01 sec 342 MBytes 287 Mbits/sec 3.814 ms 38/5517 (0.69%) receiver
> > > > [SUM] 0.00-10.01 sec 686 MBytes 575 Mbits/sec 3.648 ms 89/11072 (0.8%) receiver
> > > >
> > > > * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 2`)
> > > > [ 5] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 3.164 ms 390/5731 (6.8%) receiver
> > > > [ 6] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 2.416 ms 396/5741 (6.9%) receiver
> > > > [SUM] 0.00-10.03 sec 667 MBytes 558 Mbits/sec 2.790 ms 786/11472 (6.9%) receiver
> > > >
> > > > Note: with `-P 2`, the best total bitrate (receiver side) was achieved.
> > > >
> > > > - After this change (use_remote_edma=1) [1]:
> > > >
> > > > * ping
> > > > 64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=1.48 ms
> > > > 64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=1.03 ms
> > > > 64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=0.931 ms
> > > > 64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=0.910 ms
> > > > 64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.07 ms
> > > > 64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=0.986 ms
> > > > 64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=0.910 ms
> > > > 64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=0.883 ms
> > > >
> > > > * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 4`)
> > > > [ 5] 0.00-10.01 sec 3.54 GBytes 3.04 Gbits/sec 0.030 ms 0/58007 (0%) receiver
> > > > [ 6] 0.00-10.01 sec 3.71 GBytes 3.19 Gbits/sec 0.453 ms 0/60909 (0%) receiver
> > > > [ 9] 0.00-10.01 sec 3.85 GBytes 3.30 Gbits/sec 0.027 ms 0/63072 (0%) receiver
> > > > [ 11] 0.00-10.01 sec 3.26 GBytes 2.80 Gbits/sec 0.070 ms 1/53512 (0.0019%) receiver
> > > > [SUM] 0.00-10.01 sec 14.4 GBytes 12.3 Gbits/sec 0.145 ms 1/235500 (0.00042%) receiver
> > > >
> > > > * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 4`)
> > > > [ 5] 0.00-10.03 sec 3.40 GBytes 2.91 Gbits/sec 0.104 ms 15467/71208 (22%) receiver
> > > > [ 6] 0.00-10.03 sec 3.08 GBytes 2.64 Gbits/sec 0.176 ms 12097/62609 (19%) receiver
> > > > [ 9] 0.00-10.03 sec 3.38 GBytes 2.90 Gbits/sec 0.270 ms 17212/72710 (24%) receiver
> > > > [ 11] 0.00-10.03 sec 2.56 GBytes 2.19 Gbits/sec 0.200 ms 11193/53090 (21%) receiver
> > >
> > > Almost 10x faster, 2.9G vs 279M? Highlighting this will get more people
> > > interested in this topic.
> >
> > Thank you for the review!
> >
> > OK, I'll highlight this in the next iteration.
> > By the way, my impression is that we can achieve even higher with this remote
> > eDMA architecture.
>
> eDMA eliminates one memory copy and allows a longer TLP data length. Some
> years ago I tried using the RDMA framework for this, but it was overly
> complex and I stopped that work.
That's interesting. Thank you for the info.
>
> >
> > >
> > > > [SUM] 0.00-10.03 sec 12.4 GBytes 10.6 Gbits/sec 0.188 ms 55969/259617 (22%) receiver
> > > >
> > > > [1] configfs settings:
> > > > # modprobe pci_epf_vntb dyndbg=+pmf
> > > > # cd /sys/kernel/config/pci_ep/
> > > > # mkdir functions/pci_epf_vntb/func1
> > > > # echo 0x1912 > functions/pci_epf_vntb/func1/vendorid
> > > > # echo 0x0030 > functions/pci_epf_vntb/func1/deviceid
> > > > # echo 32 > functions/pci_epf_vntb/func1/msi_interrupts
> > > > # echo 16 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_count
> > > > # echo 128 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/spad_count
> > > > # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
> > > > # echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
> > > > # echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2
> > > > # echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_offset
> > >
> > > It looks like you are trying to create small sub-windows within a mw.
> > >
> > > Would this be cleaner?
> > >
> > > echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1.0
> > > echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1.1
> > >
> > > so mw1.1 naturally continues from the previous one.
> >
> > Thanks for the suggestion.
> >
> > I was trying to keep the small sub-windows referenced in the same way as
> > normal windows, for simplicity and readability, but I agree your proposal
> > looks more intuitive from a user-experience point of view.
> >
> > My only concern is that e.g. {mw1.0, mw1.1, mw2.0} may translate internally
> > into something like {mw1, mw2, mw3} effectively, and that numbering
> > mismatch might become confusing when reading or debugging the code.
>
> If there are enough BARs, you could first try using one dedicated BAR for
> the eDMA register space, with the LL space shared with BAR0 (the control
> BAR), to reduce complexity and get better performance.
Thank you for the suggestion. Once I have the critical pieces (which we are
discussing in several threads on this RFC v2 series) sorted out and begin
preparing the next iteration, I'll revisit this.
Koichiro
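Frank's BAR-layout suggestion above could be sketched roughly as follows. The BAR numbers and region sizes here are purely illustrative assumptions, not values from the series:

```shell
# Hypothetical sketch (not from the series) of the suggested layout:
# a dedicated BAR for the eDMA register space, with the eDMA
# linked-list (LL) region carved out of BAR0 (the control BAR).
ctrl_bar=0
edma_bar=3                # assumed BAR number for eDMA registers
ctrl_size=$((0x10000))    # assumed size of the control register region
ll_offset=$ctrl_size      # LL region follows the control registers in BAR0

printf 'eDMA LL shared with BAR%d at offset 0x%x\n' "$ctrl_bar" "$ll_offset"
printf 'eDMA registers on dedicated BAR%d\n' "$edma_bar"
```

The point of the split is that the eDMA register space gets its own BAR while the LL descriptors ride along in the already-mapped control BAR, avoiding an extra BAR for the LL region.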
>
> Frank
>
> >
> > -Koichiro
> >
> > >
> > > Frank
> > >
> > > > ...