[RFC Patch net-next v1 0/9] r8169: add RSS support for RTL8127

javen posted 9 patches 1 month, 3 weeks ago
There is a newer version of this series
Posted by javen 1 month, 3 weeks ago
From: Javen Xu <javen_xu@realsil.com.cn>

This patch series adds RSS support for RTL8127 to the r8169 driver.

Currently, without RSS support, a single CPU core handles all incoming
traffic. Under heavy load, this core becomes a bottleneck, causing
high softirq usage and unstable, degraded network throughput.

To address this, we add RSS support for RTL8127. This RFC is intended
for discussion. We ran some experiments on an AMD platform; the results
are below.

Platform: AMD Ryzen Embedded R2514 with Radeon Graphics (4 Cores/8 Threads)
Arch: x86_64
Test command: 
  Server: iperf3 -s
  Client: iperf3 -c 192.168.2.1 -P 20 -t 3600
Monitor: mpstat -P ALL 1

Before this patch (Without RSS):
  Throughput: Unstable, fluctuating between 3.76 Gbits/sec and
  8.2 Gbits/sec.
  CPU Usage: A single CPU core is fully occupied with softirq reaching 
  up to 96%.

After this patch (With RSS enabled):
  Throughput: Stable at 9.42 Gbits/sec.
  CPU Usage: The traffic load is evenly distributed across multiple CPU
  cores. The maximum softirq on a single core dropped to 63%.
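
To illustrate how RSS spreads flows across cores: the NIC hashes each
packet's flow tuple and uses the hash to index an indirection table of RX
queues. The sketch below assumes a Toeplitz hash, which RSS implementations
commonly use; this cover letter does not say which hash RTL8127 uses, and
the key bytes, table size, and function names are illustrative, not taken
from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RSS_KEY_LEN    40   /* illustrative: covers up to 36 input bytes */
#define RSS_INDIR_SIZE 128  /* illustrative table size, not from the series */

/* Sample key bytes, used here only as example data. */
static const uint8_t rss_key[RSS_KEY_LEN] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/*
 * Toeplitz hash: for every set input bit, XOR in the 32-bit key window
 * aligned with that bit. The window slides by one key bit per input bit.
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
			      const uint8_t *data, size_t len)
{
	uint32_t window, hash = 0;

	assert(key_len >= 4 && len + 4 <= key_len);
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | key[3];

	for (size_t i = 0; i < len; i++) {
		for (int bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;
			/* Slide the window: shift in the next key bit. */
			window <<= 1;
			if (key[i + 4] & (1u << bit))
				window |= 1;
		}
	}
	return hash;
}

/* Fill the indirection table round-robin over the active RX queues. */
static void rss_indir_init(uint8_t *indir, size_t size, unsigned int n_queues)
{
	assert(n_queues > 0);
	for (size_t i = 0; i < size; i++)
		indir[i] = (uint8_t)(i % n_queues);
}

/* Map a flow hash to an RX queue through the indirection table. */
static unsigned int rss_queue(const uint8_t *indir, size_t size, uint32_t hash)
{
	return indir[hash % size];
}
```

All packets of one flow hash to the same queue, so per-flow ordering is
kept while different flows can be serviced by different cores.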
  
Patch summary:
  Patch 1: Adds the macro and register definitions needed for RSS.
  Patches 2-4: Add NAPI and multi-queue RX/TX support.
  Patches 5-6: Add MSI-X support and enable it specifically for RTL8127.
  Patch 7: Enables RSS for RTL8127.
  Patches 8-9: Add ethtool support for configuring the number of RX queues.
  
Javen Xu (9):
  r8169: add some register definitions
  r8169: add napi and irq support
  r8169: add support for multi tx queues
  r8169: add support for multi rx queues
  r8169: add support for msix
  r8169: enable msix for RTL8127
  r8169: add support and enable rss
  r8169: move struct ethtool_ops
  r8169: add support for ethtool

 drivers/net/ethernet/realtek/r8169_main.c | 1437 ++++++++++++++++++---
 1 file changed, 1238 insertions(+), 199 deletions(-)

-- 
2.43.0
Re: [RFC Patch net-next v1 0/9] r8169: add RSS support for RTL8127
Posted by FUKAUMI Naoki 1 month, 3 weeks ago
Hi Javen,

Thank you very much for your nice work!

On 4/20/26 11:19, javen wrote:
> From: Javen Xu <javen_xu@realsil.com.cn>
> 
> This series patch adds RSS support for RTL8127 in the r8169 driver.
> 
> Currently, without RSS support, a single CPU core handles all incoming
> traffic. Under heavy loads, this single core becomes a bottleneck, causing
> high softirq usage and leading to unstable and degraded network throughput.
> 
> As a result, we add rss support for RTL8127. This RFC patch is just for
> discussing. And we do some experiments on AMD platform. Below is the
> result.
> 
> Platform: AMD Ryzen Embedded R2514 with Radeon Graphics(4 Cores/8 Threads)
> Arch: x86_64
> Test command:
>    Server: iperf3 -s
>    Client: iperf3 -c 192.168.2.1 -P 20 -t 3600
> Monitor: mpstat -P ALL 1
> 
> Before this patch (Without RSS):
>    Throughput: Unstable, fluctuating between 3.76 Gbits/sec and
>    8.2 Gbits/sec.
>    CPU Usage: A single CPU core is fully occupied with softirq reaching
>    up to 96%.
> 
> After this patch (With RSS enabled):
>    Throughput: Stable at 9.42 Gbits/sec.
>    CPU Usage: The traffic load is evenly distributed across multiple CPU
>    cores. The maximum softirq on a single core dropped to 63%.

Platform: Radxa ROCK 5T (RK3588: 4x Cortex-A76, 4x Cortex-A55)
Arch: aarch64
Configuration: smp_affinity is set to use only the big cores.
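
For reference, pinning an IRQ to a set of cores means writing a hex CPU
bitmask to /proc/irq/<N>/smp_affinity. Below is a small sketch of building
that mask, assuming the A76 cores are CPUs 4-7 (the usual RK3588 numbering,
but worth verifying on the board). The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build the CPU bitmask for a list of CPU numbers.
 * This sketch is limited to CPUs 0..31.
 */
static uint32_t cpu_list_to_mask(const unsigned int *cpus, size_t n)
{
	uint32_t mask = 0;

	for (size_t i = 0; i < n; i++) {
		assert(cpus[i] < 32);
		mask |= UINT32_C(1) << cpus[i];
	}
	return mask;
}

/* Format the mask as the hex string that smp_affinity accepts. */
static int format_affinity(char *buf, size_t len, uint32_t mask)
{
	return snprintf(buf, len, "%x", (unsigned int)mask);
}
```

With CPUs 4-7 the mask has bits 4 through 7 set, and that hex string is
what gets written to the smp_affinity file of each RX interrupt.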

Vanilla Linux v7.0:
Throughput: 5.5 Gbps (4.3 Gbps with -P 20)
CPU Usage: ~100% on a single A76 core.

Linux v7.0 + this patch series:
Throughput: 9.4 Gbps with -P 20
CPU Usage: distributed across all four A76 cores.

Looks good to me!

Feel free to use:
Tested-by: FUKAUMI Naoki <naoki@radxa.com>

Best regards,

--
FUKAUMI Naoki
Radxa Computer (Shenzhen) Co., Ltd.

> Patch summary:
>    Patch 1: Adds necessary macro and register definitions for RSS.
>    Patch 2-4: Support NAPI and multi RX/TX queues.
>    Patch 5-6: Support MSI-X and enables it specifically for RTL8127.
>    Patch 7: Enables RSS for RTL8127.
>    Patch 8-9: Adds ethtool support to configure the number of RX queues.
>    
> Javen Xu (9):
>    r8169: add some register definitions
>    r8169: add napi and irq support
>    r8169: add support for multi tx queues
>    r8169: add support for multi rx queues
>    r8169: add support for msix
>    r8169: enable msix for RTL8127
>    r8169: add support and enable rss
>    r8169: move struct ethtool_ops
>    r8169: add support for ethtool
> 
>   drivers/net/ethernet/realtek/r8169_main.c | 1437 ++++++++++++++++++---
>   1 file changed, 1238 insertions(+), 199 deletions(-)
Re: [RFC Patch net-next v1 0/9] r8169: add RSS support for RTL8127
Posted by Heiner Kallweit 1 month, 3 weeks ago
On 20.04.2026 04:19, javen wrote:
> From: Javen Xu <javen_xu@realsil.com.cn>
> 
> This series patch adds RSS support for RTL8127 in the r8169 driver.
> 
> Currently, without RSS support, a single CPU core handles all incoming
> traffic. Under heavy loads, this single core becomes a bottleneck, causing
> high softirq usage and leading to unstable and degraded network throughput.
> 
> As a result, we add rss support for RTL8127. This RFC patch is just for
> discussing. And we do some experiments on AMD platform. Below is the 
> result.
> 
> Platform: AMD Ryzen Embedded R2514 with Radeon Graphics(4 Cores/8 Threads)

An older embedded CPU (AFAICS from 2019, refreshed in 2022) in reality is
unlikely to be used with sustained 10GBit traffic. It would be too weak to
handle userspace apps making use of this high throughput. This hw edge case
IMO isn't really an argument for adding 1,000 LoC, blowing up driver structs,
and adding the complexity of dealing with a register layout changing every
two chip versions.

It's really a problem that Realtek frequently changes register layout and/or
register semantics in a not backward-compatible way (and doesn't provide
documentation), resulting in ugly versioned stuff like the following.

IMR_V2_SET_REG_8125	= 0x0d0c,
IMR_V2_CLEAR_REG_8125	= 0x0d00,
IMR_V4_L2_CLEAR_REG_8125 = 0x0d10,
ISR_V2_8125		= 0x0d04,
ISR_V4_L2_8125		= 0x0d14,

case RTL_GIGA_MAC_VER_80:
	tp->HwSuppIsrVer = 6;
default:
	tp->HwSuppIsrVer = 1;

This messy hw design makes it hard to develop maintainable drivers.
This is underlined by the fact that Realtek has separate r8125, r8126,
r8127 drivers, even though they share most of the code.

> Arch: x86_64
> Test command: 
>   Server: iperf3 -s
>   Client: iperf3 -c 192.168.2.1 -P 20 -t 3600
> Monitor: mpstat -P ALL 1
> 
> Before this patch (Without RSS):
>   Throughput: Unstable, fluctuating between 3.76 Gbits/sec and
>   8.2 Gbits/sec.
>   CPU Usage: A single CPU core is fully occupied with softirq reaching 
>   up to 96%.
> 
> After this patch (With RSS enabled):
>   Throughput: Stable at 9.42 Gbits/sec.
>   CPU Usage: The traffic load is evenly distributed across multiple CPU
>   cores. The maximum softirq on a single core dropped to 63%.
>   
> Patch summary:
>   Patch 1: Adds necessary macro and register definitions for RSS.
>   Patch 2-4: Support NAPI and multi RX/TX queues.

Driver supports NAPI already.

>   Patch 5-6: Support MSI-X and enables it specifically for RTL8127.

Also MSI-X is used already.

>   Patch 7: Enables RSS for RTL8127.
>   Patch 8-9: Adds ethtool support to configure the number of RX queues.
>   
> Javen Xu (9):
>   r8169: add some register definitions
>   r8169: add napi and irq support
>   r8169: add support for multi tx queues
>   r8169: add support for multi rx queues
>   r8169: add support for msix
>   r8169: enable msix for RTL8127
>   r8169: add support and enable rss
>   r8169: move struct ethtool_ops
>   r8169: add support for ethtool
> 
>  drivers/net/ethernet/realtek/r8169_main.c | 1437 ++++++++++++++++++---
>  1 file changed, 1238 insertions(+), 199 deletions(-)
> 

Series includes functions like rtl8169_desc_quirk() indicating a need to work
around hw errata. Would be helpful to add comments describing the hw erratum,
best with a link to documentation.
RE: [RFC Patch net-next v1 0/9] r8169: add RSS support for RTL8127
Posted by Javen 1 month, 2 weeks ago
>On 20.04.2026 04:19, javen wrote:
>> From: Javen Xu <javen_xu@realsil.com.cn>
>>
>> This series patch adds RSS support for RTL8127 in the r8169 driver.
>>
>> Currently, without RSS support, a single CPU core handles all incoming
>> traffic. Under heavy loads, this single core becomes a bottleneck, causing
>> high softirq usage and leading to unstable and degraded network throughput.
>>
>> As a result, we add rss support for RTL8127. This RFC patch is just
>> for discussing. And we do some experiments on AMD platform. Below is
>> the result.
>>
>> Platform: AMD Ryzen Embedded R2514 with Radeon Graphics(4 Cores/8 Threads)
>
>An older embedded CPU (AFAICS from 2019, refreshed in 2022) in reality is
>unlikely to be used with sustained 10GBit traffic. It would be too weak to
>handle userspace apps making use of this high throughput. This hw edge case
>IMO isn't really an argument for adding 1.000 LoC, blowing up driver structs,
>and adding the complexity of dealing with a register layout changing every two
>chip versions.
>
>It's really a problem that Realtek frequently changes register layout and/or
>register semantics in a not backward-compatible way (and doesn't provide
>documentation), resulting in ugly versioned stuff like the following.
>
>IMR_V2_SET_REG_8125     = 0x0d0c,
>IMR_V2_CLEAR_REG_8125   = 0x0d00,
>IMR_V4_L2_CLEAR_REG_8125 = 0x0d10,
>ISR_V2_8125             = 0x0d04,
>ISR_V4_L2_8125          = 0x0d14,
>
>case RTL_GIGA_MAC_VER_80:
>        tp->HwSuppIsrVer = 6;
>default:
>        tp->HwSuppIsrVer = 1;
>
>This messy hw design makes it hard to develop maintainable drivers.
>This is underlined by the fact that Realtek has separate r8125, r8126,
>r8127 drivers, even though they share most of the code.
>
>> Arch: x86_64
>> Test command:
>>   Server: iperf3 -s
>>   Client: iperf3 -c 192.168.2.1 -P 20 -t 3600
>> Monitor: mpstat -P ALL 1
>>
>> Before this patch (Without RSS):
>>   Throughput: Unstable, fluctuating between 3.76 Gbits/sec and
>>   8.2 Gbits/sec.
>>   CPU Usage: A single CPU core is fully occupied with softirq reaching
>>   up to 96%.
>>
>> After this patch (With RSS enabled):
>>   Throughput: Stable at 9.42 Gbits/sec.
>>   CPU Usage: The traffic load is evenly distributed across multiple CPU
>>   cores. The maximum softirq on a single core dropped to 63%.
>>
>> Patch summary:
>>   Patch 1: Adds necessary macro and register definitions for RSS.
>>   Patch 2-4: Support NAPI and multi RX/TX queues.
>
>Driver supports NAPI already.
>
>>   Patch 5-6: Support MSI-X and enables it specifically for RTL8127.
>
>Also MSI-X is used already.
>
>>   Patch 7: Enables RSS for RTL8127.
>>   Patch 8-9: Adds ethtool support to configure the number of RX queues.
>>
>> Javen Xu (9):
>>   r8169: add some register definitions
>>   r8169: add napi and irq support
>>   r8169: add support for multi tx queues
>>   r8169: add support for multi rx queues
>>   r8169: add support for msix
>>   r8169: enable msix for RTL8127
>>   r8169: add support and enable rss
>>   r8169: move struct ethtool_ops
>>   r8169: add support for ethtool
>>
>>  drivers/net/ethernet/realtek/r8169_main.c | 1437 ++++++++++++++++++---
>>  1 file changed, 1238 insertions(+), 199 deletions(-)
>>
>
>Series includes functions like rtl8169_desc_quirk() indicating a need to work
>around hw errata. Would be helpful to add comments describing the hw
>erratum, best with a link to documentation.

This is a workaround for a hardware erratum on RTL8127.
The hardware cannot guarantee that the descriptor OwnBit is fully written
to host memory before the interrupt is triggered. If the CPU handles the
interrupt very quickly, it might read stale descriptor data where DescOwn
is still set, causing it to incorrectly skip the packet.
The recheck_desc_ownbit flag and the subsequent rtl8127_desc_quirk() are
introduced to wait for the descriptor write to complete and then check the
descriptor one last time.
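
A minimal user-space sketch of the recheck idea, not the code from the
series: if DescOwn is still set although the interrupt reported RX work,
wait briefly and look again, with a bound on the number of tries. The
struct layout, the retry bound, and the delay hook are assumptions made for
illustration; a real driver would use a short delay such as udelay() and
would still need a DMA read barrier before reading the other descriptor
fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_OWN        (UINT32_C(1) << 31) /* set while the NIC owns the descriptor */
#define RECHECK_RETRIES 10                  /* hypothetical bound */

struct rx_desc {
	volatile uint32_t opts1; /* written by the device via DMA */
};

/* Hypothetical hook standing in for a short delay between rechecks. */
typedef void (*delay_fn)(struct rx_desc *desc);

/*
 * Return true if the host owns the descriptor (a packet is ready).
 * When recheck is set, a still-set DescOwn is treated as possibly stale
 * and re-read a bounded number of times before giving up.
 */
static bool rx_desc_ready(struct rx_desc *desc, bool recheck, delay_fn delay)
{
	if (!(desc->opts1 & DESC_OWN))
		return true;
	if (!recheck)
		return false;

	for (int i = 0; i < RECHECK_RETRIES; i++) {
		delay(desc);
		if (!(desc->opts1 & DESC_OWN))
			return true;
	}
	return false;
}

/* Test double: simulates the late DMA write landing during the third delay. */
static int sim_calls;

static void sim_delay(struct rx_desc *desc)
{
	if (++sim_calls == 3)
		desc->opts1 &= ~DESC_OWN;
}
```

Bounding the loop keeps the RX path from spinning forever when the
descriptor really is still owned by the hardware.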


Thanks for your review and suggestions.

Summary of changes planned for the upcoming v2:
- remove the multi tx queue patch
- rename some macro definitions, such as RXS_8125B_RSS_UDP_V4
- convert enum rtl8127_rss_register_content to #define and use the BIT() macro
- run checkpatch, explain the usage of dma_wmb() etc.
- fix typos (e.g., DEAFULT)

BRs,
Javen