On some ARM64 platforms with 4K PAGE_SIZE, utilizing page_pool
fragments for allocation in the RX refill path (~2kB buffer per fragment)
causes 15-20% throughput regression under high connection counts
(>16 TCP streams at 180+ Gbps). Using full-page buffers on these
platforms shows no regression and restores line-rate performance.
This behavior is observed on a single platform; other platforms
perform better with page_pool fragments, indicating this is not a
page_pool issue but platform-specific.
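
To make the buffer arithmetic concrete, here is a minimal userspace
sketch in C. It is not the MANA driver code: it assumes a 4K page and
treats the ~2kB figure as exactly 2048 bytes, and the names
rx_buf_layout_sketch(), struct rx_buf_layout and SKETCH_PAGE_SIZE are
hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4K PAGE_SIZE, as on the platforms described above. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical type: how one page is carved up for RX buffers. */
struct rx_buf_layout {
	unsigned int alloc_size;    /* bytes of the page used per RX buffer */
	unsigned int bufs_per_page; /* RX buffers sharing one page */
};

/*
 * Hypothetical helper, not driver code: pick the per-buffer allocation
 * size. With full_page_rx set, every buffer gets a whole page; otherwise
 * the page is split into as many buf_size fragments as fit.
 */
static struct rx_buf_layout rx_buf_layout_sketch(unsigned int buf_size,
						 bool full_page_rx)
{
	struct rx_buf_layout layout;

	/* This sketch only models buffers that fit in a single page. */
	assert(buf_size > 0 && buf_size <= SKETCH_PAGE_SIZE);

	if (full_page_rx) {
		layout.alloc_size = SKETCH_PAGE_SIZE;
		layout.bufs_per_page = 1;
	} else {
		layout.alloc_size = buf_size;
		layout.bufs_per_page = SKETCH_PAGE_SIZE / buf_size;
	}
	return layout;
}
```

With a 2048-byte buffer, the fragment path gives two RX buffers per
page, while the full-page path gives one buffer per page at twice the
memory per buffer.
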

This series adds an ethtool private flag "full-page-rx" to let the
user opt in to one RX buffer per page:

  ethtool --set-priv-flags eth0 full-page-rx on

There is no behavioral change by default. The flag can be persisted
via udev rule for affected platforms.

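
For readers unfamiliar with ethtool private flags: the driver exports a
table of flag names, each ETH_GSTRING_LEN (32) bytes wide, and the flag
state is a 32-bit mask in which bit i belongs to the i-th name. The
following is a minimal userspace sketch of that mapping only. It does
not talk to a device, and priv_flag_index(), priv_flag_apply() and
SKETCH_GSTRING_LEN are hypothetical names, not part of this series or of
the ethtool tool.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same width as the kernel's ETH_GSTRING_LEN. */
#define SKETCH_GSTRING_LEN 32

/*
 * Hypothetical helper: return the bit index of `name` in a private-flag
 * name table, or -1 if the name is not present. Only the first 32
 * entries are considered because the flag word is 32 bits wide.
 */
static int priv_flag_index(const char names[][SKETCH_GSTRING_LEN],
			   unsigned int count, const char *name)
{
	for (unsigned int i = 0; i < count && i < 32; i++) {
		if (strncmp(names[i], name, SKETCH_GSTRING_LEN) == 0)
			return (int)i;
	}
	return -1;
}

/* Hypothetical helper: return `flags` with bit `idx` set or cleared. */
static uint32_t priv_flag_apply(uint32_t flags, int idx, int on)
{
	uint32_t bit;

	assert(idx >= 0 && idx < 32);
	bit = UINT32_C(1) << idx;
	return on ? (flags | bit) : (flags & ~bit);
}
```

Assuming "full-page-rx" were the only private flag, it would sit at
index 0, so turning it on would change a flag word of 0 into 1.
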
Changes in v5:
- Split prep refactor into separate patch (patch 1/2)
Changes in v4:
- Drop the SMBIOS string parsing and add an ethtool priv flag
  to reconfigure the queues with full-page RX buffers.
Changes in v3:
- changed u8* to char*
Changes in v2:
- separate reading string index and the string, remove inline.

Dipayaan Roy (2):
  net: mana: refactor mana_get_strings() and mana_get_sset_count() to
    use switch
  net: mana: force full-page RX buffers via ethtool private flag

 drivers/net/ethernet/microsoft/mana/mana_en.c |  22 ++-
 .../ethernet/microsoft/mana/mana_ethtool.c    | 164 ++++++++++++++----
 include/net/mana/mana.h                       |   8 +
 3 files changed, 163 insertions(+), 31 deletions(-)
--
2.43.0

From: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Date: Sat, 4 Apr 2026 20:42:15 -0700

> On some ARM64 platforms with 4K PAGE_SIZE, utilizing page_pool
> fragments for allocation in the RX refill path (~2kB buffer per fragment)
> causes 15-20% throughput regression under high connection counts
> (>16 TCP streams at 180+ Gbps). Using full-page buffers on these
> platforms shows no regression and restores line-rate performance.
>
> This behavior is observed on a single platform; other platforms
> perform better with page_pool fragments, indicating this is not a
> page_pool issue but platform-specific.
>
> This series adds an ethtool private flag "full-page-rx" to let the
> user opt in to one RX buffer per page:
>
>   ethtool --set-priv-flags eth0 full-page-rx on

Sorry I may've missed the previous threads.

Has this approach been discussed here? Private flags are generally
discouraged.

Alternatively, you can provide Ethtool ops to change the Rx buffer size,
so that you'd be able to set it to PAGE_SIZE on affected platforms and
the result would be the same.

>
> There is no behavioral change by default. The flag can be persisted
> via udev rule for affected platforms.
>
> Changes in v5:
> - Split prep refactor into separate patch (patch 1/2)
> Changes in v4:
> - Dropping the smbios string parsing and add ethtool priv flag
> to reconfigure the queues with full page rx buffers.
> Changes in v3:
> - changed u8* to char*
> Changes in v2:
> - separate reading string index and the string, remove inline.
>
> Dipayaan Roy (2):
> net: mana: refactor mana_get_strings() and mana_get_sset_count() to
> use switch
> net: mana: force full-page RX buffers via ethtool private flag
>
> drivers/net/ethernet/microsoft/mana/mana_en.c | 22 ++-
> .../ethernet/microsoft/mana/mana_ethtool.c | 164 ++++++++++++++----
> include/net/mana/mana.h | 8 +
> 3 files changed, 163 insertions(+), 31 deletions(-)

Thanks,
Olek

On Tue, 7 Apr 2026 15:10:45 +0200 Alexander Lobakin wrote:
> > On some ARM64 platforms with 4K PAGE_SIZE, utilizing page_pool
> > fragments for allocation in the RX refill path (~2kB buffer per fragment)
> > causes 15-20% throughput regression under high connection counts
> > (>16 TCP streams at 180+ Gbps). Using full-page buffers on these
> > platforms shows no regression and restores line-rate performance.
> >
> > This behavior is observed on a single platform; other platforms
> > perform better with page_pool fragments, indicating this is not a
> > page_pool issue but platform-specific.
> >
> > This series adds an ethtool private flag "full-page-rx" to let the
> > user opt in to one RX buffer per page:
> >
> >   ethtool --set-priv-flags eth0 full-page-rx on
>
> Sorry I may've missed the previous threads.
>
> Has this approach been discussed here? Private flags are generally
> discouraged.
>
> Alternatively, you can provide Ethtool ops to change the Rx buffer size,
> so that you'd be able to set it to PAGE_SIZE on affected platforms and
> the result would be the same.

Actually, hm. Now that you spoke up I wonder how much this is
an inherent ARM problem vs problem in whatever ARM Microsoft's
management empire-built themselves into.

Do you have access to any ARM servers? Google says GCP offers ARM
instances with idpf NICs. So if idpf benefits from the same
"tuning" we should totally push for a proper API not priv flags.

On Tue, Apr 07, 2026 at 06:51:28PM -0700, Jakub Kicinski wrote:
> On Tue, 7 Apr 2026 15:10:45 +0200 Alexander Lobakin wrote:
> > > On some ARM64 platforms with 4K PAGE_SIZE, utilizing page_pool
> > > fragments for allocation in the RX refill path (~2kB buffer per fragment)
> > > causes 15-20% throughput regression under high connection counts
> > > (>16 TCP streams at 180+ Gbps). Using full-page buffers on these
> > > platforms shows no regression and restores line-rate performance.
> > >
> > > This behavior is observed on a single platform; other platforms
> > > perform better with page_pool fragments, indicating this is not a
> > > page_pool issue but platform-specific.
> > >
> > > This series adds an ethtool private flag "full-page-rx" to let the
> > > user opt in to one RX buffer per page:
> > >
> > >   ethtool --set-priv-flags eth0 full-page-rx on
> >
> > Sorry I may've missed the previous threads.
> >
> > Has this approach been discussed here? Private flags are generally
> > discouraged.
> >
> > Alternatively, you can provide Ethtool ops to change the Rx buffer size,
> > so that you'd be able to set it to PAGE_SIZE on affected platforms and
> > the result would be the same.
>
> Actually, hm. Now that you spoke up I wonder how much this is
> an inherent ARM problem vs problem in whatever ARM Microsoft's
> management empire-built themselves into.
>
> Do you have access to any ARM servers? Google says GCP offers ARM
> instances with idpf NICs. So if idpf benefits from the same
> "tuning" we should totally push for a proper API not priv flags.

Hi,

Sharing an observation from earlier: a different ARM64 fabric/platform,
configured with a 4K base page size and the same MANA NIC, did not show
this behaviour. In fact, it showed better performance with page
fragments, for single as well as multiple connections. That's why, in
the initial version of this patch, we wanted to apply the workaround
only to the specific chip where the issue is seen with page fragments.

Regards

On Tue, Apr 07, 2026 at 03:10:45PM +0200, Alexander Lobakin wrote:
> From: Dipayaan Roy <dipayanroy@linux.microsoft.com>
> Date: Sat, 4 Apr 2026 20:42:15 -0700
>
> > On some ARM64 platforms with 4K PAGE_SIZE, utilizing page_pool
> > fragments for allocation in the RX refill path (~2kB buffer per fragment)
> > causes 15-20% throughput regression under high connection counts
> > (>16 TCP streams at 180+ Gbps). Using full-page buffers on these
> > platforms shows no regression and restores line-rate performance.
> >
> > This behavior is observed on a single platform; other platforms
> > perform better with page_pool fragments, indicating this is not a
> > page_pool issue but platform-specific.
> >
> > This series adds an ethtool private flag "full-page-rx" to let the
> > user opt in to one RX buffer per page:
> >
> >   ethtool --set-priv-flags eth0 full-page-rx on
>
> Sorry I may've missed the previous threads.
>
> Has this approach been discussed here? Private flags are generally
> discouraged.
>
> Alternatively, you can provide Ethtool ops to change the Rx buffer size,
> so that you'd be able to set it to PAGE_SIZE on affected platforms and
> the result would be the same.
>

Hi Alex,

This was discussed here:
https://lore.kernel.org/all/adHTm2SvjDrezEdv@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/

> >
> > There is no behavioral change by default. The flag can be persisted
> > via udev rule for affected platforms.
> >
> > Changes in v5:
> > - Split prep refactor into separate patch (patch 1/2)
> > Changes in v4:
> > - Dropping the smbios string parsing and add ethtool priv flag
> > to reconfigure the queues with full page rx buffers.
> > Changes in v3:
> > - changed u8* to char*
> > Changes in v2:
> > - separate reading string index and the string, remove inline.
> >
> > Dipayaan Roy (2):
> > net: mana: refactor mana_get_strings() and mana_get_sset_count() to
> > use switch
> > net: mana: force full-page RX buffers via ethtool private flag
> >
> > drivers/net/ethernet/microsoft/mana/mana_en.c | 22 ++-
> > .../ethernet/microsoft/mana/mana_ethtool.c | 164 ++++++++++++++----
> > include/net/mana/mana.h | 8 +
> > 3 files changed, 163 insertions(+), 31 deletions(-)
>
> Thanks,
> Olek