[PATCH v2 net-next 0/2] virtio_net: add page_pool support
Posted by Vishwanath Seshagiri 1 week, 3 days ago
Introduce page_pool support in the virtio_net driver to enable page
recycling in RX buffer allocation and avoid repeated page allocator
calls. This applies to mergeable and small buffer modes.

Beyond performance improvements, this patch is a prerequisite for enabling
memory provider-based zero-copy features in virtio_net, specifically devmem
TCP and io_uring ZCRX, which require drivers to use page_pool for buffer
management.

The implementation preserves the DMA premapping optimization introduced in
commit 31f3cd4e5756 ("virtio-net: rq submits premapped per-buffer") by
conditionally using PP_FLAG_DMA_MAP when the virtio backend supports
standard DMA API (vhost, virtio-pci), and falling back to allocation-only
mode for backends with custom DMA mechanisms (VDUSE).
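
For illustration, the conditional setup looks roughly like the sketch
below. This is not the exact hunk from the patch: rq->page_pool is the
field added by this series, ring_size stands in for the virtqueue size,
and error unwinding is elided.

	struct page_pool_params pp_params = { 0 };
	struct device *dma_dev = virtqueue_dma_dev(rq->vq);

	pp_params.order = 0;
	pp_params.pool_size = ring_size;
	pp_params.nid = NUMA_NO_NODE;
	pp_params.dev = dma_dev;
	pp_params.dma_dir = DMA_FROM_DEVICE;

	/*
	 * Backends that use the standard DMA API (vhost, virtio-pci)
	 * expose a DMA device here, so page_pool can premap pages for
	 * us. VDUSE returns NULL, so run in allocation-only mode.
	 */
	if (dma_dev)
		pp_params.flags |= PP_FLAG_DMA_MAP;

	rq->page_pool = page_pool_create(&pp_params);
	if (IS_ERR(rq->page_pool)) {
		err = PTR_ERR(rq->page_pool);
		rq->page_pool = NULL;
		/* fail probe */
	}

With PP_FLAG_DMA_MAP set, page_pool owns the mapping lifetime, so the
driver no longer has to unmap pages itself when they are released.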

Changes in v2
=============

Addressing reviewer feedback from v1:

- Add "select PAGE_POOL" to Kconfig (Jason Wang)
- Move page pool creation from ndo_open to probe for device lifetime
  management (Xuan Zhuo, Jason Wang)
- Implement conditional DMA strategy using virtqueue_dma_dev():
  - When non-NULL: use PP_FLAG_DMA_MAP for page_pool-managed DMA premapping
  - When NULL (VDUSE): page_pool handles allocation only
- Use page_pool_get_dma_addr() + virtqueue_add_inbuf_premapped() to
  preserve the DMA premapping optimization from commit 31f3cd4e5756
  ("virtio-net: rq submits premapped per-buffer") (Jason Wang); see the
  sketch after this list
- Remove dual allocation code paths; page_pool is now always used for
  small/mergeable modes (Jason Wang)
- Remove unused virtnet_rq_alloc/virtnet_rq_init_one_sg functions
- Add comprehensive performance data (Michael S. Tsirkin)
- v1 link: https://lore.kernel.org/virtualization/20260106221924.123856-1-vishs@meta.com/
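
As referenced in the premapped bullet above, the refill side then looks
roughly like the sketch below. This is a simplified illustration:
virtnet_add_recvbuf_pp() is a hypothetical name, the mergeable
ctx/headroom handling is elided, and virtqueue_add_inbuf_premapped() is
assumed to take the same argument list as virtqueue_add_inbuf_ctx().

static int virtnet_add_recvbuf_pp(struct receive_queue *rq, unsigned int len,
				  void *ctx, gfp_t gfp)
{
	unsigned int offset = 0;
	struct page *page;
	dma_addr_t addr;
	int err;

	/*
	 * page_pool hands back a (possibly recycled) frag; with
	 * PP_FLAG_DMA_MAP the page is already mapped for the device.
	 */
	page = page_pool_alloc_frag(rq->page_pool, &offset, len, gfp);
	if (!page)
		return -ENOMEM;

	addr = page_pool_get_dma_addr(page) + offset;

	/* Hand the premapped DMA address straight to the virtqueue. */
	sg_init_table(rq->sg, 1);
	rq->sg[0].dma_address = addr;
	rq->sg[0].length = len;

	err = virtqueue_add_inbuf_premapped(rq->vq, rq->sg, 1,
					    page_address(page) + offset,
					    ctx, gfp);
	if (err < 0)
		page_pool_put_full_page(rq->page_pool, page, false);

	return err;
}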

Performance Results
===================

Tested using iperf3 TCP_STREAM with virtio-net on a vhost backend.
Each run lasted 300 seconds; results show throughput and TCP
retransmissions. The base kernel is synced to the net tree at commit
709bbb015538.

Mergeable Buffer Mode (mrg_rxbuf=on, GSO enabled, MTU 1500):
+--------+---------+---------+------------+------------+--------+--------+
| Queues | Streams |  Patch  | Throughput |   Retries  | Delta  | Retry% |
+--------+---------+---------+------------+------------+--------+--------+
|   1    |    1    |  base   |  25.7 Gbps |      0     |   -    |   -    |
|   1    |    1    |   pp    |  26.2 Gbps |      0     | +1.9%  |   0%   |
+--------+---------+---------+------------+------------+--------+--------+
|   8    |    8    |  base   |  95.6 Gbps |  236,432   |   -    |   -    |
|   8    |    8    |   pp    |  97.9 Gbps |  188,249   | +2.4%  | -20.4% |
+--------+---------+---------+------------+------------+--------+--------+

Small Buffer Mode (mrg_rxbuf=off, GSO disabled, MTU 1500):
+--------+---------+---------+------------+------------+--------+--------+
| Queues | Streams |  Patch  | Throughput |   Retries  | Delta  | Retry% |
+--------+---------+---------+------------+------------+--------+--------+
|   1    |    1    |  base   |  9.17 Gbps |    15,152  |   -    |   -    |
|   1    |    1    |   pp    |  9.19 Gbps |    12,203  | +0.2%  | -19.5% |
+--------+---------+---------+------------+------------+--------+--------+
|   8    |    8    |  base   | 43.0 Gbps  |   974,500  |   -    |   -    |
|   8    |    8    |   pp    | 44.7 Gbps  |   717,411  | +4.0%  | -26.4% |
+--------+---------+---------+------------+------------+--------+--------+

Testing
=======

The patches have been tested with:
- iperf3 bulk transfer workloads (multiple queue/stream configurations)
- Included selftests for buffer circulation verification
- Edge case testing: device unbind/bind cycles, rapid interface open/close,
  traffic during close, ethtool feature toggling, close with pending refill
  work, and data integrity verification

Vishwanath Seshagiri (2):
  virtio_net: add page_pool support for buffer allocation
  selftests: virtio_net: add buffer circulation test

 drivers/net/Kconfig                           |   1 +
 drivers/net/virtio_net.c                      | 353 ++++++++++--------
 .../drivers/net/virtio_net/basic_features.sh  |  70 ++++
 3 files changed, 273 insertions(+), 151 deletions(-)

--
2.47.3
Re: [PATCH v2 net-next 0/2] virtio_net: add page_pool support
Posted by Jason Wang 1 week, 3 days ago
On Thu, Jan 29, 2026 at 5:20 AM Vishwanath Seshagiri <vishs@meta.com> wrote:
>
> Introduce page_pool support in virtio_net driver to enable page recycling
> in RX buffer allocation and avoid repeated page allocator calls. This
> applies to mergeable and small buffer modes.
>
> Beyond performance improvements, this patch is a prerequisite for enabling
> memory provider-based zero-copy features in virtio_net, specifically devmem
> TCP and io_uring ZCRX, which require drivers to use page_pool for buffer
> management.
>
> The implementation preserves the DMA premapping optimization introduced in
> commit 31f3cd4e5756 ("virtio-net: rq submits premapped per-buffer") by
> conditionally using PP_FLAG_DMA_MAP when the virtio backend supports
> standard DMA API (vhost, virtio-pci), and falling back to allocation-only
> mode for backends with custom DMA mechanisms (VDUSE).
>
> Changes in v2
> =============
>
> Addressing reviewer feedback from v1:
>
> - Add "select PAGE_POOL" to Kconfig (Jason Wang)
> - Move page pool creation from ndo_open to probe for device lifetime
>   management (Xuan Zhuo, Jason Wang)
> - Implement conditional DMA strategy using virtqueue_dma_dev():
>   - When non-NULL: use PP_FLAG_DMA_MAP for page_pool-managed DMA premapping
>   - When NULL (VDUSE): page_pool handles allocation only
> - Use page_pool_get_dma_addr() + virtqueue_add_inbuf_premapped() to
>   preserve DMA premapping optimization from commit 31f3cd4e5756
>   ("virtio-net: rq submits premapped per-buffer") (Jason Wang)
> - Remove dual allocation code paths - page_pool now always used for
>   small/mergeable modes (Jason Wang)
> - Remove unused virtnet_rq_alloc/virtnet_rq_init_one_sg functions
> - Add comprehensive performance data (Michael S. Tsirkin)
> - v1 link: https://lore.kernel.org/virtualization/20260106221924.123856-1-vishs@meta.com/
>
> Performance Results
> ===================
>
> Tested using iperf3 TCP_STREAM with virtio-net on vhost backend.
> 300-second runs, results show throughput and TCP retransmissions.
> The base kernel is synced to net tree and commit: 709bbb015538.
>
> Mergeable Buffer Mode (mrg_rxbuf=on, GSO enabled, MTU 1500):
> +--------+---------+---------+------------+------------+--------+--------+
> | Queues | Streams |  Patch  | Throughput |   Retries  | Delta  | Retry% |
> +--------+---------+---------+------------+------------+--------+--------+
> |   1    |    1    |  base   |  25.7 Gbps |      0     |   -    |   -    |
> |   1    |    1    |   pp    |  26.2 Gbps |      0     | +1.9%  |   0%   |
> +--------+---------+---------+------------+------------+--------+--------+
> |   8    |    8    |  base   |  95.6 Gbps |  236,432   |   -    |   -    |
> |   8    |    8    |   pp    |  97.9 Gbps |  188,249   | +2.4%  | -20.4% |
> +--------+---------+---------+------------+------------+--------+--------+
>
> Small Buffer Mode (mrg_rxbuf=off, GSO disabled, MTU 1500):
> +--------+---------+---------+------------+------------+--------+--------+
> | Queues | Streams |  Patch  | Throughput |   Retries  | Delta  | Retry% |
> +--------+---------+---------+------------+------------+--------+--------+
> |   1    |    1    |  base   |  9.17 Gbps |    15,152  |   -    |   -    |
> |   1    |    1    |   pp    |  9.19 Gbps |    12,203  | +0.2%  | -19.5% |
> +--------+---------+---------+------------+------------+--------+--------+
> |   8    |    8    |  base   | 43.0 Gbps  |   974,500  |   -    |   -    |
> |   8    |    8    |   pp    | 44.7 Gbps  |   717,411  | +4.0%  | -26.4% |
> +--------+---------+---------+------------+------------+--------+--------+

It would be better to have more benchmarks, for example:

PPS (using pktgen on the host and XDP_DROP in the guest)

That way we can see PPS as well as XDP performance.

Thanks

>
> Testing
> =======
>
> The patches have been tested with:
> - iperf3 bulk transfer workloads (multiple queue/stream configurations)
> - Included selftests for buffer circulation verification
> - Edge case testing: device unbind/bind cycles, rapid interface open/close,
>   traffic during close, ethtool feature toggling, close with pending refill
>   work, and data integrity verification
>
> Vishwanath Seshagiri (2):
>   virtio_net: add page_pool support for buffer allocation
>   selftests: virtio_net: add buffer circulation test
>
>  drivers/net/Kconfig                           |   1 +
>  drivers/net/virtio_net.c                      | 353 ++++++++++--------
>  .../drivers/net/virtio_net/basic_features.sh  |  70 ++++
>  3 files changed, 273 insertions(+), 151 deletions(-)
>
> --
> 2.47.3
>
Re: [PATCH v2 net-next 0/2] virtio_net: add page_pool support
Posted by Jakub Kicinski 1 week, 3 days ago
On Wed, 28 Jan 2026 13:20:29 -0800 Vishwanath Seshagiri wrote:
> Introduce page_pool support in virtio_net driver to enable page recycling
> in RX buffer allocation and avoid repeated page allocator calls. This
> applies to mergeable and small buffer modes.
> 
> Beyond performance improvements, this patch is a prerequisite for enabling
> memory provider-based zero-copy features in virtio_net, specifically devmem
> TCP and io_uring ZCRX, which require drivers to use page_pool for buffer
> management.

Struggles to boot in the CI:

[   11.424197][    C0] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN
[   11.424454][    C0] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[   11.424606][    C0] CPU: 0 UID: 0 PID: 271 Comm: ip Not tainted 6.19.0-rc6-virtme #1 PREEMPT(full) 
[   11.424784][    C0] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   11.424913][    C0] RIP: 0010:page_pool_alloc_frag_netmem+0x34/0x8e0
[   11.425054][    C0] Code: b8 00 00 00 00 00 fc ff df 41 57 41 89 c8 41 56 41 55 41 89 d5 48 89 fa 41 54 48 c1 ea 03 49 89 f4 55 53 48 89 fb 48 83 ec 30 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 32 05 00 00 8b 0b 83 f9 3f 0f
[   11.425413][    C0] RSP: 0018:ffa0000000007a00 EFLAGS: 00010286
[   11.425544][    C0] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000820
[   11.425697][    C0] RDX: 0000000000000000 RSI: ffa0000000007ac0 RDI: 0000000000000000
[   11.425846][    C0] RBP: 1ff4000000000f54 R08: 0000000000000820 R09: fff3fc0000000f8f
[   11.426000][    C0] R10: fff3fc0000000f90 R11: 0000000000000001 R12: ffa0000000007ac0
[   11.426156][    C0] R13: 0000000000000600 R14: ff11000008719d00 R15: ff1100000926de00
[   11.426308][    C0] FS:  00007f4db5334400(0000) GS:ff110000786fe000(0000) knlGS:0000000000000000
[   11.426487][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.426617][    C0] CR2: 00000000004e5e60 CR3: 000000000b9da001 CR4: 0000000000771ef0
[   11.426768][    C0] PKRU: 55555554
[   11.426846][    C0] Call Trace:
[   11.426927][    C0]  <IRQ>
[   11.426981][    C0]  ? alloc_chain_hlocks+0x1e5/0x5c0
[   11.427085][    C0]  ? buf_to_xdp.isra.0+0x2f0/0x2f0
[   11.427190][    C0]  page_pool_alloc_frag+0xe/0x20
[   11.427289][    C0]  add_recvbuf_mergeable+0x1e0/0x940
[   11.427392][    C0]  ? page_to_skb+0x760/0x760
[   11.427497][    C0]  ? lock_acquire.part.0+0xbc/0x260
[   11.427597][    C0]  ? find_held_lock+0x2b/0x80
[   11.427696][    C0]  try_fill_recv+0x180/0x240
[   11.427794][    C0]  virtnet_poll+0xc79/0x1450
[   11.427901][    C0]  ? receive_buf+0x690/0x690
[   11.428005][    C0]  ? virtnet_xdp_handler+0x900/0x900
[   11.428109][    C0]  ? do_raw_spin_unlock+0x59/0x250
[   11.428208][    C0]  ? rcu_is_watching+0x15/0xd0
[   11.428310][    C0]  __napi_poll.constprop.0+0x97/0x390
[   11.428415][    C0]  net_rx_action+0x4f6/0xed0
[   11.428517][    C0]  ? run_backlog_napi+0x90/0x90
[   11.428617][    C0]  ? sched_balance_domains+0x270/0xd40
[   11.428721][    C0]  ? lockdep_hardirqs_on_prepare.part.0+0x9a/0x160
[   11.428844][    C0]  ? lockdep_hardirqs_on+0x84/0x130
[   11.428949][    C0]  ? sched_balance_update_blocked_averages+0x137/0x1a0
[   11.429073][    C0]  ? mark_held_locks+0x40/0x70
[   11.429172][    C0]  handle_softirqs+0x1d7/0x840
[   11.429271][    C0]  ? _local_bh_enable+0xd0/0xd0
[   11.429371][    C0]  ? __flush_smp_call_function_queue+0x449/0x6d0
[   11.429497][    C0]  ? rcu_is_watching+0x15/0xd0
[   11.429597][    C0]  do_softirq+0xa9/0xe0

https://netdev-ctrl.bots.linux.dev/logs/vmksft/virtio/results/494081/1-basic-features-sh/stderr
-- 
pw-bot: cr
Re: [PATCH v2 net-next 0/2] virtio_net: add page_pool support
Posted by Vishwanath Seshagiri 1 week, 2 days ago
On 1/28/26 5:37 PM, Jakub Kicinski wrote:
> On Wed, 28 Jan 2026 13:20:29 -0800 Vishwanath Seshagiri wrote:
>> Introduce page_pool support in virtio_net driver to enable page recycling
>> in RX buffer allocation and avoid repeated page allocator calls. This
>> applies to mergeable and small buffer modes.
>>
>> Beyond performance improvements, this patch is a prerequisite for enabling
>> memory provider-based zero-copy features in virtio_net, specifically devmem
>> TCP and io_uring ZCRX, which require drivers to use page_pool for buffer
>> management.
> 
> Struggles to boot in the CI:
> 
> [   11.424197][    C0] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN
> [   11.424454][    C0] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
> [   11.424606][    C0] CPU: 0 UID: 0 PID: 271 Comm: ip Not tainted 6.19.0-rc6-virtme #1 PREEMPT(full)
> [   11.424784][    C0] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [   11.424913][    C0] RIP: 0010:page_pool_alloc_frag_netmem+0x34/0x8e0
> [   11.425054][    C0] Code: b8 00 00 00 00 00 fc ff df 41 57 41 89 c8 41 56 41 55 41 89 d5 48 89 fa 41 54 48 c1 ea 03 49 89 f4 55 53 48 89 fb 48 83 ec 30 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 32 05 00 00 8b 0b 83 f9 3f 0f
> [   11.425413][    C0] RSP: 0018:ffa0000000007a00 EFLAGS: 00010286
> [   11.425544][    C0] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000820
> [   11.425697][    C0] RDX: 0000000000000000 RSI: ffa0000000007ac0 RDI: 0000000000000000
> [   11.425846][    C0] RBP: 1ff4000000000f54 R08: 0000000000000820 R09: fff3fc0000000f8f
> [   11.426000][    C0] R10: fff3fc0000000f90 R11: 0000000000000001 R12: ffa0000000007ac0
> [   11.426156][    C0] R13: 0000000000000600 R14: ff11000008719d00 R15: ff1100000926de00
> [   11.426308][    C0] FS:  00007f4db5334400(0000) GS:ff110000786fe000(0000) knlGS:0000000000000000
> [   11.426487][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   11.426617][    C0] CR2: 00000000004e5e60 CR3: 000000000b9da001 CR4: 0000000000771ef0
> [   11.426768][    C0] PKRU: 55555554
> [   11.426846][    C0] Call Trace:
> [   11.426927][    C0]  <IRQ>
> [   11.426981][    C0]  ? alloc_chain_hlocks+0x1e5/0x5c0
> [   11.427085][    C0]  ? buf_to_xdp.isra.0+0x2f0/0x2f0
> [   11.427190][    C0]  page_pool_alloc_frag+0xe/0x20
> [   11.427289][    C0]  add_recvbuf_mergeable+0x1e0/0x940
> [   11.427392][    C0]  ? page_to_skb+0x760/0x760
> [   11.427497][    C0]  ? lock_acquire.part.0+0xbc/0x260
> [   11.427597][    C0]  ? find_held_lock+0x2b/0x80
> [   11.427696][    C0]  try_fill_recv+0x180/0x240
> [   11.427794][    C0]  virtnet_poll+0xc79/0x1450
> [   11.427901][    C0]  ? receive_buf+0x690/0x690
> [   11.428005][    C0]  ? virtnet_xdp_handler+0x900/0x900
> [   11.428109][    C0]  ? do_raw_spin_unlock+0x59/0x250
> [   11.428208][    C0]  ? rcu_is_watching+0x15/0xd0
> [   11.428310][    C0]  __napi_poll.constprop.0+0x97/0x390
> [   11.428415][    C0]  net_rx_action+0x4f6/0xed0
> [   11.428517][    C0]  ? run_backlog_napi+0x90/0x90
> [   11.428617][    C0]  ? sched_balance_domains+0x270/0xd40
> [   11.428721][    C0]  ? lockdep_hardirqs_on_prepare.part.0+0x9a/0x160
> [   11.428844][    C0]  ? lockdep_hardirqs_on+0x84/0x130
> [   11.428949][    C0]  ? sched_balance_update_blocked_averages+0x137/0x1a0
> [   11.429073][    C0]  ? mark_held_locks+0x40/0x70
> [   11.429172][    C0]  handle_softirqs+0x1d7/0x840
> [   11.429271][    C0]  ? _local_bh_enable+0xd0/0xd0
> [   11.429371][    C0]  ? __flush_smp_call_function_queue+0x449/0x6d0
> [   11.429497][    C0]  ? rcu_is_watching+0x15/0xd0
> [   11.429597][    C0]  do_softirq+0xa9/0xe0
> 
> https://netdev-ctrl.bots.linux.dev/logs/vmksft/virtio/results/494081/1-basic-features-sh/stderr

The CI failure is caused by a bug in my patch: page pools are not
created for all queues when num_online_cpus < max_queue_pairs. This is
the issue Jason caught about using max_queue_pairs instead of
curr_queue_pairs. I will fix it in v3.
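
For reference, the fix will go in roughly this direction at probe time
(sketch only; virtnet_create_page_pool() is a hypothetical helper that
wraps the page_pool_params setup from the cover letter):

	/*
	 * Create one pool per RX queue up to the device maximum, not
	 * just the currently enabled pairs, so setups where
	 * num_online_cpus < max_queue_pairs are covered as well.
	 */
	for (i = 0; i < vi->max_queue_pairs; i++) {
		vi->rq[i].page_pool = virtnet_create_page_pool(vi, &vi->rq[i]);
		if (IS_ERR(vi->rq[i].page_pool)) {
			err = PTR_ERR(vi->rq[i].page_pool);
			vi->rq[i].page_pool = NULL;
			/* unwind pools created so far and fail probe */
			break;
		}
	}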