[PATCH net-next v3 00/23][pull request] Queue configs and large buffer providers

Pavel Begunkov posted 23 patches 1 month, 2 weeks ago
[PATCH net-next v3 00/23][pull request] Queue configs and large buffer providers
Posted by Pavel Begunkov 1 month, 2 weeks ago
Pull request with netdev-only patches that add support for per-queue
configuration and large rx buffers for memory providers. The zcrx
patch using it is posted separately and can be found at [2].

Large buffers yielded significant benefits during testing, e.g.
a setup with 32KB buffers used 30% less CPU than one with 4KB buffers,
see [3] for more details.
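
As a rough illustration of how an rx buffer length relates to the page
allocation order mentioned in the changelog below, here is a minimal
sketch. It is not code from the series: the 4KB page size, the
MAX_PAGE_ORDER value of 10, and the helper name rx_buf_len_to_order()
are all assumptions made for the example.

```python
PAGE_SIZE = 4096       # assumption: 4KB base pages
MAX_PAGE_ORDER = 10    # assumption: common default, depends on arch/config


def rx_buf_len_to_order(rx_buf_len: int) -> int:
    """Return the smallest page order whose size covers rx_buf_len.

    Hypothetical helper for illustration only; the real drivers and
    page pool code may compute and validate this differently.
    """
    if rx_buf_len <= 0:
        raise ValueError("rx_buf_len must be positive")
    pages = -(-rx_buf_len // PAGE_SIZE)   # ceiling division
    order = (pages - 1).bit_length()      # ceil(log2(pages))
    if order > MAX_PAGE_ORDER:
        raise ValueError("order %d exceeds MAX_PAGE_ORDER" % order)
    return order


if __name__ == "__main__":
    for size in (4096, 32768):
        print(size, "bytes -> order", rx_buf_len_to_order(size))
```

Under these assumptions a 32KB buffer needs an order-3 allocation
(8 contiguous pages), while a 4KB buffer fits in an order-0 page.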

Per queue configuration series:
[1] https://lore.kernel.org/all/20250421222827.283737-1-kuba@kernel.org/
Branch with the zcrx patch
[2] https://github.com/isilence/linux.git zcrx/large-buffers-v3
v2 of the series
[3] https://lore.kernel.org/all/cover.1754657711.git.asml.silence@gmail.com/

---

v3: - rebased, excluded zcrx specific patches
    - set agg_size_fac to 1 on warning
v2: - Add MAX_PAGE_ORDER check on pp init (Patch 1)
    - Applied comments rewording (Patch 2)
    - Adjust pp.max_len based on order (Patch 8)
    - Patch up mlx5 queue callbacks after rebase (Patch 12)
    - Minor ->queue_mgmt_ops refactoring (Patch 15)
    - Rebased to account for both fill level and agg_size_fac (Patch 17)
    - Pass the provider's buf length in struct pp_memory_provider_params
      and apply it in __netdev_queue_confi(). (Patch 22)
    - Use ->supported_ring_params to validate the driver's support of
      set qcfg parameters. (Patch 23)

The following changes since commit c17b750b3ad9f45f2b6f7e6f7f4679844244f0b9:

  Linux 6.17-rc2 (2025-08-17 15:22:10 -0700)

are available in the Git repository at:

  https://github.com/isilence/linux.git tags/net-for-6.18-queue-rx-buf-len

for you to fetch changes up to 417cf28f3bf129d1a0d1b231220aa045abac3263:

  net: validate driver supports passed qcfg params (2025-08-18 07:39:50 +0100)

Jakub Kicinski (20):
      docs: ethtool: document that rx_buf_len must control payload lengths
      net: ethtool: report max value for rx-buf-len
      net: use zero value to restore rx_buf_len to default
      net: clarify the meaning of netdev_config members
      net: add rx_buf_len to netdev config
      eth: bnxt: read the page size from the adapter struct
      eth: bnxt: set page pool page order based on rx_page_size
      eth: bnxt: support setting size of agg buffers via ethtool
      net: move netdev_config manipulation to dedicated helpers
      net: reduce indent of struct netdev_queue_mgmt_ops members
      net: allocate per-queue config structs and pass them thru the queue API
      net: pass extack to netdev_rx_queue_restart()
      net: add queue config validation callback
      eth: bnxt: always set the queue mgmt ops
      eth: bnxt: store the rx buf size per queue
      eth: bnxt: adjust the fill level of agg queues with larger buffers
      netdev: add support for setting rx-buf-len per queue
      net: wipe the setting of deactived queues
      eth: bnxt: use queue op config validate
      eth: bnxt: support per queue configuration of rx-buf-len

Pavel Begunkov (3):
      net: page_pool: sanitise allocation order
      net: let pp memory provider to specify rx buf len
      net: validate driver supports passed qcfg params

 Documentation/netlink/specs/ethtool.yaml           |   4 +
 Documentation/netlink/specs/netdev.yaml            |  15 ++
 Documentation/networking/ethtool-netlink.rst       |   7 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c          | 143 ++++++++++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt.h          |   5 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |   9 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c      |   6 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h      |   2 +-
 drivers/net/ethernet/google/gve/gve_main.c         |   9 +-
 .../ethernet/marvell/octeontx2/nic/otx2_ethtool.c  |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   9 +-
 drivers/net/netdevsim/netdev.c                     |   8 +-
 include/linux/ethtool.h                            |   3 +
 include/net/netdev_queues.h                        |  84 ++++++++--
 include/net/netdev_rx_queue.h                      |   3 +-
 include/net/netlink.h                              |  19 +++
 include/net/page_pool/types.h                      |   1 +
 include/uapi/linux/ethtool_netlink_generated.h     |   1 +
 include/uapi/linux/netdev.h                        |   2 +
 net/core/Makefile                                  |   2 +-
 net/core/dev.c                                     |  12 +-
 net/core/dev.h                                     |  15 ++
 net/core/netdev-genl-gen.c                         |  15 ++
 net/core/netdev-genl-gen.h                         |   1 +
 net/core/netdev-genl.c                             |  92 +++++++++++
 net/core/netdev_config.c                           | 183 +++++++++++++++++++++
 net/core/netdev_rx_queue.c                         |  22 ++-
 net/core/page_pool.c                               |   3 +
 net/ethtool/common.c                               |   4 +-
 net/ethtool/netlink.c                              |  14 +-
 net/ethtool/rings.c                                |  14 +-
 tools/include/uapi/linux/netdev.h                  |   2 +
 32 files changed, 631 insertions(+), 84 deletions(-)
 create mode 100644 net/core/netdev_config.c


-- 
2.49.0
Re: [PATCH net-next v3 00/23][pull request] Queue configs and large buffer providers
Posted by Pavel Begunkov 1 month, 2 weeks ago
On 8/18/25 14:57, Pavel Begunkov wrote:
> Pull request with netdev only patches that add support for per queue
> configuration and large rx buffers for memory providers. The zcrx
> patch using it is posted separately and can be found at [2].

I'm sending it out as a v6.17-rc2 based pull request since I'll also
need it in another tree for zcrx. The patch count is over the limit,
however most of them are just taken from Jakub's series, and it'll
likely be easier this way for cross-tree work. Please let me know if
that's acceptable or whether I need to somehow split or trim it
down.

-- 
Pavel Begunkov
Re: [PATCH net-next v3 00/23][pull request] Queue configs and large buffer providers
Posted by Jakub Kicinski 1 month, 2 weeks ago
On Mon, 18 Aug 2025 14:57:16 +0100 Pavel Begunkov wrote:
> Jakub Kicinski (20):

I think we need to revisit how we operate.
When we started the ZC work w/ io-uring I suggested a permanent shared
branch. That's perhaps an overkill. What I did not expect is that you
will not even CC netdev@ on changes to io_uring/zcrx.*

I don't mean to assert any sort of ownership of that code, but you're
not meeting basic collaboration standards for the kernel. This needs 
to change first.
-- 
pw-bot: defer
Re: [PATCH net-next v3 00/23][pull request] Queue configs and large buffer providers
Posted by Pavel Begunkov 1 month, 2 weeks ago
On 8/20/25 03:31, Jakub Kicinski wrote:
> On Mon, 18 Aug 2025 14:57:16 +0100 Pavel Begunkov wrote:
>> Jakub Kicinski (20):
> 
> I think we need to revisit how we operate.
> When we started the ZC work w/ io-uring I suggested a permanent shared
> branch. That's perhaps an overkill. What I did not expect is that you
> will not even CC netdev@ on changes to io_uring/zcrx.*
> 
> I don't mean to assert any sort of ownership of that code, but you're
> not meeting basic collaboration standards for the kernel. This needs
> to change first.

You're throwing around quite some allegations. Basic collaboration
standards don't include spamming people with unrelated changes via an
already busy list.
I cc'ed netdev on patches that meaningfully change how it interacts
(incl indirectly) with netdev and/or might be of interest, which is
beyond the usual standard expected of a project using infrastructure
provided by a subsystem. There are pieces that don't touch netdev, like
how io_uring pins pages, accounts memory, sets up rings, etc. In the
very same way generic io_uring patches are not normally posted to
netdev, and netdev patches are not redirected to mm because there
are kmalloc calls, even though, it's not even the standard used here.

If you have some way you want to work, I'd appreciate a clear
indication of that, because that message you mentioned was answered
and I've never heard any objection, or anything else really.

-- 
Pavel Begunkov
Re: [PATCH net-next v3 00/23][pull request] Queue configs and large buffer providers
Posted by Jakub Kicinski 1 month, 2 weeks ago
On Wed, 20 Aug 2025 14:39:51 +0100 Pavel Begunkov wrote:
> On 8/20/25 03:31, Jakub Kicinski wrote:
> > On Mon, 18 Aug 2025 14:57:16 +0100 Pavel Begunkov wrote:  
> >> Jakub Kicinski (20):  
> > 
> > I think we need to revisit how we operate.
> > When we started the ZC work w/ io-uring I suggested a permanent shared
> > branch. That's perhaps an overkill. What I did not expect is that you
> > will not even CC netdev@ on changes to io_uring/zcrx.*
> > 
> > I don't mean to assert any sort of ownership of that code, but you're
> > not meeting basic collaboration standards for the kernel. This needs
> > to change first.  
> 
> You're throwing around quite some allegations. Basic collaboration
> standards don't include spamming people with unrelated changes via an
> already busy list.
> I cc'ed netdev on patches that meaningfully change how it interacts
> (incl indirectly) with netdev and/or might be of interest, which is
> beyond the usual standard expected of a project using infrastructure
> provided by a subsystem.

To me iouring is a fancy syscall layer. It's good at its job, sure,
but saying that netdev provides infrastructure to a syscall layer is
laughable.

> There are pieces that don't touch netdev, like
> how io_uring pins pages, accounts memory, sets up rings, etc. In the
> very same way generic io_uring patches are not normally posted to
> netdev, and netdev patches are not redirected to mm because there
> are kmalloc calls, even though, it's not even the standard used here.

I'm asking you to CC netdev, and people who work on ZC like Mina.
Normal reaction to someone asking to be CCed on patches is "Sure."
I don't understand what you're afraid of.

> If you have some way you want to work, I'd appreciate a clear
> indication of that, because that message you mentioned was answered
> and I've never heard any objection, or anything else really.

It honestly didn't cross my mind that you'd only CC netdev on patches
which touch code under net/. I'd have let you know sooner but it's hard
to reply to messages one doesn't see. I found out that there's a whole
bunch of ZC work that landed in iouring from talking to David Wei.
Re: [PATCH net-next v3 00/23][pull request] Queue configs and large buffer providers
Posted by Pavel Begunkov 1 month, 1 week ago
On 8/21/25 02:37, Jakub Kicinski wrote:
> On Wed, 20 Aug 2025 14:39:51 +0100 Pavel Begunkov wrote:
>> On 8/20/25 03:31, Jakub Kicinski wrote:
>>> On Mon, 18 Aug 2025 14:57:16 +0100 Pavel Begunkov wrote:
>>>> Jakub Kicinski (20):
>>>
>>> I think we need to revisit how we operate.
>>> When we started the ZC work w/ io-uring I suggested a permanent shared
>>> branch. That's perhaps an overkill. What I did not expect is that you
>>> will not even CC netdev@ on changes to io_uring/zcrx.*
>>>
>>> I don't mean to assert any sort of ownership of that code, but you're
>>> not meeting basic collaboration standards for the kernel. This needs
>>> to change first.
>>
>> You're throwing around quite some allegations. Basic collaboration
>> standards don't include spamming people with unrelated changes via an
>> already busy list.
>> I cc'ed netdev on patches that meaningfully change how it interacts
>> (incl indirectly) with netdev and/or might be of interest, which is
>> beyond the usual standard expected of a project using infrastructure
>> provided by a subsystem.
> 
> To me iouring is a fancy syscall layer. It's good at its job, sure,
> but saying that netdev provides infrastructure to a syscall layer is
> laughable.

?

>> There are pieces that don't touch netdev, like
>> how io_uring pins pages, accounts memory, sets up rings, etc. In the
>> very same way generic io_uring patches are not normally posted to
>> netdev, and netdev patches are not redirected to mm because there
>> are kmalloc calls, even though, it's not even the standard used here.
> 
> I'm asking you to CC netdev, and people who work on ZC like Mina.
> Normal reaction to someone asking to be CCed on patches is "Sure."
> I don't understand what you're afraid of.

Normal reaction is to ask to CC and not attempt to slander as you
just did. That's not appreciated. All that cherry topped with a
signal that you're not going to take my work until I learn how to
read your mind.

https://lore.kernel.org/all/bcf5a9e8-5014-44cc-85a0-2974e3039cb6@gmail.com/

When you brought this topic up before, I fully outlined what I believe
would be a good workflow, and since there was no answer, I've been
sticking to it. And let me note, you didn't directly and clearly
ask to CC netdev. And I'm pretty sure, ignoring messages and
smearing is not in the spirit of the "basic collaboration standards",
whatever those are.

>> If you have some way you want to work, I'd appreciate a clear
>> indication of that, because that message you mentioned was answered
>> and I've never heard any objection, or anything else really.
> 
> It honestly didn't cross my mind that you'd only CC netdev on patches
> which touch code under net/. I'd have let you know sooner but it's hard

If you refer to the directory, that's clearly not true.

> to reply to messages one doesn't see. I found out that there's a whole
> bunch of ZC work that landed in iouring from talking to David Wei.

The linked thread above indicates the opposite.

-- 
Pavel Begunkov
Re: [PATCH net-next v3 00/23][pull request] Queue configs and large buffer providers
Posted by Mina Almasry 1 month, 2 weeks ago
On Wed, Aug 20, 2025 at 6:38 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 8/20/25 03:31, Jakub Kicinski wrote:
> > On Mon, 18 Aug 2025 14:57:16 +0100 Pavel Begunkov wrote:
> >> Jakub Kicinski (20):
> >
> > I think we need to revisit how we operate.
> > When we started the ZC work w/ io-uring I suggested a permanent shared
> > branch. That's perhaps an overkill. What I did not expect is that you
> > will not even CC netdev@ on changes to io_uring/zcrx.*
> >
> > I don't mean to assert any sort of ownership of that code, but you're
> > not meeting basic collaboration standards for the kernel. This needs
> > to change first.
>
> You're throwing around quite some allegations. Basic collaboration
> standards don't include spamming people with unrelated changes via an
> already busy list.
> I cc'ed netdev on patches that meaningfully change how it interacts
> (incl indirectly) with netdev and/or might be of interest, which is
> beyond the usual standard expected of a project using infrastructure
> provided by a subsystem. There are pieces that don't touch netdev, like
> how io_uring pins pages, accounts memory, sets up rings, etc. In the
> very same way generic io_uring patches are not normally posted to
> netdev, and netdev patches are not redirected to mm because there
> are kmalloc calls, even though, it's not even the standard used here.
>
> If you have some way you want to work, I'd appreciate a clear
> indication of that, because that message you mentioned was answered
> and I've never heard any objection, or anything else really.
>

We could use tags in the MAINTAINERS file similar to these:

F: include/linux/*fence.h
F: include/linux/dma-buf.h
F: include/linux/dma-resv.h
K: \bdma_(?:buf|fence|resv)\b

We could make sure anything touching io_uring/zcrx. and anything using
netmem_ref/net_iov goes to netdev. I think roughly adding something
like this to the general networking entry?

F: io_uring/zcrx.*
K: \bnet(mem_ref|_iov)\b

I had suggested this before but never had time to suggest the actual
changes, and in the back of my mind was a bit wary of spamming the
maintainers, but it seems this is not as much a concern as the patches
not getting to netdev.
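
To make the effect of the two proposed lines concrete, here is a small
sketch of what they would and would not match. It only approximates
get_maintainer.pl: the F: entry is a glob (emulated here with Python's
fnmatch) and the K: entry is a Perl regex (this particular pattern
behaves the same under Python's re). The helper names and the sample
inputs are made up for the example.

```python
import fnmatch
import re

F_PATTERN = "io_uring/zcrx.*"                       # proposed F: entry
K_PATTERN = re.compile(r"\bnet(mem_ref|_iov)\b")    # proposed K: entry


def file_matches(path: str) -> bool:
    """Approximate an F: glob match against a file path."""
    return fnmatch.fnmatch(path, F_PATTERN)


def keyword_matches(text: str) -> bool:
    """Approximate a K: content match against a line of a patch."""
    return K_PATTERN.search(text) is not None


if __name__ == "__main__":
    for path in ("io_uring/zcrx.c", "io_uring/zcrx.h", "io_uring/net.c"):
        print(path, file_matches(path))
    for line in ("netmem_ref netmem;", "struct net_iov *niov;",
                 "struct netdev_queue *txq;", "net_iov_owner(niov);"):
        print(repr(line), keyword_matches(line))
```

Note that the \b anchors mean only the bare identifiers netmem_ref and
net_iov hit the keyword match; longer names such as net_iov_owner do
not, because the underscore counts as a word character.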

-- 
Thanks,
Mina
Re: [PATCH net-next v3 00/23][pull request] Queue configs and large buffer providers
Posted by Jakub Kicinski 1 month, 2 weeks ago
On Wed, 20 Aug 2025 06:59:51 -0700 Mina Almasry wrote:
> We could make sure anything touching io_uring/zcrx. and anything using
> netmem_ref/net_iov goes to netdev. I think roughly adding something
> like this to the general networking entry?
> 
> F: io_uring/zcrx.*
> K: \bnet(mem_ref|_iov)\b

Right, I think clearest would be to add a new entry for this, and copy
the real metadata (Jens as the maintainer, his tree etc.). If we just
add the match to netdev it will look like the patches will flow via
net-next. No strong preference, tho. As long as get_maintainer suggests
CCing netdev I'll be happy.

> I had suggested this before but never had time to suggest the actual
> changes, and in the back of my mind was a bit wary of spamming the
> maintainers, but it seems this is not as much a concern as the patches
> not getting to netdev.