Document expectations from drivers looking to add support for device
memory TCP or other memory provider based features.
Signed-off-by: Mina Almasry <almasrymina@google.com>
---
Documentation/networking/index.rst | 1 +
Documentation/networking/memory-provider.rst | 52 ++++++++++++++++++++
2 files changed, 53 insertions(+)
create mode 100644 Documentation/networking/memory-provider.rst
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 46c178e564b3..c184e86a6ce1 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -73,6 +73,7 @@ Contents:
l2tp
lapb-module
mac80211-injection
mctp
+ memory-provider
mpls-sysctl
mptcp
diff --git a/Documentation/networking/memory-provider.rst b/Documentation/networking/memory-provider.rst
new file mode 100644
index 000000000000..4eee3b01eb18
--- /dev/null
+++ b/Documentation/networking/memory-provider.rst
@@ -0,0 +1,52 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+Memory providers
+================
+
+
+Introduction
+============
+
+Device memory TCP, and likely other upcoming features, rely on memory provider
+support in the driver.
+
+
+Driver support
+==============
+
+1. The driver must support page_pool. The driver must not do its own recycling
+   on top of page_pool by holding on to struct page references; with
+   unreadable netmem there is no struct page to reference. Recycling by
+   holding on to page_pool fragment references does work, but is discouraged,
+   since it keeps buffers in circulation for much longer than copying the data
+   out during recvmsg().
+
+2. The driver must support the tcp-data-split ethtool option.
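+
+   For illustration only, a sketch of how a driver might advertise and report
+   the tcp-data-split state via ethtool. The my_netdev_priv structure and its
+   hds_enabled field are hypothetical; only the ethtool plumbing shown is
+   assumed to exist::
+
+     static void my_get_ringparam(struct net_device *dev,
+                                  struct ethtool_ringparam *ring,
+                                  struct kernel_ethtool_ringparam *kernel_ring,
+                                  struct netlink_ext_ack *extack)
+     {
+             struct my_netdev_priv *priv = netdev_priv(dev);
+
+             /* Report the current header/data split state to ethtool. */
+             kernel_ring->tcp_data_split = priv->hds_enabled ?
+                     ETHTOOL_TCP_DATA_SPLIT_ENABLED :
+                     ETHTOOL_TCP_DATA_SPLIT_DISABLED;
+     }
+
+     static const struct ethtool_ops my_ethtool_ops = {
+             /* Accept ethtool -G <dev> tcp-data-split on|off from userspace. */
+             .supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT,
+             .get_ringparam         = my_get_ringparam,
+             /* .set_ringparam (not shown) applies the requested split mode. */
+     };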
+
+3. The driver must use the page_pool netmem APIs. The netmem APIs currently
+   correspond 1-to-1 to the page APIs. Conversion to netmem should be
+   achievable by switching the page APIs to netmem APIs and tracking memory
+   via netmem_refs in the driver rather than struct page pointers:
+
+ - page_pool_alloc -> page_pool_alloc_netmem
+ - page_pool_get_dma_addr -> page_pool_get_dma_addr_netmem
+ - page_pool_put_page -> page_pool_put_netmem
+
+   Not all page APIs have netmem equivalents at the moment. If your driver
+   relies on a missing netmem API, feel free to add it and propose it on
+   netdev@, or reach out to almasrymina@google.com for help adding the netmem
+   API.
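+
+   As a sketch only, a conversion of a hypothetical RX refill/free path. The
+   ring and descriptor structures and constants are made up, and the netmem
+   calls are assumed to mirror the signatures of the page helpers they
+   replace::
+
+     #define MY_RING_SIZE    256                 /* hypothetical */
+     #define MY_RX_BUF_SIZE  2048                /* hypothetical */
+
+     struct my_rx_desc {
+             dma_addr_t dma_addr;                /* hypothetical HW descriptor */
+     };
+
+     struct my_rx_ring {
+             struct page_pool *pp;
+             netmem_ref netmems[MY_RING_SIZE];   /* was: struct page *pages[] */
+     };
+
+     static int my_rx_refill_one(struct my_rx_ring *ring, int idx,
+                                 struct my_rx_desc *desc, gfp_t gfp)
+     {
+             unsigned int offset, size = MY_RX_BUF_SIZE;
+             netmem_ref netmem;
+
+             /* was: page = page_pool_alloc(ring->pp, &offset, &size, gfp); */
+             netmem = page_pool_alloc_netmem(ring->pp, &offset, &size, gfp);
+             if (!netmem)
+                     return -ENOMEM;
+
+             ring->netmems[idx] = netmem;
+             /* was: page_pool_get_dma_addr(page) + offset */
+             desc->dma_addr = page_pool_get_dma_addr_netmem(netmem) + offset;
+             return 0;
+     }
+
+     static void my_rx_free_one(struct my_rx_ring *ring, int idx)
+     {
+             /* was: page_pool_put_page(ring->pp, page, -1, true); */
+             page_pool_put_netmem(ring->pp, ring->netmems[idx], -1, true);
+             ring->netmems[idx] = 0;
+     }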
+
+4. The driver must use the following PP_FLAGS:
+
+   - PP_FLAG_DMA_MAP: netmem is not dma-mappable by the driver. The driver
+     must delegate the dma mapping to the page_pool.
+   - PP_FLAG_DMA_SYNC_DEV: the netmem dma address is not necessarily
+     dma-syncable by the driver. The driver must delegate the dma syncing to
+     the page_pool.
+   - PP_FLAG_ALLOW_UNREADABLE_NETMEM: the driver must specify this flag iff
+     tcp-data-split is enabled. In this case the netmem allocated by the
+     page_pool may be unreadable, and netmem_address() will return NULL to
+     indicate that. The driver must not assume that the netmem is readable.
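+
+   For illustration, a sketch of setting these flags when creating the pool.
+   The priv/rxq structures and their fields are hypothetical; the netdev and
+   queue_idx fields are assumed here to be what ties the pool to a specific RX
+   queue for memory provider binding::
+
+     struct page_pool_params pp_params = {
+             .order          = 0,
+             .pool_size      = 256,              /* typically the RX ring size */
+             .nid            = NUMA_NO_NODE,
+             .dev            = priv->dev,        /* struct device doing the DMA */
+             .dma_dir        = DMA_FROM_DEVICE,
+             .max_len        = PAGE_SIZE,
+             .offset         = 0,
+             .netdev         = priv->netdev,
+             .queue_idx      = rxq->index,       /* hypothetical field */
+             /* Delegate dma mapping and device syncing to the page_pool. */
+             .flags          = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
+     };
+
+     /* Only allow unreadable netmem when tcp-data-split is enabled. */
+     if (priv->hds_enabled)                      /* hypothetical field */
+             pp_params.flags |= PP_FLAG_ALLOW_UNREADABLE_NETMEM;
+
+     rxq->pp = page_pool_create(&pp_params);
+     if (IS_ERR(rxq->pp))
+             return PTR_ERR(rxq->pp);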
+
+5. The driver must use page_pool_dma_sync_netmem_for_cpu() in lieu of
+   dma_sync_single_range_for_cpu(). For some memory providers, dma syncing for
+   the CPU will be done by the page_pool; for others (particularly the dmabuf
+   memory provider), dma syncing for the CPU is the responsibility of the
+   userspace using dmabuf APIs. The driver must delegate the entire
+   dma-syncing operation to the page_pool, which will do it correctly.
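+
+   For illustration, a sketch of an RX completion path under these rules. The
+   ring structure reuses the hypothetical example above, and the helper
+   signatures are assumed to mirror the page-based calls they replace::
+
+     static void my_rx_complete_one(struct my_rx_ring *ring, int idx,
+                                    struct sk_buff *skb, u32 len)
+     {
+             netmem_ref netmem = ring->netmems[idx];
+             void *va;
+
+             /* was: dma_sync_single_range_for_cpu(dev, dma, 0, len, dir); */
+             page_pool_dma_sync_netmem_for_cpu(ring->pp, netmem, 0, len);
+
+             /* Unreadable netmem has no kernel mapping; never dereference it. */
+             va = netmem_address(netmem);
+             if (va)
+                     prefetch(va);
+
+             /* Attach the (possibly unreadable) buffer as an skb frag. */
+             skb_add_rx_frag_netmem(skb, skb_shinfo(skb)->nr_frags, netmem,
+                                    0, len, PAGE_SIZE);
+     }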
--
2.47.0.338.g60cca15819-goog
On Mon, 9 Dec 2024 17:23:08 +0000 Mina Almasry wrote:
> +================
> +Memory providers
> +================
> +
> +
> +Intro
> +=====
> +
> +Device memory TCP, and likely more upcoming features, are reliant in memory
> +provider support in the driver.

Are "memory providers" something driver authors care about?
I'd go with netmem naming in all driver facing APIs.
Or perhaps say placing data into unreable buffers?

> +Driver support
> +==============
> +
> +1. The driver must support page_pool. The driver must not do its own recycling
> +   on top of page_pool.

I like the rule, but is there a specific reason driver can't do its own
recycling?

> +2. The driver must support tcp-data-split ethtool option.
> +
> +3. The driver must use the page_pool netmem APIs. The netmem APIs are
> +   currently 1-to-1 mapped with page APIs. Conversion to netmem should be

mapped => correspond ?

> +   achievable by switching the page APIs to netmem APIs and tracking memory via
> +   netmem_refs in the driver rather than struct page * :
> +
> +   - page_pool_alloc -> page_pool_alloc_netmem
> +   - page_pool_get_dma_addr -> page_pool_get_dma_addr_netmem
> +   - page_pool_put_page -> page_pool_put_netmem
> +
> +   Not all page APIs have netmem equivalents at the moment. If your driver
> +   relies on a missing netmem API, feel free to add and propose to netdev@ or
> +   reach out to almasrymina@google.com for help adding the netmem API.
> +
> +4. The driver must use the following PP_FLAGS:
> +
> +   - PP_FLAG_DMA_MAP: netmem is not dma mappable by the driver. The driver
> +     must delegate the dma mapping to the page_pool.
> +   - PP_FLAG_DMA_SYNC_DEV: netmem dma addr is not necessarily dma-syncable
> +     by the driver. The driver must delegate the dma syncing to the page_pool.
> +   - PP_FLAG_ALLOW_UNREADABLE_NETMEM. The driver must specify this flag iff
> +     tcp-data-split is enabled. In this case the netmem allocated by the
> +     page_pool may be unreadable, and netmem_address() will return NULL to
> +     indicate that. The driver must not assume that the netmem is readable.

Should we turn the netmem_address() explanation into a point of its own,
calling out restrictions on use of CPU addresses, netmem_to_page() etc?

> +5. The driver must use page_pool_dma_sync_netmem_for_cpu() in lieu of
> +   dma_sync_single_range_for_cpu(). For some memory providers, dma_syncing for
> +   CPU will be done by the page_pool, for others (particularly dmabuf memory
> +   provider), dma syncing for CPU is the responsibility of the userspace using
> +   dmabuf APIs. The driver must delegate the entire dma-syncing operation to
> +   the page_pool which will do it correctly.
On Tue, Dec 10, 2024 at 7:55 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 9 Dec 2024 17:23:08 +0000 Mina Almasry wrote:
> > +================
> > +Memory providers
> > +================
> > +
> > +
> > +Intro
> > +=====
> > +
> > +Device memory TCP, and likely more upcoming features, are reliant in memory
> > +provider support in the driver.
>
> Are "memory providers" something driver authors care about?
> I'd go with netmem naming in all driver facing APIs.
> Or perhaps say placing data into unreable buffers?
>

Sure, I can center the docs around netmem rather than 'memory providers'.

> > +Driver support
> > +==============
> > +
> > +1. The driver must support page_pool. The driver must not do its own recycling
> > +   on top of page_pool.
>
> I like the rule, but is there a specific reason driver can't do its own
> recycling?
>

Drivers doing their own recycling is not currently supported, AFAICT.
Adding support for it in the future and maintaining it is doable, but
a headache. I also noticed with IDPF you're nacking drivers doing
their own recycling anyway, so I thought why not just declare all such
use cases as not supported to make the whole thing much simpler.
Drivers can deprecate their recycling while adding support for netmem
which brings them in line with what you're enforcing for new drivers
anyway.

The specific reason: currently drivers will get_page pp pages to hold
on to them to do their own recycling, right? We don't even have a
get_netmem equivalent. We could add one (and for the TX path, which is
coming along, I do add one), but even then, the pp needs to detect
elevated references on net_iovs to exclude them from pp recycling. The
mp also needs to understand/keep track of elevated refcounts and make
sure the page is returned to it when the elevated refcounts from the
driver are dropped.

--
Thanks,
Mina
On Wed, 11 Dec 2024 09:53:36 -0800 Mina Almasry wrote:
> Drivers doing their own recycling is not currently supported, AFAICT.
> Adding support for it in the future and maintaining it is doable, but
> a headache. I also noticed with IDPF you're nacking drivers doing
> their own recycling anyway, so I thought why not just declare all such
> use cases as not supported to make the whole thing much simpler.
> Drivers can deprecate their recycling while adding support for netmem
> which brings them in line with what you're enforcing for new drivers
> anyway.

IIRC IDPF was doing recycling based on the old page ref tricks,
without any use of page pool at all. But without using page pool
the driver will never acquire a netmem_ref in the first place.

> The specific reason: currently drivers will get_page pp pages to hold
> on to them to do their own recycling, right? We don't even have a
> get_netmem equivalent. We could add one (and for the TX path, which is
> coming along, I do add one), but even then, the pp needs to detect
> elevated references on net_iovs to exclude them from pp recycling. The
> mp also needs to understand/keep track of elevated refcounts and make
> sure the page is returned to it when the elevated refcounts from the
> driver are dropped.

No? It should all just work. The page may get split / fragmented by
the driver or page_pool_alloc_netmem() which you're adding in this
series. A fragmented net_iov will have an elevated refcount in exactly
the same way as if the driver was stashing one ref to release later.
On Wed, Dec 11, 2024 at 6:28 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 11 Dec 2024 09:53:36 -0800 Mina Almasry wrote:
> > Drivers doing their own recycling is not currently supported, AFAICT.
> > Adding support for it in the future and maintaining it is doable, but
> > a headache. I also noticed with IDPF you're nacking drivers doing
> > their own recycling anyway, so I thought why not just declare all such
> > use cases as not supported to make the whole thing much simpler.
> > Drivers can deprecate their recycling while adding support for netmem
> > which brings them in line with what you're enforcing for new drivers
> > anyway.
>
> IIRC IDPF was doing recycling based on the old page ref tricks,
> without any use of page pool at all. But without using page pool
> the driver will never acquire a netmem_ref in the first place.
>
> > The specific reason: currently drivers will get_page pp pages to hold
> > on to them to do their own recycling, right? We don't even have a
> > get_netmem equivalent. We could add one (and for the TX path, which is
> > coming along, I do add one), but even then, the pp needs to detect
> > elevated references on net_iovs to exclude them from pp recycling. The
> > mp also needs to understand/keep track of elevated refcounts and make
> > sure the page is returned to it when the elevated refcounts from the
> > driver are dropped.
>
> No? It should all just work. The page may get split / fragmented by
> the driver or page_pool_alloc_netmem() which you're adding in this
> series. A fragmented net_iov will have an elevated refcount in exactly
> the same way as if the driver was stashing one ref to release later.

Ah, I had not considered that the driver may recycle the page by
holding onto a pp ref count, rather than a page refcount. The former
should indeed just work, although I don't have a driver that does
this, so test coverage may be a bit lacking.

But you mentioned you like the rule above. Do you want this removed
from the docs entirely? Or clarified to something like "driver's can't
perform their own recycling by holding onto page refs, but can hold
onto page_pool refs"?

--
Thanks,
Mina
On Fri, 13 Dec 2024 09:50:15 -0800 Mina Almasry wrote:
> > No? It should all just work. The page may get split / fragmented by
> > the driver or page_pool_alloc_netmem() which you're adding in this
> > series. A fragmented net_iov will have an elevated refcount in exactly
> > the same way as if the driver was stashing one ref to release later.
>
> Ah, I had not considered that the driver may recycle the page by
> holding onto a pp ref count, rather than a page refcount. The former
> should indeed just work, although I don't have a driver that does
> this, so test coverage may be a bit lacking.
>
> But you mentioned you like the rule above. Do you want this removed
> from the docs entirely? Or clarified to something like "driver's can't
> perform their own recycling by holding onto page refs, but can hold
> onto page_pool refs"?

We can call it out, makes sense. I'm not sure how to clearly name the
page ref vs page_pool ref. But yes, driver must not attempt to hold
onto struct page for recycling, as there is no struct page. I'd add to
that that recycling using page pool refs is also discouraged as the
circulation time for buffers is much longer than when data is copied
out during recvmsg().
Hi--

On 12/9/24 9:23 AM, Mina Almasry wrote:
> Document expectations from drivers looking to add support for device
> memory tcp or other memory provider based features.
>
> Signed-off-by: Mina Almasry <almasrymina@google.com>
>
> ---
>  Documentation/networking/index.rst | 1 +
>  Documentation/networking/memory-provider.rst | 52 ++++++++++++++++++++
>  2 files changed, 53 insertions(+)
>  create mode 100644 Documentation/networking/memory-provider.rst
>
> diff --git a/Documentation/networking/memory-provider.rst b/Documentation/networking/memory-provider.rst
> new file mode 100644
> index 000000000000..4eee3b01eb18
> --- /dev/null
> +++ b/Documentation/networking/memory-provider.rst
> @@ -0,0 +1,52 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +================
> +Memory providers
> +================
> +
> +
> +Intro

Full word, please.

> +=====
> +
> +Device memory TCP, and likely more upcoming features, are reliant in memory
> +provider support in the driver.

I can't quite parse after "features." Maybe "are reliant on"?
Maybe "in-memory"? Should it be "reliable" instead of "reliant"?
Internet tells me that reliant means dependent on.

> +
> +
> +Driver support
> +==============
> +
> +1. The driver must support page_pool. The driver must not do its own recycling
> +   on top of page_pool.
> +
> +2. The driver must support tcp-data-split ethtool option.

   must support the

> +
> +3. The driver must use the page_pool netmem APIs. The netmem APIs are
> +   currently 1-to-1 mapped with page APIs. Conversion to netmem should be
> +   achievable by switching the page APIs to netmem APIs and tracking memory via
> +   netmem_refs in the driver rather than struct page * :
> +
> +   - page_pool_alloc -> page_pool_alloc_netmem
> +   - page_pool_get_dma_addr -> page_pool_get_dma_addr_netmem
> +   - page_pool_put_page -> page_pool_put_netmem
> +
> +   Not all page APIs have netmem equivalents at the moment. If your driver
> +   relies on a missing netmem API, feel free to add and propose to netdev@ or
> +   reach out to almasrymina@google.com for help adding the netmem API.
> +
> +4. The driver must use the following PP_FLAGS:
> +
> +   - PP_FLAG_DMA_MAP: netmem is not dma mappable by the driver. The driver

   dma-mappable

> +     must delegate the dma mapping to the page_pool.
> +   - PP_FLAG_DMA_SYNC_DEV: netmem dma addr is not necessarily dma-syncable
> +     by the driver. The driver must delegate the dma syncing to the page_pool.
> +   - PP_FLAG_ALLOW_UNREADABLE_NETMEM. The driver must specify this flag iff
> +     tcp-data-split is enabled. In this case the netmem allocated by the
> +     page_pool may be unreadable, and netmem_address() will return NULL to
> +     indicate that. The driver must not assume that the netmem is readable.
> +
> +5. The driver must use page_pool_dma_sync_netmem_for_cpu() in lieu of
> +   dma_sync_single_range_for_cpu(). For some memory providers, dma_syncing for
> +   CPU will be done by the page_pool, for others (particularly dmabuf memory
> +   provider), dma syncing for CPU is the responsibility of the userspace using
> +   dmabuf APIs. The driver must delegate the entire dma-syncing operation to
> +   the page_pool which will do it correctly.

thanks.

--
~Randy