Currently net_iovs support only pp ref counts, and do not support a
page ref equivalent.
This is fine for the RX path, as net_iovs are used exclusively with the
pp and only pp refcounting is needed there. The TX path, however, does
not use pp ref counts; thus, support for a get_page/put_page equivalent
is needed for netmem.
Support get_netmem/put_netmem. Check the type of the netmem before
passing it to page or net_iov specific code to obtain a page ref
equivalent.
For dmabuf net_iovs, we obtain a ref on the underlying binding. This
ensures the entire binding doesn't disappear until all the net_iovs have
been put_netmem'ed. We do not need to track the refcount of individual
dmabuf net_iovs as we don't allocate/free them from a pool similar to
what the buddy allocator does for pages.
This code is written to be extensible by other net_iov implementers.
get_netmem/put_netmem will check the type of the netmem and route it to
the correct helper:
pages -> [get|put]_page()
dmabuf net_iovs -> net_devmem_[get|put]_net_iov()
new net_iovs -> new helpers
Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
---
v5: https://lore.kernel.org/netdev/20250227041209.2031104-2-almasrymina@google.com/
- Updated to check that the net_iov is devmem before calling
net_devmem_put_net_iov().
- Jakub requested that callers of __skb_frag_ref()/skb_page_unref be
inspected to make sure that they generate / anticipate skbs with the
correct pp_recycle and unreadable setting:
skb_page_unref
==============
- skb_page_unref is unreachable from these callers due to unreadable
checks returning early:
gro_pull_from_frag0, skb_copy_ubufs, __pskb_pull_tail
- callers that are reachable for unreadable skbs. These would only see rx
unreadable skbs with pp_recycle set before this patchset and would drop
a pp ref count. After this patchset they can see tx unreadable skbs
with no pp attached and no pp_recycle set, and so now they will drop
a net_iov ref via put_netmem:
__pskb_trim, __pskb_trim_head, skb_release_data, skb_shift
__skb_frag_ref
==============
Before this patchset, __skb_frag_ref would not do the right thing if it
saw any unreadable skbs, whether pp_recycle was set or not, because it
unconditionally tries to acquire a page ref. However, with RX-only
support I can't reproduce calls to __skb_frag_ref even after enabling tc
forwarding to TX.
After this patchset __skb_frag_ref would obtain a page ref equivalent on
dmabuf net_iovs, by obtaining a ref on the binding.
Callers that are unreachable for unreadable skbs:
- veth_xdp_get
Callers that are reachable for unreadable skbs, and from code review they
look specific to the TX path:
- tcp_grow_skb, __skb_zcopy_downgrade_managed, __pskb_copy_fclone,
pskb_expand_head, skb_zerocopy, skb_split, pskb_carve_inside_header,
pskb_carve_inside_nonlinear, tcp_clone_payload, skb_segment.
Callers that are reachable for unreadable skbs, and from code review they
look reachable in the RX path, although my testing never hit these
paths. These are concerning. Maybe we should put this patch in net and
cc stable? However, no drivers currently enable unreadable netmem, so
fixing this in net-next may be fine as well:
- skb_shift, skb_try_coalesce
v2:
- Add a comment on top of the refcount_t ref explaining the usage in the
TX path.
- Fix missing definition of net_devmem_dmabuf_binding_put in this patch.
---
include/linux/skbuff_ref.h | 4 ++--
include/net/netmem.h | 3 +++
net/core/devmem.c | 10 ++++++++++
net/core/devmem.h | 20 ++++++++++++++++++++
net/core/skbuff.c | 30 ++++++++++++++++++++++++++++++
5 files changed, 65 insertions(+), 2 deletions(-)
diff --git a/include/linux/skbuff_ref.h b/include/linux/skbuff_ref.h
index 0f3c58007488..9e49372ef1a0 100644
--- a/include/linux/skbuff_ref.h
+++ b/include/linux/skbuff_ref.h
@@ -17,7 +17,7 @@
*/
static inline void __skb_frag_ref(skb_frag_t *frag)
{
- get_page(skb_frag_page(frag));
+ get_netmem(skb_frag_netmem(frag));
}
/**
@@ -40,7 +40,7 @@ static inline void skb_page_unref(netmem_ref netmem, bool recycle)
if (recycle && napi_pp_put_page(netmem))
return;
#endif
- put_page(netmem_to_page(netmem));
+ put_netmem(netmem);
}
/**
diff --git a/include/net/netmem.h b/include/net/netmem.h
index 64af9a288c80..1b047cfb9e4f 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -273,4 +273,7 @@ static inline unsigned long netmem_get_dma_addr(netmem_ref netmem)
return __netmem_clear_lsb(netmem)->dma_addr;
}
+void get_netmem(netmem_ref netmem);
+void put_netmem(netmem_ref netmem);
+
#endif /* _NET_NETMEM_H */
diff --git a/net/core/devmem.c b/net/core/devmem.c
index f5c3a7e6dbb7..dca2ff7cf692 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -295,6 +295,16 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
return ERR_PTR(err);
}
+void net_devmem_get_net_iov(struct net_iov *niov)
+{
+ net_devmem_dmabuf_binding_get(net_devmem_iov_binding(niov));
+}
+
+void net_devmem_put_net_iov(struct net_iov *niov)
+{
+ net_devmem_dmabuf_binding_put(net_devmem_iov_binding(niov));
+}
+
/*** "Dmabuf devmem memory provider" ***/
int mp_dmabuf_devmem_init(struct page_pool *pool)
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 7fc158d52729..946f2e015746 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -29,6 +29,10 @@ struct net_devmem_dmabuf_binding {
* The binding undos itself and unmaps the underlying dmabuf once all
* those refs are dropped and the binding is no longer desired or in
* use.
+ *
+ * net_devmem_get_net_iov() on dmabuf net_iovs will increment this
+ * reference, making sure that the binding remains alive until all the
+ * net_iovs are no longer used.
*/
refcount_t ref;
@@ -111,6 +115,9 @@ net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_binding *binding)
__net_devmem_dmabuf_binding_free(binding);
}
+void net_devmem_get_net_iov(struct net_iov *niov);
+void net_devmem_put_net_iov(struct net_iov *niov);
+
struct net_iov *
net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding);
void net_devmem_free_dmabuf(struct net_iov *ppiov);
@@ -120,6 +127,19 @@ bool net_is_devmem_iov(struct net_iov *niov);
#else
struct net_devmem_dmabuf_binding;
+static inline void
+net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_binding *binding)
+{
+}
+
+static inline void net_devmem_get_net_iov(struct net_iov *niov)
+{
+}
+
+static inline void net_devmem_put_net_iov(struct net_iov *niov)
+{
+}
+
static inline void
__net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding)
{
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d73ad79fe739..00c22bce98e4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -89,6 +89,7 @@
#include <linux/textsearch.h>
#include "dev.h"
+#include "devmem.h"
#include "netmem_priv.h"
#include "sock_destructor.h"
@@ -7313,3 +7314,32 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes,
return false;
}
EXPORT_SYMBOL(csum_and_copy_from_iter_full);
+
+void get_netmem(netmem_ref netmem)
+{
+ struct net_iov *niov;
+
+ if (netmem_is_net_iov(netmem)) {
+ niov = netmem_to_net_iov(netmem);
+ if (net_is_devmem_iov(niov))
+ net_devmem_get_net_iov(netmem_to_net_iov(netmem));
+ return;
+ }
+ get_page(netmem_to_page(netmem));
+}
+EXPORT_SYMBOL(get_netmem);
+
+void put_netmem(netmem_ref netmem)
+{
+ struct net_iov *niov;
+
+ if (netmem_is_net_iov(netmem)) {
+ niov = netmem_to_net_iov(netmem);
+ if (net_is_devmem_iov(niov))
+ net_devmem_put_net_iov(netmem_to_net_iov(netmem));
+ return;
+ }
+
+ put_page(netmem_to_page(netmem));
+}
+EXPORT_SYMBOL(put_netmem);
--
2.49.0.805.g082f7c87e0-goog
On 4/18/25 00:15, Mina Almasry wrote:
> Currently net_iovs support only pp ref counts, and do not support a
> page ref equivalent.

Makes me wonder why it's needed. In theory, nobody should ever be
taking page references without going through struct ubuf_info
handling first, all in kernel users of these pages should always
be paired with ubuf_info, as it's user memory, it's not stable,
and without ubuf_info the user is allowed to overwrite it.

Maybe there are some gray area cases like packet inspection or
tracing? However in this case, after the ubuf_info is dropped, the
user can overwrite the memory with its secrets. Definitely iffy
in security terms.

--
Pavel Begunkov
On Tue, Apr 22, 2025 at 1:43 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 4/18/25 00:15, Mina Almasry wrote:
> > Currently net_iovs support only pp ref counts, and do not support a
> > page ref equivalent.
>
> Makes me wonder why it's needed. In theory, nobody should ever be
> taking page references without going through struct ubuf_info
> handling first, all in kernel users of these pages should always
> be paired with ubuf_info, as it's user memory, it's not stable,
> and without ubuf_info the user is allowed to overwrite it.
>

The concern about the stability of the from-userspace data is already
called out in the MSG_ZEROCOPY documentation that we're piggybacking
devmem TX onto:

https://www.kernel.org/doc/html/v4.15/networking/msg_zerocopy.html

Basically the userspace passes the memory to the kernel and waits for
a notification for when it's safe to reuse/overwrite the data. I don't
know that it's a security concern. Basically if the userspace modifies
the data before it gets the notification from the kernel, then it will
mess up its own TX. The notification is sent by the kernel to the
userspace when the skb is freed, I believe; at that point it's safe to
reuse the buffer as the kernel no longer needs it for TX.

For devmem we do need to pin the binding until all TX users are done
with it, so get_netmem will increase the refcount on the binding to
keep it alive until the net stack is done with it.

> Maybe there are some gray area cases like packet inspection or
> tracing? However in this case, after the ubuf_info is dropped, the
> user can overwrite the memory with its secrets. Definitely iffy
> in security terms.
>

You can look at all the callers of skb_frag_ref to see all the code
paths that grab a page ref on the frag today. There is also an
inspection by me in the v5 changelog.

--
Thanks,
Mina
On 4/22/25 14:56, Mina Almasry wrote:
> On Tue, Apr 22, 2025 at 1:43 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> On 4/18/25 00:15, Mina Almasry wrote:
>>> Currently net_iovs support only pp ref counts, and do not support a
>>> page ref equivalent.
>>
>> Makes me wonder why it's needed. In theory, nobody should ever be
>> taking page references without going through struct ubuf_info
>> handling first, all in kernel users of these pages should always
>> be paired with ubuf_info, as it's user memory, it's not stable,
>> and without ubuf_info the user is allowed to overwrite it.
>>
>
> The concern about the stability of the from-userspace data is already
> called out in the MSG_ZEROCOPY documentation that we're piggybacking
> devmem TX onto:

Sure, I didn't object to that. There is no problem as long as the
ubuf_info semantics are followed, which by extension means that
any ref manipulation should already be gated on ubuf_info, and
there should be no need to change generic paths.

--
Pavel Begunkov
On Tue, Apr 22, 2025 at 11:19 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 4/22/25 14:56, Mina Almasry wrote:
> > On Tue, Apr 22, 2025 at 1:43 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> >>
> >> On 4/18/25 00:15, Mina Almasry wrote:
> >>> Currently net_iovs support only pp ref counts, and do not support a
> >>> page ref equivalent.
> >>
> >> Makes me wonder why it's needed. In theory, nobody should ever be
> >> taking page references without going through struct ubuf_info
> >> handling first, all in kernel users of these pages should always
> >> be paired with ubuf_info, as it's user memory, it's not stable,
> >> and without ubuf_info the user is allowed to overwrite it.
> >>
> >
> > The concern about the stability of the from-userspace data is already
> > called out in the MSG_ZEROCOPY documentation that we're piggybacking
> > devmem TX onto:
>
> Sure, I didn't object to that. There is no problem as long as the
> ubuf_info semantics are followed, which by extension means that
> any ref manipulation should already be gated on ubuf_info, and
> there should be no need to change generic paths.
>

I'm sorry, I'm not following. skb_frag_ref is how the net stack obtains
references on an skb_frag, regardless of whether the frag is a
MSG_ZEROCOPY one with ubuf_info, a regular tx frag without a
ubuf_info, or even an io_uring frag, which I think has the
msg->ubuf_info like we discussed previously. I don't see the net stack
in the current code special-casing how it obtains refs on frags, and I
don't see the need to add special casing. Can you elaborate in more
detail on the gating you expect, and why? Are you asking that I
check the skb has a ubuf_info before allowing it to grab the reference
on the dmabuf binding? Or something else?

--
Thanks,
Mina
On 4/22/25 19:30, Mina Almasry wrote:
> On Tue, Apr 22, 2025 at 11:19 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> On 4/22/25 14:56, Mina Almasry wrote:
>>> On Tue, Apr 22, 2025 at 1:43 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>>
>>>> On 4/18/25 00:15, Mina Almasry wrote:
>>>>> Currently net_iovs support only pp ref counts, and do not support a
>>>>> page ref equivalent.
>>>>
>>>> Makes me wonder why it's needed. In theory, nobody should ever be
>>>> taking page references without going through struct ubuf_info
>>>> handling first, all in kernel users of these pages should always
>>>> be paired with ubuf_info, as it's user memory, it's not stable,
>>>> and without ubuf_info the user is allowed to overwrite it.
>>>>
>>>
>>> The concern about the stability of the from-userspace data is already
>>> called out in the MSG_ZEROCOPY documentation that we're piggybacking
>>> devmem TX onto:
>>
>> Sure, I didn't object to that. There is no problem as long as the
>> ubuf_info semantics are followed, which by extension means that
>> any ref manipulation should already be gated on ubuf_info, and
>> there should be no need to change generic paths.
>>
>
> I'm sorry, I'm not following. skb_frag_ref is how the net stack obtains
> references on an skb_frag, regardless of whether the frag is a
> MSG_ZEROCOPY one with ubuf_info, a regular tx frag without a
> ubuf_info, or even an io_uring frag, which I think has the

Yep

> msg->ubuf_info like we discussed previously. I don't see the net stack
> in the current code special-casing how it obtains refs on frags, and I
> don't see the need to add special casing. Can you elaborate in more

You'll be special-casing it either way, it's probably unavoidable;
just here it is in put/get_netmem.

> detail on the gating you expect, and why? Are you asking that I
> check the skb has a ubuf_info before allowing it to grab the reference
> on the dmabuf binding? Or something else?

get_page() already shouldn't be a valid operation for ubuf-backed frags,
apart from a few cases where frags are copied/moved together with the
ubuf. The frags are essentially bundled with the ubuf and shouldn't
exist without it, because otherwise the user can overwrite memory with
all the following nastiness. If there are some spots violating that,
I'd rather say they should be addressed.

Instead of adding net_iov / devmem handling in generic paths affecting
everyone, you could change those functions where get_page() is called
legitimately. The niov/devmem part of get/put_netmem doesn't even have
the same semantics as the page counterparts, as it cannot prevent
reallocation. That might be fine, but it's not clear why keep them
together when they can't even follow the same behaviour.

Interestingly, you can even replace per-frag referencing with taking
one ref per ubuf_info and putting it in the ubuf release callback, in a
way similar to how SKBFL_MANAGED_FRAG_REFS works.

FWIW, I do like the idea of get/put_netmem, it's nice to be able to
easily list all callers, but maybe it should just warn on net_iovs.

--
Pavel Begunkov
On 4/22/25 20:47, Pavel Begunkov wrote:
> On 4/22/25 19:30, Mina Almasry wrote:
>> On Tue, Apr 22, 2025 at 11:19 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>
>>> On 4/22/25 14:56, Mina Almasry wrote:
>>>> On Tue, Apr 22, 2025 at 1:43 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>>>
>>>>> On 4/18/25 00:15, Mina Almasry wrote:
>>>>>> Currently net_iovs support only pp ref counts, and do not support a
>>>>>> page ref equivalent.
>>>>>
>>>>> Makes me wonder why it's needed. In theory, nobody should ever be
>>>>> taking page references without going through struct ubuf_info
>>>>> handling first, all in kernel users of these pages should always
>>>>> be paired with ubuf_info, as it's user memory, it's not stable,
>>>>> and without ubuf_info the user is allowed to overwrite it.
>>>>>
>>>>
>>>> The concern about the stability of the from-userspace data is already
>>>> called out in the MSG_ZEROCOPY documentation that we're piggybacking
>>>> devmem TX onto:
>>>
>>> Sure, I didn't object to that. There is no problem as long as the
>>> ubuf_info semantics are followed, which by extension means that
>>> any ref manipulation should already be gated on ubuf_info, and
>>> there should be no need to change generic paths.
>>>
>>
>> I'm sorry, I'm not following. skb_frag_ref is how the net stack obtains
>> references on an skb_frag, regardless of whether the frag is a
>> MSG_ZEROCOPY one with ubuf_info, a regular tx frag without a
>> ubuf_info, or even an io_uring frag, which I think has the
>
> Yep
>
>> msg->ubuf_info like we discussed previously. I don't see the net stack
>> in the current code special-casing how it obtains refs on frags, and I
>> don't see the need to add special casing. Can you elaborate in more
>
> You'll be special-casing it either way, it's probably unavoidable;
> just here it is in put/get_netmem.
>
>> detail on the gating you expect, and why? Are you asking that I
>> check the skb has a ubuf_info before allowing it to grab the reference
>> on the dmabuf binding? Or something else?
>
> get_page() already shouldn't be a valid operation for ubuf-backed frags,
> apart from a few cases where frags are copied/moved together with the
> ubuf. The frags are essentially bundled with the ubuf and shouldn't
> exist without it, because otherwise the user can overwrite memory with
> all the following nastiness. If there are some spots violating that,
> I'd rather say they should be addressed.
>
> Instead of adding net_iov / devmem handling in generic paths affecting
> everyone, you could change those functions where get_page() is called
> legitimately. The niov/devmem part of get/put_netmem doesn't even have
> the same semantics as the page counterparts, as it cannot prevent
> reallocation. That might be fine, but it's not clear

Actually, maybe it's not that exclusive to netiov; the same reallocation
argument is true for user pages, even though they're reffed separately.

It might be fine to leave this approach; while suboptimal, it should be
easier for you. Depends on how folks feel about the extra overhead in
the normal tx path.

--
Pavel Begunkov
On Tue, Apr 22, 2025 at 1:03 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 4/22/25 20:47, Pavel Begunkov wrote:
> > On 4/22/25 19:30, Mina Almasry wrote:
> >> On Tue, Apr 22, 2025 at 11:19 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> >>>
> >>> On 4/22/25 14:56, Mina Almasry wrote:
> >>>> On Tue, Apr 22, 2025 at 1:43 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> >>>>>
> >>>>> On 4/18/25 00:15, Mina Almasry wrote:
> >>>>>> Currently net_iovs support only pp ref counts, and do not support a
> >>>>>> page ref equivalent.
> >>>>>
> >>>>> Makes me wonder why it's needed. In theory, nobody should ever be
> >>>>> taking page references without going through struct ubuf_info
> >>>>> handling first, all in kernel users of these pages should always
> >>>>> be paired with ubuf_info, as it's user memory, it's not stable,
> >>>>> and without ubuf_info the user is allowed to overwrite it.
> >>>>>
> >>>>
> >>>> The concern about the stability of the from-userspace data is already
> >>>> called out in the MSG_ZEROCOPY documentation that we're piggybacking
> >>>> devmem TX onto:
> >>>
> >>> Sure, I didn't object that. There is no problem as long as the
> >>> ubuf_info semantics is followed, which by extension mean that
> >>> any ref manipulation should already be gated on ubuf_info, and
> >>> there should be no need in changing generic paths.
> >>>
> >>
> >> I'm sorry I'm not following. skb_frag_ref is how the net stack obtains
> >> references on an skb_frag, regardless on whether the frag is a
> >> MSG_ZEROCOPY one with ubuf info, or a regular tx frag without a
> >> ubuf_info, or even an io_uring frag which I think have the
> >
> > Yep
> >
> >> msg->ubuf_info like we discussed previously. I don't see the net stack
> >> in the current code special casing how it obtains refs on frags, and I
> >> don't see the need to add special casing. Can you elaborate in more
> >
> > You'll be special casing it either way, it's probably unavoidable,
> > just here it is in put/get_netmem.
> >
> >> detail what is the gating you expect, and why? Are you asking that I
> >> check the skb has a ubuf_info before allowing to grab the reference on
> >> the dmabuf binding? Or something else?
> >
> > get_page() already shouldn't be a valid operation for ubuf backed frags
> > apart from few cases where frags are copied/moved together with ubuf.
This is where I'm not following. Per the v5 changelog of this commit,
all these skb helpers hit skb_frag_ref (which is just get_page
underneath):

tcp_grow_skb, __skb_zcopy_downgrade_managed, __pskb_copy_fclone,
pskb_expand_head, skb_zerocopy, skb_split, pskb_carve_inside_header,
pskb_carve_inside_nonlinear, tcp_clone_payload, skb_segment, skb_shift,
skb_try_coalesce.
I don't see many of them opting out of skb_frag_ref if the skb is
unreadable or has a ubuf_info. Are you saying all/most/some of these
callers are invalid? I tend to assume merged code is correct unless I
have ample expertise to say otherwise.
> > The frags are essentially bundled with ubuf and shouldn't exist without
> > it, because otherwise user can overwrite memory with all the following
> > nastiness. If there are some spots violating that, I'd rather say they
> > should be addressed.
> >
> > Instead of adding net_iov / devmem handling in generic paths affecting
> > everyone, you could change those functions where it's get_page() are
> > called legitimately. The niov/devmem part of get/put_netmem doesn't
> > even have the same semantics as the page counterparts as it cannot
> > prevent from reallocation. That might be fine, but it's not clear
>
> Actually, maybe it's not that exclusive to netiov, same reallocation
> argument is true for user pages, even though they're reffed
> separately.
>
> It might be fine to leave this approach, while suboptimal it should
> be easier for you. Depends on how folks feel about the extra
> overhead in the normal tx path.
>
Right, I think there are only 2 ways to handle all the code paths in
the tcp stack that hit skb_frag_ref:
1. We go over all of them and make sure they're unreachable for unreadable skbs:
if (!skb_frags_readable()) return; // or something
2. or, we just add net_iov support in skb_frag_ref.
This patch series does the latter, which IMO is much preferred.
FWIW I'm surprised that adding net_iov support to skb_frag_ref/unref
is facing uncertainty. I've added net_iov support for many skb helpers
in commit 65249feb6b3df ("net: add support for skbs with unreadable
frags") and commit 9f6b619edf2e8 ("net: support non paged skb frags").
skb_frag_ref/unref is just 1 helper I "missed" because it's mostly
(but not entirely) used by the TX path.
--
Thanks,
Mina
On 4/22/25 22:10, Mina Almasry wrote:
...
>>> Instead of adding net_iov / devmem handling in generic paths affecting
>>> everyone, you could change those functions where it's get_page() are
>>> called legitimately. The niov/devmem part of get/put_netmem doesn't
>>> even have the same semantics as the page counterparts as it cannot
>>> prevent from reallocation. That might be fine, but it's not clear
>>
>> Actually, maybe it's not that exclusive to netiov, same reallocation
>> argument is true for user pages, even though they're reffed
>> separately.
>>
>> It might be fine to leave this approach, while suboptimal it should
>> be easier for you. Depends on how folks feel about the extra
>> overhead in the normal tx path.
>>
>
> Right, I think there is only 2 ways to handle all the code paths in
> the tcp stack that hit skb_frag_ref:
>
> 1. We go over all of them and make sure they're unreachable for unreadable skbs:
>
> if (!skb_frags_readable()) return; // or something
>
> 2. or, we just add net_iov support in skb_frag_ref.
>
> This patch series does the latter, which IMO is much preferred.
>
> FWIW I'm surprised that adding net_iov support to skb_frag_ref/unref
> is facing uncertainty. I've added net_iov support for many skb helpers
> in commit 65249feb6b3df ("net: add support for skbs with unreadable
> frags") and commit 9f6b619edf2e8 ("net: support non paged skb frags").
> skb_frag_ref/unref is just 1 helper I "missed" because it's mostly
> (but not entirely) used by the TX path.
It'd have looked completely different if you had done it back then,
which is the same semantics mismatch I mentioned. For pp rx niovs it'd
have incremented the niov's ref, pinning that specific niov down and
preventing reallocation (by the pp), and that with no devmem-specific
code sticking into generic code.
This patch adds a 3rd way to refcount a frag (after page refs and
pp_ref_count), which is why it attracted attention.
--
Pavel Begunkov