[PATCH 3/5] vhost/vsock: Allocate nonlinear SKBs for handling large receive buffers

Will Deacon posted 5 patches 3 months, 2 weeks ago
There is a newer version of this series
[PATCH 3/5] vhost/vsock: Allocate nonlinear SKBs for handling large receive buffers
Posted by Will Deacon 3 months, 2 weeks ago
When receiving a packet from a guest, vhost_vsock_handle_tx_kick()
calls vhost_vsock_alloc_skb() to allocate and fill an SKB with the
receive data. Unfortunately, these are always linear allocations and can
therefore result in significant pressure on kmalloc() considering that
the maximum packet size (VIRTIO_VSOCK_MAX_PKT_BUF_SIZE +
VIRTIO_VSOCK_SKB_HEADROOM) is a little over 64KiB, resulting in a 128KiB
allocation for each packet.

Rework the vsock SKB allocation so that, for sizes with page order
greater than PAGE_ALLOC_COSTLY_ORDER, a nonlinear SKB is allocated
instead with the packet header in the SKB and the receive data in the
fragments.

Signed-off-by: Will Deacon <will@kernel.org>
---
 drivers/vhost/vsock.c        | 15 +++++++++------
 include/linux/virtio_vsock.h | 31 +++++++++++++++++++++++++------
 2 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 66a0f060770e..cfa4e1bcf367 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -344,11 +344,16 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
 
 	len = iov_length(vq->iov, out);
 
-	if (len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM)
+	if (len < VIRTIO_VSOCK_SKB_HEADROOM ||
+	    len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM)
 		return NULL;
 
 	/* len contains both payload and hdr */
-	skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
+	if (len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
+		skb = virtio_vsock_alloc_skb_with_frags(len, GFP_KERNEL);
+	else
+		skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
+
 	if (!skb)
 		return NULL;
 
@@ -377,10 +382,8 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
 
 	virtio_vsock_skb_rx_put(skb);
 
-	nbytes = copy_from_iter(skb->data, payload_len, &iov_iter);
-	if (nbytes != payload_len) {
-		vq_err(vq, "Expected %zu byte payload, got %zu bytes\n",
-		       payload_len, nbytes);
+	if (skb_copy_datagram_from_iter(skb, 0, &iov_iter, payload_len)) {
+		vq_err(vq, "Failed to copy %zu byte payload\n", payload_len);
 		kfree_skb(skb);
 		return NULL;
 	}
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 67ffb64325ef..8f9fa1cab32a 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -51,27 +51,46 @@ static inline void virtio_vsock_skb_rx_put(struct sk_buff *skb)
 {
 	u32 len;
 
+	DEBUG_NET_WARN_ON_ONCE(skb->len);
 	len = le32_to_cpu(virtio_vsock_hdr(skb)->len);
 
-	if (len > 0)
+	if (skb_is_nonlinear(skb))
+		skb->len = len;
+	else
 		skb_put(skb, len);
 }
 
-static inline struct sk_buff *virtio_vsock_alloc_skb(unsigned int size, gfp_t mask)
+static inline struct sk_buff *
+__virtio_vsock_alloc_skb_with_frags(unsigned int header_len,
+				    unsigned int data_len,
+				    gfp_t mask)
 {
 	struct sk_buff *skb;
+	int err;
 
-	if (size < VIRTIO_VSOCK_SKB_HEADROOM)
-		return NULL;
-
-	skb = alloc_skb(size, mask);
+	skb = alloc_skb_with_frags(header_len, data_len,
+				   PAGE_ALLOC_COSTLY_ORDER, &err, mask);
 	if (!skb)
 		return NULL;
 
 	skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
+	skb->data_len = data_len;
 	return skb;
 }
 
+static inline struct sk_buff *
+virtio_vsock_alloc_skb_with_frags(unsigned int size, gfp_t mask)
+{
+	size -= VIRTIO_VSOCK_SKB_HEADROOM;
+	return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
+						   size, mask);
+}
+
+static inline struct sk_buff *virtio_vsock_alloc_skb(unsigned int size, gfp_t mask)
+{
+	return __virtio_vsock_alloc_skb_with_frags(size, 0, mask);
+}
+
 static inline void
 virtio_vsock_skb_queue_head(struct sk_buff_head *list, struct sk_buff *skb)
 {
-- 
2.50.0.714.g196bf9f422-goog
Re: [PATCH 3/5] vhost/vsock: Allocate nonlinear SKBs for handling large receive buffers
Posted by Stefano Garzarella 3 months, 1 week ago
On Wed, Jun 25, 2025 at 02:15:41PM +0100, Will Deacon wrote:
>When receiving a packet from a guest, vhost_vsock_handle_tx_kick()
>calls vhost_vsock_alloc_skb() to allocate and fill an SKB with the
>receive data. Unfortunately, these are always linear allocations and can
>therefore result in significant pressure on kmalloc() considering that
>the maximum packet size (VIRTIO_VSOCK_MAX_PKT_BUF_SIZE +
>VIRTIO_VSOCK_SKB_HEADROOM) is a little over 64KiB, resulting in a 128KiB
>allocation for each packet.
>
>Rework the vsock SKB allocation so that, for sizes with page order
>greater than PAGE_ALLOC_COSTLY_ORDER, a nonlinear SKB is allocated
>instead with the packet header in the SKB and the receive data in the
>fragments.
>
>Signed-off-by: Will Deacon <will@kernel.org>
>---
> drivers/vhost/vsock.c        | 15 +++++++++------
> include/linux/virtio_vsock.h | 31 +++++++++++++++++++++++++------
> 2 files changed, 34 insertions(+), 12 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index 66a0f060770e..cfa4e1bcf367 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -344,11 +344,16 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
>
> 	len = iov_length(vq->iov, out);
>
>-	if (len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM)
>+	if (len < VIRTIO_VSOCK_SKB_HEADROOM ||

Why moving this check here?

>+	    len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM)
> 		return NULL;
>
> 	/* len contains both payload and hdr */
>-	skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
>+	if (len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
>+		skb = virtio_vsock_alloc_skb_with_frags(len, GFP_KERNEL);
>+	else
>+		skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);

Can we do this directly in virtio_vsock_alloc_skb() so we don't need
to duplicate code on virtio/vhost code?

>+
> 	if (!skb)
> 		return NULL;
>
>@@ -377,10 +382,8 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
>
> 	virtio_vsock_skb_rx_put(skb);
>
>-	nbytes = copy_from_iter(skb->data, payload_len, &iov_iter);
>-	if (nbytes != payload_len) {
>-		vq_err(vq, "Expected %zu byte payload, got %zu bytes\n",
>-		       payload_len, nbytes);
>+	if (skb_copy_datagram_from_iter(skb, 0, &iov_iter, payload_len)) {
>+		vq_err(vq, "Failed to copy %zu byte payload\n", payload_len);
> 		kfree_skb(skb);
> 		return NULL;
> 	}
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index 67ffb64325ef..8f9fa1cab32a 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -51,27 +51,46 @@ static inline void virtio_vsock_skb_rx_put(struct sk_buff *skb)
> {
> 	u32 len;
>
>+	DEBUG_NET_WARN_ON_ONCE(skb->len);

Should we mention in the commit message?

> 	len = le32_to_cpu(virtio_vsock_hdr(skb)->len);
>
>-	if (len > 0)

Why removing this check?

Thanks,
Stefano

>+	if (skb_is_nonlinear(skb))
>+		skb->len = len;
>+	else
> 		skb_put(skb, len);
> }
>
>-static inline struct sk_buff *virtio_vsock_alloc_skb(unsigned int size, gfp_t mask)
>+static inline struct sk_buff *
>+__virtio_vsock_alloc_skb_with_frags(unsigned int header_len,
>+				    unsigned int data_len,
>+				    gfp_t mask)
> {
> 	struct sk_buff *skb;
>+	int err;
>
>-	if (size < VIRTIO_VSOCK_SKB_HEADROOM)
>-		return NULL;
>-
>-	skb = alloc_skb(size, mask);
>+	skb = alloc_skb_with_frags(header_len, data_len,
>+				   PAGE_ALLOC_COSTLY_ORDER, &err, mask);
> 	if (!skb)
> 		return NULL;
>
> 	skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
>+	skb->data_len = data_len;
> 	return skb;
> }
>
>+static inline struct sk_buff *
>+virtio_vsock_alloc_skb_with_frags(unsigned int size, gfp_t mask)
>+{
>+	size -= VIRTIO_VSOCK_SKB_HEADROOM;
>+	return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
>+						   size, mask);
>+}
>+
>+static inline struct sk_buff *virtio_vsock_alloc_skb(unsigned int size, gfp_t mask)
>+{
>+	return __virtio_vsock_alloc_skb_with_frags(size, 0, mask);
>+}
>+
> static inline void
> virtio_vsock_skb_queue_head(struct sk_buff_head *list, struct sk_buff *skb)
> {
>-- 
>2.50.0.714.g196bf9f422-goog
>
>
Re: [PATCH 3/5] vhost/vsock: Allocate nonlinear SKBs for handling large receive buffers
Posted by Will Deacon 3 months, 1 week ago
On Fri, Jun 27, 2025 at 12:45:45PM +0200, Stefano Garzarella wrote:
> On Wed, Jun 25, 2025 at 02:15:41PM +0100, Will Deacon wrote:
> > When receiving a packet from a guest, vhost_vsock_handle_tx_kick()
> > calls vhost_vsock_alloc_skb() to allocate and fill an SKB with the
> > receive data. Unfortunately, these are always linear allocations and can
> > therefore result in significant pressure on kmalloc() considering that
> > the maximum packet size (VIRTIO_VSOCK_MAX_PKT_BUF_SIZE +
> > VIRTIO_VSOCK_SKB_HEADROOM) is a little over 64KiB, resulting in a 128KiB
> > allocation for each packet.
> > 
> > Rework the vsock SKB allocation so that, for sizes with page order
> > greater than PAGE_ALLOC_COSTLY_ORDER, a nonlinear SKB is allocated
> > instead with the packet header in the SKB and the receive data in the
> > fragments.
> > 
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> > drivers/vhost/vsock.c        | 15 +++++++++------
> > include/linux/virtio_vsock.h | 31 +++++++++++++++++++++++++------
> > 2 files changed, 34 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index 66a0f060770e..cfa4e1bcf367 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -344,11 +344,16 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
> > 
> > 	len = iov_length(vq->iov, out);
> > 
> > -	if (len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM)
> > +	if (len < VIRTIO_VSOCK_SKB_HEADROOM ||
> 
> Why moving this check here?

I moved it here because virtio_vsock_alloc_skb_with_frags() does:

+       size -= VIRTIO_VSOCK_SKB_HEADROOM;
+       return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
+                                                  size, mask);

and so having the check in __virtio_vsock_alloc_skb_with_frags() looks
strange as, by then, it really only applies to the linear case. It also
feels weird to me to have the upper-bound of the length checked by the
caller but the lower-bound checked in the callee. I certainly find it
easier to reason about if they're in the same place.

Additionally, the lower-bound check is only needed by the vhost receive
code, as the transmit path uses virtio_vsock_alloc_skb(), which never
passes a size smaller than VIRTIO_VSOCK_SKB_HEADROOM.

Given all that, moving it to the one place that needs it seemed like the
best option. What do you think?

> > +	    len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM)
> > 		return NULL;
> > 
> > 	/* len contains both payload and hdr */
> > -	skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
> > +	if (len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> > +		skb = virtio_vsock_alloc_skb_with_frags(len, GFP_KERNEL);
> > +	else
> > +		skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
> 
> Can we do this directly in virtio_vsock_alloc_skb() so we don't need
> to duplicate code on virtio/vhost code?

We can, but then I think we should do something different for the
rx_fill() path -- it feels fragile to rely on that using small-enough
buffers to guarantee linear allocations. How about I:

 1. Add virtio_vsock_alloc_linear_skb(), which always performs a linear
    allocation.

 2. Change virtio_vsock_alloc_skb() to use nonlinear SKBs for sizes
    greater than SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)

 3. Use virtio_vsock_alloc_linear_skb() to fill the guest RX buffers

 4. Use virtio_vsock_alloc_skb() for everything else

If you like the idea, I'll rework the series along those lines.
Diff below... (see end of mail)

> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 67ffb64325ef..8f9fa1cab32a 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -51,27 +51,46 @@ static inline void virtio_vsock_skb_rx_put(struct sk_buff *skb)
> > {
> > 	u32 len;
> > 
> > +	DEBUG_NET_WARN_ON_ONCE(skb->len);
> 
> Should we mention in the commit message?

Sure, I'll add something. The non-linear handling doesn't accumulate len,
so it's a debug check to ensure that len hasn't been messed with between
allocation and here.

> > 	len = le32_to_cpu(virtio_vsock_hdr(skb)->len);
> > 
> > -	if (len > 0)
> 
> Why removing this check?

I think it's redundant: len is a u32, so we're basically just checking
to see if it's non-zero. All the callers have already checked for this
but, even if they didn't, skb_put(skb, 0) is harmless afaict.

Will

--->8

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 3799c0aeeec5..a6cd72a32f63 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -349,11 +349,7 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
 		return NULL;
 
 	/* len contains both payload and hdr */
-	if (len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
-		skb = virtio_vsock_alloc_skb_with_frags(len, GFP_KERNEL);
-	else
-		skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
-
+	skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
 	if (!skb)
 		return NULL;
 
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 0e265921be03..ed5eab46e3dc 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -79,16 +79,19 @@ __virtio_vsock_alloc_skb_with_frags(unsigned int header_len,
 }
 
 static inline struct sk_buff *
-virtio_vsock_alloc_skb_with_frags(unsigned int size, gfp_t mask)
+virtio_vsock_alloc_linear_skb(unsigned int size, gfp_t mask)
 {
-	size -= VIRTIO_VSOCK_SKB_HEADROOM;
-	return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
-						   size, mask);
+	return __virtio_vsock_alloc_skb_with_frags(size, 0, mask);
 }
 
 static inline struct sk_buff *virtio_vsock_alloc_skb(unsigned int size, gfp_t mask)
 {
-	return __virtio_vsock_alloc_skb_with_frags(size, 0, mask);
+	if (size <= SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
+		return virtio_vsock_alloc_linear_skb(size, mask);
+
+	size -= VIRTIO_VSOCK_SKB_HEADROOM;
+	return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
+						   size, mask);
 }
 
 static inline void
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 4ae714397ca3..8c9ca0cb0d4e 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -321,7 +321,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
 	vq = vsock->vqs[VSOCK_VQ_RX];
 
 	do {
-		skb = virtio_vsock_alloc_skb(total_len, GFP_KERNEL);
+		skb = virtio_vsock_alloc_linear_skb(total_len, GFP_KERNEL);
 		if (!skb)
 			break;
 
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 424eb69e84f9..f74677c3511e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -262,11 +262,7 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
 	if (!zcopy)
 		skb_len += payload_len;
 
-	if (skb_len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
-		skb = virtio_vsock_alloc_skb_with_frags(skb_len, GFP_KERNEL);
-	else
-		skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
-
+	skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
 	if (!skb)
 		return NULL;
Re: [PATCH 3/5] vhost/vsock: Allocate nonlinear SKBs for handling large receive buffers
Posted by Stefano Garzarella 3 months, 1 week ago
On Mon, Jun 30, 2025 at 03:20:57PM +0100, Will Deacon wrote:
>On Fri, Jun 27, 2025 at 12:45:45PM +0200, Stefano Garzarella wrote:
>> On Wed, Jun 25, 2025 at 02:15:41PM +0100, Will Deacon wrote:
>> > When receiving a packet from a guest, vhost_vsock_handle_tx_kick()
>> > calls vhost_vsock_alloc_skb() to allocate and fill an SKB with the
>> > receive data. Unfortunately, these are always linear allocations and can
>> > therefore result in significant pressure on kmalloc() considering that
>> > the maximum packet size (VIRTIO_VSOCK_MAX_PKT_BUF_SIZE +
>> > VIRTIO_VSOCK_SKB_HEADROOM) is a little over 64KiB, resulting in a 128KiB
>> > allocation for each packet.
>> >
>> > Rework the vsock SKB allocation so that, for sizes with page order
>> > greater than PAGE_ALLOC_COSTLY_ORDER, a nonlinear SKB is allocated
>> > instead with the packet header in the SKB and the receive data in the
>> > fragments.
>> >
>> > Signed-off-by: Will Deacon <will@kernel.org>
>> > ---
>> > drivers/vhost/vsock.c        | 15 +++++++++------
>> > include/linux/virtio_vsock.h | 31 +++++++++++++++++++++++++------
>> > 2 files changed, 34 insertions(+), 12 deletions(-)
>> >
>> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> > index 66a0f060770e..cfa4e1bcf367 100644
>> > --- a/drivers/vhost/vsock.c
>> > +++ b/drivers/vhost/vsock.c
>> > @@ -344,11 +344,16 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
>> >
>> > 	len = iov_length(vq->iov, out);
>> >
>> > -	if (len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM)
>> > +	if (len < VIRTIO_VSOCK_SKB_HEADROOM ||
>>
>> Why moving this check here?
>
>I moved it here because virtio_vsock_alloc_skb_with_frags() does:
>
>+       size -= VIRTIO_VSOCK_SKB_HEADROOM;
>+       return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
>+                                                  size, mask);
>
>and so having the check in __virtio_vsock_alloc_skb_with_frags() looks
>strange as, by then, it really only applies to the linear case. It also
>feels weird to me to have the upper-bound of the length checked by the
>caller but the lower-bound checked in the callee. I certainly find it
>easier to reason about if they're in the same place.
>
>Additionally, the lower-bound check is only needed by the vhost receive
>code, as the transmit path uses virtio_vsock_alloc_skb(), which never
>passes a size smaller than VIRTIO_VSOCK_SKB_HEADROOM.
>
>Given all that, moving it to the one place that needs it seemed like the
>best option. What do you think?

Okay, I see now. Yep, it's fine, but please mention in the commit 
description.

>
>> > +	    len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM)
>> > 		return NULL;
>> >
>> > 	/* len contains both payload and hdr */
>> > -	skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
>> > +	if (len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
>> > +		skb = virtio_vsock_alloc_skb_with_frags(len, GFP_KERNEL);
>> > +	else
>> > +		skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
>>
>> Can we do this directly in virtio_vsock_alloc_skb() so we don't need
>> to duplicate code on virtio/vhost code?
>
>We can, but then I think we should do something different for the
>rx_fill() path -- it feels fragile to rely on that using small-enough
>buffers to guarantee linear allocations. How about I:
>
> 1. Add virtio_vsock_alloc_linear_skb(), which always performs a linear
>    allocation.
>
> 2. Change virtio_vsock_alloc_skb() to use nonlinear SKBs for sizes
>    greater than SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)
>
> 3. Use virtio_vsock_alloc_linear_skb() to fill the guest RX buffers
>
> 4. Use virtio_vsock_alloc_skb() for everything else
>
>If you like the idea, I'll rework the series along those lines.
>Diff below... (see end of mail)

I really like it :-) let's go in that direction!

>
>> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> > index 67ffb64325ef..8f9fa1cab32a 100644
>> > --- a/include/linux/virtio_vsock.h
>> > +++ b/include/linux/virtio_vsock.h
>> > @@ -51,27 +51,46 @@ static inline void virtio_vsock_skb_rx_put(struct sk_buff *skb)
>> > {
>> > 	u32 len;
>> >
>> > +	DEBUG_NET_WARN_ON_ONCE(skb->len);
>>
>> Should we mention in the commit message?
>
>Sure, I'll add something. The non-linear handling doesn't accumulate len,
>so it's a debug check to ensure that len hasn't been messed with between
>allocation and here.
>
>> > 	len = le32_to_cpu(virtio_vsock_hdr(skb)->len);
>> >
>> > -	if (len > 0)
>>
>> Why removing this check?
>
>I think it's redundant: len is a u32, so we're basically just checking
>to see if it's non-zero. All the callers have already checked for this
>but, even if they didn't, skb_put(skb, 0) is harmless afaict.

Yep, I see, but now I don't remember why we have it, could it be more
expensive to call `skb_put(skb, 0)`, instead of just having the if for
control packets with no payload?

Thanks,
Stefano

>
>Will
>
>--->8
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index 3799c0aeeec5..a6cd72a32f63 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -349,11 +349,7 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
> 		return NULL;
>
> 	/* len contains both payload and hdr */
>-	if (len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
>-		skb = virtio_vsock_alloc_skb_with_frags(len, GFP_KERNEL);
>-	else
>-		skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
>-
>+	skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
> 	if (!skb)
> 		return NULL;
>
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index 0e265921be03..ed5eab46e3dc 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -79,16 +79,19 @@ __virtio_vsock_alloc_skb_with_frags(unsigned int header_len,
> }
>
> static inline struct sk_buff *
>-virtio_vsock_alloc_skb_with_frags(unsigned int size, gfp_t mask)
>+virtio_vsock_alloc_linear_skb(unsigned int size, gfp_t mask)
> {
>-	size -= VIRTIO_VSOCK_SKB_HEADROOM;
>-	return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
>-						   size, mask);
>+	return __virtio_vsock_alloc_skb_with_frags(size, 0, mask);
> }
>
> static inline struct sk_buff *virtio_vsock_alloc_skb(unsigned int size, gfp_t mask)
> {
>-	return __virtio_vsock_alloc_skb_with_frags(size, 0, mask);
>+	if (size <= SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
>+		return virtio_vsock_alloc_linear_skb(size, mask);
>+
>+	size -= VIRTIO_VSOCK_SKB_HEADROOM;
>+	return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
>+						   size, mask);
> }
>
> static inline void
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 4ae714397ca3..8c9ca0cb0d4e 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -321,7 +321,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> 	vq = vsock->vqs[VSOCK_VQ_RX];
>
> 	do {
>-		skb = virtio_vsock_alloc_skb(total_len, GFP_KERNEL);
>+		skb = virtio_vsock_alloc_linear_skb(total_len, GFP_KERNEL);
> 		if (!skb)
> 			break;
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 424eb69e84f9..f74677c3511e 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -262,11 +262,7 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
> 	if (!zcopy)
> 		skb_len += payload_len;
>
>-	if (skb_len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
>-		skb = virtio_vsock_alloc_skb_with_frags(skb_len, GFP_KERNEL);
>-	else
>-		skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
>-
>+	skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
> 	if (!skb)
> 		return NULL;
>
>
Re: [PATCH 3/5] vhost/vsock: Allocate nonlinear SKBs for handling large receive buffers
Posted by Will Deacon 3 months, 1 week ago
On Tue, Jul 01, 2025 at 12:44:58PM +0200, Stefano Garzarella wrote:
> On Mon, Jun 30, 2025 at 03:20:57PM +0100, Will Deacon wrote:
> > On Fri, Jun 27, 2025 at 12:45:45PM +0200, Stefano Garzarella wrote:
> > > On Wed, Jun 25, 2025 at 02:15:41PM +0100, Will Deacon wrote:
> > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > index 66a0f060770e..cfa4e1bcf367 100644
> > > > --- a/drivers/vhost/vsock.c
> > > > +++ b/drivers/vhost/vsock.c
> > > > @@ -344,11 +344,16 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
> > > >
> > > > 	len = iov_length(vq->iov, out);
> > > >
> > > > -	if (len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM)
> > > > +	if (len < VIRTIO_VSOCK_SKB_HEADROOM ||
> > > 
> > > Why moving this check here?
> > 
> > I moved it here because virtio_vsock_alloc_skb_with_frags() does:
> > 
> > +       size -= VIRTIO_VSOCK_SKB_HEADROOM;
> > +       return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
> > +                                                  size, mask);
> > 
> > and so having the check in __virtio_vsock_alloc_skb_with_frags() looks
> > strange as, by then, it really only applies to the linear case. It also
> > feels weird to me to have the upper-bound of the length checked by the
> > caller but the lower-bound checked in the callee. I certainly find it
> > easier to reason about if they're in the same place.
> > 
> > Additionally, the lower-bound check is only needed by the vhost receive
> > code, as the transmit path uses virtio_vsock_alloc_skb(), which never
> > passes a size smaller than VIRTIO_VSOCK_SKB_HEADROOM.
> > 
> > Given all that, moving it to the one place that needs it seemed like the
> > best option. What do you think?
> 
> Okay, I see now. Yep, it's fine, but please mention in the commit
> description.

Great, I'll do that.

> > > > 	len = le32_to_cpu(virtio_vsock_hdr(skb)->len);
> > > >
> > > > -	if (len > 0)
> > > 
> > > Why removing this check?
> > 
> > I think it's redundant: len is a u32, so we're basically just checking
> > to see if it's non-zero. All the callers have already checked for this
> > but, even if they didn't, skb_put(skb, 0) is harmless afaict.
> 
> Yep, I see, but now I don't remember why we have it, could it be more
> expensive to call `skb_put(skb, 0)`, instead of just having the if for
> control packets with no payload?

That sounds like a questionable optimisation, but I can preserve it in
the only caller that doesn't already check for a non-zero size
(virtio_transport_rx_work()). I mistakenly thought that it was already
checking it, but on closer inspection it only checks the size of the
virtqueue buffer and doesn't look at the packet header at all.

In fact, that is itself a bug because nothing prevents an SKB overflow
on the put path...

I'll add an extra fix for that in v2 so that it can be backported
independently.

Will